GEOMETRIC MECHANICS
RICHARD TALMAN
Wiley-VCH Verlag GmbH & Co. KGaA
All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.: Applied for
British Library Cataloging-in-Publication Data: A catalogue record for this book is available from the British Library
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at
> with(plots): with(linalg): with(student):
Warning: new definition for norm
Warning: new definition for trace
> U := A*( exp(-2*alpha*x) - 2*exp(-alpha*x) );
        U := A (e^(-2 alpha x) - 2 e^(-alpha x))
> F := -diff(U,x);
        F := -A (-2 alpha e^(-2 alpha x) + 2 alpha e^(-alpha x))
> x_0 := solve(F,x);
        x_0 := 0
WORKED EXAMPLES
> # Substitute for x to avoid setting x_0 while obtaining k
> F := subs( x=xx, F );
        F := -A (-2 alpha e^(-2 alpha xx) + 2 alpha e^(-alpha xx))
> k := -diff(F,xx);
> F := subs( xx=x, F ):
> xx := x_0:
> k := eval(k);
        k := 2 A alpha^2
> small_amp_period := 2*Pi*sqrt(m/k);
        small_amp_period := Pi sqrt(2) sqrt(m/(A alpha^2))
> # Must not use "E" as total energy; maple interprets it
> # as 2.718...
> tpts := solve( U=EE, x );
        tpts := -ln(1/2 (2 A + 2 sqrt(A^2 + A EE))/A)/alpha,
                -ln(1/2 (2 A - 2 sqrt(A^2 + A EE))/A)/alpha
> delta := 0.0000001: # Fudge factor to help integration at end of range
> tpt1 := tpts[1]+delta: tpt2 := tpts[2]-delta:
> Int( 1/sqrt(EE-U), x = tpt1..tpt2 );
        Int( 1/sqrt(EE - A (e^(-2 alpha x) - 2 e^(-alpha x))),
             x = -ln(1/2 (2 A + 2 sqrt(A^2 + A EE))/A)/alpha + .1 10^(-6) ..
                 -ln(1/2 (2 A - 2 sqrt(A^2 + A EE))/A)/alpha - .1 10^(-6) )
> period := sqrt(2*m)*changevar( exp(-alpha*x)=y, ", y );
        period := sqrt(2) sqrt(m) Int( -1/(sqrt(EE - A (y^2 - 2 y)) alpha y),
                                       y = e^(-alpha tpt1) .. e^(-alpha tpt2) )
> A := 1: alpha := 1: period;
        sqrt(2) sqrt(m) Int( -1/(sqrt(EE - y^2 + 2 y) y),
                             y = e^(ln(1 + sqrt(1 + EE)) - .1 10^(-6)) ..
                                 e^(ln(1 - sqrt(1 + EE)) + .1 10^(-6)) )
REVIEW OF SOLVABLE SYSTEMS
> T0 := evalf(small_amp_period);
        T0 := 4.442882938 sqrt(m)
> EE := -0.9: period; T1 := evalf( " );
        sqrt(2) sqrt(m) Int( -1/(sqrt(-.9 - y^2 + 2 y) y), y = 1.316227634 .. .6837723024 )
        T1 := 4.680867535
> p1 := sqrt(2*m*(EE-U));
        p1 := sqrt(2) sqrt(m (-.9 - e^(-2 x) + 2 e^(-x)))
> EE := -0.7: period: T2 := evalf( " ): p2 := sqrt(2*m*(EE-U)):
> EE := -0.5: period: T3 := evalf( " ): p3 := sqrt(2*m*(EE-U)):
> EE := -0.3: period: T4 := evalf( " ): p4 := sqrt(2*m*(EE-U)):
> EE := -0.1: period: T5 := evalf( " ): p5 := sqrt(2*m*(EE-U)):
> plotsetup(ps,plotoptions=`noborder`);
plotsetup: warning plotoutput filename set to postscript.out
> interface(plotoutput=`Shape.ps`);
> plot(U,x=-1..5, title=`Morse Potential, U(x), A=alpha=1`);
> interface(plotoutput=`Force.ps`);
> plot(F,x=-1..5, title=`Force, F(x), A=alpha=1`);
> m := 1:
> interface(plotoutput=`Period.ps`);
> plotpoints := [ [-1,T0],[-.9,T1],[-.7,T2],[-.5,T3],[-.3,T4],[-.1,T5] ];
        plotpoints := [[-1, 4.442882938], [-.9, 4.680867535], [-.7, 5.308304409],
                       [-.5, 6.281216813], [-.3, 8.109331364], [-.1, 14.04628732]]
> plot( plotpoints, style=line, title=`Period vs. Energy`);
> interface(plotoutput=`PhaseSpace.ps`);
> plot( {p1,p2,p3,p4,p5}, x=-1..3, title=`Momentum vs position`);
> interface(plotoutput=`Flow.ps`);
> fieldplot( [p/m, F], x=-0.5..3, p=0..1.3 );
Maple Results 1.2.1
FIGURE 1.2.1. Graph of the Morse potential. [Plot title: “Morse Potential, U(x), A=alpha=1”.]
FIGURE 1.2.2. The force resulting from the Morse potential. [Plot title: “Force, F(x), A=alpha=1”.]
FIGURE 1.2.3. The period of oscillation about the equilibrium point of the Morse potential, as a function of amplitude. [Plot title: “Period vs. Energy”.]
FIGURE 1.2.4. Phase space plot, momentum p versus position x for the Morse potential, for E = -.9, -.7, -.5, -.3, -.1. The smooth curves are contours of equal total energy. “Turning points” are the (extrapolated) intersections of these curves with the x-axis (about which the plot is symmetric). [Plot title: “Momentum vs. position”.]
FIGURE 1.2.5. Arrows represent the flow (p/m, F) plotted with (x, p) axes; the Morse potential.
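The period-versus-energy numbers in the Maple session can be cross-checked without the “fudge factor” delta. The following is a minimal Python sketch, not the author's method: it assumes the same potential U(x) = A(e^(-2 alpha x) - 2 e^(-alpha x)), applies the same change of variable y = e^(-alpha x), and then adds a further substitution y = 1 - d cos(theta) (an assumption introduced here) that removes the endpoint singularities. The function name `morse_period` and its parameters are illustrative only.

```python
import math

def morse_period(E, A=1.0, alpha=1.0, m=1.0, n=2000):
    """Period of bounded motion in U(x) = A*(exp(-2*alpha*x) - 2*exp(-alpha*x)).

    Valid for -A <= E < 0. In the variable y = exp(-alpha*x) the turning
    points are y = 1 -/+ d with d = sqrt(1 + E/A).
    """
    d = math.sqrt(1.0 + E / A)
    h = math.pi / n
    # With y = 1 - d*cos(theta): E - U = A*d^2*sin(theta)^2, dy = d*sin(theta)*dtheta,
    # so period = sqrt(2m/A)/alpha * integral_0^pi dtheta / (1 - d*cos(theta)).
    integral = sum(h / (1.0 - d * math.cos((i + 0.5) * h)) for i in range(n))
    return math.sqrt(2.0 * m / A) / alpha * integral

if __name__ == "__main__":
    for energy in (-1.0, -0.9, -0.7, -0.5, -0.3, -0.1):
        print(energy, morse_period(energy))
```

The values produced this way should sit slightly above the Maple values T1 to T5, since trimming each end of the integration range by delta discards a small positive contribution.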
Problem 1.2.2: Two potential energy functions expressible in terms of elementary functions and leading to bounded motion are as follows: (a) $U(x) = -U_0/\cosh^2(\alpha x)$. (b) $U(x) = U_0 \tan^2(\alpha x)$. Sketch these functions, find ranges of energy E for which bounded motion occurs, give a formula determining the turning points, and find the frequency of small oscillations.
Problem 1.2.3: Motion always “back and forth” between two limits, say a and b, in one dimension due to force derivable from a potential energy function V(x) is known as “libration.” Conservation of energy then implies
$\dot{x}^2 = (x - a)(b - x)\,\psi(x)$, or $\dot{x} = \pm\sqrt{(x - a)(b - x)\,\psi(x)}$,
where $\psi(x) > 0$ through the range $a \le x \le b$ but is otherwise an arbitrary function of x. It is necessary to toggle between the two $\pm$ choices depending on whether the particle is moving to the right or to the left. Consider the change of variable $x \to \theta$ defined by
$x = \alpha - \beta\cos\theta$, where $\alpha - \beta = a$, $\alpha + \beta = b$.
Show that $(x - a)(b - x) = \beta^2\sin^2\theta$ and that energy conservation is expressed by
$\dot{\theta} = \sqrt{\psi(\alpha - \beta\cos\theta)}$,
where there is no longer a sign ambiguity because $\dot{\theta}$ is always positive. The variable $\theta$ is known as an “angle variable.” One-dimensional libration motion can always be expressed in terms of an angle variable in this way, and then can be “reduced to quadrature” as
$t = \int^{\theta} \frac{d\theta'}{\sqrt{\psi(\alpha - \beta\cos\theta')}}$.
This type of motion is especially important in the conditionally periodic motion of multidimensional oscillatory systems. This topic is studied in Section 14.6.
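As an illustration of why the angle variable is convenient numerically, here is a minimal Python sketch that evaluates the quadrature over one full cycle (theta advancing by 2 pi). It assumes the form $\dot{x}^2 = (x-a)(b-x)\psi(x)$ used in Problem 1.2.3; the function name `libration_period` and the midpoint rule are choices made here, not part of the text.

```python
import math

def libration_period(psi, a, b, n=4000):
    """Period of libration for xdot^2 = (x - a)*(b - x)*psi(x), psi > 0 on [a, b].

    Uses x = alpha - beta*cos(theta), for which thetadot = sqrt(psi(x)), so the
    period is the integral of dtheta / sqrt(psi) over 0 <= theta < 2*pi.
    """
    alpha = 0.5 * (a + b)
    beta = 0.5 * (b - a)
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h          # midpoint rule; the integrand is smooth
        total += h / math.sqrt(psi(alpha - beta * math.cos(theta)))
    return total

# Simple harmonic motion: xdot^2 = omega^2 * (x + X)*(X - x), so psi = omega^2.
omega = 3.0
print(libration_period(lambda x: omega**2, -1.0, 1.0))   # close to 2*pi/omega
```

Because the integrand has no singularity at the turning points, no end-of-range “fudge factor” is needed.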
1.2.2. Particle in Space
Problem 1.2.4: The acceleration of a point particle with velocity $\mathbf{v}(t)$ is given by
Show that $|\mathbf{v}|$ is constant. For the case that A is independent of time and position, give the motion of the particle in terms of its initial position $\mathbf{r}_0$ and velocity $\mathbf{v}_0$.
1.2.3. Weakly Coupled Oscillators
Already in two dimensions there is a rich variety of possible system motions. No attempt will be made here to survey this variety, but some of it can be inferred by studying the pictures accompanying the following problem. In particular the final part exhibits the ubiquitous phenomenon of “avoided line-crossing” where “line” refers to the graph of a normal mode frequency plotted as a function of a parameter of the problem. A theory of nonlinear perturbations to systems like this is described in the final chapter.
Problem 1.2.5: The Lagrangian
with $|\alpha|$
> with(plots):
> with(linalg): plotsetup(ps,plotoptions=`noborder`):
Warning: new definition for norm
Warning: new definition for trace
plotsetup: warning plotoutput filename set to postscript.out
> Digits := 5:
> alias( x=x(t) ): alias( y=y(t) ):
> V := 1/2*(omega_1^2*x^2 + omega_2^2*y^2) - alpha*x*y;
        V := 1/2 omega_1^2 x^2 + 1/2 omega_2^2 y^2 - alpha x y
> T := m/2*(diff(x,t)^2 + diff(y,t)^2);
> L := T - V;
        L := 1/2 m ((d/dt x)^2 + (d/dt y)^2) - 1/2 omega_1^2 x^2 - 1/2 omega_2^2 y^2 + alpha x y
> # Substitute for x and y since they have been aliased
> L := subs( { x=xx, y=yy, diff(x,t)=xp, diff(y,t)=yp }, L);
        L := 1/2 m (xp^2 + yp^2) - 1/2 omega_1^2 xx^2 - 1/2 omega_2^2 yy^2 + alpha xx yy
> dL_dxp := diff(L,xp);
        dL_dxp := m xp
> dL_dyp := diff(L,yp);
        dL_dyp := m yp
> dL_dx := diff(L,xx);
        dL_dx := -omega_1^2 xx + alpha yy
> dL_dy := diff(L,yy);
        dL_dy := -omega_2^2 yy + alpha xx
> dL_dxp := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dxp);
        dL_dxp := m (d/dt x)
> dL_dyp := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dyp);
        dL_dyp := m (d/dt y)
> dL_dx := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dx);
        dL_dx := -omega_1^2 x + alpha y
> dL_dy := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dy);
        dL_dy := -omega_2^2 y + alpha x
> alias( x=x ): alias( y=y ):
> # The only remaining alias is I = sqrt(-1)
> eqnx := diff(dL_dxp,t) - dL_dx = 0;
        eqnx := m (d^2/dt^2 x(t)) + omega_1^2 x(t) - alpha y(t) = 0
> eqny := diff(dL_dyp,t) - dL_dy = 0;
> omega_1 := 1: omega_2 := 1: alpha := 0.1: m := 1:
> eqnx1 := subs({x(t)=xt(t), y(t)=yt(t)}, eqnx):
> eqny1 := subs({x(t)=xt(t), y(t)=yt(t)}, eqny):
> ODE := eqnx1,eqny1;
> initvals := xt(0)=1, yt(0)=0, D(xt)(0)=0, D(yt)(0)=0:
> funcs := {xt(t),yt(t)};
        funcs := { xt(t), yt(t) }
> sols := dsolve( { ODE, initvals}, funcs, `laplace`);
        sols := { xt(t) = .50000 cos(.94869 t) + .50000 cos(1.0488 t),
                  yt(t) = .50000 cos(.94869 t) - .50000 cos(1.0488 t) }
> assign(sols);
> interface(plotoutput=`xyCpldOsc.ps`):
> plot1 := plot( xt(t), t=0..100 ):
> plot2 := plot( yt(t), t=0..100, thickness=1 ):
> display({plot1,plot2}, title=`x(t) and y(t)`);
> omega_1 := 'omega';
        omega_1 := omega
> ODE := eqnx,eqny;
        ODE := (d^2/dt^2 x(t)) + omega^2 x(t) - .1 y(t) = 0,
               (d^2/dt^2 y(t)) + y(t) - .1 x(t) = 0
> initvals := x(0)=1, y(0)=0, D(x)(0)=0, D(y)(0)=0:
> funcs := {x(t),y(t)};
        funcs := { x(t), y(t) }
> sols := dsolve( { ODE, initvals}, funcs, `laplace`);
        %1 := RootOf( 100 _Z^4 + (100 + 100 omega^2) _Z^2 - 1 + 100 omega^2 )
> assign(sols);
> interface(plotoutput=`CpldOsc-xVsOmega.ps`);
> plot3d( x(t), omega=0.8..1.2, t=0..40, numpoints = 4000, style=hidden,
>         orientation=[-10,45], axes=BOXED, title = `x(t)` );
> interface(plotoutput=`CpldOsc-yVsOmega.ps`);
> plot3d( y(t), omega=0.8..1.2, t=0..40, numpoints = 4000, style=hidden,
>         orientation=[-10,45], axes=BOXED, title = `y(t)` );
> # Unassign alpha and omega_2, i.e. Let them be variables again---they
> # were assigned numerical values above.
> alpha := 'alpha'; omega_2 := 'omega_2';
        alpha := alpha
        omega_2 := omega_2
> M := matrix(2,2, [omega^2, alpha, alpha, omega_2^2] );
        M := [ omega^2    alpha     ]
             [ alpha      omega_2^2 ]
> eigs := eigenvals(M);
        eigs := 1/2 omega^2 + 1/2 omega_2^2 + 1/2 sqrt(omega^4 - 2 omega^2 omega_2^2 + omega_2^4 + 4 alpha^2),
                1/2 omega^2 + 1/2 omega_2^2 - 1/2 sqrt(omega^4 - 2 omega^2 omega_2^2 + omega_2^4 + 4 alpha^2)
> whattype(eigs);
        exprseq
> omplus := sqrt(eigs[1]);
        omplus := 1/2 sqrt(2 omega^2 + 2 omega_2^2 + 2 sqrt(omega^4 - 2 omega^2 omega_2^2 + omega_2^4 + 4 alpha^2))
> omminus := sqrt(eigs[2]);
> alpha := 0.1: omega_2 := 1:
> interface(plotoutput=`ModeFreqs.ps`);
> plot({omplus,omminus}, omega=0..1.5, title=`Normal Mode Frequencies`,
>      numpoints=400, axes=BOXED );
Maple Results 1.2.5
FIGURE 1.2.6. Sloshing of energy between two weakly coupled oscillators of equal natural frequency, $\omega = \omega_2 = 1$, $\alpha = 0.1$. [Plot title: “x(t) and y(t)”.]
FIGURE 1.2.7. The displacement x of the originally displaced mass is shown; holding the coupling strength and the other natural frequency constant, $\alpha = 0.1$, $\omega_2 = 1$, its natural frequency $\omega$ is allowed to vary over the range $\omega_2 - 0.2 < \omega < \omega_2 + 0.2$.
FIGURE 1.2.8. Holding the coupling strength and one natural frequency constant at $\omega_2 = 1$, $\alpha = 0.1$, the other natural frequency is allowed to vary over the range $0 < \omega < 1.5$. The eigenfrequencies $\Omega_1$ and $\Omega_2$ lie on separate branches. For low $\omega$ the pair $(\Omega_1, \Omega_2) \approx (\omega_2, \omega)$ but for high $\omega$ $(\Omega_1, \Omega_2) \approx (\omega, \omega_2)$. That is, the identification reverses as one passes from $\omega < \omega_2$ to $\omega_2 < \omega$. As a result the two branches never intersect.
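The avoided crossing can be reproduced with a few lines of Python. This is only a sketch under the assumptions of the Maple session (m = 1, eigenvalues of the 2x2 matrix with diagonal entries omega^2 and omega_2^2 and off-diagonal entry alpha); the function name `mode_frequencies` is introduced here for illustration.

```python
import math

def mode_frequencies(omega, omega_2=1.0, alpha=0.1):
    """Return (lower, upper) normal-mode frequencies for m = 1.

    Requires omega*omega_2 >= alpha; below that the lower eigenvalue is
    negative (no stable oscillation) and math.sqrt raises ValueError.
    """
    mean = 0.5 * (omega**2 + omega_2**2)
    root = 0.5 * math.sqrt((omega**2 - omega_2**2) ** 2 + 4.0 * alpha**2)
    return math.sqrt(mean - root), math.sqrt(mean + root)

if __name__ == "__main__":
    for k in range(5, 16):
        w = 0.1 * k
        lower, upper = mode_frequencies(w)
        print(f"omega={w:.1f}  lower={lower:.5f}  upper={upper:.5f}  gap={upper - lower:.5f}")
```

At omega = omega_2 = 1 the two values agree with the frequencies .94869 and 1.0488 appearing in the `dsolve` output (to the five-digit precision used there), and the gap between the branches stays close to alpha rather than closing.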
Problem 1.2.6: The approximate Lagrangian for an n-dimensional system with coordinates $(q_{(1)}, q_{(2)}, \ldots, q_{(n)})$, valid in the vicinity of an equilibrium point (that can be taken to be $(0, 0, \ldots, 0)$) has the form
(1.2.1)
with T positive definite. It is common to use the summation convention for summations like this, but in this text the summation convention is reserved for tensor summations. When subscripts are placed in parentheses (as here) it indicates they refer to different variables or parameters (as here) rather than different components of the same vector or tensor. However, for the rest of the problem the parentheses will be left off, while the summation signs will be left explicit. It is shown in Section 2.5.3 that a linear transformation $q_i \to y_j$ can be found such that T takes the form
$T = \frac{1}{2}\sum_{r=1}^{n} m_r \dot{y}_r^2$,
where, in this case, each “mass” $m_r$ is necessarily positive because T is positive definite. By judicious choice of the scale of the $y_r$ each “mass” can be adjusted to 1:
$T = \frac{1}{2}\sum_{r=1}^{n} \dot{y}_r^2$. (1.2.2)
For these coordinates $y_r$ the equation
$\sum_{r=1}^{n} y_r^2 = 1$ (1.2.3)
defines a surface (to be called a hypersphere). From now on we will consider only points $\mathbf{y} = (y_1, \ldots, y_n)$ lying on this sphere. Also two points $\mathbf{u}$ and $\mathbf{v}$ will be said to be “orthogonal” if the “quadratic form” $Z(\mathbf{u}, \mathbf{v})$ defined by
vanishes. Being linear in both arguments $Z(\mathbf{u}, \mathbf{v})$ is described as being “bilinear.” We also define a bilinear form $\mathcal{V}(\mathbf{u}, \mathbf{v})$ by
where coefficients $k_{rs}$ have been redefined from the values given above to correspond to the new coordinates $y_r$ so that $V(\mathbf{y}) = \frac{1}{2}\mathcal{V}(\mathbf{y}, \mathbf{y})$. The following series of problems (adapted from Courant and Hilbert [1]) will lead to the conclusion that a further linear transformation $y_i \to z_j$ can be found that, on the one hand, enables the equation for the sphere in Eq. (1.2.3) to retain the same form,
$\sum_{r=1}^{n} z_r^2 = 1$,
and, on the other, enables V to be expressible as a sum of squares with positive coefficients:
Pictorially the strategy is, having deformed the scales so that surfaces of constant T are spherical and surfaces of constant V ellipsoidal, to orient the axes to make these ellipsoids erect. In the jargon of mechanics this process is known as “normal mode” analysis. The “minimax” properties of the “eigenvalues” to be found have important physical implications, but we will not go into them here.
(1) Argue, for small oscillations to be stable, that V must also be positive definite.
(2) Let $\mathbf{z}_1$ be the point on “sphere” (1.2.3) for which $V$ ($\overset{\text{def.}}{=} \frac{1}{2}\kappa_1$) is maximum. (If there is more than one such point pick any one arbitrarily.) Then argue that
(3) Among all points that are both on sphere (1.2.3) and orthogonal to $\mathbf{z}_1$, let $\mathbf{z}_2$ be the one for which $V$ ($\overset{\text{def.}}{=} \frac{1}{2}\kappa_2$) is maximum. Continuing in this way show that a series of points $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n$, each maximizing V consistent with being orthogonal to its predecessors, is determined, and that the sequence of values, $V(\mathbf{z}_r) = \frac{1}{2}\kappa_r$, $r = 1, 2, \ldots, n$, is monotonically nonincreasing.
(4) Consider a point $\mathbf{z}_1 + \epsilon\boldsymbol{\zeta}$ which is assumed to lie on surface (1.2.3) but with $\boldsymbol{\zeta}$ otherwise arbitrary. Next assume this point is “close to” $\mathbf{z}_1$ in the sense that $\epsilon$ is arbitrarily small (and not necessarily positive). Since $\mathbf{z}_1$ maximizes V it follows that
Show therefore that
This implies that
$\mathcal{V}(\mathbf{z}_1, \mathbf{z}_r) = 0$ for $r > 1$,
because, other than being almost orthogonal to $\mathbf{z}_1$, $\boldsymbol{\zeta}$ is arbitrary. Finally, extend the argument to show that
where the coefficients $\kappa_r$ have been shown to satisfy the monotonic conditions of Eq. (1.2.4) and $\delta_{rs}$ is the usual Kronecker-$\delta$ symbol. Taking these $\mathbf{z}_r$ as basis vectors, an arbitrary vector $\mathbf{z}$ can be expressed as
In these new coordinates show that Eqs. (1.2.1) become
$L(\mathbf{z}, \dot{\mathbf{z}}) = T - V$, $T = \frac{1}{2}\sum_{r=1}^{n} \dot{z}_r^2$, $V = \frac{1}{2}\sum_{r=1}^{n} \kappa_r z_r^2$. (1.2.5)
Write and solve the Lagrange equations for coordinates $z_r$.
Problem 1.2.7: Continuing with the previous formula, using a more formal approach, the Lagrange equations resulting from Eq. (1.2.1) are
(1.2.6)
These equations can be expressed compactly in matrix form as
$\mathbf{M}\ddot{\mathbf{q}} + \mathbf{K}\mathbf{q} = 0$, (1.2.7)
or, assuming the existence of $\mathbf{M}^{-1}$, as
$\ddot{\mathbf{q}} + \mathbf{M}^{-1}\mathbf{K}\mathbf{q} = 0$. (1.2.8)
Seeking a solution of the form
$q_r = A_r e^{i\omega t}$, $r = 1, 2, \ldots, n$,
the result of substitution into Eq. (1.2.6) is
$(\mathbf{M}^{-1}\mathbf{K} - \omega^2\mathbf{1})\mathbf{A} = 0$. (1.2.9)
These equations have nontrivial solutions for values of $\omega$ that cause the determinant of the coefficients to vanish:
$|\mathbf{M}^{-1}\mathbf{K} - \omega^2\mathbf{1}| = 0$. (1.2.10)
Correlate these “eigenvalues” with the constants $\kappa_r$ defined in the previous problem.
Problem 1.2.8: Particles of mass 3m, 4m, and 3m are spaced at uniform intervals h along a light string of total length 4h stretched with tension $\tau$ and rigidly fixed at both ends. To legitimize ignoring gravity the system is assumed to lie on a smooth horizontal table so the masses can oscillate only horizontally. Let the horizontal displacements be $x_1$, $x_2$, and $x_3$. Find the normal mode frequencies and the corresponding normal mode oscillation “shapes.” Discuss the “symmetry” of the shapes, their “wavelengths,” and the (monotonic) relation between frequency and number of nodes. See Fig. 1.2.9.
Already with just 3 degrees of freedom the eigenmode calculations are sufficiently tedious to make some efforts at simplifying the work worthwhile. In this problem, with the system symmetric about its midpoint it is clear that the modes will be either symmetric or antisymmetric and, since the antisymmetric mode vanishes at the center point, it is characterized by a single amplitude, say $y = x_1 = -x_3$. Introducing “effective mass” and “effective strength coefficient,” the kinetic energy of the mode, necessarily proportional to $\dot{y}^2$, can be written as $T_2 = \frac{1}{2}m_{\rm eff}\dot{y}^2$ and the potential energy can be written as $V_2 = \frac{1}{2}k_{\rm eff}y^2$. The frequency of this mode is then given by $\omega_2 = \sqrt{k_{\rm eff}/m_{\rm eff}}$, which, by dimensional analysis, has to be proportional to $\eta = \sqrt{\tau/(mh)}$. (The quantities $T_2$, $V_2$, and $\omega_2$ have been given subscript 2 because this mode has the second highest frequency.) Factoring this expression out of Eq. (1.2.10), the dimensionless eigenvalues are the eigenfrequencies in units of $\eta$. Complete the analysis to show that the normal mode frequencies are $(\omega_1, \omega_2, \omega_3) = (1, \sqrt{2/3}, \sqrt{1/6})\,\eta$, and find the corresponding normal mode “shapes.”
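A quick numerical check of candidate mode shapes for the beaded string is sketched below in Python. It assumes small transverse displacements, so that the potential energy is $(\tau/2h)$ times the sum of squared differences of neighbouring displacements (with the fixed ends at zero); the helper names `rayleigh_quotient` and `residual` are introduced here for illustration. For an exact mode shape the Rayleigh quotient equals $\omega^2/\eta^2$ and the residual of the eigenvalue equation vanishes.

```python
MASSES = (3.0, 4.0, 3.0)   # bead masses in units of m

def rayleigh_quotient(shape, masses=MASSES):
    """Return omega^2 / eta^2 for a trial shape, with eta^2 = tau/(m*h)."""
    x = (0.0,) + tuple(shape) + (0.0,)          # string is fixed at both ends
    stretch = sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))
    inertia = sum(mk * xk**2 for mk, xk in zip(masses, shape))
    return stretch / inertia

def residual(shape, lam, masses=MASSES):
    """Largest component of K x - lam M x, with K tridiagonal (-1, 2, -1)."""
    x = (0.0,) + tuple(shape) + (0.0,)
    return max(
        abs(2.0 * x[i] - x[i - 1] - x[i + 1] - lam * masses[i - 1] * x[i])
        for i in range(1, len(x) - 1)
    )

if __name__ == "__main__":
    for shape in ((2, 3, 2), (1, 0, -1), (1, -1, 1)):
        lam = rayleigh_quotient(shape)
        print(shape, "omega^2/eta^2 =", lam, "residual =", residual(shape, lam))
```

The three shapes tried here are the ones quoted later in Problem 1.2.9; they have zero, one, and two interior nodes respectively, in order of increasing frequency.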
FIGURE 1.2.9. Three beads on a stretched string. The transverse displacements are much exaggerated. Gravity and string mass are negligible.
Problem 1.2.9: Though the eigenmode/eigenvalue solution method employed in solving the previous problem is the traditional method used in classical mechanics, equations of the same form, when they arise in circuit analysis and other engineering fields, are traditionally solved using Laplace transforms (a more robust method, it seems). Let us continue the solution of the previous problem using this method. Individuals already familiar with this method or not wishing to become so should skip this section. Here we use the notation
$\bar{x}(s) = \int_0^\infty e^{-st}\,x(t)\,dt$ (1.2.11)
as the formula giving the Laplace transform $\bar{x}(s)$ of the function of time $x(t)$. $\bar{x}(s)$ is a function of the “transform variable” s (which is a complex number with positive real part). With this definition the Laplace transform satisfies many formulas, but for present purposes we use only
$\overline{\frac{dx}{dt}} = s\bar{x} - x(0)$, (1.2.12)
which is easily demonstrated. Repeated application of this formula converts time derivatives into functions of s and therefore converts (linear) differential equations into (linear) algebraic equations. This will now be applied to the system described in the previous problem. The Lagrange equations for the beaded string shown in Fig. 1.2.9 are
(1.2.13)
Suppose the string is initially at rest but that a transverse impulse I is administered to the first mass at t = 0; as a consequence it acquires initial velocity $v_{10} = \dot{x}_1(0) = I/(3m)$. Transforming all three equations and applying the initial conditions (the only nonvanishing initial quantity, $v_{10}$, enters via Eq. (1.2.12)).
Solving these equations yields
$\bar{x}_1 = v_{10}\left(\frac{1/5}{s^2 + \eta^2/6} + \frac{1/2}{s^2 + 2\eta^2/3} + \frac{3/10}{s^2 + \eta^2}\right)$,
$\bar{x}_2 = v_{10}\left(\frac{3/10}{s^2 + \eta^2/6} - \frac{3/10}{s^2 + \eta^2}\right)$, (1.2.15)
$\bar{x}_3 = v_{10}\left(\frac{1/5}{s^2 + \eta^2/6} - \frac{1/2}{s^2 + 2\eta^2/3} + \frac{3/10}{s^2 + \eta^2}\right)$.
It can be seen, except for factors $\pm i$, that the poles (as a function of s) of the transforms of the variables are the normal mode frequencies. This is not surprising as the determinant of the coefficients in Eq. (1.2.14) is the same as the determinant entering the normal mode solution, but with $\omega^2$ replaced with $-s^2$. Recall from Cramer’s rule for the solution of linear equations that this determinant appears in the denominators of the solutions. For “inverting” Eq. (1.2.15) it is sufficient to know just one inverse Laplace transformation, (1.2.16)
but it is easier to look in a table of inverse transforms to find that the terms in Eq. (1.2.15) yield sinusoids that oscillate with the normal mode frequencies. Furthermore, the “shapes” asked for in the previous problem can be read off directly from (1.2.15) to be (2 : 3 : 2), (1 : 0 : -l), and (1 : -1 : 1). When the first mass is struck at t = 0 all three modes are excited and they proceed to oscillate at their own natural frequencies, so the motion of each individual particle is a superposition of these frequencies. Since there is no damping, the system will continue to oscillate in this superficially complicated way forever. In practice there is always some damping and, in general, it is different for the different modes; commonly, damping increases with frequency. In this case, after a while, the motion will be primarily in the lowest frequency mode; if the vibrating string emits audible sound, an increasingly pure tone will be heard as time goes on.
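The time-domain motion after the impulse can be sketched directly from the mode shapes, using the table entry that $1/(s^2 + \Omega^2)$ inverts to $\sin(\Omega t)/\Omega$. The Python sketch below is an illustration under stated assumptions rather than the text's own calculation: it takes the three shapes and squared frequencies quoted above as given, expands the initial velocity in those shapes using mass-weighted orthogonality, and sums the resulting sinusoids. The names `MODES`, `modal_coefficients`, and `displacement` are hypothetical.

```python
import math

MASSES = (3.0, 4.0, 3.0)                       # in units of m
MODES = [                                      # (omega^2 / eta^2, shape)
    (1.0 / 6.0, (2.0, 3.0, 2.0)),
    (2.0 / 3.0, (1.0, 0.0, -1.0)),
    (1.0, (1.0, -1.0, 1.0)),
]

def modal_coefficients(v0):
    """Expand an initial velocity vector in the mode shapes (mass-weighted projection)."""
    coeffs = []
    for _, shape in MODES:
        num = sum(m * s * v for m, s, v in zip(MASSES, shape, v0))
        den = sum(m * s * s for m, s in zip(MASSES, shape))
        coeffs.append(num / den)
    return coeffs

def displacement(t, v10=1.0, eta=1.0):
    """Displacements (x1, x2, x3) at time t after bead 1 receives velocity v10 at t = 0."""
    coeffs = modal_coefficients((v10, 0.0, 0.0))
    x = [0.0, 0.0, 0.0]
    for c, (lam, shape) in zip(coeffs, MODES):
        w = eta * math.sqrt(lam)
        for k in range(3):
            x[k] += c * shape[k] * math.sin(w * t) / w
    return x

if __name__ == "__main__":
    print(modal_coefficients((1.0, 0.0, 0.0)))
    for t in (0.0, 1.0, 5.0, 20.0):
        print(t, displacement(t))
```

With no damping the three sinusoids never decay, which is the “superficially complicated” motion described in the text.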
Problem 1.2.10: Damped and Driven Simple Harmonic Motion. The equation of motion of mass m, subject to restoring force $-\omega_0^2 m x$, damping force $-2\lambda m\dot{x}$, and external drive force $f\cos\gamma t$ is
$\ddot{x} + 2\lambda\dot{x} + \omega_0^2 x = \frac{f}{m}\cos\gamma t$. (1.2.17)
(a) Show that the general solution of this equation when f = 0 is
$x(t) = a e^{-\lambda t}\cos(\omega t + \phi)$, (1.2.18)
where a and $\phi$ depend on initial conditions and $\omega = \sqrt{\omega_0^2 - \lambda^2}$. This “solution of the homogeneous equation” is also known as “transient” because when it is superimposed on the “driven” or “steady-state” motion caused by f it will eventually become negligible.
(b) Correlate the stability or instability of the transient solution with the sign of $\lambda$. Equivalently, after writing the solution (1.2.18) as the sum of two complex exponential terms, Laplace transform them, and correlate the stability or instability of the transient with the locations in the complex s-plane of the poles of the Laplace transform.
(c) Assuming $x(0) = \dot{x}(0) = 0$, show that Laplace transforming Eq. (1.2.17) yields
$\bar{x}(s) = \frac{f}{m}\,\frac{s}{s^2 + \gamma^2}\,\frac{1}{s^2 + 2\lambda s + \omega_0^2}$. (1.2.19)
This expression has four poles, each of which leads to a complex exponential term in the time response. To neglect transients we need only drop the terms for which the poles are off the imaginary axis. (By part (b) they must be in the left half-plane for stability.) To “drop” these terms it is necessary first to isolate them by partial fraction decomposition of Eq. (1.2.19). Performing these operations, show that the steady state solution of Eq. (1.2.17) is
where
$\omega_0^2 - \gamma^2 - 2\lambda\gamma i = \sqrt{(\omega_0^2 - \gamma^2)^2 + 4\lambda^2\gamma^2}\;e^{i\delta}$. (1.2.21)
(d) The response is large only for $\gamma$ close to $\omega_0$. To exploit this, defining the “small” “frequency deviation from the natural frequency”
$\epsilon = \gamma - \omega_0$, (1.2.22)
show that $\gamma^2 - \omega_0^2 \approx 2\epsilon\omega_0$ and that the approximate response is (1.2.23)
Find the value of $\epsilon$ for which the amplitude of the response is reduced from its maximum value by the factor $1/\sqrt{2}$.
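The near-resonance approximation of part (d) can be explored numerically. The Python sketch below assumes the standard steady-state amplitude $(f/m)/\sqrt{(\omega_0^2 - \gamma^2)^2 + 4\lambda^2\gamma^2}$, which is the modulus appearing in Eq. (1.2.21), and its small-deviation form $(f/m)/(2\omega_0\sqrt{\epsilon^2 + \lambda^2})$; since the displayed response formulas are not reproduced here, both expressions and the function names are assumptions labeled as such.

```python
import math

def steady_amplitude(gamma, omega0=1.0, lam=0.01, f_over_m=1.0):
    """Steady-state amplitude for drive frequency gamma (assumed standard form)."""
    return f_over_m / math.sqrt((omega0**2 - gamma**2) ** 2 + 4.0 * lam**2 * gamma**2)

def lorentzian_amplitude(eps, omega0=1.0, lam=0.01, f_over_m=1.0):
    """Approximate amplitude for gamma = omega0 + eps with |eps| << omega0."""
    return f_over_m / (2.0 * omega0 * math.sqrt(eps**2 + lam**2))

if __name__ == "__main__":
    peak = steady_amplitude(1.0)
    for eps in (-0.02, -0.01, 0.0, 0.01, 0.02):
        exact = steady_amplitude(1.0 + eps) / peak
        approx = lorentzian_amplitude(eps) / lorentzian_amplitude(0.0)
        print(f"eps={eps:+.2f}  exact ratio={exact:.4f}  approx ratio={approx:.4f}")
```

Comparing the two columns shows how well the approximation tracks the full expression when the damping is light.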
1.2.4. Conservation of Momentum and Energy
It has been shown previously that the application of energy conservation in one-dimensional problems permits the system evolution to be expressed in terms of a single integral; this is “reduction to quadrature.” The following problem exhibits the use of momentum conservation to reduce a two-dimensional problem to quadratures, or rather, because of the simplicity of the configuration in this case, to a closed-form solution.
Problem 1.2.11: A point mass m with total energy E, starting in the left half-plane, moves in the (x, y) plane subject to potential energy function
The “angle of incidence” to the interface at x = 0 is $\theta_i$, and the outgoing angle is $\theta$. Specify the qualitatively different cases that are possible, depending on the relative values of the energies, and in each case find $\theta$ in terms of $\theta_i$. Show that all results can be cast in the form of “Snell’s Law” of geometric optics if one introduces a factor $\sqrt{E - U}$, analogous to index of refraction.
1.2.5. Effective Potential
Since one-dimensional motion is subject to such simple and satisfactory analysis, anything that can reduce the dimensionality from two to one has great value. The “effective potential” is one such device.
Problem 1.2.12: The Kepler Problem. No physics problem has received more attention over the centuries than the problem of planetary orbits. In later chapters of this text the analytic solution of this so-called “Kepler problem” will be the foundation on which perturbative solution of more complicated problems will be based. Though this problem is now regarded as “elementary” one is well-advised to stick to traditional manipulations as the problem can otherwise get seriously messy. The problem of two masses $m_1$ and $m_2$ moving in each other’s gravitational field is easily converted into the problem of a single particle of mass m moving in the gravitational field of a mass $m_0$ assumed very large compared to m, that is, $\mathbf{F} = -K\hat{\mathbf{r}}/r^2$, where $K = G m_0 m$ and r is the distance to m from $m_0$. Anticipating that the orbit lies in a plane (as it must), let $\chi$ be the angle of the radius vector from a line through the center of $m_0$; this line will later be taken as the major axis of the elliptical orbit. The potential energy function is given by
$U(r) = -\frac{K}{r}$, (1.2.24)
and the orbit geometry is illustrated in Fig. 1.2.10.
FIGURE 1.2.10. Geometric construction defining the “true anomaly” $\chi$ and “eccentric anomaly” u in terms of other orbit parameters.
Two conserved quantities can be identified immediately: energy E and angular momentum M. Show that they are given by
$E = \frac{1}{2}m(\dot{r}^2 + r^2\dot{\chi}^2) - \frac{K}{r}$, $M = m r^2\dot{\chi}$. (1.2.25)
One can exploit the constancy of M by eliminating $\dot{\chi}$ from the expression for E,
$E = \frac{1}{2}m\dot{r}^2 + U_{\rm eff}(r)$, where $U_{\rm eff}(r) = \frac{M^2}{2mr^2} - \frac{K}{r}$. (1.2.26)
The function $U_{\rm eff}(r)$, known as the “effective potential,” is plotted in Fig. 1.2.11. Solving both the expression for E and the expression for M for the differential dt,
$dt = \frac{dr}{\sqrt{(2/m)\,(E - U_{\rm eff}(r))}}$, $dt = \frac{m r^2}{M}\,d\chi$, (1.2.27)
and equating the two expressions yields a differential equation that can be solved by “separation of variables.” This has permitted the problem to be “reduced to quadratures,”
$\chi(r) = \int^{r} \frac{M\,dr'/r'^2}{\sqrt{2m\left(E + K/r'\right) - M^2/r'^2}}$. (1.2.28)
Note that this procedure yields only an “orbit equation,” the dependence of $\chi$ on r (which is equivalent to, if less convenient than, the dependence of r on $\chi$.) Though a priori one should have had the more ambitious goal of finding a solution in the form r(t) and $\chi(t)$, no information whatsoever is given yet about time dependence by Eq. (1.2.28).
FIGURE 1.2.11. The effective potential $U_{\rm eff}$ for the Kepler problem.
(a) Show that all computations so far can be carried out for any central force that is radially directed with magnitude dependent only on r. At worst the integral analogous to (1.2.28) can be performed numerically.
(b) Returning to the Kepler problem, perform the integral (1.2.28) and show that the orbit equation can be written
$\epsilon\cos\chi + 1 = \frac{M^2}{mK}\,\frac{1}{r}$, where $\epsilon = \sqrt{1 + \frac{2EM^2}{mK^2}}$. (1.2.29)
(c) Show that (1.2.29) is the equation of an ellipse if $\epsilon < 1$ and that this condition is equivalent to E < 0.
(d) It is traditional to write the orbit equation purely in terms of “orbit elements” $\epsilon$, which can be identified as the “eccentricity” and the “semimajor axis” a: (1.2.30)
The reason a and $\epsilon$ are special is that they are intrinsic properties of the orbit, unlike, for example, the orientations of the semimajor axis and the direction of the perpendicular to the orbit plane, both of which can be altered at will and still leave a “congruent” system. Derive the relations
$E = -\frac{K}{2a}$, $M^2 = (1 - \epsilon^2)\,mKa$, (1.2.31)
so the orbit equation is
$\frac{a}{r} = \frac{1 + \epsilon\cos\chi}{1 - \epsilon^2}$. (1.2.32)
(e) Finally, derive the relation between r and t: (1.2.33)
An “intermediate” variable u that leads to worthwhile simplification is defined by
$r = a(1 - \epsilon\cos u)$. (1.2.34)
The geometric interpretation of u is indicated in Fig. 1.2.10. If (x, z) are Cartesian coordinates of the planet along the major and an axis parallel to the minor axis through the central mass, they are given in terms of u by
$x = a\cos u - a\epsilon$, $z = a\sqrt{1 - \epsilon^2}\,\sin u$, (1.2.35)
since the semiminor axis is $a\sqrt{1 - \epsilon^2}$ and the circumscribed circle is related to the ellipse by a z-axis scale factor $\sqrt{1 - \epsilon^2}$. The coordinate u, known as the “eccentric anomaly,” is a kind of distorted angular coordinate of the planet, and is related fairly simply to t:
$t = \sqrt{\frac{ma^3}{K}}\,(u - \epsilon\sin u)$. (1.2.36)
This is especially useful for nearly circular orbits, since then u is nearly proportional to t. Analysis of this Keplerian system is continued using Hamilton-Jacobi theory in Section 11.2.8, and then again in Section 14.6.3 to illustrate action/angle variables, and then again as a system subject to perturbation and analyzed by “variation of constants” in Section 16.1.1.
Problem 1.2.13: The effective potential formalism has reduced the dimensionality of the Kepler problem from two to one. In one dimension, the linearization (to simple harmonic motion) procedure, illustrated above, for example in Problem 1.2.1, can then be used to describe motion that remains close to the minimum of the effective potential (see Fig. 1.2.11). The radius $r_0 = M^2/(mK)$ is the radius of the circular orbit with angular momentum M. Consider an initial situation for which M has this same value and $\dot{r}(0) = 0$, but $r(0) \ne r_0$, though r(0) is in the region of good parabolic fit to $U_{\rm eff}$. Find the frequency of small oscillations and express r(t) by its appropriate simple harmonic motion. Then find the orbit elements a and $\epsilon$, as defined in Problem 1.2.12, that give the matching two-dimensional orbit.
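Equation (1.2.36) of Problem 1.2.12 gives t as a function of the eccentric anomaly u; obtaining the position at a given time requires inverting it. The Python sketch below does this by Newton iteration and then evaluates Eqs. (1.2.34) and (1.2.35). It is a minimal illustration intended for moderate eccentricity; the function names, default parameter values, and the choice of Newton's method are assumptions made here.

```python
import math

def eccentric_anomaly(t, a=1.0, ecc=0.3, m=1.0, K=1.0, tol=1e-14):
    """Solve t = sqrt(m*a^3/K) * (u - ecc*sin(u)) for u by Newton iteration."""
    target = t / math.sqrt(m * a**3 / K)   # u - ecc*sin(u) must equal this
    u = target                             # good starting guess for small ecc
    for _ in range(50):
        step = (u - ecc * math.sin(u) - target) / (1.0 - ecc * math.cos(u))
        u -= step
        if abs(step) < tol:
            break
    return u

def position(t, a=1.0, ecc=0.3, m=1.0, K=1.0):
    """Return (x, z, r): coordinates relative to the central mass and the radius."""
    u = eccentric_anomaly(t, a, ecc, m, K)
    x = a * math.cos(u) - a * ecc
    z = a * math.sqrt(1.0 - ecc**2) * math.sin(u)
    r = a * (1.0 - ecc * math.cos(u))
    return x, z, r

if __name__ == "__main__":
    period = 2.0 * math.pi                 # for a = m = K = 1
    for k in range(5):
        print(position(k * period / 4.0))
```

For a nearly circular orbit the iteration converges in one or two steps, reflecting the remark that u is then nearly proportional to t.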
1.2.6. Multiparticle Systems
Solving multiparticle problems in mechanics is notoriously difficult; for more than two particles it is usually impossible to get solutions in closed form. But the equations of motion can be made simpler by the appropriate choice of coordinates, as the next problem illustrates. Such coordinate choices exploit exact relations such as momentum conservation and thereby simplify subsequent approximate solutions. For example, this is a good pre-quantum starting point for molecular spectroscopy.
Problem 1.2.14: The position vectors of three point masses, $m_1$, $m_2$, and $m_3$, are $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$. Express these vectors in terms of the alternative configuration vectors $\mathbf{s}_C$, $\mathbf{s}_3'$, and $\mathbf{s}_{12}$ shown in the figure. Define “reduced masses” by
$m_{12} = m_1 + m_2$, $M = m_1 + m_2 + m_3$, $\mu_{12} = \frac{m_1 m_2}{m_{12}}$, $\mu_3 = \frac{m_3 m_{12}}{M}$.
Calculate the total kinetic energy in terms of $\dot{\mathbf{s}}_C$, $\dot{\mathbf{s}}_3'$, and $\dot{\mathbf{s}}_{12}$ and interpret the result.
FIGURE 1.2.12. Coordinates describing three particles. C is the center of mass and $\mathbf{s}_C$ its position vector relative to origin O. $C_{12}$ is the center of mass of $m_1$ and $m_2$ and $\mathbf{s}_3'$ is the position of $m_3$ relative to $C_{12}$.
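Defining corresponding partial angular momenta $\mathbf{l}_C$, $\mathbf{l}_3'$, and $\mathbf{l}_{12}$, show that the total angular momentum of the system is the sum of three analogous terms.
SOLUTION: In Fig. 1.2.12 the origin O is at an arbitrary location relative to which the center of mass C is located by radius vector $\mathbf{s}_C$. Relative to particle 1, particle 2 is located by vector $\mathbf{s}_{12}$. Relative to the center of mass at $C_{12}$, mass 3 is located by vector $\mathbf{s}_3'$. In terms of these quantities the position vectors of the three masses are
Substituting these into the kinetic energy of the system
$T = \frac{1}{2}m_1\dot{\mathbf{r}}_1^2 + \frac{1}{2}m_2\dot{\mathbf{r}}_2^2 + \frac{1}{2}m_3\dot{\mathbf{r}}_3^2$,
the “cross terms” proportional to $\dot{\mathbf{s}}_C\cdot\dot{\mathbf{s}}_3'$, $\dot{\mathbf{s}}_C\cdot\dot{\mathbf{s}}_{12}$, and $\dot{\mathbf{s}}_3'\cdot\dot{\mathbf{s}}_{12}$ all cancel out, leaving the result
where $v_C = |\dot{\mathbf{s}}_C|$, $v_3' = |\dot{\mathbf{s}}_3'|$, and $v_{12} = |\dot{\mathbf{s}}_{12}|$. The angular momentum (about O) is given by
$\mathbf{L} = \mathbf{r}_1 \times (m_1\dot{\mathbf{r}}_1) + \mathbf{r}_2 \times (m_2\dot{\mathbf{r}}_2) + \mathbf{r}_3 \times (m_3\dot{\mathbf{r}}_3)$.
Upon expansion the same simplifications occur, yielding
$\mathbf{L} = M\,\mathbf{r}_C \times \mathbf{v}_C + \mu_3\,\mathbf{r}_3' \times \mathbf{v}_3' + \mu_{12}\,\mathbf{r}_{12} \times \mathbf{v}_{12}$.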
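The cancellation of cross terms can be checked numerically. The Python sketch below is an illustration under an explicit assumption: it takes the position vectors to be $\mathbf{r}_1 = \mathbf{s}_C - (m_3/M)\mathbf{s}_3' - (m_2/m_{12})\mathbf{s}_{12}$, $\mathbf{r}_2 = \mathbf{s}_C - (m_3/M)\mathbf{s}_3' + (m_1/m_{12})\mathbf{s}_{12}$, $\mathbf{r}_3 = \mathbf{s}_C + (m_{12}/M)\mathbf{s}_3'$, which is what the definitions in Fig. 1.2.12 imply. It then compares the particle-by-particle kinetic energy and angular momentum with sums of three “single-particle” terms weighted by M, $\mu_3$, and $\mu_{12}$. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def combine(*terms):
    """Linear combination: combine((c1, v1), (c2, v2), ...) -> sum of c_i * v_i."""
    return tuple(sum(c * v[k] for c, v in terms) for k in range(3))

def particle_vectors(sc, s3, s12, m1, m2, m3):
    """Map (s_C, s_3', s_12) to (r_1, r_2, r_3); the same map applies to velocities."""
    m12 = m1 + m2
    M = m12 + m3
    r1 = combine((1.0, sc), (-m3 / M, s3), (-m2 / m12, s12))
    r2 = combine((1.0, sc), (-m3 / M, s3), (m1 / m12, s12))
    r3 = combine((1.0, sc), (m12 / M, s3))
    return r1, r2, r3

def compare(m1, m2, m3, pos, vel):
    """Return (T_direct, T_reduced, L_direct, L_reduced) for given s-vectors and their rates."""
    m12, M = m1 + m2, m1 + m2 + m3
    mu12, mu3 = m1 * m2 / m12, m3 * m12 / M
    rs = particle_vectors(*pos, m1, m2, m3)
    vs = particle_vectors(*vel, m1, m2, m3)
    masses = (m1, m2, m3)
    t_direct = sum(0.5 * m * dot(v, v) for m, v in zip(masses, vs))
    l_direct = combine(*[(m, cross(r, v)) for m, r, v in zip(masses, rs, vs)])
    weights = (M, mu3, mu12)
    t_reduced = sum(0.5 * w * dot(v, v) for w, v in zip(weights, vel))
    l_reduced = combine(*[(w, cross(s, v)) for w, s, v in zip(weights, pos, vel)])
    return t_direct, t_reduced, l_direct, l_reduced

if __name__ == "__main__":
    pos = ((0.3, -1.2, 0.5), (1.1, 0.4, -0.7), (-0.6, 0.9, 0.2))
    vel = ((0.2, 0.1, -0.4), (-0.5, 0.8, 0.3), (0.7, -0.2, 0.6))
    print(compare(1.0, 2.5, 4.0, pos, vel))
```

The agreement of the two forms reflects that both the kinetic energy and the angular momentum are bilinear in the same set of vectors, so the same cross terms drop out.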
Problem 1.2.15: Determine the moment of inertia tensor about center of mass C for the system described in the previous problem. Choose axes to simplify the problem initially and give a formula for transforming from these axes to arbitrary (orthonormal) axes. For the case m3 = 0, find the principal axes and the principal moments of inertia.
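SOLUTION: Setting $\mathbf{s}_C = 0$, the particle positions are given by
$$
\mathbf{r}_1 = -\frac{m_3}{M}\,\mathbf{s}_3' - \frac{m_2}{m_{12}}\,\mathbf{s}_{12}, \qquad
\mathbf{r}_2 = -\frac{m_3}{M}\,\mathbf{s}_3' + \frac{m_1}{m_{12}}\,\mathbf{s}_{12}, \qquad
\mathbf{r}_3 = \frac{m_{12}}{M}\,\mathbf{s}_3'.
$$
Since the masses lie in a single plane it is convenient to take the $z$-axis normal to that plane. Let us orient the axes such that the unit vectors satisfy
$$
\mathbf{s}_3' = \hat{\mathbf{x}}, \qquad \mathbf{s}_{12} = a\,\hat{\mathbf{x}} + b\,\hat{\mathbf{y}}
$$
(and hence $a = \mathbf{s}_3' \cdot \mathbf{s}_{12}$), so the particle coordinates are
$$
x_1 = -\frac{m_3}{M} - \frac{m_2}{m_{12}}\,a, \quad y_1 = -\frac{m_2}{m_{12}}\,b, \qquad
x_2 = -\frac{m_3}{M} + \frac{m_1}{m_{12}}\,a, \quad y_2 = \frac{m_1}{m_{12}}\,b, \qquad
x_3 = \frac{m_{12}}{M}, \quad y_3 = 0.
$$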
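In terms of these, the moment of inertia tensor $\mathbf{I}$ is given by
$$
\mathbf{I} = \sum_{i=1}^{3} m_i
\begin{pmatrix}
y_i^2 & -x_i y_i & 0 \\
-x_i y_i & x_i^2 & 0 \\
0 & 0 & x_i^2 + y_i^2
\end{pmatrix}.
$$
For the special case $m_3 = 0$ these formulas reduce to
$$
\mathbf{I} = \mu_{12}
\begin{pmatrix}
b^2 & -ab & 0 \\
-ab & a^2 & 0 \\
0 & 0 & a^2 + b^2
\end{pmatrix}.
$$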
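As a hedged numerical illustration of the $m_3 = 0$ case (a sketch under stated assumptions, not the book's own calculation), the code below builds the inertia tensor from particle coordinates, taking $\mathbf{s}_C = 0$, $\mathbf{s}_3' = \hat{\mathbf{x}}$, $\mathbf{s}_{12} = a\hat{\mathbf{x}} + b\hat{\mathbf{y}}$ and all masses in the $xy$ plane. The function names and the sample values are hypothetical. For $m_3 = 0$ one expects in-plane principal moments $0$ (axis along $\mathbf{s}_{12}$) and $\mu_{12}(a^2 + b^2)$, with the same value $\mu_{12}(a^2 + b^2)$ about the $z$-axis.

```python
import math


def particle_coords(m1, m2, m3, a, b):
    """(mass, x, y) for each particle, assuming s_C = 0 and s_3' = x-hat."""
    m12 = m1 + m2
    M = m12 + m3
    return [
        (m1, -m3 / M - m2 / m12 * a, -m2 / m12 * b),
        (m2, -m3 / M + m1 / m12 * a, m1 / m12 * b),
        (m3, m12 / M, 0.0),
    ]


def inertia_tensor(particles):
    """Inertia tensor for point masses lying in the z = 0 plane."""
    Ixx = sum(m * y * y for m, x, y in particles)
    Iyy = sum(m * x * x for m, x, y in particles)
    Ixy = -sum(m * x * y for m, x, y in particles)
    return [[Ixx, Ixy, 0.0], [Ixy, Iyy, 0.0], [0.0, 0.0, Ixx + Iyy]]


def in_plane_principal_moments(I):
    """Eigenvalues of the symmetric 2x2 (x, y) block, smallest first."""
    mean = 0.5 * (I[0][0] + I[1][1])
    radius = math.hypot(0.5 * (I[0][0] - I[1][1]), I[0][1])
    return mean - radius, mean + radius


# Illustrative values only: m3 = 0 and a separation vector of length 1.
m1, m2, m3, a, b = 1.0, 3.0, 0.0, 0.6, 0.8
mu12 = m1 * m2 / (m1 + m2)

I = inertia_tensor(particle_coords(m1, m2, m3, a, b))
I_small, I_large = in_plane_principal_moments(I)
print(I)
print(I_small, I_large, mu12 * (a * a + b * b))
```

Setting `m3` to a nonzero value in the same sketch gives the general tensor in the special axes chosen above.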
The formulas derived in this solution will be used again in Section 9.3.
Problem 1.2.16: A uniform solid cube can be supported by a thread from the center of a face, from the center of an edge, or from a corner. In each of the three cases the system acts as a torsional pendulum, with the thread providing all the restoring torque and the cube providing all the inertia. In which configuration is the oscillation period the longest? (If your answer involves complicated integrals you are not utilizing properties of the inertia tensor in the intended way.)
1.2.7. Dimensional/Scaling Considerations
Problem 1.2.17: Suppose the potential energy of a central field is a homogeneous function of degree $\nu$: $U(\alpha \mathbf{r}) = \alpha^{\nu} U(\mathbf{r})$, for any $\alpha > 0$. (a) Starting with a valid solution of Newton’s law, making the replacements $\mathbf{r} \to \alpha \mathbf{r}$ and $t \to \beta t$, and choosing $\beta = \alpha^{1 - \nu/2}$, show that the total energy is modified by factor $\alpha^{\nu}$ and the equation of motion is still satisfied. (b) For the case $\nu = 2$, derive from this the isochronicity of simple harmonic motion. (c) For the case $\nu = -1$, derive Kepler’s third law.
This sort of argument is introduced from a Lagrangian point of view in Landau and Lifshitz [2], where, among other applications, the virial theorem is proved. Arnold introduces the argument from a Newtonian point of view and gives other interesting applications of similarity.
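The bookkeeping behind the scaling argument can be sketched in a few lines. This is only an illustration with hypothetical function names: under $\mathbf{r} \to \alpha\mathbf{r}$, $t \to \beta t$ the kinetic energy scales by $(\alpha/\beta)^2$ and a degree-$\nu$ potential by $\alpha^{\nu}$, so both scale alike when $\beta = \alpha^{1-\nu/2}$. For $\nu = 2$ the period is unchanged; for $\nu = -1$ it scales as $\alpha^{3/2}$.

```python
import math


def time_scale_factor(alpha, nu):
    """beta such that kinetic and potential energy scale by the same factor."""
    return alpha ** (1.0 - nu / 2.0)


def kinetic_scale_factor(alpha, beta):
    """Velocities scale by alpha / beta, so kinetic energy by (alpha / beta)**2."""
    return (alpha / beta) ** 2


for nu in (2, -1):
    for alpha in (2.0, 4.0):
        beta = time_scale_factor(alpha, nu)
        # The kinetic factor should match the potential factor alpha**nu.
        matches = math.isclose(kinetic_scale_factor(alpha, beta), alpha ** nu)
        print(nu, alpha, beta, matches)
```

With $\nu = -1$, enlarging an orbit by $\alpha$ multiplies its period by $\alpha^{3/2}$, which is the statement $T^2 \propto a^3$.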
BIBLIOGRAPHY
References
1. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 1, Interscience, New York, 1953, p. 37.
2. L. D. Landau and E. M. Lifshitz, Classical Mechanics, Pergamon, Oxford, 1976, p. 22.
THE GEOMETRY OF MECHANICS
Since the subject of mechanics deals with the motion of particles in space, the phrase “geometric mechanics” might seem redundant. In this text the phrase is intended mainly to imply that more consideration will be paid to geometric ideas than was once considered appropriate. Close to a century ago the subject of general relativity sprang almost entirely from a deep contemplation of geometry, and yet the appreciation of geometry as physics has accelerated only recently. Until recently an intuitive grasp of high school geometry has been considered adequate as a pedagogical basis for classical mechanics, with the result that high school geometry is pretty firmly fixed in the intuition of most physicists. Generally this is good, or at least satisfactory, but it can impede the learning of physics that employs more abstract geometric methods. As an example of the way restricted intuition can retard assimilation of new physics, one need only recall the mental contortion that was required before the (counterintuitive) formula for the composition of velocities in special relativity could be accepted. The strategy of this part of the text is to contemplate geometry with mechanics kept in the background, to better understand geometric ideas on their own before they are folded back into the physics. Of course, the geometric ideas important in mechanics will be the ones to be emphasized.
GEOMETRY OF MECHANICS I: LINEAR
2.1. INTRODUCTION
Even before considering geometry as physics, one can try to distinguish between geometry and algebra, starting, for example, with the concept of “vector.” The question “What is a vector?” does not receive a unique answer. Rather, two answers are perhaps equally likely: “an arrow” or “a triplet of three numbers $(x, y, z)$.” The former answer could legitimately be called geometric, the latter algebraic. Yet the distinction between algebra and geometry is rarely unambiguous. For example, experience with the triplet $(x, y, z)$ was probably gained in a course with a title such as “Analytic Geometry” or “Coordinate Geometry.” For our purposes it will not be necessary to have an ironclad postulational basis for the mathematics to be employed, but it is important to have a general appreciation of the ideas. Again, that is a purpose of this chapter. Since the immediate goal is unlearning almost as much as learning, the reader should not expect to find a completely self-contained, unambiguous development from first principles. To make progress in physics, it is usually sufficient to have only an intuitive grasp of mathematical foundations. For example, the Pythagorean property of right-angle triangles is remembered even if its derivation from Euclidean axioms is not. Still, some mulling over of “well-established” ideas is appropriate, as they usually contain implicit understandings and definitions, possibly different for different individuals. Some of the meanings have to be discarded or modified as an “elementary” treatment metamorphoses into a more “abstract” formulation. Faced with this problem, a mathematician might prefer to “start from scratch,” discard all preconceived notions, define everything unambiguously, and proceed on a firm postulational basis.¹ The physicist, on the other hand, is likely to find the mathematician’s approach too formal and poorly motivated. Unwilling to discard ideas that have served well, and too impatient or too inexperienced to follow abstract argument, when taking on new baggage, he or she prefers to rearrange the baggage already loaded, in an effort to make it all fit. The purpose of this chapter is to help with this rearrangement. Elaborating on the metaphor, some bags are to be removed from the trunk with the expectation they will fit better later, some fit as is, some have to be reoriented; only at the end does it become clear which fit and which must be left behind. While unloading bags it is not necessary to be fussy; when putting them back one has to be more careful.
The analysis of spatial rotations has played a historically important part in the development of mechanics. In classical (both with the meaning nonrelativistic and the meaning “old-fashioned”) mechanics courses this has largely manifested itself in the analysis of rigid body motion. Problems in this area are among the most complicated for which the equations can be “integrated” in closed analytic form in spite of being inherently “nonlinear,” a fact that gives them a historical importance. But since these calculations are rather complicated, and since most people rapidly lose interest in, say, the eccentric motion of an asymmetric top, it has been fairly common, in the teaching of mechanics courses, to skim over this material. A “modern” presentation of mechanics has a much more qualitative and geometric flavor than the “old-fashioned” approach just mentioned. From this point of view, rather than being just a necessary evil encountered in the solution of hard problems, rotations are the easiest-to-understand prototype for the analysis of motion using abstract group theoretical methods.
The connection between rotational symmetry and conservation of angular momentum, both because of its importance in quantum mechanics and again as a prototype, provides another motivation for studying rotations.
It might be said that classical mechanics has been left mainly in the hands of mathematicians (physicists were otherwise occupied with quantum questions) for so long that the language has become nearly unintelligible to a physicist. Possibly unfamiliar words in the mathematician’s vocabulary include bivectors, multivectors, differential forms, dual spaces, Lie groups, irreducible representations, pseudo-Euclidean metrics, and so on. Fortunately all physicists are handy with vector analysis, including the algebra of dot and cross products and the calculus of gradients, divergences, and curls, and in the area of tensor analysis they are familiar with covariant (contravariant) tensors as quantities with lower (upper) indices that (for example) conveniently keep track of the minus sign in the Minkowski metric of special relativity. Tools like these are much to be valued in that they permit a very compact, very satisfactory formulation of classical and relativistic mechanics, of electricity and magnetism, and of quantum mechanics. But they also leave a physicist’s mind unwilling to jettison certain “self-evident” truths that stand in the way of deeper levels of abstraction. Perhaps the simplest example of this is that, having treated vector cross products as ordinary vectors for many years, one’s mind has difficulty adopting a mathematician’s view of cross products as being quite dissimilar to, and certainly incommensurable with, ordinary vectors. Considerable effort will be devoted to motivating and explaining ideas like these in ways that are intended to appeal to a physicist’s intuition. Much of this material will be drawn from the work of Elie Cartan, which, though old, caters to a physicist’s intuition.² To begin with, covariant vectors will be introduced from various points of view and contrasted with the more familiar contravariant vectors.
¹ Perhaps the first treatment from first principles, and surely the most comprehensive text to base mechanics on the formal mathematical theory of smooth manifolds, was Abraham and Marsden [1]. Other editions with new authors have followed.
² Cartan is usually credited as being the “father” (though I think not the inventor) of differential forms as well as the discoverer of spinors (long before, and in greater generality than, Pauli or Dirac). That these early chapters draw so much from Cartan simply reflects the lucidity of his approach. Don’t be intimidated by the appearance of spinors; only elementary aspects of them will be required.
2.2. PAIRS OF PLANES AS COVARIANT VECTORS
The use of coordinates $(x, y, z)$ (shortly we will switch to $(x^1, x^2, x^3)$) for locating a point in space is illustrated in Fig. 2.2.1. Either orthonormal (Euclidean) or skew (Cartesian)³ axes can be used. It is rarely required that skew axes be used rather than the simpler rectangular axes but, in the presence of continuous deformations, skew axes may be unavoidable. Next consider Fig. 2.2.2, which shows the intersections of a plane with the same axes as in Fig. 2.2.1. The equation of the plane on the left, in terms of generic point $(x, y, z)$ on the plane, is
$$
ax + by + cz = d, \tag{2.2.1}
$$
and, because the coordinates of the intersections with the axes are the same, the equation of the plane on the right in terms of generic point $(x', y', z')$ is also linear, with the same coefficients $(a, b, c, d)$,
$$
ax' + by' + cz' = d. \tag{2.2.2}
$$
FIGURE 2.2.1. Attaching coordinates to a point with Euclidean (or orthogonal) axes (on the left) and Cartesian (or possibly skew) axes (on the right). One of several possible interpretations of the figure is that the figure on the right has been obtained by elastic deformation of the figure on the left. In that case the primes on the right are superfluous since the coordinates of any particular point (such as the point $P$) are the same in both figures, namely $(1, 1, 1)$.
FIGURE 2.2.2. Intersection of a plane with orthogonal axes on the left and a “similar” figure with skew axes on the right. The equations of the planes are “the same,” though expressed with unprimed and primed coordinates.
³ Many authors use the term “Cartesian” to imply orthogonal axes, but we use “Euclidean” in that case and use “Cartesian” to imply (possibly) skew axes.
The figures are “similar,” not in the conventional sense of Euclidean geometry, but in a newly defined sense of lines corresponding to lines, planes to planes, and intersections to intersections and of the coordinates of the intersections of the plane and the axes being numerically the same. The unit measuring sticks along the Euclidean axes are $\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z$, and along the skew axes $\mathbf{e}_{x'}, \mathbf{e}_{y'}, \mathbf{e}_{z'}$. The coordinates $(d/a, d/b, d/c)$ of the intersection points are determined by laying out the measuring sticks along the respective axes. Much as $(x, y, z)$ “determines” a point, the values $(a, b, c)$, along with $d$, “determine” a plane. Commonly the values $(x, y, z)$ are regarded as projections onto the axes of an arrow that is allowed to slide around the plane with length and direction held fixed. Similarly, any two planes sharing the same triplet $(a, b, c)$ are parallel. (It would be wrong though to say that such planes have the same normals since, with the notion of orthogonality not yet having been introduced, there is no such thing as a vector normal to the plane. Saying that two planes have the same “direction” can only mean that they are parallel, that is, their intercepts are proportional.) The analogy between plane coordinates $(a, b, c)$ and point coordinates $(x, y, z)$ is not quite perfect. For example, it takes the specification of a definite value $d$ in the equation for the plane to pick out a definite plane, while it takes three values, say the $(x_0, y_0, z_0)$ coordinates of the tail, to pick out a particular vector. Just as one regards $(x, y, z)$ as specifying a sliding vector, it is possible to define a “plane-related” geometric structure specified by $(a, b, c)$ with no reference to $d$. To suppress the dependence on parameter $d$, observe first that the shift from $d$ to $d + 1$ corresponds to the shift from plane $ax + by + cz = d$ to plane $ax + by + cz = d + 1$. Each member of this pair of unit-separated planes is parallel to any plane with the same $(a, b, c)$ values. The pair of planes is said to have an “orientation,”⁴ with positive orientation corresponding to increasing $d$. This is illustrated in Fig. 2.2.3.
FIGURE 2.2.3. Parallel planes. How many plane-pairs are crossed by vector $\mathbf{x}$? (Figure labels: lines $ax + by = 0$, $ax + by = d$, and $ax + by = d + 1$; a line-pair; vector $\mathbf{x}$ crosses $+3$ line-pairs; number of line-pairs crossed $= ax + by$.)
⁴ The orientation of a pair of planes is said to be “outer.” The meaning of outer orientation is that two points related to each other by this orientation must be in separate planes. This can be contrasted to the inner orientation of a vector, by which two points can be related only if they lie in the same line parallel to $\mathbf{x}$. An inner orientation for a plane would be a clockwise or counterclockwise orientation of circulation within the plane. An outer orientation of a vector is a left- or right-handed screw-sense about the vector’s arrow.
Since it is hard to draw planes, only lines are shown there, but the correspondence with Fig. 2.2.2 should be clear, and the ideas can be extended to higher dimensionality as well. In this way the triplet $(a, b, c)$, or $(a_1, a_2, a_3)$, a notation we will switch to shortly, stands for any oriented, unity-spaced pair of planes, both parallel to the plane through the origin $ax + by + cz = 0$. Without yet justifying the terminology we will call $\mathbf{x}$ a contravariant vector, even though this only makes sense if we regard $\mathbf{x}$ as an abbreviation for the three numbers $(x^1, x^2, x^3)$; it would be more precise to call $\mathbf{x}$ a true vector with contravariant components $(x, y, z) \equiv (x^1, x^2, x^3)$, so that $\mathbf{x} \equiv x^1 \mathbf{e}_1 + x^2 \mathbf{e}_2 + x^3 \mathbf{e}_3$. In some cases, if it appears to be helpful, we will use the symbol $\vec{x}$, instead of just $\mathbf{x}$, to emphasize
its contravariant vector nature. Also we symbolize by $\tilde{\mathbf{a}}$ an object with components $(a, b, c) = (a_1, a_2, a_3)$ that will be called covariant.⁵ In Fig. 2.2.3 the (outer) orientation of two planes is indicated by an arrow (wavy to indicate that no definite vector is implied by it). It is meaningful to say that contravariant vector $\mathbf{x}$ and covariant vector $\tilde{\mathbf{a}}$ have the same orientation; it means that the arrow $\mathbf{x}$ points from the negative side of the plane toward the positive side. Other than being able to compare their orientations, is there any other meaningful geometric question that can be asked of $\mathbf{x}$ and $\tilde{\mathbf{a}}$? The answer is yes; the question is, “How many plane-pairs of $\tilde{\mathbf{a}}$ does $\mathbf{x}$ cross?” In Fig. 2.2.3 the answer is “3.” Is there any physics in this question and answer? The answer is yes again. Visualize the right plot of Fig. 2.2.3 as a topographical map, with the parallel lines being contours of equal elevation. (One is looking on a fine enough scale that the ground is essentially plane and the contours are straight and parallel.) The “trip” $\mathbf{x}$ entails a change of elevation of three units. This permits us to anticipate/demand that the following expressions (all equivalent)
$$
ax + by + cz \equiv a_i x^i \equiv \langle \tilde{\mathbf{a}}, \mathbf{x} \rangle \equiv \tilde{\mathbf{a}}(\mathbf{x}) \tag{2.2.3}
$$
have an intrinsic, invariant significance, unaltered by deformations and transformations (if and when they are introduced). This has defined a kind of “invariant product,”⁶ $\langle \tilde{\mathbf{a}}, \mathbf{x} \rangle$,⁷ of a covariant and a contravariant vector. It has also introduced the repeated-index summation convention. This product is clearly related to the “dot product” of elementary vector analysis $\mathbf{a} \cdot \mathbf{x}$, but that notation would be inappropriate at this point because nothing resembling an angle between vectors, or the cosine thereof, has been introduced.
⁵ “New” notation is to be discouraged but, where there appears to be no universally agreed-to notation, our policy will be to choose symbols that cause formulas to look like elementary physics, even if their meanings are more general. The most important convention is that multicomponent objects are denoted by boldface symbols, as for example the vector $\mathbf{x}$. This is more compact, though less expressive, than $\vec{x}$. Following Schutz [2], we use an overhead tilde to distinguish a one-form (or covariant vector, such as $\tilde{\mathbf{a}}$) from a (contravariant) vector, but we retain also the boldface symbol. The use of tildes to distinguish between covariant and contravariant quantities will break down when mixed quantities enter. Many mathematicians use no notational device at all to distinguish these quantities, and we will be forced to that when encountering mixed tensors. When it matters, the array of contravariant components will be regarded as a column vector and the array of covariant components as a row vector, but consistency in this regard is not guaranteed.
⁶ The left-hand side of Eq. (2.2.3), being homogeneous in $(x, y, z)$, is known as a “form” and, being first-order in the $x^i$, as a “one-form.” The notations $a_i x^i$, $\langle \tilde{\mathbf{a}}, \mathbf{x} \rangle$, and $\tilde{\mathbf{a}}(\mathbf{x})$ are interchangeable.
⁷ In this elementary context, the invariant significance of $\tilde{\mathbf{a}}(\mathbf{x})$ is utterly trivial, and yet when the same concept is introduced in the abstract context of tangent spaces and cotangent spaces, it can seem obscure (see, for example, Arnold [1, p. 203, Fig. 166]).
The geometric interpretation of the sum of two vectors as the arrow obtained by attaching the tail of one of the arrows to the tip of the other is well known. The geometric interpretation of the addition of covariant vectors is illustrated in Fig. 2.2.4. As usual, the natural application of a covariant vector is the determination of the number of plane-pairs crossed by a general contravariant vector. Notice that the lines
of $\tilde{\mathbf{a}} + \tilde{\mathbf{b}}$ are more closely spaced than the lines of either $\tilde{\mathbf{a}}$ or $\tilde{\mathbf{b}}$. The geometry of this figure encapsulates the property that a general vector $\mathbf{x}$ crosses a number of planes belonging to $\tilde{\mathbf{a}} + \tilde{\mathbf{b}}$ equal to the number of planes it crosses belonging to $\tilde{\mathbf{a}}$ plus the number belonging to $\tilde{\mathbf{b}}$. The reader should also pause to grasp the geometric interpretations of $\tilde{\mathbf{a}} - \tilde{\mathbf{b}}$ and $\tilde{\mathbf{b}} - \tilde{\mathbf{a}}$.
FIGURE 2.2.4. Geometric interpretation of the addition of covariant vectors, $\tilde{\mathbf{a}} + \tilde{\mathbf{b}}$. The solid arrow crosses two of the $\tilde{\mathbf{a}}$ plane-pairs, one of the $\tilde{\mathbf{b}}$ plane-pairs, and hence three of the $\tilde{\mathbf{a}} + \tilde{\mathbf{b}}$ plane-pairs. Tips of the two dotted arrows necessarily lie on the same $\tilde{\mathbf{a}} + \tilde{\mathbf{b}}$ plane.
There are various ways of interpreting figures like Fig. 2.2.2. The way intended so far can be illustrated by an example. Suppose you have two maps of Colorado, one having lines of longitude and latitude plotted on a square grid, the other using some other scheme, say a globe. To get from one of these maps to the other, some distortion would be required, but one would not necessarily say there had been a coordinate transformation, as the latitude and longitude coordinates of any particular feature would be the same on both maps; call them $(x, y)$. One can consider the right figure of Fig. 2.2.2 to be the result of a deformation of the figure on the left (both the physical object, the plane or planes, and the reference axes have been deformed), preserving the coordinates of every particular feature. The map analog of the planes in Fig. 2.2.2 are the equal-elevation contours of the maps. By counting elevation contours one can, say, find the elevation of Pike’s Peak relative to Denver. (It would be necessary to break this particular trip into many small segments in order that the ground could be regarded as a perfect plane in each segment.) With local contours represented by $\tilde{\mathbf{a}}_{(i)}$ and local transverse displacement vectors by $\mathbf{x}_{(i)}$, the overall change in elevation is obtained by summing the contributions $\langle \tilde{\mathbf{a}}_{(i)}, \mathbf{x}_{(i)} \rangle$ from each segment.⁸ Clearly one will obtain the same result from both maps. This is a virtue of the form $\langle \tilde{\mathbf{a}}, \mathbf{x} \rangle$. As stated previously, no coordinate transformation has yet occurred, but when one does, we will wish to preserve this feature: if $\mathbf{x} \to \mathbf{x}'$ we will insist that $\tilde{\mathbf{a}} \to \tilde{\mathbf{a}}'$ such that the value of the form is preserved. That’s an invariant.
⁸ As always in this text, subscripts $(i)$ are enclosed in parentheses to protect against their being interpreted as vector indices. There is no implied summation over repeated parenthesized indices.
Fig. 2.2.2 can also be interpreted in terms of “transformations,” either active or passive. The elements of this figure are redrawn in Fig. 2.2.5 but with origins superimposed. In part (a) the plane is “actively” shifted. Of course its intersections with the coordinate axes will now be different. The coefficients (covariant components) in the equation of the shifted plane expressed in terms of the original axes are altered from their original values. The new coefficients are said to be the result of an active transformation in this case. Part (b) of Fig. 2.2.5 presents an alternative view of Fig. 2.2.2 as a “passive” change of coordinates. The plane is now unshifted but its covariant components are still transformed because of the different axes. Similar comments apply to the transformation properties of contravariant components.⁹ From what has been stated previously, we must require the form $\langle \tilde{\mathbf{a}}, \mathbf{x} \rangle$ to be invariant under transformation. This is true whether the transformation is viewed actively or passively.
FIGURE 2.2.5. “Active” (a) and “passive” (b) interpretations of the relations between elements of Fig. 2.2.2 as transformations. In each case, though they were plotted separately in Fig. 2.2.2, the plots are here superimposed with common origin $O$.
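The invariance requirement can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: a two-dimensional example, a particular linear map $A$, and hypothetical helper names. If contravariant components transform as $\mathbf{x}' = A\mathbf{x}$ (column vector), then the covariant components must transform as $\tilde{\mathbf{a}}' = \tilde{\mathbf{a}} A^{-1}$ (row vector) for the count of line-pairs crossed to be unchanged; transforming $\tilde{\mathbf{a}}$ like $\mathbf{x}$ does not preserve it.

```python
def mat_vec(A, x):
    """Column-vector rule: x' = A x."""
    return tuple(sum(A[i][j] * x[j] for j in range(2)) for i in range(2))


def row_mat(a, B):
    """Row-vector rule: a' = a B."""
    return tuple(sum(a[i] * B[i][j] for i in range(2)) for j in range(2))


def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def pairing(a, x):
    """The form <a, x> = a_i x^i: number of line-pairs of a crossed by x."""
    return sum(ai * xi for ai, xi in zip(a, x))


A = [[2.0, 1.0], [0.0, 1.0]]  # an arbitrary invertible deformation (stretch plus shear)
a = (1.0, 2.0)                # covariant components
x = (3.0, -1.0)               # contravariant components

x_new = mat_vec(A, x)
a_new = row_mat(a, inverse_2x2(A))

print(pairing(a, x), pairing(a_new, x_new))  # equal: the form is invariant
print(pairing(mat_vec(A, a), x_new))         # differs: a does not transform like x
```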
⁹ While it is always clear that two possible interpretations exist, it is often difficult to understand which view is intended. A certain fuzziness as to whether an active or a passive view is intended is traditional, a tradition this text will regrettably continue to respect. In many cases the issue is inessential, and in any case it has nothing to do with the contravariant/covariant distinction.
2.3. DIFFERENTIAL FORMS
2.3.1. Geometric Interpretation
There is a formalism which, though it seems curious at first, is in common use in modern mechanics. These so-called differential forms will not be used in this chapter, but they are introduced at this point, after only a minimal amount of geometry has been introduced, in order to emphasize that the concepts involved are very general, independent of any geometry yet to be introduced. In particular there is no dependence on lengths, angles, or orthogonality. Since the new ideas can be adequately illustrated by considering functions of two variables, $x$ and $y$, we simplify accordingly and define elementary differential forms $\widetilde{\mathbf{dx}}$ and $\widetilde{\mathbf{dy}}$ as functions (of a vector) satisfying
$$
\widetilde{\mathbf{dx}}(\Delta\mathbf{x}) = \Delta x, \qquad \widetilde{\mathbf{dy}}(\Delta\mathbf{x}) = \Delta y; \tag{2.3.1}
$$
these functions take displacement vector $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0$ as argument and produce components $\Delta x = x - x_0$ and $\Delta y = y - y_0$ as values.¹⁰ A linear superposition of $\widetilde{\mathbf{dx}}$ and $\widetilde{\mathbf{dy}}$ with coefficients $a$ and $b$ is defined by¹¹
$$
(a\,\widetilde{\mathbf{dx}} + b\,\widetilde{\mathbf{dy}})(\Delta\mathbf{x}) = a\,\Delta x + b\,\Delta y. \tag{2.3.2}
$$
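Read literally, the elementary forms are just functions that accept a displacement vector and return a number. A minimal sketch of that reading follows; the names `dx_form`, `dy_form`, and `superpose` are hypothetical, chosen only for this illustration.

```python
def dx_form(displacement):
    """Elementary form: picks out the x-component of a displacement vector."""
    return displacement[0]


def dy_form(displacement):
    """Elementary form: picks out the y-component of a displacement vector."""
    return displacement[1]


def superpose(a, b):
    """Return the form a*dx + b*dy, itself a function of a displacement."""
    def form(displacement):
        return a * dx_form(displacement) + b * dy_form(displacement)
    return form


delta = (0.5, -1.0)          # an arbitrary displacement (Delta x, Delta y)
omega = superpose(2.0, 3.0)  # the form 2 dx + 3 dy

print(dx_form(delta), dy_form(delta), omega(delta))
```

The design point is that `superpose` returns a function rather than a number: the coefficients $(a, b)$ specify the form, and a value results only once a displacement is supplied.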
In practice $\Delta x$ and $\Delta y$ will always be infinitesimal quantities and the differentials will be part of a “linearized” or “first term in Taylor series” procedure. Consider a scalar function $h(x, y)$; for concreteness let us take $h(x, y)$ to be the elevation above sea level at location $(x, y)$. By restricting oneself to a small enough region about some reference location $(x_0, y_0)$, $h(x, y)$ can be linearized, i.e., approximated by a linear expansion
$$
h(x, y) = h(x_0, y_0) + a\,(x - x_0) + b\,(y - y_0). \tag{2.3.3}
$$
In the language of differential forms this same equation is written as
$$
\widetilde{\mathbf{dh}} = a\,\widetilde{\mathbf{dx}} + b\,\widetilde{\mathbf{dy}}, \tag{2.3.4}
$$
where, evidently,
$$
a = \left.\frac{\partial h}{\partial x}\right|_{x_0, y_0}, \qquad b = \left.\frac{\partial h}{\partial y}\right|_{x_0, y_0}. \tag{2.3.5}
$$
This shows that $\widetilde{\mathbf{dh}}$ is closely connected to the gradient of ordinary vector analysis. It is not the same thing, though, since the ordinary gradient is orthogonal to contours of constant $h$ and the concept of orthogonality has not yet been introduced. Note that $\widetilde{\mathbf{dh}}$ is independent of $h(x_0, y_0)$. (Neglecting availability of oxygen and dependence of $g$ on geographic location, the difficulty of climbing a hill is independent of the elevation at its base.) Returning to the map of Colorado, imagine a trip made up of numerous path intervals $\mathbf{x}_{(i)}$. The change in elevation $h_{(i)}$ during the incremental path interval $\mathbf{x}_{(i)}$ is given by
$$
h_{(i)} = a_{(i)} x_{(i)} + b_{(i)} y_{(i)} = (a_{(i)}\,\widetilde{\mathbf{dx}} + b_{(i)}\,\widetilde{\mathbf{dy}})(\mathbf{x}_{(i)}) = \widetilde{\mathbf{dh}}_{(i)}(\mathbf{x}_{(i)}). \tag{2.3.6}
$$
¹⁰ Though the value of a differential form acting on a vector is a real number, in general it is not a scalar. A possibly helpful mnemonic feature of the notation is that to produce a regular-face quantity from a boldface quantity requires the use of another boldface quantity.
¹¹ That the symbols $\widetilde{\mathbf{dx}}$ and $\widetilde{\mathbf{dy}}$ are not ordinary differentials is indicated by the boldface type and the overhead tildes. They are being newly defined here. Unfortunately, a more common notation is to represent a differential form simply as $dx$; with this notation it is necessary to distinguish by context between differential forms and ordinary differentials. A converse ambiguity in our terminology is that it may not be clear whether the term differential form means $a\,\widetilde{\mathbf{dx}} + b\,\widetilde{\mathbf{dy}}$ or $\widetilde{\mathbf{dh}}$.
Since this equation resembles Eq. (2.2.3), it can also be written as
$$
h_{(i)} = \langle \widetilde{\mathbf{dh}}_{(i)}, \mathbf{x}_{(i)} \rangle. \tag{2.3.7}
$$
The total change of elevation, $h$, can be obtained by summing over the incremental paths:
$$
h = \sum_i \langle \widetilde{\mathbf{dh}}_{(i)}, \mathbf{x}_{(i)} \rangle. \tag{2.3.8}
$$
As usual, such a summation becomes an integral in the limit of small steps,
$$
h = \int_B^E \langle \widetilde{\mathbf{dh}}, d\mathbf{x} \rangle; \tag{2.3.9}
$$
the lower and upper limits of integration correspond to the beginning $B$ and end $E$ of the trip. Though the notation is highly abbreviated, it has an unambiguous, coordinate-free meaning that makes it clear that the result is invariant, in the sense discussed above. The formula has a seeming excess of differentials but, when expanded in components, it takes on a more customary appearance:
$$
h = \int_B^E \bigl( a(\mathbf{x})\,dx + b(\mathbf{x})\,dy \bigr). \tag{2.3.10}
$$
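The passage from the sum over path intervals to the integral can be sketched numerically. The elevation function, the path, and the function names below are all hypothetical choices made for illustration; the coefficients $a$ and $b$ are the partial derivatives of $h$ evaluated locally, at the midpoint of each small interval.

```python
def h(x, y):
    """A made-up elevation function, used only for this illustration."""
    return x * x * y + 3.0 * y


def a(x, y):
    """Local coefficient a = dh/dx."""
    return 2.0 * x * y


def b(x, y):
    """Local coefficient b = dh/dy."""
    return x * x + 3.0


def path(t):
    """A curved trip from B = path(0) to E = path(1)."""
    return (t, 2.0 * t * t)


def elevation_change(n_steps):
    """Sum the contributions a*dx + b*dy over n_steps small path intervals."""
    total = 0.0
    for i in range(n_steps):
        x0, y0 = path(i / n_steps)
        x1, y1 = path((i + 1) / n_steps)
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)  # evaluate a, b mid-interval
        total += a(xm, ym) * (x1 - x0) + b(xm, ym) * (y1 - y0)
    return total


h_sum = elevation_change(1000)
h_exact = h(*path(1.0)) - h(*path(0.0))
print(h_sum, h_exact)
```

Increasing `n_steps` brings the sum closer to the difference $h(E) - h(B)$, independent of the route taken between the same endpoints.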
Example 2.3.1: Three points $P_i$, $i = 1, 2, 3$, with coordinates $(x_{(i)}, y_{(i)}, z_{(i)})$, are fixed in ordinary space. (1) Dividing Eq. (2.2.1) by $d$, find the coefficients in the equation
$$
x\,\frac{a}{d} + y\,\frac{b}{d} + z\,\frac{c}{d} = 1 \tag{2.3.11}
$$
of the plane passing through the points. (2) Defining $h(x, y)$ as the elevation $z$ at point $(x, y)$, evaluate $\widetilde{\mathbf{dh}}$ at the point $P_1$. (3) For a general point $P$ whose horizontal displacements relative to $P_1$ are given by $(\Delta x = x - x_{(1)}, \Delta y = y - y_{(1)})$, find its elevation $\Delta h = z - z_{(1)}$ relative to $P_1$.
SOLUTION: (1) Ratios of the coefficients $(a, b, c, d)$ are obtained by substituting the known points into Eq. (2.3.11) and inverting:
$$
\begin{pmatrix} a' \\ b' \\ c' \end{pmatrix}
\equiv
\begin{pmatrix} a/d \\ b/d \\ c/d \end{pmatrix}
=
\begin{pmatrix}
x_{(1)} & y_{(1)} & z_{(1)} \\
x_{(2)} & y_{(2)} & z_{(2)} \\
x_{(3)} & y_{(3)} & z_{(3)}
\end{pmatrix}^{-1}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \tag{2.3.12}
$$
(2) Replacing $z$ by $h$ and defining $h_1 = d/c - (a/c)\,x_{(1)} - (b/c)\,y_{(1)}$, Eq. (2.3.3) becomes
$$
h(x, y) = h_1 - \frac{a}{c}\,(x - x_{(1)}) - \frac{b}{c}\,(y - y_{(1)}).
$$
Since the ratios $a'/c'$ and $b'/c'$ are available from Eq. (2.3.12), the required differential form is given, as in Eq. (2.3.4), by
$$
\widetilde{\mathbf{dh}} = -\frac{a'}{c'}\,\widetilde{\mathbf{dx}} - \frac{b'}{c'}\,\widetilde{\mathbf{dy}}.
$$
(3) Evaluating this form on the displacement $(\Delta x, \Delta y)$ gives
$$
\Delta h = -\frac{a'}{c'}\,\Delta x - \frac{b'}{c'}\,\Delta y.
$$
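The three steps of the example can be sketched in code. This is a minimal illustration, not the book's procedure verbatim: the linear system of Eq. (2.3.12) is solved by Cramer's rule rather than by forming the inverse matrix, the function names are hypothetical, and the sample points are made up. It assumes the plane does not pass through the origin ($d \neq 0$) and is not vertical ($c \neq 0$).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a sequence of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def plane_coefficients(points):
    """Solve x*a' + y*b' + z*c' = 1 at the three points (Cramer's rule)."""
    D = det3(points)
    coeffs = []
    for col in range(3):
        m = [list(row) for row in points]
        for row in m:
            row[col] = 1.0  # replace one column by the right-hand side (1, 1, 1)
        coeffs.append(det3(m) / D)
    return tuple(coeffs)


def dh_components(points):
    """Components (-a'/c', -b'/c') of the form dh."""
    a_p, b_p, c_p = plane_coefficients(points)
    return -a_p / c_p, -b_p / c_p


def delta_h(points, dx, dy):
    """Elevation change for horizontal displacement (dx, dy) from any point on the plane."""
    h_x, h_y = dh_components(points)
    return h_x * dx + h_y * dy


# Made-up points lying on the plane z = 1 + x + 2y.
points = [(1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]

print(plane_coefficients(points))      # a', b', c'
print(dh_components(points))           # components of dh
print(delta_h(points, -1.0, 1.0))      # from P1 to P2
```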
Problem 2.3.1: (a) For points $P_1$, $P_2$, $P_3$ given by $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, check the formula just derived for $\Delta h$ by applying it to each of $P_1$, $P_2$, and $P_3$. (b) The coordinates of three well-known locations in Colorado, Denver, Pike’s Peak, and Colorado Springs, are, respectively, W. longitudes 105.1°, 105.1°, and 104.8°; N. latitudes 39.7°, 38.8°, and 38.8°; and elevations 5280 feet, 14,100 feet, and 5280 feet. Making the (thoroughly unwarranted) assumption that the town of Golden, situated at 105.2°W, 39.7°N, lies on the plane defined by the previous three locations, find its elevation.
At this point we can anticipate one implication of these results for mechanics. Recall the connection between elevation $h$ and potential energy $U = mgh$ in the earth’s gravitational field. Also recall the connection between work $W$ and potential energy $U$. To make the equation traditionally expressed as $\Delta U = \Delta W = \mathbf{F} \cdot \Delta\mathbf{x}$ meaningful, the vectorial character of force $\mathbf{F}$ has to differ from that of the displacement $\Delta\mathbf{x}$. In particular, since $\Delta\mathbf{x}$ is a contravariant vector, the force $\mathbf{F}$ should be a covariant vector (meaning its symbol should be $\tilde{\mathbf{F}}$) for the work to be coordinate independent.
In the traditional pedagogy of physics, covariant and contravariant vectors are usually differentiated on the basis of the behavior of their components under coordinate transformations. Note, though, that in our discussion the quantities $\tilde{\mathbf{a}}$ and $\mathbf{x}$ have been introduced and distinguished, and meaning was assigned to the form $\langle \tilde{\mathbf{a}}, \mathbf{x} \rangle$, before any change of coordinates has even been contemplated. This, so far, is the essential relationship between covariant and contravariant vectors.¹²
¹² Because the components of vectors vary in such a way as to preserve scalar invariants, a common though somewhat archaic terminology refers to vectors as invariants or as invariant vectors, in spite of the facts that (1) their components vary and (2) the expression is redundant anyway. (Note especially that invariant here does not mean constant.) Nowadays the term tensor automatically carries this connotation of invariance. In special relativity the phrase manifestly covariant (or simply covariant) means the same thing, but this is a different (though related) meaning of our word covariant. Our policy, whenever the invariant aspect is to be specially emphasized, is to use the term true vector, even though it is redundant.
Since the ideas expressed so far, though not difficult, may seem unfamiliar, recapitulation in slightly different terms may be helpful. It has been found useful to associate contravariant vectors with independent variables, like $x$ and $y$, and covariant vectors (or one-forms) with dependent variables, like $h$. Knowing $h(x, y)$, one can prepare a series of contours of constant $h$, separated from each other by one unit of “elevation,” and plot them on the $(x, y)$ plane. For a change $(\Delta x, \Delta y)$ in independent variables (defined by a contravariant vector), by counting the number of contours crossed (defined by a covariant vector), the change in dependent variable can be determined.
We have been led to a somewhat unconventional and cumbersome notation (with $\widetilde{\mathbf{dx}}$ being the function that picks out the $x$-component of an arbitrary vector) so that the symbol $dx$ can retain its traditional physics meaning as an infinitesimal deviation of $x$. In mathematical literature the symbol $dx$ all by itself typically stands for a differential one-form. Furthermore, we have so far only mentioned one-forms. When a two-form such as $dx\,dy$ is introduced in mathematical literature, it is taken implicitly (roughly speaking) to pick out the area defined by $dx$ and $dy$ rather than the product of two infinitesimals. We will return to these definitions shortly.
There is an important potential source of ambiguity in traditional discussion of mechanics by physicists and it is one of the reasons mathematicians prefer different terminology for differentials: a symbol such as $\mathbf{x}$ is used to stand both for where a particle is and where it could conceivably be.¹³ This is arguably made clearer by mathematicians’ notation. Since we wish to maintain physics usage (not to defend it, only to make formulas look familiar) we will use differential forms as much to demystify them as to exploit their power.
2.3.2. Examples Illustrating the Calculus of Differential Forms
Even more than the previous section, since the material in this section will not be required for some time, the reader might be well-advised only to skim over it, planning to address it more carefully later, not because the material is difficult but because its motivation may be unclear. Furthermore the notation here will be far from standard as we attempt to metamorphose from old-fashioned notation to more modern notation. (In any case, since there is no universally accepted notation for this material, it is impossible to use “standard” notation.) For the same reason, there may seem to be inconsistencies even internal to this section. All this is a consequence mainly of our insistence on maintaining a distinction between two types of “differential,” $dx$ and $\widetilde{\mathbf{d}x}$. Eventually, once the important points have been made, it will be possible to shed some of the notational complexity. A notation we will use temporarily for a differential form such as the one defined in Eq. (2.3.4) is
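$$\tilde{\omega}[d] = f(x, y)\,\widetilde{\mathbf{d}x} + g(x, y)\,\widetilde{\mathbf{d}y}. \tag{2.3.13}$$
The only purpose of the “argument” d in square brackets here is to correlate $\tilde{\omega}$ with the particular coordinate differentials $\widetilde{\mathbf{d}x}$ and $\widetilde{\mathbf{d}y}$, as contrasted, say, with two independent differentials δx and δy:
$$\tilde{\omega}[\delta] = f(x, y)\,\widetilde{\boldsymbol{\delta}x} + g(x, y)\,\widetilde{\boldsymbol{\delta}y}. \tag{2.3.14}$$
¹³ If constraints are present, x can also stand for a location where the mass could not conceivably be.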
The δ symbol does not signify some kind of differential operator other than d; it simply allows notationally for the later assignment of independent values to the differently named differentials. Square brackets are used to protect against interpretation of d or δ as an ordinary argument of $\tilde{\omega}$. One can develop a calculus of such differential forms. Initially we proceed to do this by treating the differentials as if they were the “old-fashioned” type familiar from physics and freshman calculus. Notationally we indicate this by leaving off the overhead tildes and not using boldface symbols; hence
$$\omega[d] = f(x, y)\,dx + g(x, y)\,dy. \tag{2.3.15}$$
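The differential $\delta\omega[d] = \delta(\omega[d])$ is defined by
$$\delta\omega[d] = \left(\frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y\right) dx + \left(\frac{\partial g}{\partial x}\,\delta x + \frac{\partial g}{\partial y}\,\delta y\right) dy. \tag{2.3.16}$$
Since these are ordinary differentials, if f and g were force components, $\delta\omega[d]$ would be the answer to the question, “How much more work is done in displacement (dx, dy) from displaced location P + δP = (x + δx, y + δy) than is done in displacement (dx, dy) from point P = (x, y)?” $\delta\omega[d]$ is not the same as $d\omega[\delta]$ but, from the two, the combination
$$\mathrm{B.C.}[\omega] \equiv \delta\omega[d] - d\omega[\delta], \tag{2.3.17}$$
can be formed; it is to be known as the “bilinear covariant” of ω. After further manipulation it will yield the “exterior derivative” of ω.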
Example 2.3.2: Consider the example
$$\omega[d] = y\,dx + x\,dy. \tag{2.3.18}$$
Substituting into Eq. (2.3.16), we obtain
$$\delta\omega[d] = \delta y\,dx + \delta x\,dy, \qquad d\omega[\delta] = dy\,\delta x + dx\,\delta y. \tag{2.3.19}$$
Notice, in this case, that the bilinear covariant vanishes, $\delta\omega[d] - d\omega[\delta] = 0$.
This is not always true, however; operating with d and operating with δ do not “commute”; that is, $\delta\omega[d]$ and $d\omega[\delta]$ are different in general. But products such as $dx\,\delta y$ and $\delta y\,dx$ are the same; they are simply the products of two (a physicist might say tiny) independently assignable coordinate increments. When its bilinear covariant does, in fact, vanish, ω is said to be “closed.” In the case of Eq. (2.3.18), ω[d] is “derivable from” a function of position h(x, y) = xy according to
$$\omega[d] = dh(x, y) = y\,dx + x\,dy. \tag{2.3.20}$$
In this circumstance (of being derivable from a single-valued function), ω is said to be “an exact differential.”

Problem 2.3.2: Show that the bilinear covariant B.C.[ω] of the differential one-form ω[d] = dh(x, y) vanishes for arbitrary function h(x, y).
Example 2.3.3: For the differential form
$$\omega[d] = y\,dx, \tag{2.3.21}$$
one sees that
$$\delta\omega[d] - d\omega[\delta] = \delta y\,dx - dy\,\delta x,$$
which does not vanish. But if we differentiate once again (introducing D as yet another symbol to indicate a differential operator) we obtain
$$D(\delta y\,dx - dy\,\delta x) = 0,$$
since the coefficients of the differentials being differentiated are now simply constants.
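Examples 2.3.2 and 2.3.3 can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names (`one_form`, `variation`, `bilinear_covariant`) and the sample increments are assumptions made for illustration. Because the coefficients in both examples are linear in x and y, finite increments reproduce the first-order differentials exactly.

```python
from fractions import Fraction

def one_form(f, g):
    """Build w[d] = f(x, y) dx + g(x, y) dy as a function of a point and an increment."""
    def w(point, d):
        x, y = point
        dx, dy = d
        return f(x, y) * dx + g(x, y) * dy
    return w

def variation(w, point, shift, d):
    """Change in w[d] when the base point moves from P to P + shift, with d held fixed."""
    x, y = point
    sx, sy = shift
    return w((x + sx, y + sy), d) - w(point, d)

def bilinear_covariant(w, point, delta, d):
    """B.C.[w] = delta w[d] - d w[delta], as in Eq. (2.3.17)."""
    return variation(w, point, delta, d) - variation(w, point, d, delta)

P = (Fraction(2), Fraction(3))        # illustrative base point
delta = (Fraction(1), Fraction(2))    # (delta x, delta y), illustrative values
d = (Fraction(3), Fraction(5))        # (dx, dy), illustrative values

exact_form = one_form(lambda x, y: y, lambda x, y: x)   # Eq. (2.3.18)
y_dx = one_form(lambda x, y: y, lambda x, y: 0)         # Eq. (2.3.21)

bc_exact = bilinear_covariant(exact_form, P, delta, d)
bc_y_dx = bilinear_covariant(y_dx, P, delta, d)
```

With these values `bc_exact` vanishes, while `bc_y_dx` equals δy dx − dy δx, in agreement with the two examples.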
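Problem 2.3.3: For ω[d] given by Eq. (2.3.15), with f(x, y) and g(x, y) being general functions, show that its bilinear covariant does not vanish in general.

We have been proceeding as if our differentials were “ordinary,” but to be consistent with our “new” notation Eq. (2.3.16) should have been written
$$\widetilde{\boldsymbol{\delta}}\tilde{\omega}[d] = \left(\frac{\partial f}{\partial x}\,\widetilde{\boldsymbol{\delta}x} + \frac{\partial f}{\partial y}\,\widetilde{\boldsymbol{\delta}y}\right)\widetilde{\mathbf{d}x} + \left(\frac{\partial g}{\partial x}\,\widetilde{\boldsymbol{\delta}x} + \frac{\partial g}{\partial y}\,\widetilde{\boldsymbol{\delta}y}\right)\widetilde{\mathbf{d}y}, \tag{2.3.22}$$
with the result being a two-form, that is, a function of two vectors, say $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$. Applying Eq. (2.3.1), this equation leads to
$$\widetilde{\boldsymbol{\delta}}\tilde{\omega}[d](\Delta\mathbf{x}_{(1)}, \Delta\mathbf{x}_{(2)}) = \left(\frac{\partial f}{\partial x}\,\Delta x_{(1)} + \frac{\partial f}{\partial y}\,\Delta y_{(1)}\right)\Delta x_{(2)} + \left(\frac{\partial g}{\partial x}\,\Delta x_{(1)} + \frac{\partial g}{\partial y}\,\Delta y_{(1)}\right)\Delta y_{(2)}. \tag{2.3.23}$$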
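Except for the renaming of symbols, $\delta x \to \Delta x_{(1)}$, $dx \to \Delta x_{(2)}$, $\delta y \to \Delta y_{(1)}$, and $dy \to \Delta y_{(2)}$, this is the same as Eq. (2.3.16). Hence, though the original symbols had different meanings, Eqs. (2.3.16) and (2.3.22) have equivalent content. For this to be true we have implicitly assumed that the first of the two arguments $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$ is acted on by the δ differential form and the second by the d form. Since these appear in the same order in every term we could as well say that the first form acts on $\Delta\mathbf{x}_{(1)}$ and the second on $\Delta\mathbf{x}_{(2)}$. Furthermore, since Eq. (2.3.13) made no distinction between δ and d forms, we might as well have written Eq. (2.3.22) as
$$\widetilde{\mathbf{d}}\tilde{\omega}[d] = \left(\frac{\partial f}{\partial x}\,\widetilde{\mathbf{d}x} + \frac{\partial f}{\partial y}\,\widetilde{\mathbf{d}y}\right)\widetilde{\mathbf{d}x} + \left(\frac{\partial g}{\partial x}\,\widetilde{\mathbf{d}x} + \frac{\partial g}{\partial y}\,\widetilde{\mathbf{d}y}\right)\widetilde{\mathbf{d}y}, \tag{2.3.24}$$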
as long as it were understood that in a product of two differential forms the first acts on the first argument and the second on the second. Note though that in spite of the fact that it is legitimate to reverse the order in a product of actual displacements like $\Delta x_{(1)}\,\Delta y_{(2)}$, it is illegitimate to reverse the order of the terms in a product like $\widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y}$; that is,
$$\widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y} \neq \widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x}. \tag{2.3.25}$$
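The failure to commute of our quantities, which will play such an important role in the sequel, has entered here as a simple consequence of our notational convention specifying the meaning of the differential of a differential. How then to express the bilinear covariant, without using the distinction between d and δ? Instead of antisymmetrizing with respect to d and δ, we can antisymmetrize with respect to the arguments. A “new notation” version of Eq. (2.3.17), with $\tilde{\omega}$ still given by Eq. (2.3.13), can be written
$$\mathrm{B.C.}[\tilde{\omega}](\Delta\mathbf{x}_{(1)}, \Delta\mathbf{x}_{(2)}) = \left[\frac{\partial f}{\partial y}\left(\widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x} - \widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y}\right) + \frac{\partial g}{\partial x}\left(\widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y} - \widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x}\right)\right](\Delta\mathbf{x}_{(1)}, \Delta\mathbf{x}_{(2)}). \tag{2.3.26}$$
This can be reexpressed by defining the “wedge product”
$$\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} = \widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y} - \widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x}. \tag{2.3.27}$$
Note from its definition, that
$$\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} = -\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}x} \quad\text{and}\quad \widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}x} = 0. \tag{2.3.28}$$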
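We obtain
$$\frac{\partial f}{\partial y}\left(\widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x} - \widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y}\right) + \frac{\partial g}{\partial x}\left(\widetilde{\mathbf{d}x}\,\widetilde{\mathbf{d}y} - \widetilde{\mathbf{d}y}\,\widetilde{\mathbf{d}x}\right) = \left(-\frac{\partial f}{\partial y} + \frac{\partial g}{\partial x}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}, \tag{2.3.29}$$
which can be substituted into Eq. (2.3.26). Since the arbitrary increments $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$ then appear as common arguments on both sides of Eq. (2.3.26) they can be suppressed as we define a two-form $\mathrm{E.D.}[\tilde{\omega}]$, which is $\mathrm{B.C.}[\tilde{\omega}](\Delta\mathbf{x}_{(1)}, \Delta\mathbf{x}_{(2)})$ with its arguments unevaluated:
$$\mathrm{E.D.}[\tilde{\omega}] = \left(-\frac{\partial f}{\partial y} + \frac{\partial g}{\partial x}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}. \tag{2.3.30}$$
When operating on any two vector increments, $\mathrm{E.D.}[\tilde{\omega}]$ generates the bilinear covariant of $\tilde{\omega}$ evaluated for the two vectors. This newly defined differential two-form is known as the “exterior differential” of the differential one-form $\tilde{\omega}$. From here on this will be written as¹⁴
$$\widetilde{\mathbf{d}}\tilde{\omega} = \left(-\frac{\partial f}{\partial y} + \frac{\partial g}{\partial x}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}. \tag{2.3.31}$$
Note that this relation is consistent with the rule $\widetilde{\mathbf{d}}(f\,\widetilde{\mathbf{d}x}) = \widetilde{\mathbf{d}}f \wedge \widetilde{\mathbf{d}x}$. The vectors $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$ can be said to have played only a “catalytic” role in the definition of the exterior derivative since they no longer appear in Eq. (2.3.31). From its appearance, one might guess that the exterior derivative is related to the curl operator of vector analysis. This is to be pursued next.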
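The convention that the first form in a product acts on the first argument, together with the wedge product of Eq. (2.3.27), can be made concrete by treating each form as a function of vectors. This is only an illustrative sketch: the function names and the two sample vectors are assumptions, and `exterior_derivative` handles only the special case of constant partial derivatives.

```python
def dx(v):
    """One-form that picks out the x-component of a vector (x, y)."""
    return v[0]

def dy(v):
    """One-form that picks out the y-component of a vector (x, y)."""
    return v[1]

def product(a, b):
    """Ordered product of one-forms: a acts on the first argument, b on the second."""
    return lambda u, v: a(u) * b(v)

def wedge(a, b):
    """Wedge product of Eq. (2.3.27): a b - b a."""
    return lambda u, v: product(a, b)(u, v) - product(b, a)(u, v)

def exterior_derivative(df_dy, dg_dx):
    """Two-form (-df/dy + dg/dx) dx ^ dy, assuming constant partial derivatives."""
    coeff = -df_dy + dg_dx
    return lambda u, v: coeff * wedge(dx, dy)(u, v)

u = (1, 2)   # plays the role of (delta x, delta y)
v = (3, 5)   # plays the role of (dx, dy)

ordered = product(dx, dy)(u, v)        # dx acts on u, dy acts on v
swapped = product(dy, dx)(u, v)        # dy acts on u, dx acts on v
area = wedge(dx, dy)(u, v)

# For w = y dx (Example 2.3.3): df/dy = 1 and dg/dx = 0.
ed_y_dx = exterior_derivative(df_dy=1, dg_dx=0)(u, v)
```

Here `ordered` and `swapped` differ, as Eq. (2.3.25) asserts, and `ed_y_dx` reproduces the bilinear covariant δy dx − dy δx found in Example 2.3.3.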
2.3.3. Connections between Differential Forms and Vector Calculus
Like nails and screws, the calculus of vectors and the calculus of differential forms can be regarded as essentially similar or as essentially different depending on one’s point of view. Both can be used to hold physical theories together. A skillful carpenter can hammer together much of a house while the cabinet maker is still drilling the holes in the kitchen cabinets. Similarly the physicist can derive and solve Maxwell’s equations using vector analysis while the mathematician is still tooling up the differential form machinery. The fact is, though, that some structures cannot be held together with nails and some mechanical systems cannot be analyzed without differential forms.

There is a spectrum of levels of ability in the use of vectors, starting from no knowledge whatsoever, advancing through vector algebra, to an understanding of gradients, curls, and divergences, to a skillful facility with the methods. The corresponding spectrum is even broader for differential forms, which can be used to solve all the problems that vectors can solve plus others as well. In spite of this, most physicists remain at the “no knowledge whatsoever” end of the spectrum. This is perhaps partly due to some inherent advantage of simplicity that vectors have for solving the most commonly encountered problems of physics, but the accidents of pedagogical fashion probably also play a role. According to Arnold, in Mathematical Methods of Classical Mechanics (1989), “Hamiltonian mechanics cannot be understood without differential forms.”¹⁵ It behooves us therefore to make a start on this subject. But in this text only a fairly superficial treatment will be included (the rationale being that the important and hard thing is to get the general idea, but that following specialized texts is not so difficult once one has the general idea).
The whole of advanced calculus can be formulated in terms of differential forms, as can more advanced topics, and there are several texts concentrating narrowly yet accessibly on these subjects. Here we are more interested in giving the general ideas than in either rigorous mathematical proof or practice with the combinatorics that are needed to make the method compete with vector analysis in compactness. The purpose of this section is to show how formulas that are (assumed to be) already known from vector calculus can be expressed using differential forms. Since these results are known, it will not be necessary to prove them in the context of differential forms. This will permit the following discussion to be entirely formal, its only purpose being to show that definitions and relations being introduced are consistent with results already known. We work only with ordinary, three-dimensional, Euclidean geometry, using rectangular coordinates. It is far from true that the validity of differential forms is restricted to this domain, but our purpose is only to motivate the basic definitions.

¹⁴ The notation of Eq. (2.3.31) is still considerably bulkier than is standard in the literature of differential forms. There, the quantity (exterior derivative) that we have called $\mathrm{E.D.}[\tilde{\omega}]$ is often expressed simply as $d\omega$, and Eq. (2.3.31) becomes $d\omega = (-\partial f/\partial y + \partial g/\partial x)\,dx \wedge dy$ or even $d\omega = (-\partial f/\partial y + \partial g/\partial x)\,dx\,dy$.
¹⁵ It might be more accurate to say that “without differential forms one cannot understand Hamiltonian mechanics as well as Arnold,” but this statement would be true with or without differential forms.

One way that Eq. (2.3.13) can be generalized is to go from two to three dimensions:
$$\tilde{\omega}^{(1)} = f(x, y, z)\,\widetilde{\mathbf{d}x} + g(x, y, z)\,\widetilde{\mathbf{d}y} + h(x, y, z)\,\widetilde{\mathbf{d}z}, \tag{2.3.32}$$
where the superscript (1) indicates that $\tilde{\omega}^{(1)}$ is a one-form. Calculations like those leading to Eq. (2.3.31) yield
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(1)} = \left(-\frac{\partial g}{\partial z} + \frac{\partial h}{\partial y}\right)\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + \left(-\frac{\partial h}{\partial x} + \frac{\partial f}{\partial z}\right)\widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + \left(-\frac{\partial f}{\partial y} + \frac{\partial g}{\partial x}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}. \tag{2.3.33}$$
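Next let us generalize Eq. (2.3.27) by defining
(2.3.34)
$$\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z}\,(\Delta\mathbf{x}_{(1)}, \Delta\mathbf{x}_{(2)}, \Delta\mathbf{x}_{(3)}) = \det\begin{pmatrix} \Delta x_{(1)} & \Delta x_{(2)} & \Delta x_{(3)} \\ \Delta y_{(1)} & \Delta y_{(2)} & \Delta y_{(3)} \\ \Delta z_{(1)} & \Delta z_{(2)} & \Delta z_{(3)} \end{pmatrix}. \tag{2.3.35}$$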
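$$\tilde{\omega}^{(2)} = f(x, y, z)\,\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + g(x, y, z)\,\widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + h(x, y, z)\,\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}, \tag{2.3.36}$$
where the superscript (2) indicates that $\tilde{\omega}^{(2)}$ is a two-form. At first glance this may seem to be a rather ad hoc and special form, but any two-form that is antisymmetric in its two arguments can be expressed this way.¹⁶ We then define the exterior differential,
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(2)} = \widetilde{\mathbf{d}}f \wedge \widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + \widetilde{\mathbf{d}}g \wedge \widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + \widetilde{\mathbf{d}}h \wedge \widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}. \tag{2.3.37}$$
These definitions are only special cases of more general definitions, but they are all we require for now. From Eq. (2.3.37), using Eqs. (2.3.28), we obtain
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(2)} = \left(\frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z}. \tag{2.3.38}$$
¹⁶ In most treatments of differential forms the phrase “antisymmetric two-form” would be considered redundant, since “two-forms” would have already been defined to be antisymmetric.

Let us recapitulate the formulas that have been derived, but using notation for the coefficients that is more suggestive than the functions f(x, y, z), g(x, y, z), and h(x, y, z) used so far.
$$\tilde{\omega}^{(0)} = \phi(x, y, z), \qquad \tilde{\omega}^{(1)} = E_x\,\widetilde{\mathbf{d}x} + E_y\,\widetilde{\mathbf{d}y} + E_z\,\widetilde{\mathbf{d}z}, \qquad \tilde{\omega}^{(2)} = B_x\,\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + B_y\,\widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + B_z\,\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}. \tag{2.3.39}$$
Then Eqs. (2.3.4), (2.3.33), and (2.3.38) become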
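$$\begin{aligned}
\widetilde{\mathbf{d}}\tilde{\omega}^{(0)} &= \frac{\partial\phi}{\partial x}\,\widetilde{\mathbf{d}x} + \frac{\partial\phi}{\partial y}\,\widetilde{\mathbf{d}y} + \frac{\partial\phi}{\partial z}\,\widetilde{\mathbf{d}z}, \\
\widetilde{\mathbf{d}}\tilde{\omega}^{(1)} &= \left(-\frac{\partial E_y}{\partial z} + \frac{\partial E_z}{\partial y}\right)\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + \left(-\frac{\partial E_z}{\partial x} + \frac{\partial E_x}{\partial z}\right)\widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + \left(-\frac{\partial E_x}{\partial y} + \frac{\partial E_y}{\partial x}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}, \\
\widetilde{\mathbf{d}}\tilde{\omega}^{(2)} &= \left(\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z}\right)\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z}.
\end{aligned} \tag{2.3.40}$$
We can now write certain familiar equations as equations satisfied by differential forms. For example,
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(2)} = 0 \quad\text{is equivalent to}\quad \nabla \cdot \mathbf{B} = 0. \tag{2.3.41}$$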
The three-form $\widetilde{\mathbf{d}}\tilde{\omega}^{(2)}$ is “waiting to be evaluated” on coordinate increments as in Eq. (2.3.32); this includes the “Jacobian factor” in a volume integration of $\nabla \cdot \mathbf{B}$. The equation $\widetilde{\mathbf{d}}\tilde{\omega}^{(2)} = 0$ therefore represents the “divergence-free” nature of the vector B. While $\nabla \cdot \mathbf{B}$ is the integrand in the integral form of this law, $\widetilde{\mathbf{d}}\tilde{\omega}^{(2)}$ includes also the Jacobian factor in the same integral. When, as here, orthonormal coordinates are used as the variables of integration, this extra factor is trivially equal to 1, but in other coordinates the distinction is more substantial. However, since the Jacobian factor cannot vanish, it cannot influence the vanishing of the integrand. This discussion of integrands is expanded upon in Section 4.2. Here are some other examples of familiar equations expressed using differential forms:
$$\tilde{\omega}^{(1)} = -\widetilde{\mathbf{d}}\tilde{\omega}^{(0)}, \quad\text{equivalent to}\quad \mathbf{E} = -\nabla\phi, \tag{2.3.42}$$
yields the “electric field” as the (negative) gradient of the potential. Also
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(1)} = 0, \quad\text{equivalent to}\quad \nabla \times \mathbf{E} = 0, \tag{2.3.43}$$
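states that E is “irrotational” (that is, the curl of E vanishes). The examples given so far have been applicable only to time-independent problems such as electrostatics. But let us define
$$\tilde{\omega}^{(3)} = \left[J_x(x, y, z, t)\,\widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z} + J_y(x, y, z, t)\,\widetilde{\mathbf{d}z} \wedge \widetilde{\mathbf{d}x} + J_z(x, y, z, t)\,\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y}\right] \wedge \widetilde{\mathbf{d}t} - \rho(x, y, z, t)\,\widetilde{\mathbf{d}x} \wedge \widetilde{\mathbf{d}y} \wedge \widetilde{\mathbf{d}z}. \tag{2.3.44}$$
Then
$$\widetilde{\mathbf{d}}\tilde{\omega}^{(3)} = 0 \quad\text{is equivalent to}\quad \nabla \cdot \mathbf{J} + \frac{\partial\rho}{\partial t} = 0, \tag{2.3.45}$$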
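which is known as the “continuity equation.” In physics such relations relate “fluxes” and “densities.” This is developed further in Section 4.4.2. Another familiar equation can be obtained by defining
$$\tilde{\omega}^{(1)} = A_x\,\widetilde{\mathbf{d}x} + A_y\,\widetilde{\mathbf{d}y} + A_z\,\widetilde{\mathbf{d}z} - \phi\,\widetilde{\mathbf{d}t}. \tag{2.3.46}$$
Then the equation
(2.3.47)
is equivalent to the pair of equations
$$\mathbf{B} = \nabla \times \mathbf{A}, \qquad \mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t} - \nabla\phi. \tag{2.3.48}$$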
These examples show that familiar vector equations can be reexpressed as equations satisfied by differential forms. All these equations are developed further in Chapter 12. The full analogy between forms and vectors, in particular including cross products, requires the introduction of “supplementary” multivectors, also known as “the star (*) operation.” This theory is developed in Section 4.2.4.

What are the features of these newly introduced differential forms derived by exterior differentiation? We state some of them, without proof for now.

• The forms derived in this way inevitably find themselves acting as the “differential elements” of multidimensional integrals. When one recalls two of the important difficulties in formulating multidimensional integrals, evaluating the appropriate Jacobian and keeping track of sign reversals, one will be happy to know that these exterior derivatives “take care of both problems.” The true power of the exterior derivatives is that this formalism works for spaces of arbitrary dimension, though formidable combinatorial calculations may be necessary. We will return to this subject in Sections 4.3.2 and 4.4.

• The differential forms “factor out” the arbitrary incremental displacements, such as $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$ in the above discussion, leaving them implicit rather than explicit. This overcomes the inelegant need for introducing different differential symbols such as d and δ. Though this feature is not particularly hard to grasp (it has been thoroughly expounded upon here), not being part of the traditional curriculum encountered by scientists, it is what causes the equations to have an unfamiliar appearance.

• The quantities entering the equations of physics such as Maxwell’s equations are, traditionally, physically measurable vectors, such as electric field E, that are naturally visualized as arrows. When written in terms of forms, invariant combinations of forms and vectors, such as $\langle\tilde{\mathbf{E}}, \Delta\mathbf{x}\rangle$, more naturally occur. Something resembling this observation has no doubt already been encountered in sophomore electricity and magnetism. Traditionally, after first encountering Maxwell’s equations in integral form, one uses vector analysis to transform them into relations among curls and divergences relating space and time derivatives of the electric and magnetic quantities. Though the differential versions of Maxwell’s equations fit more neatly on tee shirts, the integral versions are just as fundamental; their integrands are invariant products like $\langle\tilde{\mathbf{E}}, \Delta\mathbf{x}\rangle$. It is only when the differential versions of these equations are expressed in terms of exterior derivatives instead of curls and divergences that they acquire an unfamiliar appearance.

• By far the most fundamental property of the calculus of differential forms is that they make the equations manifestly invariant, that is, independent of coordinates. Of course this is also the chief merit of the vector operators, gradient, divergence, and curl. Remembering the obscurity surrounding these operators when they were first encountered (some of which perhaps still lingers in the case of curl), one has to anticipate a considerable degree of difficulty in generalizing these concepts, which is what the differential forms do. In this section only a small start has been made toward establishing this invariance; the operations of vector differentiation, known within vector analysis to have invariant character, have been expressed by differential forms.

• Having said all this, it should also be recognized that the differential forms really amount to being just a sophisticated form of advanced calculus.
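The correspondence of this section, in which the exterior derivative acts as gradient, curl, or divergence depending on the degree of the form, can be exercised numerically. The sketch below is an illustration under stated assumptions rather than part of the text's development: the names `d0`, `d1`, `d2`, the sample fields, and the evaluation point are all hypothetical. Derivatives are taken by central differences, which are exact for coefficients that are at most quadratic in each variable, and `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction

H = Fraction(1, 2)  # finite step; central differences are exact for quadratics

def partial(func, axis):
    """Central-difference partial derivative of func(x, y, z) along axis 0, 1 or 2."""
    def derivative(x, y, z):
        up = [x, y, z]
        down = [x, y, z]
        up[axis] += H
        down[axis] -= H
        return (func(*up) - func(*down)) / (2 * H)
    return derivative

def d0(phi):
    """d of a zero-form: coefficients on (dx, dy, dz), i.e. the gradient."""
    return tuple(partial(phi, axis) for axis in range(3))

def d1(form):
    """d of a one-form (f, g, h): coefficients on (dy^dz, dz^dx, dx^dy), i.e. the curl."""
    f, g, h = form
    return (
        lambda x, y, z: partial(h, 1)(x, y, z) - partial(g, 2)(x, y, z),
        lambda x, y, z: partial(f, 2)(x, y, z) - partial(h, 0)(x, y, z),
        lambda x, y, z: partial(g, 0)(x, y, z) - partial(f, 1)(x, y, z),
    )

def d2(form):
    """d of a two-form (f, g, h): coefficient of dx^dy^dz, i.e. the divergence."""
    f, g, h = form
    return lambda x, y, z: (partial(f, 0)(x, y, z) + partial(g, 1)(x, y, z)
                            + partial(h, 2)(x, y, z))

point = (Fraction(1), Fraction(2), Fraction(3))   # illustrative evaluation point

# A one-form derived from a potential has vanishing exterior derivative (curl E = 0).
phi = lambda x, y, z: x * x * y + y * z * z
curl_of_gradient = [c(*point) for c in d1(d0(phi))]

# A two-form derived from a one-form has vanishing exterior derivative (div B = 0).
A = (lambda x, y, z: y * z, lambda x, y, z: x * x, lambda x, y, z: x * y * z)
B = d1(A)
B_at_point = [c(*point) for c in B]
div_B = d2(B)(*point)
```

Both `curl_of_gradient` and `div_B` vanish for these sample fields, which are the statements $\nabla \times \mathbf{E} = 0$ for $\mathbf{E} = -\nabla\phi$ and $\nabla \cdot \mathbf{B} = 0$ for $\mathbf{B} = \nabla \times \mathbf{A}$ in the language of forms.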
2.4. ALGEBRAIC TENSORS
2.4.1. Vectors and Their Duals
In traditional physics (unless one includes graphical design) there is little need for geometry without algebra (synthetic geometry), but algebra without geometry is both possible and important. Though vector and tensor analysis were both motivated initially by geometry, it is useful to isolate their purely algebraic aspects. Everything that has been discussed so far can be distilled into pure algebra. That will be done in this section, though in far less generality than in the references listed at the end of the chapter. Van der Waerden [4] allows numbers more general than the real numbers we need; Arnold [3] pushes further into differential forms.

Most of the algebraic properties of vector spaces are “obvious” to most physicists. Vectors x, y, etc., are quantities for which superposition is valid: for scalars a and b, ax + by is also a vector. The dimensionality n of the vector space containing x, y, etc., is the largest number of independent vectors that can be selected. Any vector can be expanded uniquely in terms of n independent basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$;
$$\mathbf{x} = \mathbf{e}_i x^i. \tag{2.4.1}$$
This provides a one-to-one relationship between vectors x and n-component multiplets $(x^1, x^2, \ldots, x^n)$; for now at least, we will say they are the same thing.¹⁷ In particular, the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ correspond to (1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1). Component-wise addition of vectors and multiplication by a scalar is standard. Important new content is introduced when one defines a real-valued linear function $\tilde{\mathbf{f}}(\mathbf{x})$ of a vector x; such a function, by definition, satisfies relations
$$\tilde{\mathbf{f}}(\mathbf{x} + \mathbf{y}) = \tilde{\mathbf{f}}(\mathbf{x}) + \tilde{\mathbf{f}}(\mathbf{y}), \qquad \tilde{\mathbf{f}}(a\mathbf{x}) = a\,\tilde{\mathbf{f}}(\mathbf{x}). \tag{2.4.2}$$
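Expanding x in basis vectors $\mathbf{e}_i$, this yields
$$\tilde{\mathbf{f}}(\mathbf{x}) = f_i x^i \equiv \langle\tilde{\mathbf{f}}, \mathbf{x}\rangle, \quad\text{where}\quad f_i = \tilde{\mathbf{f}}(\mathbf{e}_i). \tag{2.4.3}$$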
This exhibits the value of $\tilde{\mathbf{f}}(\mathbf{x})$ as a linear form in the components $x^i$ with coefficients $f_i$. Now we have a one-to-one correspondence between linear functions $\tilde{\mathbf{f}}$ and n-component multiplets $(f_1, f_2, \ldots, f_n)$. Using language in the loose fashion often applied to vectors, we can say that a linear function of a vector and a linear form in the vector’s components are the same thing though, unlike $\tilde{\mathbf{f}}$, the $f_i$ depend on the choice of basis vectors. This space of linear functions of vectors in the original space is called dual to the original space. With vectors in the original space called contravariant, vectors in the dual space are called covariant. Corresponding to basis vectors $\mathbf{e}_i$ in the original space there is a natural choice of basis vectors $\tilde{\mathbf{e}}^i$ in the dual space. When acting on $\mathbf{e}_i$, $\tilde{\mathbf{e}}^i$ yields 1; when acting on any other of the $\mathbf{e}_j$ it yields 0. Just as the components of $\mathbf{e}_1$ are (1, 0, ..., 0) the components of $\tilde{\mathbf{e}}^1$ are (1, 0, ..., 0), and so on. More concisely,¹⁸
$$\tilde{\mathbf{e}}^i(\mathbf{e}_j) = \delta^i_{\ j}. \tag{2.4.4}$$
¹⁷ As long as possible we will stick to the colloquial elementary physics usage of refusing to distinguish between a vector and its collection of components, even though the latter depends on the choice of basis vectors while the former does not.
¹⁸ There is no immediate significance to the fact that one of the indices of $\delta^i_{\ j}$ is written as a subscript and one as a superscript. Equal to $\delta_{ij}$, $\delta^i_{\ j}$ is also a Kronecker-δ.
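Equations (2.4.3) and (2.4.4) can be illustrated in two dimensions. In this minimal sketch the basis vectors, the linear function `f`, and all helper names are hypothetical choices; the basis is written in some fixed reference frame, so that the dual basis forms appear as the rows of the inverse of the matrix whose columns are the basis vectors.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def apply_form(form, vector):
    """Value of a form (row of coefficients) acting on a vector."""
    return sum(c * v for c, v in zip(form, vector))

# Hypothetical basis vectors e_1, e_2, written in a fixed reference frame.
e = [(1, 1), (0, 2)]

# Columns of E are the basis vectors; rows of its inverse are the dual basis forms.
E = [[e[0][0], e[1][0]], [e[0][1], e[1][1]]]
dual = inverse_2x2(E)

x = (3, 5)                                                # a vector, reference frame
components = [apply_form(dual[i], x) for i in range(2)]   # x^i picked out by dual form i

f = lambda v: 4 * v[0] - v[1]                             # a linear function of a vector
f_components = [f(e_i) for e_i in e]                      # f_i = f(e_i), Eq. (2.4.3)

value_from_components = sum(fi * xi for fi, xi in zip(f_components, components))
```

The dual forms acting on the basis vectors reproduce the Kronecker-δ of Eq. (2.4.4), and `value_from_components` agrees with `f(x)` even though the individual $f_i$ and $x^i$ depend on the basis chosen.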
By taking all linear combinations of a subset of the basis vectors, say the first m of them, where 0 < m < n, one forms a sub-vector-space S of the original space. Any vector x in the whole space can be decomposed uniquely into a vector $\mathbf{y} = \sum_{i=1}^{m} \mathbf{e}_i x^i$ in this space and a vector $\mathbf{z} = \sum_{i=m+1}^{n} \mathbf{e}_i x^i$. A “projection operator” P onto the subspace can then be defined by y = Px. It has the property that $P^2 = P$. Since x = Px + (1 − P)x, z = (1 − P)x and 1 − P projects onto the space formed from the last n − m basis vectors.

There is a subspace $S^0$ in the dual space, known as the “annihilator” of S; it is the vector space made up of all linear combinations of the n − m forms $\tilde{\mathbf{e}}^{m+1}, \tilde{\mathbf{e}}^{m+2}, \ldots, \tilde{\mathbf{e}}^{n}$. These are the last n − m of the natural basis forms in the dual space, as listed in Eq. (2.4.4). Any form in $S^0$ “annihilates” any vector in S, which is to say yields zero when acting on the vector. This relationship is reciprocal in that S annihilates $S^0$. Certainly there are particular forms not in $S^0$ that annihilate certain vectors in S, but $S^0$ contains all forms, and only those forms, that annihilate all vectors in S. This concept of annihilation is reminiscent of the concept of the orthogonality of two vectors in ordinary vector geometry. It is a very different concept, however, since annihilation relates a vector in the original space and a form in the dual space; an arrow in S such as $\mathbf{e}_1$ crosses no planes corresponding to a form in $S^0$ such as $\tilde{\mathbf{e}}^{m+1}$. Only if there is a rule associating vectors and forms can annihilation be used to define orthogonality of two vectors in the same space.

By introducing linear functions of more than one vector variable we will shortly arrive at the definition of tensors. However, since all other tensors are introduced in the same way as was the dual space, there is no point in proceeding to this definition without first having grasped the concept of the dual space. Toward that end we should eliminate an apparent asymmetry between contravariant vectors and covectors. This asymmetry has resulted from the fact that we started with contravariant vectors, and hence might be inclined to think of them as more basic. But consider the space of linear functions of covariant vectors, that is, the space that is dual to the space that is dual to the original space. (As an exercise) it can be seen that the dual of the dual is the same thing as the original space. Hence, algebraically at least, which is which between contravariant and covariant vectors is entirely artificial, just like the choice of which is to be designated by superscripts and which by subscripts.
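The projection operator and annihilator introduced in this section can be written out for a small case. The following sketch assumes n = 3 and m = 2 with the natural basis, so that P is simply a diagonal matrix; the sample vector and form coefficients are illustrative only.

```python
def matvec(m, v):
    """Matrix (list of rows) times column vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apply_form(form, vector):
    """Value of a form (row of coefficients) acting on a vector."""
    return sum(c * v for c, v in zip(form, vector))

n, m = 3, 2   # whole space has dimension 3; S is spanned by e_1 and e_2

# Projection onto S, and the complementary projection 1 - P.
P = [[1 if (i == j and i < m) else 0 for j in range(n)] for i in range(n)]
Q = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]

x = [4, -1, 7]
y = matvec(P, x)   # the part of x lying in S
z = matvec(Q, x)   # the remainder

in_annihilator = [0, 0, 5]       # a multiple of the third dual basis form
not_in_annihilator = [1, 0, 0]   # the first dual basis form

value_on_y = apply_form(in_annihilator, y)
other_value = apply_form(not_in_annihilator, y)
```

Here P applied twice equals P, y and z add up to x, and the form built from the last dual basis form yields zero on y, while a form outside the annihilator generally does not.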
2.4.2. Transformation of Coordinates
When covariant and contravariant vectors are introduced in physics, the distinction between them is usually expressed in terms of the matrices accompanying a change of basis vectors. Suppose a new set of basis vectors $\mathbf{e}'_j$ is related to the original set $\mathbf{e}_j$ by
$$\mathbf{e}'_j = \mathbf{e}_i\,\Lambda^i_{\ j}. \tag{2.4.5}$$
(If one insists on interpreting this relation as a matrix multiplication, it is necessary to regard $\mathbf{e}'_j$ and $\mathbf{e}_j$ as being the elements of row vectors, even though the row elements are vectors rather than numbers, and to ignore the distinction between upper and lower indices.)¹⁹ Multiplying on the right by the inverse matrix, the inverse relation is
$$\mathbf{e}_j = \mathbf{e}'_i\,(\Lambda^{-1})^i_{\ j}. \tag{2.4.6}$$
For formal manipulation of formulas, the index conventions of tensor analysis are simple and reliable, but for numerical calculations it is sometimes convenient to use matrix notation in which multicomponent objects are introduced so that the indices can be suppressed. This is especially useful when using a computer language that can work with matrices as supported types that satisfy their own algebra of addition, multiplication, and scalar multiplication. To begin the attempt to represent the formulas of mechanics in matrix form, some recommended usage conventions will now be formulated, and some of the difficulties in maintaining consistency will be explicitly addressed.

Already, in defining the symbols used in Eq. (2.4.5), a conventional choice was made. The new basis vectors were called $\mathbf{e}'_j$ when they could have been called $\mathbf{e}_{j'}$; that is, the prime was placed on the vector symbol rather than on the index. It is a common, and quite powerful notation, to introduce both of these symbols and to use them to express two distinct meanings. (See for example Schutz.) In this notation, even as one “instantiates” an index, say replacing i by 1, one must replace i′ by 1′, thereby distinguishing between $\mathbf{e}_1$ and $\mathbf{e}_{1'}$. In this way, at the cost of further abstraction, one can distinguish change of axes with fixed vector from change of vector with fixed axes. At this point this may seem like pedantry, but confusion attending this distinction between active and passive interpretations of transformations will dog us throughout this text and the subject in general. One will always attempt to define quantities and operations unambiguously in English, but everyday language is by no means optimal for avoiding ambiguity. Mathematical language, such as the distinction between $\mathbf{e}_1$ and $\mathbf{e}_{1'}$ just mentioned, can be much more precise. But, sophisticated as it is, we will not use this notation, because it seems too compact, too mathematical, too cryptic.
Another limitation of matrix notation is that, though it works well for tensors of one or two indices, it is not easily adapted to tensors with more than two indices. Yet another complication follows from the traditional row and column index-order conventions of matrix formalism. It is hard to maintain these features while preserving other desirable features such as lower and upper indices to distinguish between covariant and contravariant quantities, which, with the repeated-index summation convention, yield very compact formulas.²⁰ Often, though, one can restrict calculations to a single frame of reference and, in that case, there is no need to distinguish between lower and upper indices.

¹⁹ Since our convention is that the up/down location of indices on matrices is irrelevant, Eq. (2.4.5) is the same as $\mathbf{e}'_j = \mathbf{e}_i\,\Lambda_{ij}$. This in turn is the same as $\mathbf{e}'_i = (\Lambda^T)_{ij}\,\mathbf{e}_j$, which may seem like a more natural ordering. But one sees that whether it is the matrix or its transpose that is said to be the transformation matrix depends on whether it multiplies on the left or on the right and is not otherwise significant.
²⁰ The repeated-index convention is itself used fairly loosely. For example, if the summation convention is used as in Eq. (2.4.5) to express a vector as a superposition of basis vectors, the usage amounts to a simple abbreviation without deeper significance. But when used (as it was by Einstein originally) to form a scalar from a contravariant and a covariant vector, the notation includes a deeper implication of invariance. In this text both of these conventions will be used, but for other summations, such as over particles in a system, the summation symbol will be shown explicitly.

When the subject of vector fields is introduced, an even more serious notational complication will arise because a new kind of “multiplication” of one vector by another will be noncommutative. As a result, the validity of an equation such as $(A\mathbf{x})^T = \mathbf{x}^T A^T$ is called into question. One is already accustomed to matrix multiplication being noncommutative, but the failure of vector fields to commute will seriously compromise the power of matrix notation and the usefulness of distinguishing between row and column vectors. In spite of all these problems, matrix formulas will still often be used, and when they are, the following conventions will be adhered to:

• As is traditional, contravariant components $x^1, x^2, \ldots, x^n$ are arrayed as a column vector. This leads to the remaining conventions in this list.
• (Covariant) components $f_i$ of form $\tilde{\mathbf{f}}$ are to be arrayed in a row.
• The basis vectors $\mathbf{e}_i$, though not components of an intrinsic quantity, will be arrayed as a row for purposes of matrix multiplication.
• Basis covectors $\tilde{\mathbf{e}}^i$ will be arrayed in a column.
• Notations such as $x^{1'}$ will not be used; the indices on components are necessarily 1, 2, 3, .... Symbolic indices with primes, as in $x^{i'}$, are, however, legitimate.
• The indices on a quantity like $\Lambda^i_{\ j}$ are spaced apart to make it unambiguous which is to be taken as the row, in this case i, and which as the column index. The up/down location is to be ignored when matrix multiplication is being employed.

In terms of the new basis vectors introduced by Eq. (2.4.5), using Eq. (2.4.6), a general vector x is reexpressed as
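$$\mathbf{e}'_j\,x'^j \equiv \mathbf{x} \equiv \mathbf{e}_k\,x^k = \mathbf{e}'_j\,(\Lambda^{-1})^j_{\ k}\,x^k, \tag{2.4.7}$$
from which it follows that
$$x'^j = (\Lambda^{-1})^j_{\ k}\,x^k. \tag{2.4.8}$$
Because the matrix giving $x^i \to x'^i$ is inverse to the matrix giving $\mathbf{e}_i \to \mathbf{e}'_i$, this is known conventionally as contravariant transformation. If the column of elements $x'^j$ and $x^k$ are symbolized by $\mathbf{x}'$ and $\mathbf{x}$ and the matrix by $\Lambda^{-1}$, then Eq. (2.4.8) becomes
$$\mathbf{x}' = \Lambda^{-1}\mathbf{x}. \tag{2.4.9}$$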
When boldface symbols are used to represent vectors in vector analysis, the notation implies that the boldface quantities have an invariant geometric character, and in this context an equation like (2.4.9) might by analogy be expected to relate two different “arrows” $\mathbf{x}$ and $\mathbf{x}'$. The present boldface quantities have not been shown to have this geometric character and, in fact, they do not. As they have been introduced, since $\mathbf{x}$ and $\mathbf{x}'$ stand for the same geometric quantity, it is redundant to give them different symbols. This is an instance of the above-mentioned ambiguity in specifying transformations. Our notation is simply not powerful enough to distinguish between active and passive transformations in the same context. For now we ignore this redundancy and regard Eq. (2.4.9) as simply an abbreviated notation for the algebraic relation between the components. Since this notation is standard in linear algebra, it should be acceptable here once the potential for misinterpretation has been understood.

Transformation of covariant components $f_i$ has to be arranged to secure the invariance of the form $\langle\tilde{\mathbf{f}}, \mathbf{x}\rangle$ defined in Eq. (2.4.3). Using Eq. (2.4.8),
$$f_k\,x^k = f'_j\,x'^j = f'_j\,(\Lambda^{-1})^j_{\ k}\,x^k, \tag{2.4.10}$$
and from this
$$f_k = f'_j\,(\Lambda^{-1})^j_{\ k} \quad\text{or}\quad f'_k = f_j\,\Lambda^j_{\ k}. \tag{2.4.11}$$
This is known as covariant transformation because the matrix is the same as the matrix $\Lambda$ with which basis vectors transform. The only remaining case to be considered is the transformation of basis one-forms; clearly they transform with $\Lambda^{-1}$.

Consider next the effect of following one transformation by another. The matrix representing this “composition” of two transformations is known as the “concatenation” of the individual matrices. Calling these matrices $\Lambda_1$ and $\Lambda_2$, the concatenated matrix $\Lambda$ can be obtained by successive applications of Eq. (2.4.9):
$$\mathbf{x}'' = \Lambda_2^{-1}\Lambda_1^{-1}\mathbf{x}, \quad\text{or}\quad \Lambda^{-1} = \Lambda_2^{-1}\Lambda_1^{-1}. \tag{2.4.12}$$
This result has used the fact that the contravariant components are arrayed as a column vector. On the other hand, with $\tilde{\mathbf{f}}$ regarded as a row vector of covariant components, Eq. (2.4.11) yields
$$\tilde{\mathbf{f}}'' = \tilde{\mathbf{f}}\,\Lambda_1\Lambda_2, \quad\text{or}\quad \Lambda = \Lambda_1\Lambda_2. \tag{2.4.13}$$
It may seem curious that the order of matrix multiplications can be opposite for “the same” sequence of transformations, but the result simply reflects the distinction between covariant and contravariant quantities. Since general matrices $\mathbf A$ and $\mathbf B$ satisfy $(\mathbf A\mathbf B)^{-1} = \mathbf B^{-1}\mathbf A^{-1}$, the simultaneous validity of Eq. (2.4.12) and Eq. (2.4.13) can be regarded as mere self-consistency of the requirement that $\langle \tilde{\mathbf f}, \mathbf x \rangle$ be invariant.

The transformations just considered have been passive, in that basis vectors were changed but the physical quantities were not. Commonly in mechanics, and even more so in optics, one encounters active linear transformations that instead describe honest-to-goodness evolution of a physical system. If the configuration at time $t_1$ is described by $\mathbf x(t_1)$ and at a later time $t_2$ by $\mathbf x(t_2)$, linear evolution is described by

$$\mathbf x(t_2) = \mathbf A\,\mathbf x(t_1), \tag{2.4.14}$$

and the equations of this section have to be reinterpreted appropriately.
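The opposite multiplication orders in Eqs. (2.4.12) and (2.4.13) can be checked numerically. The following is a minimal sketch, assuming (as the surrounding text indicates) that contravariant components transform with the inverse of the basis matrix while covariant components transform with the matrix itself. The 2×2 matrices `A1` and `A2`, the sample components, and the helper names are illustrative choices, not taken from the text.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two successive (hypothetical) basis-change matrices.
A1 = [[F(2), F(1)], [F(0), F(1)]]
A2 = [[F(1), F(0)], [F(3), F(1)]]

x = [[F(5)], [F(-2)]]   # contravariant components, arrayed as a column
f = [[F(1), F(4)]]      # covariant components, arrayed as a row

# Contravariant components: apply A1^{-1}, then A2^{-1}.
x_twice = matmul(inv2(A2), matmul(inv2(A1), x))
# Covariant components: apply A1, then A2, multiplying from the right.
f_twice = matmul(matmul(f, A1), A2)

A = matmul(A1, A2)                               # concatenated matrix
invariant_before = matmul(f, x)[0][0]            # <f, x> in the old basis
invariant_after = matmul(f_twice, x_twice)[0][0] # <f, x> in the new basis
```

Exact rational arithmetic is used so that the invariance of $\langle \tilde{\mathbf f}, \mathbf x \rangle$ can be tested with strict equality rather than a floating-point tolerance.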
2.4.3. Transformation of Distributions

Often one wishes to evolve not just one particle in the way just mentioned, but rather an entire ensemble or distribution of particles. Suppose that the distribution, call it $\rho(\mathbf x)$, has the property that all particles lie in the same plane at time $t_1$. Such a distribution could be expressed as $\hat\rho(\mathbf x)\,\delta(ax + by + cz - d)$, where $\delta$ is the Dirac $\delta$-“function” with argument which, when set to zero, gives the equation of the plane. (For simplicity set $d = 0$.) Let us ignore the distribution within the plane (described by $\hat\rho(\mathbf x)$) and pay attention only to the most noteworthy feature of this ensemble of points, namely the plane itself and how it evolves. If $\mathbf x_{(1)}$ is the displacement vector of a generic particle at an initial time $t_{(1)}$, then initially the plane is described by an equation

$$f_{(1)i}\,x^i = 0. \tag{2.4.15}$$
For each of the particles, setting $x^i = x^i_{(1)}$ in Eq. (2.4.15) results in an equality. Let us call the coefficients $f_{(1)i}$ “distribution parameters” at time $t_1$ since they characterize the region containing the particles at that time. Suppose that the system evolves in such a way that the individual particle coordinates are transformed (linearly) to $x^i_{(2)}$, and then to $x^i_{(3)}$, according to

$$x^i_{(2)} = A^i{}_j\,x^j_{(1)}, \qquad x^i_{(3)} = B^i{}_j\,x^j_{(2)} = (BA)^i{}_j\,x^j_{(1)}. \tag{2.4.16}$$

With each particle having been subjected to this transformation, the question is, what is the final distribution of particles? Since the particles began on the same plane initially and the transformations have been linear, it is clear they will lie on the same plane finally. We wish to find that plane, which is to say to find the coefficients $f_{(3)k}$ in the equation

$$f_{(3)k}\,x^k = 0. \tag{2.4.17}$$
This equation must be satisfied by $x^k_{(3)}$ as given by Eq. (2.4.16), and this yields

$$f_{(3)k}\,(BA)^k{}_i\,x^i = 0. \tag{2.4.18}$$

It follows that

$$f_{(3)k} = f_{(1)i}\,\big((BA)^{-1}\big)^i{}_k = f_{(1)i}\,\big(A^{-1}B^{-1}\big)^i{}_k. \tag{2.4.19}$$
This shows that the coefficients $f_i$ describing a distribution of particles transform covariantly when individual particle coordinates $x^i$ transform contravariantly. We have seen that the composition of successive linear transformations represented by matrices $\mathbf A$ and $\mathbf B$ can be either $\mathbf B\mathbf A$ or $\mathbf A^{-1}\mathbf B^{-1}$ depending on the nature of the quantity being transformed, and it is necessary to determine from the context which one is appropriate. If contravariant components compose with matrix $\mathbf B\mathbf A$, then covariant components compose with matrix $\mathbf A^{-1}\mathbf B^{-1}$.
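The statement of Eq. (2.4.19) can be illustrated in two dimensions, where the “plane” is a line through the origin. The sketch below is only an illustration under assumed data: the matrices `A` and `B`, the line $3x - y = 0$, and the three sample particles are hypothetical choices. It evolves the particles with $\mathbf B\mathbf A$, transforms the distribution parameters with $(\mathbf B\mathbf A)^{-1}$ acting from the right, and evaluates the evolved line equation on the evolved particles.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(1), F(2)], [F(0), F(1)]]   # first evolution step (hypothetical)
B = [[F(2), F(0)], [F(1), F(1)]]   # second evolution step (hypothetical)

f1 = [[F(3), F(-1)]]               # initial line 3x - y = 0, as a row
particles = [[[F(t)], [F(3 * t)]] for t in (-2, 1, 4)]  # points on that line

BA = matmul(B, A)
f3 = matmul(f1, inv2(BA))                        # covariant rule, Eq. (2.4.19)
evolved = [matmul(BA, x) for x in particles]     # contravariant rule, Eq. (2.4.16)
residuals = [matmul(f3, x)[0][0] for x in evolved]
```

Every entry of `residuals` vanishes, and `f3` agrees with applying $\mathbf A^{-1}$ first and $\mathbf B^{-1}$ second from the right, which is the reversed order the text describes.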
Though these concatenation relations have been derived for linear transformations, there is a sense in which they are the only possibilities for (sufficiently smooth) nonlinear transformations as well. If the origin maps to the origin, as we have assumed implicitly, then there is a “linearized transformation” that is approximately valid for “small-amplitude” (close to the origin) particles, and the above concatenation properties must apply to that transformation. The same distinction between the transformation properties of particle coordinates and distribution coefficients must therefore apply also to nonlinear transformations, though the equations can be expected to become much more complicated at large amplitudes. It is only linear transformations that can be concatenated in closed form using matrix multiplication, but the opposite concatenation order of covariant and contravariant quantities also applies in the nonlinear regime.

There is an interesting discussion in Schutz [2, Sec. 2.18], expanding on the interpretation of the Dirac delta function as a distribution in the sense in which the word is being used here. If the argument of the delta function is said to be transformed contravariantly, then the “value” of the delta function transforms covariantly.
*2.4.4. Multi-index Tensors and Their Contraction²¹

We turn now to tensors with more than one index. Two-index covariant tensors are defined by considering real-valued bilinear functions of two vectors, say $\mathbf x$ and $\mathbf y$. Such a function $\tilde{\mathbf f}(\mathbf x, \mathbf y)$ is called bilinear because it is linear in each of its two arguments separately. When the arguments $\mathbf x$ and $\mathbf y$ are expanded in terms of the basis introduced in Eq. (2.4.3), one has²²

$$\tilde{\mathbf f}(\mathbf x, \mathbf y) = f_{ij}\,x^i y^j, \quad\text{where}\quad f_{ij} = \tilde{\mathbf f}(\mathbf e_i, \mathbf e_j). \tag{2.4.20}$$
As usual, we will say that the function $\tilde{\mathbf f}$ and the array of coefficients $f_{ij}$ are the same thing and that $\tilde{\mathbf f}(\mathbf x, \mathbf y)$ is the same thing as the bilinear form $f_{ij}\,x^i y^j$. The coefficients $f_{ij}$ are called covariant components of $\tilde{\mathbf f}$. Pedantically it is only $\tilde{\mathbf f}(\mathbf x, \mathbf y)$, with arguments inserted, that deserves to be called a form, but common usage seems to be to call $\tilde{\mathbf f}$ a form all by itself. An expressive notation that will often be used is $\tilde{\mathbf f}(\cdot,\cdot)$, which indicates that $\tilde{\mathbf f}$ is “waiting for” two (as yet nameless, or “anonymous”) vector arguments.

Especially important are the antisymmetric bilinear functions $\tilde{\mathbf f}(\mathbf x, \mathbf y)$ that change sign when $\mathbf x$ and $\mathbf y$ are interchanged:

$$\tilde{\mathbf f}(\mathbf x, \mathbf y) = -\tilde{\mathbf f}(\mathbf y, \mathbf x), \quad\text{or}\quad f_{ij} = -f_{ji}. \tag{2.4.21}$$
²¹This section is rather abstract. The reader willing to accept that the contraction of the upper and lower index of a tensor is invariant can skip it. Footnote 23 hints at how this result can be obtained more quickly.
²²Eq. (2.4.20) is actually unnecessarily restrictive, since $\mathbf x$ and $\mathbf y$ could be permitted to come from different spaces.
These alternating or antisymmetric tensors are the only multi-index quantities that represent important geometric objects. The theory of determinants can be based on them as well. (See Van der Waerden, Section 4.7.)

To produce a contravariant two-index tensor requires the definition of a bilinear function of two covariant vectors $\tilde{\mathbf u}$ and $\tilde{\mathbf v}$. One way of constructing such a bilinear function is to start with two fixed contravariant vectors $\mathbf x$ and $\mathbf y$ and to define

$$\mathbf f(\tilde{\mathbf u}, \tilde{\mathbf v}) = \langle \tilde{\mathbf u}, \mathbf x\rangle\,\langle \tilde{\mathbf v}, \mathbf y\rangle. \tag{2.4.22}$$
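This tensor is called the tensor product $\mathbf x \otimes \mathbf y$ of vectors $\mathbf x$ and $\mathbf y$. Its arguments are $\tilde{\mathbf u}$ and $\tilde{\mathbf v}$. (The somewhat old-fashioned physics terminology is to call $\mathbf f$ the dyadic product of $\mathbf x$ and $\mathbf y$.) In more expressive notation,

$$\mathbf x \otimes \mathbf y\,(\cdot,\cdot) = \langle \cdot, \mathbf x\rangle\,\langle \cdot, \mathbf y\rangle. \tag{2.4.23}$$

The vectors $\mathbf x$ and $\mathbf y$ can in general belong to different spaces with different dimensionalities, but for simplicity in the following few paragraphs we assume they belong to the same space having dimension $n$. The components of $\mathbf x \otimes \mathbf y$ are

$$f^{ij} = (\mathbf x \otimes \mathbf y)(\tilde{\mathbf e}^i, \tilde{\mathbf e}^j) = \langle \tilde{\mathbf e}^i, \mathbf x\rangle\,\langle \tilde{\mathbf e}^j, \mathbf y\rangle = x^i y^j. \tag{2.4.24}$$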
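As a concrete instance of Eqs. (2.4.22) and (2.4.24), the short sketch below builds the component array $x^i y^j$ and confirms that contracting it with the components of two covectors reproduces the product of the two pairings. The numerical components and the function names `pair` and `tensor_product` are illustrative assumptions, not notation from the text.

```python
def pair(form, vec):
    """Value <u, x> = u_i x^i of a one-form on a vector (components only)."""
    return sum(u * x for u, x in zip(form, vec))

def tensor_product(x, y):
    """Component array f^{ij} = x^i y^j of the tensor product of x and y."""
    return [[xi * yj for yj in y] for xi in x]

x = [1, 2, 0]    # contravariant components of x (hypothetical)
y = [3, -1, 4]   # contravariant components of y (hypothetical)
u = [2, 0, 1]    # covariant components of the first argument
v = [1, 1, 1]    # covariant components of the second argument

t = tensor_product(x, y)
n = len(x)
# Evaluate the bilinear function through its components f^{ij} u_i v_j ...
via_components = sum(t[i][j] * u[i] * v[j] for i in range(n) for j in range(n))
# ... and directly from the defining product of pairings.
via_pairings = pair(u, x) * pair(v, y)
```

Both routes give the same number, which is all that the identification of the function with its component array requires.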
Though the linear superposition of any two such tensors is certainly a tensor, call it $\mathbf t = (t^{ij})$, it does not follow in general that two vectors $\mathbf x$ and $\mathbf y$ can be found for which $\mathbf t$ is their tensor product. However, all such superpositions can be expanded in terms of the tensor products $\mathbf e_i \otimes \mathbf e_j$ of the basis vectors introduced previously. These products form a natural basis for such tensors $\mathbf t$. In the next paragraph the $n^2$-dimensional vector space of two-contravariant-index tensors $\mathbf t$ will be called $\mathcal T$.

At the cost of greater abstraction, we next prove a result needed to go from one index to two indices. The motivation is less than obvious, but the result will prove to be useful straightaway (what a mathematician might call a lemma).²³

THEOREM 2.4.1: For any function $B(\mathbf x, \mathbf y)$ linear in each of its two arguments $\mathbf x$ and $\mathbf y$, there exists an intrinsic linear function of the single argument $\mathbf x \otimes \mathbf y$, call it $S(\mathbf x \otimes \mathbf y)$, such that

$$S(\mathbf x \otimes \mathbf y) = B(\mathbf x, \mathbf y). \tag{2.4.25}$$

The vectors $\mathbf x$ and $\mathbf y$ can come from different vector spaces.

Proof: In terms of contravariant components $x^i$ and $y^j$, the given bilinear function has the form

$$B(\mathbf x, \mathbf y) = s_{ij}\,x^i y^j. \tag{2.4.26}$$
²³The reader impatient with abstract argumentation may consider it adequate to base the invariance of the trace of a mixed tensor on the inverse transformation properties of covariant and contravariant indices.
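This makes it natural, for arbitrary tensor $\mathbf t$ drawn from space $\mathcal T$, to define a corresponding function $S(\mathbf t)$ that is linear in the components $t^{ij}$ of $\mathbf t$:

$$S(\mathbf t) = s_{ij}\,t^{ij}. \tag{2.4.27}$$

When this function is applied to $\mathbf x \otimes \mathbf y$, the result is

$$S(\mathbf x \otimes \mathbf y) = s_{ij}\,x^i y^j = B(\mathbf x, \mathbf y), \tag{2.4.28}$$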
which is the required result. Since components were used only in an intermediate stage, the theorem assures the relation to be intrinsic (coordinate-free). (As an aside, note that the values of the functions $S$ and $B$ could have been allowed to have other (matching) vector or tensor indices themselves without affecting the proof. This increased generality is required to validate contraction of tensors with more than two indices.)

Other tensor products can be made from contravariant and covariant vectors. Holding $\tilde{\mathbf u}$ and $\tilde{\mathbf v}$ fixed while $\mathbf x$ and $\mathbf y$ vary, an equation like (2.4.22) can also be regarded as defining a covector product $\tilde{\mathbf u} \otimes \tilde{\mathbf v}$. A mixed vector product $\mathbf f = \tilde{\mathbf u} \otimes \mathbf y$ can be similarly defined by holding $\tilde{\mathbf u}$ and $\mathbf y$ constant:²⁴

$$\mathbf f(\mathbf x, \tilde{\mathbf v}) = \langle \tilde{\mathbf u}, \mathbf x\rangle\,\langle \tilde{\mathbf v}, \mathbf y\rangle. \tag{2.4.29}$$

The components of this tensor are

$$f_i{}^j = (\tilde{\mathbf u} \otimes \mathbf y)(\mathbf e_i, \tilde{\mathbf e}^j) = \langle \tilde{\mathbf u}, \mathbf e_i\rangle\,\langle \tilde{\mathbf e}^j, \mathbf y\rangle = u_i\,y^j. \tag{2.4.30}$$
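Later on we will also require antisymmetrized tensor products, or “wedge products,” defined by

$$\begin{aligned}
\mathbf x \wedge \mathbf y\,(\tilde{\mathbf u}, \tilde{\mathbf v}) &= \langle \mathbf x, \tilde{\mathbf u}\rangle\,\langle \mathbf y, \tilde{\mathbf v}\rangle - \langle \mathbf x, \tilde{\mathbf v}\rangle\,\langle \mathbf y, \tilde{\mathbf u}\rangle, \\
\tilde{\mathbf u} \wedge \tilde{\mathbf v}\,(\mathbf x, \mathbf y) &= \langle \tilde{\mathbf u}, \mathbf x\rangle\,\langle \tilde{\mathbf v}, \mathbf y\rangle - \langle \tilde{\mathbf u}, \mathbf y\rangle\,\langle \tilde{\mathbf v}, \mathbf x\rangle.
\end{aligned} \tag{2.4.31}$$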
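The generation of a new tensor by “index contraction” can now be considered. Consider the tensor product $\mathbf t = \tilde{\mathbf u} \otimes \mathbf x$ where $\tilde{\mathbf u}$ and $\mathbf x$ belong to dual vector spaces. The theorem proved above can be applied to the function

$$B(\tilde{\mathbf u}, \mathbf x) = \langle \tilde{\mathbf u}, \mathbf x\rangle = u_i\,x^i, \tag{2.4.32}$$

bilinear in $\tilde{\mathbf u}$ and $\mathbf x$, to prove the existence of intrinsic linear function $S$ such that

$$S(\tilde{\mathbf u} \otimes \mathbf x) = u_i\,x^i = \operatorname{tr}(\tilde{\mathbf u} \otimes \mathbf x), \tag{2.4.33}$$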
where $\operatorname{tr}(\mathbf t)$ is the sum of the diagonal elements of tensor $\tilde{\mathbf u} \otimes \mathbf x$ in the particular coordinate system shown (or any other, since $\langle \tilde{\mathbf u}, \mathbf x\rangle$ is invariant). Since any mixed two-component tensor can be written as a superposition of such covector/contravector products, and since the trace operation is distributive over such superposition, and since $S(\tilde{\mathbf u} \otimes \mathbf x)$ is an intrinsic function, it follows that $\operatorname{tr}(\mathbf t) = t_i{}^i$ is an invariant function for any mixed tensor. $\operatorname{tr}(\mathbf t)$ is called the contraction of $\mathbf t$.

²⁴A deficiency of our notation appears at this point since it is ambiguous whether or not the symbol $\mathbf f$ in $\mathbf f = \tilde{\mathbf u} \otimes \mathbf y$ should carry a tilde.

2.4.5. Overlap of Tensor Algebra and Tensor Calculus

Before leaving the topic of tensor algebra, we review the differential form $\widetilde{dh}$ obtained from a function of position $\mathbf x$ called $h(\mathbf x)$. We saw a close connection between this quantity and the familiar gradient of vector calculus, $\nabla h$. There is little to add now except to call attention to a potentially confusing issue of terminology. A physicist thinking of vector calculus thinks of gradients, divergences, and curls (the operators needed for electromagnetism) as being on the same footing in some sense; they are all “vector derivatives.” On the other hand, in mathematics books discussing tensors, gradients are normally considered to be “tensor algebra” and only the divergence and curl are the subject matter of “tensor calculus.” It is probably adequate for a physicist to file this away as yet another curiosity not to be distracted by, but contemplation of the source of the terminology may be instructive.

One obvious distinction among the operators in question is that gradients act on scalars whereas divergences and curls operate on vectors, but this is too formal to account satisfactorily for the difference of terminology. Recall from the earlier discussion of differential forms, in particular Eqs. (2.3.3) and (2.3.4), that, for a linear function $h = ax + by$, the coefficients of $\widetilde{dh}$ are $a$ and $b$. In this case selecting the coefficient $a$ or $b$, an algebraic operation, and differentiating $h$ with respect to $x$ or $y$, a calculus operation, amount to the same thing.
Even for nonlinear functions, the gradient operator can be regarded as extracting the coefficients of the linear terms in a Taylor expansion about the point under study. In this linear “tangent space” the coefficients in question are the components of a covariant vector, as has been discussed. What is calculus in the original space is algebra in the tangent space. Such conundrums are not unknown in “unambiguously physics” contexts. For example, both in Hamilton-Jacobi theory and in quantum mechanics there is a close connection between the $x$-component of a momentum vector and a partial-with-respect-to-$x$ derivative.

Yet one more notational variant will be mentioned before leaving this topic. There is a notational convention popular with mathematicians but not commonly used by physicists (though it should be, since it is both clear and powerful). We introduce it now, only in a highly specialized sense, intending to expand the discussion later. Consider a standard plot having $x$ as abscissa and $y$ as ordinate, with axes rectangular and having the same scales (in other words, ordinary analytic geometry). A function $h(x, y)$ can be expressed by equal $h$-value contours on such a plot. For describing arrows on this plot it is customary to introduce “unit vectors,” usually denoted by $(\mathbf i, \mathbf j)$ or $(\hat{\mathbf x}, \hat{\mathbf y})$. Let us now introduce the recommended new notation:

$$\frac{\boldsymbol\partial}{\boldsymbol\partial x} \equiv \mathbf i, \qquad \frac{\boldsymbol\partial}{\boldsymbol\partial y} \equiv \mathbf j. \tag{2.4.34}$$
Being equal to $\mathbf i$ and $\mathbf j$, these quantities are represented by boldface symbols.²⁵ $\mathbf i$ is that arrow that points along the axis on which $x$ varies and $y$ does not, and if the tail of $\mathbf i$ is at $x = x_0$, its tip is at $x = x_0 + 1$. The same italicized sentence serves just as well to define $\partial/\partial x$; the symbol in the denominator signifies the coordinate being varied (with the other coordinates held fixed). This same definition will also hold if the axes are skew, or if their scales are different, and even if the coordinate grid is curvilinear. (Discontinuous scales should not be allowed, however.) Note that, though the notation does not exhibit it, the basis vector $\partial/\partial x$ also depends on the coordinates other than $x$ because it points in the direction in which the other coordinates are constant.

One still wonders why this notation for unit vectors deserves a partial derivative symbol. What is to be differentiated? The answer is $h(x, y)$ (or any other function of $x$ and $y$). The result, $\partial h/\partial x$, yields the answer to the question of how much $h$ changes when $x$ varies by one unit with $y$ held fixed. Though stretching or twisting the axes would change the appearance of equal-$h$ contours, it would not affect these questions and answers, since they specify only dependence of the function $h(x, y)$ on its arguments and do not depend on how it is plotted. One might say that the notation has removed the geometry from the description. One consequence of this is that the application of vector operations such as divergence and curl will have to be rethought, since they make implicit assumptions about the geometry of the space in which arguments $x$ and $y$ are coordinates. But the gradient requires no further analysis.

From a one-form $\tilde{\mathbf a}$ and a vector $\mathbf x$ one can form the scalar $\langle \tilde{\mathbf a}, \mathbf x\rangle$. What is the quantity formed when one-form $\widetilde{dh}$, defined in Eq. (2.3.4), operates on the vector $\partial/\partial x$ just defined? By Eq. (2.3.1) and the defined meaning of $\partial/\partial x$ we have $\widetilde{dx}(\partial/\partial x) = 1$. Combining this with Eqs. (2.3.4) and (2.3.5) yields

$$\widetilde{dh}\left(\frac{\boldsymbol\partial}{\boldsymbol\partial x}\right) = \frac{\partial h}{\partial x}, \tag{2.4.35}$$

where the final term is the traditional notation for the $x$-component of the gradient of $h$. In this case the new notation can be thought of simply as a roundabout way of expressing the gradient. Some modern authors, Schutz for example, (confusingly in my opinion) simply call $\widetilde{dh}$ “the gradient of $h$.”

This raises another question: Should the symbol be $\widetilde{dh}$, which we have been using, or should it be $\tilde d h$? Already the symbol $\tilde d$ has been used to indicate “exterior differentiation,” and a priori the independently defined quantities $\widetilde{dh}$ and $\tilde d h$ are distinct. But we will show that they are in fact equal, so it is immaterial which notation is used.

From these considerations one infers that for contravariant basis vectors $\mathbf e_x = \partial/\partial x$ and $\mathbf e_y = \partial/\partial y$ the corresponding covariant basis vectors are $\tilde{\mathbf e}^1 = \widetilde{dx}$ and $\tilde{\mathbf e}^2 = \widetilde{dy}$. Why is this so? For example, because $\widetilde{dx}(\partial/\partial x) = 1$. To recapitulate:

$$\mathbf e_1 = \frac{\boldsymbol\partial}{\boldsymbol\partial x^1}, \quad \mathbf e_2 = \frac{\boldsymbol\partial}{\boldsymbol\partial x^2}, \quad \ldots, \quad \mathbf e_n = \frac{\boldsymbol\partial}{\boldsymbol\partial x^n} \tag{2.4.36}$$

are the natural contravariant basis vectors, and the corresponding covariant basis vectors are

$$\tilde{\mathbf e}^1 = \widetilde{dx}^1, \quad \tilde{\mathbf e}^2 = \widetilde{dx}^2, \quad \ldots, \quad \tilde{\mathbf e}^n = \widetilde{dx}^n. \tag{2.4.37}$$

The association of $\partial/\partial x^1, \partial/\partial x^2, \ldots, \partial/\partial x^n$ with vectors will be shown to be of far more than formal significance in Section 3.4.3, where vectors are associated with directional derivatives.

²⁵Whether or not they are true vectors depends on whether or not $\mathbf i$ and $\mathbf j$ are defined to be true vectors. The answer to this question can be regarded as a matter of convention; if the axes are regarded as fixed once and for all then $\mathbf i$ and $\mathbf j$ are true vectors; if the axes are transformed, they are not.
2.5. (POSSIBLY COMPLEX) CARTESIAN VECTORS IN METRIC GEOMETRY

2.5.1. Euclidean Vectors

Now, for the first time, we hypothesize the presence of a “metric” (whose existence can, from a physicist’s point of view, be taken to be a “physical law,” for example the Pythagorean “law” or the Einstein-Minkowski “law”). We will use this metric to “associate” covariant and contravariant vectors. Such associations are being made constantly and without a second thought by physicists. Here we spell the process out explicitly. The current task can also be expressed as one of assigning covariant components to a true vector that is defined initially by its contravariant components.

A point in three-dimensional Euclidean space can be located by a vector

$$\mathbf x = \mathbf e_1 x^1 + \mathbf e_2 x^2 + \mathbf e_3 x^3 \equiv \mathbf e_i x^i, \tag{2.5.1}$$
where $\mathbf e_1$, $\mathbf e_2$, and $\mathbf e_3$ form an orthonormal (defined below) triplet of basis vectors. Such a basis will be called “Euclidean.” The final form again employs the repeated-index summation convention, even though the two factors have different tensor character in this case. In this expansion, the components have upper indices and are called “contravariant” though, as it happens, because the basis is Euclidean, the covariant components $x_i$ to be introduced shortly will have the same values. For skew bases (axes not necessarily orthogonal and to be called “Cartesian”), the contravariant and covariant components will be distinct. Unless stated otherwise, $x^1$, $x^2$, and $x^3$ are allowed to be complex numbers; we defer concerning ourselves with the geometric implications of this. We are restricting the discussion to $n = 3$ here only to avoid inessential abstraction; in The Theory of Spinors by Cartan, most of the results are derived for general $n$, using arguments like the ones to be used here.

The reader may be beginning to sense a certain repetition of discussion of concepts already understood, such as covariant and contravariant vectors. This can best be defended by observing that, even though these concepts are essentially the same in different contexts, they can also differ in subtle ways, depending upon the implicit assumptions that accompany them.

All vectors start at the origin in this discussion. According to the Pythagorean relation, the distance from the origin to the tip of the arrow can be expressed by a “fundamental form” or “scalar square”

$$\Phi(\mathbf x) = \mathbf x \cdot \mathbf x = (x^1)^2 + (x^2)^2 + (x^3)^2. \tag{2.5.2}$$
Three distinct cases will be of special importance:

• The components $x^1$, $x^2$, and $x^3$ are required to be real. In this case $\Phi(\mathbf x)$, conventionally denoted also by $|\mathbf x|^2$, is necessarily positive, and it is natural to divide any vector by $|\mathbf x|$ to convert it into a “unit vector.” This metric describes ordinary geometry in three dimensions, and constitutes the Pythagorean law referred to above.

• The components $x^1$, $x^2$, and $x^3$ are complex with fundamental form given by Eq. (2.5.2). Note that $\Phi(\mathbf x)$ is not defined to be $\bar x^1 x^1 + \bar x^2 x^2 + \bar x^3 x^3$ and that it has the possibilities of being complex or of vanishing even though $\mathbf x$ does not. If it vanishes, the vector is said to be “isotropic.” If it does not, it can be normalized, converting it into a “unit vector.”

• In the “pseudo-Euclidean” case the components $x^1$, $x^2$, and $x^3$ are required to be real, but the fundamental form is given not by Eq. (2.5.2) but by

$$\Phi(\mathbf x) = (x^1)^2 + (x^2)^2 - (x^3)^2. \tag{2.5.3}$$
Since this has the possibility of vanishing, a vector can be “isotropic,” or “on the light cone,” in this case also. For $\Phi > 0$, the vector is “space-like”; for $\Phi < 0$ it is “time-like.” In these cases a “unit vector” can be defined as having fundamental form of magnitude 1. In this pseudo-Euclidean case, “ordinary” space-time requires $n = 1 + 3$. This metric could legitimately be called “Einstein’s metric,” but it is usually called “Minkowski’s.” In any case, its existence can be regarded as a physical law, not just a mathematical construct.

To the extent possible these cases will be treated “in parallel,” in a unified fashion, with most theorems and proofs applicable in all cases. Special properties of one or the other of the cases will be interjected as required.

The “scalar” or “invariant product” of vectors $\mathbf x$ and $\mathbf y$ is defined in terms of their Euclidean components by

$$\mathbf x \cdot \mathbf y \equiv x^1 y^1 + x^2 y^2 + x^3 y^3. \tag{2.5.4}$$
Though similar-looking expressions have appeared previously, this is the first one deserving of the name “dot product.” If $\mathbf x \cdot \mathbf y$ vanishes, $\mathbf x$ and $\mathbf y$ are said to be orthogonal. An “isotropic” vector is orthogonal to itself. The vectors orthogonal to a given vector span a plane. (In $n$-dimensional space this is called a “hyperplane” of $n - 1$ dimensions.) In the pseudo-Euclidean case there is one minus sign in the definition of scalar product, as in Eq. (2.5.3). The very existence of a metric permits the introduction of a “natural” association of a form $\tilde{\mathbf x}$ to a vector $\mathbf x$ such that, for arbitrary vector $\mathbf y$, $\tilde{\mathbf x}(\mathbf y) = \mathbf x \cdot \mathbf y$.
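The three cases can be made concrete with a few sample vectors. The sketch below is an illustration only; the function names and the sample components are assumptions introduced here. It evaluates the fundamental form of Eq. (2.5.2) without complex conjugation, and the pseudo-Euclidean form of Eq. (2.5.3), and labels real vectors as space-like, time-like, or isotropic.

```python
def phi_euclid(x):
    """Fundamental form of Eq. (2.5.2); note: no complex conjugation."""
    return sum(c * c for c in x)

def phi_pseudo(x):
    """Pseudo-Euclidean fundamental form of Eq. (2.5.3)."""
    return x[0] ** 2 + x[1] ** 2 - x[2] ** 2

def classify(x):
    """Label a real vector according to the sign of its pseudo-Euclidean form."""
    p = phi_pseudo(x)
    if p == 0:
        return "isotropic"
    return "space-like" if p > 0 else "time-like"

real_vector = (1, 2, 2)          # Phi = 9, so |x| = 3
complex_isotropic = (1, 1j, 0)   # nonzero vector with vanishing scalar square
light_cone = (3, 4, 5)           # 9 + 16 - 25 = 0

labels = [classify(v) for v in ((1, 0, 0), (0, 0, 1), light_cone)]
```

The complex example shows why the scalar square is not a length: $(1, i, 0)$ is not the zero vector, yet $1^2 + i^2 + 0^2 = 0$.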
Problem 2.5.1: Show that definition (2.5.4) follows from definition (2.5.3) if one assumes “natural” algebraic properties for “lengths” in the evaluation of $(\mathbf x + \lambda\mathbf y) \cdot (\mathbf x + \lambda\mathbf y)$, where $\mathbf x$ and $\mathbf y$ are two different vectors and $\lambda$ is an arbitrary scalar.
2.5.2. Skew Coordinate Frames

The basis vectors $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$ in a skew, or “Cartesian,” frame are not orthonormal in general. They must however be “independent”; geometrically this requires that they not lie in a single plane; algebraically it requires that no vanishing linear combination can be formed from them. As a result, a general vector $\mathbf x$ can be expanded in terms of $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$,

$$\mathbf x = \boldsymbol\eta_i\,x^i, \tag{2.5.5}$$

and its scalar square is then given by

$$\Phi(\mathbf x) = \boldsymbol\eta_i \cdot \boldsymbol\eta_j\,x^i x^j \equiv g_{ij}\,x^i x^j. \tag{2.5.6}$$
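Here “metric coefficients,” and the matrix $\mathbf G$ they form, have been defined by

$$g_{ij} = g_{ji} \equiv \boldsymbol\eta_i \cdot \boldsymbol\eta_j, \qquad
\mathbf G = \begin{pmatrix}
\boldsymbol\eta_1 \cdot \boldsymbol\eta_1 & \boldsymbol\eta_1 \cdot \boldsymbol\eta_2 & \boldsymbol\eta_1 \cdot \boldsymbol\eta_3 \\
\boldsymbol\eta_2 \cdot \boldsymbol\eta_1 & \boldsymbol\eta_2 \cdot \boldsymbol\eta_2 & \boldsymbol\eta_2 \cdot \boldsymbol\eta_3 \\
\boldsymbol\eta_3 \cdot \boldsymbol\eta_1 & \boldsymbol\eta_3 \cdot \boldsymbol\eta_2 & \boldsymbol\eta_3 \cdot \boldsymbol\eta_3
\end{pmatrix}. \tag{2.5.7}$$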
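A small numerical check of Eqs. (2.5.6) and (2.5.7) may help. In the sketch below the skew basis `eta` is given by hypothetical Euclidean components, and the contravariant components `x_contra` are likewise an arbitrary choice. The metric matrix is assembled from dot products, and the scalar square computed with $g_{ij}$ is compared against the Pythagorean sum of squares of the Euclidean components of the same vector.

```python
def dot(a, b):
    """Ordinary dot product of two Euclidean component tuples."""
    return sum(p * q for p, q in zip(a, b))

# Skew basis vectors, written in an underlying orthonormal frame (hypothetical).
eta = [(1, 0, 0), (1, 2, 0), (0, 1, 3)]

# Metric coefficients g_ij = eta_i . eta_j, Eq. (2.5.7).
G = [[dot(ei, ej) for ej in eta] for ei in eta]

x_contra = (2, 1, -1)   # contravariant components x^i in the skew basis

# The same vector expressed in the orthonormal frame: x = eta_i x^i.
x_euclid = tuple(sum(x_contra[i] * eta[i][k] for i in range(3)) for k in range(3))

phi_metric = sum(G[i][j] * x_contra[i] * x_contra[j]
                 for i in range(3) for j in range(3))   # Eq. (2.5.6)
phi_pythagoras = dot(x_euclid, x_euclid)                # Eq. (2.5.2)
```

The two values agree, as they must, since the metric coefficients simply carry the information about the skewness of the basis.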
As in Section 2.4, the coefficients $x^i$ are known as “contravariant components” of $\mathbf x$. When expressed in terms of them, the formula for length is more complicated than the Pythagorean formula because the basis vectors are skew. Nevertheless it has been straightforward, starting from a Euclidean basis, to find the components of the metric tensor. It is less straightforward, and not even necessarily possible in general, given a metric tensor, to find a basis in which the length formula is Pythagorean.

*2.5.3. Reduction of a Quadratic Form to a Sum or Difference of Squares²⁶

For describing scalar products, defined in the first place in terms of orthonormal axes, but now using skew coordinates, a quadratic form has been introduced. Conversely, given an arbitrary quadratic form $\Phi = g_{ij}\,u^i u^j$, can we find a coordinate transformation $x_i = a_{ij}\,u^j$ to variables for which $\Phi$ takes the form of Eq. (2.5.2)?²⁷ In general, the components can be complex. If the components are required to be real then the coefficients $a_{ij}$ will also be required to be real; otherwise they also can be complex. The reader has no doubt been subjected to such an analysis before, though perhaps not with complex variables allowed.

²⁶The material in this and the next section is reasonably standard in courses in algebra. It is nevertheless spelled out here in some detail since, like some of the other material in this chapter, analogous procedures will be used when “symplectic geometry” is discussed.
THEOREM 2.5.1: Every quadratic form can be reduced to a sum of (positive or negative) squares by a linear transformation of the variables.

Proof: Suppose one of the diagonal elements, say $g_{11}$, is non-zero. With a view toward eliminating all terms linear in $u^1$, define

$$\Phi_1 \equiv g_{ij}\,u^i u^j - \frac{1}{g_{11}}\,\big(g_{1i}\,u^i\big)^2, \tag{2.5.8}$$

which no longer contains $u^1$. Hence, defining

$$y_1 \equiv g_{1i}\,u^i, \tag{2.5.9}$$

the fundamental form can be written as

$$\Phi = \frac{1}{g_{11}}\,y_1^2 + \Phi_1; \tag{2.5.10}$$

the second term has one fewer variable than previously. If all diagonal elements vanish, one of the off-diagonal elements, say $g_{12}$, does not. In this case define

$$\Phi_2 \equiv g_{ij}\,u^i u^j - \frac{2}{g_{12}}\,\big(g_{21}u^1 + g_{23}u^3\big)\big(g_{12}u^2 + g_{13}u^3\big), \tag{2.5.11}$$

which contains neither $u^1$ nor $u^2$. Defining

$$y_1 + y_2 = g_{21}u^1 + g_{23}u^3, \qquad y_1 - y_2 = g_{12}u^2 + g_{13}u^3, \tag{2.5.12}$$

we obtain

$$\Phi = \frac{2}{g_{12}}\,\big(y_1^2 - y_2^2\big) + \Phi_2, \tag{2.5.13}$$

again reducing the dimensionality.

²⁷For purposes of this proof, which is entirely algebraic, we ignore the traditional connection between upper/lower index location and contravariant/covariant nature. Hence the components $x_i$ given by $x_i = a_{ij}\,u^j$ are not to be regarded as covariant, or contravariant either, for that matter.
The form can be reduced to a sum of squares step-by-step in this way. In the real domain, no complex coefficients are introduced, but some of the coefficients may be negative. In all cases, normalizations can be chosen to make all coefficients be 1 or -1.
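The step-by-step reduction lends itself to a short program. The following is a minimal sketch for real symmetric coefficient matrices, written under stated assumptions: the function name `signature` and the return convention (counts of positive, negative, and absent squares) are choices made here, and the off-diagonal step is the natural $n$-variable generalization of Eq. (2.5.11), with the two linear factors taken as $g_{1k}u^k$ and $g_{2k}u^k$. It tracks only the signs of the coefficients, not the new variables themselves.

```python
from fractions import Fraction

def signature(G):
    """Return (n_pos, n_neg, n_zero) for the real quadratic form g_ij u^i u^j."""
    g = [[Fraction(v) for v in row] for row in G]
    pos = neg = 0
    while g:
        n = len(g)
        # Case 1: a non-zero diagonal element; split off (1/g_aa) * y^2.
        a = next((i for i in range(n) if g[i][i] != 0), None)
        if a is not None:
            p = g[a][a]
            if p > 0:
                pos += 1
            else:
                neg += 1
            keep = [k for k in range(n) if k != a]
            g = [[g[j][k] - g[a][j] * g[a][k] / p for k in keep] for j in keep]
            continue
        # Case 2: all diagonal elements vanish; use a non-zero off-diagonal one.
        pair = next(((i, j) for i in range(n) for j in range(i + 1, n)
                     if g[i][j] != 0), None)
        if pair is None:
            return pos, neg, n          # remaining form is identically zero
        a, b = pair
        q = g[a][b]
        pos += 1                        # (2/g_ab) * (y1^2 - y2^2) contributes
        neg += 1                        # one square of each sign
        keep = [k for k in range(n) if k not in (a, b)]
        g = [[g[j][k] - (g[a][j] * g[b][k] + g[a][k] * g[b][j]) / q
              for k in keep] for j in keep]
    return pos, neg, 0

minkowski = signature([[1, 0, 0], [0, 1, 0], [0, 0, -1]])
hyperbolic = signature([[0, 1], [1, 0]])   # the form 2 u^1 u^2
```

Exact rational arithmetic is used so that the test for a vanishing pivot is unambiguous; with floating-point input a tolerance would be needed instead.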
Problem 2.5.2: Sylvester’s Law of Inertia. The preceding substitutions are not unique but, in the domain of reals, the relative number of negative and positive coefficients in the final form is unique. Prove this, for example by showing a contradiction resulting from assuming a relation

$$\Phi = y^2 - z_1^2 - z_2^2 = u_1^2 + u_2^2 - w_1^2, \tag{2.5.14}$$

between variables $y$, $z_1$, and $z_2$ with two negative signs on the one hand, and $u_1$, $u_2$, and $w_1$ with only one negative sign on the other. In “nondegenerate” (that is, $\det|g_{ij}| \neq 0$) ordinary geometry of real numbers, the number of positive square terms is necessarily 3.

2.5.4. Introduction of Covariant Components

The contravariant components $x^i$ are seen in Eq. (2.5.5) to be the coefficients in the expansion of $\mathbf x$ in terms of the $\boldsymbol\eta_i$. In ordinary vector analysis one is accustomed to identifying each of these coefficients as the “component of $\mathbf x$” along a particular coordinate axis and being able to evaluate it as $|\mathbf x|$ multiplied by the cosine of the corresponding angle. Here we define lowered-index components $x_i$, to be called “covariant” (terminology to be justified later), as the “invariant products” of $\mathbf x$ with the $\boldsymbol\eta_i$:

$$x_i = \mathbf x \cdot \boldsymbol\eta_i = g_{ik}\,x^k,$$
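or as a matrix equation

$$\tilde{\mathbf x} = (\mathbf G\,\mathbf x)^T = \mathbf x^T\,\mathbf G^T, \tag{2.5.15}$$

where $\tilde{\mathbf x}$ stands for the array $(x_1, \ldots, x_n)$. Now the scalar product defined in Eq. (2.5.4) can be written

$$\mathbf x \cdot \mathbf y = x_i\,y^i = y_i\,x^i. \tag{2.5.16}$$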
By inverting Eq. (2.5.15) contravariant components can be obtained from covariant ones:

$$\mathbf x^T = \tilde{\mathbf x}\,(\mathbf G^T)^{-1}, \quad\text{or as components,}\quad x^i = g^{ik}\,x_k \quad\text{where}\quad g^{ik} = (\mathbf G^{-1})^{ik}. \tag{2.5.17}$$

For orthonormal bases, $\mathbf G = \mathbf 1$ and, as mentioned previously, covariant and contravariant components are identical. Introduction of covariant components can be regarded as a simple algebraic convenience with no geometric significance. However, if the angle $\theta$ between vectors $\mathbf x$ and $\mathbf y$ is defined by

$$\cos\theta = \frac{\mathbf x \cdot \mathbf y}{\sqrt{\Phi(\mathbf x)\,\Phi(\mathbf y)}}, \tag{2.5.18}$$
then a general vector $\mathbf x$ is related to the basis vectors $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$ by direction cosines $\cos\theta_1$, $\cos\theta_2$, $\cos\theta_3$, and its covariant components are

$$x_i = \mathbf x \cdot \boldsymbol\eta_i = \sqrt{\Phi(\mathbf x)\,\Phi(\boldsymbol\eta_i)}\,\cos\theta_i. \tag{2.5.19}$$

FIGURE 2.5.1. The true vector $2\boldsymbol\eta_1 + \boldsymbol\eta_2$ (labeled $\mathbf x$: $x^1 = 2$, $x^2 = 1$ in the figure) expressed in terms of contravariant and, using Eq. (2.5.19), its covariant components related to direction cosines. For Euclidean geometry $\sqrt{\Phi(\mathbf x)}$ is normally symbolized by $|\mathbf x|$.
This definition is illustrated in Fig. 2.5.1. The angles (or rather their cosines) introduced in these definitions would be just redundant symbols except for the fact that all of trigonometry is imported into the formalism in this way.
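The vector $2\boldsymbol\eta_1 + \boldsymbol\eta_2$ of Fig. 2.5.1 gives a convenient worked case. In the sketch below the two-dimensional skew basis `eta` is a hypothetical choice (the figure's actual basis is not recoverable here), so the numbers illustrate the relations rather than reproduce the figure. Covariant components are computed both by lowering the index with $g_{ik}$ and directly as invariant products with the basis vectors, and the direction cosines of Eq. (2.5.19) are then evaluated.

```python
import math

def dot(a, b):
    """Ordinary dot product of two Euclidean component tuples."""
    return sum(p * q for p, q in zip(a, b))

eta = [(2, 0), (1, 2)]        # hypothetical skew basis, Euclidean components
x_contra = (2, 1)             # x = 2*eta_1 + eta_2, as in Fig. 2.5.1

G = [[dot(ei, ej) for ej in eta] for ei in eta]               # g_ij
x_vec = tuple(sum(x_contra[i] * eta[i][k] for i in range(2))
              for k in range(2))                              # Euclidean components

x_cov = [sum(G[i][k] * x_contra[k] for k in range(2)) for i in range(2)]  # g_ik x^k
x_cov_direct = [dot(x_vec, e) for e in eta]                   # x . eta_i

scalar_square = sum(x_cov[i] * x_contra[i] for i in range(2)) # x_i x^i, Eq. (2.5.16)

# Direction cosines from Eq. (2.5.19): cos(theta_i) = x_i / sqrt(Phi(x) Phi(eta_i)).
cosines = [x_cov[i] / math.sqrt(scalar_square * G[i][i]) for i in range(2)]
```

Because the basis is skew, the covariant components differ from the contravariant ones, while $x_i x^i$ still returns the Pythagorean length squared.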
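2.5.5. The Reciprocal Basis

Even in Euclidean geometry there are situations in which skew axes yield simplified descriptions, which makes the introduction of covariant components especially useful. The most important example is in the description of a crystal for which displacements by integer multiples of a right-handed triad of “unit cell” vectors $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$ leave the lattice invariant. Let these unit cell vectors form the basis of a skew frame as in Section 2.5.2. For any vector $\mathbf x$ in the original space we can associate a particular form $\tilde{\mathbf x}$ in the dual space by the following rule, giving its value $\tilde{\mathbf x}(\mathbf y)$ when acting on general vector $\mathbf y$: $\langle \tilde{\mathbf x}, \mathbf y\rangle = \mathbf x \cdot \mathbf y$. In particular, “reciprocal basis vectors” $\boldsymbol\eta^i$ and basis forms $\tilde{\boldsymbol\eta}^i$ are defined to satisfy

$$\langle \tilde{\boldsymbol\eta}^i, \boldsymbol\eta_j\rangle = \boldsymbol\eta^i \cdot \boldsymbol\eta_j = \delta^i{}_j, \tag{2.5.20}$$

and $\tilde{\boldsymbol\eta}^1$, $\tilde{\boldsymbol\eta}^2$, and $\tilde{\boldsymbol\eta}^3$ are the basis dual to $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$ as in Eq. (2.4.4). The vectors $\boldsymbol\eta^i$ in this equation need to be determined to satisfy the final equality. This can be accomplished mentally:

$$\boldsymbol\eta^1 = \frac{\boldsymbol\eta_2 \times \boldsymbol\eta_3}{\sqrt g}, \quad
\boldsymbol\eta^2 = \frac{\boldsymbol\eta_3 \times \boldsymbol\eta_1}{\sqrt g}, \quad
\boldsymbol\eta^3 = \frac{\boldsymbol\eta_1 \times \boldsymbol\eta_2}{\sqrt g}, \qquad
\sqrt g = (\boldsymbol\eta_1 \times \boldsymbol\eta_2) \cdot \boldsymbol\eta_3, \tag{2.5.21}$$

where the orientation of the basis vectors is assumed to be such that $\sqrt g$ is real and nonzero. (From vector analysis one recognizes $\sqrt g$ to be the volume of the unit cell.) One can confirm that Eqs. (2.5.20) are then satisfied. The vectors $\boldsymbol\eta^1$, $\boldsymbol\eta^2$, and $\boldsymbol\eta^3$ are said to form the “reciprocal basis.” The “reciprocal lattice” consists of all superpositions of these vectors with integer coefficients.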
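The confirmation that the reciprocal basis satisfies Eq. (2.5.20) can be carried out numerically. The sketch below assumes the familiar cross-product construction, in which each reciprocal vector is the cross product of the other two unit cell vectors divided by the cell volume; the particular unit cell `eta` and the helper names are hypothetical.

```python
from fractions import Fraction as F

def dot(a, b):
    """Ordinary dot product."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Ordinary cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical right-handed skew unit cell, in Euclidean components.
eta = [(F(1), F(0), F(0)), (F(1), F(2), F(0)), (F(0), F(0), F(1))]

root_g = dot(cross(eta[0], eta[1]), eta[2])   # volume of the unit cell

# Reciprocal vectors: cyclic cross products divided by the cell volume.
recip = [tuple(c / root_g for c in cross(eta[(i + 1) % 3], eta[(i + 2) % 3]))
         for i in range(3)]

# Matrix of products eta^i . eta_j, which should be the Kronecker delta.
delta = [[dot(recip[i], eta[j]) for j in range(3)] for i in range(3)]
```

For this cell the volume is 2, and the reciprocal vector with upper index 1 comes out as $(1, -\tfrac12, 0)$, which is visibly perpendicular to both $\boldsymbol\eta_2$ and $\boldsymbol\eta_3$.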
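Problem 2.5.3: In terms of skew basis vectors $\boldsymbol\eta_1$, $\boldsymbol\eta_2$, and $\boldsymbol\eta_3$ in three-dimensional Euclidean space, a vector $\mathbf x = \boldsymbol\eta_i\,x^i$ has covariant components $x_i = g_{ij}\,x^j$. Show that

$$\mathbf x = x_1\,\boldsymbol\eta^1 + x_2\,\boldsymbol\eta^2 + x_3\,\boldsymbol\eta^3,$$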
where the $\boldsymbol\eta^i$ are given by Eq. (2.5.21).

By inspection one sees that reciprocal base vector $\boldsymbol\eta^1$ is normal to the plane containing $\boldsymbol\eta_3$ and $\boldsymbol\eta_2$. This is illustrated in Fig. 2.5.2, which shows the unit cell vectors superimposed on a crystal lattice. ($\boldsymbol\eta_3$ points normally out of the paper.) Similarly, $\boldsymbol\eta^2$ is normal to the plane containing $\boldsymbol\eta_3$ and $\boldsymbol\eta_1$. Consider the plane passing through the origin and containing both $\boldsymbol\eta_3$ and the vector $\boldsymbol\eta_1 + N\boldsymbol\eta_2$, where $N$ is an integer. Since there is an atom situated at the tip of this vector, this plane contains this atom as well as the atom at the origin and the atom at $2(\boldsymbol\eta_1 + N\boldsymbol\eta_2)$, and so on. For the case $N = 1$, these atoms are joined by a line in the figure and several other lines, all parallel and passing through other atoms that are shown as well. The vector

$$N\boldsymbol\eta^1 - \boldsymbol\eta^2$$

is perpendicular to this set of planes. (Again for $N = 1$) the figure confirms that $\boldsymbol\eta^1 - \boldsymbol\eta^2$ is normal to the crystal planes shown.

FIGURE 2.5.2. The crystal lattice shown has unit cell vectors $\boldsymbol\eta_1$ and $\boldsymbol\eta_2$ as shown, as well as $\boldsymbol\eta_3$ pointing normally out of the paper. Reciprocal basis vectors $\boldsymbol\eta^1$ and $\boldsymbol\eta^2$ are shown. The particular lattice planes indicated by parallel lines correspond to the reciprocal lattice vector $\boldsymbol\eta^2 - \boldsymbol\eta^1$. It is coincidental that $\boldsymbol\eta^2$ appears to lie in a crystal plane. $\sqrt g = ab\sin\phi$. (Values marked in the figure: $a = 0.69$, $b = 0.57$, $\sin\phi = 0.87$, $|\boldsymbol\eta^1| = 1.67$, $|\boldsymbol\eta^2| = 2.02$.)
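The perpendicularity claim can be tested for several values of $N$. The sketch below reuses a hypothetical unit cell and the cross-product form of the reciprocal basis; both are assumptions for illustration. For each $N$ it forms the candidate normal $N\boldsymbol\eta^1 - \boldsymbol\eta^2$ and takes its dot product with the two vectors spanning the lattice plane, $\boldsymbol\eta_3$ and $\boldsymbol\eta_1 + N\boldsymbol\eta_2$.

```python
from fractions import Fraction as F

def dot(a, b):
    """Ordinary dot product."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Ordinary cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical skew unit cell; the third vector is normal to the first two.
eta = [(F(1), F(0), F(0)), (F(1), F(2), F(0)), (F(0), F(0), F(1))]
root_g = dot(cross(eta[0], eta[1]), eta[2])
recip = [tuple(c / root_g for c in cross(eta[(i + 1) % 3], eta[(i + 2) % 3]))
         for i in range(3)]

def plane_normal(N):
    """Candidate normal N*eta^1 - eta^2 (upper-index, reciprocal vectors)."""
    return tuple(N * p - q for p, q in zip(recip[0], recip[1]))

def in_plane_vector(N):
    """Lattice vector eta_1 + N*eta_2 lying in the plane considered."""
    return tuple(p + N * q for p, q in zip(eta[0], eta[1]))

checks = [(dot(plane_normal(N), eta[2]),
           dot(plane_normal(N), in_plane_vector(N))) for N in (1, 2, 3)]
```

Both dot products vanish for every $N$ tried, which follows directly from Eq. (2.5.20): the cross terms give $N \cdot 1 - 1 \cdot N = 0$.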
Problem 2.5.4: Show that for any two atoms in the crystal, the plane containing them and the origin is normal to a vector expressible as a superposition of reciprocal basis vectors with integer coefficients, and that any superposition of reciprocal basis vectors with integer coefficients is normal to a set of planes containing atoms. [Hint: For practice at the sort of calculation that is useful, evaluate $(\boldsymbol\eta^1 + \boldsymbol\eta^2) \cdot (\boldsymbol\eta_1 + \boldsymbol\eta_2)$.]
It was only because the dot product is meaningful that Eq. (2.5.20) results in the association of an ordinary vector $\boldsymbol{\eta}^i$ with the form $\tilde{\boldsymbol{\eta}}^i$. But once that identification is made, all computations can be made using straightforward vector analysis. A general vector $\mathbf{x}$ can be expanded either in terms of the original or the reciprocal basis:

$$\mathbf{x} = \boldsymbol{\eta}_i x^i = x_i\boldsymbol{\eta}^i. \tag{2.5.22}$$
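(The components $x_i$ can be thought of either as covariant components of $\mathbf{x}$ or as components of $\tilde{\mathbf{x}}$ such that $\tilde{\mathbf{x}}(\mathbf{y}) = x_i y^i$.) In conjunction with Eqs. (2.5.17) and (2.5.20) this yields

$$(g^{ik}) = G^{-1} = \begin{pmatrix} \boldsymbol{\eta}^1\cdot\boldsymbol{\eta}^1 & \boldsymbol{\eta}^1\cdot\boldsymbol{\eta}^2 & \boldsymbol{\eta}^1\cdot\boldsymbol{\eta}^3 \\ \boldsymbol{\eta}^2\cdot\boldsymbol{\eta}^1 & \boldsymbol{\eta}^2\cdot\boldsymbol{\eta}^2 & \boldsymbol{\eta}^2\cdot\boldsymbol{\eta}^3 \\ \boldsymbol{\eta}^3\cdot\boldsymbol{\eta}^1 & \boldsymbol{\eta}^3\cdot\boldsymbol{\eta}^2 & \boldsymbol{\eta}^3\cdot\boldsymbol{\eta}^3 \end{pmatrix}. \tag{2.5.23}$$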
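The relations above can be checked numerically for the lattice of Fig. 2.5.2. The following Python (NumPy) sketch is only an illustration: it assumes that Eq. (2.5.21) is the usual cross-product construction of the reciprocal basis, takes the figure values $a = 0.69$, $b = 0.57$, $\sin\phi = 0.87$, and sets the length $c$ of $\boldsymbol{\eta}_3$ to 1, which is an assumption since the figure does not specify it.

```python
import numpy as np

# Unit cell values read off Fig. 2.5.2; c = 1 is an assumption made here.
a, b, c = 0.69, 0.57, 1.0
phi = np.arcsin(0.87)

eta = np.array([
    [a, 0.0, 0.0],                            # eta_1
    [b * np.cos(phi), b * np.sin(phi), 0.0],  # eta_2
    [0.0, 0.0, c],                            # eta_3, out of the paper
])

# sqrt(g) as the triple product eta_1 . (eta_2 x eta_3)
sqrt_g = np.dot(eta[0], np.cross(eta[1], eta[2]))

# Rows are the reciprocal vectors eta^1, eta^2, eta^3 (assumed form of Eq. 2.5.21)
recip = np.array([
    np.cross(eta[1], eta[2]),
    np.cross(eta[2], eta[0]),
    np.cross(eta[0], eta[1]),
]) / sqrt_g

G = eta @ eta.T          # g_ij = eta_i . eta_j
G_inv = recip @ recip.T  # g^ik = eta^i . eta^k, Eq. (2.5.23)

# Lengths of eta^1 and eta^2, to compare with 1/(a sin phi) and 1/(b sin phi)
print(np.round(np.linalg.norm(recip[:2], axis=1), 2))  # [1.67 2.02]
```

The printed lengths agree with the values 1.67 and 2.02 marked on the figure, and `G_inv` coincides with the matrix inverse of `G`.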
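Problem 2.5.5: Confirm the Lagrange identity of vector analysis

$$(\mathbf{A}\times\mathbf{B})\cdot(\mathbf{C}\times\mathbf{D}) = \det\begin{vmatrix} \mathbf{A}\cdot\mathbf{C} & \mathbf{A}\cdot\mathbf{D} \\ \mathbf{B}\cdot\mathbf{C} & \mathbf{B}\cdot\mathbf{D} \end{vmatrix}. \tag{2.5.24}$$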
This is most simply done by expressing the cross products with the three-index antisymmetric symbol $\epsilon_{ijk}$. With the vectors $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ drawn from $\boldsymbol{\eta}_1$, $\boldsymbol{\eta}_2$, and $\boldsymbol{\eta}_3$, each of these determinants can be identified as a cofactor in Eq. (2.5.7). From this show that $g = \det|G|$.
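Before attempting the proof it may be reassuring to test the identity numerically. The sketch below (Python with NumPy) uses randomly drawn vectors and a randomly drawn skew basis; the seed and the variable names are arbitrary choices made for this illustration, and $\sqrt{g}$ is assumed to be the triple product $\boldsymbol{\eta}_1\cdot(\boldsymbol{\eta}_2\times\boldsymbol{\eta}_3)$.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
A, B, C, D = rng.normal(size=(4, 3))    # four random 3-vectors

# Left- and right-hand sides of the Lagrange identity (2.5.24)
lhs = np.dot(np.cross(A, B), np.cross(C, D))
rhs = np.linalg.det(np.array([[A @ C, A @ D],
                              [B @ C, B @ D]]))

# A hypothetical skew basis: rows are eta_1, eta_2, eta_3
eta = rng.normal(size=(3, 3))
G = eta @ eta.T                                     # metric g_ij
sqrt_g = eta[0] @ np.cross(eta[1], eta[2])          # assumed definition of sqrt(g)

print(np.isclose(lhs, rhs), np.isclose(np.linalg.det(G), sqrt_g**2))
```

A numerical check of this kind is of course no substitute for the index manipulation requested in the problem.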
Problem 2.5.6: Show that original basis vectors $\boldsymbol{\eta}_i$ are themselves reciprocal to the reciprocal basis vectors $\boldsymbol{\eta}^i$.
2.5.6. Wavefronts, Lattice Planes, and Bragg Reflection

In this section, to illustrate the physical distinction between contravariant and covariant vectors, we digress into the subjects of wave propagation, wave-particle duality, coherent interaction, energy and momentum conservation, and, ultimately, the condition for Bragg scattering of X-rays from crystals. The main purpose of this section is to provide examples of covariant vectors. Though these calculations may appear, superficially, to have little to do with mechanics, in the end a surprising connection with mechanics, especially quantum mechanics, will emerge.

A two-dimensional slice through a crystal lattice is shown in Fig. 2.5.2. The unit cell vectors $\boldsymbol{\eta}_1$ and $\boldsymbol{\eta}_2$ lie in the plane shown. Their lengths are $a$ and $b$. The remaining basis vector $\boldsymbol{\eta}_3$ is normal to both of them and has length $c$. Applying Eqs. (2.5.21), $\sqrt{g} = abc\sin\phi$, $\boldsymbol{\eta}^1$ has length $1/(a\sin\phi)$ and is normal to $\boldsymbol{\eta}_2$, and $\boldsymbol{\eta}^2$ has length $1/(b\sin\phi)$ and is normal to $\boldsymbol{\eta}_1$.

In preparation for describing the interaction of X-rays with a lattice we review some properties of waves. We use standard Euclidean geometry. The wave vector $\mathbf{k}$ of a monofrequency plane wave points in the direction of propagation of the wave and has magnitude $2\pi/\lambda$ where $\lambda$ is the wavelength. A “wavefront” is, on the one hand, a plane orthogonal to $\mathbf{k}$ and, on the other hand, a plane on which the “phase” $\phi$ is constant. Analytically, $\phi$ appears in the description of the wave by a wave function $\Psi(\mathbf{x}, t)$,

$$\Psi(\mathbf{x}, t) \sim e^{i\phi(\mathbf{x})}e^{-i\omega t}, \quad\text{where}\quad \phi(\mathbf{x}) = \mathbf{k}\cdot\mathbf{x} = k_i x^i. \tag{2.5.25}$$
Note that the final expression, trivially valid for Euclidean axes, is also valid for skew axes; in this case it is seen to be economical to describe the wave vector by its covariant components, $k_i$. To emphasize this point we can associate with $\mathbf{k}$ a form $\tilde{\mathbf{k}}$ such that

$$\phi(\mathbf{x}) = \tilde{\mathbf{k}}(\mathbf{x}). \tag{2.5.26}$$
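That the phase can be evaluated as $k_i x^i$ in a skew frame is easily illustrated numerically. In the Python (NumPy) sketch below, the two-dimensional basis, the wave vector, and the position are all hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical 2D skew basis: rows are eta_1 and eta_2 in Cartesian components
eta = np.array([[3.0, 0.0],
                [1.0, 3.0]])

k = np.array([0.4, -1.1])   # Cartesian components of the wave vector (arbitrary)
x = np.array([2.0, 0.5])    # Cartesian components of the position (arbitrary)

k_cov = eta @ k                      # covariant components k_i = k . eta_i
x_con = np.linalg.solve(eta.T, x)    # contravariant components, x = eta_i x^i

phase_skew = k_cov @ x_con   # k_i x^i, evaluated entirely in the skew frame
phase_cart = k @ x           # ordinary Euclidean dot product

print(np.isclose(phase_skew, phase_cart))
```

No metric coefficients are needed once $\mathbf{k}$ is described by covariant components and $\mathbf{x}$ by contravariant components, which is the economy referred to above.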
This is entirely equivalent to the second part of (2.5.25). We wish now to point out the natural connection between pairs of adjacent wavefronts and the pairs of planes of a covariant vector. Though previously the spacing between pairs of planes representing a covariant vector has been taken to be “unity,” it is more natural, in a physics context, for the spacing to have the dimensions of length, and the natural choice is the wavelength $\lambda$ or perhaps $\lambda/(2\pi)$. Even though waves are being described, it is customary from geometric optics to define “rays” as being normal to wavefronts. One can keep track of the value of $\phi$ by “stepping off” wavelengths along a ray and multiplying the number of wavelengths by $2\pi$. But it is more in the intended spirit of the present discussion to observe that the advance of $\phi$ (in radians) along any path is obtained simply by counting the number of wavefronts crossed. This eliminates the need for introducing rays at all, since it works for any path.

FIGURE 2.5.3. Bragg scattering from a crystal. The candidate scattering planes are indicated by straight lines with spacing $d$. It is only coincidental that the incident beam is directed more or less normal to the lattice plane containing $\boldsymbol{\eta}_1$. The construction hinted at by the dashed wavefronts of incident and scattered wave demonstrates that coherence from all scattering sites in the same plane requires the (candidate) angle of reflection to be equal to the angle of incidence $\theta$.

Fig. 2.5.3 illustrates Bragg scattering from a particular set of parallel lattice planes in a two-dimensional lattice with unit cell vectors $\boldsymbol{\eta}_1$ and $\boldsymbol{\eta}_2$. Though shown as single arrows, the incident X-rays actually form a wave-packet or beam, extended both longitudinally and transversely. We must assume that each wave packet is coherent (i.e., plane-wave-like) over a volume containing many atoms. Then the essence of the present calculation is to seek a situation in which there is constructive interference of the scattering amplitudes from numerous, say $N$, scattering sites so that the scattering amplitude, proportional to $N$, dominates all other scattering processes by a relative factor of order $N$. Offhand, one might imagine that $N$ could be of the order of Avogadro’s number, which would yield a truly astronomical enhancement factor. Even apart from the fact that such a large scattering probability might imply more scattered intensity than incident intensity, there are many effects that reduce the effective coherence volume, thereby restricting the value of $N$ to smaller numbers. (Some such effects are microstructure or “mosaic spread” due to imperfect crystals, thermal effects, and limited beam coherence.) For our purposes we will only assume that a situation yielding coherence from numerous sites dominates the scattering process. The result in this case is called “Bragg scattering.”

The construction of Fig. 2.5.3 demonstrates that coherence from all scattering sites in a single plane requires the (candidate) angle of Bragg reflection to be equal to the angle of incidence $\theta$. In this case the incident beam phase lag for the scattering on the right relative to the scattering on the left is exactly compensated for by its phase advance after scattering. This argument relies on the assumed equality of scattered and incident wavelengths.
With the rest energy of a single scattering site being large compared to the energy of a single photon, this would tend to be approximately true even for scattering from a single atom, but in Bragg scattering the crystal recoils as a whole (or as $N$ sites anyway), validating the assumption by a further large factor $N$.
For simplicity the following discussion is restricted to the particular configuration of planes shown in Fig. 2.5.3; the origin and $\boldsymbol{\eta}_1 + \boldsymbol{\eta}_2$ lie in the same plane, and the normal to the scattering planes (emphasized by parallel lines) is given by $\boldsymbol{\eta}^1 - \boldsymbol{\eta}^2$. (At the cost of introducing a few more integer coefficients, the final results could be proved in the far greater generality for which they are valid.) Denoting the incident and scattered wave vectors as $\mathbf{p}$ and $\mathbf{q}$, the corresponding forms are $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{q}}$. Define also $\tilde{\mathbf{r}} = \tilde{\mathbf{p}} - \tilde{\mathbf{q}}$. Motivated by earlier discussion, these quantities are defined as forms (with overhead tildes). (This comment is primarily pedantic in any case since the presence of a tilde, say on $\tilde{\mathbf{p}}$, distinguishes it from a corresponding vector $\mathbf{p}$ only by the feature that a dot product such as $\mathbf{p}\cdot\mathbf{x}$ can be expressed as $\tilde{\mathbf{p}}(\mathbf{x})$.) The condition ensuring coherence of the atom at the origin and the next atom on the same plane (and hence all atoms in the plane) is

$$\tilde{\mathbf{r}}(\boldsymbol{\eta}_1 + \boldsymbol{\eta}_2) = 0. \tag{2.5.27}$$
This condition fixes the direction, but not the length, of $\mathbf{r}$. For general scattering planes, this condition would be $\tilde{\mathbf{r}}(m_1\boldsymbol{\eta}_1 + m_2\boldsymbol{\eta}_2) = 0$ where $m_1$ and $m_2$ are integers; the natural argument of $\tilde{\mathbf{r}}$ is a vector from the origin to any lattice site. The enhancement factor $N$ coming from the many coherent sites in any one plane may be appreciable, but it will still lead to negligible scattering unless there is also coherence from different planes. One way of expressing this extra coherence condition is that the phase lag caused by the extra path in scattering from the next plane must be equal to $2\pi$ (for simplicity we skip the further possibility of this being an integral multiple of $2\pi$), or

$$\lambda = 2d\sin\left(\frac{\pi}{2} - \theta\right), \tag{2.5.28}$$
which is traditionally known as “Bragg’s law.” This is illustrated in Fig. 2.5.4.

FIGURE 2.5.4. Geometric construction leading to Bragg’s law. (Labels in the figure: incident and reflected rays, the angle $\theta$, and the extra path $d\cos\theta$.)

This condition can also be expressed by an equation more nearly resembling Eq. (2.5.27). Starting from the origin, the vector $\boldsymbol{\eta}_1$ leads to an atom on the preceding scattering plane, and the vector $\boldsymbol{\eta}_2$ leads to the one on the next plane. Recalling Eq. (2.5.26), and applying the condition for constructive interference from these atoms as well, leads to the equations

$$\tilde{\mathbf{r}}(\boldsymbol{\eta}_1) = -2\pi, \quad\text{and}\quad \tilde{\mathbf{r}}(\boldsymbol{\eta}_2) = 2\pi. \tag{2.5.29}$$
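These are (a rather specialized version of) the so-called “Laue equations.” We now use these equations in order to write $\mathbf{r}$ as a superposition of reciprocal basis vectors of the form

$$\mathbf{r} = \alpha\boldsymbol{\eta}^1 + \beta\boldsymbol{\eta}^2. \tag{2.5.30}$$

From Eq. (2.5.20) one has $\boldsymbol{\eta}^i\cdot\boldsymbol{\eta}_j = \delta^i_j$, which, combined with Eqs. (2.5.29), yields $\alpha = -\beta = -2\pi$, and hence

$$\frac{\mathbf{r}}{2\pi} = \boldsymbol{\eta}^2 - \boldsymbol{\eta}^1. \tag{2.5.31}$$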
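The chain from the Laue equations (2.5.29) to Bragg's law (2.5.28) can be traced numerically. The Python (NumPy) sketch below is an illustration under stated assumptions: the two-dimensional lattice vectors and the angle of incidence are hypothetical values, the plane spacing is taken to be $d = 2\pi/|\mathbf{r}|$, and $\theta$ is measured from the normal to the planes.

```python
import numpy as np

# Hypothetical 2D lattice: rows are eta_1 and eta_2
eta = np.array([[3.0, 0.0],
                [1.0, 3.0]])
recip = np.linalg.inv(eta).T          # rows eta^1, eta^2 with eta^i . eta_j = delta^i_j

r = 2 * np.pi * (recip[1] - recip[0])   # Eq. (2.5.31)

d = 2 * np.pi / np.linalg.norm(r)       # spacing of the scattering planes
n_hat = r / np.linalg.norm(r)           # unit normal to the planes
t_hat = np.array([-n_hat[1], n_hat[0]]) # unit vector lying in the planes

theta = np.radians(35.0)                # angle of incidence from the normal (arbitrary)
k_mag = np.linalg.norm(r) / (2 * np.cos(theta))   # |p| = |q| needed for p - q = r

p = k_mag * (np.sin(theta) * t_hat + np.cos(theta) * n_hat)  # incident wave vector
q = k_mag * (np.sin(theta) * t_hat - np.cos(theta) * n_hat)  # specularly reflected wave vector
wavelength = 2 * np.pi / k_mag

print(np.allclose(p - q, r), np.isclose(wavelength, 2 * d * np.cos(theta)))
```

With equal angles of incidence and reflection and equal wavelengths, requiring $\mathbf{p} - \mathbf{q} = \mathbf{r}$ fixes the wavelength at $2d\cos\theta = 2d\sin(\pi/2 - \theta)$, in agreement with Eq. (2.5.28).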
Referring to Fig. 2.5.2, this shows that the Bragg condition can be expressed by the statement that (except for factor $2\pi$) the difference of incident and scattered wave vectors is equal to the reciprocal lattice vector describing the planes from which the scattering occurs. This form of the condition for coherence from successive planes is illustrated graphically in Fig. 2.5.5. The incident beam is represented by wavefronts that correspond to the incident covariant wave vector $\tilde{\mathbf{p}}$, and the scattered beam by $\tilde{\mathbf{q}}$ (shown as $-\tilde{\mathbf{q}}$ in the figure). As drawn, equality of angles of incidence and reflection is satisfied, but the spacing of planes is not quite right for coherence. For convenience in correlating with Eq. (2.5.31), let us suppose the contour lines of $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{q}}$ in Fig. 2.5.5 are spaced by $2\pi$ rather than by unity. Then, to make the dotted lines match the solid lines exactly, it would be necessary to decrease the incident and scattered line spacings slightly; this would require a slight increase in the light frequency or a slight decrease in the angle of incidence (relative to the normal).

FIGURE 2.5.5. Bragg scattering from a crystal. Dashed lines are wavefronts of incident wave covariant vector $\tilde{\mathbf{p}}$ and (negative) reflected wave covariant vector $-\tilde{\mathbf{q}}$. Dotted lines represent covariant vector $\tilde{\mathbf{p}} - \tilde{\mathbf{q}}$. As drawn the Bragg condition is almost, but not quite, satisfied.

Elegant as it is, Eq. (2.5.31) becomes yet more marvelous when conservation of momentum is invoked along with the de Broglie relation between momentum and wavelength. By introducing Planck’s constant we can state that if the momentum of an incident photon is given by $\tilde{\mathbf{p}}^{\rm inc} = \hbar\tilde{\mathbf{p}}$ and of a reflected photon by $\tilde{\mathbf{p}}^{\rm ref} = \hbar\tilde{\mathbf{q}}$, then conservation of momentum yields for the recoil momentum of the lattice as a whole, $\tilde{\mathbf{p}}^{\rm lat} = \hbar\tilde{\mathbf{r}}$. (Apologies for the dimensionally inconsistent notation.) Then Eq. (2.5.31) becomes

$$\frac{\tilde{\mathbf{p}}^{\rm lat}}{h} = \tilde{\boldsymbol{\eta}}^2 - \tilde{\boldsymbol{\eta}}^1. \tag{2.5.32}$$

(Naturally, for scattering from another set of planes, the right-hand side of this equation contains the appropriately different reciprocal lattice vector.) One of the justifications for calling this equation “marvelous” is that both sides of the equation refer to the lattice, leaving no reference to the incident and scattered photon beams. This suggests, and the suggestion is confirmed experimentally, that other elementary processes, such as bremsstrahlung and electron-positron pair production, will acquire coherent enhancement when the crystal’s recoil momentum matches one of its reciprocal lattice vectors as in Eq. (2.5.32). Some of these considerations may well have been behind de Broglie’s original conjecture relating momentum and inverse wavelength. For our purposes it is perhaps more appropriate to regard these results as telling us that the natural interpretation of momentum is as a covariant vector. This statement is especially evident when the condition is written out in covariant components. Then Eq. (2.5.32) becomes

$$p^{\rm lat}_1 = -h, \qquad p^{\rm lat}_2 = h.$$
(This is dimensionally consistent because the reciprocal basis vectors have the dimensions of inverse length.) The generalization of this equation to arbitrary crystal planes is that the lattice recoil momentum (divided by $h$) has to be a superposition of reciprocal basis vectors with integer coefficients.

The following comments, though not germane to classical mechanics, may be of interest nonetheless. We have here stressed a connection of Planck’s constant $h$ with momentum that can be investigated experimentally using Bragg scattering from a crystal. There is an independent line of investigation, for example using the photoelectric effect, in which $h$ is related to the energy of a photon. Comparison of the values of $h$ obtained from these entirely independent types of experiment provides a serious test of the overall consistency of the quantum theory.

By this point the reader should be convinced that the natural geometric representation of a covariant vector is as contour lines (in spite of any previously held prejudice to the contrary based on thinking of a gradient vector as an arrow pointing in the direction of maximum rate of ascent). It may take longer to become convinced that in the context of smooth manifolds (such as the set of configurations of a system described by generalized coordinates in Lagrangian mechanics) this is the only possible interpretation of a covariant vector. It is because, in that case, there is no such thing as a direction of maximum rate of ascent.

Numerous important topics concerning the use of skew frames (i.e., frames linearly related to Euclidean frames) remain to be covered. They are characterized by the property that components of the metric tensor are constant, independent of position. We will be returning to these topics starting in Section 4.1.1. The reader might prefer to proceed there directly, skipping the discussion of curvilinear (nonlinear) reference frames in the next chapter. As mentioned before, even in nonlinear situations, a limited region can always be defined in which linear analysis is approximately valid.
Problem 2.5.7: With patterns photocopied onto transparency material from Fig. 2.5.6, two sets of parallel lines, representing incident and reflected wavefronts, are to be laid over the “lattice” of tiny circles in a configuration for which the Bragg condition is satisfied. Cell atoms are located at (0,0), (3,0), (1,3), (4,3), etc. and, in the same units, the spacing between lines is 21/22. There is not a unique solution. If managing three arrays proves to be too frustrating, a fairly good solution can be obtained using the planes in the lower left corner of the figure.

FIGURE 2.5.6. Patterns for investigating covariant vectors and the Bragg reflection condition. This page is to be copied (twice) onto transparency material in order to perform Problem 2.5.7. The purpose of the two patterns on the left is to give a preview of the sort of pattern being looked for. The virtue of a pattern of circles is that all possible orientations of parallel (if only tangent) lines are represented. The problem uses only the patterns on the right.
Problem 2.5.8: For Bragg scattering from the plane, mentioned below Eq. (2.5.27), containing the origin and the point $m_1\boldsymbol{\eta}_1 + m_2\boldsymbol{\eta}_2$, find the recoil momentum $\mathbf{r}$ corresponding to Bragg scattering with the longest wavelength possible and obtain a formula for that wavelength.

*2.6. UNITARY GEOMETRY^28
Yet another kind of geometry, namely unitary geometry, can be introduced. This is the “geometry” of quantum mechanics. In physics one prefers to avoid using complex numbers because physical quantities are always real. But complex numbers inevitably result when the equations of the theory are solved. In quantum mechanics one accepts this inevitability and allows complex numbers from the start. However, since the eigenvalues of physical operators are interpreted as the values of physical measurements, it is necessary to limit the operators to those that have real eigenvalues, namely Hermitean operators. This can be regarded as one of the main principles of quantum mechanics. This is fortunate because the resulting simplification is great. In classical mechanics complex quantities also necessarily intrude, and so also do eigenvalue problems, and with them complex eigenvalues. But these eigenvalues do not have to be real, and there is no justification for restricting operators to those that are Hermitean. This is unfortunate, because the resultant formalism is “heavy.” As a result, this mathematics has not been much used in classical mechanics and cannot be said to be essential to the subject. Nevertheless the unitary formalism is consistent with the ideas that have been encountered in this chapter. It may also be interesting to see quantum methods applied to classical problems. However, since our operators will in general not be Hermitean, results cannot be carried from one field to the other as easily as we might wish. The main application of the theory in this section will be to analyze the solution of singular linear equations. This is more nearly algebra than geometry, but the results can be interpreted geometrically. They will be used while discussing linearized Hamiltonians and will be essential for the development of perturbation theory in the last chapter of the text. Other than this, the material will not be used. 
^28 This section and the rest of the chapter can be skipped until needed. The results are used only in discussing Hamiltonian eigenvectors starting in Section 15.3.4 and in developing near-symplectic perturbation theory in Chapter 16. On the other hand, its connection with quantum mechanical formalism may be of interest; it also serves as a fairly elementary generalization of ordinary Euclidean geometry and illustrates the association of a form with a vector.

Because complex eigenvalues and eigenvectors are unavoidable in analyzing linearized Hamiltonian equations, it provides major simplification to introduce “adjoint” equations and solutions. It is for similar reasons that Dirac’s “bras” and “kets” are used in quantum mechanics. For any two (possibly complex) vectors $\mathbf{w}$ and $\mathbf{z}$, a “Hermitean scalar product” is defined by the sum
$$(\mathbf{w}, \mathbf{z}) = w^{i*}z^i; \tag{2.6.1}$$

for the rest of this section two vectors placed in parentheses like this will have this meaning. One can say therefore that the elements $w^{i*}$ are components of a covariant vector or form in a space dual to the space containing $\mathbf{z}$. In this way an association is established between the original space and the dual space; the association is implemented by complex conjugation. This definition has the feature that

$$(\mathbf{z}, \mathbf{z}) = z^{i*}z^i = |\mathbf{z}|^2 \tag{2.6.2}$$

is necessarily real and positive (unless $\mathbf{z} = 0$). This is the “metric” of unitary geometry. The vanishing of the product $(\mathbf{w}, \mathbf{z})$ can be expressed as “$\mathbf{w}$ is orthogonal to $\mathbf{z}$.” More explicitly this means that the form associated with $\mathbf{w}$ vanishes when evaluated on vector $\mathbf{z}$. The adjoint matrix $\mathbf{A}^\dagger$ is obtained from $\mathbf{A}$ by the combined operation of transposition and complex conjugation (in either order);

$$\mathbf{A}^\dagger = \mathbf{A}^{*T}. \tag{2.6.3}$$

(In the case that $\mathbf{A}$ is real, which will normally be true in our applications, the adjoint is simply the transpose.) The motivation for this definition is provided by the following equation:

$$(\mathbf{w}, \mathbf{A}\mathbf{z}) = (\mathbf{A}^\dagger\mathbf{w}, \mathbf{z}). \tag{2.6.4}$$

Definition (2.6.3) is applicable to vectors as well; that is, $\mathbf{z}^\dagger$ is the row vector whose elements are the complex conjugates of the elements of column vector $\mathbf{z}$. Hence Eq. (2.6.1) can also be written as

$$(\mathbf{w}, \mathbf{z}) = \mathbf{w}^\dagger\mathbf{z}. \tag{2.6.5}$$
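These definitions translate directly into a few lines of Python (NumPy). The sketch below is only an illustration with randomly generated complex data; the helper name `herm` and the seed are choices made here, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
w = rng.normal(size=n) + 1j * rng.normal(size=n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)

def herm(u, v):
    """Hermitean scalar product (u, v) = u^{i*} v^i of Eq. (2.6.1)."""
    return np.vdot(u, v)   # np.vdot complex-conjugates its first argument

A_dag = A.conj().T         # adjoint matrix, Eq. (2.6.3)

lhs = herm(w, A @ z)       # (w, A z)
rhs = herm(A_dag @ w, z)   # (A^dagger w, z), Eq. (2.6.4)
norm_sq = herm(z, z)       # |z|^2 of Eq. (2.6.2)

print(np.isclose(lhs, rhs), np.isclose(norm_sq.imag, 0.0), norm_sq.real > 0)
```

Here `np.vdot` plays the role of $\mathbf{w}^\dagger\mathbf{z}$ in Eq. (2.6.5), since it conjugates its first argument before summing.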
Much of mechanics can be reduced to the solution of the equation

$$\frac{d\mathbf{z}}{dt} = \mathbf{A}\mathbf{z}, \tag{2.6.6}$$

where $\mathbf{z}$ is a column vector of unknowns and $\mathbf{A}$ a matrix whose elements characterize the (linearized) dependence of “velocities” $d\mathbf{z}/dt$ on position $\mathbf{z}$. In principle all these quantities (except $t$) could be complex, but it will not hurt to think of them all as real. The equation that is said to be adjoint to (2.6.6) is

$$\frac{d\mathbf{x}}{dt} = -\mathbf{A}^\dagger\mathbf{x}. \tag{2.6.7}$$

This is not the same as “taking the adjoint” of Eq. (2.6.6), which yields

$$\frac{d\mathbf{z}^\dagger}{dt} = \mathbf{z}^\dagger\mathbf{A}^\dagger. \tag{2.6.8}$$
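An important feature making the adjoint solution useful can be seen in the following. Let $\mathbf{x}(t)$ be a solution of Eq. (2.6.7) and $\mathbf{y}(t)$ a solution of Eq. (2.6.6). We evaluate

$$\frac{d}{dt}(\mathbf{x}, \mathbf{y}) = (-\mathbf{A}^\dagger\mathbf{x}, \mathbf{y}) + (\mathbf{x}, \mathbf{A}\mathbf{y}) = 0. \tag{2.6.9}$$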
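The constancy expressed by Eq. (2.6.9) can be observed by integrating both equations numerically. The following Python (NumPy) sketch assumes a constant, real, randomly chosen matrix $\mathbf{A}$ and uses a hand-written fourth-order Runge-Kutta integrator; the step size, the seed, and the function name `rk4` are choices made for this illustration.

```python
import numpy as np

def rk4(f, u, dt, steps):
    """Integrate du/dt = f(u) with the classical fourth-order Runge-Kutta scheme."""
    for _ in range(steps):
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        u = u + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return u

rng = np.random.default_rng(2)   # arbitrary seed
A = rng.normal(size=(3, 3))      # constant real matrix (assumption)
A_dag = A.conj().T
x0 = rng.normal(size=3)
y0 = rng.normal(size=3)

x1 = rk4(lambda x: -A_dag @ x, x0, 1e-3, 1000)   # adjoint equation (2.6.7), up to t = 1
y1 = rk4(lambda y: A @ y, y0, 1e-3, 1000)        # direct equation (2.6.6), up to t = 1

before = np.vdot(x0, y0)   # (x, y) at t = 0
after = np.vdot(x1, y1)    # (x, y) at t = 1

print(np.isclose(before, after, atol=1e-6))
```

Individually $\mathbf{x}(t)$ and $\mathbf{y}(t)$ change substantially over the interval; only their scalar product is preserved (to within the integration error).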
That is, the scalar product of any solution of the direct equation with any solution of the adjoint equation is a constant. Perhaps the solutions of Eq. (2.6.6) that are easiest to visualize are those whose initial values are $\mathbf{z}_1(0) = (1, 0, 0, \ldots, 0)^T$, $\mathbf{z}_2(0) = (0, 1, 0, \ldots, 0)^T$, and so on. One can form the “transfer matrix” $\mathbf{M}(t)$ of Eq. (2.6.6) by assembling these solutions as the columns

$$\mathbf{M}(t) = \begin{pmatrix} \mathbf{z}_1(t) & \mathbf{z}_2(t) & \cdots \end{pmatrix}. \tag{2.6.10}$$
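Note that $\mathbf{M}(0) = \mathbf{1}$, the identity matrix. A transfer matrix $\mathbf{M}^{\rm adj}(t)$ of the adjoint equation can be defined similarly. The equations

$$\frac{d}{dt}\mathbf{M}^{\rm adj} = -\mathbf{A}^\dagger\mathbf{M}^{\rm adj} \quad\text{and}\quad \frac{d}{dt}\mathbf{M} = \mathbf{A}\mathbf{M} \tag{2.6.11}$$

can be used to show that the transfer matrices are related by

$$\mathbf{M}^{\rm adj}(t) = (\mathbf{M}^\dagger)^{-1}(t). \tag{2.6.12}$$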
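For a constant matrix $\mathbf{A}$ the transfer matrices are matrix exponentials, and relation (2.6.12) can be checked directly. The Python (NumPy) sketch below assumes that $\mathbf{A}$ is constant and diagonalizable (true for a generic random matrix); the helper `expm_eig`, the seed, and the time $t = 0.7$ are choices made for this illustration.

```python
import numpy as np

def expm_eig(B, t):
    """exp(B t) computed from the eigendecomposition; assumes B is diagonalizable."""
    lam, P = np.linalg.eig(B)
    # Scale column j of P by exp(lam_j t), then transform back
    return (P * np.exp(lam * t)) @ np.linalg.inv(P)

rng = np.random.default_rng(3)   # arbitrary seed
A = rng.normal(size=(4, 4))      # constant real matrix (assumption)
t = 0.7

M = expm_eig(A, t)                   # transfer matrix of dz/dt = A z; columns z_1(t), z_2(t), ...
M_adj = expm_eig(-A.conj().T, t)     # transfer matrix of the adjoint equation
rhs = np.linalg.inv(M.conj().T)      # (M^dagger)^{-1}, right-hand side of Eq. (2.6.12)

print(np.allclose(M_adj, rhs), np.allclose(expm_eig(A, 0.0), np.eye(4)))
```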
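This relation is true at $t = 0$ since both sides are equal to $\mathbf{1}$ and it continues to be true for all $t$ since, using Eqs. (2.6.11),

$$\frac{d}{dt}(\mathbf{M}^\dagger)^{-1} = -(\mathbf{M}^\dagger)^{-1}\frac{d\mathbf{M}^\dagger}{dt}(\mathbf{M}^\dagger)^{-1} = -(\mathbf{M}^\dagger)^{-1}\mathbf{M}^\dagger\mathbf{A}^\dagger(\mathbf{M}^\dagger)^{-1} = -\mathbf{A}^\dagger(\mathbf{M}^\dagger)^{-1}. \tag{2.6.13}$$

In discussing perturbation theory we will frequently need the transfer matrix of the adjoint equation; we will always express $\mathbf{M}^{\rm adj}$ as it appears on the right-hand side of Eq. (2.6.12) to reduce by one the number of confusingly related functions. The solutions of Eqs. (2.6.6) and (2.6.7),

$$\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x} \quad\text{and}\quad \frac{d\mathbf{z}}{dt} = -\mathbf{A}^\dagger\mathbf{z}, \tag{2.6.14}$$

are not particularly simply related. The best one can do is Eq. (2.6.9) and Eq. (2.6.12), along with the following converse statement. If a function $\psi(\mathbf{x}, t) = (\mathbf{z}(t), \mathbf{x}(t))$ (obtained from test function $\mathbf{z}(t)$ and arbitrary solution $\mathbf{x}(t)$ of the first equation) is a constant of the motion of the first of these equations, then $\mathbf{z}(t)$ satisfies the second equation. To confirm this, evaluate the rate of change of $\psi$ (which vanishes by hypothesis);

$$0 = \frac{d\psi}{dt} = \left(\frac{d\mathbf{z}}{dt}, \mathbf{x}\right) + (\mathbf{z}, \mathbf{A}\mathbf{x}) = \left(\frac{d\mathbf{z}}{dt} + \mathbf{A}^\dagger\mathbf{z}, \mathbf{x}\right). \tag{2.6.15}$$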
Assuming an arbitrary function can be formed by superposition of solutions $\mathbf{x}(t)$, the desired result follows.

In solving Eq. (2.6.6), the eigenvectors of $\mathbf{A}$ play an important role. We can exploit Dirac-like notation to provide a “box” to contain the eigenvalue of a vector $\mathbf{w} = (w^1, w^2, \ldots, w^{2n})^T$ that is known to be an eigenvector of a matrix (operator) $\mathbf{A}$ with eigenvalue $\lambda$. This provides a convenient labeling mechanism. For simplicity we assume $\lambda$ is nondegenerate. Symbolizing the eigenvector by $\mathbf{w} = |\lambda\rangle$, it therefore satisfies

$$\mathbf{A}|\lambda\rangle = \lambda|\lambda\rangle. \tag{2.6.16}$$
Its adjoint is $|\lambda\rangle^\dagger = (w^{1*}, w^{2*}, \ldots, w^{2n*})$. If $\mathbf{A}$ has $\mu$ as an eigenvalue, then $\mathbf{A}^\dagger$ has $\mu^*$ as eigenvalue. Define $\langle\mu|^\dagger$ to be the eigenvector of $\mathbf{A}^\dagger$ with eigenvalue $\mu^*$; it therefore satisfies

$$\mathbf{A}^\dagger\langle\mu|^\dagger = \mu^*\langle\mu|^\dagger. \tag{2.6.17}$$

The adjoint of this relation is

$$\langle\mu|\mathbf{A} = \mu\langle\mu|. \tag{2.6.18}$$

$\langle\mu|$ can therefore also be said to be the “left eigenvector” of $\mathbf{A}$ with eigenvalue $\mu$. The scalar product of row vector $\langle u| = (u_1, u_2, \ldots, u_{2n})$ with $|\lambda\rangle = \mathbf{w}$ is defined by

$$\langle u|\lambda\rangle = u_i w^i, \tag{2.6.19}$$

which is the same as straight matrix multiplication. In general $\langle\lambda|$ and $|\lambda\rangle$, being left and right eigenvectors of the same matrix with the same eigenvalue, are not simply related, and neither are $\langle\lambda^*|$ and $|\lambda\rangle$. (Confirm this for yourself on a simple matrix like
(k i).)
But in the special case that $\mathbf{A}$ is “self-adjoint” or “Hermitean,”

$$\mathbf{A}^\dagger = \mathbf{A}, \tag{2.6.20}$$

the situation is simpler. In quantum mechanics it is taken as a matter of principle that operators representing observable quantities satisfy (2.6.20), since this condition ensures that the eigenvalues are real. (You should reconstruct the proof that this is true.) We will not be able to make this assumption universally, but if $\mathbf{A}$ is Hermitean, Eq. (2.6.18) becomes

$$\mathbf{A}\langle\mu|^\dagger = \mu\langle\mu|^\dagger, \quad\text{or}\quad \langle\mu|^\dagger = |\mu\rangle. \tag{2.6.21}$$
Under the same circumstance ($\mathbf{A}^\dagger = \mathbf{A}$), we have

$$\langle\lambda|\lambda\rangle = \sum_i w^{i*}w^i = (\mathbf{w}, \mathbf{w}) \equiv |\mathbf{w}|^2. \tag{2.6.22}$$
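The distinction between left and right eigenvectors is easy to see numerically. In the Python (NumPy) sketch below both matrices are hypothetical examples chosen for this illustration (they are not the matrices printed in the text): a non-Hermitean matrix for which $\langle\lambda|^\dagger$ and $|\lambda\rangle$ differ, and a Hermitean matrix for which they coincide as in Eq. (2.6.21).

```python
import numpy as np

# Hypothetical non-Hermitean example with eigenvalues 2 and 3
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

lam, right = np.linalg.eig(A)                   # A |lam> = lam |lam>
mu_conj, bra_dag = np.linalg.eig(A.conj().T)    # A^dagger <mu|^dagger = mu^* <mu|^dagger

# Select the eigenvalue 2 in both lists (the ordering returned by eig is not guaranteed)
i = np.argmin(abs(lam - 2.0))
j = np.argmin(abs(mu_conj - 2.0))
ket = right[:, i]             # right eigenvector |2>
bra = bra_dag[:, j].conj()    # left eigenvector <2| as a row vector

left_ok = np.allclose(bra @ A, 2.0 * bra)   # Eq. (2.6.18)
overlap = abs(bra @ ket)                    # would be 1 if <2|^dagger were parallel to |2>

# Hypothetical Hermitean example: the adjoint of a right eigenvector is a left eigenvector
H = np.array([[2.0, 1.0j],
              [-1.0j, 3.0]])
mu, kets = np.linalg.eigh(H)
ket_h = kets[:, 0]
herm_ok = np.allclose(ket_h.conj() @ H, mu[0] * ket_h.conj())

print(left_ok, overlap < 0.99, herm_ok)
```

Both eigenvectors returned by NumPy are normalized, so an overlap noticeably smaller than 1 signals that the left and right eigenvectors of the non-Hermitean matrix are not adjoints of one another.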
2.6.1. Solution of Singular Sets of Linear Equations

Adjoint matrices can be usefully applied to the problem of solving linear algebraic equations of the form

$$\mathbf{A}\mathbf{x} = \mathbf{b}, \tag{2.6.23}$$

where $\mathbf{A}$ is a square, $2n \times 2n$ matrix and $\mathbf{b}$ is a $2n$-element column vector. (The number of equations being even has no significance, other than that this is always the case in Hamiltonian mechanics.) All elements are (possibly complex) constants. In the trivial case that $\det|\mathbf{A}| \neq 0$, the solution is

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}, \tag{2.6.24}$$
but the situation is more complicated if $\det|\mathbf{A}| = 0$, since Eq. (2.6.23) may or may not be solvable. In ordinary geometry Eq. (2.6.23) could be the formulation of a problem “find vector $\mathbf{x}$ whose dot products with a set of vectors $\mathbf{a}_{(1)}, \mathbf{a}_{(2)}, \ldots$ are equal to the set of values $\mathbf{b} = (b_1, b_2, \ldots)$.”^29 The matrix $\mathbf{A}$ would have $\mathbf{a}^T_{(1)}$ as its upper row of elements, $\mathbf{a}^T_{(2)}$ as its second row, and so on. The determinant of $\mathbf{A}$ vanishes if any combination of the $\mathbf{a}$’s are mutually dependent, say for simplicity $\mathbf{a}_{(1)} = \mathbf{a}_{(2)}$. In this case, Eq. (2.6.23) would be solvable, if at all, only if $b_1 = b_2$. In general, dependency among the rows of $\mathbf{A}$ has to be reflected in corresponding dependency among the elements of $\mathbf{b}$.

If $\det|\mathbf{A}| = 0$, there is a nonvanishing vector $\mathbf{z}$ satisfying the equation $\mathbf{A}^T\mathbf{z} = 0$. (For the simple example mentioned in the previous paragraph this would be $\mathbf{z} = (1, -1, 0, 0, \ldots)$.) A necessary condition that has to be satisfied by $\mathbf{b}$ for the equations to be solvable can be obtained from the matrix product $\mathbf{z}^T\mathbf{b} = \mathbf{z}^T\mathbf{A}\mathbf{x} = (\mathbf{A}^T\mathbf{z})^T\mathbf{x} = 0$. One would therefore conclude that Eq. (2.6.23) could be solvable only if $\mathbf{b}$ is orthogonal to $\mathbf{z}$. For the simple example this implies $b_1 = b_2$, as already stated.

The initial statement of the problem just discussed made it appear that the set of equations under discussion was specific to metric geometry, but this is misleading. With no metric defined, no product of vector $\mathbf{a}_{(i)}$ with vector $\mathbf{x}$ can be defined. But the rows of the original matrix $\mathbf{A}$ can always be regarded as the components $a_{(i)j}$ of one-forms $\tilde{\mathbf{a}}_{(i)}$. Then the individual equations are $a_{(i)j}x^j = b_i$. It is then no longer appropriate to regard $\mathbf{b}$ as a vector. These reinterpretations of the meaning of quantities entering Eq. (2.6.23) have no effect on the necessary condition for its solvability, since it was derived algebraically. The solvability condition is $(\tilde{\mathbf{b}}, \mathbf{z}) = 0$, which involves only the invariant product defined for any linear vector space.

^29 All entries in $\mathbf{A}$ and $\mathbf{b}$ would be real in ordinary geometry, and solutions for which entries in $\mathbf{x}$ are complex would be declared unacceptable. Such solutions would however be judged acceptable from the point of view of this section.

In the remainder of this section, conditions like this one for the solvability of Eq. (2.6.23) will be based on the Hermitean product (2.6.1). In this case it is a priori natural to think of $\mathbf{b}$ and the rows of $\mathbf{A}$ as forms. Though the overhead tildes will not be used, the temptation to think of $\mathbf{b}$ and $\mathbf{x}$ as belonging to the same space should be resisted. Since things become seriously messy if 0 is a multiple root of $\mathbf{A}$, we will initially assume this to not be the case. Let $\mathbf{y}$ and $\mathbf{z}$ then be the solutions (unique up to multiplicative factors) of the “homogeneous” and “adjoint homogeneous” equations:

$$\mathbf{A}\mathbf{y} = 0 \quad\text{and}\quad \mathbf{A}^\dagger\mathbf{z} = 0. \tag{2.6.25}$$
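Conjecturing that $\mathbf{x}$ satisfies Eq. (2.6.23), we form the scalar

$$(\mathbf{z}, \mathbf{b}) = (\mathbf{z}, \mathbf{A}\mathbf{x}) = (\mathbf{A}^\dagger\mathbf{z}, \mathbf{x}) = 0. \tag{2.6.26}$$

Certainly then, Eq. (2.6.23) can be solved only if $\mathbf{b}$ is “orthogonal” to $\mathbf{z}$ in the sense that $(\mathbf{z}, \mathbf{b}) = 0$. Conversely, suppose

$$(\mathbf{z}, \mathbf{b}) = 0. \tag{2.6.27}$$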
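The necessity of condition (2.6.27) can be illustrated with the “repeated row” example mentioned earlier. The Python (NumPy) sketch below uses a hypothetical $3 \times 3$ matrix whose first two rows are equal; the helper `null_vector`, which reads the null vector off a singular value decomposition, and the right-hand sides are choices made for this illustration.

```python
import numpy as np

def null_vector(B):
    """Unit vector v with B v = 0, from the SVD (assumes a single zero singular value)."""
    _, _, Vh = np.linalg.svd(B)
    return Vh[-1].conj()    # right singular vector of the smallest singular value

# Hypothetical singular matrix: the second row repeats the first
A = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0]])

z = null_vector(A.conj().T)            # adjoint homogeneous solution, A^dagger z = 0

b_good = np.array([5.0, 5.0, -1.0])    # b_1 = b_2, so (z, b) = 0
b_bad = np.array([5.0, 4.0, -1.0])     # b_1 != b_2, so (z, b) != 0

# Least-squares "solutions"; the residual vanishes only when the equations are solvable
x_good = np.linalg.lstsq(A, b_good, rcond=None)[0]
x_bad = np.linalg.lstsq(A, b_bad, rcond=None)[0]

print(abs(np.vdot(z, b_good)) < 1e-12, np.allclose(A @ x_good, b_good))
print(abs(np.vdot(z, b_bad)) > 0.1, np.allclose(A @ x_bad, b_bad))
```

The vector `z` comes out proportional to $(1, -1, 0)$, and only the right-hand side orthogonal to it admits an exact solution.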
The vector $\mathbf{z}$ can be chosen as a basis vector in the dual space, say the first, and $2n - 1$ more basis vectors can be chosen, arbitrary except for being independent. We assemble these column vectors into a $2n \times 2n$ matrix $\mathbf{Z}$. In a similar way we form a square matrix $\mathbf{Y}$ whose first column is $\mathbf{y}$ (defined above) and whose other columns are arbitrary but independent. Next form the matrix

$$\mathbf{Z}^\dagger\mathbf{A}\mathbf{Y} = \begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_0 \end{pmatrix}, \tag{2.6.28}$$

which has been partitioned into a $1 \times 1$ element in the upper left, a $(2n - 1) \times (2n - 1)$ matrix $\mathbf{A}_0$ in the lower right, and so on. The elements shown to vanish do so because of the conditions previously placed on $\mathbf{z}$ and $\mathbf{y}$. Also, because of the assumed absence of multiple roots, the matrix $\mathbf{A}_0$ has to be nonsingular. We return to the original Eq. (2.6.23), changing variables according to
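$$\mathbf{x}' = \mathbf{Y}^{-1}\mathbf{x}, \quad\text{and defining}\quad \mathbf{b}' = \mathbf{Z}^\dagger\mathbf{b}. \tag{2.6.29}$$

The equations become

$$\begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_0 \end{pmatrix}\mathbf{x}' = \mathbf{b}' \equiv \begin{pmatrix} (\mathbf{z}, \mathbf{b}) \\ \mathbf{b}_0 \end{pmatrix}. \tag{2.6.30}$$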
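The uppermost element of $\mathbf{b}'$ is $(\mathbf{z}, \mathbf{b})$. If this vanishes, then Eq. (2.6.30) is solvable; otherwise it is not. Supposing that $(\mathbf{z}, \mathbf{b}) = 0$, we can find a “particular solution” of Eq. (2.6.23), $(0, \mathbf{x}_0^T)^T$, whose uppermost element is 0 and whose remaining elements, denoted $\mathbf{x}_0$, satisfy

$$\mathbf{A}_0\mathbf{x}_0 = \mathbf{b}_0 \quad\text{or}\quad \mathbf{x}_0 = \mathbf{A}_0^{-1}\mathbf{b}_0, \tag{2.6.31}$$

where $\mathbf{b}_0$ denotes the elements of $\mathbf{b}'$ below the uppermost one.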
The matrix inversion is necessarily possible. From here on, the vector $\mathbf{x}_0$ will be augmented by another uppermost component with value 0. We have found therefore that the condition $(\mathbf{z}, \mathbf{b}) = 0$, with $\mathbf{z}$ satisfying the second of Eqs. (2.6.25), is necessary and sufficient for the solvability of the original equation in the event that $\det|\mathbf{A}| = 0$.

Assured of the existence of a solution $\mathbf{x}_0$, we ask whether it is unique. Obviously not, since $\mathbf{x}_0$ can be augmented by any multiple of $\mathbf{y}$ and still remain a solution. Since one prefers to have unique solutions, it is worth establishing a disciplined procedure for placing a minimal further requirement or requirements that pick out, from the set of solutions, the particular one that satisfies the extra requirements. In this spirit we insist that $\mathbf{x}$ also satisfy

$$(\mathbf{y}, \mathbf{x}) = 0, \quad\text{which is to say}\quad \mathbf{y}^\dagger\mathbf{x} = 0. \tag{2.6.32}$$
Consider then the equation

$$(\mathbf{A} - \mathbf{z}\mathbf{y}^\dagger)\mathbf{x} = \mathbf{b}, \tag{2.6.33}$$

which is the same as the original equation except for the second term. (Note that the second term (in parentheses) is a column vector multiplying a row vector, which yields a square matrix of the same size as $\mathbf{A}$.) But our condition (2.6.32) makes the second term vanish. The unique solution to our augmented problem is therefore given by

$$\mathbf{x} = (\mathbf{A} - \mathbf{z}\mathbf{y}^\dagger)^{-1}\mathbf{b}, \tag{2.6.34}$$
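Formula (2.6.34) can be tried out on a small example. The Python (NumPy) sketch below uses a hypothetical singular $3 \times 3$ matrix with a simple zero eigenvalue and a right-hand side chosen to satisfy $(\mathbf{z}, \mathbf{b}) = 0$; the helper `null_vector` is an assumption of this illustration, not a construction from the text.

```python
import numpy as np

def null_vector(B):
    """Unit vector v with B v = 0, from the SVD (assumes a single zero singular value)."""
    _, _, Vh = np.linalg.svd(B)
    return Vh[-1].conj()

# Hypothetical singular matrix (repeated first row) with a simple zero eigenvalue
A = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0]])
b = np.array([5.0, 5.0, -1.0])     # satisfies the solvability condition (z, b) = 0

y = null_vector(A)                 # homogeneous solution, A y = 0
z = null_vector(A.conj().T)        # adjoint homogeneous solution, A^dagger z = 0

M = A - np.outer(z, y.conj())      # A - z y^dagger; np.outer does not conjugate
x = np.linalg.solve(M, b)          # Eq. (2.6.34)

print(np.allclose(A @ x, b), abs(np.vdot(y, x)) < 1e-12)
```

The result solves the original singular equation and automatically satisfies the subsidiary condition (2.6.32); adding any multiple of `y` would give another solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$, but one violating (2.6.32).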
provided this matrix is not itself singular. Since the properties of z have not been used, the factor z could be replaced by (almost) any other, and it is unthinkable that the matrix would always be singular. But for the particular choice of z (the solution to the homogeneous adjoint equation), the matrix in Eq. (2.6.33) can be shown to be nonsingular. For the proof, it is enough to show that the only solution of the equation (A - zyt)yo is yo = 0. This is left as a not particularly easy exercise. Commonly the zero root of A is multiple, say double, in which case the homogeneous equation has two independent solutions, say y1 and y2. Since any linear combination of these is also a solution, these are not uniquely determined but, other than being independent, they can be chosen arbitrarily. Similarly the adjoint homogeneous equation has two independent solutions z1 and 2 2 . All of the above arguments can be generalized to cover this situation. Eq.(2.6.27) is generalized by accumulating the adjoint homogeneous solutionsjust mentioned as the columns of a matrix
V=(z1
22).
(2.6.35)
The necessary and sufficient condition for the solvability of Eq. (2.6.23) is then

$$\mathbf{V}^\dagger\mathbf{b} = 0. \tag{2.6.36}$$
In words, $\mathbf{b}$ must be orthogonal to all solutions of the adjoint homogeneous equation. The main increase in complexity comes in generalizing condition (2.6.32). We accumulate also the solutions of the direct homogeneous equation as the columns of a matrix,

$$\mathbf{U} = \begin{pmatrix} \mathbf{y}_1 & \mathbf{y}_2 \end{pmatrix}, \tag{2.6.37}$$

and place the extra condition on solution $\mathbf{x}$ that

$$\mathbf{U}^\dagger\mathbf{x} = 0. \tag{2.6.38}$$

Replacing Eq. (2.6.33) by

$$(\mathbf{A} - \mathbf{V}\mathbf{U}^\dagger)\mathbf{x} = \mathbf{b}, \tag{2.6.39}$$

the unique solution of the original equation, augmented by condition (2.6.38), is given by

$$\mathbf{x} = (\mathbf{A} - \mathbf{V}\mathbf{U}^\dagger)^{-1}\mathbf{b}. \tag{2.6.40}$$
Roots of even higher multiplicity can be handled in the same way. Later in the text, operators like the one appearing on the right-hand side of this equation, which generate a unique solution consistent with subsidiary conditions, will be symbolized by S, as in
$$\mathbf{x} = \mathbf{S}\mathbf{b}, \quad \text{where} \quad \mathbf{S} = (\mathbf{A} - \mathbf{V}\mathbf{U}^\dagger)^{-1}. \tag{2.6.41}$$
Problem 2.6.1: For the matrix A = ( i) check all the formulas appearing in this section.
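The single-root construction of Eqs. (2.6.32) to (2.6.34) can be tried out on a small case. The following is a minimal sketch, not the matrix of Problem 2.6.1: the 2×2 matrix `A`, the right-hand side `b`, and the helper names are illustrative assumptions, and the matrix is real, so the dagger reduces to a transpose. Exact fractions are used so that the checks are free of rounding.

```python
from fractions import Fraction as F

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    """Real scalar product; for real vectors the dagger is just a transpose."""
    return sum(x * y for x, y in zip(a, b))

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[F(e) for e in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [e / p for e in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [e - factor * q for e, q in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Hypothetical singular matrix (its rows are proportional), chosen for illustration.
A = [[1, 2], [3, 6]]
y = [2, -1]   # solves the homogeneous equation A y = 0
z = [3, -1]   # solves the adjoint homogeneous equation A^T z = 0
b = [1, 3]    # satisfies the solvability condition z . b = 0

# Deflated matrix A - z y^T of Eq. (2.6.33); the outer product z y^T is 2x2.
deflated = [[A[i][j] - z[i] * y[j] for j in range(2)] for i in range(2)]
x = solve(deflated, b)

print(x, matvec(A, x), dot(y, x))
```

The vector `x` returned here solves the original singular equation and also satisfies the subsidiary condition (2.6.32), which is what singles it out among the solutions differing by multiples of `y`.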
When equations like Eq. (2.6.23) arise in a purely algebraic context they are often expected to be nonsingular, for example because they solve a well-posed problem. But it is more common for Eq. (2.6.23) to be singular in a geometric context. For example, in ordinary geometry, if A represents a projection onto the (x, y) plane followed by a rotation around the z axis, then Eq. (2.6.23) can be solved only if b lies in the (x, y) plane, and the solution is unique only up to the possible addition of any vector parallel to the z axis. The equations of this section have similar geometric interpretations for unitary geometry.
In technology (and later in this text) the sort of analysis that has been applied to Eq. (2.6.23) is often somewhat generalized. Let the n-component vector $\mathbf{x} = (x^1, \ldots, x^n)$ be replaced by $\mathbf{x}(t)$, a vector function depending on a continuous parameter t. Apart from possible new requirements, such as continuity, this amounts to
little more than letting $n \to \infty$. Suppose also that b is replaced by “drive” f(t) and A by linear “response operator” $m\,d^2/dt^2 + k$:
$$\left(m\frac{d^2}{dt^2} + k\right) x(t) = f(t). \tag{2.6.42}$$
This equation can have “transient” solutions with $f = 0$ and “driven” solutions with $f \neq 0$. Important practical issues are whether a driven solution exists and, if it does, how the transient content of the solution is made unique by insisting that other conditions be satisfied. Examples of the use of the notation, manipulations, and results of this section can be found starting in Section 15.3.4.
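As a hedged illustration of the transient and driven solutions of Eq. (2.6.42), suppose the drive is sinusoidal, $f(t) = F\cos\Omega t$. This particular drive, and the symbols $F$, $\Omega$, $\omega_0$, $a$, and $b$, are assumptions introduced only for this sketch.

```latex
% Assumed drive: f(t) = F cos(Omega t); natural frequency omega_0^2 = k/m.
\begin{align*}
x_{\rm tr}(t) &= a\cos\omega_0 t + b\sin\omega_0 t, \qquad \omega_0^2 = k/m
  && (f = 0),\\
x_{\rm dr}(t) &= \frac{F\cos\Omega t}{k - m\Omega^2}, \qquad \Omega \neq \omega_0
  && (f = F\cos\Omega t),\\
x(t) &= x_{\rm dr}(t) + x_{\rm tr}(t).
\end{align*}
```

The constants $a$ and $b$ are the "transient content" that must be fixed by extra conditions, for example initial conditions. When $\Omega = \omega_0$ no solution of the periodic form $C\cos\Omega t$ exists, because the response operator annihilates that function; this is the analog of the singular case discussed above.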
BIBLIOGRAPHY
References
1. R. Abraham and J. E. Marsden, Foundations of Mechanics, Addison-Wesley, Reading, MA, 1985.
2. B. F. Schutz, Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge, UK, 1995.
3. V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed., Springer-Verlag, New York, 1989.
4. B. L. Van der Waerden, Algebra, Vol. 1, Springer-Verlag, New York, 1991.
References for Further Study
Section 2.2
E. Cartan, Leçons sur la géométrie des espaces de Riemann, Gauthier-Villars, Paris, 1951. (English translation available.)
J. A. Schouten, Tensor Analysis for Physicists, 2nd ed., Oxford University Press, Oxford, 1954.
Section 2.5
E. Cartan, The Theory of Spinors, Dover, 1981.
Section 2.6
H. Weyl, The Theory of Groups and Quantum Mechanics, Dover, New York, pp. 1-27.
V. Yakubovitch and V. M. Starzhinskii, Linear Differential Equations with Periodic Coefficients, Wiley, New York, 1975.
3 GEOMETRY OF MECHANICS II: CURVILINEAR
3.1. (REAL) CURVILINEAR COORDINATES IN N-DIMENSIONS
3.1.1. Introduction
In this section the description of orbits in real n-dimensional Euclidean space is considered, but using nonrectangular coordinates. The case n = 3 will be called “ordinary geometry.” Generalizing to cases with n > 3 is unnecessary for describing trajectories in ordinary space, but it begins to approach the generality of mechanics, where realistic problems require the introduction of arbitrary numbers of generalized coordinates. Unfortunately the Euclidean requirement (i.e., the Pythagorean theorem) is typically not satisfied in generalized coordinates. However, analysis of curvilinear coordinates in ordinary geometry already requires the introduction of mathematical methods like those needed in more general situations. It seems sensible to digest this mathematics in this intuitively more familiar setting rather than in the more abstract mathematical setting of differentiable manifolds.
In the n = 3 case, much of the analysis to be performed may already be familiar, for example from courses in electricity and magnetism. For calculating fields from symmetric charge distributions, for example one that is radially symmetric, it is obviously convenient to use spherical rather than rectangular coordinates. This is even more true for solving boundary value problems with curved boundaries. For solving such problems, curvilinear coordinate systems that conform with the boundary must be used. It is therefore necessary to be able to express the vector operations of gradient, divergence, and curl in terms of these “curvilinear” coordinates. Vector theorems such as Gauss’s and Stokes’s need to be similarly generalized.
In electricity and magnetism one tends to restrict oneself to geometrically simple coordinate systems such as polar or cylindrical systems, and in those cases some of the following formulas can be derived by more elementary methods. Here we consider general curvilinear coordinates where local axes are not only not parallel at different points in space (as is true already for polar and cylindrical coordinates) but may be skew, not orthonormal. Even the description of force-free particle motion in terms of such curvilinear coordinates is not trivial; confirm this by describing force-free motion using cylindrical coordinates.
More generally, one is interested in particle motion in the presence of forces that are most easily described using particular curvilinear coordinates. Consider, for example, a beam of particles traveling inside an elliptical vacuum tube which also serves as a wave guide for an electromagnetic wave. Since solution of the wave problem requires the use of elliptical coordinates, one is forced to analyze the particle motion using the same coordinates. To face this problem seriously would probably entail mainly numerical procedures, but the use of coordinates conforming to the boundaries would be essential. The very setting up of the problem for numerical solution requires a formulation such as the present one.
The problem just mentioned is too specialized for detailed analysis in a text such as this; these comments have been intended to show that the geometry to be studied has more than academic interest. But, as stated before, our primary purpose is to assimilate the necessary geometry as another step on the way to the geometric formulation of mechanics. Even such a conceptually simple task as describing straight-line motion using curvilinear coordinates will be instructive.
3.1.2. The Metric Tensor
An n-dimensional “Euclidean” space is defined to consist of vectors x whose components along rectangular axes are $x^1, x^2, \ldots, x^n$, now assumed to be real. The (squared) “length” of this vector is
$$\mathbf{x}\cdot\mathbf{x} = (x^1)^2 + (x^2)^2 + \cdots + (x^n)^2. \tag{3.1.1}$$
The “scalar product” of vectors x and y is
$$\mathbf{x}\cdot\mathbf{y} = x^1 y^1 + x^2 y^2 + \cdots + x^n y^n. \tag{3.1.2}$$
The angle θ between x and y is defined by
$$\cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\sqrt{\mathbf{x}\cdot\mathbf{x}}\,\sqrt{\mathbf{y}\cdot\mathbf{y}}}, \tag{3.1.3}$$
repeating the earlier result Eq. (2.5.18). That this angle is certain to be real follows from the well-known Schwarz inequality. A fundamental “orthonormal” set of “basis vectors” can be defined as the vectors having rectangular components $\mathbf{e}_1 = (1, 0, \ldots, 0)$, $\mathbf{e}_2 = (0, 1, \ldots, 0)$, etc. More general “Cartesian” or “skew” components are related to the Euclidean components by linear transformations
$$x'^i = A^i{}_j\, x^j, \qquad x^i = (A^{-1})^i{}_j\, x'^j. \tag{3.1.4}$$
Such a homogeneous linear transformation between Cartesian frames is known as a “centered-affine” transformation. If the equations are augmented by (possibly vanishing) additive constants, the transformation is given the more general name “affine.” In terms of the new components the scalar product in (3.1.2) is given by
$$\mathbf{x}\cdot\mathbf{y} = \sum_i (A^{-1})^i{}_j\, x'^j\, (A^{-1})^i{}_k\, y'^k \equiv g'_{jk}\, x'^j y'^k, \tag{3.1.5}$$
where the coefficients $g'_{jk}$ are the primed-system components of the metric tensor. Clearly they are symmetric under the interchange of indices, and the quadratic form with x = y has to be positive definite. In the original rectangular coordinates $g_{jk} = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker symbol with value 1 for equal indices and 0 for unequal indices. In the new frame, the basis vectors $\mathbf{e}'_1 = (1, 0, \ldots, 0)$, $\mathbf{e}'_2 = (0, 1, \ldots, 0)$, etc., are not orthonormal in general, in spite of the fact that their given contravariant components superficially suggest it; rather
$$\mathbf{e}'_i \cdot \mathbf{e}'_j = g'_{ij}. \tag{3.1.6}$$
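The relation between Eqs. (3.1.2), (3.1.5), and (3.1.6) can be checked on a small skew frame. This is a minimal sketch under stated assumptions: the two basis vectors and the test vectors are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
# Rectangular components of a hypothetical skew basis (illustrative values).
e1p = [1, 0]
e2p = [1, 1]
basis = [e1p, e2p]

def dot(a, b):
    """Euclidean scalar product of Eq. (3.1.2), from rectangular components."""
    return sum(p * q for p, q in zip(a, b))

# Metric tensor components g'_{jk} = e'_j . e'_k, as in Eq. (3.1.6).
g = [[dot(ej, ek) for ek in basis] for ej in basis]

def to_rect(comps):
    """Rectangular components of the vector whose primed components are comps."""
    return [sum(c * e[i] for c, e in zip(comps, basis)) for i in range(2)]

def metric_product(xp, yp):
    """Scalar product g'_{jk} x'^j y'^k evaluated from primed components."""
    return sum(g[j][k] * xp[j] * yp[k] for j in range(2) for k in range(2))

xp, yp = [2, 3], [1, -1]                    # primed (skew) components
euclid = dot(to_rect(xp), to_rect(yp))      # computed in the rectangular frame
skew = metric_product(xp, yp)               # computed entirely in the skew frame

print(g, euclid, skew)
```

Both routes give the same number, although the skew components themselves would give the wrong answer if combined with the Kronecker symbol instead of $g'_{jk}$.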
As defined so far, the coefficients $g_{jk}$ are constant, independent of position in space. Here, by “position in space” we mean “in the original Euclidean space.” For many purposes, the original rectangular coordinates $x^1, x^2, \ldots, x^n$ would be adequate to locate objects in this space and, though they will be kept in the background throughout most of the following discussion, they will remain available for periodically “getting our feet back on the ground.” These coordinates will also be said to define the “base frame” or, when mechanics intrudes, an “inertial” frame.$^1$
As mentioned previously, “curvilinear” systems, such as radial, cylindrical, or elliptical systems, are sometimes required. Letting $u^1, u^2, \ldots, u^n$ be such coordinates, space is filled with corresponding coordinate curves; on each of the “$u^1$-curves,” $u^1$ varies while $u^2, \ldots, u^n$ are fixed, and so on. Sufficiently close to any particular point P, the coordinate curves are approximately linear. In this neighborhood the curvilinear infinitesimal deviations $\Delta u^1, \Delta u^2, \ldots, \Delta u^n$ can be used to define the scalar product of deviations $\Delta\mathbf{x}$ and $\Delta\mathbf{y}$:
$$\Delta\mathbf{x}\cdot\Delta\mathbf{y} = g_{jk}(P)\,\Delta u^j_{(x)}\,\Delta u^k_{(y)}, \tag{3.1.7}$$
where the labels (x) and (y) distinguish the curvilinear deviations belonging to $\Delta\mathbf{x}$ and $\Delta\mathbf{y}$. This equation differs from Eq. (3.1.5) only in that the coefficients $g_{jk}(P)$ are now permitted to be functions of position P.$^2$ The quantity $\sqrt{\Delta\mathbf{x}\cdot\Delta\mathbf{x}}$ is designated as $|\Delta\mathbf{x}|$ or as $\Delta s$ and is known as “arc length.”
1 There is no geometric significance whatsoever to a coordinate frame’s being inertial, but the base frame will occasionally be called inertial as a mnemonic aid to physicists, who are accustomed to the presence of a preferred frame such as this. The curvilinear frame under study may or may not be rotating or accelerating.
2 In this way, a known coordinate transformation has determined a corresponding metric tensor $g_{jk}(P)$. Conversely, one can contemplate a space described by components $u^1, u^2, \ldots, u^n$ and metric tensor $g_{jk}(P)$, with given dependence on P, and inquire whether a transformation to components for which the scalar product is Euclidean can be found. The answer, in general, is no. A condition that must be satisfied to ensure that the answer to this question be yes is given in Cartan [1].
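A short worked case may make the position-dependent metric concrete. The sketch below assumes plane polar coordinates $(u^1, u^2) = (r, \phi)$, a choice made here only for illustration, and reads off the metric from the Euclidean arc length.

```latex
% Assumed coordinates: plane polar (r, phi), with rectangular x^1, x^2.
\begin{align*}
x^1 &= r\cos\phi, \qquad x^2 = r\sin\phi,\\
\Delta x^1 &= \cos\phi\,\Delta r - r\sin\phi\,\Delta\phi, \qquad
\Delta x^2 = \sin\phi\,\Delta r + r\cos\phi\,\Delta\phi,\\
(\Delta s)^2 &= (\Delta x^1)^2 + (\Delta x^2)^2 = (\Delta r)^2 + r^2(\Delta\phi)^2,\\
g_{rr} &= 1, \qquad g_{r\phi} = g_{\phi r} = 0, \qquad g_{\phi\phi} = r^2 .
\end{align*}
```

Here the cross terms cancel, so the metric is diagonal, but $g_{\phi\phi}$ depends on the position through $r$.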
FIGURE 3.1.1. Relating the “natural” local coordinate axes at two different points in ordinary space described by curvilinear coordinates. Because this is a Euclidean plane, the unit vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ at M can be “parallel slid” to point M + dM without changing their lengths or directions; they are shown there as dashed arrows. The curve labeled $u^1 + 1$ is the curve on which $u^1$ has increased by 1, and so on.
3.1.3. Relating Coordinate Systems at Different Points in Space
One effect of the coordinates’ being curvilinear is to complicate the comparison of objects at disjoint locations. The quantities that will now enter to discipline such comparisons are called “Christoffel coefficients.” Deriving them is the purpose of this section.
Consider the coordinate system illustrated in Fig. 3.1.1, with $M + dM\,(u^1 + du^1, u^2 + du^2, \ldots, u^n + du^n)$ being a point close to the point $M(u^1, u^2, \ldots, u^n)$. (The figure is planar but the discussion will be n-dimensional.) For example, the curvilinear coordinates $(u^1, u^2, \ldots, u^n)$ might be polar coordinates $(r, \theta, \phi)$. The vectors M and M + dM can be regarded as vectors locating the points relative to an origin not shown; their base frame coordinates $(x^1, x^2, \ldots, x^n)$ refer to a rectangular basis in the base frame centered there; one assumes the base frame coordinates are known in terms of the curvilinear coordinates and vice versa. At every point, “natural” basis vectors$^3$ $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$ can be defined having the following properties:
• $\mathbf{e}_i$ is tangent to the coordinate curve on which $u^i$ varies while the other coordinates are held constant. Without loss of generality i can be taken to be 1 in subsequent discussion.
• With the tail of $\mathbf{e}_1$ at M, its tip is at the point where the first component has increased from $u^1$ to $u^1 + 1$.
3 The basis vectors being introduced at this point are none other than the basis vectors called $\partial/\partial u^1$, $\partial/\partial u^2$, etc., in Section 2.4.5, but we refrain from using that notation here.
However, the previous definition has to be qualified because the unit increment of a coordinate may be great enough to cause the coordinate curve to veer noticeably away from the straight basis vector; think, for example, of a change in polar angle $\phi \to \phi + 1$ radian. Clearly the rigorous definition of the “length” of a particular basis vector, say $\mathbf{e}_1$, requires a careful limiting process. Instead, forsaking any pretense of rigor, let us assume the scale along the $u^1$ coordinate curve has been expanded sufficiently by “choosing the units” of $u^1$ to make the unit vector coincide with the coordinate curve to whatever accuracy is considered adequate.
• We will be unable to refrain from using the term “unit vector” to describe a basis vector $\mathbf{e}_i$, even though doing so is potentially misleading because, at least in physics, the term “unit vector” usually connotes a vector of unit length; the traditional notation for a vector parallel to $\mathbf{e}_i$, having unit length, is $\hat{\mathbf{e}}_i$. Here we have $\mathbf{e}_i \parallel \hat{\mathbf{e}}_i$ but $\hat{\mathbf{e}}_i = \mathbf{e}_i/|\mathbf{e}_i|$.
• If one insists on ascribing physical dimensions to the $\mathbf{e}_i$, one must allow the dimensions to be different for different i. For example, if $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ correspond to $(r, \theta, \phi)$, then the first basis vector has units of length while the other two are dimensionless. Though this may seem unattractive, it is not unprecedented in physics: one is accustomed to a relativistic 4-vector having time as one coordinate and distances as the others. On the other hand, the vectors $(\hat{\mathbf{r}}, \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\phi}})$ all have units of meters, but this is not much of an advantage since, as mentioned previously, the lengths of these vectors are somewhat artificial in any case.
Hence, deviating from traditional usage in elementary physics, we will use the basis vectors $\mathbf{e}_i$ exclusively, even calling them unit vectors in spite of their not having unit length. Dimensional consistency will be enforced separately. Dropping quadratic (and higher) terms, the displacement vector dM can be expanded in terms of basis vectors at point M:$^4$
$$d\mathbf{M} = du^1\,\mathbf{e}_1 + du^2\,\mathbf{e}_2 + \cdots + du^n\,\mathbf{e}_n = du^i\,\mathbf{e}_i. \tag{3.1.8}$$
Note that this equation implies $\partial\mathbf{M}/\partial u^1 = \mathbf{e}_1$, $\partial\mathbf{M}/\partial u^2 = \mathbf{e}_2$, etc. At each point other than M, the coordinate curves define a similar “natural” n-plet of unit vectors. The reason that “natural” is placed in quotation marks here and above is that what is natural in one context may be unnatural in another. Once the particular coordinate curves $(u^1, u^2, \ldots, u^n)$ have been selected, the corresponding n-plet $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$ is natural, but that does not imply that the coordinates $(u^1, u^2, \ldots, u^n)$ themselves were in any way fundamental. Our present task is to express the frame $(\mathbf{e}'_1, \mathbf{e}'_2, \ldots, \mathbf{e}'_n)$ at M + dM in terms of the frame $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$ at M. Working with just two components for simplicity,
4 A physicist might interpret Eq. (3.1.8) as an approximate equation in which quadratic terms have been neglected; a mathematician might regard it as an exact expansion in the “tangent space” at M.
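the first basis vector can be approximated as
$$\mathbf{e}'_1 = \mathbf{e}_1 + d\mathbf{e}_1 = \mathbf{e}_1 + \omega_1{}^1\,\mathbf{e}_1 + \omega_1{}^2\,\mathbf{e}_2 \tag{3.1.9a}$$
$$\phantom{\mathbf{e}'_1} = \mathbf{e}_1 + (\Gamma^1{}_{11}\,du^1 + \Gamma^1{}_{12}\,du^2)\,\mathbf{e}_1 + (\Gamma^2{}_{11}\,du^1 + \Gamma^2{}_{12}\,du^2)\,\mathbf{e}_2. \tag{3.1.9b}$$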
The (yet to be determined) coefficients $\omega_i{}^j$ can be said to be “affine-connecting” as they connect quantities in affinely related frames; the coefficients $\Gamma^j{}_{ik}$ are known as Christoffel symbols or as an “affine connection.” Both forms of Eq. (3.1.9) will occur frequently in the sequel, with the (b) form being required when the detailed dependence on coordinates $u^i$ has to be exhibited, and the simpler (a) form being adequate when all that is needed is an expansion of new basis vectors in terms of old. Here, as in the previous chapter, we employ a standard but bothersome notational practice; the incremental expansion coefficients have been written as $\omega_i{}^j$ rather than as $d\omega_i{}^j$, a notation that would be harmless for the time being but would clash later on when the notation $d\omega$ is conscripted for another purpose. To a physicist it seems wrong for a differential quantity $d\mathbf{e}_1$ to be a superposition of quantities like $\omega_1{}^1\mathbf{e}_1$ that appear, notationally, to be nondifferential. But, having already accepted the artificial nature of the units of the basis vectors, we can adopt this notation, promising to sort out the units and differentials later.
The terminology “affine connection” anticipates more general situations in which such connections do not necessarily exist. This will be the case for general “manifolds” (spaces describable, for example, by “generalized coordinates” and hence essentially more general than the present Euclidean space). For general manifolds there is no “intrinsic” way to relate coordinate frames at different points in the space. Here “intrinsic” means “independent of a particular choice of coordinates.” This can be expressed by the following prohibition against illegitimate vector superposition:
A vector at one point cannot be expanded in basis vectors belonging to a different point.$^5$
After this digression we return to the Euclidean context and Eq. (3.1.9). This equation appears to be doing the very thing that is not allowed, namely expanding $\mathbf{e}'_1$ in terms of the $\mathbf{e}_i$. The reason it is legitimate in this case is that there is an intrinsic way of relating frames at M and M + dM: it is the traditional parallelism of ordinary geometry, as shown in Fig. 3.1.1. One is really expanding $\mathbf{e}'_1$ in terms of the vectors $\mathbf{e}_i$ slid parallel from M to M + dM. All too soon, the concept of “parallelism” will have to be scrutinized more carefully, but for now, since we are considering ordinary space, the parallelism of a vector at M and a vector at M + dM has its usual, intuitively natural meaning; for example, basis vectors $\mathbf{e}_1$ and $\mathbf{e}'_1$ in the figure are almost parallel, while $\mathbf{e}_2$ and $\mathbf{e}'_2$ are not. With this interpretation, Eq. (3.1.9) is a relation entirely among vectors at M + dM. The coefficients $\omega_i{}^j$ and $\Gamma^j{}_{ik}$ being well defined, we proceed to determine them,
5 This may seem counterintuitive; if you prefer, for now replace “cannot” by “must not” and regard it as a matter of dictatorial edict.
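starting by rewriting Eq. (3.1.9) in compressed notation:
$$\mathbf{e}'_i = \mathbf{e}_i + d\mathbf{e}_i = \mathbf{e}_i + \omega_i{}^j\,\mathbf{e}_j \tag{3.1.10a}$$
$$\phantom{\mathbf{e}'_i} = \mathbf{e}_i + \Gamma^j{}_{ik}\,du^k\,\mathbf{e}_j. \tag{3.1.10b}$$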
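For a concrete feel for Eq. (3.1.10), the expansion can be carried out by hand in a simple case. The following sketch assumes plane polar coordinates $(u^1, u^2) = (r, \phi)$, chosen here only for illustration, and uses $\mathbf{e}_i = \partial\mathbf{M}/\partial u^i$ with the base-frame components of $\mathbf{M}$ written out explicitly.

```latex
% Assumed coordinates: plane polar (r, phi); M = (r cos(phi), r sin(phi)).
\begin{align*}
\mathbf{e}_r &= \frac{\partial\mathbf{M}}{\partial r} = (\cos\phi,\ \sin\phi), \qquad
\mathbf{e}_\phi = \frac{\partial\mathbf{M}}{\partial\phi} = (-r\sin\phi,\ r\cos\phi),\\
d\mathbf{e}_r &= (-\sin\phi,\ \cos\phi)\,d\phi = \frac{d\phi}{r}\,\mathbf{e}_\phi,\\
d\mathbf{e}_\phi &= (-\sin\phi,\ \cos\phi)\,dr - r\,(\cos\phi,\ \sin\phi)\,d\phi
   = \frac{dr}{r}\,\mathbf{e}_\phi - r\,d\phi\,\mathbf{e}_r,\\
\omega_r{}^{r} &= 0, \quad \omega_r{}^{\phi} = \frac{d\phi}{r}, \quad
\omega_\phi{}^{r} = -r\,d\phi, \quad \omega_\phi{}^{\phi} = \frac{dr}{r},\\
\Gamma^{\phi}{}_{r\phi} &= \Gamma^{\phi}{}_{\phi r} = \frac{1}{r}, \qquad
\Gamma^{r}{}_{\phi\phi} = -r, \qquad \text{all others zero.}
\end{align*}
```

Note that $|\mathbf{e}_\phi| = r$, so this "unit vector" does not have unit length, in line with the terminology adopted above.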
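The quantities
$$\omega_i{}^j = \Gamma^j{}_{ik}\,du^k \tag{3.1.11}$$
are one-forms, linear in the differentials $du^k$.$^6$ The new basis vectors must satisfy Eq. (3.1.6):
$$\mathbf{e}'_i\cdot\mathbf{e}'_r = g_{ir} + dg_{ir} = (\mathbf{e}_i + \omega_i{}^j\,\mathbf{e}_j)\cdot(\mathbf{e}_r + \omega_r{}^s\,\mathbf{e}_s). \tag{3.1.12}$$
Dropping quadratic terms, this can be written succinctly as
$$dg_{ir} = \omega_i{}^j\,g_{jr} + \omega_r{}^s\,g_{is} \equiv \omega_{ir} + \omega_{ri}. \tag{3.1.13}$$
(Because the quantities $\omega_i{}^j$ are not the components of a true tensor, the final step is not a manifestly covariant, index-lowering tensor operation, but it can nonetheless serve to define the quantities $\omega_{ij}$, having two lower indices.) Because also $dg_{ir} = (\partial g_{ir}/\partial u^j)\,du^j$, one obtains
$$\begin{aligned}
-\frac{\partial g_{ij}}{\partial u^k} &= -g_{js}\,\Gamma^s{}_{ik} - g_{is}\,\Gamma^s{}_{jk},\\
\frac{\partial g_{jk}}{\partial u^i} &= g_{ks}\,\Gamma^s{}_{ji} + g_{js}\,\Gamma^s{}_{ki},\\
\frac{\partial g_{ki}}{\partial u^j} &= g_{is}\,\Gamma^s{}_{kj} + g_{ks}\,\Gamma^s{}_{ij}.
\end{aligned} \tag{3.1.14}$$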
For reasons to be made clear shortly, we have written the identical equation three times, but with indices permuted, substitutions like $g_{ri} = g_{ir}$ having been made in some terms, and the first equation having been multiplied through by $-1$.
Problem 3.1.1: Show that Eqs. (3.1.14) yield $n^2(n+1)/2$ equations that can be applied toward determining the $n^3$ coefficients $\Gamma^j{}_{ik}$. Relate this to the number of parameters needed to fix the scales and relative angles of a skew basis set. For the n = 3, ordinary geometry case, how many more parameters are needed to fix the absolute orientation of a skew frame? How many more conditions on the $\Gamma^j{}_{ik}$ does this imply? Both for n = 3 and in general, how many more conditions will have to be found to make it possible to determine all of the Christoffel coefficients?
6 In old references they were known as pfaffian forms.
3.1.3.1. Digression Concerning “Flawed Coordinate Systems.” Two city dwellers part company intending to meet after taking different routes. The first goes east for $N_E$ street numbers, then north for $N_N$ street numbers. The second goes north for $N_N$ street numbers, then east for $N_E$ street numbers. Clearly they will not meet up in most cases because the street numbers have not been established carefully enough. Will their paths necessarily cross if they keep going long enough? Because cities are predominantly two-dimensional, they usually will. But it is not hard to visualize the presence of a tunnel on one of the two routes that leads one of the routes below the other without crossing it. In dimensions higher than two that is the generic situation.
Though it was not stated before, we now require our curvilinear coordinate system to be free of the two flaws just mentioned. At least locally, this can be ensured by requiring
$$\frac{\partial}{\partial u^i}\frac{\partial\mathbf{M}}{\partial u^j} = \frac{\partial}{\partial u^j}\frac{\partial\mathbf{M}}{\partial u^i}. \tag{3.1.15}$$
When expressed in vector terms using Eq. (3.1.8), the quantities being differentiated here can be expressed
$$\frac{\partial\mathbf{M}}{\partial u^i} = \mathbf{e}_i. \tag{3.1.16}$$
Hence, using Eq. (3.1.10b), we require
$$\Gamma^j{}_{ik} = \Gamma^j{}_{ki}. \tag{3.1.17}$$
This requirement that $\Gamma^j{}_{ik}$ be symmetric in its lower indices yields $n^2(n-1)/2$ further conditions which, along with the $n^2(n+1)/2$ conditions of Eq. (3.1.14), should permit us to determine all $n^3$ of the Christoffel coefficients. It can now be seen why Eq. (3.1.14) was written three times. Adding the three equations and taking advantage of Eq. (3.1.17) yields
$$g_{kr}\,\Gamma^r{}_{ij} = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right). \tag{3.1.18}$$
For any particular values of “free indices” i and j (and suppressing them to make the equation appear less formidable) this can be regarded as a matrix equation of the form
$$g_{kr}\,\Gamma^r = R_k \quad \text{or} \quad \mathbf{G}\boldsymbol{\Gamma} = \mathbf{R}. \tag{3.1.19}$$
Here G is the matrix $(g_{kr})$ introduced previously, $\boldsymbol{\Gamma} = (\Gamma^r)$ is the set of Christoffel symbols for the particular values of i and j, and $\mathbf{R} = (R_k)$ is the corresponding right-hand side of Eq. (3.1.18). The distinction between upper and lower indices has been ignored.$^7$ Being a matrix equation, this can be solved without difficulty to complete the determination of the Christoffel symbols:
$$\boldsymbol{\Gamma} = \mathbf{G}^{-1}\mathbf{R}, \tag{3.1.20}$$
and the same can be done for each i, j pair. Though these manipulations may appear overly formal at this point, an example given below will show that they are quite manageable.
3.1.4. The Covariant (or Absolute) Differential
There is considerable difference between mathematical and physical intuition in the area of differentiation. Compounding this, there is a plethora of distinct types of derivative, going by names such as total, invariant, absolute, covariant, variational, gradient, divergence, curl, Lie, exterior, Fréchet, Lagrange, etc. Each of these (some are just different names for the same thing) combines the common concepts of differential calculus with other concepts. In this chapter some of these terms are explained, and eventually nearly all will be. The differential in the denominator of a derivative is normally a scalar, or at least a one-component object, often dt, while the numerator is often a multicomponent object. The replacement of t by a monotonically related variable, say s = f(t), makes a relatively insignificant change in the multicomponent derivatives: all components of the derivative are multiplied by the same factor dt/ds. This makes it adequate to work with differentials rather than derivatives in most cases, and that is what we will do. We will disregard as inessential the distinction between the physicist’s view of a differential as an approximation to a small but finite change and the mathematician’s view of a differential as a finite yet exact displacement along a tangent vector.
We start with a type of derivative that may be familiar to physicists in one guise, yet mysterious in another; the familiar form is that of coriolis or centrifugal acceleration. Physicists know that Newton’s first law (free objects do not accelerate) applies only in inertial frames of reference. If one insists on using an accelerating frame of reference (say one fixed to the earth, incorporating latitude, longitude, and altitude) the correct description of projectile motion requires augmenting the true forces, gravity and air resistance, by “fictitious” coriolis and centrifugal forces. These extra forces compensate for the fact that the reference frame is not inertial. Many physicists, perhaps finding the introduction of fictitious forces artificial and hence distasteful, or perhaps having been too well-taught in introductory physics that “there is no such thing as centrifugal force,” resist this approach and prefer a strict inertial frame description. Here we insist instead on a noninertial description using the curvilinear coordinates introduced in the previous section.
7 Failing to distinguish between upper and lower indices ruins the invariance of equations as far as transformation between different frames is concerned, but it is valid in any particular frame. In any case, since the quantities on the two sides of Eq. (3.1.19) are not tensors, distinction between upper and lower indices would be unjustified.
A particle trajectory can be described by curvilinear coordinates $u^1(t), u^2(t), \ldots, u^n(t)$ giving its location as a function of time t. For example, uniform motion on a circle of radius R is described by $r = R$, $\phi = \omega t$. The velocity v has curvilinear velocity components that are defined by
$$v^i \equiv \frac{du^i}{dt} \equiv \dot{u}^i. \tag{3.1.21}$$
In the circular motion example, $\dot{r} = 0$. Should one then define curvilinear acceleration components by
$$a^i \equiv \frac{dv^i}{dt} = \frac{d^2u^i}{dt^2}\,? \tag{3.1.22}$$
No! One could define acceleration this way, but it would lead, for example, to the result that the radial acceleration in uniform circular motion is zero, which is certainly not consistent with conventional terminology. Here is what has gone wrong: while v is a perfectly good arrow, and hence a true vector, its components $v^i$ are projections onto axes parallel to the local coordinate axes. Though these local axes are not themselves rotating, a frame moving so that its origin coincides with the particle and having its axes always parallel to local axes has to be rotating relative to the inertial frame. One is violating the rules of Newtonian mechanics. Here is what can be done about it: Calculate acceleration components relative to the base frame.
Before doing this we establish a somewhat more general framework by introducing the concept of vector field. A vector field V(P) is a vector function of position that assigns an arrow V to each point P in space. An example with V = r, the radius vector from a fixed origin, is illustrated in Fig. 3.1.2. (Check the two boldface arrows with a ruler to confirm V = r.) In the figure the same curvilinear coordinate system as appeared in Fig. 3.1.1 is assumed to be in use. At each point the curvilinear components $V^i$ of the vector V are defined to be the coefficients in the expansion of V in terms of local basis vectors:
$$\mathbf{V} = V^i\,\mathbf{e}_i. \tag{3.1.23}$$
The absolute differential DV of a vector function V(P), like any differential, is the change in V that accompanies a change in its argument, in the small change limit. For this to be meaningful it is, of course, necessary to specify what is meant by the changes. In Fig. 3.1.2, in the (finite) change of position from point M to point M′, the change in V is indicated by the arrow labeled V′ − V; being an arrow, it is manifestly a true vector. In terms of local coordinates, the vectors at M and M′ are given, respectively, by
$$\mathbf{V} = \mathbf{V}(M) = V^1\,\mathbf{e}_1 + V^2\,\mathbf{e}_2, \qquad \mathbf{V}' = \mathbf{V}(M') = V'^1\,\mathbf{e}'_1 + V'^2\,\mathbf{e}'_2. \tag{3.1.24}$$
In the limit of small changes, using Eq. (3.1.10a), one has
$$D\mathbf{V} = d(V^i)\,\mathbf{e}_i + V^j\,d(\mathbf{e}_j) = d(V^i)\,\mathbf{e}_i + V^j\,\omega_j{}^i\,\mathbf{e}_i = \left(d(V^i) + V^j\,\omega_j{}^i\right)\mathbf{e}_i. \tag{3.1.25}$$
FIGURE 3.1.2. The vector field V(P) = r(P), where r(P) is a radius vector from point O to point P, expressed in terms of the local curvilinear coordinates shown in Fig. 3.1.1. The change V′ − V in going from point M to point M′ is shown.
This differential (a true vector by construction) can be seen to have contravariant components given by
$$DV^i \equiv (D\mathbf{V})^i = dV^i + V^j\,\omega_j{}^i \equiv dV^i + V^j\,\Gamma^i{}_{jk}\,du^k, \tag{3.1.26}$$
where the $du^k$ are the curvilinear components of M′ relative to M. (Just this time) a certain amount of care has been taken with the placement of parentheses in these equations. The main thing to notice is the definition $DV^i \equiv (D\mathbf{V})^i$. Note that, since the components $u^k$ and $V^i$ are known functions of position, their differentials $du^k$ and $dV^i$ are unambiguous; there is no need to introduce symbols $D(u^k)$ and $D(V^i)$ since, if one did, their meanings would just be $du^k$ and $dV^i$. On the other hand the quantity DV is a newly defined true vector whose components are being first evaluated in Eq. (3.1.26). (It might be pedagogically more helpful if these components were always symbolized by $(D\mathbf{V})^i$ rather than by $DV^i$ but, since that is never done, it is necessary to remember the meaning some other way. For the moment the superscript i has been moved slightly away to suggest that it “binds somewhat less tightly” to V than does the D.) Note then that the $DV^i$ are the components of a true vector, while $dV^i$, differential changes expressed in local coordinates, are not. DV is commonly called the covariant differential; this causes $DV^i$ to be the “contravariant components of the covariant differential.” Since this is unwieldy, we will use the term absolute differential rather than covariant differential. If the vector being differentiated is a constant vector A, it follows that
$$D\mathbf{A} = 0, \quad \text{and hence,} \quad dA^i = -A^j\,\omega_j{}^i. \tag{3.1.27}$$
How to obtain the covariant components of the absolute differential of a variable vector V is the subject of the following problem.
Problem 3.1.2: Consider the scalar product, $\mathbf{V}\cdot\mathbf{A}$, of a variable vector V and an arbitrary constant vector A. Its differential, as calculated in the base frame, could be designated $D(\mathbf{V}\cdot\mathbf{A})$, while its differential in the local frame could be designated $d(\mathbf{V}\cdot\mathbf{A})$. Since the change of a scalar should be independent of frame, these two differentials must be equal. Use this, and Eq. (3.1.27) and the fact that A is arbitrary, to show that the covariant components of the absolute differentials of vector V are given by
$$DV_i = dV_i - V_k\,\omega_i{}^k. \tag{3.1.28}$$
If $\tilde{\mathbf{V}}$ is the form naturally associated with V then the $DV_i$ are the components of a form $D\tilde{\mathbf{V}}$.
Problem 3.1.3: The line of reasoning of the previous problem can be generalized to derive the absolute differential of more complicated tensors. Consider, for example, a mixed tensor $a_i{}^j$ having one upper and one lower index. Show that the absolute differential of this tensor is given by
$$Da_i{}^j = da_i{}^j - a_k{}^j\,\omega_i{}^k + a_i{}^k\,\omega_k{}^j. \tag{3.1.29}$$
Problem 3.1.4: Ricci’s Theorem. Derived as in the previous two problems,
$$Da_{ij} = da_{ij} - a_{kj}\,\omega_i{}^k - a_{ik}\,\omega_j{}^k. \tag{3.1.30}$$
Using this formula and Eq. (3.1.13), evaluate $Dg_{ij}$, the absolute differential of the metric tensor $g_{ij}$, and show that it vanishes. This means that the metric tensor elements act like constants for absolute differentiation. Use this result to show that the absolute differential $D(\mathbf{A}\cdot\mathbf{B})$ of the scalar product of two constant vectors A and B vanishes (as it must).
Because the concept of absolute differentiation is both extremely important and quite confusing, some recapitulation may be in order. Since the difference of two arrows is an arrow, the rate of change of an arrow is an arrow. Stated more conventionally, the rate of change of a true vector is a true vector. Confusion enters only when a vector is represented by its components. It is therefore worth emphasizing that
The components of the rate of change of vector V are not, in general, the rates of change of the components of V.
This applies to all true tensors. Unfortunately, since practical calculation almost always requires the introduction of components, it is necessary to develop careful formulas, expressed in component form, for differentiating vectors (and all other tensors). The derivation of a few of these formulas is the subject of the set of problems just above.

3.1.5. Derivation of the Lagrange Equations of Mechanics from the Absolute Differential

In mechanics one frequently has the need for coordinate systems that depend on position (curvilinear) or time (rotating or accelerating). Here we analyze the former case while continuing to exclude the latter. That is, the coefficients of the metric tensor can depend on position but are assumed to be independent of $t$. On the other hand, the positions of the particle or particles being described certainly vary with time. In this section we symbolize coordinates by $q^i$ rather than the $u^i$ used to this point. There is no significance whatsoever to this change other than the fact that generalized coordinates in mechanics are usually assigned the symbol $q$. The first vector to be subjected to invariant differentiation will be the position vector of a point following a given trajectory.
Example 3.1.1: Motion that is simple when described in one set of coordinates may be quite complicated in another. For example, consider a particle moving parallel to the x axis at $y = b$ with constant speed $u$; that is, $(x = ut,\ y = b)$. In spherical coordinates, with $\theta = \pi/2$, the particle displacement is given by

$$r = \frac{b}{\sin\phi},\qquad \phi = \tan^{-1}\frac{b}{ut}.$$

The time derivatives are

$$v^r \equiv \dot r = u\cos\phi,\qquad v^\phi \equiv \dot\phi = -\frac{u}{b}\sin^2\phi;$$

following our standard terminology for velocities, we have defined $v^r = \dot r$ and $v^\phi = \dot\phi$. (This terminology is by no means universal, however. It has the disagreeable feature that the components are not the projections of the same arrow onto mutually orthonormal axes, as Fig. 3.1.3 shows. Also they have different units. They are, however, the contravariant components of a true vector along well-defined local axes.) Taking another time derivative yields

$$\ddot r = \frac{u^2}{b}\sin^3\phi,\qquad \ddot\phi = \frac{2u^2}{b^2}\cos\phi\,\sin^3\phi.$$
Defining by the term “absolute acceleration” the acceleration in an inertial coordinate frame, the absolute acceleration obviously should vanish in this motion. And yet the quantities $\ddot r$ and $\ddot\phi$ are nonvanishing. We will continue this example below in Example 3.1.4.

FIGURE 3.1.3. A particle moves parallel to the x-axis at constant speed u.
Example 3.1.2: In cylindrical $(r, \phi, z)$ coordinates the nonvanishing Christoffel elements are

$$\Gamma^1_{22} = -r,\qquad \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{r}$$

(as will be shown shortly). The polar components of $\mathbf r$ are $(r, 0) = (q^1, q^2)$ and the components of the covariant derivative with respect to $t$ are

$$\frac{Dq^1}{dt} = \dot r,\qquad \frac{Dq^2}{dt} = \dot\phi.$$

That is, $(\dot r, \dot\phi)$ are the polar components of $D\mathbf r/dt$. This is a misleadingly simple result, however. It follows from the result that, if $\mathbf r = r\,\mathbf e_r$ is the radius vector of a particle, and $r$ is taken as its first, and its only nonvanishing, coordinate, then $Dq^1/dt = \dot q^1 + r\,\Gamma^1_{1j}\,\dot q^j$. For this particular choice of coordinates, the second term vanishes. In general, though the elements $dq^i$ transform like the components of a vector, the coordinates $q^i$ themselves do not, because they are macroscopic quantities that cannot be expected to have properties derived from linearization. Hence the quantities $Dq^i$ have nondescript geometric character and should perhaps not even have been written down. Nevertheless, $(\dot q^1, \dot q^2, \ldots, \dot q^n)$ is a true vector.
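Example 3.1.3: This same result can be obtained simply using traditional vector analysis on the vectors shown in Fig. 3.1.4:

$$\frac{d\mathbf r}{dt} = \frac{d}{dt}\left(r\,\hat{\mathbf r}\right) = \dot r\,\hat{\mathbf r} + r\,\dot\phi\,\hat{\boldsymbol\phi}.$$

The factor $r$ in the final term reflects the difference between our basis vector $\mathbf e_\phi$ and the unit length vector $\hat{\boldsymbol\phi}$ of ordinary vector analysis.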
FIGURE 3.1.4. Time rate of change of a unit vector.
If the changes to vector $\mathbf V$ discussed in the previous section occur during time $dt$, perhaps because a particle that is at $M$ at time $t$ moves in such a way as to be at $M'$ at time $t + dt$, the differentials $DV^i$ of vector $\mathbf V$ can be converted to time derivatives:⁸

$$\frac{DV^i}{dt} = \dot V^i + \Gamma^i_{jk}\,V^j\,\dot q^k. \tag{3.1.31}$$
The quantity $\mathbf V$ being differentiated in Eq. (3.1.31) is any vector field. One such possible vector field, defined at every point on the trajectory of a moving particle, is its instantaneous velocity $\mathbf v = d\mathbf x/dt$. Being an arrow (tangent to the trajectory), $\mathbf v$ is a true vector; its local components are $v^i = dq^i/dt = \dot q^i$. The absolute acceleration $\mathbf a$ is defined by

$$\mathbf a = \frac{D\mathbf v}{dt},\qquad\text{or}\qquad a^i = \frac{Dv^i}{dt}. \tag{3.1.32}$$
Substituting $V^i = v^i$ in Eq. (3.1.31) yields

$$a^i = \ddot q^i + \Gamma^i_{jk}\,\dot q^j\,\dot q^k. \tag{3.1.33}$$

As the simplest possible problem of mechanics let us now suppose that the particle being described is subject to no force and, as a result, to no acceleration. Setting $a^i = 0$ yields

$$\ddot q^i = -\Gamma^i_{jk}\,\dot q^j\,\dot q^k. \tag{3.1.34}$$
This is the equation of motion of a free particle. In rectangular coordinates, since $\Gamma^i_{jk} = 0$, this degenerates to the simple result that $\mathbf v$ is constant; the motion in question is along a straight line with constant speed. This implies that the solution of Eq. (3.1.34) is the equation of a straight line in our curvilinear coordinates. Since a line is a purely geometric object, it seems preferable to express its equation in terms of arc length $s$ along the line rather than time $t$. As observed previously, such a transformation is easy, especially so in this case since the speed is constant. The equation of a straight line is then

$$\frac{d^2q^i}{ds^2} = -\Gamma^i_{jk}\,\frac{dq^j}{ds}\,\frac{dq^k}{ds}. \tag{3.1.35}$$

⁸In this and previous expressions the common shorthand indication of total time derivative by an overhead dot has been used. One can inquire why $\dot V^i$ has been defined to mean $dV^i/dt$, rather than $DV^i/dt$. It is just a convention (due originally to Newton), but the convention is well established, and it must be respected if nonsense is to be avoided. The vector field $\mathbf V$, though dependent on position, has been assumed to be constant in time; if $\mathbf V$ has an explicit time dependence, the term $\dot V^i$ would have to include also a contribution $\partial V^i/\partial t$.
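As a numerical illustration of Eq. (3.1.34), the following minimal sketch integrates the free-particle equation in plane polar coordinates. It assumes the cylindrical Christoffel symbols $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$ quoted in Example 3.1.2 and the initial conditions of Example 3.1.1 at $t = 0$. The function names, the fourth-order Runge-Kutta integrator, and the step count are choices made here for illustration only.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of Eq. (3.1.34) in plane polar coordinates.

    state = (r, phi, rdot, phidot). With Gamma^1_22 = -r and
    Gamma^2_12 = Gamma^2_21 = 1/r the free-particle equations read
    rddot = r*phidot**2 and phiddot = -(2/r)*rdot*phidot.
    """
    r, phi, rdot, phidot = state
    return (rdot, phidot, r * phidot ** 2, -2.0 * rdot * phidot / r)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, factor):
        return tuple(si + factor * ki for si, ki in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_free_particle(u=1.0, b=1.0, t_end=2.0, steps=2000):
    """Integrate from t = 0, where x = 0, y = b, and return (x, y) at t_end."""
    # Initial data of Example 3.1.1 at t = 0: r = b, phi = pi/2,
    # rdot = u*cos(phi) = 0, phidot = -(u/b)*sin(phi)**2 = -u/b.
    state = (b, math.pi / 2, 0.0, -u / b)
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    r, phi = state[0], state[1]
    return r * math.cos(phi), r * math.sin(phi)

if __name__ == "__main__":
    print(integrate_free_particle())
```

Converting the final $(r, \phi)$ back to Cartesian form should give a point on the straight line $y = b$ with $x = u\,t$, which is the sense in which Eq. (3.1.34) describes a straight line in curvilinear coordinates.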
Suppose next that the particle is not free, but rather is subject to a force $\mathbf F$. Substituting $\mathbf a = \mathbf F/m$, where $m$ is the particle mass, into Eq. (3.1.33) can be expected to yield Newton's law expressed in these coordinates, but a certain amount of care is required before components are assigned to $\mathbf F$. Before pursuing this line of development, we look at free motion from another point of view. There are two connections with mechanics that deserve consideration: variational principles and Lagrange's equations. The first can be addressed by the following problems. The first of these is reasonably straightforward; the second is less so and could perhaps be deferred or looked up.⁹
Problem 3.1.5: Consider the integral $S = \int_{t_1}^{t_2} L(q^i, \dot q^i, t)\,dt$, evaluated along a candidate particle path between starting position at initial time $t_1$ and final position at time $t_2$, where $L = \frac{m}{2}\,g_{ij}\,\dot q^i\,\dot q^j$. Using the calculus of variations, show that Eq. (3.1.34) is the equation of the path for which $S$ is extreme. In other words, show that Eq. (3.1.34) is the same as the Euler-Lagrange equation for this Lagrangian $L$.
Problem 3.1.6: It is “obvious” also, since free particles travel in straight lines and straight lines have minimal lengths, that the Euler-Lagrange equation for the trajectory yielding extreme value to the integral $I = \int ds$, where $ds^2 = g_{ij}\,dq^i\,dq^j$, should lead also to Eq. (3.1.35). Demonstrate this.

These two problems suggest a close connection between Eq. (3.1.33) and the Lagrange equations that will now be considered. The key dynamic variable that needs to be defined is the kinetic energy $T$. In the present context, using Eq. (3.1.7),

$$T = \frac{m}{2}\,g_{jk}\,\dot q^j\,\dot q^k. \tag{3.1.36}$$
(Though not exhibited explicitly, the metric coefficients depend on the coordinates $q^i$.) When one thinks of “force” one thinks either of what its source is, for example an electric charge distribution, or what it must be to account for an observed acceleration. Here we take the latter tack and (on speculation) introduce a quantity $Q_l$ (to be interpreted later as the “generalized force” corresponding to $q^l$) by

$$Q_l = \frac{d}{dt}\frac{\partial T}{\partial \dot q^l} - \frac{\partial T}{\partial q^l}. \tag{3.1.37}$$

⁹Both problems are solved in Dubrovin et al. [2].
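We proceed to evaluate $Q_l$ by substituting for $T$ from Eq. (3.1.36), and using Eq. (3.1.18):

$$Q_l = m\left[g_{lk}\,\ddot q^k + \frac{1}{2}\left(\frac{\partial g_{lj}}{\partial q^k} + \frac{\partial g_{lk}}{\partial q^j} - \frac{\partial g_{jk}}{\partial q^l}\right)\dot q^j\,\dot q^k\right]. \tag{3.1.38}$$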
This formula resembles Eq. (3.1.33); a comparison with Eq. (2.5.17) shows that their right-hand sides are covariant and contravariant components of the same vector. Expressed as an intrinsic equation, this yields
$$\frac{Q^i}{m} = \ddot q^i + \Gamma^i_{jk}\,\dot q^j\,\dot q^k. \tag{3.1.39}$$

This confirms that the Lagrange equations are equivalent to Newton's equations since the right-hand side is the acceleration $a^i$. For this equation to predict the motion, it is of course necessary for the force $Q^i$ to be given.
Recapitulation: From a given particle trajectory it is a kinematical job to infer the acceleration, and the covariant derivative is what is needed for this task. The result is written on the right-hand side of Eq. (3.1.39), in the form of contravariant components of a vector. It was shown in Eq. (3.1.38) that this same quantity could be obtained by calculating the “Lagrange derivatives” $\frac{d}{dt}\left(\partial T/\partial\dot q^l\right) - \partial T/\partial q^l$, where $T = (m/2)\,g_{jk}\,\dot q^j\,\dot q^k$. (The occurrence of mass $m$ in the definition of $T$ suggests that it is a dynamical quantity, but inclusion of the multiplier $m$ is rather artificial; $T$ and the metric tensor are essentially equivalent quantities.) It is only a minor complication that the Lagrange derivative of $T$ yields covariant components, which need to “have their indices raised” before yielding the contravariant components of acceleration.

Dynamics only enters when the acceleration is ascribed to a force according to Newton's law, $\mathbf a = \mathbf F/m$. When $\mathbf a$ in this equation is evaluated by the invariant derivative as in Eq. (3.1.39), the result is called “Newton's equation.” When $\mathbf a$ is evaluated by the Lagrange derivative of $T$, the result is called “Lagrange's equation.”

Commonly force is introduced into the Lagrange equation by introducing $L = T - V$, where $V$ is “potential energy.” This is an artificial abbreviation, however, since it mixes a kinematic quantity $T$ and a dynamic quantity $V$. From the present point of view, since it is not difficult to introduce forces directly, this procedure is, logically speaking, clearer than introducing them indirectly in the form of potential energy.
The prominent role played in mechanics by the kinetic energy $T$ is due, on the one hand, to its close connection with $ds^2$ and, on the other hand, to the fact that virtual force components $Q_l$ can be derived from $T$ using Eq. (3.1.37).

3.1.6. Practical Evaluation of the Christoffel Symbols

The “direct” method of obtaining Christoffel symbols for a given coordinate system is to substitute the metric coefficients into Eq. (3.1.18) and solve Eqs. (3.1.19). But this involves much differentiation and is rather complicated. A practical alternative is to use the equations just derived. Suppose, for example, that spherical coordinates are in use: $(q^1, q^2, q^3) = (r, \theta, \phi)$. In terms of these coordinates and the formula for distance $ds$, one obtains metric coefficients from Eqs. (3.1.5) and (2.5.17):

$$ds^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2,\qquad g_{11} = 1,\quad g_{22} = r^2,\quad g_{33} = r^2\sin^2\theta, \tag{3.1.40}$$
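and all off-diagonal coefficients vanish. ($g^{11}$, $g^{22}$, $g^{33}$ are defined by the usual index raising.) Acceleration components $a^i$ can then be obtained using Eq. (3.1.39), though it is necessary first to raise the index of $Q_l$ using the metric tensor. From Eq. (3.1.39) one notes that the Christoffel symbols are the coefficients of terms quadratic in velocity components, $\dot r$, $\dot\theta$, and $\dot\phi$ in Eq. (3.1.38), and this result can be used to obtain them. Carrying out these calculations (for spherical coordinates with $m = 1$), the kinetic energy and the contravariant components of virtual force are given by

$$\begin{aligned}
2T &= \dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2,\\
Q^1 &= \frac{d}{dt}\frac{\partial T}{\partial\dot r} - \frac{\partial T}{\partial r} = \ddot r - r\dot\theta^2 - r\sin^2\theta\,\dot\phi^2,\\
Q^2 &= \frac{1}{r^2}\left(\frac{d}{dt}\frac{\partial T}{\partial\dot\theta} - \frac{\partial T}{\partial\theta}\right) = \ddot\theta + \frac{2}{r}\,\dot r\,\dot\theta - \sin\theta\cos\theta\,\dot\phi^2,\\
Q^3 &= \frac{1}{r^2\sin^2\theta}\left(\frac{d}{dt}\frac{\partial T}{\partial\dot\phi} - \frac{\partial T}{\partial\phi}\right) = \ddot\phi + \frac{2}{r}\,\dot r\,\dot\phi + 2\,\frac{\cos\theta}{\sin\theta}\,\dot\theta\,\dot\phi.
\end{aligned} \tag{3.1.41}$$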
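Matching coefficients, noting that the coefficients with factors of 2 are the terms that are duplicated in the (symmetric) off-diagonal terms of (3.1.38), the nonvanishing Christoffel symbols are

$$\begin{aligned}
\Gamma^1_{22} &= -r, & \Gamma^1_{33} &= -r\sin^2\theta,\\
\Gamma^2_{12} &= \frac{1}{r}, & \Gamma^2_{33} &= -\sin\theta\cos\theta,\\
\Gamma^3_{13} &= \frac{1}{r}, & \Gamma^3_{23} &= \frac{\cos\theta}{\sin\theta}.
\end{aligned} \tag{3.1.42}$$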
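The “direct” route can also be checked numerically. The sketch below is an illustration under stated assumptions rather than the method of the text: it uses the standard expression $\Gamma^i_{jk} = \tfrac12 g^{il}\left(\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk}\right)$ for a diagonal metric, approximates the derivatives by central differences, and uses zero-based indices (so `gamma[0][1][1]` stands for $\Gamma^1_{22}$). All function names are hypothetical.

```python
import math

def metric(q):
    """Diagonal metric coefficients (g11, g22, g33) of Eq. (3.1.40)."""
    r, theta, _phi = q
    return [1.0, r * r, (r * math.sin(theta)) ** 2]

def metric_derivative(q, l, h=1e-5):
    """Central-difference estimate of d g_ii / d q^l for i = 0, 1, 2."""
    q_plus, q_minus = list(q), list(q)
    q_plus[l] += h
    q_minus[l] -= h
    return [(a - b) / (2 * h) for a, b in zip(metric(q_plus), metric(q_minus))]

def christoffel(q):
    """Return gamma[i][j][k] approximating Gamma^i_jk at the point q."""
    g = metric(q)
    dg = [metric_derivative(q, l) for l in range(3)]  # dg[l][i] = d g_ii / d q^l

    def d_g(a, b, l):
        # Partial derivative of g_ab with respect to q^l (metric is diagonal).
        return dg[l][a] if a == b else 0.0

    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for i in range(3):
        for j in range(3):
            for k in range(3):
                # For a diagonal metric only l = i survives in the sum over l.
                gamma[i][j][k] = 0.5 / g[i] * (
                    d_g(i, k, j) + d_g(i, j, k) - d_g(j, k, i))
    return gamma

if __name__ == "__main__":
    symbols = christoffel((2.0, 0.7, 0.3))
    print(symbols[0][1][1], symbols[1][0][1], symbols[2][1][2])
```

Evaluated at any point with $r > 0$ and $0 < \theta < \pi$, the nonzero entries should agree with Eq. (3.1.42) to within the finite-difference error.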
Example 3.1.4: We test these formulas for at least one example by revisiting Example 3.1.1. Using Eq. (3.1.41), we find that the force components acting on the particle in that example are

$$Q^1 = \frac{u^2}{b}\sin^3\phi - r\dot\phi^2 = 0,\qquad Q^3 = \frac{2u^2}{b^2}\cos\phi\,\sin^3\phi + \frac{2}{r}\,\dot r\,\dot\phi = 0.$$

This shows that the particle moving in a straight line at constant speed is subject to no force. This confirms statements made previously.
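The cancellation in Example 3.1.4 can be confirmed numerically. The following sketch assumes the expressions of Example 3.1.1 for $r$, $\phi$ and their time derivatives, and evaluates the $\theta = \pi/2$ forms $Q^1 = \ddot r - r\dot\phi^2$ and $Q^3 = \ddot\phi + (2/r)\dot r\dot\phi$ of Eq. (3.1.41). The function names and sample values are illustrative choices.

```python
import math

def straight_line_state(t, u=1.0, b=1.0):
    """Polar description of the motion x = u*t, y = b (Example 3.1.1)."""
    x = u * t
    r = math.hypot(x, b)
    phi = math.atan2(b, x)
    rdot = u * math.cos(phi)
    phidot = -(u / b) * math.sin(phi) ** 2
    rddot = (u * u / b) * math.sin(phi) ** 3
    phiddot = (2 * u * u / (b * b)) * math.cos(phi) * math.sin(phi) ** 3
    return r, phi, rdot, phidot, rddot, phiddot

def virtual_force(t, u=1.0, b=1.0):
    """Return (Q^1, Q^3) of Eq. (3.1.41) with theta = pi/2 and m = 1."""
    r, _phi, rdot, phidot, rddot, phiddot = straight_line_state(t, u, b)
    q1 = rddot - r * phidot ** 2
    q3 = phiddot + 2.0 * rdot * phidot / r
    return q1, q3

if __name__ == "__main__":
    for t in (-2.0, 0.0, 1.5):
        print(t, virtual_force(t, u=1.3, b=0.7))
```

Both components should vanish to rounding error at every time, even though $\ddot r$ and $\ddot\phi$ individually do not.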
Problem 3.1.7: For cylindrical $(\rho, \phi, z)$ coordinates, calculate the Christoffel symbols both directly from their defining Eqs. (3.1.18) and indirectly using the Lagrange equation.
3.1.7. Evaluation of the Christoffel Symbols Using Maple

It is even easier to obtain the Christoffel symbols using a computer program. The following listing shows the use of Maple to obtain the Christoffel symbols for spherical and cylindrical coordinates. This would be most useful for less symmetric coordinates, defined by more complicated formulas.

# Calculate Christoffel coefficients
readlib(tensor):
Ndim:=3:
#
# Spherical coordinates
#
x1:=r; x2:=theta; x3:=phi;
g11:=1; g22:=r^2; g33:=r^2*sin(theta)^2;
tensor();
for i from 1 by 1 to 3 do
for j from 1 by 1 to 3 do
for k from 1 by 1 to 3 do
if i
$= \left(\gamma_v,\ \gamma_v\,\mathbf v/c\right),$ (12.1.37)
where $\mathbf v$ is the ordinary particle velocity. As defined, the 4-velocity does not have the dimensions of velocity. Rather it is scaled by a factor $c$ so that it is the velocity in units in which $c = 1$; as far as units are concerned, that makes it, like $\beta$, the ratio of the particle speed to the velocity of light. Because $ds^2 = dx^i\,dx_i$, the 4-scalar formed from $u^i$ is constant, independent of the particle's 3-velocity;
$$u^i u_i = 1; \tag{12.1.38}$$

this makes calling it the “magnitude squared” of $u^i$ potentially misleading and hence inappropriate. The 4-acceleration $w^i$ is defined similarly:

$$w^i = \frac{du^i}{ds}. \tag{12.1.39}$$

Differentiating Eq. (12.1.38), the 4-velocity and 4-acceleration are seen to be mutually “orthogonal”:

$$u^i w_i = 0. \tag{12.1.40}$$
These simple results have considerable significance in relativistic mechanics.

12.2. THE RELATIVISTIC PRINCIPLE OF LEAST ACTION
In nonrelativistic mechanics the Lagrange equation is derivable from the principle of least action, according to which the actual trajectory taken by a particle, between times $t_0$ and $t$, minimizes the action function $S$ defined by

$$S = \int_{t_0}^{t} L\,dt. \tag{12.2.1}$$
When the minimized function $S(\mathbf x_0, t_0; \mathbf x, t)$ is expressed in terms of the coordinates of the upper limit $\mathbf x$ and $t$, with the lower endpoint fixed, it satisfies the Hamilton-Jacobi relations

$$\mathbf p = \frac{\partial S}{\partial\mathbf x},\qquad H = -\frac{\partial S}{\partial t}, \tag{12.2.2}$$

where $H$ is the Hamiltonian corresponding to Lagrangian $L$.
It is straightforward to generalize what has just been stated in such a way as to satisfy the requirements of relativity while at the same time leaving nonrelativistic relationships (i.e., Newton's Law) valid when speeds are small compared to $c$. Owing to the homogeneity of space and the homogeneity of time, the relativistically generalized action $S$ cannot depend on the particle's coordinate 4-vector $x^i$. Furthermore, it must be a relativistic scalar, as otherwise it would have directional properties forbidden by the isotropy of space. Owing to Eq. (12.1.38), it is impossible to form a scalar other than a constant using the 4-vector $u^i$. In short, the only possibility for the action of a free particle (i.e., one subject to no force) is

$$S = (-mc)\int_{t_0}^{t} ds = (-mc^2)\int_{t_0}^{t}\sqrt{1 - \frac{v^2}{c^2}}\;dt, \tag{12.2.3}$$
where $ds$, the invariant interval defined in Eq. (12.1.4), is the proper time multiplied by $c$. The integral is to be evaluated between initial and final world points $P_1$ and $P_2$. A priori the multiplicative factor could be any constant, but it will be seen below why the factor has to be $(-mc)$. The negative sign is significant. It corresponds to the seemingly paradoxical result mentioned above that the free-particle path from $P_1$ to $P_2$ maximizes the proper time taken. Comparing Eqs. (12.2.1) and (12.2.3), it can be seen that the free-particle Lagrangian is

$$L(\mathbf x, \mathbf v) = -mc^2\sqrt{1 - \frac{v^2}{c^2}}. \tag{12.2.4}$$
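12.3. ENERGY AND MOMENTUM

Using standard Lagrangian formalism the momentum $\mathbf p$ is defined by

$$\mathbf p = \frac{\partial L}{\partial\mathbf v} = \frac{m\mathbf v}{\sqrt{1 - v^2/c^2}}. \tag{12.3.1}$$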
For $v$ small compared to $c$ this gives the nonrelativistic result $\mathbf p \approx m\mathbf v$. This is the relation that fixed the constant factor in the initial definition of the Lagrangian. Using Eq. (12.2.4) and Eq. (12.3.1), one obtains the Hamiltonian $H$ and hence the energy $\mathcal E$ by

$$\mathcal E = \mathbf p\cdot\mathbf v - L = \frac{mc^2}{\sqrt{1 - v^2/c^2}} = mc^2\,\gamma_v. \tag{12.3.2}$$
For $v$ small compared to $c$ this gives

$$\mathcal E \approx mc^2 + \frac{1}{2}mv^2, \tag{12.3.3}$$

which is the classical result for the kinetic energy, except for the additive constant $\mathcal E_0 = mc^2$, which is known as the rest energy. An additive constant like this has no effect in the Lagrangian description. From Eq. (12.3.1) and Eq. (12.3.2) come the important identities

$$\mathcal E^2 = p^2c^2 + m^2c^4,\qquad \mathbf p = \frac{\mathcal E\,\mathbf v}{c^2}. \tag{12.3.4}$$
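The identities in Eq. (12.3.4) and the small-velocity limit of Eq. (12.3.2) are easy to confirm numerically. The sketch below assumes only Eqs. (12.3.1) and (12.3.2) for one-dimensional motion; the function names and the choice of units with $c = 1$ in the example are illustrative.

```python
import math

def gamma_factor(v, c=1.0):
    """Relativistic factor 1/sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def momentum(m, v, c=1.0):
    """Magnitude of the momentum, Eq. (12.3.1)."""
    return m * v * gamma_factor(v, c)

def energy(m, v, c=1.0):
    """Total energy including rest energy, Eq. (12.3.2)."""
    return m * c * c * gamma_factor(v, c)

if __name__ == "__main__":
    m, c, v = 2.0, 1.0, 0.6
    p, e = momentum(m, v, c), energy(m, v, c)
    # Both printed differences should be zero to rounding error.
    print(e * e - (p * c) ** 2 - (m * c * c) ** 2, p - e * v / c ** 2)
    # For small v the energy above the rest energy approaches m*v^2/2.
    print((energy(m, 1e-3, c) - m * c * c) / (0.5 * m * 1e-3 ** 2))
```

The last printed ratio differs from 1 only by a term of relative order $v^2/c^2$, which is the first relativistic correction to the kinetic energy.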
For massless particles like photons these reduce to $v = c$ and

$$p = \frac{\mathcal E}{c}. \tag{12.3.5}$$
This formula also becomes progressively more valid for a massive particle as its total energy becomes progressively large compared to $mc^2$. As stated previously, $m$ is the “rest mass,” a constant quantity, and there is no question of “mass increasing with velocity” as occurs in some descriptions of relativity, such as the famous “$E = mc^2$,” which is incorrect in our formulation. Remembering to express $H$ in terms of $\mathbf p$, the relativistic Hamiltonian of a free particle is given by

$$H(\mathbf p) = \sqrt{p^2c^2 + m^2c^4}. \tag{12.3.6}$$
$H$ depends on $\mathbf p$ but not on $\mathbf q$.

12.3.1. 4-Vector Notation
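Referring back to Eq. (12.1.37), it can be seen that $\mathbf p$, as given by Eq. (12.3.1), and $\mathcal E$, as given by Eq. (12.3.2), are closely related to the 4-velocity $u^i$. We define a momentum 4-vector $p^i$ by

$$p^i = mc\,u^i = \left(\frac{\mathcal E}{c},\ \mathbf p\right). \tag{12.3.7}$$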
We expect that $p^i p_i$, the scalar product of $p^i$ with itself, should, like all scalar products, be invariant. The first of Eqs. (12.3.4) shows this to be true:

$$p^i p_i = \frac{\mathcal E^2}{c^2} - p^2 = m^2c^2. \tag{12.3.8}$$
Because they belong to the same 4-vector, the components of $\mathbf p$ and $\mathcal E$ in different coordinate frames are related according to the Lorentz transformation, Eq. (12.1.15).
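12.4. RELATIVISTIC HAMILTON-JACOBI THEORY

Corresponding to the Hamiltonian of Eq. (12.3.6), the Hamilton-Jacobi equation is

$$\left(\frac{\partial S}{\partial t}\right)^2 = c^2\left(\frac{\partial S}{\partial x}\right)^2 + c^2\left(\frac{\partial S}{\partial y}\right)^2 + c^2\left(\frac{\partial S}{\partial z}\right)^2 + m^2c^4. \tag{12.4.1}$$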
Since the relations $\mathbf p = \partial S/\partial\mathbf x$ and $\mathcal E = -\partial S/\partial t$ of Eq. (12.2.2) were derived purely from the calculus of variations, without reference to physical meaning, they must remain valid in relativistic mechanics. Nevertheless we will rederive them in order to illustrate an abbreviated manipulation. The variations, $\delta S$, in the action accompanying a variation, $\delta x^i(t)$, away from the true world trajectory are what establish the equations of motion. Here $\delta x^i(t)$ is an arbitrary function. Variation of the integrand of Eq. (12.2.3) yields

$$\begin{aligned}
\delta\,ds = \delta\sqrt{dx_i\,dx^i} &= \frac{(\delta\,dx_i)\,dx^i + dx_i\,\delta(dx^i)}{2\sqrt{dx_j\,dx^j}}\\
&= u_i\,d(\delta x^i)\\
&= d(u_i\,\delta x^i) - \frac{du_i}{ds}\,\delta x^i\,ds.
\end{aligned} \tag{12.4.2}$$

The last line is preparatory to integration by parts. The variation of the action is

$$\delta S = -mc\int\delta\,ds. \tag{12.4.3}$$

With $\delta\,ds$ given by Eq. (12.4.2), the integration limits can be held fixed or varied as we wish. If the endpoints are held fixed, the term coming from the first term of the last line of Eq. (12.4.2) vanishes. In that case, since the principle of least action assures that $\delta S$ vanishes, and since $\delta x^i$ is arbitrary, the vanishing of the 4-acceleration $w^i = du^i/ds$ follows; this is appropriate for a force-free situation. When the upper endpoint of the integral in Eq. (12.4.3) is varied, but with the requirement that the trajectory be a true one, then the term in the integral coming from the second term in Eq. (12.4.2) vanishes, leaving
$$\delta S = -mc\,u_i\,\delta x^i. \tag{12.4.4}$$
This yields

$$p_i = mc\,u_i = \left(\frac{\mathcal E}{c},\ -\mathbf p\right) = -\frac{\partial S}{\partial x^i}, \tag{12.4.5}$$
which agrees with Eqs. (12.3.7). Notice that the contravariant and covariant indices magically take care of the signs. Also, the result is consistent with Eq. (12.1.35); the 4-gradient of a scalar is a covariant 4-vector.
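12.5. FORCED MOTION

If the 4-velocity is to change, it has to be because force is applied to the particle. It is natural to define the 4-force $g^i$ by the relation

$$g^i = \frac{dp^i}{ds} = \left(\frac{\gamma_v\,\mathbf F\cdot\mathbf v}{c^2},\ \frac{\gamma_v\,\mathbf F}{c}\right), \tag{12.5.1}$$

where the classically defined Newtonian force is

$$\mathbf F = \frac{d\mathbf p}{dt}. \tag{12.5.2}$$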
The time component $\gamma_v\,\mathbf F\cdot\mathbf v/c^2$ is related to the rate of work done on the particle by the external force. Note that it vanishes in the case that $\mathbf F\cdot\mathbf v = 0$, as is true for a charged particle in a purely magnetic field.
12.6. GENERALIZATION OF THE ACTION TO INCLUDE ELECTROMAGNETIC FORCES

The action for a free particle

$$S = -mc\int_{t_0}^{t} ds \tag{12.6.1}$$
was selected because, except for an arbitrary multiplicative factor, $ds$ is the only first-order-differential, origin-independent 4-scalar that can be constructed. The constant factor was selected to assure correspondence with Newtonian mechanics in the nonrelativistic limit. We now generalize this by introducing an initially arbitrary 4-vector function of position $A^i(x^i) = (\phi, \mathbf A)$, and take for the action

$$S = \int_{t_0}^{t}\left(-mc\,ds - e\,A_i\,dx^i\right). \tag{12.6.2}$$
The integrand certainly satisfies the requirement of being a relativistic invariant. Like the factor $-mc$, chosen to make free motion come out right, the factor $e$ is chosen to make this action principle lead to the forces of electromagnetism, with $e$ being the charge on the particle. SI units (also known as MKS units) are being employed. The factors $\phi$ and $\mathbf A$ are called the scalar and vector potentials, respectively. Spelling out the integrand more explicitly, and making the differential be $dt$, so as to be able to extract the Lagrangian, the action is

$$S = \int_{t_0}^{t}\left(-mc^2\sqrt{1 - \frac{v^2}{c^2}} + e\,\mathbf A\cdot\mathbf v - e\phi\right)dt. \tag{12.6.3}$$
This shows that the Lagrangian is

$$L = -mc^2\sqrt{1 - \frac{v^2}{c^2}} + e\,\mathbf A\cdot\mathbf v - e\phi. \tag{12.6.4}$$
(Another candidate for the action that would be consistent with relativistic invariance is $\int A(x^i)\,ds$, where $A(x^i)$ is a scalar function of position, but that would not lead to electromagnetism.) Once the Lagrangian has been selected, one must slavishly follow the prescriptions of Lagrangian mechanics in order to introduce the “momentum” $\mathbf P$, conjugate to $\mathbf x$, and to obtain the equations of motion. This newly introduced (uppercase¹) momentum will be called the generalized momentum, to distinguish it from the previously introduced “ordinary momentum” or “mechanical momentum” $\mathbf p$. You should continue to think of the (lowercase) quantity $\mathbf p$ as the generalization of the familiar mass times velocity of elementary mechanics. The generalized momentum $\mathbf P$ has a more formal significance connected with the Lagrange equations. It is given by

$$\mathbf P = \frac{\partial L}{\partial\mathbf v} = \frac{m\mathbf v}{\sqrt{1 - v^2/c^2}} + e\mathbf A = \mathbf p + e\mathbf A. \tag{12.6.5}$$
Notice in particular that, unlike $\mathbf p$, the generalized momentum $\mathbf P$ depends explicitly on 4-position $x^i$. We need only follow the rules to define the Hamiltonian by

$$H = \mathbf P\cdot\mathbf v - L, \tag{12.6.6}$$

which must still, however, be expressed in terms of $\mathbf P$ rather than $\mathbf v$. In Eq. (12.3.4), the rest mass $m$, the ordinary momentum $\mathbf p$, and the ordinary or mechanical energy $\mathcal E_{\rm kin} = mc^2/\sqrt{1 - v^2/c^2}$ were related by

$$\mathcal E_{\rm kin}^2 = p^2c^2 + m^2c^4. \tag{12.6.7}$$

Here we have used the symbol $\mathcal E_{\rm kin}$, which, since it includes the rest energy, differs by that much from being a generalization of the “kinetic energy” of Newtonian mechanics. Nevertheless it is convenient to have a symbol for the energy of a particle that accompanies its very existence and includes its energy of motion but does not include any “potential energy” due to its position in a field of force. Using Eq. (12.6.5) and Eq. (12.6.6), this same relation can be expressed in terms of $\mathbf P$ and $H$:

$$(H - e\phi)^2 = (\mathbf P - e\mathbf A)^2c^2 + m^2c^4. \tag{12.6.8}$$

¹Some authors reverse the roles of uppercase $\mathbf P$ and lowercase $\mathbf p$.
Solving for $H$ yields

$$H(\mathbf x, \mathbf P) = \sqrt{m^2c^4 + \left(\mathbf P - e\mathbf A(x^i)\right)^2c^2} + e\phi(x^i). \tag{12.6.9}$$
Remember that the Hamiltonian is important in two ways. One is formal; differentiating it appropriately leads to Hamilton's equations. The other deals with its numerical value, which is called the energy, at least in those cases where it is conserved. Eq. (12.6.9) should seem entirely natural; the square root term gives the mechanical energy (remember that the second term under the square root is just $c^2$ times the square of the ordinary momentum) and the other term gives the energy the particle has by virtue of its charge $e$ being located at position $x^i$ where the electric potential function is $\phi(x^i)$. Corresponding to this Hamiltonian, the Hamilton-Jacobi equation is

$$\left(\frac{\partial S}{\partial t} + e\phi\right)^2 - \left(\nabla S - e\mathbf A\right)^2c^2 - m^2c^4 = 0. \tag{12.6.10}$$
12.7. DERIVATION OF THE LORENTZ FORCE LAW

To obtain the equations of motion for our charged particle, we write the Lagrange equations with $L$ given by Eq. (12.6.4). One term is

$$\nabla L = e\,\nabla\left(\mathbf A(x^i)\cdot\mathbf v\right) - e\,\nabla\phi(x^i). \tag{12.7.1}$$
Remembering that the very meaning of the partial derivative symbol in the Lagrange equation is that $\mathbf v$ is to be held constant, the first term becomes

$$e\,\nabla(\mathbf A\cdot\mathbf v) = e\,(\mathbf v\cdot\nabla)\mathbf A + e\,\mathbf v\times(\nabla\times\mathbf A), \tag{12.7.2}$$
where a well-known vector identity has been used. The meaning of the expression $(\mathbf v\cdot\nabla)\mathbf A$ is certainly unambiguous in Cartesian coordinates. Its meaning is ambiguous in curvilinear coordinates, but we assume Cartesian coordinates without loss of generality since this term will be eliminated shortly. With Eq. (12.7.2), and using Eq. (12.6.5), the Lagrange equation becomes

$$\frac{d\mathbf p}{dt} + e\,\frac{d\mathbf A}{dt} = e\,(\mathbf v\cdot\nabla)\mathbf A + e\,\mathbf v\times(\nabla\times\mathbf A) - e\,\nabla\phi. \tag{12.7.3}$$
At this point a great bargain appears. For any function $F(\mathbf x, t)$, the partial derivatives and the total derivative are related by

$$\frac{dF}{dt} = \frac{\partial F}{\partial t} + (\mathbf v\cdot\nabla)F. \tag{12.7.4}$$
The first term gives the change of $F$ at a fixed point in space, and the second term gives the change due to the particle's motion. This permits a hard-to-evaluate term on the left-hand side, $d\mathbf A/dt$, and a hard-to-evaluate term on the right-hand side, $(\mathbf v\cdot\nabla)\mathbf A$, to be combined to make an easy-to-evaluate term, yielding

$$\frac{d\mathbf p}{dt} = -e\,\frac{\partial\mathbf A}{\partial t} - e\,\nabla\phi + e\,\mathbf v\times(\nabla\times\mathbf A). \tag{12.7.5}$$
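At this point we introduce the electric field intensity $\mathbf E$ and the magnetic field intensity $\mathbf B$ defined by

$$\mathbf E = -\frac{\partial\mathbf A}{\partial t} - \nabla\phi,\qquad \mathbf B = \nabla\times\mathbf A. \tag{12.7.6}$$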
Finally we obtain the so-called Lorentz force law

$$\frac{d\mathbf p}{dt} = e\,\mathbf E + e\,\mathbf v\times\mathbf B. \tag{12.7.7}$$
Since the $A^i$ was arbitrary, the electric and magnetic fields are completely general, consistent with Eq. (12.7.6). From its derivation, the Lorentz equation, though not manifestly covariant, has unquestionable relativistic validity. It describes the evolution of spatial components. One can look for the corresponding time-component evolution equation. It is

$$\frac{d\mathcal E_{\rm kin}}{dt} = \frac{d\mathbf p}{dt}\cdot\mathbf v, \tag{12.7.8}$$

and gives the change in mechanical energy due to the applied field. (Check it using Eq. (12.5.2).) This is entirely consistent with the Newtonian formula that the rate of change of energy is the rate that the external force (given by $d\mathbf p/dt$) does work, as given by the right-hand side of Eq. (12.7.8). Under the Lorentz force law, Eq. (12.7.7), since the magnetic force is normal to $\mathbf v$, it follows that a magnetic field can never change the particle energy. Rather, the rate of change of energy is given by

$$\frac{d\mathcal E_{\rm kin}}{dt} = e\,\mathbf E\cdot\mathbf v, \tag{12.7.9}$$
just as should have been expected.

12.8. GAUGE INVARIANCE

Though the 4-potential $A^i \equiv (\phi, \mathbf A)$ was introduced first, it is the electric and magnetic fields $\mathbf E$ and $\mathbf B$ that manifest themselves physically through the forces acting on charged particles. They must be determinable uniquely from the physical conditions. But because $\mathbf E$ and $\mathbf B$ are obtained from $A^i$ by differentiation, there is a lack of uniqueness in $A^i$, much like the “constant of integration” in an indefinite integral. In electrostatics this indeterminacy has already been encountered; adding a constant to the electric potential has no observable effect. With the 4-potential, the lack of determinacy can be more complicated because a change in $\phi$ can be compensated for by a change in $\mathbf A$. For mathematical methods that are based on the potentials, this can have considerable impact on the analysis, though not on the (correctly) calculated $\mathbf E$ and $\mathbf B$ fields. The invariance of the answers to transformations of the potentials is called “gauge invariance.” The gauge invariance of the present theory follows immediately from the action principle of Eq. (12.6.2). Suppose the action in that equation is altered according to

(12.8.1)

where $f(x)$ is an arbitrary function of position. When integrating over the revised Lagrangian, the extra term does not alter the action and hence does not affect determination of extremals. As a result, the physics is unaffected by this “change of gauge.” It is instructive also to confirm that this change in $A^i$ has no effect on $\mathbf E$ and $\mathbf B$ when evaluated by Eq. (12.7.6).
12.9. TRAJECTORY DETERMINATION

12.9.1. Motion in a Constant Uniform Electric Field

As an example using these equations consider a particle with charge $e$ moving in a uniform electric field $\mathbf E$ directed parallel to the x-axis. Nonrelativistically one would solve $m\,d\mathbf v/dt = e\mathbf E$, and this equation remains relativistically valid if cast in the form

$$\frac{d\mathbf p}{dt} = e\,\mathbf E. \tag{12.9.1}$$
With $\mathbf E$ constant this can be integrated once immediately,

$$\mathbf p = \mathbf p_0 + e\,\mathbf E\,t, \tag{12.9.2}$$
but further integration requires $\mathbf p$ to be expressed in terms of $\mathbf v = \dot{\mathbf x}$. This becomes progressively more important as the motion becomes more relativistic, $v \approx c$, since, though $\mathbf p$ can increase without limit, $\mathbf v$ cannot. The best procedure for finding $\mathbf v$ is to find $\mathcal E_{\rm kin}$, and use $\mathbf v = c^2\mathbf p/\mathcal E_{\rm kin}$. To simplify the algebra a bit, with no essential loss of generality, consider the special initial conditions illustrated in Fig. 12.9.1, with initial energy $\mathcal E_0$, initial momentum $\mathbf p_0$ parallel to the y-axis and normal to $\mathbf E$, and origin chosen so the particle starts from $x = \mathcal E_0/(eE)$, $y = 0$. From Eq. (12.9.2), using Eq. (12.6.7),

$$p_x = eEt,\qquad p_y = p_0,\qquad \mathcal E_{\rm kin} = \sqrt{\mathcal E_0^2 + (ceEt)^2}. \tag{12.9.3}$$
FIGURE 12.9.1. Trajectory followed by a charged particle in a uniform electric field. The initial momentum is transverse to the electric field.
Then, using Eq. (12.3.4),

$$v_x = \frac{c^2\,eEt}{\sqrt{\mathcal E_0^2 + (ceEt)^2}},\qquad v_y = \frac{p_0\,c^2}{\sqrt{\mathcal E_0^2 + (ceEt)^2}}. \tag{12.9.4}$$
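Notice the superficially surprising result that $v_y$ approaches zero for large $t$. Integrating these yields

$$x = \frac{1}{eE}\sqrt{\mathcal E_0^2 + (ceEt)^2},\qquad y = \frac{p_0\,c}{eE}\,\sinh^{-1}\frac{ceEt}{\mathcal E_0}. \tag{12.9.5}$$

The orbit equation can be obtained by eliminating $t$ from this equation:

$$x = \frac{\mathcal E_0}{eE}\,\cosh\frac{eE\,y}{p_0\,c}. \tag{12.9.6}$$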
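The integration of Eq. (12.9.4) can be cross-checked numerically. The sketch below assumes the initial conditions stated above ($x = \mathcal E_0/(eE)$, $y = 0$ at $t = 0$) and compares a composite Simpson integration of the velocities with the closed forms $x = \sqrt{\mathcal E_0^2 + (ceEt)^2}/(eE)$ and $y = (p_0c/eE)\sinh^{-1}(ceEt/\mathcal E_0)$ obtained by direct integration. The function names, the parameter name `e_times_field` (standing for the product $eE$), and the units in the example are choices made for this illustration.

```python
import math

def velocity(t, e_times_field, p0, energy0, c=1.0):
    """Velocity components (v_x, v_y) of Eq. (12.9.4)."""
    e_kin = math.sqrt(energy0 ** 2 + (c * e_times_field * t) ** 2)
    return c * c * e_times_field * t / e_kin, p0 * c * c / e_kin

def closed_form_position(t, e_times_field, p0, energy0, c=1.0):
    """Closed-form integral of Eq. (12.9.4) with x(0) = E0/(eE), y(0) = 0."""
    x = math.sqrt(energy0 ** 2 + (c * e_times_field * t) ** 2) / e_times_field
    y = (p0 * c / e_times_field) * math.asinh(c * e_times_field * t / energy0)
    return x, y

def integrate_position(t_end, e_times_field, p0, energy0, c=1.0, steps=3000):
    """Composite Simpson integration of the velocity from 0 to t_end."""
    h = t_end / steps
    x, y = energy0 / e_times_field, 0.0
    for n in range(steps):
        t0 = n * h
        v0 = velocity(t0, e_times_field, p0, energy0, c)
        vm = velocity(t0 + h / 2, e_times_field, p0, energy0, c)
        v1 = velocity(t0 + h, e_times_field, p0, energy0, c)
        x += h / 6 * (v0[0] + 4 * vm[0] + v1[0])
        y += h / 6 * (v0[1] + 4 * vm[1] + v1[1])
    return x, y

if __name__ == "__main__":
    eE, p0, m, c = 1.0, 1.0, 1.0, 1.0
    energy0 = math.sqrt((p0 * c) ** 2 + (m * c * c) ** 2)
    print(integrate_position(3.0, eE, p0, energy0, c))
    print(closed_form_position(3.0, eE, p0, energy0, c))
```

Eliminating $t$ between the two closed forms gives a catenary, $x = (\mathcal E_0/eE)\cosh\left(eEy/(p_0c)\right)$, which the numerically integrated points should also satisfy.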
12.9.2. Motion in a Constant Uniform Magnetic Field

The equation of motion in a magnetic field is

$$\frac{d\mathbf p}{dt} = e\,\mathbf v\times\mathbf B. \tag{12.9.7}$$
Since the force depends on $\mathbf v$, this equation cannot be integrated immediately. However the motion is even simpler in the respect that $\mathcal E_{\rm kin}$ is conserved in a pure magnetic field. This follows immediately from Eq. (12.7.8):

$$\frac{d\mathcal E_{\rm kin}}{dt} = \mathbf v\cdot(e\,\mathbf v\times\mathbf B) = 0. \tag{12.9.8}$$
As a result, all of the quantities $\beta_v$, $\gamma_v$, $\mathcal E$, $v$, and $p$ are constants of the motion. Assuming the magnetic field $B\hat{\mathbf z}$ is directed along the z-axis, Eq. (12.9.7) becomes

$$\frac{d\mathbf v}{dt} = \frac{ec^2B}{\mathcal E_0}\,\mathbf v\times\hat{\mathbf z}. \tag{12.9.9}$$
Introducing the “cyclotron frequency” (constant in the nonrelativistic regime, but energy-dependent at high energy)

$$\omega_c = \frac{ec^2B}{\mathcal E_0}, \tag{12.9.10}$$
the velocity components satisfy

$$\dot v_x = \omega_c\,v_y,\qquad \dot v_y = -\omega_c\,v_x,\qquad \dot v_z = 0. \tag{12.9.11}$$
With appropriate initial conditions these yield

(12.9.12)

Integrating these yields the expected motion on a cylinder of radius $r$ given by

(12.9.13)
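One concrete solution of Eq. (12.9.11) is sketched below as an illustration. It assumes the particular initial conditions $v_x = v_\perp$, $v_y = 0$, $v_z = 0$ at $t = 0$ and an orbit centered on the origin, which need not be the choice made in the text; the radius $v_\perp/\omega_c$ then equals $p_\perp/(eB)$ by Eqs. (12.3.4) and (12.9.10). All function names are hypothetical.

```python
import math

def cyclotron_frequency(charge, b_field, energy0, c=1.0):
    """Cyclotron frequency of Eq. (12.9.10)."""
    return charge * c * c * b_field / energy0

def orbit_radius(v_perp, omega_c):
    """Radius of the circular transverse motion."""
    return v_perp / omega_c

def velocity(t, v_perp, omega_c):
    """(v_x, v_y) solving Eq. (12.9.11) with v = (v_perp, 0) at t = 0."""
    return v_perp * math.cos(omega_c * t), -v_perp * math.sin(omega_c * t)

def position(t, v_perp, omega_c):
    """(x, y) obtained by integrating velocity(); circle centered at the origin."""
    radius = orbit_radius(v_perp, omega_c)
    return radius * math.sin(omega_c * t), radius * math.cos(omega_c * t)

if __name__ == "__main__":
    m, c, v, e, b = 1.0, 1.0, 0.6, 1.0, 2.0
    gamma = 1.0 / math.sqrt(1.0 - v * v / (c * c))
    omega = cyclotron_frequency(e, b, gamma * m * c * c, c)
    print(omega, orbit_radius(v, omega), gamma * m * v / (e * b))
```

The two radii printed at the end should coincide, showing that the orbit radius grows in proportion to the momentum rather than to the velocity.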
12.10. THE LONGITUDINAL COORDINATE AS INDEPENDENT VARIABLE

Highly relativistic particles usually belong to “beams” of more or less parallel particles. As in optics, it is then convenient to distinguish between “transverse” coordinates $x$ and $y$ and a “longitudinal” coordinate, now to be called $s$, and to use $s$ rather than $t$ as the independent variable. (In the “paraxial approximation,” if all particles are traveling at almost the speed of light, $s$ and $ct$ are approximately equal.) Ordinarily these coordinates are defined as “increments” relative to a “nominal” or “reference” particle that defines the center of the beam. That means that the triplet $(x, y, t)$ is to be regarded as being the quantities whose evolution as a function of $s$ is to be described by the equations of motion. The motion of the reference particle is assumed to be known; it is given by $\left(x_0(s), P^x_0(s), y_0(s), P^y_0(s), t_0(s), P^t_0(s)\right)$, where $P^t$ will be defined shortly. For writing linearized equations we could define small differences, $\delta x(s) = x(s) - x_0(s)$, etc., but instead, at a certain point below, we will simply redefine the coordinates $(x, P^x, y, P^y, t, P^t)$ as small deviations from the reference orbit. For now, they are absolute particle coordinates of a general particle in a global frame of reference.

The transformation from $t$ to $s$, as independent variable, in Hamiltonian language, is straightforward but confusing. For the moment suppress $(y, P^y)$, since they enter just like $(x, P^x)$. The Hamiltonian (12.6.9) has the form

$$H = H(x, P^x, s, P^s, t) = \sqrt{m^2c^4 + \left(\mathbf P - e\mathbf A(\mathbf x)\right)^2c^2} + e\phi(\mathbf x), \tag{12.10.1}$$
where the canonical momentum P and mechanical momentum p are related by
P=p+eA.
( 12.10.2)
Of Hamilton's equations, the ones we will refer to below are

$$\frac{ds}{dt} = \frac{\partial H}{\partial P^s}, \quad\text{or}\quad \frac{dt}{ds} = \left(\frac{\partial H}{\partial P^s}\right)^{-1}, \tag{12.10.3}$$

and

(12.10.4)

Define a new variable

$$P^t = -H(x, P^x, s, P^s, t). \tag{12.10.5}$$

This is to be solved for $P^s$, with the answer expressed in terms of a function K, which will turn out to be the new Hamiltonian:

$$P^s = -K(x, P^x, t, P^t, s). \tag{12.10.6}$$
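From Eqs. (12.6.9) and (12.6.5), it can be seen that the numerical value of $-P^t$ is the total energy $\mathcal{E} = \mathcal{E}_M + e\phi$. The differential $dP^s$ can be obtained either directly from Eq. (12.10.6), or indirectly from Eq. (12.10.5), using Eq. (12.10.3). The results are

$$\begin{aligned}
dP^s &= -\frac{\partial K}{\partial x}dx - \frac{\partial K}{\partial P^x}dP^x - \frac{\partial K}{\partial t}dt - \frac{\partial K}{\partial P^t}dP^t - \frac{\partial K}{\partial s}ds \\
&= \left(-dP^t - \frac{\partial H}{\partial x}dx - \frac{\partial H}{\partial P^x}dP^x - \frac{\partial H}{\partial s}ds - \frac{\partial H}{\partial t}dt\right)\left(\frac{\partial H}{\partial P^s}\right)^{-1}.
\end{aligned} \tag{12.10.7}$$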
Equating coefficients, and using Eq. (12.10.4), as well as the other Hamilton equations in the original variables, the equations of motion in the new variables can be written in Hamiltonian form, with derivatives with respect to s being symbolized by primes:
(12.10.8)
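The coefficient matching that leads to Eq. (12.10.8) can be sketched as follows. This is a reconstruction from Eqs. (12.10.3) to (12.10.7), assuming that Eq. (12.10.4) is the relation $dH/dt = \partial H/\partial t$; it is not necessarily the form printed in the original.

```latex
\begin{align*}
\frac{\partial K}{\partial P^x}
  &= \frac{\partial H/\partial P^x}{\partial H/\partial P^s}
   = \frac{dx/dt}{ds/dt} = x',
&
\frac{\partial K}{\partial x}
  &= \frac{\partial H/\partial x}{\partial H/\partial P^s}
   = \frac{-\,dP^x/dt}{ds/dt} = -{P^x}',
\\
\frac{\partial K}{\partial P^t}
  &= \left(\frac{\partial H}{\partial P^s}\right)^{-1}
   = \frac{dt}{ds} = t',
&
\frac{\partial K}{\partial t}
  &= \frac{\partial H/\partial t}{\partial H/\partial P^s}
   = \frac{dH/dt}{ds/dt} = -{P^t}'.
\end{align*}
```

Read from right to left, these are Hamilton's equations with $K$ as Hamiltonian, $s$ as independent variable, and $(x, P^x)$ and $(t, P^t)$ as conjugate pairs.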
The manipulations that have been described can be performed explicitly, using Eqs. (12.10.1) and (12.10.5), with the result

$$K = -eA_\parallel - \sqrt{(P^t + e\phi)^2/c^2 - m^2c^2 - (\mathbf P_\perp - e\mathbf A_\perp)^2}, \tag{12.10.9}$$
where components parallel to and perpendicular to the reference orbit have been introduced. (In field-free regions we have

$$K = -\sqrt{(p^t)^2/c^2 - m^2c^2 - \mathbf p_\perp^2}, \tag{12.10.10}$$

where the generalized momentum $p^t$ is minus the energy.) The Hamilton equations are

(12.10.11)

These equations can be “linearized” by approximating the right-hand side of Eq. (12.10.11) by the first term in a Taylor expansion. As forewarned, the quantities $x^i$ and $P_i$ are now to be interpreted as small deviations from the known reference trajectory. The partial derivatives are evaluated on the reference trajectory. These equations correspond to a quadratic Hamiltonian,
BIBLIOGRAPHY
References for Further Study
L. D. Landau and E. M. Lifshitz, The Classical Theory of Fields, Pergamon, Oxford, 1971.
13 SYMPLECTIC MECHANICS

13.1. DERIVATION OF HAMILTON’S EQUATIONS

“Symplectic mechanics” is the study of mechanics using “symplectic geometry,” a subject that can be pursued with no reference whatsoever to mechanics. However, we will regard “symplectic mechanics” and “Hamiltonian mechanics” as essentially equivalent. For coherence, we start by rederiving Hamiltonian mechanics, using however a more formal approach than in the previous chapter, where Hamiltonian mechanics was introduced from a “Hamilton-Jacobi” perspective.

Here we review the formal, analytical derivation of Hamilton’s equations starting from Lagrange’s equations. This is largely repetitive of material in Section 7.2, where the Routhian reduction procedure was described; there a single momentum variable was introduced and used to replace the second-order differential equation for its conjugate coordinate by two first-order equations. This amounted to treating one coordinate by Hamilton’s equations and all the others by Lagrange’s equations and was effective primarily if the coordinate was cyclic so the momentum variable was conserved. Here we transform all the equations into Hamiltonian form. Given coordinates q and Lagrangian L, “canonical momenta” are defined by

$$p_j = \frac{\partial L}{\partial \dot q^j}; \tag{13.1.1}$$

$p_j$ is said to be “conjugate” to $q^j$. To make partial differentiation like this meaningful, it is necessary to specify what variables are being held fixed. We mean implicitly that variables $q^i$ for all i, $\dot q^i$ for $i \ne j$, and t are being held fixed. Having established variables p, it is absolutely required in all that follows that velocities $\dot q$ be explicitly expressible in terms of the q and p, as in

$$\dot q^i = f^i(\mathbf q, \mathbf p, t), \quad\text{or}\quad \dot{\mathbf q} = \mathbf f(\mathbf q, \mathbf p, t). \tag{13.1.2}$$

The prototypical example of this is

$$L = \tfrac12 m(\dot x^2 + \dot y^2) - V(x, y), \qquad \mathbf p = \frac{\partial L}{\partial \dot{\mathbf r}} = m\dot{\mathbf r}, \qquad \dot{\mathbf r} = \frac{\mathbf p}{m}. \tag{13.1.3}$$

Hamilton's equations can be derived using the properties of differentials. Define the “Hamiltonian” by

$$H(\mathbf q, \mathbf p, t) = p_i f^i(\mathbf q, \mathbf p, t) - L(\mathbf q, \mathbf f(\mathbf q, \mathbf p, t), t), \tag{13.1.4}$$
where the functions $f^i$ were defined in Eq. (13.1.2). If these functions are, for any reason, unavailable, the procedure cannot continue; it is absolutely obligatory that the velocity variables be eliminated in this way. Furthermore, as indicated on the left-hand side of Eq. (13.1.4), it is essential that the formal arguments of H be q and p and t. Then, when writing partial derivatives of H, it will be implicit that the variables being held constant are all but one of the q and p and t. If all independent variables of the Lagrangian are varied incrementally the result is

$$dL = \frac{\partial L}{\partial q^i}dq^i + \frac{\partial L}{\partial \dot q^i}d\dot q^i + \frac{\partial L}{\partial t}dt. \tag{13.1.5}$$

(It is important to appreciate that the $q^i$ and the $\dot q^i$ are being treated as formally independent at this point. Any temptation toward thinking of $\dot q^i$ as some sort of derivative of $q^i$ must be fought off.) The purpose of the additive term $p_k f^k$ in the definition of H is to cancel terms proportional to $d\dot q^i$ in the expression for dH;

$$\begin{aligned}
dH &= p_i\,d\dot q^i + \dot q^i\,dp_i - \frac{\partial L}{\partial q^i}dq^i - \frac{\partial L}{\partial \dot q^i}d\dot q^i - \frac{\partial L}{\partial t}dt \\
&= -\dot p_i\,dq^i + \dot q^i\,dp_i - \frac{\partial L}{\partial t}dt,
\end{aligned} \tag{13.1.6}$$

where the Lagrange equations (5.3.8) as well as Eq. (13.1.2) have been used. This transformation is known as a “Legendre transformation.” Such a transformation has a geometric interpretation,$^1$ but it is probably adequate to think of it as purely a formal manipulation. Similar formal manipulations are common in thermodynamics. Hamilton's first-order equations follow from Eq. (13.1.6):

$$\dot q^i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q^i}, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}. \tag{13.1.7}$$

$^1$The geometric interpretation of a Legendre transformation is discussed in Arnold [1] and Lanczos [2].

Remember that in the partial derivatives of H, the variables p are held constant, but in $\partial L/\partial t$ the variables $\dot q$ are held constant. For reminding oneself how $p_i\dot q^i - L$ metamorphoses into “total energy,” a good example is that of a particle in a 2-D potential, mentioned in Eq. (13.1.3). The Hamiltonian is

$$H(\mathbf x, \mathbf p) = p_x\dot x + p_y\dot y - \tfrac12 m\dot x^2 - \tfrac12 m\dot y^2 + V(x, y) = \frac{1}{2m}(p_x^2 + p_y^2) + V(x, y). \tag{13.1.8}$$

This example is also a good way to remember which of the Hamilton equations gets a negative sign.
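The Legendre transformation of Eq. (13.1.4) can be tried out on the 2-D example of Eq. (13.1.3). The sketch below is illustrative: the harmonic potential, the numerical values, and the function names are assumptions chosen only to make the example concrete. It builds H from L and the functions $f^i$, and then estimates the right-hand sides of Hamilton's equations by central differences.

```python
def potential(x, y):
    # Illustrative potential (an assumption): isotropic harmonic well.
    return 0.5 * (x * x + y * y)

def lagrangian(x, y, vx, vy, m):
    # L = (1/2) m (vx^2 + vy^2) - V(x, y), as in Eq. (13.1.3).
    return 0.5 * m * (vx * vx + vy * vy) - potential(x, y)

def velocities(px, py, m):
    # The functions f^i of Eq. (13.1.2) for this example: v = p / m.
    return px / m, py / m

def hamiltonian(x, y, px, py, m):
    # Eq. (13.1.4): H = p_i f^i - L(q, f, t), with velocities eliminated.
    vx, vy = velocities(px, py, m)
    return px * vx + py * vy - lagrangian(x, y, vx, vy, m)

def hamilton_rhs(x, y, px, py, m, h=1.0e-5):
    # Central-difference estimates of (dH/dpx, dH/dpy, -dH/dx, -dH/dy).
    def H(a, b, c, d):
        return hamiltonian(a, b, c, d, m)
    dH_dx = (H(x + h, y, px, py) - H(x - h, y, px, py)) / (2 * h)
    dH_dy = (H(x, y + h, px, py) - H(x, y - h, px, py)) / (2 * h)
    dH_dpx = (H(x, y, px + h, py) - H(x, y, px - h, py)) / (2 * h)
    dH_dpy = (H(x, y, px, py + h) - H(x, y, px, py - h)) / (2 * h)
    return dH_dpx, dH_dpy, -dH_dx, -dH_dy

m = 2.0
x, y, px, py = 0.3, -0.4, 1.0, 0.5
energy = hamiltonian(x, y, px, py, m)
xdot, ydot, pxdot, pydot = hamilton_rhs(x, y, px, py, m)
```

The negative sign attaches to the momentum equations: `pxdot` comes out as minus the slope of the potential, while `xdot` is simply `px / m`.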
Problem 13.1.1: Recall Problem 10.1.1, which described rays approximately parallel to the z-axis, with the index of refraction given by $n = n_0(1 + B(x^2 + y^2))$. Generalizing this a bit, allow the index to have the form $n(\rho)$ where $\rho = \sqrt{x^2 + y^2}$. Using $(\rho, \phi)$ coordinates, where $\phi$ is an azimuthal angle around the z-axis, write the Lagrangian $L(\rho, \rho', \phi, \phi', z)$ appropriate for use in Eq. (5.3.2). (As in that equation, primes stand for d/dz.) Find momenta $p_\rho = \partial L/\partial\rho'$ and $p_\phi = \partial L/\partial\phi'$, and find the functions $f^i$ defined in Eq. (13.1.2). Find an ignorable coordinate and give the corresponding conserved momentum. Write the Hamiltonian H according to Eq. (13.1.4). Why is H conserved? Take H = E. Solve this for $\rho'$ and eliminate $\phi'$ using the conserved momentum found earlier. In this way the problem has been “reduced to quadratures.” Write the integral that this implies.
Problem 13.1.2: For coordinates that are not rectangular, the kinetic energy acquires a somewhat more general form than in Eq. (13.1.8); $T = \tfrac12 A_{rs}(\mathbf q)\dot q^r\dot q^s$ with $V = V(\mathbf q)$. In this case, defining matrix $B = A^{-1}$, find the Hamiltonian, write Hamilton’s equations, and show that the (conserved) value of H is the total energy E = T + V. The momentum components are proportional to the velocity components only if matrix $A_{rs}$ is diagonal, but they are always homogeneously related.

13.1.1. Charged Particle in Electromagnetic Field

Both as review and to exercise the Hamiltonian formalism, consider a nonrelativistic particle in an electromagnetic field. In Section 12.6, it is shown that the Lagrangian is

$$L = \tfrac12 m(\dot x^2 + \dot y^2 + \dot z^2) + e(A_x\dot x + A_y\dot y + A_z\dot z) - e\Phi(x, y, z), \tag{13.1.9}$$

where A is the vector potential and $\Phi$ is the electric potential. The middle terms, linear in velocities, cannot be regarded naturally as either kinetic or potential energies. Nevertheless their presence does not impede the formalism. In fact, consider an even more general situation,

$$L = \tfrac12 A_{rs}(\mathbf q)\dot q^r\dot q^s + A_r(\mathbf q)\dot q^r - V(\mathbf q). \tag{13.1.10}$$

Then

$$p_r = A_{rs}\dot q^s + A_r, \quad\text{and}\quad \dot q^r = B^{rs}(p_s - A_s). \tag{13.1.11}$$
By comparing this with Problem 13.1.2, it can be seen that in this case the momentum and velocity components are inhomogeneously, though still linearly, related. The Hamiltonian is
and Hamilton’s equations follow easily.
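The Hamiltonian referred to here can be reconstructed by substituting Eq. (13.1.11) into the definition (13.1.4). The following is a sketch, assuming that $A_{rs}$ is symmetric with inverse $B^{rs}$; it is not necessarily identical to the expression printed in the original.

```latex
\begin{align*}
H &= p_r\dot q^r - \tfrac12 A_{rs}\dot q^r\dot q^s - A_r\dot q^r + V
   = (p_r - A_r)\,\dot q^r - \tfrac12 A_{rs}\dot q^r\dot q^s + V \\
  &= (p_r - A_r)B^{rs}(p_s - A_s)
     - \tfrac12 A_{rs}\,B^{ra}(p_a - A_a)\,B^{sb}(p_b - A_b) + V \\
  &= \tfrac12 B^{rs}(p_r - A_r)(p_s - A_s) + V(\mathbf q).
\end{align*}
```

For the Lagrangian of Eq. (13.1.9), with $A_{rs} = m\,\delta_{rs}$, linear coefficients $e\mathbf A$, and $V = e\Phi$, this reduces to the familiar $H = (\mathbf p - e\mathbf A)^2/2m + e\Phi$.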
13.2. RECAPITULATION

We have seen that Newtonian and Lagrangian mechanics are naturally pictured in configuration space, while Hamiltonian mechanics is based naturally in phase space. This is illustrated in Fig. 13.2.1. In configuration space, one deals with spatial trajectories (they would be rays in optics) and “wavefront-like” surfaces that are transverse to the trajectories. A useful concept is that of a “congruence” or bundle of space-filling, nonintersecting curves. A point in phase space fixes both position and slope of the trajectory passing through that point, and as a result there is only one trajectory through any point and the valid trajectories of the mechanical system naturally form a congruence of space-filling, nonintersecting curves. This is in contrast to configuration space, where a rule relating initial velocities with initial positions must be given to define a congruence of trajectories.

FIGURE 13.2.1. Schematic representation of the essential distinctions between configuration space and phase space. In phase space it is especially convenient to define a “reference trajectory” as shown and to relate nearby trajectories to it. (Panels: configuration space, showing wavefronts and trajectories, where trajectories can cross and initial position does not determine the trajectory; phase space, showing the trajectory of particle (1) and the reference trajectory, where trajectories cannot cross and initial position determines the subsequent trajectory.)

In Newtonian mechanics, it is natural to work on finding trajectories starting from the n second-order, ordinary differential equations of the system. In Hamilton-Jacobi theory one first seeks the wavefronts, starting from a partial differential equation. As stated already, both descriptions are based on configuration space. If the coordinates in this space are the 3n Euclidean spatial components, the usual Pythagorean metric of distances and angles applies and, for example, it is meaningful for the wavefronts to be orthogonal to the trajectories. Also, the distance along a trajectory or the distance between two trajectories can be well defined.

The natural geometry of Hamiltonian mechanics is phase space and one seeks the trajectories as solutions of 2n first-order, ordinary differential equations. In this space, the geometry is much more restrictive, as there is a single trajectory through each point. Also, there is no natural metric by which distances and angles can be defined. “Symplectic geometry” is the geometry of phase space. It is frequently convenient, especially in phase space, to refer a bundle of system trajectories to a single nearby “reference trajectory,” as shown in Fig. 13.2.1. But because there is no metric in phase space, the “length” of the deviation vector is not defined. Even in Hamiltonian mechanics one usually starts from a Lagrangian $L(\mathbf q, \dot{\mathbf q}, t)$.
13.3. THE SYMPLECTIC PROPERTIES OF PHASE SPACE

13.3.1. The Canonical Momentum One-Form

Why are momentum components indicated by subscripts, when position components are indicated by superscripts? Obviously it is because momentum components are covariant whereas position components are contravariant. How do we know this? Most simply it has to do with behavior under coordinate transformations. Consider a transformation from coordinates $q^i$ to $Q^i = Q^i(\mathbf q)$. Increments to these coordinates are related by

$$dQ^i = \frac{\partial Q^i}{\partial q^j}dq^j \equiv \Lambda^i{}_j(\mathbf q)\,dq^j, \tag{13.3.1}$$

which is the defining equation for the Jacobian matrix $\Lambda^i{}_j(\mathbf q)$. This is a linear transformation in the tangent space belonging to the manifold M, whose coordinates are q. The momentum components P corresponding to new coordinates Q are given by

$$P_i = \frac{\partial L}{\partial \dot Q^i} = \frac{\partial L}{\partial \dot q^j}\frac{\partial \dot q^j}{\partial \dot Q^i} = p_j\,(\Lambda^{-1})^j{}_i, \tag{13.3.2}$$

where $(\Lambda^{-1})^j{}_l = \partial q^j/\partial Q^l$.$^2$ This uses the fact that the matrix of derivatives $\partial q^j/\partial Q^i$ is the inverse of the matrix of derivatives $\partial Q^j/\partial q^i$. It is the appearance of the transposed inverse Jacobian matrix in this transformation that validates calling p a covariant vector. With velocity $\dot{\mathbf q}$ (or displacement dq) residing in the tangent space, one says that p resides in the cotangent space.

From Eq. (2.2.3) we know that these transformation properties ensure the existence of a certain invariant inner product. In the interest of making contact with the notation used there, we therefore introduce, temporarily at least, the symbol $\tilde{\mathbf p}$ for momentum. Then the technical meaning of the statement that $\tilde{\mathbf p}$ resides in the cotangent space is that the quantity $\langle\tilde{\mathbf p}, d\mathbf q\rangle = p_i\,dq^i$ is invariant to the coordinate transformation from coordinates q to coordinates Q. As an alternate notation for $\tilde{\mathbf p}$, one can introduce a one-form or operator $\tilde p^{(1)}$ defined so that $\tilde p^{(1)}(\cdot) \equiv \langle\tilde{\mathbf p}, \cdot\rangle$, which yields a real number when acting on increment dq. (The $\cdot$ in $\langle\tilde{\mathbf p}, \cdot\rangle$ is just a placeholder for dq.)

It is necessary to distinguish mathematically between $\mathbf p\cdot d\mathbf q$ and $p_i\,dq^i$, two expressions that a physicist is likely to equate mentally. Mathematically the expression $p_i\,dq^i$ is a one-form definable on any manifold, whether possessed of a metric or not, while $\mathbf p\cdot d\mathbf q$ is a more specialized quantity that is only definable if it makes sense for p and dq to be subject to scalar multiplication because they reside in the same metric space. The operator $\tilde p^{(1)}$ is known as a “one-form,” with the tilde indicating that it is a form and the superscript (1) meaning that it takes one argument.

Let the configuration space, the elements of which are labeled by the generalized coordinates q, be called a “manifold” M. At a particular point q in M, the possible velocities $\dot{\mathbf q}$ are said to belong to the “tangent space” at q, denoted by $TM_q$. The operator $\tilde p^{(1)}$ “maps” elements of $TM_q$ to the space R of real numbers:

$$\tilde p^{(1)} : TM_q \to R.$$
Consider a real-valued function f(q) defined on M,

$$f : M \to R.$$

As introduced in Section 2.4.5, the prototypical example of a one-form is the “differential” of a function such as f; it is symbolized by $\widetilde{df}^{(1)} = \widetilde{df}_q$. An incremental deviation dq from point q is necessarily a local tangent vector. The corresponding (linearized) change in value of the function, call it $df_q$ (not boldface and with no tilde), is proportional to dq. Consider the lowest-order Taylor approximation,

$$f(\mathbf q + d\mathbf q) - f(\mathbf q) \approx \frac{\partial f}{\partial q^i}\bigg|_{\mathbf q}\,dq^i.$$

$^2$Our convention is that matrix elements such as $\Lambda^j{}_l$ do not depend on whether the indices are up or down, but their order matters; in this case the order j then l is indicated by the l being spaced to the right.

By the “linearized” value of df we mean this approximation to be taken as exact so that

$$df_q = \frac{\partial f}{\partial q^i}\bigg|_{\mathbf q}\,dq^i; \tag{13.3.3}$$

this is “proportional” to dq in the sense that doubling dq doubles $df_q$. If dq is tangent to a curve $\gamma$ passing through the point q then, except for a scale factor proportional to rate of progress along the curve, $df_q$ can be regarded as the rate of change of f along the curve. Except for the scale factor, $df_q$ is the same for any two curves that are parallel as they pass through q. Though it seems convoluted at first, for the particular function f, $\widetilde{df}_q$ therefore maps tangent vector dq to real number $df_q$;$^3$

$$\widetilde{df}_q : TM_q \to R.$$

To recapitulate, the quantity $\langle\tilde{\mathbf p}, \cdot\rangle$, abbreviated as $\tilde{\mathbf p}$ or later even just as p, is said to be a “one-form,” a linear, real-valued function of one vector argument. The components $p_j$ of $\tilde{\mathbf p}$ in a particular coordinate system, which in “classical” terminology are called covariant components, in “modern” terminology are the coefficients of a one-form. We are to some extent defeating the purpose of introducing one-forms by insisting on correlating their coefficients with covariant components. It is done because components are to a physicist what insulin is to a diabetic. A physicist says that “$p_i\dot q^i$ is manifestly covariant (meaning invariant under coordinate transformation) because $\dot q^i$ is contravariant and $p_i$ is covariant.” A mathematician says the same thing in coordinate-free fashion as “cotangent space one-form $\tilde{\mathbf p}$ maps tangent space vector $\dot{\mathbf q}$ to a real number.”

What about the physicist’s quantity $\mathbf p\cdot\dot{\mathbf q}$? Here physicists (Gibbs initially I believe) have also recognized the virtue of intrinsic coordinate-free notation and adopted it universally. So $\mathbf p\cdot\dot{\mathbf q}$ is the well-known coordinate-independent product of three factors, the magnitudes of the two vectors and the cosine of their included angle. But this notation implicitly assumes a Euclidean coordinate system, whereas the “one-form” notation does not. This may be the source of the main difficulty a physicist is likely to have in assimilating the language of modern differential geometry: Traditional vector calculus, with its obvious power, already contains the major benefits of intrinsic description without being burdened by unwelcome abstraction. But traditional vector analysis contains the implicit specialization to Euclidean geometry. This makes it all the more difficult to grasp the more abstract analysis required when Euclidean geometry is inappropriate. Similar comments apply with even greater force to cross products $\mathbf p\times\mathbf q$ and even more yet to curls and divergences.
For particular coordinate $q^i$, the coordinate one-form $\widetilde{dq}^i$ picks out the corresponding component $V^i$ from arbitrary vector V as $V^i = \langle\widetilde{dq}^i, \mathbf V\rangle$. Since the components $p_i$ are customarily called “canonically conjugate” to the coordinates $q^i$, the one-form

$$\tilde\omega^{(1)} = p_i\,\widetilde{dq}^i \tag{13.3.4}$$

is said to be the “canonical momentum one-form.” Incidentally, when one uses $\tilde\omega^{(1)}$ expanded in terms of its components as $p_i\,\widetilde{dq}^i$, the differential form $\widetilde{dq}^i$ will eventually be replaced by an ordinary differential $dq^i$, and manipulations of the form will not be particularly distinguishable from the manipulations that would be performed on the ordinary differential. Nevertheless, it seems somewhat clearer, when describing a multiplicity of mechanical systems, to retain the form $\widetilde{dq}^i$, which is a property of the coordinate system, than to replace it with $dq^i$, which is a reconfiguration of a particular mechanical system.

$^3$We use the notation $\widetilde{df}$ for the differential of f but in Section 4.4, where the operator $\tilde d$ is discussed, it is shown that this is the same as $\tilde d f$.
13.3.2. The Symplectic Two-Form $\tilde\omega$

In spite of having just gone to such pains to explain the appropriateness of using the symbol $\tilde{\mathbf p}$ for momentum in order to make the notation expressive, we now drop the tilde. The reason for doing this is that we plan to work in phase space, where q and p are to be treated on a nearly equal footing. Though logically possible, it would be simply too confusing, especially when introducing forms on phase space, to continue to exhibit the intrinsic distinction between displacements and momenta explicitly, other than by continuing to use subscripts for the momentum components and superscripts for generalized coordinates. By lumping q and p together we get a vector space with dimension 2n, double the dimensionality of the configuration space. (As established previously, there is no absolute distinction between covariant and contravariant vectors per se.) Since we previously identified the p’s with forms in configuration space and will now proceed to introduce forms that act on p in phase space, we will have to tolerate the confusing circumstance that p is a form in configuration space and a portion of a vector in phase space.

Since “phase space” has been newly introduced, it is worth mentioning a notational limitation it inherits from configuration space. A symbol such as x can mean either where a particle actually is or where, in principle, it could be; context is necessary to determine which is intended. Also, when the symbol $\dot x$ appears it usually refers to an actual system velocity, but it can also serve as a formal argument of a Lagrangian function. The same conventions have to be accepted in phase space. But the q’s and the p’s are not quite equivalent, as the q’s are defined independently of any particular Lagrangian, while the definition of the meaning of the p’s depends on the Lagrangian. Still, they can refer either to a particular evolving system or to a possible configuration of the system.

Mainly, then, in phase space the combined set q, p plays the same role as q plays in configuration space. In Problem 10.1.1 it was found that the quantity $x_{(1)}(z)p_{(2)}(z) - x_{(2)}(z)p_{(1)}(z)$ calculated from two rays in the same optical system is constant, independent of longitudinal coordinate z. This seemingly special result can be generalized to play a central role in Lagrangian (and hence Hamiltonian) mechanics. That is the immediate task.
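The simultaneous analysis of more than one trajectory at a time characterizes this newer-than-Newtonian approach. We start by reviewing some topics from Section 2.1. Recall Eq. (2.4.23), by which tensor product $\mathbf f = \mathbf x\otimes\mathbf y$ is defined as a function of one-forms $\tilde{\mathbf u}$ and $\tilde{\mathbf v}$:

$$\mathbf f(\tilde{\mathbf u}, \tilde{\mathbf v}) = \langle\tilde{\mathbf u}, \mathbf x\rangle\langle\tilde{\mathbf v}, \mathbf y\rangle. \tag{13.3.5}$$

Furthermore, as in Eq. (2.4.29), a (mixed) tensor product $\mathbf f = \tilde{\mathbf u}\otimes\mathbf y$ can be similarly defined by

$$\mathbf f(\mathbf x, \tilde{\mathbf v}) = \langle\tilde{\mathbf u}, \mathbf x\rangle\langle\tilde{\mathbf v}, \mathbf y\rangle. \tag{13.3.6}$$

From quantities like this “wedge products” or “exterior products” are defined by

$$\begin{aligned}
\mathbf x\wedge\mathbf y\,(\tilde{\mathbf u}, \tilde{\mathbf v}) &= \langle\mathbf x, \tilde{\mathbf u}\rangle\langle\mathbf y, \tilde{\mathbf v}\rangle - \langle\mathbf x, \tilde{\mathbf v}\rangle\langle\mathbf y, \tilde{\mathbf u}\rangle, \\
\tilde{\mathbf u}\wedge\tilde{\mathbf v}\,(\mathbf x, \mathbf y) &= \langle\tilde{\mathbf u}, \mathbf x\rangle\langle\tilde{\mathbf v}, \mathbf y\rangle - \langle\tilde{\mathbf u}, \mathbf y\rangle\langle\tilde{\mathbf v}, \mathbf x\rangle.
\end{aligned} \tag{13.3.7}$$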
Another result from Section 2.1 was the construction of a multicomponent bivector from two vectors, x and y, with the components being the 2 x 2 determinants constructed from the components of the two vectors. As shown in Fig. 4.2.2 and again in Fig. 13.3.1, these components can be interpreted as the areas of the projections onto the coordinate axes of the parallelogram formed from the two vectors. They can also be regarded (except for a possible combinatorial factor) as the components of an antisymmetric two-component tensor, $x^{ij}$, with $x^{12} = x^1y^2 - x^2y^1$, etc.

We now intend to utilize these quantities in phase space. As in geometric optics, we will consider not just a solitary orbit, but rather a congruence of orbits or, much of the time, two orbits. As stressed already, in phase space there can be only one valid orbit through each point, which is the major formal advantage of working in phase space. To discuss two particular close orbits without giving preference to either, it is useful to refer them both to a reference path as in Fig. 13.2.1. Though it would not be necessary, this reference path may as well be thought of as a valid orbit as well. A point on one nearby orbit can be expressed by $d\mathbf z_{(1)} = (d\mathbf q_{(1)}, d\mathbf p_{(1)})^T$ and on the other one by $d\mathbf z_{(2)} = (d\mathbf q_{(2)}, d\mathbf p_{(2)})^T$.

FIGURE 13.3.1. The “projected area” on the first coordinate plane $(q^1, p_1)$ defined by tangent vectors $d\mathbf z_{(1)} = (d\mathbf q_{(1)}, d\mathbf p_{(1)})^T$ and $d\mathbf z_{(2)} = (d\mathbf q_{(2)}, d\mathbf p_{(2)})^T$.

Consider a particular coordinate q, say the first one, and its conjugate momentum p. Since these can be regarded as functions in phase space, the differential forms $\widetilde{dq}$ and $\widetilde{dp}$ are everywhere defined.$^4$ As in Eq. (2.3.1), when “coordinate one-form” $\widetilde{dq}$ operates on the vector $d\mathbf z_{(1)}$, the result is

$$\widetilde{dq}(d\mathbf z_{(1)}) = dq_{(1)}, \quad\text{and similarly}\quad \widetilde{dp}(d\mathbf z_{(1)}) = dp_{(1)}. \tag{13.3.8}$$

Notice that it has been necessary to distinguish $\widetilde{dq}$, say, which is a form specific to the coordinate system, from $dq_{(1)}$, which is specific to a particular mechanical system (1). As usual, the placing of the (1) in parentheses, as here, “protects” it from being interpreted as a vector index. Consider then the wedge product$^5$

$$\tilde\omega = \widetilde{dq}\wedge\widetilde{dp}. \tag{13.3.9}$$

Copying from Eq. (13.3.7), when $\tilde\omega$ operates on the two system vectors, the result is

$$\tilde\omega(d\mathbf z_{(1)}, d\mathbf z_{(2)}) = dq_{(1)}\,dp_{(2)} - dq_{(2)}\,dp_{(1)}. \tag{13.3.10}$$
This quantity vanishes when the components are proportional, but not, in general, otherwise. So far q and p have either referred to a one-dimensional system or are one pair of coordinates in a multidimensional system. To generalize to more than one configuration space coordinate we define

$$\tilde\omega = \sum_{i=1}^{n}\widetilde{dq}^i\wedge\widetilde{dp}_i. \tag{13.3.11}$$

This is known as the “symplectic two-form” or, because conjugate coordinates are singled out, “the canonical two-form.” (The sum is expressed explicitly, rather than by the repeated index convention, to defer addressing the question of the geometric character of the individual terms.) Acting on vectors u and v, this expands to

$$\tilde\omega(\mathbf u, \mathbf v) = \sum_{i=1}^{n}\left(\langle\widetilde{dq}^i, \mathbf u\rangle\langle\widetilde{dp}_i, \mathbf v\rangle - \langle\widetilde{dq}^i, \mathbf v\rangle\langle\widetilde{dp}_i, \mathbf u\rangle\right). \tag{13.3.12}$$

When $\tilde\omega$ acts on $d\mathbf z_{(1)}$ and $d\mathbf z_{(2)}$, the result is

$$\tilde\omega(d\mathbf z_{(1)}, d\mathbf z_{(2)}) = \sum_{i=1}^{n}\left(dq^i_{(1)}\,dp_{(2)i} - dq^i_{(2)}\,dp_{(1)i}\right). \tag{13.3.13}$$

$^4$Recall that, since we are working in phase space, the symbol $\widetilde{dq}$ has a meaning different from what it would have in configuration space. Here it expects as argument a phase space tangent vector dz. A notational difficulty we will have is that it is not obvious whether the quantity $\widetilde{dq}$ is a one-form associated with one particular coordinate q or the set of one-forms $\widetilde{dq}^i$ corresponding to all the coordinates $q^i$. We shall state which is the case every time the symbol is used. Here it is the former.

$^5$To be consistent, we should use $\tilde\omega^{(2)}$ to indicate that it is a two-form, but the symbol will be used so frequently that we leave off the superscript (2).

If the two terms are summed individually they are both scalar invariants, but it is more instructive to keep them paired as shown. Each paired difference, when acting on two vectors, produces the directed area of a projection onto one of the $(q^i, p_i)$ coordinate planes; see Fig. 13.3.1. For example, $dq^1_{(1)}\,dp_{(2)1} - dq^1_{(2)}\,dp_{(1)1}$ is the area of a projection onto the $q^1, p_1$ plane. For one-dimensional motion there is no summation, and no projection, and $\tilde\omega(d\mathbf z_{(1)}, d\mathbf z_{(2)})$ is simply the area defined by $(dq_{(1)}, dp_{(1)})$ and $(dq_{(2)}, dp_{(2)})$.

As in Section 4.4.3, a two-form $\tilde\omega^{(2)}$ can be obtained by exterior differentiation of $\tilde\omega^{(1)}$. Applying Eq. (4.4.10),

$$\tilde d\,\tilde\omega^{(1)} = \tilde d\,(p_i\,\widetilde{dq}^i) = \widetilde{dp}_i\wedge\widetilde{dq}^i = -\tilde\omega. \tag{13.3.14}$$
This yields an alternative expression for the canonical two-form.
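The sum of projected areas in Eq. (13.3.13) is easy to evaluate directly. The following is a minimal sketch in which a phase-space tangent vector is stored as a pair of tuples `(dq, dp)`; this representation, the function name, and the sample vectors are assumptions made for illustration.

```python
def symplectic_form(u, v):
    """Evaluate Eq. (13.3.13) on two phase-space tangent vectors.

    Each vector is a pair (dq, dp) of equal-length sequences of components.
    """
    dq_u, dp_u = u
    dq_v, dp_v = v
    return sum(dq_u[i] * dp_v[i] - dq_v[i] * dp_u[i] for i in range(len(dq_u)))

# One degree of freedom: the result is the signed area of the parallelogram.
u1 = ((2.0,), (0.0,))
v1 = ((0.0,), (3.0,))
area_1d = symplectic_form(u1, v1)

# Two degrees of freedom: the result is a sum of two projected areas.
u2 = ((1.0, 0.5), (0.0, 2.0))
v2 = ((0.0, 1.0), (1.0, -1.0))
area_2d = symplectic_form(u2, v2)

# Antisymmetry, and vanishing on proportional vectors.
swapped = symplectic_form(v2, u2)
scaled_u2 = (tuple(3.0 * c for c in u2[0]), tuple(3.0 * c for c in u2[1]))
proportional = symplectic_form(u2, scaled_u2)
```

In the one-dimensional case the two vectors span a 2 by 3 rectangle, so `area_1d` is 6.0; swapping the arguments reverses the sign.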
13.3.3. Invariance of the Symplectic Two-Form

Now consider the coordinate transformation from $q^i$ to $Q^i = Q^i(\mathbf q)$ discussed earlier in the chapter. Under this transformation

(13.3.15)

(The expression for the differential of $P_j$ is more complicated than the expression for the differential of $Q^j$ because the coefficients $\partial Q^j/\partial q^i$ are themselves functions of position.) The Jacobian matrix elements satisfy

(13.3.16)

After differentiation this yields

(13.3.17)

The derivative factor in the final term in Eq. (13.3.15) can be evaluated using these two results. In the new coordinates, the wedge product is

(13.3.18)
Here the terms proportional to $\widetilde{dq}^i\wedge\widetilde{dq}^l$ with equal index values have vanished individually and those with unequal indices have canceled in pairs because they are odd under the interchange of i and l, whereas the coefficient $\partial^2Q^j/\partial q^i\partial q^l$ entering by virtue of Eq. (13.3.17) is even under the same interchange.

To obtain the canonical two-form $\tilde\omega$ and demonstrate its invariance under coordinate transformation, all that has been assumed is the existence of generalized coordinates $q^i$ and some particular Lagrangian $L(\mathbf q, \dot{\mathbf q}, t)$, as momenta $p_i$ were derived from them. One says that the phase space of a Lagrangian system is sure to be “equipped” with the form $\tilde\omega$. It is this form that will permit the identification of one-forms and vectors in much the same way that a metric permits the identification of covariant and contravariant vectors (as was discussed in Section 4.2.4). This is what will make up for the absence of the concept of orthogonality in developing within mechanics the analog of rays and wavefronts in optics.

One describes these results as “symplectic geometry,” but the results derived so far, in particular Eq. (13.3.18), can be regarded simply as differential calculus. The term “symplectic calculus” might therefore be as justified.$^6$ Another conclusion that will follow from Eq. (13.3.18) is that the two-form $\sum_i\widetilde{dq}^i\wedge\widetilde{dp}_i$ evaluated for any two phase-space trajectories is “conserved” as time advances. We will put off deriving this result (which amounts to being a generalized Liouville theorem) for the time being. It is mentioned at this point to emphasize that it follows purely from the structure of the equations, in particular from the definition in Eq. (13.1.1) of the momenta $p_j$ as a derivative of the Lagrangian with respect to velocity $\dot q^j$.
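The cancellation just described can be written out explicitly. The following is a sketch of the argument in the notation $\Lambda^i{}_j \equiv \partial Q^i/\partial q^j$, with summation over repeated indices; it is a reconstruction under that assumption and is not claimed to reproduce the intermediate equations of the original line by line.

```latex
\begin{align*}
\widetilde{dQ}^i &= \Lambda^i{}_k\,\widetilde{dq}^k, \qquad
P_i = p_j\,(\Lambda^{-1})^j{}_i
\;\Longrightarrow\;
\widetilde{dP}_i = (\Lambda^{-1})^j{}_i\,\widetilde{dp}_j
   + p_j\,\frac{\partial(\Lambda^{-1})^j{}_i}{\partial q^l}\,\widetilde{dq}^l, \\[4pt]
\widetilde{dQ}^i\wedge\widetilde{dP}_i
 &= \underbrace{\Lambda^i{}_k(\Lambda^{-1})^j{}_i}_{\delta^j_k}\,
    \widetilde{dq}^k\wedge\widetilde{dp}_j
  + p_j\,\Lambda^i{}_k\,\frac{\partial(\Lambda^{-1})^j{}_i}{\partial q^l}\,
    \widetilde{dq}^k\wedge\widetilde{dq}^l, \\[4pt]
\Lambda^i{}_k\,\frac{\partial(\Lambda^{-1})^j{}_i}{\partial q^l}
 &= -(\Lambda^{-1})^j{}_i\,\frac{\partial\Lambda^i{}_k}{\partial q^l}
  = -(\Lambda^{-1})^j{}_i\,\frac{\partial^2 Q^i}{\partial q^k\,\partial q^l}
  \quad\text{(symmetric in } k, l\text{)}, \\[4pt]
\widetilde{dQ}^i\wedge\widetilde{dP}_i
 &= \widetilde{dq}^k\wedge\widetilde{dp}_k .
\end{align*}
```

The third line follows from differentiating $(\Lambda^{-1})^j{}_i\Lambda^i{}_k = \delta^j_k$ with respect to $q^l$; a coefficient symmetric in $k$ and $l$ contracted with the antisymmetric $\widetilde{dq}^k\wedge\widetilde{dq}^l$ gives zero.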
Since the derivation could have been completed before a Hamiltonian has even been introduced, it cannot be said to be an essentially Hamiltonian result, or a result of any property of a system other than the property of being characterized by a Lagrangian. For paraxial optics in a single transverse plane a result derived in Problem 10.1.1 was the invariance of the combination $x_{(1)}p_{(2)} - x_{(2)}p_{(1)}$ for any two rays. This is an example of Eq. (13.3.18). Because that theory had already been linearized, the conservation law applied to the full amplitudes and not just to their increments. In general, however, the formula applies to small deviations around a reference orbit, even if the amplitude of that reference orbit is great enough for the equations of motion to be arbitrarily nonlinear.
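The paraxial invariant can be illustrated with a small numerical experiment. The sketch below does not use the graded-index medium of Problem 10.1.1; instead it assumes a hypothetical line of drift spaces and thin lenses, each represented by a 2 x 2 transfer matrix of unit determinant, and checks that $x_{(1)}p_{(2)} - x_{(2)}p_{(1)}$ is the same at the exit as at the entrance. All names and numerical values are illustrative.

```python
def drift(length):
    # Free propagation: x -> x + length * p, p unchanged.
    return ((1.0, length), (0.0, 1.0))

def thin_lens(focal_length):
    # Thin lens: x unchanged, p -> p - x / focal_length.
    return ((1.0, 0.0), (-1.0 / focal_length, 1.0))

def apply(matrix, ray):
    # Propagate a ray (x, p) through one element.
    (a, b), (c, d) = matrix
    x, p = ray
    return (a * x + b * p, c * x + d * p)

def pair_invariant(ray1, ray2):
    # The combination x_(1) p_(2) - x_(2) p_(1) for two rays.
    return ray1[0] * ray2[1] - ray2[0] * ray1[1]

# Hypothetical optical line (lengths and focal lengths are made up).
beamline = [drift(0.5), thin_lens(0.3), drift(1.2), thin_lens(-0.8), drift(0.4)]

ray1, ray2 = (1.0e-3, 0.0), (0.0, 2.0e-3)
invariant_in = pair_invariant(ray1, ray2)
for element in beamline:
    ray1, ray2 = apply(element, ray1), apply(element, ray2)
invariant_out = pair_invariant(ray1, ray2)
```

Each element changes both rays, yet the antisymmetric combination is untouched because every transfer matrix has determinant one.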
$^6$Explanation of the source of the name symplectic actually legitimizes the topic as geometry since it relates to the vanishing of an antisymmetric form constructed from the coordinates of, say, a triplex of three points. The name “symplectic group” was coined by Hermann Weyl (from a Greek word with his intended meaning) as a replacement for the term “complex group” that he had introduced even earlier, with “complex” used in the sense, “Is a triplex of points on the same line?” He intended “complex” to mean more nearly “simple” than “complicated” and certainly not to mean $\sqrt{-1}$. But the collision of meanings became an embarrassment to him. Might one not therefore call a modern movie complex a “cineplectic group”?
13.3.4. Use of $\tilde{\omega}$ to Associate Vectors and One-Forms
To motivate this discussion recall, for example from Eq. (2.5.15) (which read $x_i = g_{ik}x^k$), that a metric tensor can be used to obtain covariant components $x_i$ from contravariant components $x^k$; this was “lowering the index.” The orthogonality of two vectors $x^i$ and $y^i$ could then be expressed in the form $\mathbf{x}\cdot\mathbf{y} = x_j y^j = 0$. The symplectic two-form discussed in the previous section can be written in the form $\tilde{\omega}(\cdot,\cdot)$ to express the fact that it is waiting for two vector arguments, from which it will linearly produce a real number. It is important also to remember that, as a tensor, $\tilde{\omega}$ is antisymmetric. This means, for example, that $\tilde{\omega}(\vec{\mathbf{u}},\vec{\mathbf{u}}) = 0$, where $\vec{\mathbf{u}}$ is any vector belonging to the tangent space $TM_x$ at system configuration $x$. For the time being we are taking a “belt and suspenders” approach of indicating a vector $\vec{\mathbf{u}}$ with both boldface and overhead arrow. This is done only to stress the point, and this notation will be dropped when convenient. Taking $\vec{\mathbf{u}}$ as one of the two vector arguments of $\tilde{\omega}$, we can define a new quantity (a one-form) $\tilde{u}(\cdot)$ by the formula
$$\tilde{u}(\cdot) = \tilde{\omega}(\vec{\mathbf{u}},\cdot). \qquad (13.3.19)$$
This formula “associates” a one-form $\tilde{u}$ with the vector $\vec{\mathbf{u}}$. Since the choice of whether to treat $\vec{\mathbf{u}}$ as the first or second argument in Eq. (13.3.19) was arbitrary, the sign of the association can only be conventional. The association just introduced provides a one-to-one linear mapping from the tangent space $TM_x$ to the cotangent space $TM_x^*$, spaces of the same dimensionality. For any particular choices of bases in these spaces, the association could be represented by matrix multiplication $u_i = A_{ij}u^j$, where $A_{ij}$ is an antisymmetric, square matrix with nonvanishing determinant, and which would therefore be invertible. Hence the association is one-to-one in both directions and can be said to be an isomorphism. The inverse map can be symbolized by
$$I : TM_x^* \to TM_x. \qquad (13.3.20)$$
As a result, for any one-form $\tilde{\eta}$ there is sure to be a vector $\vec{\boldsymbol{\eta}} = I\tilde{\eta}$ such that
$$\tilde{\eta} = \tilde{\omega}(\vec{\boldsymbol{\eta}},\cdot). \qquad (13.3.21)$$
An immediate (and important) example of this association is its application to $\widetilde{df}$, which is the standard one-form that can be constructed from any function $f$ defined over phase space; Eq. (13.3.21) can be used to generate a vector $\overrightarrow{df} = I\widetilde{df}$ from the one-form $\widetilde{df}$ so that
$$\overrightarrow{df} = I\widetilde{df} \quad\text{satisfies}\quad \widetilde{df} = \tilde{\omega}(\overrightarrow{df},\cdot). \qquad (13.3.22)$$
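The association can be made concrete for a single degree of freedom. The sketch below is only an illustration: it assumes the canonical form $\tilde{\omega}(\mathbf{u},\mathbf{v}) = u_q v_p - u_p v_q$ in components, and the function names are hypothetical.

```python
# Sketch: the association u -> omega(u, .) for n = 1, in canonical components.
# Assumption: omega(u, v) = u_q * v_p - u_p * v_q (canonical two-form).

def omega(u, v):
    """Canonical two-form on the (q, p) plane."""
    return u[0] * v[1] - u[1] * v[0]

def to_one_form(u):
    """Components of the one-form omega(u, .) on the basis (e_q, e_p)."""
    return (omega(u, (1.0, 0.0)), omega(u, (0.0, 1.0)))

def to_vector(form):
    """Inverse map I: one-form components (a, b) -> vector (b, -a)."""
    a, b = form
    return (b, -a)

u = (2.0, 3.0)
u_form = to_one_form(u)      # one-form associated with u
u_back = to_vector(u_form)   # applying I recovers u
```

Because the map is linear and invertible, going from vector to one-form and back returns the original vector, and antisymmetry gives `omega(u, u) == 0`.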
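13.3.5. Explicit Evaluation of Some Inner Products
Let $q$ be a specific coordinate, say the first one, and $p$ be its conjugate momentum, and let $f(q,\ldots,p,\ldots)$ be a function defined on phase space. Again we use $\widetilde{dq}$ and $\widetilde{dp}$ temporarily as the one-forms corresponding to these particular coordinates. The one-form $\widetilde{df}$ can be expressed two different ways, one according to its original definition, the other using the association (13.3.22) with $\tilde{\omega}$ spelled out as in Eq. (13.3.12):
$$\widetilde{df} = \frac{\partial f}{\partial q}\,\widetilde{dq} + \frac{\partial f}{\partial p}\,\widetilde{dp} + \cdots, \qquad \widetilde{df} = \tilde{\omega}(\overrightarrow{df},\cdot) = \langle\widetilde{dq},\overrightarrow{df}\rangle\,\widetilde{dp} - \langle\widetilde{dp},\overrightarrow{df}\rangle\,\widetilde{dq} + \cdots. \qquad (13.3.23)$$
It follows that
$$\langle\widetilde{dq},\overrightarrow{df}\rangle = \frac{\partial f}{\partial p}, \qquad \langle\widetilde{dp},\overrightarrow{df}\rangle = -\frac{\partial f}{\partial q}. \qquad (13.3.24)$$
In practice these equations would be applied to each of the individual pairs of coordinates and momenta.
13.3.6. The Vector Field Associated with $\widetilde{dH}$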
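Since the Hamiltonian is a function on phase space, its differential one-form $\widetilde{dH}$ is well defined:
$$\widetilde{dH} = \sum_{i=1}^{n}\left(\frac{\partial H}{\partial q^i}\,\widetilde{dq}^i + \frac{\partial H}{\partial p_i}\,\widetilde{dp}_i\right). \qquad (13.3.25)$$
What is the associated vector $\overrightarrow{dH} = I\widetilde{dH}$? Fig. 13.3.2 shows the unique trajectory $(\mathbf{q}(t),\mathbf{p}(t))$ passing through some particular point $(\mathbf{q}_{(0)},\mathbf{p}_{(0)})$ and an incremental tangential displacement at that point represented by a $2n$-component column vector
$$\overrightarrow{dz} = \begin{pmatrix} d\mathbf{q} \\ d\mathbf{p} \end{pmatrix} = \begin{pmatrix} \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{pmatrix} dt. \qquad (13.3.26)$$
(Our notation is not consistent, as, this time, $d\mathbf{q}$ and $d\mathbf{p}$ do stand for an array of components. Also, to reduce clutter we have suppressed the subscripts (0), which were only introduced to make the point that what followed referred to one particular point.)
FIGURE 13.3.2. The vector $\overrightarrow{dz}_{(0)} = (d\mathbf{q}_{(0)}, d\mathbf{p}_{(0)})^T$ is tangent to a phase space trajectory given by $\mathbf{z}_{(0)}(t) = (\mathbf{q}_{(0)}(t),\mathbf{p}_{(0)}(t))$. The trajectory is assumed to satisfy Hamilton's equations.
Hamilton’s equations state
$$\begin{pmatrix} \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{pmatrix} = \begin{pmatrix} \partial H/\partial\mathbf{p} \\ -\partial H/\partial\mathbf{q} \end{pmatrix}, \qquad (13.3.27)$$
and these equations can be used to evaluate the partial derivatives appearing in Eq. (13.3.25). The result is
$$\widetilde{dH} = \sum_{i=1}^{n}\left(-\dot{p}_i\,\widetilde{dq}^i + \dot{q}^i\,\widetilde{dp}_i\right). \qquad (13.3.28)$$
On the other hand, evaluating the symplectic two-form on $\overrightarrow{dz}$ yields
$$\tilde{\omega}(\overrightarrow{dz},\cdot) = \sum_{i=1}^{n}\left(\langle\widetilde{dq}^i,\overrightarrow{dz}\rangle\,\widetilde{dp}_i - \langle\widetilde{dp}_i,\overrightarrow{dz}\rangle\,\widetilde{dq}^i\right) = \sum_{i=1}^{n}\left(\dot{q}^i\,\widetilde{dp}_i - \dot{p}_i\,\widetilde{dq}^i\right)dt, \qquad (13.3.29)$$
where the inner products have been evaluated using Eq. (13.3.24). Dividing through by $dt$, the equation implied by Eqs. (13.3.28) and (13.3.29) can therefore be expressed using the isomorphism introduced in Eq. (13.3.21):
$$\dot{\vec{\mathbf{z}}} = I\,\widetilde{dH}. \qquad (13.3.30)$$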
Though particular coordinates were used in deriving this equation, in the final form the relationship is coordinate-independent, which is to say that the relation is intrinsic. This is in contrast with the coordinate-dependent geometry describing the Hamilton-Jacobi equation in the previous chapter. In configuration space, one is accustomed to visualizing the force as being directed parallel to the gradient of a quantity with dimensions of energy, namely the potential energy. Here in phase space we find the system velocity related to (though not parallel to) the “gradient” of the Hamiltonian, also an energy. In each case, the motivation is to change the problem from that of finding vectorial quantities to the (usually) simpler problem of finding scalar quantities. This motivation should also be reminiscent of electrostatics, where one finds the scalar potential and from it the vector electric field.
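As a minimal numerical illustration of the remark that the system velocity is related to, but not parallel to, the “gradient” of the Hamiltonian, the sketch below uses a one-dimensional harmonic oscillator. The Hamiltonian $H = p^2/2m + kq^2/2$ and the function names are assumptions chosen for the example; they are not taken from the text.

```python
# Sketch: Hamiltonian vector field for H = p^2/(2m) + k q^2/2 (assumed example).

def grad_H(q, p, m=1.0, k=1.0):
    """Ordinary gradient components (dH/dq, dH/dp)."""
    return (k * q, p / m)

def hamiltonian_vector(q, p, m=1.0, k=1.0):
    """Velocity (q_dot, p_dot) = (dH/dp, -dH/dq), i.e. Hamilton's equations."""
    dH_dq, dH_dp = grad_H(q, p, m, k)
    return (dH_dp, -dH_dq)

q, p = 0.3, -1.2
z_dot = hamiltonian_vector(q, p)
dH = grad_H(q, p)
# dH applied to z_dot is the rate of change of H along the trajectory.
rate_of_H = dH[0] * z_dot[0] + dH[1] * z_dot[1]
```

The contraction `rate_of_H` vanishes: the velocity is tangent to the contour of constant $H$ rather than along the gradient.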
13.3.7. Hamilton’s Equations in Matrix Form⁷
It is customary to represent first-order differential equations such as Eqs. (13.3.30) in matrix form. Since there are $2n$ equations, one wishes to represent the operator $I$, which has so far been entirely formal, by a $2n \times 2n$ matrix. Actually, according to Eq. (13.3.22), Eq. (13.3.30) can be written even more compactly as $\dot{\vec{\mathbf{z}}} = \overrightarrow{dH}$, but the right-hand side remains to be made explicit. When expressed in terms of canonical coordinates, except for sign changes and a coordinate-momentum interchange, the components of $\overrightarrow{dH}$ are the same as the components of the ordinary gradient of $H$. If we are prepared to refrain from further coordinate transformations (especially if they mix displacements and momenta), we can artificially introduce a Pythagorean relation in phase space (even though displacements and momenta have different physical dimensions). Then incremental “distances” are given by
$$ds^2 = d\mathbf{q}\cdot d\mathbf{q} + d\mathbf{p}\cdot d\mathbf{p}, \qquad (13.3.31)$$
where these are ordinary dot products of vectors. With this metric, the vector $(d\mathbf{p}, -d\mathbf{q})$ is orthogonal to $(d\mathbf{q}, d\mathbf{p})$ (see Fig. 13.3.3). In metric geometry, a vector can be associated with the hyperplane to which it is orthogonal. If the dimensionality is even and the coordinates are arranged in $(q^i, p_i)$ pairs, the equation of a hyperplane through the origin takes the form $a_i q^i + b^i p_i = 0$. The vector with contravariant components $q^i = b^i$, $p_i = -a_i$ is normal to this plane. On the other hand, the coefficients $(a_i, b^i)$ in the equation for the hyperplane can be regarded as covariant components of the vector normal to the plane since the formula expressing its normality to vector $(q_0^i, p_{0i})$ lying in the plane is $a_i q_0^i + b^i p_{0i} = 0$. In this way, a contravariant vector $\overrightarrow{dz}$ is associated with a covariant vector, or one-form. Fig. 13.3.3 illustrates the vector with components $(a, b)$ normal to the dashed line with equation $ax + by = 0$. With this (coordinate-dependent) identification, the isomorphism $I$ can be expressed explicitly as
$$\overrightarrow{dH} = I\,\widetilde{dH} \stackrel{q}{=} -S\begin{pmatrix} \partial H/\partial\mathbf{q} \\ \partial H/\partial\mathbf{p} \end{pmatrix}, \quad\text{where}\quad S = \begin{pmatrix} \mathbf{0} & -\mathbf{1} \\ \mathbf{1} & \mathbf{0} \end{pmatrix}, \qquad (13.3.32)$$
where $\mathbf{0}$ is an $n \times n$ matrix of 0’s and $\mathbf{1}$ is an $n \times n$ diagonal matrix of 1’s. One equality sign in Eq. (13.3.32) is “qualified” because it equates intrinsic quantities to coordinate-specific quantities. Notice that $S$ is a rotation matrix yielding rotation through 90° in the $q, p$ plane when $n = 1$, while for $n > 1$ it yields rotation through 90° in each of the $q^i, p_i$ planes separately. Using $S$, the $2n$ Hamilton’s equations can be written in the form
$$\dot{\mathbf{z}} = -S\,\frac{\partial H}{\partial\mathbf{z}}. \qquad (13.3.33)$$
FIGURE 13.3.3. The line $bx + ay = 0$ is perpendicular to the line $ax - by = 0$.
At this point it would have seemed more natural to have defined $S$ with the opposite sign, but the choice of sign made in Eq. (13.3.32) will be convenient for making comparisons with notation standard in a different field in a later section. When the alternate symbol $J = -S$ is used, Hamilton’s equations become $\dot{\mathbf{z}} = J(\partial H/\partial\mathbf{z})$. It should be reemphasized that, though a geometric interpretation has been given to the contravariant/covariant association, it is coordinate-dependent and hence artificial. Even changing the units of, say, momenta, but not displacements, changes the meaning of, say, orthogonality. It does not, however, change the solutions of the equations of motion. The isomorphism (13.3.32) can be applied to an arbitrary vector to relate its contravariant and covariant components, written here as columns $\vec{\mathbf{z}}$ and $\tilde{\mathbf{z}}$:
$$\vec{\mathbf{z}} \stackrel{q}{=} -S\,\tilde{\mathbf{z}}. \qquad (13.3.34)$$
⁷This section applies especially the geometry of covariant and contravariant vectors developed in Section 2.2.
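The matrix form $\dot{\mathbf{z}} = -S\,\partial H/\partial\mathbf{z}$ is easy to exercise numerically. The following sketch assumes the ordering $(q^1,\ldots,q^n,p_1,\ldots,p_n)$ and an $S$ built from $n\times n$ blocks $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$; the helper names and the two-oscillator Hamiltonian are hypothetical choices made for illustration.

```python
# Sketch: Hamilton's equations in matrix form, z_dot = -S dH/dz.
# Assumed ordering: (q1, ..., qn, p1, ..., pn).

def make_S(n):
    """2n x 2n matrix with n x n blocks [[0, -1], [1, 0]]."""
    S = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        S[i][n + i] = -1.0
        S[n + i][i] = 1.0
    return S

def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def z_dot(grad_H, n):
    """Velocity in phase space from the ordinary gradient of H."""
    return [-c for c in matvec(make_S(n), grad_H)]

# Two uncoupled unit oscillators: H = (q1^2 + q2^2 + p1^2 + p2^2) / 2,
# for which dH/dz = z.
z = [1.0, 2.0, 3.0, 4.0]      # (q1, q2, p1, p2)
velocity = z_dot(z[:], 2)     # expected pattern (p1, p2, -q1, -q2)
```

The result reproduces $\dot{q}^i = \partial H/\partial p_i$ and $\dot{p}_i = -\partial H/\partial q^i$ component by component.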
13.4. SYMPLECTIC GEOMETRY
In the previous sections, the evolution of a mechanical system in phase space was codified in terms of the antisymmetric bilinear form $\tilde{\omega}$, and it was mentioned that this form plays a role in phase space analogous to the metric form in Euclidean space. The geometry of a space endowed with such a form is called “symplectic geometry.” The study of this geometry can be formulated along the same lines that ordinary geometry was studied in the early chapters of this text. In Chapter 3, one started with rectangular axes for which the metric tensor was the identity matrix. When skew axes were introduced, the metric tensor, though no longer diagonal, remained symmetric. Conversely it was found that, given a symmetric metric tensor, axes could be found such that it became a diagonal matrix; the metric form became a sum of squares (possibly with some signs negative). It was also shown that orthogonal matrices play a special role describing transformations that preserve the Pythagorean form, and the product of two such transformations has the same property. Because of this and some other well-known properties, these transformations were said to form a group, the orthogonal group. Next, when curvilinear coordinates were introduced, it was found that similar diagonalization could still be performed locally. Here we will derive the analogous “linearized” properties and will sketch the “curvilinear” properties heuristically.
13.4.1. Symplectic Products and Symplectic Bases
For symplectic geometry the step analogous to introducing a metric tensor was the step of introducing the “canonical two-form”
$$\tilde{\omega} = \tilde{q}^1\wedge\tilde{p}_1 + \tilde{q}^2\wedge\tilde{p}_2 + \cdots + \tilde{q}^n\wedge\tilde{p}_n. \qquad (13.4.1)$$
Here, in a step analogous to neglecting curvilinear effects in ordinary geometry, we have removed the differential “d” symbols because we now assume purely linear geometry for all amplitudes. Later, when considering “variational” equations that relate solutions in the vicinity of a given solution, it will be appropriate to put back the “d” symbols. (Recall that for any vector $\mathbf{z} = (q^1, p_1, q^2, p_2, \ldots)^T$, one has $\tilde{q}^1(\mathbf{z}) = \langle\tilde{q}^1, \mathbf{z}\rangle = q^1$ and so on.) The form $\tilde{\omega}$ accepts two vectors, say $\mathbf{w}$ and $\mathbf{z}$, as arguments and generates a scalar. One can therefore introduce an abbreviated notation
$$[\mathbf{w},\mathbf{z}] = \tilde{\omega}(\mathbf{w},\mathbf{z}), \qquad (13.4.2)$$
and this “skew-scalar” or “symplectic” product is the analog of the dot product of ordinary vectors. If this product vanishes, the vectors $\mathbf{w}$ and $\mathbf{z}$ are said to be “in involution.” Clearly one has
$$[\mathbf{w},\mathbf{z}] = -[\mathbf{z},\mathbf{w}] \quad\text{and}\quad [\mathbf{z},\mathbf{z}] = 0, \qquad (13.4.3)$$
so every vector is in involution with itself. The concept of vectors being in involution will be most significant when the vectors are solutions of the equations of motion. A set of $n$ independent solutions in involution is said to form a “Lagrangian set.” An example of such a set is given below in Section 15.4. The skew-scalar products of pairs drawn from the $2n$ basis vectors
$$\mathbf{e}_{q^1}, \mathbf{e}_{q^2}, \ldots \quad\text{and}\quad \mathbf{e}_{p_1}, \mathbf{e}_{p_2}, \ldots \qquad (13.4.4)$$
are especially simple; (with no summation implied)
$$[\mathbf{e}_{q^{(i)}}, \mathbf{e}_{p_{(i)}}] = 1, \quad\text{and all other basis vector products vanish.} \qquad (13.4.5)$$
To express this in words, in addition to being skew-orthogonal to itself, each basis vector is also skew-orthogonal to all other basis vectors except that of its conjugate mate, and for that one the product is $\pm 1$. Any basis satisfying these special product relations is known as a “symplectic basis.” Though the only skew-symmetric form that has been introduced to this point was that given in Eq. (13.4.1), in general a similar skew-product can be defined for any skew-symmetric form $\tilde{\omega}$ whatsoever. Other than linearity, the main requirements for $\tilde{\omega}$ are those given in Eq. (13.4.3), but to avoid “degenerate” cases it is also necessary to require that there be no nonzero vector orthogonal to all other vectors. With these properties satisfied, the space together with $\tilde{\omega}$ is said to be symplectic. Let $N$ stand for its dimensionality. A symplectic basis like (13.4.4) can be found for the space. To show this one can start by picking any arbitrary vector $\mathbf{u}_1$ as the first basis vector. Then, because of the nondegeneracy requirement, there has to be another vector, call it $\mathbf{v}_1$, that has a nonvanishing skew-scalar product with $\mathbf{u}_1$, and the product can be made exactly 1 by appropriate choice of a scale factor multiplying $\mathbf{v}_1$. If $N = 2$, then $n = N/2 = 1$, and the basis is complete. For $N > 2$, if an appropriate multiple of $\mathbf{u}_1$ is subtracted from a vector in the space, the resulting vector either vanishes or has vanishing skew-scalar product with $\mathbf{v}_1$. Perform this operation on all vectors. The resulting vectors form a space of dimensionality $N - 1$ that is said to be “skew complementary” to $\mathbf{u}_1$; call it $U_1$. It has to contain $\mathbf{v}_1$. Similarly one can find a space $V_1$ of dimensionality $N - 1$ skew complementary to $\mathbf{v}_1$. Since $V_1$ does not contain $\mathbf{v}_1$, it follows that $U_1$ and $V_1$ do not coincide, and hence their intersection, call it $W$, has dimension $N - 2$. On $W$ we must and can use the same rule $[\cdot,\cdot]$ for calculating skew-scalar products, and we now check that this product is nondegenerate. If there were a nonzero vector in $W$ skew-orthogonal to all elements of $W$, then because it is also skew-orthogonal to $\mathbf{u}_1$ and $\mathbf{v}_1$ it would have to be skew-orthogonal to the whole space, which is a contradiction. By induction on $n$ we conclude that the dimensionality of the symplectic space is even, $N = 2n$, and since a symplectic basis can always be found (as in Eq. (13.4.5)), all symplectic spaces of the same dimensionality are isomorphic, and the skew-scalar product can always be expressed as in Eq. (13.4.1). The arguments of this section have assumed linearity, but they can be generalized to arbitrary curvilinear geometry and, when that is done, the result is known as Darboux’s theorem. From a physicist’s point of view, the generalization is obvious since, looking on a fine enough scale, even nonlinear transformations appear linear.
A variant of this “argument” is that, just as an ordinary metric tensor can be transformed to be Euclidean over small regions, the analogous property should be true for a symplectic “metric.” This reasoning, however, is only heuristic (see Arnold [1, p. 230] for further discussion).
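The inductive construction of a symplectic basis can be sketched as a “symplectic Gram-Schmidt” procedure. This is an illustrative implementation under stated assumptions, not the text's own algorithm: the skew-scalar product is taken in the canonical form with ordering $(q^1, p_1, q^2, p_2, \ldots)$, and all function names are hypothetical.

```python
# Sketch: build pairs (u_k, v_k) with [u_k, v_k] = 1 from a spanning set.
# Assumed ordering of components: (q1, p1, q2, p2, ...).

def skew(u, v):
    """Skew-scalar product [u, v] = sum_i (u_q v_p - u_p v_q)."""
    return sum(u[i] * v[i + 1] - u[i + 1] * v[i] for i in range(0, len(u), 2))

def axpy(a, x, y):
    """Return a * x + y for lists x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def symplectic_basis(vectors, tol=1e-12):
    pool = [list(v) for v in vectors]
    pairs = []
    while pool:
        u = pool.pop(0)
        if all(abs(c) < tol for c in u):
            continue  # vector was already in the span of earlier pairs
        # Nondegeneracy guarantees a partner with nonzero skew product.
        idx = next((i for i, w in enumerate(pool) if abs(skew(u, w)) > tol), None)
        if idx is None:
            raise ValueError("degenerate form, or vectors do not span the space")
        w = pool.pop(idx)
        scale = skew(u, w)
        v = [c / scale for c in w]          # now [u, v] = 1
        pairs.append((u, v))
        # Project the rest into the skew complement of span{u, v}:
        # x -> x - [x, v] u + [x, u] v.
        pool = [axpy(-skew(x, v), u, axpy(skew(x, u), v, x)) for x in pool]
    return pairs

pairs = symplectic_basis([[1.0, 0.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0, 0.0],
                          [0.0, 1.0, 1.0, 0.0],
                          [1.0, 0.0, 1.0, 1.0]])
```

Each pass removes one conjugate pair and leaves a space of dimension two lower, which mirrors the induction on $n$.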
13.4.2. Symplectic Transformations
For symplectic spaces, the analogs of orthogonal transformation matrices (which preserve scalar products) are symplectic matrices M (that preserve skew-scalar products). The “transform” Z of vector z by M is given by
$$\mathbf{Z} = M\mathbf{z}. \qquad (13.4.6)$$
The transforms of two vectors $\mathbf{u}$ and $\mathbf{v}$ are $M\mathbf{u}$ and $M\mathbf{v}$, and the condition for $M$ to be symplectic is
$$[M\mathbf{u}, M\mathbf{v}] = [\mathbf{u},\mathbf{v}]. \qquad (13.4.7)$$
If $M_1$ and $M_2$ are applied consecutively, their product, $M_2M_1$, is necessarily also symplectic. Since the following problem shows that the determinant of a symplectic matrix is 1, it follows that the matrix is invertible, and from this it follows that the symplectic transformations form a group.
Problem 13.4.1: In a symplectic basis, the skew-scalar product can be reexpressed as an ordinary dot product by using the isomorphism $I$ defined in Eq. (13.3.21), and $I$ can be represented by the matrix $S$ defined in Eq. (13.3.32). Using the fact that $\det|S| = 1$, adapt the argument of Section 4.1.1 to show that $\det|M| = 1$ if $M$ is a symplectic matrix.
13.4.3. Properties of Symplectic Matrices
Vectors in phase space have dimensionality $2n$ and, when expressed in a symplectic basis, have the form $(q^1, p_1, q^2, p_2, \ldots)^T$ or $(q^1, q^2, \ldots, p_1, p_2, \ldots)^T$, whichever one prefers. Because it permits a more compact partitioning, the second ordering is more convenient for writing compact, general matrix equations. But when motion in one phase space plane, say $(q^1, p_1)$, is independent of, or approximately independent of, motion in another plane, say $(q^2, p_2)$, the first ordering is more convenient. In Eq. (13.3.32), the isomorphism from covariant to contravariant components was expressed in coordinates for a particular form $\widetilde{dH}$. The inverse isomorphism can be applied to an arbitrary vector $\vec{\mathbf{z}}$ to yield a form $\tilde{\mathbf{z}}$:
$$\tilde{\mathbf{z}} = I^{-1}\vec{\mathbf{z}} \stackrel{q}{=} S\begin{pmatrix} q \\ p \end{pmatrix}, \quad\text{where}\quad S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \qquad (13.4.8)$$
(The qualified equality symbol $\stackrel{q}{=}$ acknowledges that the notation is a bit garbled, with the left-hand side appearing to be intrinsic and the right-hand side expressed in components.) Using Eq. (13.4.8) it is possible to express the skew-scalar product $[\mathbf{w},\mathbf{z}]$ of vectors $\mathbf{w}$ and $\mathbf{z}$ (defined in Eq. (13.4.2)) in terms of ordinary scalar products and from that a quadratic form:
$$[\mathbf{w},\mathbf{z}] = S\mathbf{w}\cdot\mathbf{z} = -\mathbf{w}\cdot S\mathbf{z}. \qquad (13.4.9)$$
Since displacements and momenta are being treated homogeneously here, it is impossible to retain the traditional placement of the indices for both displacements and momenta. Eq. (13.4.9) shows that the elements $-S_{ij}$ are the coefficients of a quadratic form giving the skew-scalar product of vectors $\mathbf{z}_a$ and $\mathbf{z}_b$ in terms of their components:
$$[\mathbf{z}_a,\mathbf{z}_b] = -S_{ij}\,z_a^i z_b^j. \qquad (13.4.10)$$
When the condition Eq. (13.4.7) for a linear transformation $M$ to be symplectic is expressed with dot products as in Eq. (13.4.9), it becomes
$$S\mathbf{u}\cdot\mathbf{v} = SM\mathbf{u}\cdot M\mathbf{v} = M^TSM\mathbf{u}\cdot\mathbf{v}. \qquad (13.4.11)$$
This can be true for all $\mathbf{u}$ and $\mathbf{v}$ only if
$$M^TSM = S. \qquad (13.4.12)$$
This is an algebraic test that can be applied to a matrix M whose elements are known explicitly to determine whether or not it is symplectic.
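The algebraic test $M^TSM = S$ translates directly into code. The sketch below is a plain-Python illustration; it assumes the block-diagonal $S$ appropriate to the ordering $(q^1, p_1, q^2, p_2, \ldots)$, and the example matrices (a “drift,” a “kick,” and a uniform scaling) are hypothetical test cases.

```python
# Sketch: test whether a 2n x 2n matrix satisfies M^T S M = S.
# Assumed ordering: (q1, p1, q2, p2, ...).

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def make_S(n):
    """Block-diagonal matrix with 2 x 2 blocks [[0, -1], [1, 0]]."""
    S = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        S[2 * k][2 * k + 1] = -1.0
        S[2 * k + 1][2 * k] = 1.0
    return S

def is_symplectic(M, tol=1e-12):
    n = len(M) // 2
    S = make_S(n)
    lhs = matmul(matmul(transpose(M), S), M)
    return all(abs(lhs[i][j] - S[i][j]) <= tol
               for i in range(2 * n) for j in range(2 * n))

drift = [[1.0, 2.5], [0.0, 1.0]]      # unit determinant
kick = [[1.0, 0.0], [-0.4, 1.0]]      # unit determinant
scaling = [[2.0, 0.0], [0.0, 2.0]]    # determinant 4
```

For these inputs `is_symplectic(drift)` and `is_symplectic(matmul(kick, drift))` hold while `is_symplectic(scaling)` does not, consistent with the group property and with the $n = 1$ criterion $\det|M| = 1$.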
Problem 13.4.2: Show that condition (13.4.12) is equivalent to
$$MSM^T = S. \qquad (13.4.13)$$
Problem 13.4.3: Hamilton’s equations in matrix form are
$$\dot{\mathbf{z}} = -S\,\frac{\partial H}{\partial\mathbf{z}}, \qquad (13.4.14)$$
and a change of variables with symplectic matrix $M$, $\mathbf{z} = M\mathbf{Z}$, is performed. Show that the form of Hamilton’s equations is left invariant. Such transformations are said to be “canonical.”
A result equivalent to Eq. (13.4.12) is obtained by multiplying it on the right by $M^{-1}$ and on the left by $S$ (and using $S^2 = -\mathbf{1}$):
$$M^{-1} = -SM^TS. \qquad (13.4.15)$$
This provides a handy numerical shortcut for determining the inverse of a matrix that is known to be symplectic since the right-hand side requires only matrix transposition and multiplication by a matrix whose elements are mainly zero, and the others $\pm 1$. Subsequent formulas will be abbreviated by introducing $\bar{A}$, to be called the “symplectic conjugate” of arbitrary matrix $A$, by
$$\bar{A} = -SA^TS. \qquad (13.4.16)$$
A necessary and sufficient condition for matrix $M$ to be symplectic is then
$$M^{-1} = \bar{M}. \qquad (13.4.17)$$
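The shortcut $M^{-1} = \bar{M} = -SM^TS$ can be checked numerically without any general-purpose matrix inversion. This is a minimal sketch under the same assumptions as before (ordering $(q^1, p_1, \ldots)$, block-diagonal $S$); the helper names and the sample matrix are hypothetical.

```python
# Sketch: inverse of a symplectic matrix via the symplectic conjugate.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def make_S(n):
    """Block-diagonal matrix with 2 x 2 blocks [[0, -1], [1, 0]]."""
    S = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        S[2 * k][2 * k + 1] = -1.0
        S[2 * k + 1][2 * k] = 1.0
    return S

def conjugate(A):
    """Symplectic conjugate A_bar = -S A^T S."""
    S = make_S(len(A) // 2)
    return [[-x for x in row] for row in matmul(matmul(S, transpose(A)), S)]

# A symplectic example: a kick followed by a drift (both have unit determinant).
M = matmul([[1.0, 0.0], [-0.4, 1.0]], [[1.0, 2.5], [0.0, 1.0]])
M_inv = conjugate(M)
product = matmul(M_inv, M)   # should be the identity up to rounding
```

Only a transpose, two multiplications by a sparse matrix of 0's and $\pm 1$'s, and a sign change are needed, which is the point of the shortcut.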
From here on, when a matrix is symbolized by $M$, it will implicitly be assumed to be symplectic and hence to satisfy this equation. For any $2 \times 2$ matrix $A$, with $S$ given by Eq. (13.4.8), substitution into Eq. (13.4.16) yields
$$\bar{A} = \overline{\begin{pmatrix} a & b \\ c & d \end{pmatrix}} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \det|A|\,A^{-1}, \qquad (13.4.18)$$
assuming the inverse exists. Hence, using Eq. (13.4.17), for $n = 1$ a necessary and sufficient condition for symplecticity is that $\det|M| = 1$. For $n > 1$, this condition will shortly be shown to be necessary, but it can obviously not be sufficient since Eqs. (13.4.17) imply more than one independent algebraic condition. For most practical calculations, it is advantageous to list the components of phase space vectors in the order $\mathbf{z} = (q^1, p_1, q^2, p_2)^T$ and then to streamline the notation further by replacing this with $\mathbf{z} = (x, p, y, q)^T$. (Here, and when the generalization to arbitrary $n$ is obvious, we exhibit only this $n = 2$ case explicitly.) With this ordering, the matrix $S$ takes the form
$$S = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (13.4.19)$$
Partitioning a $4 \times 4$ matrix $M$ into $2 \times 2$ blocks, it and its symplectic conjugate are
$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad \bar{M} = \begin{pmatrix} \bar{A} & \bar{C} \\ \bar{B} & \bar{D} \end{pmatrix}. \qquad (13.4.20)$$
The eigenvalues of a symplectic matrix $M$ will play an important role in the sequel. The “generic” situation is for all eigenvalues to be unequal, and that is much the easiest case for the following discussion since the degeneracy of equal eigenvalues causes the occurrence of indeterminate ratios that require special treatment in the algebra. Unfortunately there are two cases where equality of eigenvalues is unavoidable:
(1) Systems often exhibit symmetries that, if exactly satisfied, force equality among certain eigenvalues or sets of eigenvalues. This case is more a nuisance than anything else since the symmetry can be removed either realistically (as it would be in nature) or artificially; in the latter case the perturbation can later be reduced to insignificance. It is very common for perturbing forces of one kind or another, in spite of being extremely small, to remove degeneracy in this way.
(2) It is often appropriate to idealize systems by one or more variable “control parameters” that characterize the way the system is adjusted externally. Since the eigenvalues depend on these control parameters, the eigenvalues may have to become equal as a control parameter is varied. It may happen that the system refuses to allow this (see Problem 1.2.5), or sometimes the eigenvalues can pass through each other uneventfully. In any case, typically the possibility of such “collisions” of the eigenvalues contributes to the “essence” of the system under study, and following the eigenvalues through the collision or absence of collision is essential to the understanding of the device. For example, a “bifurcation” can occur at the point where the eigenvalues become equal, and in that case the crossing point marks the boundary of regions of qualitatively different behavior.
In spite of this inescapability of degeneracy, in the interest of simplifying the discussion, for the time being we will assume all eigenvalues of M are distinct. When discussing approximate methods in a later chapter, it will be necessary to rectify this oversimplification.
The eigenvalues $\lambda$ and eigenvectors $\psi_\lambda$ of any matrix $A$ satisfy the “eigenvalue” and the “eigenvector” equations
$$\det|A - \lambda\mathbf{1}| = 0, \quad\text{and}\quad A\psi_\lambda = \lambda\psi_\lambda. \qquad (13.4.21)$$
Since the determinant is unchanged when $A$ is replaced by $A^T$, a matrix and its transpose share the same set of eigenvalues. From Eq. (13.4.16) it follows that the symplectic conjugate $\bar{A}$ also has the same set of eigenvalues (since $-S = S^{-1}$, $\bar{A}$ is related to $A^T$ by a similarity transformation). Then, from Eq. (13.4.17) it follows that the eigenvalue spectrum of a symplectic matrix $M$ and its inverse $M^{-1}$ are identical. For any matrix, if $\lambda$ is an eigenvalue, then $1/\lambda$ is an eigenvalue of the inverse. It follows that if $\lambda$ is an eigenvalue of a symplectic matrix, then so also is $1/\lambda$. Even if all the elements of $M$ are real (as we assume), the eigenvectors can be complex, and so can the eigenvalues. But here is where symplectic matrices shine. Multiplying the second of Eqs. (13.4.21) by $M^{-1}$ and using Eq. (13.4.17), one concludes both that
$$M\psi_\lambda = \lambda\psi_\lambda \quad\text{and}\quad \bar{M}\psi_\lambda = \frac{1}{\lambda}\psi_\lambda. \qquad (13.4.22)$$
Writing $\lambda = re^{i\theta}$, then $1/\lambda = (1/r)e^{-i\theta}$ is also an eigenvalue, and these two eigenvalues are located in the complex $\lambda$-plane, as shown in Fig. 13.4.1a. However, it also follows from the normal properties of the roots of a polynomial equation that if an eigenvalue $\lambda = re^{i\theta}$ is complex then its complex conjugate $\lambda^* = re^{-i\theta}$ is also an eigenvalue. This is illustrated in Fig. 13.4.1b. It then follows, as shown in Figures 13.4.1c and 13.4.1d, that the eigenvalues can only come in real reciprocal pairs, or in complex conjugate pairs lying on the unit circle, or in quartets as in Figure 13.4.1c. For the cases illustrated in Fig. 13.4.1d, these requirements can be exploited algebraically by adding the equations (13.4.22) to give
$$(M + \bar{M})\psi_\lambda = \Lambda\psi_\lambda, \quad\text{where}\quad \Lambda = \lambda + \lambda^{-1}, \qquad (13.4.23)$$
which shows that the eigenvalues $\Lambda$ of $M + \bar{M}$ are real. Performing the algebra explicitly in the $4 \times 4$ case yields
$$M + \bar{M} = \begin{pmatrix} A + \bar{A} & B + \bar{C} \\ C + \bar{B} & D + \bar{D} \end{pmatrix} = \begin{pmatrix} (\mathrm{tr}A)\mathbf{1} & \bar{E} \\ E & (\mathrm{tr}D)\mathbf{1} \end{pmatrix}, \qquad (13.4.24)$$
where the off-diagonal combination $E$ and its determinant $\mathcal{E}$ are defined by
$$E = C + \bar{B} \equiv \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \quad\text{and}\quad \mathcal{E} = \det|E| = eh - fg. \qquad (13.4.25)$$
FIGURE 13.4.1. (Axes: real and imaginary parts of $\lambda$.) (a) If $\lambda = re^{i\theta}$ is an eigenvalue of a symplectic matrix, then so also is $1/\lambda = (1/r)e^{-i\theta}$. (b) If an eigenvalue $\lambda = re^{i\theta}$ is complex, then its complex conjugate $\lambda^* = re^{-i\theta}$ is also an eigenvalue. (c) If any eigenvalue is complex with absolute value other than 1, the three complementary points shown are also eigenvalues. (d) Eigenvalues can come in pairs only if they are real (and reciprocal) or lie on the unit circle (symmetrically above and below the real axis).
The eigenvalue equation is⁸
$$\begin{vmatrix} (\mathrm{tr}A - \Lambda)\mathbf{1} & \bar{E} \\ E & (\mathrm{tr}D - \Lambda)\mathbf{1} \end{vmatrix} = \Lambda^2 - (\mathrm{tr}A + \mathrm{tr}D)\Lambda + \mathrm{tr}A\,\mathrm{tr}D - \mathcal{E} = 0, \qquad (13.4.26)$$
whose solutions are
$$\Lambda_{A,D} = (\mathrm{tr}A + \mathrm{tr}D)/2 \pm \sqrt{(\mathrm{tr}A - \mathrm{tr}D)^2/4 + \mathcal{E}}. \qquad (13.4.27)$$
The eigenvalues have been given subscripts $A$ and $D$ to facilitate discussion in the common case that the off-diagonal elements are small, so the eigenvalues can be associated with the upper-left and lower-right blocks of $M$, respectively. Note that the eigenvalues satisfy simple equations
$$\Lambda_A + \Lambda_D = \mathrm{tr}A + \mathrm{tr}D, \qquad \Lambda_A\Lambda_D = \mathrm{tr}A\,\mathrm{tr}D - \mathcal{E}. \qquad (13.4.28)$$
⁸It is not in general valid to evaluate the determinant of a partitioned matrix treating the blocks as if they were ordinary numbers, but it is valid if the diagonal blocks are individually proportional to the identity matrix, as is the case here.
Though we have been proceeding in complete generality and this result is valid for any $n = 2$ symplectic matrix, the structure of these equations all but forces one to contemplate the possibility that $\mathcal{E}$ be “small,” which would be true if the off-diagonal blocks of $M$ are small, as would be the case if the $x$ and $y$ motions were independent or almost independent. Calling $x$ “horizontal” and $y$ “vertical,” one says that the off-diagonal blocks $B$ and $C$ “couple” the horizontal and vertical motion. If $B = C = 0$, the horizontal and vertical motions proceed independently. The remarkable feature of Eqs. (13.4.28) is that, though $B$ and $C$ together have eight elements capable of not vanishing, they shift the eigenvalues only through the combination $\mathcal{E}$. In Eq. (13.4.27), we should insist that $A$ ($D$) go with the $+$ ($-$) sign respectively when $\mathrm{tr}A - \mathrm{tr}D$ is positive and vice versa. This choice ensures that, if $\mathcal{E}$ is in fact small, the perturbed eigenvalue $\Lambda_A$ will correspond to approximately horizontal motion and $\Lambda_D$ to approximately vertical. Starting from a $4 \times 4$ matrix, one expects the characteristic polynomial to be quartic in $\lambda$, but here we have found a characteristic polynomial quadratic in $\Lambda$. The reason for this is that the combination $M + \bar{M}$ has nothing but pairs of degenerate roots, so the quartic characteristic equation factorizes exactly into the square of a quadratic equation. We have shown this explicitly for $n = 2$ (and for $n = 3$ in a problem below), but the result is true for arbitrary $n$. Anticipating results to appear later on, multiplying $M$ by itself repeatedly will be of crucial importance for the behavior of Hamiltonian systems over long periods of time. Such powers of $M$ are most easily calculated if the variables have been transformed to make $M$ diagonal, in which case the diagonal elements are equal to the eigenvalues. Then, evaluating $M^l$ for large (integer) $l$, the diagonal elements are $\lambda^l$ and their magnitudes are $|\lambda|^l$, which approach 0 if $|\lambda| < 1$ or $\infty$ if $|\lambda| > 1$.
Both of these behaviors can be said to be “trivial.” This leaves just one possibility as the case of greatest interest. It is the case illustrated on the left in Fig. 13.4.1d, in which all eigenvalues lie on the unit circle. In this case there are real angles $\mu_A$ and $\mu_D$ satisfying
$$\Lambda_A = e^{i\mu_A} + e^{-i\mu_A} = 2\cos\mu_A, \qquad \Lambda_D = e^{i\mu_D} + e^{-i\mu_D} = 2\cos\mu_D. \qquad (13.4.29)$$
In the special uncoupled case, for which $B$ and $C$ vanish, these angles degenerate into $\mu_x$ and $\mu_y$, the values appropriate for pure horizontal and vertical motion, and we have
$$2\cos\mu_x = \mathrm{tr}A, \qquad 2\cos\mu_y = \mathrm{tr}D. \qquad (13.4.30)$$
The sign of determinant $\mathcal{E}$ has special significance if the uncoupled eigenvalues are close to each other. This can be seen most easily by rearranging Eqs. (13.4.27) and (13.4.29) into the form
$$(\cos\mu_A - \cos\mu_D)^2 = \frac{1}{4}(\mathrm{tr}A - \mathrm{tr}D)^2 + \mathcal{E}. \qquad (13.4.31)$$
If the unperturbed eigenvalues are close, the first term on the right-hand side is small. Then for $\mathcal{E} < 0$, the perturbed eigenvalues $\Lambda$ can become complex (which pushes the eigenvalues $\lambda$ off the unit circle, leading to instability). But if $\mathcal{E} > 0$ the eigenvalues $\Lambda$ remain real and the motion remains stable (at least for sufficiently small values of $\mathcal{E} > 0$). An even more important inference can be drawn from Eqs. (13.4.27) and (13.4.29). If the parameters are such that both $\cos\mu_A$ and $\cos\mu_D$ lie in the (open) range $-1 < \cos\mu_A < \cos\mu_D < 1$, then both angles $\mu_A$ and $\mu_D$ are real and the motion is “stable.” What is more, for sufficiently small variations of the parameters the eigenvalues, because they must move smoothly, cannot leave the unit circle and these angles necessarily remain real. This means the stability has a kind of “robustness” against small changes in the parameters. This will be considered in more detail later. Pictorially, the eigenvalues on the left in Fig. 13.4.1d have to stay on the unit circle, as the parameters are varied continuously. Only when an eigenvalue “collides” with another eigenvalue can the absolute value of either eigenvalue deviate from 1. Furthermore, if the collision is with the complex conjugate mate, it can only occur at either $+1$ or $-1$. The reader who is not impressed that it has been possible to find closed-form algebraic formulas for the eigenvalues of a $4 \times 4$ matrix should attempt to do it for a general matrix $((a, b, c, d), (e, f, g, h), \ldots)$. It is symplecticity that has made it possible. To exploit our good fortune we should also find closed-form expressions for the eigenvectors. One can write a four-component vector in the form
$$\mathbf{z} = \begin{pmatrix} \chi \\ \xi \end{pmatrix}, \quad\text{where}\quad \chi = \begin{pmatrix} x \\ p \end{pmatrix} \quad\text{and}\quad \xi = \begin{pmatrix} y \\ q \end{pmatrix}. \qquad (13.4.32)$$
One can then check that the vectors⁹
$$X = \begin{pmatrix} \chi \\ \dfrac{E}{\Lambda - \mathrm{tr}D}\,\chi \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} \dfrac{\bar{E}}{\Lambda - \mathrm{tr}A}\,\xi \\ \xi \end{pmatrix} \qquad (13.4.33)$$
satisfy the (same) equations $(M + \bar{M})X = \Lambda X$ and $(M + \bar{M})Y = \Lambda Y$ for either eigenvalue and arbitrary $\chi$ or $\xi$. If we think of $\mathcal{E}$ as being small so that the eigenvectors are close to the uncoupled solution, then we should select the $\Lambda$ factors so that Eqs. (13.4.33) become
$$X = \begin{pmatrix} \chi \\ \dfrac{E}{\Lambda_A - \mathrm{tr}D}\,\chi \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} \dfrac{\bar{E}}{\Lambda_D - \mathrm{tr}A}\,\xi \\ \xi \end{pmatrix}. \qquad (13.4.34)$$
In each case, the denominator factor has been chosen to have a “large” absolute value so as to make the factor multiplying its two-component vector “small.” In this way, the lower components of $X$ and the upper components of $Y$ are “small.” In the limit of vanishing $\mathcal{E}$ only the upper components survive for $x$-motion and only the lower for $y$. This formalism may be mildly reminiscent of the four-component spin vectors describing relativistic electrons and positrons.
⁹If the coupling is strong it is technically advantageous to define $X$ with an explicit factor $\Lambda - \mathrm{tr}D$ that protects the lower component from divergence. Similarly $Y$ is defined with factor $\Lambda - \mathrm{tr}A$.
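The stability discussion above can be turned into a small numerical test. The sketch below is an illustration under stated assumptions: it takes $\Lambda_{A,D} = (\mathrm{tr}A + \mathrm{tr}D)/2 \pm \sqrt{(\mathrm{tr}A - \mathrm{tr}D)^2/4 + \mathcal{E}}$ with the sign convention that $\Lambda_A \to \mathrm{tr}A$ as $\mathcal{E} \to 0$, and it treats the motion as stable when both $\Lambda$ are real with $|\Lambda| < 2$ (so that $\mu_A$ and $\mu_D$ are real). The function names and the sample numbers are hypothetical.

```python
import cmath
import math

def big_lambdas(trA, trD, calE):
    """Lambda_A and Lambda_D, with Lambda_A -> trA as calE -> 0."""
    mean = 0.5 * (trA + trD)
    root = cmath.sqrt(0.25 * (trA - trD) ** 2 + calE)
    sign = 1.0 if trA >= trD else -1.0
    return mean + sign * root, mean - sign * root

def is_stable(trA, trD, calE, tol=1e-12):
    """True when both Lambda are real and lie strictly inside (-2, 2)."""
    return all(abs(L.imag) < tol and abs(L.real) < 2.0
               for L in big_lambdas(trA, trD, calE))

# Nearly equal uncoupled tunes: mu_x = 0.5, mu_y = 0.52 (radians).
trA, trD = 2.0 * math.cos(0.5), 2.0 * math.cos(0.52)
stable_positive = is_stable(trA, trD, +0.01)   # calE > 0
stable_negative = is_stable(trA, trD, -0.01)   # calE < 0
```

With the uncoupled values this close together, a small negative $\mathcal{E}$ already makes the square root imaginary, while the same magnitude with positive sign leaves both $\Lambda$ real.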
There is another remarkable formula that a $4 \times 4$ symplectic matrix must satisfy. A result from matrix theory is that a matrix satisfies its own eigenvalue equation. Applying this to $M + \bar{M}$, one has
$$(M + \bar{M})^2 - (\Lambda_A + \Lambda_D)(M + \bar{M}) + \Lambda_A\Lambda_D = 0. \qquad (13.4.35)$$
Rearranging this (using $M\bar{M} = \bar{M}M = \mathbf{1}$) yields
$$M^2 + \bar{M}^2 - (\Lambda_A + \Lambda_D)(M + \bar{M}) + 2 + \Lambda_A\Lambda_D = 0. \qquad (13.4.36)$$
By using Eq. (13.4.28), this equation can be expressed entirely in terms of the coefficients of $M$:
$$M^2 + \bar{M}^2 - (\mathrm{tr}A + \mathrm{tr}D)(M + \bar{M}) + 2 + \mathrm{tr}A\,\mathrm{tr}D - \mathcal{E} = 0. \qquad (13.4.37)$$
Problem 13.4.4: Starting with $M + \bar{M}$ expressed as in Eq. (13.4.24), verify Eq. (13.4.35) explicitly.
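A numerical spot check of the identity $(M + \bar{M})^2 - (\mathrm{tr}A + \mathrm{tr}D)(M + \bar{M}) + (\mathrm{tr}A\,\mathrm{tr}D - \mathcal{E})\mathbf{1} = 0$ is sketched below. It is not a substitute for the algebraic verification asked for in the problem. The test matrix (rotations in the $(x,p)$ and $(y,q)$ planes followed by a coupling “kick” $p \to p + ky$, $q \to q + kx$) is a hypothetical example, and $S$ is taken as the block-diagonal matrix for the ordering $(x, p, y, q)$.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# S for the ordering (x, p, y, q).
S = [[0.0, -1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, -1.0],
     [0.0, 0.0, 1.0, 0.0]]

def conjugate(M):
    """Symplectic conjugate -S M^T S."""
    return [[-x for x in row] for row in matmul(matmul(S, transpose(M)), S)]

def example_matrix(mu_x, mu_y, k):
    """Hypothetical test case: plane rotations followed by a coupling kick."""
    cx, sx = math.cos(mu_x), math.sin(mu_x)
    cy, sy = math.cos(mu_y), math.sin(mu_y)
    R = [[cx, sx, 0.0, 0.0], [-sx, cx, 0.0, 0.0],
         [0.0, 0.0, cy, sy], [0.0, 0.0, -sy, cy]]
    K = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, k, 0.0],
         [0.0, 0.0, 1.0, 0.0], [k, 0.0, 0.0, 1.0]]
    return matmul(K, R)

M = example_matrix(0.5, 0.8, 0.1)
Mbar = conjugate(M)
T = [[m + mb for m, mb in zip(r1, r2)] for r1, r2 in zip(M, Mbar)]  # M + Mbar

trA = M[0][0] + M[1][1]
trD = M[2][2] + M[3][3]
# E = C + Bbar, where Bbar = [[d, -b], [-c, a]] for the block B = [[a, b], [c, d]].
a, b, c, d = M[0][2], M[0][3], M[1][2], M[1][3]
E = [[M[2][0] + d, M[2][1] - b], [M[3][0] - c, M[3][1] + a]]
calE = E[0][0] * E[1][1] - E[0][1] * E[1][0]

T2 = matmul(T, T)
residual = max(abs(T2[i][j] - (trA + trD) * T[i][j] + (trA * trD - calE) * (i == j))
               for i in range(4) for j in range(4))
MbarM = matmul(Mbar, M)
identity_error = max(abs(MbarM[i][j] - (i == j)) for i in range(4) for j in range(4))
```

Both `residual` and `identity_error` come out at rounding level. For this particular test matrix the coupling determinant works out to $\mathcal{E} = k^2\sin\mu_x\sin\mu_y$, which is positive for angles in $(0, \pi)$.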
Problem 13.4.5: Find the equation analogous to Eq. (13.4.36) that is satisfied by a $6 \times 6$ symplectic matrix
$$M = \begin{pmatrix} A & B & E \\ C & D & F \\ G & H & J \end{pmatrix}. \qquad (13.4.38)$$
It is useful to introduce off-diagonal combinations
(13.4.39)
The eigenvalue equation for $\Lambda = \lambda + 1/\lambda$ is cubic in this case,
$$\Lambda^3 - p_1\Lambda^2 - p_2\Lambda - p_3 = 0, \qquad (13.4.40)$$
but it can be written explicitly, and there is a procedure for solving a cubic equation. The roots can be written in terms of the combinations
This is of more than academic interest as the Hamiltonian motion of a particle in three-dimensional space is described by such a matrix.
13.4.4. Alternate Coordinate Ordering
The formulas for symplectic matrices take on a different appearance when the coordinates are listed in the order $\mathbf{z} = (q^1, q^2, \ldots, p_1, p_2, \ldots)^T$. With this ordering, the $2n \times 2n$ matrix $S$ takes the form
$$S_a = \begin{pmatrix} \mathbf{0} & -\mathbf{1} \\ \mathbf{1} & \mathbf{0} \end{pmatrix}, \qquad (13.4.42)$$
with each of the partitions being $n \times n$. Partitioning $M$ into $2 \times 2$ blocks, it and its symplectic conjugate are
$$M_a = \begin{pmatrix} A_a & B_a \\ C_a & D_a \end{pmatrix}, \qquad \bar{M}_a = \begin{pmatrix} D_a^T & -B_a^T \\ -C_a^T & A_a^T \end{pmatrix}. \qquad (13.4.43)$$
Subscripts $a$ have been added as a reminder of the alternate coordinate ordering. This formula has the attractive property of resembling the formula for the inverse of a $2 \times 2$ matrix. With this ordering, the symplectic product defined in Eq. (13.4.9) becomes
$$[\mathbf{w},\mathbf{z}] = \sum_{i=1}^{n}\left(w^i z^{n+i} - w^{n+i} z^i\right). \qquad (13.4.44)$$
This combination, which we have called a “symplectic product,” is sometimes called “the Poisson bracket” of the vectors $\mathbf{w}$ and $\mathbf{z}$, but it must be distinguished from the Poisson bracket of scalar functions to be defined in the next section.
13.5. POISSON BRACKETS OF SCALAR FUNCTIONS
Many of the relations of Hamiltonian mechanics can be expressed compactly in terms of the Poisson brackets that we now define.
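13.5.1. The Poisson Bracket of Two Scalar Functions
Consider two functions $f(\mathbf{z}) = f(\mathbf{q},\mathbf{p})$ and $g(\mathbf{z}) = g(\mathbf{q},\mathbf{p})$ defined on phase space. From them can be formed $\widetilde{df}$ and $\widetilde{dg}$ and from them (using the symplectic 2-form $\tilde{\omega}$ and the standard association) the vectors $\overrightarrow{df}$ and $\overrightarrow{dg}$. The “Poisson bracket” of functions $f$ and $g$ is then defined by
$$\{f, g\} = \tilde{\omega}(\overrightarrow{df},\overrightarrow{dg}). \qquad (13.5.1)$$
Spelled out more explicitly as in Eq. (13.3.12), this becomes
$$\{f, g\} = \sum_{i=1}^{n}\left(\langle\widetilde{dq}^i,\overrightarrow{df}\rangle\langle\widetilde{dp}_i,\overrightarrow{dg}\rangle - \langle\widetilde{dp}_i,\overrightarrow{df}\rangle\langle\widetilde{dq}^i,\overrightarrow{dg}\rangle\right) = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^i}\right), \qquad (13.5.2)$$
where the scalar products have been obtained using Eqs. (13.3.24). Though the terms in this sum are individually coordinate-dependent, by its derivation the Poisson bracket is itself coordinate-independent.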
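In canonical coordinates the Poisson bracket is the familiar sum $\sum_i(\partial f/\partial q^i\,\partial g/\partial p_i - \partial f/\partial p_i\,\partial g/\partial q^i)$, and it can be approximated numerically by central differences. The following is a rough sketch, not a production routine: the step size `h`, the function names, and the sample oscillator Hamiltonian are all assumptions made for illustration.

```python
# Sketch: numerical Poisson bracket by central differences.
# Functions take (q, p) as lists of coordinates and momenta.

def poisson_bracket(f, g, q, p, h=1e-5):
    def partial(func, which, i):
        """Central-difference partial derivative with respect to q_i or p_i."""
        q_plus, q_minus, p_plus, p_minus = list(q), list(q), list(p), list(p)
        if which == "q":
            q_plus[i] += h
            q_minus[i] -= h
        else:
            p_plus[i] += h
            p_minus[i] -= h
        return (func(q_plus, p_plus) - func(q_minus, p_minus)) / (2.0 * h)

    return sum(partial(f, "q", i) * partial(g, "p", i)
               - partial(f, "p", i) * partial(g, "q", i)
               for i in range(len(q)))

coordinate = lambda q, p: q[0]
momentum = lambda q, p: p[0]
H = lambda q, p: 0.5 * (p[0] ** 2 + q[0] ** 2)   # assumed unit oscillator

point_q, point_p = [0.3], [0.7]
qp_bracket = poisson_bracket(coordinate, momentum, point_q, point_p)
qH_bracket = poisson_bracket(coordinate, H, point_q, point_p)
```

Up to discretization error, `qp_bracket` is 1 and `qH_bracket` equals the momentum at the chosen point, as expected from $\dot{q} = \partial H/\partial p$.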
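One application of the Poisson bracket is to express time evolution of the system. Consider the evolution of a general function $f(\mathbf{q}(t),\mathbf{p}(t),t)$ as its arguments follow the phase space system trajectory. Its time derivative is given by
$$\frac{df}{dt} = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial q^i}\dot{q}^i + \frac{\partial f}{\partial p_i}\dot{p}_i\right) + \frac{\partial f}{\partial t} = \{f, H\} + \frac{\partial f}{\partial t}. \qquad (13.5.3)$$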
In the special case that the function f has no explicit time dependence, its time derivative is therefore given directly by its Poisson bracket with the Hamiltonian.
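The evolution formula can be checked numerically for a simple case. The following is only a sketch under stated assumptions: one degree of freedom, a harmonic oscillator with hypothetical values of mass and spring constant, the test function $f = qp$ (chosen arbitrarily), and partial derivatives estimated by central differences rather than computed exactly. The helper names are ours, not the text's.

```python
import math

M_PARTICLE = 2.0   # hypothetical mass
K_SPRING = 8.0     # hypothetical spring constant


def poisson_bracket(f, g, q, p, h=1e-5):
    """Central-difference estimate of {f, g} at the phase space point (q, p)."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return df_dq * dg_dp - df_dp * dg_dq


def hamiltonian(q, p):
    """Harmonic oscillator Hamiltonian p^2/(2m) + k q^2/2."""
    return p * p / (2 * M_PARTICLE) + 0.5 * K_SPRING * q * q


def f_sample(q, p):
    """Arbitrary test function with no explicit time dependence."""
    return q * p


def trajectory(t, amplitude=1.5):
    """Exact oscillator solution with q(0) = amplitude and p(0) = 0."""
    omega = math.sqrt(K_SPRING / M_PARTICLE)
    q = amplitude * math.cos(omega * t)
    p = -M_PARTICLE * omega * amplitude * math.sin(omega * t)
    return q, p


def rate_along_trajectory(f, t, dt=1e-5):
    """Central-difference estimate of df/dt following the trajectory."""
    q_plus, p_plus = trajectory(t + dt)
    q_minus, p_minus = trajectory(t - dt)
    return (f(q_plus, p_plus) - f(q_minus, p_minus)) / (2 * dt)
```

Comparing `rate_along_trajectory(f_sample, t)` with `poisson_bracket(f_sample, hamiltonian, *trajectory(t))` at any time `t` should give agreement to within the finite-difference error.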
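13.5.2. Properties of Poisson Brackets

The following properties are easily derived:

Jacobi Identity

$$\{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0. \qquad (13.5.4)$$

Leibnitz Property

$$\{f_1 f_2, g\} = f_1 \{f_2, g\} + f_2 \{f_1, g\}. \qquad (13.5.5)$$

Explicit Time Dependence

$$\frac{\partial}{\partial t}\{f_1, f_2\} = \left\{\frac{\partial f_1}{\partial t}, f_2\right\} + \left\{f_1, \frac{\partial f_2}{\partial t}\right\}.$$

Jacobi's Theorem

THEOREM 13.5.1: If $\{H, f_1\} = 0$ and $\{H, f_2\} = 0$, then $\{H, \{f_1, f_2\}\} = 0$.

Proof: By the Jacobi identity, $\{H, \{f_1, f_2\}\} = -\{f_1, \{f_2, H\}\} - \{f_2, \{H, f_1\}\}$, and both brackets on the right vanish by hypothesis.

Corollary: If $f_1$ and $f_2$ are “integrals of the motion,” then so also is $\{f_1, f_2\}$. This is the form in which Jacobi's theorem is usually remembered.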
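The properties above can be verified exactly for polynomial functions. The sketch below is an illustration, not the text's method: it assumes three degrees of freedom, represents a polynomial as a dictionary from exponent tuples (ordered $q^1, q^2, q^3, p_1, p_2, p_3$) to integer coefficients, and uses the angular momentum components together with twice the isotropic oscillator Hamiltonian (with $m = k = 1$) as a hypothetical test case. All function names are ours.

```python
N = 3  # degrees of freedom; variables are ordered (q1, q2, q3, p1, p2, p3)


def var(k):
    """Polynomial equal to the k-th phase space coordinate."""
    exps = [0] * (2 * N)
    exps[k] = 1
    return {tuple(exps): 1}


def add(a, b, scale=1):
    """Return a + scale*b, dropping terms whose coefficient cancels."""
    out = dict(a)
    for mono, c in b.items():
        out[mono] = out.get(mono, 0) + scale * c
        if out[mono] == 0:
            del out[mono]
    return out


def mul(a, b):
    """Product of two polynomials."""
    out = {}
    for ma, ca in a.items():
        for mb, cb in b.items():
            mono = tuple(x + y for x, y in zip(ma, mb))
            out[mono] = out.get(mono, 0) + ca * cb
            if out[mono] == 0:
                del out[mono]
    return out


def diff(a, k):
    """Partial derivative with respect to the k-th coordinate."""
    out = {}
    for mono, c in a.items():
        if mono[k] > 0:
            lowered = list(mono)
            lowered[k] -= 1
            out[tuple(lowered)] = c * mono[k]
    return out


def bracket(f, g):
    """Poisson bracket {f, g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i)."""
    total = {}
    for i in range(N):
        total = add(total, mul(diff(f, i), diff(g, N + i)))
        total = add(total, mul(diff(f, N + i), diff(g, i)), scale=-1)
    return total


def jacobi_sum(f, g, h):
    """Left-hand side of the Jacobi identity; should be the zero polynomial."""
    total = bracket(f, bracket(g, h))
    total = add(total, bracket(g, bracket(h, f)))
    return add(total, bracket(h, bracket(f, g)))


x, y, z, px, py, pz = (var(k) for k in range(2 * N))
Lx = add(mul(y, pz), mul(z, py), scale=-1)
Ly = add(mul(z, px), mul(x, pz), scale=-1)
Lz = add(mul(x, py), mul(y, px), scale=-1)

# Twice the isotropic oscillator Hamiltonian with m = k = 1 (hypothetical test case).
H2 = {}
for v in (x, y, z, px, py, pz):
    H2 = add(H2, mul(v, v))
```

With these definitions, `bracket(H2, Lx)` and `bracket(H2, Ly)` are the empty (zero) polynomial, and so, as the corollary requires, is `bracket(H2, bracket(Lx, Ly))`; the bracket `bracket(Lx, Ly)` itself reproduces `Lz`.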
Perturbation Theory: Poisson brackets will be of particular importance in perturbation theory when motion close to integrable motion is studied. Using the term “orbit element” frequently used in celestial mechanics to describe an integral of the unperturbed motion, the coefficients in a “variation of constants” perturbative procedure will be expressible in terms of Poisson brackets of orbit elements, which are therefore themselves also orbit elements whose constancy throughout the motion will lead to important simplification.
13.5.3. The Poisson Bracket and Quantum Mechanics

13.5.3.1. Commutation Relations: In Dirac's formulation, there is a close correspondence between the Poisson brackets of classical mechanics and the commutation relations of quantum mechanics. In particular, if $u$ and $v$ are dynamical variables, their quantum mechanical “commutator” $[u, v]_{QM} \equiv uv - vu$ is given by

$$[u, v]_{QM} = i\hbar\,\{u, v\}, \qquad (13.5.8)$$

where $\hbar$ is Planck's constant (divided by $2\pi$) and $\{u, v\}$ is the classical Poisson bracket. Hence, for example,

$$[q, p]_{QM} = i\hbar\,\{q, p\} = i\hbar. \qquad (13.5.9)$$

In the Schrödinger representation of quantum mechanics, one has $q \to q$ and $p \to -i\hbar\,\partial/\partial q$, where $q$ and $p$ are to be regarded as operators that operate on functions $f(q)$. One can then check that

$$(qp - pq)\,f = -i\hbar\left(q\,\frac{\partial f}{\partial q} - \frac{\partial (q f)}{\partial q}\right) = i\hbar\,f, \qquad (13.5.10)$$

in agreement with Eq. (13.5.9).
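The check just described can be carried out mechanically on polynomial wave functions. In this sketch, a polynomial $\sum_k c_k q^k$ is stored as its coefficient list, units with $\hbar = 1$ are assumed purely for convenience, and the function names are illustrative.

```python
HBAR = 1.0  # units with hbar = 1, chosen only for this illustration


def apply_q(coeffs):
    """Multiply the polynomial sum_k c_k q**k by q."""
    return [0] + list(coeffs)


def apply_p(coeffs):
    """Apply p = -i*hbar*d/dq to the polynomial sum_k c_k q**k."""
    derivative = [k * c for k, c in enumerate(coeffs)][1:]
    return [-1j * HBAR * c for c in derivative] or [0]


def commutator_qp(coeffs):
    """Return the coefficient list of (q p - p q) f."""
    qp = apply_q(apply_p(coeffs))
    pq = apply_p(apply_q(coeffs))
    width = max(len(qp), len(pq))
    qp = qp + [0] * (width - len(qp))
    pq = pq + [0] * (width - len(pq))
    return [a - b for a, b in zip(qp, pq)]
```

Applied to any coefficient list, `commutator_qp` should return the same list multiplied by $i\hbar$, which is the operator statement $[q, p]_{QM} = i\hbar$.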
13.5.3.2. Time Evolution of Expectation Values: There needs to be “correspondence” between certain quantum mechanical and classical mechanical quantities in order to permit the “seamless” metamorphosis of a system as the conditions it satisfies are varied from being purely quantum mechanical to being classical. One such result is that the expectation values of quantum mechanical quantities should evolve according to classical laws. A quantum mechanical particle is characterized by a Hamiltonian $H$, a wave function $\Psi$, and the wave equation relating them:

$$i\hbar\,\frac{\partial \Psi}{\partial t} = H\Psi. \qquad (13.5.11)$$

The expectation value of a function of position $f(q)$ is given by

$$\bar{f} = \int \Psi^* f\,\Psi\,dq. \qquad (13.5.12)$$

Its time rate of change is then given by

$$\dot{\bar{f}} = \int \left( \frac{\partial \Psi^*}{\partial t} f \Psi + \Psi^* \frac{\partial f}{\partial t} \Psi + \Psi^* f \frac{\partial \Psi}{\partial t} \right) dq
= \int \left( \left(\frac{H\Psi}{i\hbar}\right)^{\!*} f \Psi + \Psi^* \frac{\partial f}{\partial t} \Psi + \Psi^* f\,\frac{H\Psi}{i\hbar} \right) dq
= \int \Psi^* \left[ \frac{\partial f}{\partial t} + \frac{i}{\hbar}\,(H f - f H) \right] \Psi\,dq. \qquad (13.5.13)$$

In the final step the relation $H^* = H$ required for $H$ to be a “Hermitean” operator has been used. To assure that

$$\dot{\bar{f}} = \bar{\dot{f}}, \qquad (13.5.14)$$

we must then have

$$\dot{f} = \frac{\partial f}{\partial t} + \frac{i}{\hbar}\,[H, f]. \qquad (13.5.15)$$

When the quantum mechanical commutator $[H, f]$ is related to the classical Poisson bracket $\{H, f\}$ as in Eq. (13.5.8), this result corresponds with the classical formula for $\dot{f}$ given in Eq. (13.5.3).
13.6. INTEGRAL INVARIANTS 13.6.1. Integral Invariants in Electricity and Magnetism In anticipation of some complications that will arise in studying integral invariants, it would be appropriate at this time to digress into the distinction between local and global topological properties in differential geometry. Unfortunately, discussions of this subject, known as “cohomology” in mathematics texts, are formidably abstract. Fortunately, physicists have already encountered some of the important notions in concrete instances. For this reason we digress to develop some analogies with vector integral calculus. Since it is assumed the reader has already encountered these results in the context of electricity and magnetism, we employ that terminology here, but with inessential constant factors set equal to 1; this includes not distinguishing between the magnetic vectors B and H. In the end the subject of electricity and magnetism will have played only a pedagogical role.
We have already encountered the sort of analysis to be performed in geometric optics. Because of the “eikonal equation,” Eq. (10.1.11), $n\,(d\mathbf{r}/ds) = \nabla\phi$ was the gradient of the single-valued eikonal function $\phi$. The invariance of the line integral of $n\,(d\mathbf{r}/ds)$ for different paths connecting the same endpoints then followed, which was the basis of the “principle of least time.” There was potential for fallacy in this line of reasoning however, as Problem 13.6.1 is intended to illustrate. One recalls from electrostatics that the electric field is derivable from a single-valued potential $\Phi_E$ such that $\mathbf{E} = -\nabla\Phi_E$, and from this one infers that $\oint_\gamma \mathbf{E}\cdot d\mathbf{s} = 0$, where the integration is taken over a closed path called $\gamma$. One then has the result that $\int_{P_1}^{P_2} \mathbf{E}\cdot d\mathbf{s}$ is independent of the path from $P_1$ to $P_2$. Poincaré introduced the terminology of calling such a path-independent integral an “integral invariant” or an “absolute integral invariant.” But the single-valued requirement for $\Phi_E$ is not easy to apply in practice. It is more concise in electrostatics to start from $\nabla\times\mathbf{E} = 0$, rather than $\mathbf{E} = -\nabla\Phi_E$, since this assures $\oint_\gamma \mathbf{E}\cdot d\mathbf{s} = 0$. (You can use Stokes's theorem to show this.) Though $\nabla\times\mathbf{E} = 0$ implies the existence of $\Phi_E$ such that $\mathbf{E} = -\nabla\Phi_E$, the converse does not follow.

Problem 13.6.1: The magnetic field $\mathbf{H}$ of a constant current flowing along the $z$-axis has only $x$ and $y$ components and depends only on $x$ and $y$. Recalling (or looking up) the formula for $\mathbf{H}$ in this case, and ignoring constant factors, show that

$$\mathbf{H} = \nabla\Phi_M, \quad\text{where}\quad \Phi_M = \tan^{-1}\frac{y}{x}.$$

In terms of polar coordinates $r$ and $\theta$, one has $x = r\cos\theta$ and $y = r\sin\theta$. After expressing $\mathbf{H}$ in polar coordinates, evaluate $\oint_\gamma \mathbf{H}\cdot d\mathbf{s}$, where $\gamma$ is a complete circle of radius $r_0$ centered on the origin. Comment on the vanishing or otherwise of this integral.

After having completed Problem 13.6.1, let us consider magnetostatic fields. In current-free regions of space, the magnetostatic field can be derived from a “magnetic” potential, $\mathbf{H} = -\nabla\Phi_M$. Why is this consistent with Ampère's law, $\oint_\gamma \mathbf{H}\cdot d\mathbf{s} = I$, where the line integral is taken over closed path $\gamma$, and where $I$ is the (nonzero in general) current linking $\gamma$? In electromagnetism $\nabla\times\mathbf{H} = \mathbf{J}$, where “current density” $\mathbf{J}$ is, in general, nonvanishing. One has to distinguish between $\mathbf{J}$'s vanishing over regions near the integration path into which the contour is allowed to be deformed and its vanishing everywhere on a surface bounded by the integration path. In the former case, the integral $\oint_\gamma \mathbf{H}\cdot d\mathbf{s}$ is independent of path, consistent with Ampère's law, but the integral necessarily vanishes only in the latter case. This may be most likely to seem paradoxical when the current density vanishes everywhere except in an “infinitesimally fine” wire carrying finite current $I$ and linking the integration path. Recapitulating, the vanishing of $\mathbf{J}$ everywhere near the path of integration is insufficient; for the vanishing of the line integral of $\mathbf{H}$ to be assured, the curl has to vanish everywhere on a surface bounded by the path of integration. Let us formalize these considerations.
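A numerical version of the distinction just drawn may be helpful. The sketch below assumes the field $\mathbf{H} = \nabla \tan^{-1}(y/x) = (-y, x)/(x^2 + y^2)$ of Problem 13.6.1 (constant factors dropped, as in the text) and estimates its circulation around circles by the midpoint rule; the function names and step count are arbitrary choices of ours.

```python
import math


def h_field(x, y):
    """Gradient of atan(y/x): field of a line current on the z-axis, constants dropped."""
    r2 = x * x + y * y
    return -y / r2, x / r2


def circulation(cx, cy, radius, steps=2000):
    """Midpoint-rule estimate of the line integral of H . ds around a circle.

    The circle is centered at (cx, cy) and traversed counterclockwise.
    """
    total = 0.0
    dtheta = 2 * math.pi / steps
    for k in range(steps):
        theta = (k + 0.5) * dtheta
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        hx, hy = h_field(x, y)
        # ds = (-r sin(theta), r cos(theta)) dtheta along the circle
        total += (hx * (-radius * math.sin(theta)) + hy * (radius * math.cos(theta))) * dtheta
    return total
```

For any circle linking the $z$-axis the result should be $2\pi$, independent of radius and of where the circle is centered, while for a circle that does not link the axis it should vanish, even though the curl of $\mathbf{H}$ vanishes at every point of both integration paths.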
“Physical” Argument: For symplectic mechanics, it is the mathematical equivalent of Ampère's law that we will need to employ. That law follows from the equation

$$\nabla\times\mathbf{H} = \mathbf{J}. \qquad (13.6.1)$$

Integrating this relation over a surface $\Gamma_1$ bounded by closed curve $\gamma_1$ and using Stokes's theorem, the result is

$$\oint_{\gamma_1} \mathbf{H}\cdot d\mathbf{s} = \int_{\Gamma_1} (\nabla\times\mathbf{H})\cdot d\mathbf{a} = \int_{\Gamma_1} \mathbf{J}\cdot d\mathbf{a}, \qquad (13.6.2)$$

giving the “flux” of $\mathbf{J}$ through surface $\Gamma_1$. As shown in Fig. 13.6.1, since $\mathbf{J}$ is “current density,” it is natural to visualize the flow lines of $\mathbf{J}$ as being the paths of steady current. The flow lines through $\gamma_1$ form a “tube of current.” The flux of $\mathbf{J}$ through $\gamma_1$ would also be said to be the “total current” flowing through $\gamma_1$. If another closed loop $\gamma_2$ is drawn around the same tube of current, then it would be linked by the same total current. From this “physical” discussion, the constancy of this flux seems to be “coming from” the conservation of charge, but the next section will show that this may be a misleading interpretation.
“Mathematical” Argument: Much the same argument can be made with no reference whatsoever to the vector $\mathbf{J}$. Rather, referring again to Fig. 13.6.1, let $\mathbf{H}$ be any vector whatsoever, and consider the vector $\nabla\times\mathbf{H}$ obtained from it. The flow lines of $\nabla\times\mathbf{H}$ passing through closed curve $\gamma_1$ define a “tube.” Further along this tube is another closed curve $\gamma_2$ linked by the same tube. Let the part of the tube's surface between $\gamma_1$ and $\gamma_2$ be called $\Sigma$. The tube can be visualized as being “capped” at one end by a surface $\Gamma_1$ bounded by $\gamma_1$ and at the other end by a surface $\Gamma_2$ bounded by $\gamma_2$ to form a closed volume $V$. Because it is a curl, the vector $\nabla\times\mathbf{H}$ satisfies

$$\nabla\cdot(\nabla\times\mathbf{H}) = 0 \qquad (13.6.3)$$

throughout the volume, and it then follows from Gauss's theorem that

$$0 = \int_V \nabla\cdot(\nabla\times\mathbf{H})\,dV = \int_{\Gamma_1 + \Sigma + \Gamma_2} (\nabla\times\mathbf{H})\cdot d\mathbf{a}, \qquad (13.6.4)$$

where $dV$ is a volume differential and $d\mathbf{a}$ is a normal, outward-directed surface area differential. By construction, the integrand vanishes everywhere on the surface $\Sigma$.¹ Then applying Stokes's theorem again yields

$$\oint_{\gamma_1} \mathbf{H}\cdot d\mathbf{s} = \oint_{\gamma_2} \mathbf{H}\cdot d\mathbf{s}. \qquad (13.6.5)$$

FIGURE 13.6.1. A “tube” formed by flowlines of $\nabla\times\mathbf{H}$ passing through closed curve $\gamma_1$. The part $\Sigma$ between $\gamma_1$ and another closed curve $\gamma_2$ around the same tube forms a closed volume when it is “capped” by surfaces $\Gamma_1$ bounded by $\gamma_1$ and $\Gamma_2$ bounded by $\gamma_2$.
Arnold refers to this as “Stokes's lemma.” Poincaré introduced the terminology “relative integral invariant” for such quantities. Since $\mathbf{H}$ can be any (smooth) vector, the result is purely mathematical and does not necessarily have anything to do with the “source” of $\mathbf{H}$. This same mathematics is important in hydrodynamics, where $\mathbf{H}$ is the velocity of fluid flow and the vector $\nabla\times\mathbf{H}$ is known as the “vorticity,” its flow lines are known as “vorticity lines,” and the tube formed from these lines is known as a “vorticity tube.” This terminology has been carried over into symplectic mechanics. One reason this is being mentioned is to point out the potential for misinterpretation of this terminology. The terminology is in one way apt and in another way misleading. What would be misleading would be to think of $\mathbf{H}$ as in any way representing particle velocity even though $\mathbf{H}$ stands for velocity in hydrodynamics. What is apt, though, is to think of $\mathbf{H}$ as a static magnetic field or rather to think of $\mathbf{J} = \nabla\times\mathbf{H}$ as the static current density causing the magnetic field. It is the flow lines of $\mathbf{J}$ that are to be thought of as the analog of the configuration space flow lines of a mechanical system, and these are the lines that will be called vortex lines and will form a vortex tube. $\mathbf{H}$ tends to wrap around the current flow lines, and Ampère's law relates its “circulation” $\oint_\gamma \mathbf{H}\cdot d\mathbf{s}$ for various curves $\gamma$ linked by the vortex tube.
13.6.2. The Poincaré-Cartan Integral Invariant

In spite of having identified a possible hazard, we boldly apply the same reasoning to mechanics as we applied in deriving the principle of least time in optics. In the space of $\mathbf{q}$, $\mathbf{p}$, and $t$, known as the (time) extended phase space, we continue to analyze the set of system trajectories describable by function $S(\mathbf{q}, t)$ satisfying the Hamilton-Jacobi equation. The “gradient” relations of Eq. (11.1.12) were $\partial S/\partial t = -H$ and $\partial S/\partial q^i = p_i$. If we assume that $S$ is single-valued, it follows that the integral from $P_1 : (\mathbf{q}_{(1)}, t_1)$ to $P : (\mathbf{q}, t)$,

$$\text{I.I.} = \int_{P_1}^{P} \left( p_i\,\widetilde{dq}^i - H\,\widetilde{dt} \right), \qquad (13.6.6)$$

which measures the change in $S$ in going from $P_1$ to $P$, is independent of path. This is called the “Poincaré-Cartan integral invariant,” which for brevity we designate by I.I. The integration path is a curve in “extended configuration space,” which can also be regarded as the projection onto the extended coordinate space of a curve in extended phase space; it need not be a physically realizable orbit, but the functions $p_i$ and $H$ must correspond to a particular function $S$ such as in Eq. (11.1.12). Unfortunately it will turn out that the requirement that $S$ be nonsingular and single-valued throughout space is too restrictive in practice, and a more careful statement of the invariance of I.I. is

$$\oint_{\gamma_1} \left( p_i\,\widetilde{dq}^i - H\,\widetilde{dt} \right) = \oint_{\gamma_2} \left( p_i\,\widetilde{dq}^i - H\,\widetilde{dt} \right), \qquad (13.6.7)$$

where the integration paths $\gamma_1$ and $\gamma_2$ are closed (in phase space, though not necessarily in time-extended phase space) and encircle the same tube of system trajectories. The evaluation of I.I. for a one-dimensional harmonic oscillator is illustrated in Fig. 13.6.2; in this case the solid curve is a valid system path in extended phase space. Because the form in the integrand is expanded in terms of coordinates, the differential form $\widetilde{dq}$ can be replaced by ordinary differential $dx$. Energy conservation in simple harmonic motion is expressed by

$$\frac{p^2}{2m} + \frac{1}{2} k x^2 = E, \qquad (13.6.8)$$

as the figure illustrates. This is the equation of the ellipse, which is the projection of the trajectory onto a plane of constant $t$. Its semi-axes are $\sqrt{2mE}$ and $\sqrt{2E/k}$. Integration of the first term of Eq. (13.6.6) yields

$$\oint p(x)\,dx = \iint dp\,dx = \pi\sqrt{2mE}\,\sqrt{2E/k} = 2\pi E\sqrt{m/k} = ET, \qquad (13.6.9)$$

since the period of oscillation is $T = 2\pi\sqrt{m/k}$. The second term of Eq. (13.6.6) is especially simple because $H = E$ and it yields $-ET$. Altogether I.I. $= 0$. If the path defining I.I. is restricted to a hyperplane of fixed time $t$, like curve $\gamma_1$ in Fig. 13.6.3, then the second term of (13.6.6) vanishes. If the integral is performed over a closed path $\gamma$, the integral is called the “Poincaré relative integral invariant” R.I.I.:

$$\text{R.I.I.} = \oint_\gamma p_i\,\widetilde{dq}^i. \qquad (13.6.10)$$

FIGURE 13.6.2. Extended phase space for a one-dimensional simple harmonic oscillator. The heavy curve is a valid system trajectory and also a possible path of integration for the evaluation of the Poincaré-Cartan integral invariant.

¹ Later in the chapter there will be an analogous “surface” integral, whose vanishing will be similarly essential.
This provides an invariant measure of the tube of trajectories bounded by curve $\gamma_1$ and illustrated in Fig. 13.6.3. Using the differential form terminology of Section 4.4.2, this quantity is written

$$\text{R.I.I.}(t) = \oint_\gamma \tilde{p}(t) \qquad (13.6.11)$$

and is called the circulation of $\tilde{p}(t)$ about $\gamma$. Since this integral is performed over a closed path, its value would seem to be zero under the conditions hypothesized just before Eq. (13.6.6), but we have found its value to be $2\pi E\sqrt{m/k}$, which seems to be a contradiction. Clearly the R.I.I. acquires a nonvanishing contribution because $S$ is not single-valued in a region containing the integration path. Looking at Eq. (11.2.37), obtained as the Hamilton-Jacobi equation was being solved for this system, one can see that the quantity $\partial S_0/\partial q$ is doubly defined for each value of $q$. This invalidates any inference about the R.I.I. integral that can be drawn from Eqs. (11.1.12). This shows that, though the Hamilton-Jacobi gradient relations for $p$ and $H$ provide an excellent mnemonic for the integrands in the I.I. integral, it is improper to infer integral invariance properties from them.

FIGURE 13.6.3. A bundle of trajectories in extended phase space, bounded at time $t_1$ by curve $\gamma_1$. The constancy of R.I.I., the Poincaré relative integral invariant, expresses the equality of line integrals over $\gamma_1$ and $\gamma_2$. This provides an invariant measure of the tube of trajectories bounded by curve $\gamma_1$.
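The harmonic oscillator values quoted in this section can be reproduced numerically. The following sketch assumes the exact solution $x = A\cos\omega t$, $p = m\dot{x}$, with $\omega = \sqrt{k/m}$ and $A = \sqrt{2E/k}$, and integrates $p\,dx = p\,\dot{x}\,dt$ once around the orbit by the midpoint rule. The function names and parameter values are illustrative only.

```python
import math


def loop_integral_p_dq(m, k, energy, steps=4000):
    """Integrate p dx once around the constant-energy ellipse, following the motion.

    Returns the loop integral together with the period of oscillation.
    """
    omega = math.sqrt(k / m)
    amplitude = math.sqrt(2 * energy / k)
    period = 2 * math.pi / omega
    dt = period / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * dt
        xdot = -omega * amplitude * math.sin(omega * t)
        p = m * xdot
        total += p * xdot * dt  # p dx = p (dx/dt) dt
    return total, period


def poincare_cartan(m, k, energy):
    """Integral of p dx - H dt over one period of the actual trajectory, with H = E."""
    loop, period = loop_integral_p_dq(m, k, energy)
    return loop - energy * period
```

With, say, `m = 2.0`, `k = 8.0`, and `energy = 3.0`, the loop integral should agree with both $ET$ and $2\pi E\sqrt{m/k}$, and `poincare_cartan` should return zero to rounding error, while the first term alone (the R.I.I.) does not vanish.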
*13.7. INVARIANCE OF THE POINCARÉ-CARTAN INTEGRAL INVARIANT I.I.²

13.7.1. The Extended Phase Space Two-Form and Its Special Eigenvector

Eq. (13.6.11) shows that the integral appearing in I.I. is the circulation of a one-form $\tilde{\omega}^{(1)}$. To analyze it using the analog of Stokes's lemma that was used in the electromagnetic example described above, it is necessary to define the vortex tube of a one-form, and this requires first the definition of a vortex line of a one-form. We start by finding the exterior derivative $\tilde{d}\tilde{\omega}^{(1)}$ as defined in Eq. (4.4.10). To make this definite, let us analyze the “extended momentum one-form”

$$\tilde{\omega}_E^{(1)} = p_i\,\widetilde{dq}^i - H\,\widetilde{dt}, \qquad (13.7.1)$$

which is summed on $i$ and is in fact the one-form appearing in I.I. This is the canonical coordinate version of the standard momentum one-form with the one-form $H\,\widetilde{dt}$ subtracted. In this case, the integral is to be evaluated along an $(n + 1)$-dimensional curve in the $(2n + 1)$-dimensional, time-extended phase space. For all the apparent similarities between Fig. 13.6.1 and Fig. 13.6.3, there are important differences, with the most important one being that the abscissa axis in the latter is the time $t$. Since all the other axes come in canonical conjugate pairs, the dimensionality of the space is necessarily odd. A two-form $\tilde{\omega}_E^{(2)}$ can be obtained by exterior differentiation of $-\tilde{\omega}_E^{(1)}$, as in Eq. (4.4.10);

$$\tilde{\omega}_E^{(2)} = -\tilde{d}\left( p_i\,\widetilde{dq}^i - H\,\widetilde{dt} \right) = \widetilde{dq}^i \wedge \widetilde{dp}_i + \widetilde{dH} \wedge \widetilde{dt}. \qquad (13.7.2)$$

² This section depends on essentially all the geometric concepts that have been introduced in the text. Since this makes it particularly difficult, it should perhaps only be skimmed initially. But because the proof of Liouville's theorem and its generalizations, probably the most fundamental results in classical mechanics, and the method of canonical transformation depend on the proof, the section cannot be said to be unimportant. Arnold has shown that the more elementary treatment of this topic by Landau and Lifshitz is incorrect. Other texts, such as Goldstein, do not go beyond proving Liouville's theorem, even though that just scratches the surface of the rigorous demands that being symplectic places on mechanical systems. This section depends especially on Section 4.4.
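As in Eq. (13.4.9), the content of this two-form can be reexpressed by associating a one-form $\tilde{\mathbf{z}}_E = \tilde{\omega}_E(\mathbf{z}_E, \cdot)$ with arbitrary extended phase space displacement vector

$$\mathbf{z}_E = \left( dq^1 \;\; dp_1 \;\; dq^2 \;\; dp_2 \;\; dt \right)^T. \qquad (13.7.3)$$

One can then define an extended skew-scalar product of two vectors:

$$[\mathbf{z}_{Eb}, \mathbf{z}_{Ea}]_E = \tilde{\omega}_E(\mathbf{z}_{Eb}, \mathbf{z}_{Ea}). \qquad (13.7.4)$$

This can in turn be expressed as a quadratic form as in Eq. (13.4.10):

$$[\mathbf{z}_{Eb}, \mathbf{z}_{Ea}]_E =
\begin{pmatrix} dq_b^1 & dp_{b1} & dq_b^2 & dp_{b2} & dt_b \end{pmatrix}
\begin{pmatrix}
0 & 1 & 0 & 0 & \partial H/\partial q^1 \\
-1 & 0 & 0 & 0 & \partial H/\partial p_1 \\
0 & 0 & 0 & 1 & \partial H/\partial q^2 \\
0 & 0 & -1 & 0 & \partial H/\partial p_2 \\
-\partial H/\partial q^1 & -\partial H/\partial p_1 & -\partial H/\partial q^2 & -\partial H/\partial p_2 & 0
\end{pmatrix}
\begin{pmatrix} dq_a^1 \\ dp_{a1} \\ dq_a^2 \\ dp_{a2} \\ dt_a \end{pmatrix}. \qquad (13.7.5)$$

The partial derivatives occurring as matrix elements are evaluated at the particular point in phase space that serves as origin from which the components in the vectors are reckoned.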
Problem 13.7.1: Show that the determinant of the matrix in Eq. (13.7.5) vanishes but that the rank of the matrix is 4. Generalizing to arbitrary dimensionality $n$, show that the corresponding determinant vanishes and that the rank of the corresponding matrix is $2n$.

If one accepts the result of this problem the determinant vanishes and, as a result, it is clear that zero is one of the eigenvalues. One confirms this immediately by observing that the vector

$$\mathbf{u}_H = \left( \frac{\partial H}{\partial p_1} \;\; -\frac{\partial H}{\partial q^1} \;\; \frac{\partial H}{\partial p_2} \;\; -\frac{\partial H}{\partial q^2} \;\; 1 \right)^T \qquad (13.7.6)$$

(or any constant multiple of this vector) is an eigenvector with eigenvalue 0. Furthermore, one notes from Hamilton's equations that this vector is directed along the unique curve through the point under study. It will be significant that the vector $\mathbf{u}_H$, because it is an eigenvector with eigenvalue 0, has the property that

$$\tilde{\omega}_E(\mathbf{u}_H, \mathbf{w}_E) = 0 \qquad (13.7.7)$$

for arbitrary vector $\mathbf{w}_E$. Recapitulating, it has been shown that the Hamiltonian system evolves in the direction given by the eigenvector of the $(2n + 1) \times (2n + 1)$ matrix derived from the 2-form $\tilde{d}\tilde{\omega}_E^{(1)}$. This has been demonstrated explicitly only for the case $n = 2$, but it is not difficult to extend the arguments to spaces of arbitrary dimension. Also, though specific coordinates were used in the derivation, they no longer appear in the statement of the result.

13.7.2. Proof of Invariance of the Poincaré Relative Integral Invariant

Though we have worked only on a particular two-form, we may apply the same reasoning to derive the following result known as Stokes's lemma. Suppose that $\tilde{\omega}^{(2)}$ is an arbitrary two-form in a $(2n + 1)$-dimensional (odd-dimensional) space. For reasons that will become clear immediately, we start by seeking a vector $\mathbf{u}_0$ having the property that $\tilde{\omega}^{(2)}(\mathbf{u}_0, \mathbf{v}) = 0$ for arbitrary vector $\mathbf{v}$. As before, working with specific coordinates, we can introduce a matrix $\mathbf{A}$ such that the skew scalar product of vectors $\mathbf{u}$ and $\mathbf{v}$ is given by

$$\tilde{\omega}^{(2)}(\mathbf{u}, \mathbf{v}) = \mathbf{A}\mathbf{u}\cdot\mathbf{v}. \qquad (13.7.8)$$
Problem 13.7.2: Following Eqs. (13.4.9), show that the matrix $\mathbf{A}$ is antisymmetric. Show also that the determinant of an arbitrary matrix and its transpose are equal, and also that, if it is odd-dimensional, changing the signs of every element has the effect of changing the sign of its determinant. Conclude therefore that $\mathbf{A}$ has zero as one eigenvalue.

Accepting the result of the previous problem, we conclude that (if the stated conditions are met) a vector $\mathbf{u}_0$ can always be found such that, for arbitrary $\mathbf{v}$,

$$\tilde{\omega}^{(2)}(\mathbf{u}_0, \mathbf{v}) = 0. \qquad (13.7.9)$$

This relation will be especially important when $\tilde{\omega}^{(2)}$ serves as the integrand of an area integral as in Eq. (4.4.9) and the vector $\mathbf{u}_0$ lies in the surface over which the integration is being performed since this causes the corresponding contribution to the integral to vanish.
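Both the general statement and the Hamiltonian special case can be checked on a small example. The sketch below makes several assumptions that are ours rather than the text's: the sample Hamiltonian $H = (p_1^2 + p_2^2)/2 + (q_1^2 + q_2^2)/2 + q_1^2 q_2$ is hypothetical, the coordinates are ordered $(q^1, p_1, q^2, p_2, t)$, and the $5 \times 5$ antisymmetric matrix is taken to be the coordinate expression of $\widetilde{dq}^i \wedge \widetilde{dp}_i + \widetilde{dH} \wedge \widetilde{dt}$. Exact rational arithmetic is used so that "zero" means exactly zero.

```python
from fractions import Fraction


def grad_h(q1, p1, q2, p2):
    """Partial derivatives (dH/dq1, dH/dp1, dH/dq2, dH/dp2) of the sample Hamiltonian."""
    return q1 + 2 * q1 * q2, p1, q2 + q1 * q1, p2


def extended_matrix(hq1, hp1, hq2, hp2):
    """Antisymmetric 5 x 5 matrix of the extended two-form (assumed sign convention)."""
    return [
        [0, 1, 0, 0, hq1],
        [-1, 0, 0, 0, hp1],
        [0, 0, 0, 1, hq2],
        [0, 0, -1, 0, hp2],
        [-hq1, -hp1, -hq2, -hp2, 0],
    ]


def flow_vector(hq1, hp1, hq2, hp2):
    """Direction (dq1/dt, dp1/dt, dq2/dt, dp2/dt, 1) given by Hamilton's equations."""
    return [hp1, -hq1, hp2, -hq2, 1]


def mat_vec(matrix, vector):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r
```

At any phase space point, `mat_vec(extended_matrix(*g), flow_vector(*g))` with `g = grad_h(...)` should be the zero vector, and the rank of the matrix should be 4 rather than 5. The same `rank` function applied to any $3 \times 3$ antisymmetric matrix returns at most 2, illustrating the odd-dimensional result of Problem 13.7.2.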
Vortex Lines of a One-Form: If the two-form $\tilde{\omega}^{(2)}$ for which the vector $\mathbf{u}_0$ was just found was itself derived from an arbitrary one-form $\tilde{\omega}^{(1)}$ according to $\tilde{\omega}^{(2)} = \tilde{d}\tilde{\omega}^{(1)}$, then the flowlines of $\mathbf{u}_0$ are said to be the “vortex lines” of $\tilde{\omega}^{(1)}$. We now wish to employ Stokes's theorem for forms (4.4.8) to a vortex tube such as is shown in Fig. 13.6.3. (Temporarily ignore that Fig. 13.6.3 illustrates phase space; think of the space as arbitrary and of the curves as vortex lines of the arbitrary form $\tilde{\omega}^{(1)}$.) The curve $\gamma_1$ can be regarded, on the one hand, as bounding the surface $\Gamma_1$ and, on the other hand, as bounding the surface consisting of both $\Sigma$ (formed from the vortex lines) and the surface $\Gamma_2$ bounded by $\gamma_2$. Applying Stokes's theorem (4.4.8) to curve $\gamma_1$, the area integrals for these two surfaces are equal. But we can see from the definition of the vortex lines that there is no contribution to the area integral coming from the area $\Sigma$. (The vortex lines belong to $\tilde{\omega}^{(1)}$, the integrand is $\tilde{\omega}^{(2)}$, and the grid by which the integral is calculated can be formed from differential areas each having one side aligned with a vortex line. Employing Eq. (13.7.9), one finds that the contribution to the integral from every such area vanishes.) We conclude therefore that

$$\oint_{\gamma_1} \tilde{\omega}^{(1)} = \oint_{\gamma_2} \tilde{\omega}^{(1)}. \qquad (13.7.10)$$

This is known as “Stokes's lemma for forms.” It is the vanishing of the integral over $\Sigma$ that has been essential to this argument, which should be reminiscent of the discussion given earlier of Ampère's law in electromagnetism. All that was required to prove that law was the vanishing of a surface integral, and it was anticipated in a footnote to that derivation that the argument would be repeated. We again specialize to phase space and consider a vortex tube belonging to the extended momentum one-form $p_i\,\widetilde{dq}^i - H\,\widetilde{dt}$. The vortex lines for this form are shown in Fig. 13.6.3, and we have seen that these same curves are valid trajectories of the Hamiltonian system. This puts us in a position to prove

$$\text{R.I.I.} = \oint p_i\,\widetilde{dq}^i = \text{independent of time}. \qquad (13.7.11)$$

In fact, this has already been done because Eq. (13.7.10) implies Eq. (13.7.11). This completes the proof of the constancy in time of the Poincaré relative integral invariant R.I.I. The constancy of R.I.I. is closely related to the invariance of $\widetilde{dq}^i \wedge \widetilde{dp}_i$ under coordinate transformations, as was shown earlier. The new result is that the system evolution in time preserves the invariance of this phase space area. This result is most readily applicable to the case in which many noninteracting systems are represented on the same figure, and the curve $\gamma$ encloses all of them. Since points initially within the tube will remain inside, and the tube area is preserved, the density of particles is preserved. The dimensionality of R.I.I. is

$$[\text{R.I.I.}] = [\text{arbitrary}] \times \frac{[\text{energy}]}{[\text{arbitrary/time}]} = [\text{energy} \times \text{time}] = [\text{action}]. \qquad (13.7.12)$$
Knowing that Planck's constant $h$ is called “the quantum of action,” one anticipates connections between this invariant and quantum mechanics. Pursuit of this connection will lead first to the definition of adiabatic invariants as physical quantities subject to quantization. It will be shown that R.I.I. is an adiabatic invariant.

It is interesting to reflect on the similarities and differences between the present derivation of the invariance of R.I.I. and the earlier derivation of Ampère's law. Consider, for example, the flux through the differential element defined by the vectors $\mathbf{u}_0$ and $\mathbf{v}$ shown in Fig. 13.6.3. (For simplicity we use the same figure for both electromagnetism and differential forms discussions.) Actually, two cases are shown. In the first case, both $\mathbf{u}_0$ and $\mathbf{v}$ lie in the surface of the vortex tube; in the second case, though $\mathbf{u}_0$ lies in the surface, $\mathbf{v}$ does not. In the two-form integration, the fact that $\mathbf{u}_0$ is everywhere parallel to a vortex line already assures the vanishing of the differential contribution corresponding to $\mathbf{u}_0$ and $\mathbf{v}$ whether or not $\mathbf{v}$ lies in the surface of the vortex tube. In the case of a magnetic field of a straight current-carrying wire, the field $\mathbf{H}$ is parallel to the surface and hence its flux vanishes in the first case. This tempts one to leap to the (incorrect) conclusion that the vanishing of the surface integral is due to this. One might then be troubled by the nonvanishing of the flux of $\mathbf{H}$ in the second case since the vanishing in the calculation using forms depended only on $\mathbf{u}_0$, with no reference to the direction of $\mathbf{v}$. The resolution of this “paradox” is that one leaped to the wrong conclusion in the magnetostatics case: the vanishing results from $\nabla\times\mathbf{H}$ lying in the surface, and the direction of $\mathbf{H}$ itself is irrelevant.
13.8. SYMPLECTIC SYSTEM EVOLUTION

According to Stokes's theorem for forms, Eq. (4.4.8), an integral over surface $\Gamma$ is related to the integral over its bounding curve $\gamma$ by

$$\int_\Gamma \tilde{d}\tilde{\omega} = \oint_\gamma \tilde{\omega}. \qquad (13.8.1)$$

Also, as in Eq. (13.7.2), we have

$$-\tilde{d}\left( p_i\,\widetilde{dq}^i \right) = -\widetilde{dp}_i \wedge \widetilde{dq}^i = \widetilde{dq}^i \wedge \widetilde{dp}_i. \qquad (13.8.2)$$

With $\tilde{\omega} = p_i\,\widetilde{dq}^i$, these relations yield

$$\oint_\gamma p_i\,\widetilde{dq}^i = -\int_\Gamma \widetilde{dq}^i \wedge \widetilde{dp}_i. \qquad (13.8.3)$$

Since the left-hand side is an integral invariant, so also is the right-hand side. Because it is an integral over an open region, the latter integral is said to be an absolute integral invariant, unlike R.I.I., which is a relative integral invariant because its range is closed. It is not useful to allow the curve $\gamma$ of R.I.I. to become infinitesimal, but it is useful to extract the integrand of the absolute integral invariant in that limit, noting that it is the same quantity that has previously been called the canonical two-form

$$\widetilde{dq}^i \wedge \widetilde{dp}_i = \text{canonical two-form } \tilde{\omega} = \text{invariant}. \qquad (13.8.4)$$

The “relative/absolute” terminology distinction does not seem particularly helpful to me, but the invariance of the canonical two-form will lead immediately to the conclusion that the evolution of a Hamiltonian system can be represented by a symplectic transformation. For the simple harmonic oscillator, the R.I.I. was derived in Eq. (13.6.9) using

$$\oint p(x)\,dx = \iint dp\,dx. \qquad (13.8.5)$$
Two important comments can be based on this formula. One is that, for area integrals in a plane, the relation Eq. (13.8.3) here reduces to the formula familiar from elementary calculus by which areas (two-dimensional) are routinely evaluated by one-dimensional integrals. The other result is that the phase space area enclosed is independent of time. Because this system is simple enough to be analytically solvable, the constancy of this area is no surprise, but for more general systems this is an important result. One visualizes any particular mechanical system as one of a cloud of noninteracting systems, each one represented by one point on the surface $\Gamma$ of Eq. (13.8.3). Such a distribution of particles can be represented by a surface number density, which we may as well regard as uniform, since $\Gamma$ can be taken to be arbitrarily small. (For a relative integral invariant, there would be no useful similar picture.) As time increases, the systems move, always staying in the region $\Gamma(t)$ internal to the curve $\gamma(t)$ formed by the systems that were originally on the curve $\gamma$. (It might be thought that points in the interior could in time change places with points originally on $\gamma$, but that would require phase space trajectories to cross, which is not allowed.) Consider systems close to a reference system that is initially in configuration $\mathbf{z}(0)$ and later at $\mathbf{z}(t)$, and let $\Delta\mathbf{z}(t)$ be the time-varying displacement of a general system relative to the reference system. By analogy with Eq. (13.4.6), the evolution can be represented by

$$\Delta\mathbf{z}(t) = \mathbf{M}(t)\,\Delta\mathbf{z}(0). \qquad (13.8.6)$$

We can now use the result derived in Section 13.4.2. As defined in Eq. (13.4.2), the skew-scalar product $[\mathbf{z}_a(t), \mathbf{z}_b(t)]$ formed from two systems evolving according to Eq. (13.8.6) is the quantity R.I.I. discussed in the previous section. To be consistent with its invariance, the matrix $\mathbf{M}(t)$ has to be symplectic.

13.8.1. Liouville's Theorem and Generalizations
In Sections 4.2.2 and 4.2.3, the geometry of bivectors and multivectors was discussed. This discussion, including Fig. 4.2.2, can be carried over to the geometry of the canonical two-form. Consider a two-rowed matrix (13.8.7)
whose elements are the elements of phase space vectors $\mathbf{z}_a$ and $\mathbf{z}_b$. By picking two columns at a time from this matrix and evaluating the determinants, one forms the elements $x^{ij}$ of a bivector $\mathbf{z}_a \wedge \mathbf{z}_b$:
(13.8.8)
By introducing $p$ vectors and arraying their elements in rows, one can form $p$-index multivectors similarly. As in Eq. (4.2.9), after introducing a metric tensor $g_{ij}$ and using it to produce covariant components $x_{ij \ldots k}$, one can define an “area” or “volume” (as the case may be) $V$ by (13.8.9)
We have used the notation of Eq. (13.4.2) to represent the skew-invariant products of phase space vectors. For $p = 1$, we obtain

$$V_{(1)}^2 = z_i z^i = \det \big|\,[\mathbf{z}, \mathbf{z}]\,\big| = 0; \qquad (13.8.10)$$

like the skew-invariant product of any vector with itself, it vanishes. For $p = 2$, we obtain (13.8.11)
As we have seen previously, if the vectors $\mathbf{z}_i$ represent (time-varying) system configurations, the elements of the matrix in Eq. (13.8.9), such as (13.8.12)
are invariant. (As shown in Fig. 13.3.1, the first term in this series can be interpreted as the area defined by the two vectors after projection onto the $q^1, p_1$ plane, and similarly for the other terms.) Since its elements are all invariant, it follows that $V_{(p)}$ is also invariant. In Section 4.2.3, this result was called “the Pythagorean relation for areas.” One should not overlook the fact that, though the original invariant given by Eq. (13.8.12) is a linear sum of areas, the new invariants given by Eq. (13.8.9) are quadratic sums. The former result is a specifically symplectic feature, while the new invariants result from metric (actually skew-metric in our case) properties. A device to avoid forgetting the distinction is always to attach the adjective Pythagorean to the quadratic sums. By varying $p$, we obtain a sequence of invariants. For $p = 2$ we obtain the original invariant, which we now call $V_{(2)} = [\mathbf{z}_1, \mathbf{z}_2]$. Its (physical) dimensionality is [action] and the dimensionality of $V_{(2p)}$ is [action]$^p$. The sequence terminates at $p = 2n$, since beyond there all multivector components vanish, and for $p = n$, except for sign, all multivector components have the same value. Considering $n = 2$ as an example, the phase space is four-dimensional and the invariant is
If the vectors have been chosen so the first two lie in the $q^1, p_1$ plane and the last two lie in the $q^2, p_2$ plane, the matrix elements in the upper right and lower left quadrants vanish and $V_{(4)}$ is equal to the product of areas defined by the first, second and third, fourth pairs. This is then the “volume” defined by the four vectors. It is the invariance of this volume that is known as Liouville's theorem. If noninteracting systems, distributed uniformly over a small volume of phase space, are followed as time advances, the volume they populate remains constant. Since their number is also constant, their number density is also constant. Hence one also states Liouville's theorem in the form: the density of particles in phase space is invariant if their evolution is Hamiltonian. Liouville's theorem itself could have been derived more simply, since it follows from the fact that the determinant of a symplectic matrix is 1, but obtaining the other invariants requires the multivector algebra.
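For the one-dimensional harmonic oscillator the evolution matrix $\mathbf{M}(t)$ is known in closed form, so the symplectic condition and the unit determinant can be checked directly. The sketch below assumes the sign convention $\mathbf{S} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ for a single degree of freedom (the opposite sign would serve equally well for this check), and the function names and parameter values are illustrative.

```python
import math

S = [[0.0, -1.0], [1.0, 0.0]]  # assumed sign convention for the 2 x 2 matrix S


def transfer_matrix(m, k, t):
    """Matrix M(t) taking (q(0), p(0)) to (q(t), p(t)) for a harmonic oscillator."""
    omega = math.sqrt(k / m)
    c, s = math.cos(omega * t), math.sin(omega * t)
    return [[c, s / (m * omega)], [-m * omega * s, c]]


def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    """Transpose of a 2 x 2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def symplectic_defect(mat):
    """Largest element of M^T S M - S in absolute value; zero for a symplectic M."""
    product = matmul(transpose(mat), matmul(S, mat))
    return max(abs(product[i][j] - S[i][j]) for i in range(2) for j in range(2))


def det2(mat):
    """Determinant of a 2 x 2 matrix."""
    return mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]


def apply(mat, z):
    """Matrix times phase space vector z = (q, p)."""
    return [mat[0][0] * z[0] + mat[0][1] * z[1], mat[1][0] * z[0] + mat[1][1] * z[1]]


def skew_product(za, zb):
    """Skew-scalar product z_a^T S z_b for one degree of freedom."""
    return sum(za[i] * S[i][j] * zb[j] for i in range(2) for j in range(2))
```

For every `t`, `symplectic_defect(transfer_matrix(m, k, t))` should vanish to rounding error and `det2` should return 1, so that the area spanned by any two displacement vectors, as measured by `skew_product`, is the same before and after the evolution.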
BIBLIOGRAPHY

References
1. V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed., Springer-Verlag, New York, 1989.
2. C. Lanczos, The Variational Principles of Mechanics, University of Toronto Press, Toronto, 1949.
VI APPROXIMATE METHODS
Hardly any problems of mechanics (or any other area of physics) are exactly solvable but many are close to exactly solvable systems and they constitute the bread and butter of the subject. Here the exactly solvable system will be called the “unperturbed” system, the forces causing the system to be no longer solvable will be called “perturbations,” and the actual system of interest will be called the “perturbed” system. There is a vast literature describing approximation methods and their application to practical problems. In the next chapter the analytic bases of some of these methods will be investigated. In the chapter after that linear systems are analyzed. Because linear systems can almost always be solved exactly, it may seem artificial to include them among approximate methods. But usually systems are linear only because terms making them nonlinear have been dropped. Furthermore, most nonlinear methods assume that some degree of linear analysis has preceded the application of nonlinear methods. In the final chapter, practical methods of solution and examples will be studied in greater detail.

Most of the early developments in this field came in the field of celestial mechanics, and some of the terminology derives from that source. A valid trajectory of the solvable system is known as the unperturbed “orbit,” and the constants specifying the orbit geometry are known as “orbit elements.” The method of “variation of constants” is the closest thing there is to a universal approach to analyzing the perturbed motion. At every instant, the dynamic values of the system are matched to the newly-allowed-to-be-variable orbit element values.
14 ANALYTIC BASIS FOR APPROXIMATION

Once equations of motion have been found they can usually be solved by straightforward numerical methods, but numerical results rarely provide much general insight and it is productive to develop analytic results to the extent possible. Since it is usually believed that the most essential “physics” is Hamiltonian, considerable effort is justified in advancing the analytic formulation to the extent possible without violating Hamiltonian requirements. One must constantly ask, “Is it symplectic?” In this chapter, the method of canonical transformation will be introduced and then exercised by being applied to nonlinear oscillators. Oscillators of one kind or another are probably the systems most frequently analyzed using classical mechanics. Some, such as relaxation oscillators, are inherently nonsinusoidal, but many exhibit motion that is approximately simple harmonic. Some of the sources of deviation from harmonicity are (usually weak) damping, restoring forces that violate Hooke’s law, and parametric drive. Hamiltonian methods, and in particular phase space representation, are especially effective at treating these systems, and adiabatic invariance, to be defined shortly, is even more important than energy conservation.
14.1. CANONICAL TRANSFORMATIONS

14.1.1. The Action as a Generator of Canonical Transformations

We have encountered the Jacobi method in relation to Hamilton-Jacobi theory while developing analogies between optics and mechanics. However, one may also come upon this procedure in a more formal context while developing the theory of “canonical transformation,” which means transforming the equations in such a way that Hamilton’s equations remain valid. The motivation for restricting the field of acceptable transformations in this way is provided by the large body of certain knowledge one has about Hamiltonian systems, much of it described in the previous chapter.
From a Hamiltonian system initially described by “old” coordinates $q^1, q^2, \ldots, q^n$ and “old” momenta $p_1, p_2, \ldots, p_n$, we seek appropriate transformations

$$(q^1, q^2, \ldots, q^n;\ p_1, p_2, \ldots, p_n) \rightarrow (Q^1, Q^2, \ldots, Q^n;\ P_1, P_2, \ldots, P_n) \tag{14.1.1}$$

to “new coordinates” $Q^1, Q^2, \ldots, Q^n$ and “new momenta” $P_1, P_2, \ldots, P_n$.¹ (Within the Jacobi procedure these would have been known as $\beta$-parameters and $\alpha$-parameters, respectively.) Within Lagrangian mechanics we have seen the importance of variational principles in establishing the invariance to coordinate transformation of the form of the Lagrange equations. Since we have assigned ourselves essentially the same task in Hamiltonian mechanics, it is appropriate to investigate Hamiltonian variational principles. This method will prove to be successful in establishing conditions that must be satisfied by the new Q and P variables. Recall the Poincaré-Cartan integral invariant I.I. defined in Eq. (13.6.6), and from it write the closely related “Hamiltonian variational” line integral H.I.:²

$$\mathrm{H.I.} = \int_{P_1}^{P_2} \left(p_i\,dq^i - H\,dt\right). \tag{14.1.2}$$

Other than starting at $P_1$ and ending at $P_2$ (and not being “pathological”), the path of integration is arbitrary in the extended phase space of $q^i$, $p_i$, and $t$. It is necessary, however, for the given Hamiltonian $H(q, p, t)$ to be appropriately evaluated at every point along the path of integration. Here we use the modified symbol H.I. to indicate that p and H are not assumed to have been derived from a solution of the Hamilton-Jacobi equation, as they were in Eq. (13.6.6). In particular, the path of integration is not necessarily a solution path for the system. H.I. has the dimensions of action and we now subject it to analysis something like that used in deriving the Lagrange equations from $\int L\,dt$. In particular, we seek the integration path for which H.I. achieves an extreme value. In contrast to coordinate, velocity space, where the principle of extreme action was previously analyzed, consider independent smooth phase space variations $(\delta q, \delta p)$ away from an arbitrary integration path through fixed endpoints $(P_1, t_1)$ and $(P_2, t_2)$. (Forgive the fact that P is being used both for momentum components and to label endpoints.)
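Evaluating the variations of its two terms individually, the condition for H.I. to achieve an extreme value is

$$0 = \int_{P_1}^{P_2} \left( \delta p_i\,dq^i + p_i\,d(\delta q^i) - \frac{\partial H}{\partial q^i}\,\delta q^i\,dt - \frac{\partial H}{\partial p_i}\,\delta p_i\,dt \right). \tag{14.1.3}$$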
¹ It would be consistent with the more formally correct mathematical notation introduced previously to use the symbol $\tilde{p}_i$ for momentum $p_i$, since the momenta are more properly thought of as forms, but this is rarely done.

² It was noted in the previous chapter that, when specific coordinates are in use, the differential forms are eventually replaced by the old-fashioned differentials $dq^i$, and similarly for the other differential forms appearing in the theory. Because we will not be insisting on intrinsic description, we make the replacement from the start.
FIGURE 14.1.1. Areas representing terms $\delta p\,dq + p\,d(\delta q)$ in the Hamiltonian variational integral. (Axes: $q$ and $p$; curves labeled “varied path” and “unvaried path.”)
The last two terms come from $H\,dt$ just the way two terms came from $\delta L\,dt$ in the Lagrangian derivation. Where the first two terms come from is illustrated in Fig. 14.1.1. At each point on the unvaried curve, incremental displacements $\delta q(q)$ and $\delta p(q)$ locate points on the varied curve. Since the endpoints are fixed, the deviation $\delta p$ vanishes at the ends and $d(\delta q^i)$ must average to zero in addition to vanishing at the ends. With a view toward obtaining a common multiplicative factor in the integrand, using the fact that the endpoints are fixed, the factor $p_i\,d(\delta q^i)$ can be replaced by $-\delta q^i\,dp_i$, as the difference $d(p_i\,\delta q^i)$ is a total differential. Then, since the variations $\delta q^i$ and $\delta p_i$ are arbitrary, Hamilton's equations follow:
$$\dot{q}^i = \frac{\partial H}{\partial p_i} \quad\text{and}\quad \dot{p}_i = -\frac{\partial H}{\partial q^i}. \tag{14.1.4}$$
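The stationarity expressed by Eqs. (14.1.3) and (14.1.4) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it takes a simple harmonic oscillator with m = ω = 1, samples the true trajectory q = sin t, p = cos t, estimates the integral of (p dq − H dt) with a midpoint rule, and adds independent variations of q and p that vanish at the endpoints. The function names and the particular variations are hypothetical choices. If the true path is extremal, the change in the integral should be second order in the variation amplitude, so doubling the amplitude should roughly quadruple the change.

```python
import math

def hamiltonian(q, p, m=1.0, omega=1.0):
    return p * p / (2.0 * m) + 0.5 * m * omega**2 * q * q

def phase_space_action(qs, ps, dt):
    """Midpoint-rule estimate of the integral of (p dq - H dt) along a sampled path."""
    total = 0.0
    for k in range(len(qs) - 1):
        q_mid = 0.5 * (qs[k] + qs[k + 1])
        p_mid = 0.5 * (ps[k] + ps[k + 1])
        total += p_mid * (qs[k + 1] - qs[k]) - hamiltonian(q_mid, p_mid) * dt
    return total

def varied_action(eps, n_steps=2000, t_end=1.0):
    """Action integral along the true path plus variations of amplitude eps."""
    dt = t_end / n_steps
    ts = [k * dt for k in range(n_steps + 1)]
    # True trajectory for m = omega = 1 is q = sin t, p = cos t.
    # The (assumed) variations of q and p are independent and vanish at both ends.
    qs = [math.sin(t) + eps * math.sin(math.pi * t / t_end) for t in ts]
    ps = [math.cos(t) + eps * math.sin(2.0 * math.pi * t / t_end) for t in ts]
    return phase_space_action(qs, ps, dt)

s_true = varied_action(0.0)
d1 = varied_action(1e-3) - s_true   # change for variation amplitude eps
d2 = varied_action(2e-3) - s_true   # change for variation amplitude 2 * eps
print(d1, d2, d2 / d1)
```

The ratio being close to 4, rather than 2, is what indicates that the first-order variation vanishes on the true path.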
It has therefore been proved that Hamilton's equations are implied by applying the variational principle to integral H.I. But that has not been our real purpose. Rather, as stated previously, our purpose is to derive canonical transformations. Toward that end we introduce³ an arbitrary function $G(q, Q, t)$ of old coordinates q and new coordinates Q and alter H.I. slightly by subtracting the total derivative dG from its integrand:
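$$\mathrm{H.I.}' = \int_{P_1}^{P_2} \left( p_i\,dq^i - H\,dt - \frac{\partial G}{\partial q^i}\,dq^i - \frac{\partial G}{\partial Q^i}\,dQ^i - \frac{\partial G}{\partial t}\,dt \right). \tag{14.1.5}$$

³ Goldstein uses the notation $F_1(q, Q, t)$ for our function $G(q, Q, t)$.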
This alteration cannot change the extremal path obtained by applying the same variational principle since the integral over the added term is independent of path. We could subject H.I.’ to a variational calculation like that applied to H.I., but instead we take advantage of the fact that G is arbitrary to simplify the integrand by imposing on it the condition

$$p_i = \frac{\partial G(q, Q, t)}{\partial q^i}. \tag{14.1.6}$$
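This simplifies Eq. (14.1.5) to

$$\mathrm{H.I.}' = \int_{P_1}^{P_2} \left( P_i\,dQ^i - H'\,dt \right), \tag{14.1.7}$$

where we have introduced the abbreviations

$$P_i = -\frac{\partial G(q, Q, t)}{\partial Q^i} \quad\text{and}\quad H'(Q, P, t) = H + \frac{\partial G}{\partial t}. \tag{14.1.8}$$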
The former equation, with Eq. (14.1.6), defines the coordinate transformation and the latter equation gives the Hamiltonian in the new coordinates. The motivation for this choice of transformation is that Eq. (14.1.7) has the same form in the new variables that Eq. (14.1.2) had in the old variables. The equations of motion are therefore

$$\dot{Q}^i = \frac{\partial H'}{\partial P_i} \quad\text{and}\quad \dot{P}_i = -\frac{\partial H'}{\partial Q^i}. \tag{14.1.9}$$

Since these are Hamilton’s equations in the new variables, we have achieved our goal. The function $G(q, Q, t)$ is known as the “generating function” of the canonical transformation defined by Eq. (14.1.6) and the first of Eqs. (14.1.8). The transformations have a kind of hybrid form (and this is an inelegance inherent to the generating function procedure), with G depending as it does on old coordinates and new coordinates. Also, there is still “housekeeping” to be done, expressing the new Hamiltonian H’ in terms of the new variables, and there is no assurance that it will be possible to do this in closed form.

Condition (14.1.6), which has been imposed on the function G, is reminiscent of the formula for p in the Hamilton-Jacobi theory, with G taking the place of action function S. Though G could have been any function consistent with Eq. (14.1.6), if we conjecture that G is a solution of the Hamilton-Jacobi equation $H + \partial G/\partial t = 0$, we can determine from Eq. (14.1.8) that the new Hamiltonian is given by H’ = 0. Nothing could be better than a vanishing Hamiltonian because, by Eqs. (14.1.9), it implies that the new coordinates and momenta are constants of the motion. Stated conversely, if we had initially assigned ourselves the task of finding coordinates that were constants of the motion, we would have been led to the Hamilton-Jacobi equation as the condition to be applied to generating function G. The other equation defining the canonical transformation is the first of Eqs. (14.1.8):

$$P_i = -\frac{\partial G(q, Q, t)}{\partial Q^i}. \tag{14.1.10}$$
Without being quite the same, this relation resembles the Jacobi-prescription formula $\beta = \partial S/\partial\alpha$ for extracting constants of the motion $\beta$ corresponding to separation constant $\alpha$ in a complete integral of the Hamilton-Jacobi equation. It is certainly true that if G is a complete integral and the $P_i$ are interpreted as the separation constants in that solution, then the quantities defined by Eq. (14.1.10) are constants of the motion. But, relative to the earlier procedure, coordinates and momenta are interchanged. The reason is that the second arguments of G have been taken to be coordinates rather than momenta. We are therefore motivated to subtract the total differential of an arbitrary function $dS(q, P, t)$⁴ (or rather, for reasons that will become clear immediately, the function $d(S - P_i Q^i)$) from the variational integrand:

$$\mathrm{H.I.}' = \int_{P_1}^{P_2} \left( p_i\,dq^i - H\,dt - \frac{\partial S}{\partial q^i}\,dq^i - \frac{\partial S}{\partial P_i}\,dP_i - \frac{\partial S}{\partial t}\,dt + P_i\,dQ^i + Q^i\,dP_i \right) = \int_{P_1}^{P_2} \left( P_i\,dQ^i - H'\,dt \right), \tag{14.1.11}$$
where we have required

$$p_i = \frac{\partial S(q, P, t)}{\partial q^i}, \qquad Q^i = \frac{\partial S(q, P, t)}{\partial P_i}, \qquad H' = H + \frac{\partial S}{\partial t}. \tag{14.1.12}$$

(It was only with the extra subtraction of $d(P_i Q^i)$ that the required final form was obtained.) We have now reconstructed the entire Jacobi prescription. If $S(q, P, t)$ is a complete integral of the Hamilton-Jacobi equation, with the $P_i$ defined to be the $\alpha_i$ separation constants, then the $\beta^i \equiv Q^i$ obtained from the second of Eqs. (14.1.12) are constants of the motion. To recapitulate, a complete integral of the Hamilton-Jacobi equation provides a generator for performing a canonical transformation to new variables for which the Hamiltonian has the simplest conceivable form: it vanishes, causing all coordinates and all momenta to be constants of the motion.
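A concrete generating function may help make the recipe of Eqs. (14.1.6) and (14.1.8) tangible. The sketch below is an illustration only and does not come from the text: it assumes the standard textbook choice G(q, Q) = (mω/2) q² cot Q for a simple harmonic oscillator, evaluates p = ∂G/∂q and P = −∂G/∂Q by central differences, and compares the results with the closed forms q = sqrt(2P/(mω)) sin Q and H = ωP that this G is known to produce. The parameter values and helper names are hypothetical.

```python
import math

M, OMEGA = 1.5, 2.0   # illustrative (assumed) mass and frequency

def G(q, Q):
    """Generating function G(q, Q) = (m omega / 2) q^2 cot Q of old and new coordinates."""
    return 0.5 * M * OMEGA * q * q / math.tan(Q)

def central_diff(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def old_momentum(q, Q):
    return central_diff(lambda x: G(x, Q), q)       # p = dG/dq, as in Eq. (14.1.6)

def new_momentum(q, Q):
    return -central_diff(lambda X: G(q, X), Q)      # P = -dG/dQ, first of Eqs. (14.1.8)

def hamiltonian(q, p):
    return p * p / (2.0 * M) + 0.5 * M * OMEGA**2 * q * q

q, Q = 0.4, 0.9
p = old_momentum(q, Q)
P = new_momentum(q, Q)
q_from_new = math.sqrt(2.0 * P / (M * OMEGA)) * math.sin(Q)
print(p, P, hamiltonian(q, p), OMEGA * P, q_from_new)
```

Since this G has no explicit time dependence, the second of Eqs. (14.1.8) gives H’ = H, and the printout shows that H expressed in the new variables is simply ωP, with no dependence on Q.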
14.2. TIME-INDEPENDENT CANONICAL TRANSFORMATION

Just as the Hamilton-Jacobi equation is the short-wavelength limit of the Schrödinger equation, the time-independent Hamilton-Jacobi equation is the same limit of the time-independent Schrödinger equation. As in the quantum case, methods of treating the two cases appear to be rather different even though time independence is just a special case.

⁴ Goldstein uses the notation $F_2(q, P, t)$ for our function $S(q, P, t)$. This function is also known as “Hamilton’s principal function.” Other generating functions, $F_3(p, Q, t)$ and $F_4(p, P, t)$ in Goldstein’s notation, can also be used.
When it does not depend explicitly on time, the Hamiltonian is conserved, $H(q, p) = E$, and a complete integral of the Hamilton-Jacobi equation takes the form

$$S(q, P, t) = S_0(q, P) - E(t - t_0), \tag{14.2.1}$$

where the independent parameters are listed as P. The term action, applied to S up to this point, is commonly also used to refer to $S_0$.⁵ In this case the Hamilton-Jacobi equation becomes

$$H\!\left(q, \frac{\partial S_0}{\partial q}\right) = E, \tag{14.2.2}$$
and a complete integral is defined to be a solution of the form

$$S_0 = S_0(q, P) + \text{const.}, \tag{14.2.3}$$
with as many new parameters $P_i$ as there are coordinates. It is important to recognize, though, that the energy E can itself be regarded as a Jacobi parameter, in which case the parameter set P is taken to include E. In this time-independent case it is customary to use $S_0(q, P)$ (rather than $S(q, P, t)$) as the generating function. By the general theory, new variables are then related to old by

$$p_i = \frac{\partial S_0}{\partial q^i}, \qquad Q^i = \frac{\partial S_0}{\partial P_i}. \tag{14.2.4}$$

In particular, taking E itself as one of the new momenta, its corresponding new coordinate is

$$Q^E = \frac{\partial S_0}{\partial E}, \tag{14.2.5}$$

which is nonvanishing because the parameter set P includes E. Defined in this way, $Q^E$ is therefore not constant. The quantity whose constancy is assured by the Jacobi theory is

$$\frac{\partial S}{\partial E} = Q^E - t + t_0 = \text{constant}. \tag{14.2.6}$$
This shows that $Q^E$ and time t are essentially equivalent, differing at most by the choice of what constitutes initial time. Eq. (14.2.6) is the basis of the statement that E and t are canonically conjugate variables. Continuing with the canonical transformation, the new Hamiltonian is

$$H'(Q, P, t) = H + \frac{\partial S_0}{\partial t} = E. \tag{14.2.7}$$

⁵ Goldstein uses the notation $W(q, P)$ for our function $S_0(q, P)$. This function is also known as “Hamilton’s characteristic function.” The possible basis for this terminology has been discussed earlier in connection with Problem 10.3.3. Landau and Lifshitz call $S_0$ the “abbreviated action.”
We have obtained the superficially curious result that in this simpler, time-independent case the Hamiltonian is less simple, because nonvanishing, than in the time-dependent case. This is due to our use of $S_0$ rather than S as the generating function. But H’ is constant, which is good enough.⁶ We can already test one of the Hamilton equations, namely the equation for $\dot{Q}^E$:

$$\dot{Q}^E = \frac{\partial H'}{\partial E} = 1, \tag{14.2.8}$$
which is in agreement with Eq. (14.2.6). For the other momenta, not including E, Hamilton’s equations are

$$\dot{P}_i = 0 \quad\text{and}\quad \dot{Q}^i = \frac{\partial E}{\partial P_i} = 0. \tag{14.2.9}$$
Hence finding a complete integral of the time-independent Hamilton-Jacobi equation is tantamount to having solved the problem.
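The statement that $Q^E$ advances at unit rate, Eqs. (14.2.6) and (14.2.8), can be checked on a simple case. The sketch below is an illustration under stated assumptions, not a method taken from the text: it assumes a one-dimensional harmonic oscillator, for which the abbreviated action measured from q = 0 has the closed form S0 = (E/ω)(θ + sin θ cos θ) with sin θ = q sqrt(mω²/(2E)), differentiates it numerically with respect to E at fixed q, and evaluates the result along the true trajectory. The names and parameter values are hypothetical.

```python
import math

M, OMEGA = 1.0, 2.0   # illustrative (assumed) values

def abbreviated_action(q, E):
    """S0(q, E): integral from 0 to q of sqrt(2 m E - m^2 omega^2 q'^2) dq' (closed form)."""
    amplitude = math.sqrt(2.0 * E / (M * OMEGA**2))
    theta = math.asin(q / amplitude)
    return (E / OMEGA) * (theta + math.sin(theta) * math.cos(theta))

def Q_E(q, E, h=1e-6):
    """New coordinate conjugate to E, Q^E = dS0/dE at fixed q (central difference)."""
    return (abbreviated_action(q, E + h) - abbreviated_action(q, E - h)) / (2.0 * h)

E, t0 = 0.8, 0.0
amplitude = math.sqrt(2.0 * E / (M * OMEGA**2))
times = [0.05 * k for k in range(1, 13)]            # omega * t stays below pi / 2
offsets = []
for t in times:
    q = amplitude * math.sin(OMEGA * (t - t0))      # true trajectory passing q = 0 at t0
    offsets.append(Q_E(q, E) - t)                   # should equal the constant -t0
print(offsets)
```

The times are restricted to the first quarter period so that the branch of the inverse sine is unambiguous; a fuller treatment would have to follow S0 through the turning points.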
14.3. ACTION-ANGLE VARIABLES

14.3.1. The Action Variable of a Simple Harmonic Oscillator

Recall that the variation of action S along a true trajectory is given, as in Eq. (11.1.10), by

$$dS = p_i\,dq^i - H\,dt, \quad\text{or}\quad S(P) = \int_{P_0}^{P} \left( p_i\,dq^i - H\,dt \right). \tag{14.3.1}$$

Applying this formula to the simple harmonic oscillator, since the path of integration is a true particle trajectory, H = E, the second term integrates to $-E(t - t_0)$. Comparing this with Eq. (14.2.1), we obtain for the abbreviated action $S_0(q) = \int_{q_0}^{q} p_i\,dq'^i$, or, in one dimension,

$$S_0(q) = \int_{q_0}^{q} p(q')\,dq'. \tag{14.3.2}$$

The word “action” has already been used to define the basic Lagrangian variational integral and as a name for the function satisfying the Hamilton-Jacobi equation, but it now acquires yet another meaning as “$1/(2\pi)$ times the phase space area enclosed after one cycle.” Because this quantity will be used as a dynamic variable, it is called the “action variable” I of the oscillator.⁷ For simple harmonic motion

$$I = \frac{1}{2\pi}\oint p\,dq = \frac{1}{2\pi}\iint dp\,dq = \frac{1}{2\pi}\,\pi\sqrt{2mE}\,\sqrt{\frac{2E}{m\omega_0^2}} = \frac{E}{\omega_0}. \tag{14.3.3}$$

⁶ When applying the Jacobi prescription in the time-independent case, one must be careful not to treat E as functionally dependent on any of the other $P_i$, though.
The first form of integral here is a line integral along the phase space trajectory; the second is the area in $(q, p)$ phase space enclosed by that curve. In quantum mechanics, Planck’s constant h specifies a definite area of phase space, and the number of quantum states is given by $\iint dp\,dq / h$. (Reviewing the solution of the Schrödinger equation for a particle in a box would confirm this at least approximately.) Commonly, units are employed for which $\hbar = h/(2\pi) = 1$, and in those units the number of states is given by $\frac{1}{2\pi}\iint dp\,dq$. This is a possible justification for, or at least mnemonic to remember, the factor $1/(2\pi)$ entering the conventional definition of I. This factor will also give the “right” period, namely $2\pi$, for the motion expressed in terms of “angle variables” (to be introduced shortly).
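The value I = E/ω₀ of Eq. (14.3.3) can be reproduced by direct quadrature of the phase space area. The following is a minimal sketch, with a hypothetical helper `action_variable` and illustrative parameter values; it assumes a one-dimensional bound orbit between two known turning points and uses the substitution q = mid + half · sin θ so that the integrand stays smooth where p vanishes.

```python
import math

def action_variable(potential, m, E, q_lo, q_hi, n=2000):
    """I = (1 / 2 pi) * closed integral of p dq for a 1D orbit between turning points.

    The substitution q = mid + half * sin(theta), with a midpoint rule in theta,
    keeps the integrand smooth at the turning points q_lo and q_hi.
    """
    mid, half = 0.5 * (q_lo + q_hi), 0.5 * (q_hi - q_lo)
    d_theta = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * d_theta
        q = mid + half * math.sin(theta)
        p = math.sqrt(max(0.0, 2.0 * m * (E - potential(q))))
        total += p * half * math.cos(theta) * d_theta
    # Factor 2 accounts for the upper (p > 0) and lower (p < 0) branches of the orbit.
    return 2.0 * total / (2.0 * math.pi)

# Illustrative (assumed) oscillator parameters.
m, omega0, E = 1.3, 0.7, 2.0
a = math.sqrt(2.0 * E / (m * omega0**2))            # turning points at q = -a and q = +a
I = action_variable(lambda q: 0.5 * m * omega0**2 * q * q, m, E, -a, a)
print(I, E / omega0)
```

The same helper could be applied to other one-dimensional potentials once their turning points are known, although only the harmonic case is compared against a closed form here.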
14.3.2. Adiabatic Invariance of the Action I
Consider a one-dimensional system that is an “oscillator” in the sense that coordinate q returns to its starting point at some time. If the Hamiltonian is time-independent, the energy is conserved, and the momentum p returns to its initial value when q does. In this situation, the phase space trajectory is closed and the action variable I just introduced is unambiguously defined. Suppose however that the Hamiltonian $H(q, p, t)$ and hence the energy $E(t)$ have a weak dependence on time that is indicated by writing

$$H(q, p, t) = H(q, p, \lambda(t)). \tag{14.3.4}$$
The variable $\lambda$ has been introduced artificially to consolidate whatever time dependence exists into a single parameter for purposes of the following discussion. At any time t the energy $E(t)$ is defined to have the value it would have if $\lambda(t)$ were held constant at its current instantaneous value. Any nonconstancy of $E(t)$ reflects the time dependence of H. The prototypical example of this sort of time dependency is parametric variation: for example, the “spring constant” k, a “parameter” in simple harmonic motion, might vary slowly with time, $k = k(t)$. Eventually what constitutes “slow” will be made more precise but, much like the short-wavelength approximations previously encountered, the fractional change of frequency during one oscillation period is required to be small. Motion with $\lambda$ fixed will be called “unperturbed” and motion with $\lambda$ variable will be called “perturbed.” During perturbed motion the particle energy,

$$E(t) = H(q, p, \lambda(t)), \tag{14.3.5}$$

⁷ The terminology is certainly strained since I is usually called the “action variable,” in spite of the fact that it is constant, but “variable” does not accompany “action” when describing $S_0$, which actually does vary. Next we will consider a situation in which I might be expected to vary, but will find (to high accuracy) that it does not. Hence the name “action nonvariable” would be more appropriate. Curiously enough the word “amplitude” in physics suffers from the same ambiguity; in the relation $x = a\cos\omega t$ it is ambiguous whether the “amplitude” is x or a.
varies, possibly increasing during some parts of the cycle and decreasing during others, and probably accumulating appreciably over many cycles. We are now interested in the systematic or averaged-over-one-cycle variation of quantities like $E(t)$ and $I(t)$. The “time average” $\overline{f}(t)$ of a variable $f(t)$ that describes some property of a periodic oscillating system having period T is defined to be

$$\overline{f}(t) = \frac{1}{T}\int_{t}^{t+T} f(t')\,dt'. \tag{14.3.6}$$
From here on we take t = 0. Let us start by estimating the rate of change of E as $\lambda$ varies. Since $\lambda(t)$ is assumed to vary slowly and monotonically over many cycles, its average rate of change $\overline{d\lambda/dt}$ and its instantaneous rate of change $d\lambda/dt$ differ negligibly, making it unnecessary to distinguish between these two quantities. But the variation of E will tend to be correlated with the instantaneous values of q and p, so $dE/dt$ can be expected to be above average at some times and below average at others. We seek the time-averaged value $\overline{dE/dt}$. To a lowest approximation we anticipate $\overline{dE/dt} \sim d\lambda/dt$, unless it should happen (which it won’t) that $\overline{dE/dt}$ vanishes to this order of approximation. Two features that complicate the present calculation are that the perturbed period T is in general different from the unperturbed period and that the phase space orbit is not in general closed, so its enclosed area is poorly defined. To overcome these problems, the integrals will be recast as integrals over one cycle of coordinate q, since q necessarily returns to its starting value, say q = 0. (We assume $\dot{q}(t = 0) \neq 0$.) The action variable

$$I(E, \lambda) = \frac{1}{2\pi}\oint p(q, E, \lambda)\,dq \tag{14.3.7}$$
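is already written in this form. From Eq. (14.3.5) the instantaneous rate of change of energy is given by

$$\frac{dE}{dt} = \frac{\partial H}{\partial\lambda}\,\frac{d\lambda}{dt}, \tag{14.3.8}$$

and its time average is therefore given by

$$\overline{\frac{dE}{dt}} = \frac{d\lambda}{dt}\,\frac{1}{T}\int_0^T \frac{\partial H}{\partial\lambda}\,dt. \tag{14.3.9}$$

(Because of the assumed slow, monotonic variation of $\lambda(t)$, it is legitimate to move the $d\lambda/dt$ factor outside the integral in this way.) To work around the dependence of T on $\lambda$, we recast this expression in terms of phase space line integrals. Using Hamilton’s equations we obtain

$$dq = \left.\frac{\partial H}{\partial p}\right|_{q,\lambda} dt, \quad\text{and hence}\quad T = \oint \frac{dq}{\left.\partial H/\partial p\right|_{q,\lambda}}. \tag{14.3.10}$$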
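This formula was encountered first in part (c) of Problem 1.2.1. Here we must respect the assumed functional form $H(q, p, \lambda)$ and, to emphasize the point, have indicated explicitly what variables are being held constant for the partial differentiation. (To be consistent, we should have similarly written $\left.\partial H/\partial\lambda\right|_{q,p}$ in the integrand of Eq. (14.3.9).) Making the same substitution (14.3.10) in the numerator, formula (14.3.9) can be written

$$\overline{\frac{dE}{dt}} = \frac{d\lambda}{dt}\; \frac{\displaystyle\oint \frac{\left.\partial H/\partial\lambda\right|_{q,p}}{\left.\partial H/\partial p\right|_{q,\lambda}}\,dq}{\displaystyle\oint \frac{dq}{\left.\partial H/\partial p\right|_{q,\lambda}}}. \tag{14.3.11}$$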
Because this expression is already proportional to $d\lambda/dt$, which is the order to which we are working, it is legitimate to evaluate the two integrals using the unperturbed motion. Terms neglected by this procedure are proportional to $d\lambda/dt$ and give only contributions of order $(d\lambda/dt)^2$ to $\overline{dE/dt}$. (This is the sort of maneuver that one always resorts to in perturbation theory.) The unperturbed motion is characterized by functional relation (14.3.5) and its “inverse”

$$E = H(q, p, \lambda) \quad\text{and}\quad p = p(q, \lambda, E), \quad\text{or}\quad E = H(q, p(q, \lambda, E), \lambda). \tag{14.3.12}$$
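From now on, since $\lambda$ is constant because unperturbed motion is being described, it will be unnecessary to list it among the variables being held fixed during differentiation. Differentiating the third formula with respect to E yields

$$1 = \frac{\partial H}{\partial p}\,\left.\frac{\partial p}{\partial E}\right|_{q}, \quad\text{or}\quad \left.\frac{\partial p}{\partial E}\right|_{q} = \frac{1}{\partial H/\partial p}, \tag{14.3.13}$$

which provides a more convenient form for one of the factors appearing in the integrands of Eq. (14.3.11). Differentiating the third of Eqs. (14.3.12) with respect to $\lambda$ yields

$$0 = \frac{\partial H}{\partial p}\,\left.\frac{\partial p}{\partial\lambda}\right|_{q,E} + \left.\frac{\partial H}{\partial\lambda}\right|_{q,p}, \quad\text{or}\quad \frac{\left.\partial H/\partial\lambda\right|_{q,p}}{\partial H/\partial p} = -\left.\frac{\partial p}{\partial\lambda}\right|_{q,E}. \tag{14.3.14}$$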
Finally, substituting these expressions into Eq. (14.3.11) yields

$$\overline{\frac{dE}{dt}} = -\frac{d\lambda}{dt}\,\frac{1}{T}\oint \left.\frac{\partial p}{\partial\lambda}\right|_{q,E} dq. \tag{14.3.15}$$
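As stated previously, the integral is to be performed over the presumed-to-be-known unperturbed motion. We turn next to the similar calculation of $\overline{dI/dt}$. Differentiating Eq. (14.3.7) with respect to t using Eq. (14.3.8) and the first of Eqs. (14.3.10) yields

$$\overline{\frac{dI}{dt}} = \frac{1}{2\pi}\oint \left( \left.\frac{\partial p}{\partial E}\right|_{q}\frac{dE}{dt} + \left.\frac{\partial p}{\partial\lambda}\right|_{q,E}\frac{d\lambda}{dt} \right) dq = \frac{1}{2\pi}\,\frac{d\lambda}{dt}\oint \left( \frac{\left.\partial H/\partial\lambda\right|_{q,p}}{\partial H/\partial p} + \left.\frac{\partial p}{\partial\lambda}\right|_{q,E} \right) dq. \tag{14.3.16}$$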
From the second of Eqs. (14.3.14) it can then be seen that

$$\overline{\frac{dI}{dt}} = 0. \tag{14.3.17}$$
Of course this is only approximate, as terms of order $(d\lambda/dt)^2$ have been dropped. Even so, this is one of the most important formulas in mechanics. It is usually stated as: the action variable is an adiabatic invariant. That this is not an exact result might be regarded as detracting from its elegance, utility and importance. In fact the opposite is true since, as we shall see, it is often an extremely accurate result, with accuracy in parts per million not uncommon. This would make it perhaps unique in physics (an approximation that is as good as an exact result) except that the same thing can be said for the whole of Newtonian mechanics. It is still possible for I to vary throughout the cycle, as an example in Section 14.3.4 will show, but its average is constant. There is an important relation between action I and period T (or equivalently frequency $\omega = 2\pi/T$) of an oscillator. Differentiating the defining equation (14.3.7) for I with respect to E and using Eq. (14.3.13) and Eq. (14.3.10) yields

$$\frac{\partial I}{\partial E} = \frac{1}{2\pi}\oint \left.\frac{\partial p}{\partial E}\right|_{q} dq = \frac{1}{2\pi}\oint \frac{dq}{\partial H/\partial p} = \frac{T}{2\pi} = \frac{1}{\omega}. \tag{14.3.18}$$
This formula can be checked immediately for simple harmonic motion. In Eq. (14.3.3) we had $I = E/\omega_0$ and hence

$$\frac{\partial I}{\partial E} = \frac{1}{\omega_0}. \tag{14.3.19}$$
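Relation (14.3.18) can also be checked for an oscillator that is not harmonic, where the period depends on energy. The sketch below is an illustration under assumptions that do not appear in the text: it takes m = 1 and the anharmonic potential V(q) = q²/2 + q⁴/4, computes I(E) from the phase space area and T(E) from the second of Eqs. (14.3.10), and compares a finite-difference estimate of ∂I/∂E with T/(2π). All names and numerical values are hypothetical.

```python
import math

def potential(q):
    return 0.5 * q * q + 0.25 * q**4      # illustrative anharmonic well, with m = 1

def turning_point(E):
    """Positive root a of E = a^2 / 2 + a^4 / 4."""
    return math.sqrt(-1.0 + math.sqrt(1.0 + 4.0 * E))

def action_and_period(E, n=2000):
    """Return (I, T): I = (1 / 2 pi) closed-integral p dq, T = closed-integral dq / (dH/dp)."""
    a = turning_point(E)
    d_theta = math.pi / n
    area, period = 0.0, 0.0
    for k in range(n):
        # q = a sin(theta) with a midpoint rule avoids the turning points themselves.
        theta = -0.5 * math.pi + (k + 0.5) * d_theta
        q = a * math.sin(theta)
        dq = a * math.cos(theta) * d_theta
        p = math.sqrt(2.0 * (E - potential(q)))     # m = 1, so dH/dp = p
        area += p * dq
        period += dq / p
    # Factor 2 in each result: the orbit has an upper and a lower branch.
    return 2.0 * area / (2.0 * math.pi), 2.0 * period

E, h = 1.0, 1e-4
I_plus, _ = action_and_period(E + h)
I_minus, _ = action_and_period(E - h)
_, T = action_and_period(E)
dI_dE = (I_plus - I_minus) / (2.0 * h)
print(dI_dE, T / (2.0 * math.pi))
```

Agreement of the two printed numbers is a check of the relation itself, since the action and the period are computed from different integrands.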
To recapitulate, we have considered a system with weakly time-dependent Hamiltonian H, with initial energy $E_0$ determined by initial conditions. Following the continuing evolution of the motion, the energy, because it is not conserved, may have evolved appreciably to a different value E. Accompanying the same evolution, other quantities such as (a priori) action I and oscillation period T also vary. The rates $dE/dt$, $dI/dt$, $dT/dt$, etc., are all proportional to $d\lambda/dt$; doubling $d\lambda/dt$ doubles all rates for small $d\lambda/dt$. Since these rates are all proportional, it should be possible to find some combination that exhibits a first-order cancellation, and such a quantity is an “adiabatic invariant” that can be expected to vary only weakly as $\lambda$ is varied. It has been shown that I itself is this adiabatic invariant.

In thermodynamics one considers “quasistatic” variations in which a system is treated as static even if it is changing slowly, and this is what we have been doing here, so “quasistatic” invariant would be slightly more apt than “adiabatic,” which in thermodynamics means that the system under discussion is isolated in the sense that heat is neither added nor subtracted from the system. But the terminology is not entirely inappropriate, as we are considering the effect of purely mechanical external intervention on the system under discussion; heat is not even contemplated, and certainly neither added nor removed.

There is an important connection between quantized variables in quantum mechanics and the adiabatic invariants of the corresponding classical system. Suppose a quantum system in a state with given quantum numbers is placed in an environment with varying parameters (such as a time-varying magnetic field, for example), but that the variation is never quick enough to induce a transition. Let the external parameters vary through a cycle that ends with the same values as they started with. Since the system has never changed state, it is important that the physical properties of that state should have returned to their starting values, not just approximately, but exactly. That it does this is what distinguishes an adiabatic invariant. This strongly suggests that the dynamical variables whose quantum numbers characterize the stationary states of quantum systems have adiabatic invariants as classical analogs. The Bohr-Sommerfeld atomic theory, which predated slightly the discovery of quantum mechanics, was based on this principle. Though it became immediately obsolete, this theory was not at all ad hoc and hence has little in common with what passes for “the Bohr-Sommerfeld model” in modern sophomore physics courses. In short, the fact that the action is an adiabatic invariant makes it no coincidence that Planck’s constant is called “the quantum of action.”
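The adiabatic invariance of I can be seen in a direct numerical experiment. The following is a minimal sketch, not a calculation from the text: it assumes a unit-mass harmonic oscillator whose natural frequency is ramped linearly from 1 to 2 over a time long compared with the period, integrates Hamilton’s equations with a classical fourth-order Runge-Kutta step, and compares the energy and the ratio E/ω before and after the ramp. The ramp profile, step size, and function names are hypothetical choices.

```python
import math

def omega(t, ramp_time):
    """Slowly ramped natural frequency: rises linearly from 1 to 2 over ramp_time."""
    return 1.0 + min(max(t / ramp_time, 0.0), 1.0)

def rk4_step(q, p, t, dt, ramp_time, m=1.0):
    """One classical Runge-Kutta step of Hamilton's equations for the ramped oscillator."""
    def deriv(q, p, t):
        w = omega(t, ramp_time)
        return p / m, -m * w * w * q
    k1 = deriv(q, p, t)
    k2 = deriv(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1], t + 0.5 * dt)
    k3 = deriv(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1], t + 0.5 * dt)
    k4 = deriv(q + dt * k3[0], p + dt * k3[1], t + dt)
    q += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    p += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return q, p

def energy(q, p, w, m=1.0):
    return p * p / (2.0 * m) + 0.5 * m * w * w * q * q

def run(ramp_time=400.0, dt=0.01):
    """Return (E_initial, E_final, I_initial, I_final), with I = E / omega."""
    q, p, t = 1.0, 0.0, 0.0
    E0 = energy(q, p, omega(0.0, ramp_time))
    for _ in range(int(round(ramp_time / dt))):
        q, p = rk4_step(q, p, t, dt, ramp_time)
        t += dt
    E1 = energy(q, p, omega(t, ramp_time))
    return E0, E1, E0 / omega(0.0, ramp_time), E1 / omega(t, ramp_time)

E0, E1, I0, I1 = run()
print(E1 / E0, I1 / I0)
```

With these assumed settings the energy roughly doubles while the ratio E/ω changes by only a fraction of a percent; shortening `ramp_time` would be one way to see the invariance degrade as the variation ceases to be slow.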
14.3.3. Action/Angle Conjugate Variables
Because of its adiabatic invariance, the action variable I is an especially appropriate choice as parameter in applying the Jacobi procedure to a system with slowly varying parameters. We continue to focus on oscillating systems. Recalling the discussion of Section 14.2, we introduce the abbreviated action

$$S_0(q, I, \lambda) = \int_0^q p(q', I, \lambda)\,dq'. \tag{14.3.20}$$

Until further notice, $\lambda$ will be taken as constant, but it will be carried along explicitly in preparation for allowing it to vary later on. Since $\lambda$ is constant, both E and I are constant, and either can be taken as the Jacobi “momentum” parameter; previously we have taken E, now we take I, which is why the arguments of $S_0$ have been given as $(q, I, \lambda)$. Since holding E fixed and holding I fixed are equivalent,

$$p = \left.\frac{\partial S_0}{\partial q}\right|_{E,\lambda} = \left.\frac{\partial S_0}{\partial q}\right|_{I,\lambda}. \tag{14.3.21}$$

Being a function of q through the upper limit of its defining equation, $S_0(q, I, \lambda)$ increases by $2\pi I$ as q completes one cycle of oscillation since, as in Eq. (14.3.7),

$$\oint p(q', I, \lambda)\,dq' = 2\pi I. \tag{14.3.22}$$

Using $S_0(q, I, \lambda)$, defined by Eq. (14.3.20), as the generator of a canonical transformation, Eqs. (14.2.4) become

$$p = \frac{\partial S_0(q, I, \lambda)}{\partial q}, \qquad \varphi = \frac{\partial S_0(q, I, \lambda)}{\partial I}, \tag{14.3.23}$$

where $\varphi$, the new coordinate conjugate to new momentum I, is called an “angle variable.” For the procedure presently under discussion to be useful, it is necessary for these equations to be reduced to explicit transformation equations $(q, p) \rightarrow (I, \varphi)$, such as Eqs. (14.3.30) of the next section. By Eq. (14.2.7), the new Hamiltonian is equal to the energy (expressed as a function of I)

$$H'(I, \varphi, \lambda) = E(I, \lambda), \tag{14.3.24}$$

and Hamilton’s equations are

$$\dot{I} = -\frac{\partial H'}{\partial\varphi} = 0, \qquad \dot{\varphi} = \frac{\partial E(I, \lambda)}{\partial I} = \omega(I, \lambda), \tag{14.3.25}$$

where Eq. (14.3.18) has been used, and the symbol $\omega(I, \lambda)$ has been introduced to stand for the oscillator frequency. Integrating the second equation yields

$$\varphi = \omega(I, \lambda)(t - t_0). \tag{14.3.26}$$
This is the basis for the name “angle” given to $\varphi$. It is an angle that advances through $2\pi$ as the oscillator advances through one period. In these $(q, p) \rightarrow (\varphi, I)$ transformation formulas, $\lambda$ has appeared simply as a fixed parameter. One way to exploit the concept of adiabatic invariance is now to permit $\lambda$ to depend on time in a formula such as the second of Eqs. (14.3.25), $\dot{\varphi} = \omega(I, \lambda(t))$. This formula, giving the angular frequency of the oscillator when $\lambda$ is constant, will continue to be valid with the value of I remaining constant, even if $\lambda$ varies arbitrarily, as long as its variation over one cycle is negligible when the frequency is being observed. A more powerful way of proceeding is to recognize that it is legitimate to continue using Eqs. (14.3.23) as transformation equations, even if $\lambda$ varies, provided $\lambda$ is replaced by $\lambda(t)$ everywhere it appears. The generating function is then $S_0(q, I, \lambda(t))$, and $\varphi$ will still be called the “angle variable,” conjugate to I. Using Eq. (14.1.7), and taking account of the fact that the old Hamiltonian is now time-dependent, the new Hamiltonian is

$$H'(\varphi, I, t) = H + \frac{\partial S_0}{\partial t} = E(I, \lambda) + \left.\frac{\partial S_0}{\partial\lambda}\right|_{q,I}\dot{\lambda}. \tag{14.3.27}$$

The new Hamilton equations are

$$\dot{\varphi} = \frac{\partial E(I, \lambda)}{\partial I} + \frac{\partial}{\partial I}\!\left( \left.\frac{\partial S_0}{\partial\lambda}\right|_{q,I} \right)\dot{\lambda}, \qquad \dot{I} = -\frac{\partial}{\partial\varphi}\!\left( \left.\frac{\partial S_0}{\partial\lambda}\right|_{q,I} \right)\dot{\lambda}. \tag{14.3.28}$$
Since no approximations have been made, these are exact equations of motion provided the function $S_0$ has been derived without approximation.

14.3.4. Parametrically Driven Simple Harmonic Motion

Generalizing the simple harmonic motion analyzed in Section 11.2.7 by allowing the spring constant $k(t) \equiv m\lambda^2(t)$ to be time-dependent, the Hamiltonian is

$$H(q, p, t) = \frac{p^2}{2m} + \frac{1}{2}m\lambda^2(t)\,q^2. \tag{14.3.29}$$
Though time-dependent, this Hamiltonian represents a linear oscillator because the frequency is independent of amplitude. The time-independent transformations corresponding to Eqs. (14.3.23) can be adapted from Eq. (11.2.36) by substituting $\omega_0 = \lambda$, $E = I\omega_0 = I\lambda$, and $\omega_0(t - t_0) = \varphi$:

$$q = \sqrt{\frac{2I}{m\lambda}}\,\sin\varphi, \qquad p = \sqrt{2Im\lambda}\,\cos\varphi. \tag{14.3.30}$$
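The abbreviated action is given by

$$S_0(q, I, \lambda) = \int_0^q p\,dq' = 2I\int_0^{\sin^{-1}\left(q\sqrt{m\lambda/(2I)}\right)} \cos^2\varphi'\,d\varphi'. \tag{14.3.31}$$

The dependence on q is through its presence in the upper limit. This dependence can be rearranged as

$$\lambda = \frac{2I}{q^2 m}\,\sin^2\varphi. \tag{14.3.32}$$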
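This can be used to calculate the quantity

$$\left.\frac{\partial S_0}{\partial\lambda}\right|_{q,I} = 2I\cos^2\varphi\,\left.\frac{\partial\varphi}{\partial\lambda}\right|_{q,I} = \frac{I}{2\lambda}\,\sin 2\varphi, \tag{14.3.33}$$

which can then be substituted into Eqs. (14.3.28):

$$\dot{I} = -I\cos 2\varphi\,\frac{\dot{\lambda}}{\lambda}, \qquad \dot{\varphi} = \lambda + \sin 2\varphi\,\frac{\dot{\lambda}}{2\lambda}. \tag{14.3.34}$$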
Here the frequency $\omega(I, \lambda)$ has been calculated as if $\lambda$ were time-independent; that is, $\omega(I, \lambda) = \lambda$. Since in this case the slowly varying parameter has been chosen as $\lambda = \omega$, one can simply replace $\lambda$ by $\omega$, eliminating the artificially introduced $\lambda$. The first equation shows that $dI/dt$ is not identically zero, but the fact that $\cos 2\varphi$ averages to zero shows that $dI/dt$ averages to zero to the extent that $I$ is constant over one cycle and can therefore be taken outside the averaging. Though this statement may seem a bit circular (if $I$ is constant then $I$ is constant), it shows why $I$ is approximately constant and can be the starting point of an estimate of the accuracy to which this is true. The new Hamiltonian is obtained from Eqs. (14.3.27) and (14.3.33),
$$H'(\varphi, I, t) = \omega(t)\, I + \frac{\dot\omega(t)}{2\omega(t)}\, I \sin 2\varphi, \tag{14.3.35}$$
where the time dependence is expressed as the dependence on time (but not amplitude) of the “natural frequency” $\omega(t)$. The linearity of the oscillator is here reflected by the fact that $H'$ depends linearly on $I$. Problems below illustrate how this can be exploited to complete the solution in this circumstance. Eq. (14.3.35) can be used to check Eqs. (14.3.34) by substituting into Hamilton’s equations, although that is not different from what has already been done. The angle $\varphi$ has appeared in these equations only in the forms $\sin\varphi$, $\cos\varphi$, $\sin 2\varphi$, $\cos 2\varphi$. This is not an accident because, although the abbreviated action is augmented by $2\pi I$ every period, with this term subtracted the action is necessarily a periodic function of $\varphi$. The accumulating part does not contribute to $\partial S_0/\partial\lambda|_{q,I}$ because $I$ is held constant. It follows that $H'$ is a periodic function of $\varphi$ with period $2\pi$ and can therefore be expanded in a Fourier series with period $2\pi$ in variable $\varphi$. For the particular system under study this Fourier series has a single term, $\sin 2\varphi$, augmenting its constant part.
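The approximate constancy of $I = E/\omega$ can be illustrated numerically. The following is a minimal sketch, not taken from the text: it integrates $\dot q = p/m$, $\dot p = -m\omega^2(t) q$ with a fixed-step fourth-order Runge-Kutta scheme for a hypothetical linear ramp of $\omega$ from 1 to 2 over 400 time units (many oscillation periods), and compares $E/\omega$ before and after. The ramp shape, step size, and function names are all assumptions made for the illustration.

```python
import math

def omega(t, omega0=1.0, omega1=2.0, ramp_time=400.0):
    """Slowly ramped natural frequency (hypothetical linear ramp)."""
    s = min(max(t / ramp_time, 0.0), 1.0)
    return omega0 + (omega1 - omega0) * s

def deriv(t, q, p, m=1.0):
    """Hamilton's equations for H = p^2/(2m) + m omega(t)^2 q^2 / 2."""
    w = omega(t)
    return p / m, -m * w * w * q

def rk4_step(t, q, p, dt):
    """One classical Runge-Kutta step."""
    k1q, k1p = deriv(t, q, p)
    k2q, k2p = deriv(t + dt / 2, q + dt / 2 * k1q, p + dt / 2 * k1p)
    k3q, k3p = deriv(t + dt / 2, q + dt / 2 * k2q, p + dt / 2 * k2p)
    k4q, k4p = deriv(t + dt, q + dt * k3q, p + dt * k3p)
    q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
    p += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q, p

def action(t, q, p, m=1.0):
    """Instantaneous I = E / omega."""
    w = omega(t)
    return (p * p / (2 * m) + 0.5 * m * w * w * q * q) / w

def run(t_end=400.0, dt=0.01, q0=1.0, p0=0.0):
    """Integrate through the ramp; return the action before and after."""
    n = int(round(t_end / dt))
    q, p = q0, p0
    i_start = action(0.0, q, p)
    for i in range(n):
        q, p = rk4_step(i * dt, q, p, dt)
    return i_start, action(n * dt, q, p)

if __name__ == "__main__":
    i_start, i_end = run()
    print("I before:", i_start, " I after:", i_end)
```

In this sketch the frequency, and hence the energy, roughly doubles while $I$ changes only at the level set by $\dot\omega/\omega^2$, which is the behavior the first of Eqs. (14.3.34) predicts.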
Problem 14.3.1: Eq. (14.3.30) gives a transformation $(q, p) \to (I, \varphi)$. Derive the inverse transformation $(I, \varphi) \to (q, p)$. Using a result from Section 13.4.2, show that both of these transformations are symplectic.
Problem 14.3.2: Consider a one-dimensional oscillator for which the Hamiltonian expressed in action-angle variables is
$$H = \omega I + \epsilon I \cos^2\varphi,$$
where $\omega$ and $\epsilon$ are constants (with $\epsilon$ not allowed to be arbitrarily large). From Hamilton’s equations, express the time dependence $\varphi(t)$ as an indefinite integral and perform the integration. Then express $I(t)$ as an indefinite integral.
Problem 14.3.3: For the system with Hamiltonian given by $H(q, p, t) = p^2/(2m) + \frac{1}{2} m \lambda^2(t) q^2$ as in Eq. (14.3.29), consider the transformation $(q, p) \to (Q, P)$ given by
(14.3.36)
where $r(t)$ will be specified more precisely in a later problem but is, for now, an arbitrary function of time. Show that this transformation is symplectic.
Problem 14.3.4: For the same system, in preparation for finding the generating function $G(q, Q, t)$ defined in Eqs. (14.1.6) and (14.1.8), rearrange the transformation equations of the previous problem into the form $P = P(q, Q, t)$ and $p = p(q, Q, t)$. Then find $G(q, Q, t)$ such that
$$p = \frac{\partial G}{\partial q}, \qquad P = -\frac{\partial G}{\partial Q}. \tag{14.3.37}$$
Problem 14.3.5: In preparation for finding the new Hamiltonian $H'(Q, P, t)$ and expressing it (as is obligatory) explicitly in terms of $Q$ and $P$, invert the same transformation equations into the form $q = q(Q, P, t)$ and $p = p(Q, P, t)$. Then find $H'(Q, P, t)$ and simplify it by assuming that $r(t)$ satisfies the equation
$$\ddot r + \lambda^2(t)\, r - r^{-3} = 0. \tag{14.3.38}$$
Then show that Q is ignorable and hence that P is conserved.
Problem 14.3.6: Assuming that the system studied in the previous series of problems is oscillatory, find its action variable and relate it to the action variable $E/\omega$ of simple harmonic motion.
14.4. EXAMPLES OF ADIABATIC INVARIANCE
14.4.1. Variable-Length Pendulum
Consider the variable-length pendulum shown in Fig. 14.4.1. Tension $T$ holds the string, which passes over a frictionless peg, the length of the string below the peg being $l(t)$. Assuming small-amplitude motion, the “oscillatory energy” of the system $E_{\rm osc}$ is defined so that the potential energy (with the pendulum hanging straight down) plus kinetic energy of the system is $-mgl(t) + E_{\rm osc}$. With fixed $l$,
$$E_{\rm osc} = \frac{1}{2} m g l\, \theta_{\max}^2. \tag{14.4.1}$$
If the pendulum is not swinging, $E_{\rm osc}$ continues to vanish when the length is varied slowly enough that the vertical kinetic energy can be neglected. We assume the length changes slowly enough that $\dot l^2$ and $\ddot l$ can be neglected throughout. The equation of motion, which follows from $d(ml^2\dot\theta)/dt = -mgl\sin\theta$, is
$$\ddot\theta + \frac{2\dot l}{l}\,\dot\theta + \frac{g}{l}\,\sin\theta = 0. \tag{14.4.2}$$
For “unperturbed” motion, the second term is neglected, and the (small-amplitude) action is given by
$$I = \frac{E_{\rm osc}}{\omega} = \sqrt{\frac{l}{g}}\; E_{\rm osc}. \tag{14.4.3}$$
Change $dl$ in the pendulum length causes change $d\theta_{\max}$ in maximum angular amplitude. The only real complication in the problem is that the ratio of these quantities depends on $\theta$. The instantaneous string tension is given by $mg\cos\theta + ml\dot\theta^2 - m\ddot l$, but we will neglect the last term. The energy change $dE_{\rm osc}$ for length change $dl$ is equal to the work done $-T\,dl$ by the external agent acting on the system less the change in
FIGURE 14.4.1. Variable-length pendulum. The fractional change of length during one oscillation period is less than a few percent.
potential energy:
$$dE_{\rm osc} = -\left(mg\cos\theta + ml\dot\theta^2\right) dl + mg\, dl. \tag{14.4.4}$$
Continuing to assume small oscillation amplitudes,
$$\frac{dE_{\rm osc}}{dl} = \frac{1}{2} m g\, \theta^2 - m l\, \dot\theta^2. \tag{14.4.5}$$
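The right-hand side can be estimated by averaging over a complete cycle of the unperturbed motion and, for that motion,
$$\langle\theta^2\rangle = \frac{1}{2}\,\theta_{\max}^2, \qquad \langle\dot\theta^2\rangle = \frac{1}{2}\,\frac{g}{l}\,\theta_{\max}^2. \tag{14.4.6}$$
As a result, using Eq. (14.4.1), we have
$$\frac{dE_{\rm osc}}{dl} = -\frac{1}{4} m g\, \theta_{\max}^2 = -\frac{E_{\rm osc}}{2l}. \tag{14.4.7}$$
Then from Eq. (14.4.3),
$$\frac{dI}{dl} = \sqrt{\frac{l}{g}}\left(\frac{dE_{\rm osc}}{dl} + \frac{E_{\rm osc}}{2l}\right) = 0. \tag{14.4.8}$$
Here we have treated both $l$ and $E_{\rm osc}$ as constant and moved them outside the averages. The result is that $I$ is conserved, in agreement with the general theory.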
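The conservation of $I = E_{\rm osc}\sqrt{l/g}$ can also be seen in a direct simulation. The sketch below is an illustration under stated assumptions rather than anything from the text: it uses the small-amplitude equation $\ddot\theta = -2(\dot l/l)\dot\theta - (g/l)\theta$, which follows from $d(ml^2\dot\theta)/dt = -mgl\theta$, a hypothetical linear lengthening from 1 m to 2 m over 200 s (about one percent per oscillation), and a fixed-step Runge-Kutta integrator. All names and numerical values are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def length(t, l0=1.0, l1=2.0, ramp_time=200.0):
    """String length below the peg: hypothetical slow linear ramp from l0 to l1."""
    s = min(max(t / ramp_time, 0.0), 1.0)
    return l0 + (l1 - l0) * s

def length_rate(t, l0=1.0, l1=2.0, ramp_time=200.0):
    """dl/dt for the ramp above (zero outside the ramp)."""
    return (l1 - l0) / ramp_time if 0.0 <= t <= ramp_time else 0.0

def deriv(t, theta, theta_dot):
    """Small-amplitude equation of motion with the 2 (dl/dt)/l damping-like term."""
    l = length(t)
    accel = -2.0 * length_rate(t) / l * theta_dot - G / l * theta
    return theta_dot, accel

def rk4_step(t, th, w, dt):
    """One classical Runge-Kutta step for (theta, theta_dot)."""
    k1a, k1b = deriv(t, th, w)
    k2a, k2b = deriv(t + dt / 2, th + dt / 2 * k1a, w + dt / 2 * k1b)
    k3a, k3b = deriv(t + dt / 2, th + dt / 2 * k2a, w + dt / 2 * k2b)
    k4a, k4b = deriv(t + dt, th + dt * k3a, w + dt * k3b)
    th += dt / 6 * (k1a + 2 * k2a + 2 * k3a + k4a)
    w += dt / 6 * (k1b + 2 * k2b + 2 * k3b + k4b)
    return th, w

def osc_energy(t, th, w, m=1.0):
    """E_osc = m l^2 theta_dot^2 / 2 + m g l theta^2 / 2 (terms in dl/dt neglected)."""
    l = length(t)
    return 0.5 * m * l * l * w * w + 0.5 * m * G * l * th * th

def action(t, th, w):
    """I = E_osc / omega with omega = sqrt(g / l)."""
    return osc_energy(t, th, w) * math.sqrt(length(t) / G)

def run(t_end=200.0, dt=0.005, theta0=0.05):
    """Integrate through the ramp; return (I_start, I_end, E_start, E_end)."""
    n = int(round(t_end / dt))
    th, w = theta0, 0.0
    i_start, e_start = action(0.0, th, w), osc_energy(0.0, th, w)
    for i in range(n):
        th, w = rk4_step(i * dt, th, w, dt)
    t = n * dt
    return i_start, action(t, th, w), e_start, osc_energy(t, th, w)

if __name__ == "__main__":
    print(run())
```

With $I$ conserved, $E_{\rm osc}$ should scale as $l^{-1/2}$, so doubling the length is expected to reduce the oscillatory energy by a factor close to $1/\sqrt{2}$.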
14.4.2. Charged Particle in Magnetic Field
Consider a charged particle moving in a uniform magnetic field $B(t)$ that varies slowly enough that the Faraday law electric field can be neglected and also slowly enough that the adiabatic condition is satisfied. With the coordinate system as defined in Fig. 14.4.2, the vector potential of such a field is
$$A_x = -\frac{1}{2}\, y B, \qquad A_y = \frac{1}{2}\, x B, \qquad A_z = 0, \tag{14.4.9}$$
because
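$$\nabla \times \mathbf{A} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ A_x & A_y & A_z \end{vmatrix} = B\,\mathbf{k}. \tag{14.4.10}$$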
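The curl can be confirmed numerically with central differences. This short sketch assumes the vector potential $A_x = -\frac{1}{2}yB$, $A_y = \frac{1}{2}xB$, $A_z = 0$ with an arbitrary illustrative field value; the helper names are hypothetical.

```python
def vector_potential(x, y, z, B=1.5):
    """A = (-y B / 2, x B / 2, 0) for a uniform field B along z (B value is arbitrary)."""
    return (-0.5 * y * B, 0.5 * x * B, 0.0)

def curl(field, x, y, z, h=1e-5):
    """Central-difference curl of a vector field at the point (x, y, z)."""
    def d(component, axis):
        plus = [x, y, z]
        minus = [x, y, z]
        plus[axis] += h
        minus[axis] -= h
        return (field(*plus)[component] - field(*minus)[component]) / (2.0 * h)
    return (d(2, 1) - d(1, 2),   # dAz/dy - dAy/dz
            d(0, 2) - d(2, 0),   # dAx/dz - dAz/dx
            d(1, 0) - d(0, 1))   # dAy/dx - dAx/dy

if __name__ == "__main__":
    print(curl(vector_potential, 0.3, -1.2, 2.0))
```

Because the potential is linear in the coordinates, the central differences are exact up to rounding, and the result is $(0, 0, B)$ at every point.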
Introducing cylindrical coordinates, from Eq. (12.6.4) the (nonrelativistic) Lagrangian is
FIGURE 14.4.2. A charged particle moves in a slowly varying, uniform magnetic field.
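$$L = \frac{1}{2} m v^2 + e\,\mathbf{A}\cdot\mathbf{v} = \frac{1}{2} m \left(\dot r^2 + r^2\dot\theta^2 + \dot z^2\right) + \frac{1}{2}\, e B(t)\, r^2 \dot\theta. \tag{14.4.11}$$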
Because this is independent of $\theta$, the conjugate momentum,$^8$
$$P_\theta = m r^2 \dot\theta + \frac{1}{2}\, e B(t)\, r^2, \tag{14.4.12}$$
is conserved. With B fixed, and the instantaneouscenter of rotation chosen as origin, a condition on the unperturbed motion is obtained by equating the centripetal force to the magnetic force: m8 = -eB,
(14.4.13)
1 20, ' Po = -mr 2
(14.4.14)
with the result that
and the action variable is (14.4.15)
$^8$Recall that (uppercase) $P$ stands for conjugate momentum, which differs from (lowercase) $p$, which is the mechanical momentum.
It is useful to express $I_\theta$ in terms of quantities that are independent of the origin using Eq. (14.4.13),
$$I_\theta = \frac{1}{2}\, m r^2 \dot\theta = -\frac{m^2}{2e}\,\frac{v_\perp^2}{B}, \tag{14.4.16}$$
where $v_\perp$ is the component of particle velocity normal to the magnetic field. To recapitulate, $v_\perp^2/B$ is an adiabatic invariant. The important result is not that $P_\theta$ is conserved when $B$ is constant, which we already knew, but that it is conserved even when $B$ varies (slowly enough) with time. Furthermore, since the change in $B$ is to be evaluated at the particle’s nominal position, changes in $B$ can be due either to changes in time of the external sources of $B$ or to spatial variation of $B$ in conjunction with displacement of the moving particle’s center of rotation (for example parallel to $\mathbf{B}$). $P_\theta$ is one of the important invariants controlling the trapping of charged particles in a magnetic “bottle.” This is pursued in the next section.
14.4.3. Charged Particle in a Magnetic Trap
A particle of charge $e$ moves in a time-independent, axially symmetric magnetic field $\mathbf{B}(\mathbf{R})$. Symbolizing the component of particle velocity normal to $\mathbf{B}$ by $w$, the approximate particle motion follows a circle of radius $\rho$ with angular rotation frequency $\omega_c$ (known as the “cyclotron frequency”). These quantities are given by
$$\rho = \frac{m w}{e B}, \qquad \text{and} \qquad \omega_c = 2\pi\,\frac{w}{2\pi\rho} = \frac{eB}{m}, \tag{14.4.17}$$
with the latter being independent of the speed of the particle. The field is assumed to be nonuniform but not too nonuniform. This is expressed by the condition
p
y