PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS VOLUME VIII
CALCULUS OF VARIATIONS AND ITS APPLICATIONS
McGRAW-HILL BOOK COMPANY, INC. NEW YORK
TORONTO
LONDON
1958
FOR THE AMERICAN MATHEMATICAL SOCIETY 80 WATERMAN STREET, PROVIDENCE, RHODE ISLAND
PROCEEDINGS OF THE EIGHTH SYMPOSIUM IN APPLIED MATHEMATICS OF THE AMERICAN MATHEMATICAL SOCIETY Held at the University of Chicago April 12-13, 1956
COSPONSORED BY
THE OFFICE OF ORDNANCE RESEARCH
Lawrence M. Graves EDITOR
Prepared by the American Mathematical Society under Contract No. DA-19-020-ORD-3777 with the Ordnance Corps, U.S. Army.
Printed in the United States of America. All rights reserved except those granted to the United States Government. Otherwise, this book, or parts thereof, may not be reproduced in any form without permission of the publishers. Copyright © 1958 by the McGraw-Hill Book Company, Inc.
Library of Congress Catalog Card Number 50-1183
CONTENTS

EDITOR'S PREFACE
On Variational Principles in Elasticity
    BY ERIC REISSNER
Variational Principles in the Mathematical Theory of Plasticity
    BY D. C. DRUCKER
Discussion of D. C. Drucker's Paper "Variational Principles in the Mathematical Theory of Plasticity"
    BY P. G. HODGE, JR.
A Geometrical Theory of Diffraction
    BY JOSEPH B. KELLER
Upper and Lower Bounds for Eigenvalues
    BY J. B. DIAZ
Stationary Principles for Forced Vibrations in Elasticity and Electromagnetism
    BY J. L. SYNGE
A Variational Computation Method for Forced-vibration Problems
    BY H. F. WEINBERGER
Applications of Variational Methods in the Theory of Conformal Mapping
    BY M. M. SCHIFFER
Dynamic Programming and Its Application to Variational Problems in Mathematical Economics
    BY RICHARD BELLMAN
Variational Methods in Hydrodynamics
    BY S. CHANDRASEKHAR
Some Applications of Functional Analysis to the Calculus of Variations
    BY E. H. ROTHE
INDEX
EDITOR'S PREFACE

This volume contains the papers presented at the Eighth Symposium in Applied Mathematics, sponsored by the American Mathematical Society and the Office of Ordnance Research, and devoted to The Calculus of Variations and Its Applications. In addition to the nine invited addresses, there are included two brief notes, by P. G. Hodge, Jr., and by H. F. Weinberger, which were invited by the Program Committee and which embody discussion of the papers by D. C. Drucker and by J. L. Synge, respectively.

It seems obvious that one symposium could not profitably pay attention to all the directions in which variational methods have been applied. From the consultations of the Program Committee there resulted a group of addresses principally directed to applications in dynamics, but treating several other topics also.

The editor wishes to make special acknowledgment to the McGraw-Hill Book Company for their care in the production of the volume, and to all the authors for the careful preparation of their manuscripts. As a result the editor's task has been a comparatively light one.

LAWRENCE M. GRAVES
Editor
ON VARIATIONAL PRINCIPLES IN ELASTICITY¹
BY
ERIC REISSNER
1. Introduction. Boundary-value problems for the differential equations of the theory of elasticity have in common with many other differential-equation problems the property of being equivalent to problems of the calculus of variations. Recognition of this fact, for the problems of the elastic rod, goes back to Euler and Daniel Bernoulli. The general three-dimensional problem was first considered in this fashion by Green, in 1837.

We may, in the discussion of variational principles in elasticity, distinguish a number of phases as follows:

1. The formulation of different variational principles and their interrelation. The best-known examples of this are Green's minimum principle for displacements and Castigliano's maximum principle for stresses.
2. The application of variational principles to the establishment of approximate two- and one-dimensional theories for three-dimensional problems. A classical example of this is Kirchhoff's treatment of the differential equations and boundary conditions for transverse bending of thin plates.
3. The application of variational principles for the determination of numerical values of the solution of boundary-value problems.
4. The simultaneous use of different variational principles for the determination of upper and lower bounds of numerical values.
5. The use of variational principles for the proof of uniqueness and existence theorems in elasticity theory.

The present paper has as its object the consideration of some of the questions associated with phases 1 and 4, as they have been of interest to the author.
2. The boundary-value problem. We consider the following system of nine differential equations for six components of stress, $\tau_{ij} = \tau_{ji}$, and three components of displacement, $u_i$:
$$\tau_{ij,j} + \psi_{,u_i} = 0, \tag{1}$$
$$\tfrac{1}{2}(u_{i,j} + u_{j,i}) = W_{,\tau_{ij}}. \tag{2}$$
In these equations and in what follows we make use of the summation convention according to which one sums over repeated subscripts. A comma in front of a subscript denotes partial differentiation with respect to the variable in question, except that $f_{,i}$ indicates differentiation of $f$ with respect to the Cartesian coordinate $x_i$.

¹ The work leading to this paper has been supported by the Office of Naval Research under Contract No. Nonr-1841(17) with the Massachusetts Institute of Technology.
The function $\psi$ in the equilibrium equations is taken in the form
$$\psi = X_i u_i + \tfrac{1}{2} Y_{ij} u_i u_j, \tag{3}$$
where the $X_i$ and $Y_{ij} = Y_{ji}$ are given functions of the coordinates $x_i$. The function $W$ in the stress-strain relations (2) is taken in the form
$$W = A_{ij}\tau_{ij} + \tfrac{1}{2} B_{ijkl}\tau_{ij}\tau_{kl}, \tag{4}$$
where the $A_{ij} = A_{ji}$ and $B_{ijkl} = B_{jikl} = B_{ijlk} = B_{jilk}$ are given functions of $x_i$.
The system (1) and (2) is to be solved in the interior of a region $V$ with boundary surface $S$. We divide the surface $S$ in two parts, $S_u$ and $S_p$, and consider the following system of conditions:
$$\text{On } S_u:\quad u_i = \phi_{,p_i}; \qquad \text{On } S_p:\quad p_i = \chi_{,u_i}. \tag{5}$$
The functions $\phi$ and $\chi$ are taken in the form
$$\phi = \bar{u}_i p_i + \tfrac{1}{2} b_{ij} p_i p_j, \qquad \chi = \bar{p}_i u_i + \tfrac{1}{2} c_{ij} u_i u_j, \tag{6}$$
where $\bar{u}_i$, $\bar{p}_i$, $b_{ij} = b_{ji}$, $c_{ij} = c_{ji}$ are given functions of position on $S_u$ and $S_p$, respectively. The quantities $p_i$ are the $x_i$ components of the surface-stress intensity, given by
$$p_i = \cos(n, x_j)\,\tau_{ij}, \tag{7}$$
where $n$ is the outward normal direction to the surface $S$. The system of equations (1), (2), and (5), with $\psi$, $W$, $\phi$, and $\chi$ defined by (3), (4), and (6), may be shown to represent the Euler equations and natural boundary conditions of a variational problem as stated below.

3. The general variational equation. Appropriate synthesis leads to the conclusion that a variational problem which has the differential equations (1) and (2) as Euler (differential) equations and the boundary conditions (5) as natural (or Euler) boundary conditions is the problem
$$\delta I = 0, \tag{8}$$
where
$$I = \int_V (\gamma_{ij}\tau_{ij} - \psi - W)\,dV - \int_{S_p} \chi\,dS - \int_{S_u} (p_i u_i - \phi)\,dS, \tag{9}$$
the quantities $\gamma_{ij}$ being defined by
$$\gamma_{ij} = \tfrac{1}{2}(u_{i,j} + u_{j,i}), \tag{10}$$
and where the $\tau_{ij}$ and $u_i$ are varied independently.²

² A variational theorem in which equations (10) are not used as definitions but are considered six of a total of fifteen differential equations for stresses, strains, and displacements has recently been formulated by K. Washizu in Technical Report 25-18 of the Aeroelastic and Structures Research Laboratory of the Massachusetts Institute of Technology (March, 1955).
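To verify the correctness of the above statement, we write
$$\delta I = \int_V (\gamma_{ij}\,\delta\tau_{ij} + \tau_{ij}\,\delta\gamma_{ij} - \psi_{,u_i}\,\delta u_i - W_{,\tau_{ij}}\,\delta\tau_{ij})\,dV - \int_{S_p} \chi_{,u_i}\,\delta u_i\,dS - \int_{S_u} (u_i\,\delta p_i + p_i\,\delta u_i - \phi_{,p_i}\,\delta p_i)\,dS, \tag{11}$$
and we transform the second term in the volume integral by integration by parts, as follows:
$$\tfrac{1}{2}\int_V \tau_{ij}\,\delta(u_{i,j} + u_{j,i})\,dV = -\int_V \tau_{ij,j}\,\delta u_i\,dV + \int_S p_i\,\delta u_i\,dS. \tag{12}$$
Combination of (11) and (12) gives
$$\delta I = \int_V \left[(\gamma_{ij} - W_{,\tau_{ij}})\,\delta\tau_{ij} - (\tau_{ij,j} + \psi_{,u_i})\,\delta u_i\right] dV + \int_{S_p} (p_i - \chi_{,u_i})\,\delta u_i\,dS - \int_{S_u} (u_i - \phi_{,p_i})\,\delta p_i\,dS, \tag{13}$$
and this shows that the Euler equations of the problem are the differential equations (1) and (2) and the boundary conditions (5).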
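As an illustration that is not part of the paper, the computation (11) to (13) can be followed in a one-dimensional setting. Assume a bar $0 \le x \le \ell$ with a single stress $\tau$ and displacement $u$, a constant modulus $E$ (so that $W = \tau^2/2E$), a body force $\psi = Xu$, a prescribed displacement $\bar{u}$ at $x = 0$ (the portion $S_u$, where the outward normal gives $p = -\tau$), and a prescribed traction $\bar{p}$ at $x = \ell$ (the portion $S_p$, where $p = \tau$). All of these choices are assumptions made for the sketch. Then (9) and its variation read

$$I = \int_0^\ell \left(u'\tau - Xu - \frac{\tau^2}{2E}\right) dx - \bar{p}\,u(\ell) + \tau(0)\,[u(0) - \bar{u}],$$

$$\delta I = \int_0^\ell \left[\left(u' - \frac{\tau}{E}\right)\delta\tau - (\tau' + X)\,\delta u\right] dx + [\tau(\ell) - \bar{p}]\,\delta u(\ell) + [u(0) - \bar{u}]\,\delta\tau(0).$$

Since $\delta\tau$ and $\delta u$ are independent, $\delta I = 0$ yields the stress-strain relation $u' = \tau/E$, the equilibrium equation $\tau' + X = 0$, and both end conditions as natural conditions, in agreement with the pattern of (13).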
The variational theorem implied by (8) to (10) is a generalization of a theorem which was formulated earlier.³ It reduces to the earlier theorem if it is assumed that the body force function $\psi$ is absent and that the functions $\phi$ and $\chi$ in the boundary conditions are of the form
$$\phi = \bar{u}_i p_i, \qquad \chi = \bar{p}_i u_i. \tag{6'}$$
What we have done in going from (6') to (6) is to take the step from having either stress or displacement boundary conditions on $S_p$ and $S_u$ to a system of mixed boundary conditions on both $S_p$ and $S_u$ in such a manner as to preserve the form of the original theorem as a special case. Were it not for the desirability of accomplishing this within the framework of the generalized problem, there would, for the generalized problem, be no need for a separate consideration of the boundary portions $S_p$ and $S_u$.

4. Variational equations for displacements or stresses. In order to bring out the significance of the general variational equation (8) for displacements and stresses, we state separately the less-general variational equations for displacements or stresses. In doing this, we are limiting ourselves here to stress and displacement boundary conditions of the form (6').

a. Variational principle for displacements (Green). The stress-strain relations (2) are considered as equations of definition for the stresses (so that stress variations are dependent on displacement variations), and displacement variations are limited such that $\delta u_i = 0$ on $S_u$. Equations (2) are inverted and written, with the help of a function $U$, in the form
$$\tau_{ij} = U_{,\gamma_{ij}}. \tag{14}$$

³ E. Reissner, On a variational theorem in elasticity, J. Math. Phys. vol. 29 (1950) pp. 90-95.
We further find that
$$\gamma_{ij}\tau_{ij} - W = U, \tag{15}$$
and that the variational equation which has the equilibrium equations (1) and the stress boundary conditions in (6') as Euler equations is of the form
$$\delta I_u = 0, \tag{16}$$
where
$$I_u = \int_V (U - \psi)\,dV - \int_{S_p} \bar{p}_i u_i\,dS. \tag{17}$$
b. Variational principle for stresses (Castigliano). We now assume that stress variations and displacement variations are such that all comparison states are equilibrium states. We then have $\delta p_i = 0$ on $S_p$ and
$$\delta(\tau_{ij,j} + \psi_{,u_i}) = 0 \tag{18}$$
in the interior of the body. We find that the variational equation which has the stress-strain relations (2) and the displacement boundary conditions in (6') as Euler equations is of the form
$$\delta I_\tau = 0, \tag{19}$$
where
$$I_\tau = \int_V (-W - \psi + u_i\psi_{,u_i})\,dV + \int_{S_u} \bar{u}_i p_i\,dS. \tag{20}$$
We may note that, as long as $\psi$ is a linear function of the $u_i$, which corresponds to the case of body forces independent of displacements, we have that
$$\psi - u_i\psi_{,u_i} = 0$$
and therewith the disappearance of body-force terms in the variational equation. The extension of the principle to the case where $\psi$ is a special quadratic function of the $u_i$, which allows use of the principle in connection with vibration problems, has been stated previously.⁴
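The remark about linear and quadratic body-force functions can be checked directly from the form (3) assumed earlier; the following short computation is an added illustration rather than part of the original text. With $\psi = X_i u_i + \tfrac{1}{2} Y_{ij} u_i u_j$ and $Y_{ij} = Y_{ji}$,

$$\psi_{,u_i} = X_i + Y_{ij} u_j, \qquad \psi - u_i\psi_{,u_i} = -\tfrac{1}{2}\,Y_{ij} u_i u_j.$$

The linear part $X_i u_i$ therefore drops out of the volume integrand of (20) in every case, and the body-force contribution vanishes entirely when $Y_{ij} = 0$. When $Y_{ij} \neq 0$ the integrand of (20) becomes $-W + \tfrac{1}{2} Y_{ij} u_i u_j$, which is the quadratic term that remains in the vibration case.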
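5. A transformation and two inequalities. Useful information may be deduced from a comparison of the values of $I$ for functions $\tau_{ij}$ and $u_i$ which are not solutions of $\delta I = 0$ and for the functions $\tau_{ij}$ and $u_i$ which are determined from $\delta I = 0$. We may designate the solution functions of $\delta I = 0$ by $\tilde{\tau}_{ij}$ and $\tilde{u}_i$ and write
$$\tau_{ij} = \tilde{\tau}_{ij} + \delta\tau_{ij}, \qquad u_i = \tilde{u}_i + \delta u_i. \tag{21}$$
If we introduce (21) in (9), we shall have
$$I = \int_V \left[(\tilde{\gamma}_{ij} + \delta\gamma_{ij})(\tilde{\tau}_{ij} + \delta\tau_{ij}) - \psi(\tilde{u} + \delta u) - W(\tilde{\tau} + \delta\tau)\right] dV - \int_{S_p} \chi(\tilde{u} + \delta u)\,dS - \int_{S_u} \left[(\tilde{p}_i + \delta p_i)(\tilde{u}_i + \delta u_i) - \phi(\tilde{p} + \delta p)\right] dS. \tag{22}$$

⁴ E. Reissner, Note on the method of complementary energy, J. Math. Phys. vol. 27 (1948) pp. 159-160.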
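We shall from now on in this section limit ourselves to the case for which $\psi$, $\chi$, and $\phi$ are linear functions and $W$ is a homogeneous second-degree function. We then have
$$\psi(\tilde{u} + \delta u) = X_i\tilde{u}_i + X_i\,\delta u_i, \tag{23a}$$
$$\chi(\tilde{u} + \delta u) = \bar{p}_i\tilde{u}_i + \bar{p}_i\,\delta u_i, \tag{23b}$$
$$\phi(\tilde{p} + \delta p) = \bar{u}_i\tilde{p}_i + \bar{u}_i\,\delta p_i, \tag{23c}$$
and
$$W(\tilde{\tau} + \delta\tau) = W(\tilde{\tau}) + W_{,\tilde{\tau}_{ij}}\,\delta\tau_{ij} + W(\delta\tau). \tag{24}$$
We further write
$$I = \tilde{I} + \delta I + \delta^2 I, \tag{25}$$
where $\tilde{I}$ is the value of $I$ when $\tau_{ij} = \tilde{\tau}_{ij}$ and $u_i = \tilde{u}_i$, where $\delta I$ contains all terms linear in the variations $\delta\tau_{ij}$ and $\delta u_i$, and where $\delta^2 I$ contains all terms of second degree in the variations. Rearrangement of terms in (22) gives us
$$\tilde{I} = \int_V \left[\tilde{\gamma}_{ij}\tilde{\tau}_{ij} - X_i\tilde{u}_i - W(\tilde{\tau})\right] dV - \int_{S_p} \bar{p}_i\tilde{u}_i\,dS - \int_{S_u} (\tilde{u}_i - \bar{u}_i)\,\tilde{p}_i\,dS, \tag{26}$$
$$\delta I = 0 \tag{27}$$
(as it should be), and
$$\delta^2 I = \int_V \left[\delta\gamma_{ij}\,\delta\tau_{ij} - W(\delta\tau)\right] dV - \int_{S_u} \delta p_i\,\delta u_i\,dS. \tag{28}$$
Equations (26) and (28) may be simplified if account is taken of some of the basic relations. Since $W$ is homogeneous of the second degree, we have
$$W(\tilde{\tau}) = \tfrac{1}{2}\,W_{,\tilde{\tau}_{ij}}\tilde{\tau}_{ij}. \tag{29}$$
Furthermore, $\tilde{\gamma}_{ij} = W_{,\tilde{\tau}_{ij}}$, while $\tilde{u}_i = \bar{u}_i$ on $S_u$ and $\tilde{p}_i = \bar{p}_i$ on $S_p$. Therewith
$$\tilde{I} = \int_V \left(\tfrac{1}{2}\tilde{\gamma}_{ij}\tilde{\tau}_{ij} - X_i\tilde{u}_i\right) dV - \int_{S_p} \bar{p}_i\tilde{u}_i\,dS. \tag{30}$$
We further have
$$\int_V \tilde{\gamma}_{ij}\tilde{\tau}_{ij}\,dV = -\int_V \tilde{\tau}_{ij,j}\tilde{u}_i\,dV + \int_S \tilde{p}_i\tilde{u}_i\,dS, \tag{31}$$
and, since $\tilde{\tau}_{ij,j} + X_i = 0$, finally
$$\tilde{I} = -\tfrac{1}{2}\int_V X_i\tilde{u}_i\,dV - \tfrac{1}{2}\int_{S_p} \bar{p}_i\tilde{u}_i\,dS + \tfrac{1}{2}\int_{S_u} \tilde{p}_i\bar{u}_i\,dS. \tag{32}$$
In order to transform $\delta^2 I$ as given by (28), we have at our disposal the relations
$$W(\delta\tau) = \tfrac{1}{2}\,W_{,\delta\tau_{ij}}\,\delta\tau_{ij} \tag{33}$$
and
$$\int_V \delta\gamma_{ij}\,\delta\tau_{ij}\,dV = -\int_V \delta\tau_{ij,j}\,\delta u_i\,dV + \int_S \delta p_i\,\delta u_i\,dS. \tag{34}$$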
It is not immediately apparent in which way to utilize these two facts. However, let us write $\delta^2 I$ in the following two alternate forms:
$$\delta^2 I = \int_V \left[-\delta\tau_{ij,j}\,\delta u_i - W(\delta\tau)\right] dV + \int_{S_p} \delta p_i\,\delta u_i\,dS \tag{35}$$
or
$$\delta^2 I = \int_V \left[(\delta\gamma_{ij} - W_{,\delta\tau_{ij}})\,\delta\tau_{ij} + W(\delta\tau)\right] dV - \int_{S_u} \delta p_i\,\delta u_i\,dS. \tag{36}$$
In general, the quantity $\delta^2 I$ may be made both positive and negative by a suitable choice of the integrands. There are two exceptional cases where this is not so. These cases are given when
$$\delta p_i = 0 \text{ on } S_p \quad\text{and}\quad \delta\tau_{ij,j} = 0 \text{ in } V \tag{37}$$
or when
$$\delta\gamma_{ij} - W_{,\delta\tau_{ij}} = 0 \text{ in } V \quad\text{and}\quad \delta u_i = 0 \text{ on } S_u. \tag{38}$$
We now take account of the fact that the function $W$ is positive-definite. Accordingly, when (37) holds, we have $\delta^2 I \le 0$, and when (38) holds, we have $0 \le \delta^2 I$. We note that (37) represents the same limitations on variations as those associated with the variational principle for stresses [equation (19)] and that (38) represents the same limitations on variations as those associated with the variational principle for displacements [equation (16)]. We conclude then from (25) and (27) that the following basic inequality holds:
$$I_\tau \le \tilde{I} \le I_u. \tag{39}$$
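The two-sided bound can be tried out numerically on a small example. The sketch below is an added illustration under stated assumptions, not a computation from the paper: a bar of unit length fixed at both ends (so the whole boundary is $S_u$ with $\bar{u} = 0$), a constant modulus `E`, and a uniform body force `X`. The trial displacement `a*sin(pi*x)` and the one-parameter family of equilibrium stresses `c - X*x` are hypothetical choices. For this bar the exact value follows from (32) as $\tilde{I} = -X^2/24E$.

```python
import math

E = 1.0    # modulus of the bar (assumed constant)
X = 1.0    # uniform body force per unit length (assumed)
N = 2000   # number of midpoint-rule panels

def integrate(f, a=0.0, b=1.0, n=N):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def I_u(a):
    """Displacement functional (17) for the trial u = a*sin(pi*x).

    The trial vanishes at both ends, so it respects u = 0 on S_u.
    In one dimension U = E*u'**2/2 and psi = X*u.
    """
    u = lambda x: a * math.sin(math.pi * x)
    du = lambda x: a * math.pi * math.cos(math.pi * x)
    return integrate(lambda x: 0.5 * E * du(x) ** 2 - X * u(x))

def I_tau(c):
    """Stress functional (20) for the equilibrium stress tau = c - X*x.

    tau' + X = 0 holds for every c. Because psi is linear and the
    prescribed end displacements are zero, only -W = -tau**2/(2E) remains.
    """
    tau = lambda x: c - X * x
    return integrate(lambda x: -tau(x) ** 2 / (2.0 * E))

# Exact stationary value from (32) with u = X*x*(1 - x)/(2E).
I_exact = -X ** 2 / (24.0 * E)

if __name__ == "__main__":
    a_best = 4.0 * X / (E * math.pi ** 3)   # minimizes I_u over the amplitude a
    print("lower bound I_tau(0.3):", I_tau(0.3))
    print("exact value           :", I_exact)
    print("upper bound I_u(best) :", I_u(a_best))
```

Any admissible stress constant `c` gives a value at or below the exact one, and any admissible amplitude `a` gives a value at or above it, so the pair brackets $\tilde{I}$ in the sense of (39). The choice `c = X/2` reproduces the exact stress, which is why the lower bound becomes sharp there.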
and because the elastic component is recoverable,

[Equation (24) is illegible in the source.]

for a work-hardening material, unless

FIG. 5. Two loadings may combine to an unloading at a corner.

In the demonstration of the uniqueness theorem for stress and strain rates [7-10], the entire point lies in the proof of
$$({}^a\dot{\sigma}_{ij} - {}^b\dot{\sigma}_{ij})({}^a\dot{\epsilon}_{ij} - {}^b\dot{\epsilon}_{ij}) > 0 \qquad (a \neq b), \tag{25}$$
where $a$ and $b$ are two assumed solutions for the rates from the stress point $\sigma_{ij}$. If an infinitesimal time, arbitrarily chosen as unity, is permitted to elapse, the two stress states are $\sigma_{ij} + {}^a\dot{\sigma}_{ij}$ and $\sigma_{ij} + {}^b\dot{\sigma}_{ij}$ (Fig. 6). At a smooth point of the loading surface it is possible to go from stress point $b$ to stress point $a$ (Fig. 6a and b) or from stress point $a$ to $b$ (Fig. 6a and c) and change the strain by ${}^a\dot{\epsilon}_{ij} - {}^b\dot{\epsilon}_{ij}$ or ${}^b\dot{\epsilon}_{ij} - {}^a\dot{\epsilon}_{ij}$, respectively, in accord with (20) and (23). The work postulate for the b-to-a case then gives [see (9)]

[Equation (26) is illegible in the source.]

FIG. 6. Permissible paths.

The result (25) therefore is established [7], but the value of the integral itself is of interest here.

[Equation (27) is illegible in the source.]

so that, as for (18),

[Equation (28) is illegible in the source.]
it is the desired minimum. Figure 3 shows the displacement at W = 0 computed from equations (6) and compared with the exact solution. The close agreement indicates that the approximate solution obtained in this manner is reasonably accurate.

POLYTECHNIC INSTITUTE OF BROOKLYN,
BROOKLYN, N.Y.
NOW AT ILLINOIS INSTITUTE OF TECHNOLOGY,
CHICAGO, ILLINOIS.
A GEOMETRICAL THEORY OF DIFFRACTION¹
BY
JOSEPH B. KELLER
1. Introduction. Geometrical optics is a theory of light propagation based on the assumption that light travels along certain curves, called rays, which are determined by the laws of geometrical optics. Although experience has shown that this theory is essentially correct, there are still many cases in which light appears in places where there are no rays (i.e., in shadows). Such discrepancies between experience and geometrical optics are called diffraction effects. It is the purpose of this article to show that geometrical optics can be so modified as to include diffraction. The modification consists in introducing new rays, called diffracted rays, by extending the laws of geometrical optics. These new rays account for the appearance of light in shadows and also alter the light in lit regions. It seems evident that diffracted rays should be produced when a ray hits an edge or a vertex or when a ray grazes an interface or a boundary. Geometrical optics does not describe what happens in any of these cases; hence we will extend it to do so. Then it will yield the diffracted rays. Our extension of the laws of optics will be presented in two equivalent forms. The first is the explicit form, in which we enumerate the different situations in which diffracted rays are produced and describe the different kinds of diffracted
rays which occur in each case. The second formulation is based upon an extension of Fermat's principle. The equivalence of the two formulations follows from the usual considerations of the calculus of variations. Once the diffracted rays have been introduced, we shall define diffracted
wavefronts and the phase, or eiconal, function by means of them. In this way we shall obtain new solutions of the eiconal equation. Conversely, from appropriate solutions of this equation, diffracted wavefronts and rays can be determined. A number of examples in which diffracted rays occur will be described to illustrate this part of our theory. In these examples the diffracted rays cover the shadows of ordinary geometrical optics. However, we shall also find certain cases in which shadows remain even after the introduction of diffracted rays. To obtain rays in such shadow regions, we shall further extend the concept of a ray by introducing imaginary rays. These rays can be used in much the same way as real rays. For example, a complex phase function can be defined in terms of them. It provides the analytic continuation, from a lit region into the shadow, of a solution of the eiconal equation.

The second part of our theory shows how the rays and wavefronts can be used for the quantitative description of the light distribution. This necessitates the introduction of an amplitude function and certain principles for its determination. Finally we shall discuss the relation of our theory to previous work on diffraction. This may partially justify the introduction of the new rays by showing how some kinds of them have already appeared in special cases.

All our considerations can be applied to other single-integral variational problems and to other first-order partial differential equations in any number of variables. They lead to the introduction of diffracted extremals and diffracted characteristics and of complex characteristics. With the aid of these characteristics, additional branches of solutions of first-order equations can be constructed. Complex solutions, which are analytic continuations of certain real solutions, can also be obtained for analytic equations. As an example, consider the Hamilton-Jacobi equation of classical mechanics. The characteristics of this equation are the classical-mechanical trajectories which satisfy Hamilton's canonical equations or Newton's equations of motion. The complex characteristics are complex-valued solutions of Newton's or Hamilton's equations. These complex trajectories enter the "forbidden regions" into which real trajectories cannot penetrate. They enable us to continue solutions of the Hamilton-Jacobi equation into these regions, and yield complex values for the solutions there. The appearance of trajectories in forbidden regions is related to the "tunnel effect" of quantum mechanics. In fact, the present considerations provide a new classical interpretation of this effect.

¹ This theory was first presented at the Symposium on Microwave Optics at McGill University, Montreal, Canada, in June, 1953. A brief account appeared in the program of the symposium. It was also presented, with various extensions, at the Symposium on the Calculus of Variations and Its Applications, University of Chicago, April, 1956. The research reported in this article has been sponsored by the Air Force Cambridge Research Center, Air Research and Development Command, under Contract No. AF 19(604)1717.

FIG. 1. The cone of diffracted rays produced by an incident ray which hits the edge of a thin screen.

2. Diffracted rays. The first kind of diffracted ray is produced when an incident ray hits an edge (see Figs. 1 to 4). The incident ray produces infinitely many diffracted rays, traveling in directions determined by the law of diffraction. This law states that each diffracted ray which lies in the same medium as the incident ray makes the same angle with the edge as does the incident ray. Furthermore, the incident and diffracted rays lie on opposite
sides of the plane normal to the edge at the point of diffraction. However, the diffracted ray need not lie in the same plane as the incident ray and the edge. Therefore the diffracted rays form the surface of a cone with its vertex at the point of diffraction. If a diffracted ray and the incident ray lie in different media, the angle between the diffracted ray and the edge is related to the angle between the incident ray and the edge by Snell's law (see Fig. 5a). But here again, the diffracted ray is not restricted to lie in the same plane as the incident ray and the edge.
Therefore these diffracted rays also form the surface of a cone.
FIG. 2. The plane of diffracted rays produced by a ray normally incident on the edge of a thin screen.
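The law of edge diffraction for rays in a single medium can be sketched in a few lines. The code below is an illustrative construction, not taken from the article: it treats rays as unit propagation directions, so that "same angle with the edge, on the opposite side of the normal plane" becomes the condition that a diffracted direction `d` and the incident direction `i` have the same component along the unit edge tangent `t`. The function name `diffracted_cone` and its parameters are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def diffracted_cone(incident, edge, n_rays=8):
    """Sample n_rays unit directions on the cone of diffracted rays.

    incident: propagation direction of the incident ray
    edge:     tangent direction of the edge at the point of diffraction
    Every returned direction d satisfies dot(d, t) == dot(i, t).
    """
    i = normalize(incident)
    t = normalize(edge)
    c = dot(i, t)                          # cosine of the ray-edge angle
    s = math.sqrt(max(0.0, 1.0 - c * c))   # sine of the same angle
    # Orthonormal pair (e1, e2) spanning the plane normal to the edge.
    helper = (1.0, 0.0, 0.0) if abs(t[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(t, helper))
    e2 = cross(t, e1)
    rays = []
    for k in range(n_rays):
        phi = 2.0 * math.pi * k / n_rays   # position around the cone
        rays.append(tuple(c * t[j] + s * (math.cos(phi) * e1[j]
                                          + math.sin(phi) * e2[j])
                          for j in range(3)))
    return rays

if __name__ == "__main__":
    # Oblique incidence on an edge along the z axis.
    for d in diffracted_cone((1.0, 0.0, -1.0), (0.0, 0.0, 1.0), n_rays=4):
        print(d)
```

For an incident direction perpendicular to the edge the axial component is zero, and the sampled directions all lie in the plane normal to the edge, which is the situation of Fig. 2. Rays refracted into a second medium would need the Snell's-law relation mentioned in the text, which this sketch does not attempt.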
When an incident ray hits a vertex (e.g., a junction of two or more edges), it produces infinitely many diffracted rays which leave the vertex in all directions (see Fig. 6). Thus at a vertex a single incident ray produces a two-parameter family of diffracted rays. When a ray grazes an interface or boundary surface (i.e., when it is tangent to the surface), the ray splits in two (see Fig. 7). One part continues, unaffected by the surface, as an ordinary ray. The other part travels along the surface. Its path on the surface is a surface ray, i.e., a curve which satisfies the differential equations for a ray, when these equations are specialized to a surface. A surface ray also makes Fermat's integral stationary among all curves lying on the surface. At every point on its path this ray again splits in two, one part continuing along the surface and the other part leaving the surface along the tangent to the surface ray, provided that the tangent lies on the same side
of the surface as the surface ray. In defining a surface ray, one must use the value of the index of refraction appropriate to that side of the surface from which the incident ray comes. If the surface ray lies on that side of the surface having the lesser index of refraction, it also sheds another diffracted ray at each point on its path (see Fig. 8).

FIG. 3. The diffracted rays produced by a plane wave obliquely incident upon a slit in a thin screen. The two incident rays which hit the slit edges are shown, along with some of the singly diffracted rays which they produce. One diffracted ray from each edge is shown crossing the slit and hitting the opposite edge, producing doubly diffracted rays and then triply diffracted rays.

This diffracted ray is a critically refracted ray, which leaves the surface at the critical angle on the side opposite the surface ray. Conversely, a surface ray is produced when a ray is incident at the critical angle on that side of the surface having the greater index of refraction (see Fig. 9). In this case the refracted ray is initially tangent to the surface and then proceeds along it
as a surface ray. Surface rays are also produced by a ray incident at an edge or a vertex, since in these cases some of the diffracted rays leave the edge or vertex along the surfaces meeting there (see Fig. 5). When a ray is incident on a surface of discontinuity of any derivative of the index of refraction, it is reflected and refracted just as at a surface of discontinuity of the index itself. Similarly, diffracted rays are produced when incident rays hit edges or vertices of such surfaces or when they graze these surfaces. In other words, such surfaces behave in all respects like discontinuity surfaces of the index of refraction itself.

Just as discontinuity surfaces of derivatives of the index of refraction must be included as surfaces, so must lines of discontinuity of any derivatives of surfaces be counted as edges. Thus an ordinary edge is a line along which first derivatives of a surface (i.e., the slopes) are discontinuous. Similarly, a line along which second derivatives of a surface (i.e., the curvatures) are discontinuous must also be counted as an edge. Discontinuities in higher derivatives of a surface also yield edges. In the same way, discontinuities in any derivatives of edges must be counted as vertices. In addition, isolated points (not on edges) at which derivatives of a surface are discontinuous also play the role of vertices. The apex of a cone is an example of such a point. Diffracted rays are produced when rays hit any of these edges or vertices, in the same way as they are produced at ordinary edges and vertices.

FIG. 4. A plane wave normally incident upon an aperture in a plane screen. The incident rays are normal to the edge; hence the rays diffracted from each point of the edge lie in a plane normal to the edge.
On the basis of the preceding descriptions, we may say that a ray is diffracted whenever it hits an edge or vertex or grazes a surface. In every such case the ray produces infinitely many diffracted rays. Thus the process of diffraction, which occurs in these cases, splits a single ray into infinitely many diffracted rays. It is natural to expect that the light intensity associated with these diffracted rays, as well as with rays reflected from discontinuity surfaces of derivatives of the refractive index, is much smaller than that associated with the incident ray. This is indeed the case, and is of particular importance in using our theory for the quantitative calculation of intensities. Away from discontinuity or boundary surfaces, all rays, both ordinary and diffracted, are determined by the usual laws of geometrical optics. These laws, plus the laws of reflection and refraction and the foregoing laws governing diffraction, completely determine all real rays.
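The grazing-ray construction described earlier in this section can be made concrete for the simplest curved obstacle. The following sketch is an added illustration under stated assumptions, not a computation from the article: a circular cylinder of radius `radius` centered at the origin of its cross-sectional plane, a homogeneous surrounding medium (so the surface ray is an arc of the circle), and a plane wave traveling in the +x direction that grazes the cylinder at the top point (0, radius). The surface ray then runs clockwise into the shadow and sheds a tangent ray at every point. The function name `surface_ray_path` is hypothetical.

```python
import math

def surface_ray_path(point, radius):
    """Locate the diffracted ray through `point` for the top grazing ray.

    Returns (theta, s, length), where
      theta  is the angle (radians) the surface ray travels clockwise
             from the grazing point (0, radius) before the ray is shed,
      s      is the straight distance along the tangent to `point`,
      length is radius*theta + s, the optical path measured from the
             wavefront x = 0 in a medium of unit refractive index.
    Only the first trip around the cylinder is considered, so the result
    is intended for points in the shadow or just past the tangent line.
    """
    px, py = point
    r = math.hypot(px, py)
    if r < radius:
        raise ValueError("observation point lies inside the cylinder")
    # Half-angle subtended at the point by the tangent to the circle.
    delta = math.acos(radius / r)
    # Polar angle of the point, measured clockwise from the +y axis.
    alpha = math.atan2(px, py) % (2.0 * math.pi)
    theta = (alpha - delta) % (2.0 * math.pi)
    s = math.sqrt(r * r - radius * radius)
    return theta, s, radius * theta + s

if __name__ == "__main__":
    # A point directly behind a unit cylinder, deep in the shadow.
    print(surface_ray_path((3.0, 0.0), 1.0))
```

The relation `theta = alpha - delta` follows from writing the observation point as the shedding point plus a multiple of the clockwise unit tangent there. The second grazing ray at (0, -radius) is handled by reflecting the point in the x axis before calling the function.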
As a consequence of the theory of diffraction just described, diffracted rays will exist, in addition to ordinary rays, in any medium which is bounded or in which the refractive index or any of its derivatives is discontinuous. Of course, in specific examples the incident rays might be so arranged that no diffraction occurs (see Fig. 10). In such examples no incident ray can hit an edge or graze a surface; hence no shadows can be formed. Therefore the absence of diffracted rays in these examples is not unexpected.

FIG. 5. (a) Some of the diffracted rays produced by a ray incident on the edge of a wedge. The light velocity within the wedge is greater than it is outside the wedge. Therefore the angle between the refracted rays and the edge is less than that between the incident ray and the edge.
(b) A section of the wavefront resulting from incidence of a plane wave upon a wedge. The velocity within the wedge is greater than that outside. The incident, reflected, refracted, and diffracted wavefronts are shown. The plane wavefronts produced by rays critically refracted into the outer region from the diffracted wavefront inside the wedge are also shown.
(c) A section of the wavefront resulting from incidence of a plane wave on a wedge. The velocity within the wedge is less than that outside. The incident, reflected, refracted, and diffracted wavefronts are shown.

3. Examples. Let us now consider some examples of diffracted rays. First, consider a plane wave (i.e., a set of parallel rays) incident obliquely upon a thin opaque screen in the form of a half plane (see Fig. 1). If the medium is homogeneous, then all the rays, incident, reflected, and diffracted, are straight lines. The only diffracted rays are produced by those incident rays which hit
FIG. 6. Diffracted rays produced by a ray hitting the tip of an opaque cone. The diffracted rays emanate from this tip in all directions.

FIG. 7. Some of the diffracted and reflected rays produced when a plane wave hits an opaque convex cylinder. One of the two grazing (tangent) rays is shown. This ray splits, part continuing unaffected and part running along the cylinder surface. At each point of its path this surface ray sheds a diffracted ray along the tangent to its path.
the edge of the screen. Since the incident rays are parallel to each other, the cones of diffracted rays will also be parallel to each other. Each point of the edge will be the vertex of a cone. One diffracted ray will reach each point in the medium, accounting for the appearance of light in the "shadow" and also providing additional light rays in the illuminated region. If the incident rays are perpendicular to the edge instead of being oblique, the diffracted rays will also be perpendicular to the edge (see Fig. 2). Thus in this case each cone of diffracted rays is opened up to become a plane of diffracted rays.

FIG. 8. Some of the diffracted, reflected, transmitted, and critically refracted rays produced when a plane wave hits a convex cylinder of lower light velocity than the surrounding medium. One of the two grazing rays is shown, along with some of the diffracted and critically refracted rays it produces.

FIG. 9. A ray incident at the critical angle on a plane interface between two media. Some of the resulting diffracted rays are shown. The dotted line is a section of the diffracted wavefront.

As a third example, consider a plane wave incident obliquely upon a slit in a thin opaque screen (see Fig. 3). The rays which hit the edges of the screen give rise to diffracted rays. Some of the rays diffracted from one edge will hit the other edge and give rise to a new set of diffracted rays. Some of these new rays will in turn hit the opposite edge, producing still other diffracted rays, etc. Thus, in this case there is an infinite set of multiply diffracted rays. Some of these rays are shown in Fig. 3. These singly and multiply diffracted rays are
the only diffracted rays which occur in this problem, and they account for the occurrence of light in the shadow behind the screen. In addition, the usual incident and reflected rays are present.

As a fourth example, let us consider a plane wave normally incident upon a plane screen which contains an aperture with a smooth rim (see Fig. 4). Each incident ray which hits the rim is perpendicular to it. Therefore each set of diffracted rays lies in the plane perpendicular to the edge at the point of diffraction. As in the case of the slit, multiply diffracted rays will also be produced. However, the cones of multiply diffracted rays will in general not be planes, since the diffracted rays which produce them will in general not be perpendicular to the edge.

FIG. 10. A case in which no diffracted rays occur. No incident ray is tangent to the reflecting surface.

In all the preceding examples, the diffracting edge is a caustic of the diffracted rays, i.e., a locus of points of intersection of neighboring rays. Obviously, this is always the case with a diffracting edge. However, in the last example above, the singly and multiply diffracted rays also possess other caustics in addition to the edge. The caustic of the singly diffracted rays can be determined very simply, because these rays lie in planes normal to the edge. Therefore, to locate the intersections of neighboring rays, it suffices to consider the intersection of the neighboring planes which contain the rays. Since all these planes are perpendicular to the plane of the screen, any two of them must intersect in a straight line which is also perpendicular to the screen. As the caustic is made up of such lines, it is a cylinder with generators perpendicular to the screen. To determine its cross section, we consider the curve of intersection of the cylinder and the plane of the screen. In the plane of the screen, the diffracted rays are perpendicular to the rim, and therefore their envelope is just the
envelope of the normals to the rim. But this envelope is called the evolute of the rim. Thus we see that the caustic of the singly diffracted rays is a cylinder with generators normal to the screen and with the evolute of the rim as its cross section. If the screen is removed and replaced by a thin plate having the same rim as the aperture, the preceding considerations apply equally well. This fact is a geometrical form of Babinet's principle. Furthermore, in the case of the plate, the caustic just described will lie (at least partly) in the shadow of the plate. A cross section of it would appear as a bright line in the shadow, since light intensity is greater at a caustic than elsewhere. If the plate has a circular rim, the evolute is a single point, the center, and the caustic is just the axis of the circular plate. The resulting bright spot on the axis is well known experimentally. The bright lines in the shadows of plates of other shapes have also been observed and found to be the evolutes of the rims [1], as the above considerations predict. In the case of a plane wave obliquely incident upon a flat plate, a more detailed analysis shows that the caustic of the singly diffracted rays approaches a cylindrical shape far behind the plate. The generators of this limiting cylinder are parallel to the shadow boundary (i.e., to the incident-ray direction). The cross section of this cylinder is the evolute of the cross section of the shadow, i.e., of the curve obtained by cutting the shadow with a plane perpendicular to the incident-ray direction. This shadow cross section is just a
projection of the rim of the plate. Thus, far behind a plate illuminated obliquely, there should be a bright line in any cross section of the shadow, and it should be the evolute of the boundary of the shadow. This prediction of the theory is also in agreement with the experimental observations of the bright lines in the shadows of various plates illuminated obliquely [2]. If the bright lines had also been observed closer to the plates, they should have been found to differ from the evolutes described above. However, such observations were not made. Let us now examine another example, in which a plane wave in a homogeneous medium is incident upon a wedge of a different material. The incident
rays corresponding to the plane wave will be reflected and refracted at the surfaces of the wedge in the usual manner. In addition, the rays which hit the edge will produce cones of diffracted rays, as in the preceding examples. However, now some diffracted rays will also be produced inside the wedge (see Fig. 5a). If i denotes the angle between an incident ray and the edge and r
denotes the angle between one of the resulting diffracted rays and the edge, then r is determined by the equation cos r = (1/n) cos i. Here n is the relative refractive index of the two materials; i.e., n is the velocity in the surrounding medium divided by the velocity in the wedge. Thus if the light velocity is faster within the wedge material, the angle r is smaller than i. Of the diffracted rays produced inside the wedge by a single incident ray, one proceeds along each wall of the wedge. These two rays are surface rays
within the faster medium, if n < 1. Therefore they shed refracted rays back into the surrounding medium all along their paths. These refracted rays leave the surface at the critical angle $i_c$, determined by $\cos i_c = n$. They also lie in a plane normal to the surface. The "first" critically refracted ray coincides with one of the diffracted rays produced by the incident ray in the outer medium, as can be shown by simple trigonometry. Thus the shed refracted rays due to a single incident ray lie in a plane sector bounded by the outer diffracted cone and a wall of the wedge. The refracted rays shed from one wall of the wedge due to all the incident rays are all parallel to each other and thus form a plane wave. In each medium the cones of diffracted rays from the different points on the edge are also parallel to each other and form a conical wave. Some sections of these wavefronts are shown in Figs. 5b and 5c. When
n > 1, that is, when the faster medium is outside the wedge, the refracted rays are shed into the wedge, and then the resulting plane waves lie inside the wedge.
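The two angle relations used in the wedge example, cos r = (1/n) cos i for the diffracted rays inside the wedge and cos i_c = n for the critical angle, can be evaluated directly. The following is a minimal sketch; the function names and the sample values (i = 60 degrees, n = 0.8) are illustrative assumptions, not taken from the text.

```python
import math

def diffracted_angle(i, n):
    """Angle r (radians) between the edge and a ray diffracted into the wedge.

    Uses cos r = (1/n) cos i, where n is the velocity outside the wedge
    divided by the velocity inside it. Returns None when |cos i / n| > 1,
    i.e. when no real angle r satisfies the relation.
    """
    c = math.cos(i) / n
    if abs(c) > 1.0:
        return None
    return math.acos(c)

def critical_angle(n):
    """Critical angle i_c with cos i_c = n, defined here for 0 < n < 1."""
    if not 0.0 < n < 1.0:
        raise ValueError("critical angle requires 0 < n < 1")
    return math.acos(n)

# Illustrative values (assumed): wedge faster than its surroundings.
i = math.radians(60.0)
n = 0.8
r = diffracted_angle(i, n)   # smaller than i, as the text states for a faster wedge
i_c = critical_angle(n)
```

For n < 1 the computed r is smaller than i, in line with the statement above, and the relation has no real solution once cos i exceeds n.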
Next, suppose that a plane wave is incident upon an opaque cone, as in Fig. 6. Then, in addition to the usual reflected rays, diffracted rays will be produced by that incident ray which hits the vertex or tip of the cone. These
rays will go in all directions from the tip, and the corresponding diffracted wavefronts will be spheres with the tip as center. Let us now consider a plane wave normally incident upon an opaque cylinder of convex cross section in a homogeneous medium (see Fig. 7). In this case, in addition to the reflected rays, two surface rays will be produced by the two incident rays tangent to the cylinder. These rays will lie in a plane normal to the axis of the cylinder and will encircle the cylinder in opposite directions. At each point on its path each surface ray will shed a diffracted ray along the tangent to the cross section of the cylinder (see Fig. 7). Two diffracted rays will pass through each point in space, one coming from each of the surface rays. The two rays through a given point are the two tangents to the cross section which pass through that point. Actually, each of these rays represents an infinite number of diffracted rays, one being shed by the surface ray each time it encircles the cylinder. These rays account for the
illumination in the shadow region and also provide additional light in the lit region. The cross sections of the diffracted wavefronts, i.e., the surfaces orthogonal to the diffracted rays, are just the involutes of the cross section of the cylinder. Suppose that the cylinder of the previous example is composed of a homogeneous material with a lower light velocity and therefore a higher refractive index than the surrounding medium. Then the incident rays which hit the
cylinder will produce refracted rays inside the cylinder in addition to the reflected rays in the surrounding medium (see Fig. 8). These refracted rays will hit the cylinder surface and again produce reflected and transmitted rays, and this process will be repeated ad infinitum. All of these multiply reflected and transmitted rays are ordinary rays. In addition to them, the grazing rays will produce diffracted rays in the outer medium as before. But now the surface rays will also shed refracted rays into the cylinder. These refracted rays will leave the surface at the critical angle of refraction. Once inside the cylinder, these diffracted rays will hit the opposite surface and be reflected and refracted, etc.
As a final example, let us consider a spherical wave incident upon the plane interface between two homogeneous media, as shown in Fig. 9. The corresponding incident rays are straight lines emanating from the center of the spherical wave. They are reflected and refracted at the interface in the usual way. However, if the second medium has the faster light velocity, there is a critical angle at which the refracted ray is parallel to the interface.
Therefore this critically refracted ray is a surface ray in the faster medium. Consequently, it sheds refracted rays back into the slower medium. These shed rays leave the interface at the critical angle. The corresponding diffracted wavefronts are cones, one of which is shown in cross section in Fig. 9. It appears there as a straight-line segment.
4. A generalization of Fermat's principle. In Sec. 2 we extended the laws of geometrical optics by introducing diffracted rays and giving an explicit characterization of them. Now we shall consider an alternative (but equivalent) extension of the laws of geometrical optics based upon a generalization of Fermat's principle. This principle, which is the basis of ordinary optics, involves the index of refraction n(x). This is a real positive function which characterizes the optical behavior of the medium. In terms of it the optical length L of any curve x(s) connecting two points P and Q is defined as
$L = \int_P^Q n[x(s)]\,ds.$
The parameter s denotes arc length. Fermat's principle states that the optical rays connecting P and Q are those curves which make L stationary in the class $C_0$ of all smooth curves joining P and Q. This principle applies to an unbounded continuous medium (i.e., one in which n is continuous). It does not apply to bounded or discontinuous media [i.e., media in which n(x) is discontinuous]. We may try to apply it to such media by considering, instead of $C_0$, a class of curves with a finite number of corners. However, this formulation turns out to be unsatisfactory, because it yields only some rays, "direct" ones, but does not include any reflected rays.
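As a numerical illustration of the optical length L, the sketch below considers broken paths with one corner on a plane interface y = 0 between two homogeneous media and locates the corner that makes L stationary. The setup (indices 1.0 and 1.5, the end points P and Q, and the use of a ternary search) is an assumption made for the example; the stationary path it finds satisfies Snell's law.

```python
import math

def optical_length(x, n1, n2, p, q):
    """Optical length of the two-segment path P -> (x, 0) -> Q.

    P = (px, py) lies in the medium of index n1 (py > 0) and
    Q = (qx, qy) in the medium of index n2 (qy < 0); the interface is y = 0.
    """
    (px, py), (qx, qy) = p, q
    return n1 * math.hypot(x - px, py) + n2 * math.hypot(qx - x, qy)

def stationary_crossing(n1, n2, p, q, iters=200):
    """Ternary search for the crossing point that minimizes the optical length.

    The optical length is a convex function of x, so its minimum is its
    only stationary point.
    """
    lo, hi = min(p[0], q[0]), max(p[0], q[0])
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if optical_length(m1, n1, n2, p, q) < optical_length(m2, n1, n2, p, q):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Illustrative values (assumed).
n1, n2 = 1.0, 1.5
P, Q = (0.0, 1.0), (2.0, -1.0)
x = stationary_crossing(n1, n2, P, Q)

# Sines of the angles between each segment and the interface normal.
sin1 = (x - P[0]) / math.hypot(x - P[0], P[1])
sin2 = (Q[0] - x) / math.hypot(Q[0] - x, Q[1])
# Snell's law: n1 * sin1 and n2 * sin2 agree at the stationary point.
```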
In order to obtain a principle valid for ordinary optics in discontinuous media, we introduce for each integer $r \ge 0$ the class of curves $C_r$. This is the class of curves with exactly r points on the boundaries or discontinuity surfaces of the medium. These points are to be inner points of these surfaces; i.e., they may not lie on edges or vertices. Now we formulate Fermat's principle as follows: The rays are those curves in each class $C_r$ which make the optical length stationary in $C_r$. Upon examining the consequences of this formulation, we find that the class $C_0$ yields rays which do not touch the boundary or discontinuity surfaces; $C_1$ yields singly reflected or refracted rays; and $C_r$ yields r-tuply reflected and/or refracted rays. The preceding formulation of Fermat's principle for discontinuous and/or bounded media is presumably implicit in older formulations of geometrical optics. Although it includes reflected rays, it still fails to take account of diffracted rays. Therefore we shall further modify Fermat's principle by introducing additional classes of curves. For each triple of nonnegative integers r, s, and t we shall define the class $D_{rst}$. This class consists of curves with r smooth arcs on the boundary or discontinuity surfaces, s points on edges of the boundary or discontinuity surfaces, and t points on vertices of these surfaces. Any number of the r arcs may be degenerate arcs, i.e., points. To each arc the value of n on one side of the surface is assigned. We now define the rays as those curves in each class $D_{rst}$ which make the optical length stationary in $D_{rst}$. The class $D_{000}$ is the previously considered class $C_0$, and thus it yields the direct rays. The class $D_{r00}$ contains the previously considered class $C_r$ and thus yields r-tuply reflected and/or refracted rays. Each ray in any of the other classes has at least one point on an edge or vertex and is thus a new ray not included in ordinary optics. Some of the rays in $D_{r00}$ are also new rays, since they have arcs on the boundary or discontinuity surfaces. From the above extension of Fermat's principle a number of conclusions can be drawn which suffice to characterize the rays explicitly. First, let us
consider any smooth arc of a ray not containing a boundary point in its interior.
By applying the usual considerations of the calculus of variations, we conclude that each such arc must be an extremal, i.e., a solution of the Euler equations. Similarly, each boundary arc must be a surface extremal. Second, by applying the appropriate considerations of the calculus of variations to each corner at an inner point of a boundary surface, we find that the law of reflection or the law of refraction must be satisfied, according as the two parts of the ray lie on the same or on opposite sides of the boundary. Third, at each inner point of a boundary edge, we find that a law of diffraction must be satisfied. This law states that the two parts of the ray make equal angles
with the edge, if they lie in the same region at the edge, and that the angles are related by Snell's law if they lie in different regions. Fourth, at a vertex the two parts of a ray may make any angles. Finally, at an inner point of a boundary, a surface extremal and an extremal in space may join together smoothly. The foregoing consequences of the extended Fermat principle are essentially the explicit rules given in Sec. 2 for the determination of the rays. Since these
consequences also suffice to make the optical length stationary in each class, it follows that our two prescriptions for determining rays are equivalent. All the preceding considerations are based on the assumption that the index of refraction n(x) is a piecewise smooth function of x. This means that space is divided into a finite number of regions in each of which n and its derivatives are continuous and have limits at the boundary. The boundary is also assumed to be piecewise smooth, i.e., to consist of a finite number of parts each having continuous derivatives which have limits at the edge. The edge is assumed to be piecewise smooth, i.e., to consist of a finite number of arcs each having continuous derivatives which have limits at each end point of each arc. The end points of the edge arcs are called vertices. Isolated points of the boundary at which the derivatives are discontinuous are also vertices.
The extended form of Fermat's principle for discontinuous media is complicated, compared to the original form for continuous media. Therefore it is natural to inquire whether the complicated form can be deduced from the simple form by considering a discontinuous medium to be the limit of a family of continuous media. Then the rays of the discontinuous medium could be defined as the limits of families of corresponding rays in the family of continuous media.
The answer to this question is negative. It turns out that the indicated limit process yields only some, but not all, of the rays determined by the extended form of Fermat's principle. In particular, many of the reflected rays are not obtained by the limit process, viz., those rays reflected at any angle from a slower medium or normally reflected from a faster medium. However, some diffracted rays are given by this limit process. This result may be summarized by stating that the geometrical optics of discontinuous media is not the limit of the geometrical optics of continuous media.
5. Diffracted wavefronts. In ordinary geometrical optics one always deals with normal congruences of rays. A normal congruence is a family of rays, all of which are normal to some surface. Such a surface is called a wavefront. The theorem of Malus guarantees that a normal congruence remains a normal congruence after reflection or refraction. Therefore reflected and refracted wavefronts can always be defined. Now suppose some of the rays of a normal congruence undergo diffraction. If the resulting diffracted rays also form a normal congruence, then diffracted wavefronts can be defined as the surfaces normal to the family of diffracted rays. That this is indeed the case can be proved, providing an extension of Malus's theorem to diffraction. The proof will not be given here. Some diffracted wavefronts have already been considered in the examples of Sec. 3.
In ordinary geometrical optics we define the eiconal, or phase, function $\Psi(P)$ at a point P as the optical distance to P from some fixed wavefront, measured along an ordinary ray. We then show that $\Psi$ satisfies the eiconal equation $(\nabla\Psi)^2 = n^2$ and that the surfaces $\Psi$ = constant are wavefronts. We also find that $\Psi(P)$ is double-valued if an incident and a reflected ray pass through P. These two values of $\Psi$ become equal as P tends toward the reflecting surface. Thus the reflecting surface is a branch surface of $\Psi$. If many rays (incident, reflected, and refracted) pass through P, then $\Psi(P)$ is many-valued, and the reflecting and refracting surfaces are the branch surfaces on which two or three different branches are equal. All the foregoing considerations can also be applied to diffracted rays. We first define the eiconal $\Psi(P)$ as the optical distance to P from some fixed wavefront measured along any ray, ordinary or diffracted. We can then show that $\Psi$ still satisfies the eiconal equation. With this new definition $\Psi(P)$ is even more multiple-valued than before. Not only are the boundaries and discontinuity surfaces of n(x) branch surfaces of $\Psi$, but the discontinuity surfaces of derivatives of n(x) are also branch surfaces. Furthermore the edges and vertices are branch lines and branch points of $\Psi$.
In ordinary geometrical optics, in a continuous unbounded medium, it is possible to utilize the wavefronts and the eiconal equation as a basis for geometrical optics. We prescribe some smooth surface as an initial wavefront and consider a solution $\Psi$ of the eiconal equation which has the value zero on the given surface. Because the eiconal equation is quadratic, there are two such solutions, but they differ from each other only in sign. For either solution we define the surfaces $\Psi$ = constant as a family of wavefronts and the orthogonal trajectories of these wavefronts as rays. These rays are exactly the same as the rays given by Fermat's principle. By choosing different initial surfaces, we obtain precisely all the rays given by Fermat's principle. Thus
we see that in this case the wavefront formulation is equivalent to the ray formulation. No similar formulation in terms of wavefronts has been given for ordinary geometrical optics in bounded or discontinuous media. This is undoubtedly due to the occurrence of diffraction in such media. However, the present theory, which includes diffraction, can presumably be formulated in terms of wavefronts for any medium. To this end we proceed as above, by prescribing some smooth surface as an initial wavefront and by considering a solution $\Psi$ of the eiconal equation which is zero on the given surface. Then we define the surfaces $\Psi$ = constant to be wavefronts and their orthogonal trajectories to be rays, just as before. The only difference is that now we must consider a multiple-valued solution which has as branch points, lines, and surfaces the vertices, edges, and surfaces determined by the boundaries and by the refractive index and its derivatives. This solution must be complete, in the sense that it must branch at every permissible branch point, line, or surface. Unfortunately, the foregoing requirements do not determine a unique solution $\Psi$. Other conditions, perhaps at the boundaries or discontinuity surfaces, must be imposed to obtain uniqueness. Consequently, the equivalence of the ray and wavefront formulations of our theory is not yet demonstrated, since the wavefront formulation is not complete. In two dimensions, an interesting wavefront results from diffraction by an object bounded by a smooth convex curve. The diffracted wavefronts are the involutes of the curve. In three dimensions, toroidal wavefronts result from diffraction of a normally incident plane wave by a circular disk.
6. Imaginary rays. In an unbounded medium in which the refractive index and all its derivatives are continuous, no diffracted rays occur. This is clear from the above laws governing diffraction. Therefore in such a medium the
theory so far presented coincides with ordinary geometrical optics. However, ordinary geometrical optics sometimes yields shadows in such media. An example is the region on that side of a caustic surface through which no rays pass (see Fig. 11). Experimentally some light is observed in these shadows. Since our theory fails to account for this light, the theory is incomplete. To complete the theory, we introduce another new type of ray, which we call an imaginary ray. Such a ray is a complex-valued solution of the ray equations. Thus, an imaginary ray in a homogeneous medium is a complex straight line. The definition presupposes that n(x) is analytic or piecewise analytic. Now we may consider an analytic normal congruence of real rays. By analytic we mean that the rays of the congruence are analytic functions of two real parameters. Then complex values of these parameters determine imaginary rays of the same congruence. Therefore every analytic congruence contains imaginary rays. Some of them will enter the shadows of the type considered above and thus account for the light observed there.
FIG. 11. A set of rays forming a caustic, or envelope. The shadow on one side of the caustic is devoid of real rays.
To see this, consider a two-dimensional homogeneous medium. Suppose a given curve C is a caustic, i.e., an envelope of a normal congruence of rays. These rays are then straight lines tangent to C. Let t denote arc length along C, and let the parametric equations of C be x = x(t), y = y(t). Then the equations of a ray tangent to C at the point [x(t),y(t)] are
(1)  $x = x(t) + s\,\dot{x}(t), \qquad y = y(t) + s\,\dot{y}(t).$
In (1) the parameter s is the signed distance from [x(t),y(t)] on C to (x,y) on the ray. If the point (x,y) is given, the rays through it are determined by the solutions s,t of (1). Each solution yields one ray through (x,y). If C is convex, there will generally be two rays through each point on the convex side of C but no real rays through any point on the concave side. This is because there are no tangents from such points to C. However, if C is analytic, (1) may have complex solutions for s and t. Then for each solution s,t the point (x,y) lies on the complex ray (1) which is tangent to C at the complex point [x(t),y(t)].
The value of s is the complex distance from (x,y) to the point of tangency, and (x,y) is the only real point on the ray. As an example, suppose that C is a circle of radius a. Let the polar coordinates $(a,\phi)$ denote a point on C and let $(r,\theta)$ denote a point off C. Then (1) becomes
(2)  $r\cos\theta = a\cos\phi - s\sin\phi, \qquad r\sin\theta = a\sin\phi + s\cos\phi.$
Solving these equations for the point of tangency $\phi$ and the distance s, we obtain
(3)  $\phi = \theta \pm \cos^{-1}\frac{a}{r},$
(4)  $s = \mp\sqrt{r^2 - a^2}.$
From (3) we see that, for $r \ge a$, there are two real values of $\phi$, if $\cos^{-1}$ is restricted to the range 0 to $\pi$. There are likewise two real values of t, one corresponding to each value of $\phi$. If r < a, (3) and (4) do not yield real values of $\phi$ and s, but they do give the complex values
(5)  $\phi = \theta \pm i\cosh^{-1}\frac{a}{r},$
(6)  $s = \mp i\sqrt{a^2 - r^2}.$
Thus two complex straight lines through $(r,\theta)$ are tangent to the circle r = a at the two points with $\phi$ coordinates given by (5). These complex lines are the imaginary rays through $(r,\theta)$ which belong to the normal congruence having the circle r = a as caustic. As a second example, consider as caustic the parabola $x' = ay'^2$. If a line through (x,y) is tangent to the parabola at (x',y'), then
(7)  $\frac{x - ay'^2}{y - y'} = 2ay'.$
Solving for the point of tangency, we obtain
(8)  $y' = y \pm \sqrt{y^2 - \frac{x}{a}}.$
Equation (8) shows that there are two, one, or no real points of tangency according as (x,y) is outside, on, or inside the parabola. In the latter case there are two complex points of tangency and thus two imaginary rays through (x,y). The distance s from (x,y) to the point of tangency is given by
(9)  $s = \sqrt{y^2 - \frac{x}{a}}\,\sqrt{1 + 8a^2y^2 - 4ax \pm 8a^2y\sqrt{y^2 - \frac{x}{a}}}.$
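The parabola example can be checked numerically. The sketch below evaluates both roots of (8) with complex arithmetic, confirms that each satisfies the tangency condition, and computes the distance s from the point (x,y) to the point of tangency. The function names and the sample points (with a = 1) are illustrative assumptions.

```python
import cmath

def tangency_points(x, y, a):
    """Both roots y' of equation (8) for the parabola x' = a*y'**2."""
    d = cmath.sqrt(y * y - x / a)
    return (y + d, y - d)

def ray_distance(x, y, a, yp):
    """Distance s from (x, y) to the point of tangency (a*yp**2, yp)."""
    xp = a * yp * yp
    return cmath.sqrt((x - xp) ** 2 + (y - yp) ** 2)

def tangency_residual(x, y, a, yp):
    """Residual of the tangency condition x - x' = 2*a*y'*(y - y')."""
    return (x - a * yp * yp) - 2.0 * a * yp * (y - yp)

# Illustrative values (assumed).
a = 1.0
outside = tangency_points(-1.0, 0.5, a)   # x < a*y**2: two real points of tangency
inside = tangency_points(2.0, 0.5, a)     # x > a*y**2: a complex-conjugate pair
s_inside = [ray_distance(2.0, 0.5, a, yp) for yp in inside]
```

For the point inside the parabola both roots and both distances come out complex, which is the situation the text describes as two imaginary rays through (x,y).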
This distance is complex for points inside the parabola.
7. The complex eiconal, or phase, function. An analytic normal congruence of rays may be defined as the set of rays normal to a given analytic surface S. Such a congruence contains imaginary as well as real rays. By means of the real rays, we have already defined the real eiconal, or phase, function $\Psi(P)$. This is the optical length to P from S along a ray of the congruence. Now we may define the complex eiconal $\Psi(P)$ in exactly the same way by means of the complex rays. It is readily seen that $\Psi(P)$ is complex and that it satisfies the eiconal equation. Furthermore, it is the analytic continuation of the real phase function, which is defined only at points P lying on real rays of the congruence. In the same way, we can obtain the analytic continuation of the solution of the Cauchy problem for the eiconal equation. In this problem a surface S, not necessarily a wavefront, is given, and a function $\Psi_0$ of position on S is also given. We are to find a solution $\Psi$ of the eiconal equation such that $\Psi = \Psi_0$ on S. This problem is usually solved by means of certain real rays through S. The analytic continuation can be obtained by applying the same considerations to the imaginary rays through S. As an example, let us consider a two-dimensional homogeneous medium with
n(x) = 1. We seek a solution $\Psi$ of the eiconal equation $(\nabla\Psi)^2 = 1$ having the value $\Psi = t$ on C. As before, t denotes arc length along C. Since the derivative of $\Psi$ along C is unity, and since the length of $\nabla\Psi$ is unity, we see that $\nabla\Psi$ is tangent to C. Therefore the problem we have posed is a characteristic boundary-value problem, since C is everywhere characteristic (i.e., tangent to the rays). This problem has two solutions because the eiconal equation is quadratic. From the eiconal equation the derivative of $\Psi$ along a ray is $\pm 1$. Therefore the solutions are given in terms of the parameters s and t by equations (1) and
(10)  $\Psi = s + t.$
At points (x,y) for which s and t are real, $\Psi$ is also real. However, for points lying on imaginary rays, both s and t are complex; hence $\Psi$ is complex. In the case of the circle treated above, s is given by (4) or (6) and $t = a\phi$, where $\phi$ is given by (3) or (5). Then from (10) we have in this case
(11)  $\Psi = a\theta \pm a\cos^{-1}\frac{a}{r} \mp \sqrt{r^2 - a^2} \qquad (r \ge a),$
(12)  $\Psi = a\theta \pm i\left[a\cosh^{-1}\frac{a}{r} - \sqrt{a^2 - r^2}\right] \qquad (r \le a).$
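That the phase for the circular caustic satisfies the eiconal equation with n = 1 on both sides of the circle, real outside and complex inside, can be checked by finite differences in polar coordinates. The sketch below is a minimal check under assumed sample values (a = 1, r = 2 and r = 0.5); the function names and the step size are illustrative choices.

```python
import cmath

def psi(r, theta, a, sign=1):
    """Phase for the circular caustic with n = 1: real branch for r >= a,
    complex branch for r < a. sign = +1 or -1 selects one of the two solutions."""
    if r >= a:
        return a * theta + sign * (a * cmath.acos(a / r) - cmath.sqrt(r * r - a * a))
    return a * theta + sign * 1j * (a * cmath.acosh(a / r) - cmath.sqrt(a * a - r * r))

def eiconal_lhs(r, theta, a, sign=1, h=1e-6):
    """(dPsi/dr)**2 + (1/r**2) * (dPsi/dtheta)**2 by central differences."""
    d_r = (psi(r + h, theta, a, sign) - psi(r - h, theta, a, sign)) / (2 * h)
    d_t = (psi(r, theta + h, a, sign) - psi(r, theta - h, a, sign)) / (2 * h)
    return d_r ** 2 + (d_t / r) ** 2

# Illustrative values (assumed): one point outside and one inside the circle r = a.
outside_value = eiconal_lhs(2.0, 0.3, 1.0, sign=1)
inside_value = eiconal_lhs(0.5, 0.3, 1.0, sign=-1)
```

Both values come out equal to 1 up to discretization error, although the phase itself is complex at the interior point.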
For the parabola previously considered, if t = 0 at x = y = 0, we find
(13)  $t = \frac{y'}{2}\sqrt{4a^2y'^2 + 1} + \frac{1}{4a}\log\left(2ay' + \sqrt{4a^2y'^2 + 1}\right).$
When this value of t is used in (10), together with s, given by (9), two solutions $\Psi$ result which are real on and outside the parabola but complex inside it. Complex solutions of the eiconal equation can also be obtained without making use of imaginary rays. As an example, consider the above problem for any curve C. Let a(t) be the radius of curvature of C and $\rho$ be the distance along the normal to C, measured positively toward the convex side of C. In terms of $\rho$ and t the eiconal equation becomes
(14)  $\Psi_\rho^2 + \left(\frac{a}{a + \rho}\right)^2 \Psi_t^2 = 1.$
On the basis of the explicit solutions above, we assume that $\Psi$ has the form
(15)  $\Psi = t + \rho^{3/2}\sum_{j=0}^{\infty} b_j(t)\,\rho^{j/2}.$
Upon inserting (15) into (14) and utilizing the boundary condition on C, we find
(16)  $\Psi = t \pm \frac{2}{3}\sqrt{\frac{2}{a}}\,\rho^{3/2}\left[1 + \cdots\right].$
The remaining coefficients can be found from a recursion formula which we will omit. The result (16) shows that $\Psi$ is real for $\rho > 0$ and complex for $\rho < 0$, and that the imaginary part of $\Psi$ is proportional to $|\rho|^{3/2}$ for $\rho$ small. Let us write $\Psi = R + iI$, where R and I are real. Then the eiconal equation yields
(17)  $\nabla R \cdot \nabla I = 0,$
(18)  $(\nabla R)^2 = n^2 + (\nabla I)^2.$
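The step from the eiconal equation to the two real relations can be written out explicitly. The following short derivation is a sketch that uses only the substitution of a real part R and an imaginary part I for the complex phase and the fact, stated in Sec. 4, that n is real.

```latex
\begin{aligned}
(\nabla\Psi)^2 &= (\nabla R + i\,\nabla I)\cdot(\nabla R + i\,\nabla I) \\
               &= (\nabla R)^2 - (\nabla I)^2 + 2i\,\nabla R\cdot\nabla I = n^2 . \\[4pt]
\text{Imaginary part:}\quad & \nabla R\cdot\nabla I = 0 , \\
\text{Real part:}\quad & (\nabla R)^2 = n^2 + (\nabla I)^2 .
\end{aligned}
```

The vanishing of the imaginary part is the orthogonality relation between the surfaces R = constant and I = constant.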
Equation (17) shows that the surfaces R = constant are orthogonal to the surfaces I = constant. In the next section these surfaces will be shown to be surfaces of constant phase and of constant amplitude, respectively, for a field associated with $\Psi$. Thus (17) shows that for this field these surfaces are mutually orthogonal.
8. Field and amplitude. To make our theory quantitative, we associate a field u(s) with each ray. It is composed of an amplitude A(s) and a phase $\Psi(s)$ in the form
(19)  $u(s) = A(s)\,e^{ik\Psi(s)}.$
In (19), $k = \omega/c$ is the propagation constant, determined by the angular frequency $\omega$ of the field and the propagation velocity c in empty space. Equivalently, $k = 2\pi/\lambda$, where $\lambda$ is the wavelength of the field in empty space. Thus our construction applies to a time-harmonic field. The time factor $e^{-i\omega t}$ will be omitted. The total field at a point P is the sum of the fields (19) on all rays through P. When we deal with light, u is either the electric or the magnetic field and therefore A is a vector. However, our theory also applies to other types of field (e.g., acoustic pressure). For simplicity we shall describe it for a scalar u and then indicate the modifications which occur for vector fields.
We first assume that the phase difference $\Psi(P) - \Psi(Q)$ between two points on a ray is equal to the optical length L of the ray from Q to P. We also assume that a direction of propagation is associated with each ray and that $\Psi$ increases in this direction. From these assumptions it follows that $\Psi$ can be determined at any point P if it is known at some point Q on the same ray:
(20)  $\Psi(P) = \Psi(Q) + L.$
We further require that $\Psi$ be constant on some wavefront of a normal congruence of rays. Then $\Psi$ is just the eiconal, or phase, function previously introduced.
Next we assume that the principle of conservation of energy applies in its optical form. This states that the energy flux is the same at every cross section of a tube of rays. We assume that the energy flux per unit area is proportional to $nA^2$. Then the energy principle yields, for a narrow tube of rays,
(21)  $nA^2\,d\sigma = n_0A_0^2\,d\sigma_0.$
Here n and A are evaluated at a point P on a ray in the tube, and $d\sigma$ is the cross-sectional area of the tube at P. The quantities $n_0$, $A_0$, and $d\sigma_0$ are evaluated at some other point Q of the same ray. From (21) we obtain
(22)  $A = A_0\sqrt{\frac{n_0\,d\sigma_0}{n\,d\sigma}}.$
Thus we can compute A at any point P on a ray, provided that we know the amplitude $A_0$ at some point Q on the same ray. The ratio $d\sigma_0/d\sigma$ in (22) is the ratio of the areas of the cross sections at P and Q. Since these cross sections are portions of wavefronts, this ratio is just the Jacobian of the mapping from a wavefront at P to that at Q by means of rays. When A is a vector, we assume that its amplitude satisfies (22). Its direction, if A is an electric or magnetic field, is obtained from $A_0$ by parallel transport along the ray with respect to the metric n ds. If $\rho_1$ and $\rho_2$ denote the principal radii of curvature of the wavefront at Q, then, in a homogeneous medium, the corresponding radii at P are $\rho_1 + s$ and $\rho_2 + s$. Here s denotes the distance along the ray from Q to P. Since the area ratio is inversely proportional to the ratio of Gaussian curvatures, (22) becomes
(23)  $A = A_0\left[\frac{\rho_1\rho_2}{(\rho_1 + s)(\rho_2 + s)}\right]^{1/2}.$
From (23) and (20) we see that, in a homogeneous medium,
(24)  $u(s) = A_0\left[\frac{\rho_1\rho_2}{(\rho_1 + s)(\rho_2 + s)}\right]^{1/2} e^{ik(\Psi_0 + ns)}.$
Here $\Psi_0 = \Psi(Q)$.
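The field along a ray in a homogeneous medium, as described by the amplitude law (23) together with the phase law (20), can be evaluated with a few lines of code. The sketch below is illustrative only: the function name, the argument names, and the convention of passing math.inf for an infinite radius of curvature are assumptions made for the example.

```python
import cmath
import math

def ray_field(s, A0, rho1, rho2, k, n=1.0, psi0=0.0):
    """Field u(s) at distance s from Q along a ray in a homogeneous medium.

    rho1 and rho2 are the principal radii of curvature of the wavefront at Q;
    pass math.inf for a radius that is infinite (cylindrical or plane wave).
    The expression is singular at s = -rho1 and s = -rho2, where a
    ZeroDivisionError is raised.
    """
    def factor(rho):
        # rho / (rho + s), with the limiting value 1 for an infinite radius
        return 1.0 if math.isinf(rho) else rho / (rho + s)

    amplitude = A0 * cmath.sqrt(factor(rho1) * factor(rho2))
    return amplitude * cmath.exp(1j * k * (psi0 + n * s))

# Illustrative values (assumed): unit amplitude at Q, k = 10.
spherical = ray_field(3.0, 1.0, 1.0, 1.0, k=10.0)
cylindrical = ray_field(3.0, 1.0, 1.0, math.inf, k=10.0)
plane = ray_field(3.0, 1.0, math.inf, math.inf, k=10.0)
```

With both radii finite the modulus falls off like 1/s for large s, with one finite radius like the inverse square root of s, and with neither it stays constant.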
A GEOMETRICAL THEORY OF DIFFRACTION
47
The field (24) becomes infinite at two, one, or no points on a ray, according as both, one, or neither of the radii of curvature are finite. These points are on the caustics of the ray congruence. In these various cases, $u$ decays for large $s$ like $s^{-1}$, $s^{-1/2}$, or $s^{0}$, that is, as in a spherical, cylindrical, or plane wave. Later we shall indicate how to modify our theory in order to obtain a finite value for $u$ on a caustic. In homogeneous media it is often convenient to measure $s$ from a point $Q$ on the caustic $C$. To do this, we first rewrite (23) in the form

(25)  $A \left[ (\rho_1 + s)(\rho_2 + s) \rho_2^{-1} \right]^{1/2} = A_0 \rho_1^{1/2}.$

The left side of (25) has a limit as $Q$ tends to $C$, and therefore the right side must also. This is understandable since $A_0$ becomes infinite and $\rho_1$ becomes zero as $Q$ tends to $C$. Let us denote this limit by $A_0' = \lim A_0 \rho_1^{1/2}$. Then (24) becomes

(26)  $u(s) = \left[ \dfrac{\rho_2}{s(\rho_2 + s)} \right]^{1/2} A_0' \, e^{ik(\psi_0 + ns)}.$

In a two-dimensional medium, or for cylindrical waves in three dimensions, $\rho_2$ is infinite, and (26) becomes

(27)  $u(s) = s^{-1/2} A_0' \, e^{ik(\psi_0 + ns)}.$
In two dimensions, as we have seen, $\psi_0 = nt$, where $t$ denotes arc length along $C$. Furthermore, since $A_0'$ varies from ray to ray, we may designate each ray by its point of tangency $t$ and write $A_0' = A_0'(t)$. Then (27) becomes

(28)  $u(s,t) = s^{-1/2} A_0'(t) \, e^{ikn(t+s)}.$
If two rays pass through a point $P$, as is often the case near a caustic, then $u(P)$ is a sum of two terms of the form (28). Let us apply (28) to the congruence of rays tangent to a circular caustic of radius $a$. Making use of our previous results for $s$ and $t$, we obtain for $r > a$,

(29)  $u(r,\theta) = \dfrac{A_0'[a\theta - a\cos^{-1}(a/r)]}{\sqrt[4]{r^2 - a^2}} \, e^{ikn[a\theta - a\cos^{-1}(a/r) + \sqrt{r^2 - a^2}]} + \dfrac{A_0'[a\theta + a\cos^{-1}(a/r)]}{\sqrt[4]{r^2 - a^2}} \, e^{ikn[a\theta + a\cos^{-1}(a/r) - \sqrt{r^2 - a^2}] - i(\pi/2)}.$
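The arguments of $A_0'$ and the phases in (29) correspond to $t = a\theta \mp a\cos^{-1}(a/r)$ and $s = \pm\sqrt{r^2 - a^2}$. The sketch below checks this ray geometry numerically. It is an illustration only, under the assumption (not stated in this passage) that the rays leave the circular caustic in the direction of increasing $\theta$; the function names are hypothetical.

```python
import math


def tangent_rays(r, theta, a):
    """Return the (t, s) pairs of the two rays through the point (r, theta), r > a.

    t is arc length along the caustic to the point of tangency, and s is the
    signed distance from that point to (r, theta) measured along the ray.
    """
    beta = math.acos(a / r)
    length = math.sqrt(r * r - a * a)
    return [(a * (theta - beta), length), (a * (theta + beta), -length)]


def point_on_ray(t, s, a):
    """Cartesian point reached from the tangency point t after signed distance s."""
    alpha = t / a  # polar angle of the point of tangency
    x = a * math.cos(alpha) - s * math.sin(alpha)
    y = a * math.sin(alpha) + s * math.cos(alpha)
    return x, y


def optical_phases(r, theta, a, n=1.0):
    """Phases n(t + s) of the two terms, as they appear in the exponents of (29)."""
    return [n * (t + s) for t, s in tangent_rays(r, theta, a)]
```

Both rays returned by `tangent_rays` pass through the same field point, which is why $u$ is a sum of two terms for $r > a$.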
For $r < a$ we also obtain two terms, each corresponding to one of the imaginary rays through the point $(r,\theta)$. One of these terms increases with distance from the caustic, whereas the other decreases. We now assume that the increasing term must be omitted. Then we obtain for $r < a$

(30)  $u(r,\theta) = \dfrac{A_0'[a\theta + ia\cosh^{-1}(a/r)]}{\sqrt[4]{a^2 - r^2}} \, e^{kn[ia\theta - \{a\cosh^{-1}(a/r) - \sqrt{a^2 - r^2}\}] - (i\pi/4)}.$

For the result (30) we require that $A_0'(t)$ be an analytic function of $t$.
Let us now consider the function $v(r,\theta)$ defined by

(31)  $v(r,\theta) = B \left[ H^{(1)}_{nka}(nkr) - H^{(2)}_{nka}(nkr) \right] e^{inka\theta}.$

This function is an exact solution of the reduced wave equation in two dimensions if $B$ is a constant. We now expand it asymptotically for large $nka$ and $nkr$. This yields exactly (29) for $r > a$ and (30) for $r < a$, provided that $A_0'$ is constant and that $B = e^{i\pi/4} (kn\pi/2)^{1/2} A_0'$. This agreement indicates that our construction yields the leading term in the asymptotic expansion with respect to $k$, for $k$ large, of the exact solution of the wave equation. We believe that this is always the case. So far, we have described how the amplitude varies along a ray. Now we shall explain how the initial value of the amplitude is to be determined. First,
on rays which come from a source-even if it is at infinity-the amplitude must be prescribed.
This prescription characterizes the source. Second, on a
reflected or refracted ray at the point of reflection or refraction, we assume that the amplitude is proportional to that on the corresponding incident ray at this point. The proportionality factors are called reflection and transmission coefficients, R and T, respectively. For vector fields these coefficients are matrices. Third, on a ray diffracted from an edge or vertex we assume
that the field is also proportional to that on the corresponding incident ray at the point of diffraction. The proportionality factor we call a diffraction coefficient (or matrix, in the vector case). Additional hypotheses must be made to treat the fields on diffracted rays which have arcs on boundaries, but we shall not consider them here.
We assume that the various coefficients just introduced are determined solely by local conditions at the point of reflection, refraction, or diffraction. Thus, for example, the reflection and transmission coefficients depend only upon the angle between the incident ray and the surface normal as well as upon the properties of the media at the point of reflection. Therefore they can be determined from the solution of a canonical problem, that of reflection and refraction of a plane wave at a plane interface. The diffraction coefficients can also be obtained from the solutions of appropriate canonical problems. The various coefficients depend upon the type of field under consideration. Sound waves will have different coefficients from water waves, electromagnetic waves, or other waves. Consequently, these coefficients must be determined separately for different fields. Mathematically, this difference will be manifested by the differential equations and boundary conditions which occur in the canonical problems. Canonical solutions can also be used to modify the results of our theory at and near caustics. Thus, for example, let us again consider a two-dimensional homogeneous medium in which a circular caustic occurs. Our results (29) and (30) for the field $u$ become infinite on the caustic $r = a$. But the function $v$ in (31), which is asymptotic to $u$ for large $k$, remains finite on the
caustic.
Therefore we can use $v$ instead of $u$ on and near the caustic in order to obtain a finite value for the field. We can also assume that a finite value for the field at a point on any caustic can be obtained from the field $u$ off the caustic by the same correction factor, involving the radius of curvature of the caustic at the point.

9. Relation to other work. Some types of diffracted rays and diffracted wavefronts have already occurred in the solutions of particular diffraction problems.
Some others have been observed experimentally or have been introduced to explain particular experimental results. We will now describe some of this previous work. First we recall Thomas Young's proposal that diffraction through an aperture in a screen is an edge effect. This proposal is in agreement with the present theory, which even makes it precise. Next we note that Sommerfeld's solution of Maxwell's equations for two-dimensional diffraction of waves by a half plane contains a cylindrical wave emanating from the edge [3]. The cylindrical wavefronts of this wave are just the diffracted wavefronts, and the normals to these cylinders are the diffracted rays, of our theory. The solutions of Sommerfeld and Macdonald for two-dimensional diffraction by wedges also contain cylindrical waves emanating from the edge. Their solutions for the three-dimensional case contain the cone of diffracted rays from each edge point. The bright lines in the shadows of plates, observed by G. G. Becknell and J. Coulson [1,2] have already been mentioned and explained in terms of our theory. Later Nijboer observed similar bright lines in the diffraction patterns of apertures. He introduced diffracted rays emanating normally from the edge and found that the caustics of these rays were exactly the observed bright lines. The present theory predicts the bright spot on the axis of a circular disk, as was noted above. This result is particularly interesting, because the observation of the bright spot was a strong argument for the wave theory of light. We now see that this result is also predicted by a ray theory. Therefore, if this ray theory had been available at the time of the controversy between ray and wave theory, it might have forestalled the acceptance of the latter. The field diffracted through an aperture in a screen can be represented as an integral over the aperture and screen. Using Kirchhoff's approximate
values for the integrand, A. Rubinowicz [4] reduced this integral to a line integral along the aperture rim and evaluated it by the method of stationary phase. The stationary points which he obtained for a given field point P coincide exactly with the places on the edge at which the diffracted rays through P are produced. N. G. van Kampen [5] evaluated asymptotically the integrals given by the modified Kirchhoff method. His result also contains one stationary point corresponding to each edge-diffracted ray through P, and in addition one stationary point corresponding to each corner of the edge, accounting for the corner-diffracted rays. R. M. Lewis, B. D. Seckler, and the present author [6] have obtained similar results from W. Braunbek's [7] modification of the Kirchhoff theory.
Surface rays appear in the asymptotic expansion for large ka of the field diffracted by a sphere or cylinder of radius a. This was originally shown by G. N. Watson and elaborated by B. van der Pol and H. Bremmer [8], B. Friedman [9], I. Imai [10], W. Franz [11], and others. The tangent rays shed by these surface rays are exhibited in the exact solution of W. Franz [11] and the approximate solution of W. Franz and K. Depperman [12]. The latter authors showed that calculations of radar reflection from cylinders, based on the idea of surface rays, agreed excellently with the measurements of Limbach. F. G. Friedlander [13] introduced surface rays and the associated wavefronts in studying diffraction by cylinders of convex cross section. Surface rays produced by refraction at the critical angle occur in the work of E. Gerjuoy [14] and of L. Brehovskih [15]. These authors examined the field produced by a point source near a plane interface between two media, in the high-frequency limit. They found that each critically refracted ray gave rise
to the appropriate diffracted rays. Such rays have been observed experimentally in acoustics. Spherical waves emanating from the tip occur in the solution for the field diffracted by a circular or elliptic cone. The wavefronts and rays of these
waves are just the diffracted wavefronts and rays predicted by the theory in this case. The rays leave the vertex in all directions. Rays reflected from surfaces of discontinuity of derivatives of the index of refraction do not seem to have been considered before. However, the fact that such discontinuities do reflect at normal incidence was noticed by J. Feinstein [16] and S. A. Schelkunoff [17]. The possibility of using rays in a systematic way for the calculation of fields
was investigated by R. K. Luneberg [18]. He suggested that the ray construction would yield the leading term in the asymptotic expansion of the field for large k. The procedure for obtaining further terms in this asymptotic expansion was given by M. Kline [19], both for Maxwell's equations and for more general equations. Other authors have considered the same type of expansion for various equations. Thus F. G. Friedlander [20], H. Bremmer [21], and E. T. Copson [22] also considered Maxwell's equations; S. C. Lowell [23] considered waves in shallow water; J. B. Keller [24] considered weak shock waves; G. D. Birkhoff [25], L. Brillouin [26], G. Wentzel [27], P. A. M. Dirac [28], and J. B. Keller [29] considered the Schroedinger equation of quantum mechanics; F. G. Friedlander and J. B. Keller [30] considered the reduced wave equation; and W. J. Trjitzinsky [31] considered a very general linear equation. All of these authors restricted their attention to the rays of ordinary geometrical optics.
Many diffraction problems have been solved with the ray method by C. Schensted [32], J. B. Keller, R. M. Lewis, and B. Seckler [33], J. B. Keller [24,29,34,35], K. O. Friedrichs and J. B. Keller [36], B. R. Levy and J. B. Keller [37], S. N. Karp and J. B. Keller [38], B. D. Seckler and J. B. Keller [40], etc. Whenever possible, the fields constructed by the ray method were compared with asymptotic expansions (for large $k$) of exact solutions. In all such cases perfect agreement was obtained. In other cases numerical results were compared, and good agreement was obtained for $ka \ge 2$, where $a$ is a typical length in the problem. All of these results suggest that the ray method does yield the leading terms in the asymptotic expansions of solutions of diffraction problems. However, a general proof of this statement has not yet been obtained. Partial results of this kind are given by R. K. Luneberg [18], M. Kline [19], W. J. Trjitzinsky [31], W. L. Miranker [39], and R. M. Lewis [41].

BIBLIOGRAPHY
1. J. Coulson and G. G. Becknell, Reciprocal diffraction relations between circular and elliptical plates, Phys. Rev. vol. 20 (1922) p. 594.
2. J. Coulson and G. G. Becknell, An extension of the principle of the diffraction evolute and some of its structural detail, Phys. Rev. vol. 20 (1922) p. 607.
3. A. J. W. Sommerfeld, Optics, Academic Press, Inc., New York, 1954.
4. A. Rubinowicz, The diffraction waves in Kirchhoff's theory of diffraction phenomena, Ann. Phys. vol. 53 (1917) p. 257.
5. N. G. van Kampen, An asymptotic treatment of diffraction problems, Physica vol. 14 (1949) p. 575.
6. J. B. Keller, R. M. Lewis, and B. D. Seckler, Diffraction by an aperture, II, New York Univ. Inst. Math. Sci. Research Rep. EM-96 (1956); J. Appl. Phys. vol. 28 no. 5 (May, 1957) pp. 570-579.
7. W. Braunbek, Neue Näherungsmethode für die Beugung am ebenen Schirm, Zeit. Physik vol. 127 (1950) p. 381. W. Braunbek, Zur Beugung an der Kreisscheibe, Zeit. Physik vol. 127 (1950) p. 405.
8. H. Bremmer, Terrestrial radio waves, Elsevier Press, Inc., Houston, 1949.
9. B. Friedman, Comm. Pure Appl. Math. vol. 4 (1951) p. 317.
10. I. Imai, Die Beugung electromagnetischer Wellen an einem Kreiszylinder, Zeit. Physik vol. 137 (1954) pp. 31-48.
11. W. Franz, Zeit. Natur. vol. 9a (1954) pp. 705-716.
12. K. Depperman and W. Franz, Theorie der Beugung an der Kugel unter Berücksichtigung der Kriechwelle, Ann. Phys. Ser. 6 vol. 14 (1954) pp. 253-264.
13. F. G. Friedlander, Proc. Cambridge Philos. Soc. vol. 38 (1942) p. 383.
14. E. Gerjuoy, Comm. Pure Appl. Math. vol. 6 (1953) p. 73.
15. L. Brekovskih, Tech. Phys. USSR vol. 18 (1948) p. 455.
16. J. Feinstein, Trans. IRE, PGAP, AP-2 (1954) p. 23.
17. S. A. Schelkunoff, Comm. Pure Appl. Math. vol. 4 (1951) p. 181.
18. R. K. Luneberg, The mathematical theory of optics, Brown University, 1944. Propagation of electromagnetic waves, New York University, 1948.
19. M. Kline, An asymptotic solution of Maxwell's equations, Comm. Pure Appl. Math. vol. IV no. 2-3 (August, 1951), pp. 225-263. Asymptotic solution of linear hyperbolic partial differential equations, J. Rational Mech. Anal. vol. 3 no. 3 (May, 1954).
20. F. G. Friedlander, Geometrical optics and Maxwell's equations, Proc. Cambridge Philos. Soc. vol. 43 part 2 (1946) pp. 284-286.
21. H. Bremmer, The jumps of discontinuous solutions of the wave equation, Comm. Pure Appl. Math. vol. IV no. 4 (November, 1951) pp. 419-427.
22. E. T. Copson, The transport of discontinuities in an electromagnetic field, Comm. Pure Appl. Math. vol. IV no. 4 (November, 1951) pp. 427-435.
23. S. C. Lowell, The propagation of waves in shallow water, Comm. Pure Appl. Math. vol. 2 no. 2-3 (1949) pp. 275-291.
24. J. B. Keller, Geometrical acoustics, I, The theory of weak shocks, J. Appl. Phys. vol. 25 no. 8 (August, 1954) pp. 938-947.
25. G. D. Birkhoff, Some remarks concerning Schroedinger's wave equation, Proc. Nat. Acad. Sci. U.S.A. vol. 19 (1933), pp. 339-344; and in Collected mathematical papers, Vol. II, American Mathematical Society, 1950, pp. 813-818. Quantum mechanics and asymptotic series, Amer. Math. Soc. Bull. vol. 39 (1933) pp. 681-700; and in Collected mathematical papers, Vol. II, American Mathematical Society, 1950, pp. 837-856.
26. L. Brillouin, Remarques sur la mécanique ondulatoire, J. Phys. Radium vol. 7 (1936) pp. 353-368. La mécanique ondulatoire; une méthode générale de résolution par approximations successives, C. R. Acad. Sci. Paris vol. 183 (1926) p. 24.
27. G. Wentzel, Eine Verallgemeinerung der Quantenbedingungen für die Zwecke der Wellenmechanik, Zeits. Physik vol. 38 (1926) p. 518.
28. P. A. M. Dirac, The principles of quantum mechanics, Oxford University Press, London, 3d ed., 1947, pp. 121-123.
29. J. B. Keller, Derivation of the Bohr-Sommerfeld quantum conditions from an asymptotic solution of the Schroedinger equation, New York Univ. Inst. Math. Sci. Research Rept. CX-10 (July, 1953).
30. F. G. Friedlander and J. B. Keller, Asymptotic expansions of solutions of $(\nabla^2 + k^2)u = 0$, New York Univ. Inst. Math. Sci. Research Rep. EM-67 (September, 1954); Comm. Pure Appl. Math. vol. 8 no. 3 (August, 1955) pp. 387-394.
31. W. J. Trjitzinsky, Analytic theory of parametric linear partial differential equations, Rec. Math. vol. 15 (1944) p. 179.
32. C. Schensted, The electromagnetic transport equation and the Luneberg-Kline method of solution, Univ. of Michigan, Eng. Research Inst. Rep. 15-25-(504)-3.
33. J. B. Keller, R. M. Lewis, and B. D. Seckler, Asymptotic solution of some diffraction problems, New York Univ. Inst. Math. Sci. Research Rep. EM-81 (1955); Comm. Pure Appl. Math. vol. 9 (1956) p. 207.
34. J. B. Keller, Diffraction by an aperture, I, New York Univ. Inst. Math. Sci. Research Rept. EM-92 (1956); J. Appl. Phys. vol. 28 no. 4 (April, 1957) pp. 426-444.
35. J. B. Keller, Trans. IRE, PGAP, AP-4 (1956) pp. 312-321.
36. K. O. Friedrichs and J. B. Keller, Geometrical acoustics, II; Diffraction, reflection and refraction of a weak spherical or cylindrical shock at a plane interface, Jour. Appl. Phys. vol. 26 (1955) pp. 961-966.
37. J. B. Keller and B. Levy, Diffraction by a smooth object, to be published, New York Univ. Inst. Math. Sci. EM Series; B. R. Levy and J. B. Keller, Diffraction by a smooth object, New York Univ. Inst. Math. Sci. Research Rep. EM-109 (December, 1957).
38. S. N. Karp and J. B. Keller, Diffraction by an aperture, III; to be published, New York Univ. Inst. Math. Sci. EM Series.
39. W. L. Miranker, The asymptotic theory of solutions of $\Delta u + k^2 u = 0$, New York Univ. Inst. Math. Sci. BR-21 (1956).
40. B. D. Seckler and J. B. Keller, Diffraction in inhomogeneous media, New York Univ. Inst. Math. Sci. Research Rep. MME-7 (December, 1957).
41. R. M. Lewis, Discontinuous initial value problems and asymptotic expansion of steady-state solution, New York Univ. Inst. Math. Sci. Research Rep. MME-8 (December, 1957).

INSTITUTE OF MATHEMATICAL SCIENCES, NEW YORK UNIVERSITY,
NEW YORK, N.Y.
UPPER AND LOWER BOUNDS FOR EIGENVALUES¹
BY
J. B. DIAZ
1. Introduction. The problem of finding general methods for the approximation of eigenvalues of self-adjoint differential problems has attracted a great deal of attention in the scientific literature. The well-known Rayleigh-Ritz method (see Rayleigh [62], Ritz [64,65], and H. Poincaré [58]) furnishes upper bounds for the eigenvalues of a differential eigenvalue problem. On the other hand, A. Weinstein [90], in connection with certain eigenvalue problems of the theory of plates, developed a method for obtaining lower bounds for eigenvalues. It seemed to be of interest, in accordance with the theme of this Symposium, to present an account, as self-contained as possible, of the underlying ideas of these two fundamental methods for the approximation of eigenvalues. In order to proceed quickly, the following points of view have been adopted:
First of all, for definiteness, attention has been focused on a particular differential problem, that of a vibrating clamped plate, which was used by Weinstein [90] in developing his method originally. It has also been found convenient, so as not to encumber the exposition unduly, to omit for the most part all the
relevant differentiability hypotheses required of the "arbitrary" functions occurring in the discussion. The eigenvalues are supposed to be defined by means of certain variational problems, and the question of the equivalence between these problems and the corresponding differential problems is not analyzed. The variational problems are used in Sec. 2 as a basis for all later considerations. The Rayleigh-Ritz method is dealt with in Sec. 3, and the Weinstein method is treated in Sec. 4. The last section contains a number of remarks,
which were placed together at the end in order not to interrupt the trend of thought of the preceding sections. During the intervening years since the publication of Weinstein's volume in the series Mémorial des sciences mathématiques, his method has been developed further by himself and other mathematicians, notably N. Aronszajn. A unified presentation of the Rayleigh-Ritz and the Weinstein methods for the approximation of the eigenvalues of operators in a Hilbert space is contained in Aronszajn's report [2]. It would by far exceed the modest aims of the present account to even mention the many recent contributions of Aronszajn and his colleagues which appear among the items listed in the bibliography to this paper.

¹ This research was supported by the U.S. Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command under Contract No. AF18(600) 573.
The detailed treatment of a numerical application of Weinstein's method is not attempted here. However, attention is called to the many numerical applications made by Weinstein and his colleagues and also to Aronszajn [4, pp. 26-40], where one finds an application of the methods of Weinstein and of Rayleigh-Ritz to the computation of lower and upper bounds for the first 13 eigenvalues of a vibrating clamped square plate. Quite interesting numerical and theoretical considerations relative to plate problems are to be found in the report by Aronszajn and Donoghue [7].
It is hoped that the present summary of the basic ideas behind these two methods for the approximation of eigenvalues will make them more readily accessible to workers in related fields of mathematical physics.

2. The basic inequalities. The eigenvalue problem in question is the following:

(1)  $\Delta\Delta w - \lambda^2 w = 0,$  (on $D$),

(2)  $w = \dfrac{\partial w}{\partial n} = 0,$  (on $C$),

where $D$ is a bounded plane open connected set with a smooth boundary $C$, where the operator $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$ is the two-dimensional Laplacian, and where $\partial/\partial n$ denotes differentiation in the direction of the outer normal to $C$. If $w \not\equiv 0$ is an eigenfunction of (1),(2) corresponding to the eigenvalue $\lambda^2$, then an application of Green's identity

(3)  $\iint_D (\Delta\varphi \, \Delta\psi - \psi \, \Delta\Delta\varphi) \, dx \, dy = \int_C \left( \Delta\varphi \, \dfrac{\partial\psi}{\partial n} - \psi \, \dfrac{\partial \Delta\varphi}{\partial n} \right) ds$

yields at once that

(4)  $0 < \lambda^2 = \dfrac{(\Delta w, \Delta w)}{(w,w)} = \dfrac{(w, \Delta\Delta w)}{(w,w)}.$

Notice that here, as elsewhere in the sequel, the usual notation

(5)  $(f,g) = \iint_D f(x,y) \, g(x,y) \, dx \, dy$
for the scalar product of the two functions $f$ and $g$ has been employed. The relation (4) explains why the eigenvalues in (1) have been designated by $\lambda^2$; one may also suppose that $\lambda > 0$, for definiteness. To each eigenvalue $\lambda^2$ of (1),(2) corresponds a positive integer, called its multiplicity, which is the maximum number of linearly independent, not identically zero solutions of (1),(2). Sometimes, if the number $\lambda^2$ is not an eigenvalue of (1),(2), it will be convenient to express this by saying in this case that the multiplicity of $\lambda^2$ is zero. Let the eigenvalues of (1),(2), with due account being taken of their multiplicity (i.e., each eigenvalue being represented exactly as many times as its multiplicity), be denoted by the following nondecreasing sequence:

(6)  $0 < \lambda_1^2 \le \lambda_2^2 \le \lambda_3^2 \le \cdots$
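(23)  $\det \left[ (\psi_i, \psi_k) \right]_{i,k = 1, \ldots, j} > 0,$

and that also the Gram determinant

(24)  $\det \left[ (\Delta\psi_i, \Delta\psi_k) \right]_{i,k = 1, \ldots, j} > 0.$

The inequality (23) follows from the given linear independence of the $j$ functions $\psi_1, \ldots, \psi_j$ and the inequality (24) from the linear independence of the $j$ functions $\Delta\psi_1, \ldots, \Delta\psi_j$, which will now be proved. Suppose, on the contrary, that the functions $\Delta\psi_1, \ldots, \Delta\psi_j$ are linearly dependent. Then there exist real numbers $C_1, \ldots, C_j$, not all zero, such that

$C_1 \Delta\psi_1 + \cdots + C_j \Delta\psi_j = \Delta(C_1 \psi_1 + \cdots + C_j \psi_j) = 0,$  (on $D$).

But, in addition,

$C_1 \psi_1 + \cdots + C_j \psi_j = \dfrac{\partial}{\partial n}(C_1 \psi_1 + \cdots + C_j \psi_j) = 0,$  (on $C$),

and thus

$C_1 \psi_1 + \cdots + C_j \psi_j = 0,$  (on $D$),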
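contradicting the linear independence of the $j$ functions $\psi_1, \ldots, \psi_j$. From the preceding remarks and from (22) one sees that the $m$ eigenvalues $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are precisely the $m$ (positive) roots, in increasing order of magnitude, of the $m$th-degree polynomial in $\lambda^2$ which is given by the $m$th-order determinant

(25)  $\begin{vmatrix} (\Delta\psi_1,\Delta\psi_1) - \lambda^2(\psi_1,\psi_1) & (\Delta\psi_1,\Delta\psi_2) - \lambda^2(\psi_1,\psi_2) & \cdots & (\Delta\psi_1,\Delta\psi_m) - \lambda^2(\psi_1,\psi_m) \\ (\Delta\psi_2,\Delta\psi_1) - \lambda^2(\psi_2,\psi_1) & (\Delta\psi_2,\Delta\psi_2) - \lambda^2(\psi_2,\psi_2) & \cdots & (\Delta\psi_2,\Delta\psi_m) - \lambda^2(\psi_2,\psi_m) \\ \vdots & \vdots & & \vdots \\ (\Delta\psi_m,\Delta\psi_1) - \lambda^2(\psi_m,\psi_1) & (\Delta\psi_m,\Delta\psi_2) - \lambda^2(\psi_m,\psi_2) & \cdots & (\Delta\psi_m,\Delta\psi_m) - \lambda^2(\psi_m,\psi_m) \end{vmatrix}.$

If the functions $\psi_1, \ldots, \psi_m$ are orthonormalized, i.e., if

$(\psi_i, \psi_j) = \begin{cases} 1, & (\text{if } i = j), \\ 0, & (\text{if } i \ne j), \end{cases}$

for $i,j = 1, \ldots, m$, then (25) becomes just the characteristic determinant for the $m$-by-$m$ matrix with elements $(\Delta\psi_j, \Delta\psi_k)$ where $j,k = 1, \ldots, m$. The denominator in (22) is then just the Euclidean distance of the point $(C_1, \ldots, C_m)$ from the origin, and the $m$ eigenvalues $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are seen to be the squares of the reciprocals (arranged in increasing order of magnitude) of the principal semiaxes of the "ellipsoid"

$\sum_{i=1}^{m} \sum_{k=1}^{m} C_i C_k (\Delta\psi_i, \Delta\psi_k) = 1,$

in $m$-dimensional Euclidean space with coordinates $(C_1, \ldots, C_m)$.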
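To see how the roots of a determinant of the form (25) furnish upper bounds, the following sketch works a one-dimensional analogue. Everything specific in it is an assumption introduced for illustration and does not come from the text: the plate is replaced by a clamped beam on $0 \le x \le 1$ (so $\Delta$ becomes $d^2/dx^2$), and the trial functions are the hypothetical choices $\psi_1 = x^2(1-x)^2$ and $\psi_2 = x^3(1-x)^3$, which satisfy the clamped conditions. The lowest eigenvalue of that beam problem is known to be about 500.56, so the computed roots should lie above it.

```python
from fractions import Fraction
import math


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_deriv(p):
    """Derivative of a polynomial coefficient list."""
    return [i * c for i, c in enumerate(p)][1:]


def inner(p, q):
    """Scalar product (p, q): the exact integral of p*q over [0, 1]."""
    return sum(c / (i + 1) for i, c in enumerate(poly_mul(p, q)))


def rayleigh_quotient(psi):
    """(psi'', psi'') / (psi, psi), the one-function Rayleigh-Ritz bound."""
    d2 = poly_deriv(poly_deriv(psi))
    return inner(d2, d2) / inner(psi, psi)


def ritz_roots_2x2(psi1, psi2):
    """Roots mu of det[K - mu*M] = 0, the m = 2 case of a determinant like (25)."""
    psi = (psi1, psi2)
    d2 = [poly_deriv(poly_deriv(p)) for p in psi]
    K = [[inner(d2[i], d2[k]) for k in range(2)] for i in range(2)]
    M = [[inner(psi[i], psi[k]) for k in range(2)] for i in range(2)]
    # Expand the 2-by-2 determinant as a*mu^2 + b*mu + c.
    a = M[0][0] * M[1][1] - M[0][1] ** 2
    b = -(K[0][0] * M[1][1] + K[1][1] * M[0][0] - 2 * K[0][1] * M[0][1])
    c = K[0][0] * K[1][1] - K[0][1] ** 2
    disc = math.sqrt(float(b * b - 4 * a * c))
    return sorted([(-float(b) - disc) / (2 * float(a)),
                   (-float(b) + disc) / (2 * float(a))])


PSI1 = [0, 0, 1, -2, 1]          # x^2 (1 - x)^2
PSI2 = [0, 0, 0, 1, -3, 3, -1]   # x^3 (1 - x)^3
```

With one trial function the bound is `rayleigh_quotient(PSI1)`, which is exactly 504; adding the second function can only lower the smallest root, in line with the monotonic behavior of the Rayleigh-Ritz bounds.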
In the generalized Rayleigh-Ritz method of Aronszajn [2, pp. 54-56], the initial set $S_0$ may be an infinite-dimensional subset of $A$, so that one may obtain in this way finite upper bounds for all the $\lambda_n^2(A)$ at the first step. The successive improvement of the initial upper bounds $\lambda_n^2(S_0)$ obtained by choosing sets of functions $S_1, S_2, \ldots$ leads to considerations which resemble those arising in the discussion of Weinstein's method, which is the subject of the next section, and for this reason will not be entered into at this juncture.

4. Weinstein's method for the determination of lower bounds for the eigenvalues. The discussion falls naturally into three parts. First, there is the determination of the initial set of functions $B_0$ and of its corresponding eigenfunctions and eigenvalues (each eigenvalue with its proper multiplicity):

(26)  $0 < \lambda_1^2(B_0) \le \lambda_2^2(B_0) \le \lambda_3^2(B_0) \le \cdots.$
For numerical applications, the set $B_0$ must be chosen so that its corresponding eigenfunctions and eigenvalues may be regarded as known. The set $B_0$ to be used here is that originally employed by Weinstein [90]; for other possibilities of choosing $B_0$, reference is made to Aronszajn [4, pp. 26-40]. In the second place, there is the determination, for $m = 1, 2, 3, \ldots$, of the set of functions $B_m$ and of its corresponding eigenvalues (each with its proper multiplicity):

(27)  $0 < \lambda_1^2(B_m) \le \lambda_2^2(B_m) \le \lambda_3^2(B_m) \le \cdots.$
The knowledge of the exact multiplicities (at least for an initial "segment" of the eigenvalues) is essential, for without this information one cannot assign to the (computed) eigenvalues corresponding to $B_0$ and $B_m$ their correct integral subscripts in the nondecreasing sequences (26) and (27), respectively, and hence one cannot obtain the desired inequalities (or even an initial "segment" of them)

(28)  $\lambda_n^2(B_0) \le \lambda_n^2(B_m) \le \lambda_n^2(A),$  (for $m,n = 1, 2, 3, \ldots$).
Third, a fundamental difficulty (having to do with this question of the knowledge of the exact multiplicities) arises in this second part of Weinstein's method [i.e., in the successive improvement of the initial lower bounds $\lambda_n^2(B_0)$ by considering the sets of functions $B_1 \supset B_2 \supset \cdots \supset B_m \supset \cdots$]. This difficulty does not occur at all in the Rayleigh-Ritz method but does occur in the generalized Rayleigh-Ritz method of Aronszajn, which was mentioned near the end of the last section. For each $m = 1, 2, 3, \ldots$, Weinstein [90] constructed (employing the known eigenfunctions and eigenvalues corresponding to $B_0$) a meromorphic function of a single complex variable (this function will be called $W_m$) and showed that, if the positive number $\lambda^2$ is an eigenvalue corresponding to $B_m$ but is not an eigenvalue corresponding to $B_0$, then $\lambda^2$ must be a zero of the meromorphic function $W_m$. Weinstein also showed how to obtain, by an analytic criterion, the multiplicity (relative to $B_m$) of each such eigenvalue $\lambda^2$ corresponding to $B_m$ [i.e., which occurs in the sequence (27)] but which is not an eigenvalue corresponding to $B_0$ [i.e., which does not occur in the sequence (26)]. In order to determine those (if there are any "remaining") eigenvalues (with their proper multiplicities) corresponding to $B_m$ which are also eigenvalues corresponding to $B_0$, and thus be able to obtain the complete sequence (27) (or at least an initial segment of it), Weinstein employed a certain sequence of harmonic functions, which he termed a "privileged sequence." By means of this privileged sequence, Weinstein was able to surmount the difficulty just indicated. Aronszajn [2, pp. 38-53] gave a different criterion for the same purpose and showed that the consideration of the zeros and poles of $W_m$ (together with their "order," thought of as a positive integer for a zero and as a negative integer for a pole) can be used for the exact determination of the sequence (27) of eigenvalues corresponding to $B_m$.
The present account of the method, which in its main outline is patterned after Weinstein [90], will however employ Aronszajn's criterion just alluded to.

1. The initial set of functions $B_0$ will be taken to be the set of all real-valued functions $w$ defined on $D + C$ and vanishing on $C$, that is, satisfying the single boundary condition

(29)  $w = 0,$  (on $C$),
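[which is only a part of the boundary conditions (2) of the original problem; hence $A \subset B_0$]. The eigenvalues $\lambda_n^2(B_0)$ are defined by the following variational problems [cf. (11) and (12)]:

(30)  $\lambda_1^2(B_0) = \min_{w \in B_0} \dfrac{(\Delta w, \Delta w)}{(w,w)},$

(31)  $\lambda_n^2(B_0) = \max_{\varphi_1, \ldots, \varphi_{n-1}} \; \min_{\substack{w \in B_0 \\ (w,\varphi_i) = 0, \; i = 1, \ldots, n-1}} \dfrac{(\Delta w, \Delta w)}{(w,w)},$  ($n = 2, 3, \ldots$).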
It will now be shown that the eigenvalues and eigenfunctions relative to $B_0$ (the eigenvalues and eigenfunctions of the "base problem" in the terminology of Weinstein) are quite simply related to the eigenvalues and eigenfunctions of the problem of the vibrating membrane with fixed edges for the same domain $D + C$. The eigenvalues $\lambda_n^2(B_0)$ have been defined by (30) and (31) in terms of "maximum-minimum" variational problems. This will lead to their recursive definition, in terms of the successive eigenfunctions, and finally to the corresponding differential problems, which will be seen to reduce to that of the vibrating membrane with edges fixed along $C$. This process followed here is, in a certain sense, a reversal of the usual steps (cf., e.g., Courant and Hilbert [20, pp. 398-407]), where one starts with the eigenvalues and eigenfunctions of a differential problem first, then obtains a recursive definition and finally a "maximum-minimum" definition. Let $w_1 \not\equiv 0$ be an eigenfunction corresponding to the eigenvalue $\lambda_1^2(B_0)$. From (30) it follows that

(32)  $\dfrac{(\Delta w_1, \Delta w_1)}{(w_1, w_1)} = \lambda_1^2(B_0) \le \dfrac{(\Delta[w_1 + \varepsilon\zeta], \Delta[w_1 + \varepsilon\zeta])}{(w_1 + \varepsilon\zeta, w_1 + \varepsilon\zeta)} = \dfrac{(\Delta w_1, \Delta w_1) + 2\varepsilon(\Delta w_1, \Delta\zeta) + \varepsilon^2(\Delta\zeta, \Delta\zeta)}{(w_1, w_1) + 2\varepsilon(w_1, \zeta) + \varepsilon^2(\zeta, \zeta)},$

where $\zeta$ is an arbitrary function satisfying the boundary condition $\zeta = 0$ on $C$ and where $\varepsilon$ is any real number. Inequality (32), in view of the arbitrariness of $\varepsilon$, implies that

(33)  $(\Delta w_1, \Delta\zeta) - \lambda_1^2(B_0)(w_1, \zeta) = 0,$
which, together with Green's identity (3), upon putting $\psi = \zeta$ and $\varphi = w_1$, yields

(34)  $([\Delta\Delta w_1 - \lambda_1^2(B_0) w_1], \zeta) + \int_C \left( \Delta w_1 \dfrac{\partial\zeta}{\partial n} - \zeta \dfrac{\partial \Delta w_1}{\partial n} \right) ds = 0.$

In view of the boundary condition $\zeta = 0$ on $C$, and of the arbitrariness of $\zeta$ in $D$ and of $\partial\zeta/\partial n$ on $C$, it follows from (34) that $w_1$ is a solution of the differential eigenvalue problem

(35)  $\Delta\Delta w_1 - \lambda_1^2(B_0) w_1 = 0,$  (on $D$),

(36)  $w_1 = \Delta w_1 = 0,$  (on $C$).
The additional boundary condition that $\Delta w_1 = 0$ on $C$, which is actually fulfilled by the eigenfunction $w_1$ but is not required of it for membership in the set $B_0$, is a "natural boundary condition," in the terminology of R. Courant [18].

Since the set $B_0$ is not empty, from (32) one has that $0 \le \lambda_1^2(B_0) < \infty$. Let $\lambda_1(B_0) \ge 0$ denote the (nonnegative) square root of $\lambda_1^2(B_0)$. Then (35) and (36) yield

(37)  $\Delta[\Delta w_1 + \lambda_1(B_0) w_1] - \lambda_1(B_0)[\Delta w_1 + \lambda_1(B_0) w_1] = 0,$  (on $D$),

(38)  $\Delta w_1 + \lambda_1(B_0) w_1 = 0,$  (on $C$),

with $\lambda_1(B_0) \ge 0$. Consequently, the function $\Delta w_1 + \lambda_1(B_0) w_1$ must vanish throughout $D + C$, and $w_1$ must be an eigenfunction, with eigenvalue equal to $\lambda_1(B_0)$, of the vibrating-membrane equation

(39)  $\Delta w_1 + \lambda_1(B_0) w_1 = 0,$  (on $D$),

(40)  $w_1 = 0,$  (on $C$).

Since $w_1 \not\equiv 0$, it follows that $\lambda_1(B_0)$ and $\lambda_1^2(B_0)$ are both positive. Without loss of generality, one may assume that $(w_1, w_1) = 1$, and then $\lambda_1^2(B_0) = (\Delta w_1, \Delta w_1)$. Let $w_2 \not\equiv 0$ be a solution of the variational problem [cf. (31)]

(41)  $\min_{\substack{w \in B_0 \\ (w, w_1) = 0}} \dfrac{(\Delta w, \Delta w)}{(w,w)},$
where $w_1$ is the eigenfunction of (30) already considered. Suppose that $(w_2, w_2) = 1$; then the minimum value of (41) is just $(\Delta w_2, \Delta w_2)$. Further, from (30), (31), and the definitions of $w_1$ and $w_2$, one has that

(42)  $0 < \lambda_1^2(B_0) = (\Delta w_1, \Delta w_1) \le (\Delta w_2, \Delta w_2) \le \lambda_2^2(B_0);$

while from Green's identity (3), with $\varphi = w_1$ and $\psi = w_2$, one obtains

(43)  $(\Delta w_1, \Delta w_2) = \lambda_1^2(B_0)(w_1, w_2).$

On the other hand, $(w_1, w_2) = 0$ from the definition (41) of $w_2$. Now, from (31)

(44)  $\lambda_2^2(B_0) = \max_{\varphi_1} \; \min_{\substack{w \in B_0 \\ (w, \varphi_1) = 0}} \dfrac{(\Delta w, \Delta w)}{(w,w)},$
but for any given ¢1 there always exist real numbers Cl and C2, not both zero, Since such that (Clwl + C2w2, 01) = (0[Clwl + C2w21, i[l lwl ++C2w2]) _ Cj /(Ow1,Ow1)++ C2(ow2,tw2) 1 + C2 (Clw1 + C2w2, Clwl + C2w2)
J. B. DIAZ
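[cf. (43) and $(w_1, w_2) = 0$], it follows that
$\min_{\substack{w \in B_0 \\ (w, \varphi_1) = 0}} \frac{(\Delta w, \Delta w)}{(w, w)} \le (\Delta w_2, \Delta w_2),$
and consequently [cf. (44)] that
(45)  $\lambda_2^2(B_0) \le (\Delta w_2, \Delta w_2).$
In conclusion, then
(46)  $\lambda_2^2(B_0) = (\Delta w_2, \Delta w_2) = \min_{\substack{w \in B_0 \\ (w, w_1) = 0}} \frac{(\Delta w, \Delta w)}{(w, w)}.$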
It still remains to show that $w_2$ is an eigenfunction of the membrane problem, with eigenvalue equal to the positive square root of $\lambda_2^2(B_0)$. The procedure is similar to that just carried out above with $w_1$. From (46) it follows that
(47)  $\lambda_2^2(B_0) \le \frac{(\Delta[w_2 + \epsilon\zeta], \Delta[w_2 + \epsilon\zeta])}{(w_2 + \epsilon\zeta, w_2 + \epsilon\zeta)},$
where $\zeta$ is an arbitrary function satisfying the boundary condition $\zeta = 0$ on $C$ and $(\zeta, w_1) = 0$, and where $\epsilon$ is any real number. In view of the arbitrariness of $\epsilon$, inequality (47) implies that
(48)  $(\Delta w_2, \Delta\zeta) - \lambda_2^2(B_0)\,(w_2, \zeta) = 0,$
whenever both $\zeta = 0$ on $C$ and $(\zeta, w_1) = 0$. But (48) continues to hold whenever $\zeta = 0$ on $C$ only [i.e., even if $(\zeta, w_1) \ne 0$], because $(\Delta w_2, \Delta w_1) = (w_2, w_1) = 0$ [cf. (43)]. An application of Green's identity (3), use being made of the boundary condition $\zeta = 0$ on $C$, and of the arbitrariness of $\zeta$ on $D$ and of $\partial\zeta/\partial n$ on $C$, then yields that $w_2$ is a solution of the differential eigenvalue problem
(49)  $\Delta\Delta w_2 - \lambda_2^2(B_0)\, w_2 = 0$  (on $D$),
(50)  $w_2 = \Delta w_2 = 0$  (on $C$).
Further, if $\lambda_2(B_0)$ denotes the positive square root of $\lambda_2^2(B_0)$, then (49) and (50) imply that
(51)  $\Delta w_2 + \lambda_2(B_0)\, w_2 = 0$  (on $D$),
(52)  $w_2 = 0$  (on $C$).
Proceeding in the above manner, one obtains that, for $n = 2, 3, \ldots$ [cf. (46)],
(53)  $\lambda_n^2(B_0) = (\Delta w_n, \Delta w_n) = \min_{\substack{w \in B_0 \\ (w, w_i) = 0,\ i = 1, \ldots, n-1}} \frac{(\Delta w, \Delta w)}{(w, w)},$
where $(w_j, w_k) = 1$ for $j = k$ and $(w_j, w_k) = 0$ for $j \ne k$, with both $j, k = 1, 2, \ldots, n$. Further,
(54)  $\Delta\Delta w_n - \lambda_n^2(B_0)\, w_n = 0$  (on $D$),
(55)  $w_n = \Delta w_n = 0$  (on $C$),
and
(56)  $\Delta w_n + \lambda_n(B_0)\, w_n = 0$  (on $D$),
(57)  $w_n = 0$  (on $C$).
Consider the differential eigenvalue problem for the vibrating membrane:
(58)  $\Delta u + \omega u = 0$  (on $D$),
(59)  $u = 0$  (on $C$),
and let
(60)  $u_1, u_2, u_3, \ldots$
and
(61)  $0 < \omega_1 < \omega_2 \le \omega_3 \le \cdots$
be its sequences of eigenfunctions and eigenvalues, respectively. It has so far been shown that, if $w$ is an eigenfunction corresponding to $B_0$, with eigenvalue $\lambda^2 > 0$, then $w$ is also a membrane eigenfunction, with eigenvalue $\lambda > 0$. But, in order to show that indeed
(62)  $\lambda_n(B_0) = \omega_n$  ($n = 1, 2, 3, \ldots$),
it still remains to show that the converse of this last statement holds, i.e., that if $u$ is a membrane eigenfunction with eigenvalue $\omega > 0$, then $u$ is also an eigenfunction relative to $B_0$, with eigenvalue $\omega^2$. From (58) and (59) one has
(63)  $\Delta\Delta u - \omega^2 u = 0$  (on $D$),
(64)  $u = \Delta u = 0$  (on $C$),
and the proof may then be completed along the lines of a reasoning of H. Herrmann [34] (see also A. Weinstein [90, p. 11]), which will be omitted here for the sake of brevity.
2. Let $p_1(s), p_2(s), \ldots, p_k(s), \ldots$ be a sequence of real-valued functions defined on $C$, where $s$ denotes arc length on $C$. It is clear that, if this sequence $p_1(s), p_2(s), \ldots, p_k(s), \ldots$ is complete on $C$, then the boundary conditions (2), that is, $w = \partial w/\partial n = 0$ on $C$, may be replaced by the equivalent boundary conditions (infinitely many in number):
(65)  $w = 0$  (on $C$),
(66)  $\int_C p_k(s) \frac{\partial w}{\partial n}\, ds = 0$  (for $k = 1, 2, \ldots$).
This immediately suggests the definition of the set of functions $B_m$, for $m = 1, 2, \ldots$, as the set of all real-valued functions $w$ defined on $D + C$ and satisfying the boundary conditions
(67)  $w = 0$  (on $C$),
(68)  $\int_C p_k(s) \frac{\partial w}{\partial n}\, ds = 0$  (for $k = 1, 2, \ldots, m$).
The variational problem [cf. (11) and (12)] corresponding to the set of functions $B_m$ is called the "$m$th intermediate problem," in Weinstein's terminology.
For each $m$ there is a sequence of eigenfunctions
(69)  $w_{m1}, w_{m2}, \ldots, w_{mn}, \ldots$
corresponding to the eigenvalues $0 < \lambda_1^2(B_m) \le \lambda_2^2(B_m) \le \cdots \le \lambda_n^2(B_m)$ [...] 0, relative to the set of functions $B_m$. Then the function $w$ satisfies the following conditions [cf. (75), (76), and (85)]:
(86)  $\Delta\Delta w - \lambda^2 w = 0$  (on $D$),
(87)  $w = 0, \quad \Delta w = \sum_{k=1}^{m} A_k p_k$  (on $C$),
(88)  $(\Delta w, p_k) = 0$  ($k = 1, 2, \ldots, m$),
with $m$ real constants $A_1, \ldots, A_m$, which are not all zero. By Green's identity, in view of the partial differential equation (86) and the boundary conditions (87), it follows that [put $\varphi = w$ and $\psi = u_i$ in (3)], for $i = 1, 2, 3, \ldots$,
(89)  $(\Delta w, u_i) = -\frac{1}{\omega_i}(\Delta w, \Delta u_i) = \cdots = \frac{\lambda^2}{\omega_i^2}(\Delta w, u_i) + \sum_{j=1}^{m} A_j (p_j, u_i),$
proper use being made of the equations [cf. (58) and (59)]
(90a)  $\Delta u_i + \omega_i u_i = 0$  (on $D$),
(90b)  $u_i = 0$  (on $C$),
which are satisfied by the membrane eigenfunction $u_i$. If, further, $\lambda^2 \ne \omega_i^2$ for any positive integer $i$, then (89) implies that
(91)  $(\Delta w, u_i) = \frac{\omega_i^2}{\omega_i^2 - \lambda^2} \sum_{j=1}^{m} A_j (p_j, u_i),$
while Parseval's equality (77), together with equation (91), yields
(92)  $(\Delta w, p_k) = \sum_{i=1}^{\infty} (\Delta w, u_i)(p_k, u_i) = \sum_{i=1}^{\infty} \left[ \frac{\omega_i^2}{\omega_i^2 - \lambda^2} \sum_{j=1}^{m} A_j (p_j, u_i) \right] (p_k, u_i) = \sum_{j=1}^{m} A_j \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \lambda^2} (p_j, u_i)(p_k, u_i),$
for $k = 1, \ldots, m$. Finally, from (92) and (88),
(93)  $\sum_{j=1}^{m} A_j \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \lambda^2} (p_j, u_i)(p_k, u_i) = 0, \quad k = 1, \ldots, m.$
Since not all the $m$ numbers $A_1, A_2, \ldots, A_m$ are zero, the eigenvalue $\lambda^2$ must be a zero of the determinant of the coefficients of the system of $m$ linear equations (93). Thus (cf. Weinstein [90, p. 20]), if $\lambda^2$ is an eigenvalue corresponding to $B_m$ but is not an eigenvalue corresponding to $B_0$, then $\lambda^2$ is a zero of the $m$-rowed determinant (note that $j, k = 1, \ldots, m$)
(94)  $W_m(\eta) = \det \left\{ \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \eta} (p_j, u_i)(p_k, u_i) \right\},$
which, as can be seen, depends explicitly on the eigenvalues and eigenfunctions corresponding to $B_0$, that is, on the eigenvalues and eigenfunctions of the vibrating-membrane problem. The determinant $W_m(\eta)$ is a meromorphic function of the complex variable $\eta$.
For each complex number $\xi$, let $\nu_\xi(W_m)$, called the "exponent of $\xi$ with respect to the function $W_m$," denote the uniquely determined integer $\nu$ such that
(95)  $W_m(\eta) = (\eta - \xi)^{\nu} [C_0 + C_1(\eta - \xi) + C_2(\eta - \xi)^2 + \cdots],$
whenever $|\eta - \xi| > 0$ is sufficiently small, the analytic function in the square bracket in (95) being regular and different from zero (that is, $C_0 \ne 0$) at $\eta = \xi$. Clearly, if the function $W_m$ is regular at $\xi$, then $\nu_\xi(W_m)$ is zero if $W_m(\xi) \ne 0$; whereas $\nu_\xi(W_m)$ is a positive integer (equal to the order of the zero of $W_m$ at $\xi$) if $W_m(\xi) = 0$. On the other hand, if $W_m$ has a pole at $\xi$, then $\nu_\xi(W_m)$ is a negative integer (equal to minus the order of the pole of $W_m$ at $\xi$). Further, for each complex number $\xi$ let $\mu_\xi(B_0)$, called "the multiplicity of $\xi$ with respect to $B_0$," denote the number of times $\xi$ appears in the sequence of eigenvalues $\lambda_n^2(B_0)$ corresponding to the set of functions $B_0$. Let $\mu_\xi(B_m)$, for $m = 1, 2, 3, \ldots$, have a similar meaning with respect to $B_m$. Then, according to Aronszajn [2, pp. 38-53], the following general relation holds for any complex number $\xi$ and any $m = 1, 2, 3, \ldots$:
(96)  $\mu_\xi(B_m) - \mu_\xi(B_0) = \nu_\xi(W_m).$
This relation enables one, once the orders of the zeros and the poles of the function $W_m$ are known, to determine explicitly the sequence of eigenvalues $\lambda_n^2(B_m)$, the proper multiplicities being taken into account. It should be noticed that (96) yields more information than the conclusion obtained from (93). For if $\lambda^2$ is an eigenvalue corresponding to $B_m$ [so that $\mu_{\lambda^2}(B_m)$ is a positive integer], but is not an eigenvalue corresponding to $B_0$ [so that $\mu_{\lambda^2}(B_0) = 0$], then (93) furnishes only the information that $\lambda^2$ must be a zero of the determinant $W_m(\eta)$ of (94), whereas (96), which in the present case reads
(97)  $\mu_{\lambda^2}(B_m) = \nu_{\lambda^2}(W_m),$
yields again the information that $\lambda^2$ must be a zero of $W_m$, but also that the exact number of times that the number $\lambda^2$ occurs in the sequence of the eigenvalues $\lambda_n^2(B_m)$ is precisely equal to the order of the zero of the function $W_m(\eta)$ at the number $\lambda^2$.
The proof of (96) will be carried out here (partially) only for $m = 1$. For the complete proof, for any $m$, reference is made to Aronszajn [2, pp. 38-53]. When $m = 1$, from (94),
(98)  $W_1(\eta) = \sum_{i=1}^{\infty} \frac{(p_1, u_i)^2}{1 - \eta/\omega_i^2}.$
It can be readily seen that $W_1$ is a meromorphic function of $\eta$ whose zeros and poles are simple and positive [notice that the poles can only occur at the eigenvalues $\lambda_i^2(B_0) = \omega_i^2$ corresponding to the initial set $B_0$]. Now suppose that $\lambda^2$ is an eigenvalue corresponding to $B_1$ [so that $\mu_{\lambda^2}(B_1)$ is a positive integer], but is not an eigenvalue corresponding to $B_0$ [so that $\mu_{\lambda^2}(B_0) = 0$]. The number $\lambda^2$ must be a zero of $W_1(\eta)$, and since this function has simple zeros only, it follows that $\nu_{\lambda^2}(W_1) = 1$. Hence [cf. (97)] one just has to prove that
(99)  $\mu_{\lambda^2}(B_1) = 1,$
and this fact can be readily seen. For, if $w \not\equiv 0$ is an eigenfunction corresponding to the eigenvalue $\lambda^2 > 0$ of $B_1$, then [cf. (75), (76), (85)]
(100)  $\Delta\Delta w - \lambda^2 w = 0$  (on $D$),
(101)  $w = 0, \quad \Delta w = A_1 p_1$  (on $C$),
(102)  $(\Delta w, p_1) = 0,$
where the number $A_1$ is not zero. But the only function satisfying equations (100) and (101), with $A_1$ replaced by zero, is the identically zero function. Consequently, the eigenfunctions corresponding to the eigenvalue $\lambda^2$ of $B_1$ are precisely the functions $a(w/A_1)$, where $a \ne 0$ is an arbitrary real number. This set of functions is one-dimensional, and this is precisely what (99) asserts.
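The role of the zeros of $W_1$ in (98) can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes the series (98) is truncated to finitely many terms, that the $\omega_i$ are distinct and positive, and that the weights $(p_1, u_i)^2$ are strictly positive. Under those assumptions each term is increasing in $\eta$ between consecutive poles, so there is exactly one zero in each gap, and bisection locates it. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence


def w1(eta: float, omegas: Sequence[float], weights: Sequence[float]) -> float:
    """Truncated W_1(eta) = sum_i weights[i] / (1 - eta / omegas[i]**2), cf. (98)."""
    return sum(c / (1.0 - eta / (om * om)) for om, c in zip(omegas, weights))


def zeros_between_poles(omegas: Sequence[float],
                        weights: Sequence[float],
                        rel_tol: float = 1e-12) -> List[float]:
    """Bisect for the single zero of the truncated W_1 in each gap between poles.

    Assumes distinct positive omegas and strictly positive weights, so that the
    truncated W_1 rises from -infinity to +infinity on every gap.
    """
    poles = sorted(om * om for om in omegas)
    zeros = []
    for left, right in zip(poles, poles[1:]):
        gap = right - left
        lo, hi = left + 1e-9 * gap, right - 1e-9 * gap
        while hi - lo > rel_tol * max(1.0, abs(hi)):
            mid = 0.5 * (lo + hi)
            if w1(mid, omegas, weights) < 0.0:
                lo = mid  # the zero lies to the right of mid
            else:
                hi = mid
        zeros.append(0.5 * (lo + hi))
    return zeros


# Hypothetical data: three membrane eigenvalues and three positive weights.
sample_omegas = [1.0, 2.0, 3.0]
sample_weights = [0.5, 0.3, 0.2]
sample_zeros = zeros_between_poles(sample_omegas, sample_weights)
```

In this truncated setting the computed zeros interlace with the poles $\omega_i^2$, which is the pattern one expects from the simplicity of the zeros and poles noted above.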
5. Concluding remarks. 1. Lord Rayleigh [62] formulated the following conjecture [cf. (58) and (59)] which was proved by G. Faber [24] and E. Krahn [41] (see also L. Tonelli [74]): of all membranes of a given area the circle has the gravest fundamental tone (lowest principal frequency $\omega_1$). The corresponding result for $n$-dimensional Euclidean space: that the smallest eigenvalue $\omega_1$ of the problem
(103)  $\Delta u + \omega u = 0, \quad \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}$  (on $D$),
(104)  $u = 0$  (on $C$),
for all $n$-dimensional domains $D$ of a given volume occurs when $D$ is an $n$-dimensional sphere, was proved by E. Krahn [42, specially pp. 39-43]. "Isoperimetric" theorems of this kind have received a great deal of attention recently. Reference is made, in particular, to the book of G. Pólya and G. Szegő [61]. The following assertion was made by E. T. Kornhauser and I. Stakgold [40]: of all simply connected domains $D$ of a given area, the circle has the maximum value for the second eigenvalue $\mu_2$ of the problem
(105)  $\Delta u + \mu u = 0, \quad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$  (on $D$),
(106)  $\frac{\partial u}{\partial n} = 0$  (on $C$).
(It is readily seen that the first eigenvalue $\mu_1$ is always zero.) This statement was proved by G. Szegő [72], by an argument based on conformal mapping. The corresponding result for $n$-dimensional space has been recently proved by H. F. Weinberger (see L. E. Payne and H. F. Weinberger [55]).
2. In the study of vibrations and buckling of continuous beams and other composite systems, E. Saibel [66,67] (see also E. Saibel [68,69], E. Saibel and E. d'Appolonia [70], E. Saibel and W. F. Z. Lee [71], and W. F. Z. Lee and E. Saibel [46]) has developed a method for the determination of eigenfrequencies which is very closely related to Weinstein's method (see Sec. 4 of the present paper). Although Saibel's analytical approach employs Lagrange multipliers, his scheme of "developing the solution in terms of the eigenfunctions and eigenvalues of the beam with inner constraints removed, referred to as the simple beam" (E. Saibel and W. F. Z. Lee [71, p. 499]), is to be compared with Weinstein's approach [90, pp. 5-7] of reducing plate problems to simpler membrane problems. For a fuller discussion of the exact relationship of these various ideas, reference is made to H. F. Weinberger [82, specially pp. 12-13].
3. In the notation of the preceding sections, let $\omega_n$ denote the $n$th eigenvalue
of the vibrating membrane [cf. (58) and (59)], and $\lambda_n$ denote the $n$th eigenvalue of the vibrating clamped plate [cf. (1) and (2)]. Weinstein [84, p. 50] has called attention to the inequalities
(107)  $\omega_n^2 < \lambda_n$  ($n = 1, 2, 3, \ldots$),
connecting the eigenvalues of the two problems in question. R. Courant [19] (see Weinstein [90, p. 20]) had arrived earlier at the inequalities
(108)  $\omega_n^2 \le \lambda_n$
[which state that $\lambda_n^2(B_0) \le \lambda_n$, in the notation of Sec. 4 of this paper]. Weinstein's [90, p. 50] proof of the inequalities (107) involves the use of what he termed a "suite fondamentale privilégiée de fonctions harmoniques." It appears to be of some interest to obtain a proof of (107) which does not employ the notion of a privileged sequence. The following direct proof of (107) was
developed during a conversation with L. E. Payne and H. F. Weinberger. Recall first the definition "by recurrence" of the eigenvalues and eigenfunctions under discussion:
(109)  $\omega_1 = \min \frac{(u, -\Delta u)}{(u, u)}$  (where $u = 0$ on $C$),
(110)  $\omega_n = \min_{(u, u_i) = 0,\ i = 1, \ldots, n-1} \frac{(u, -\Delta u)}{(u, u)}$  (where $u = 0$ on $C$, for $n = 2, 3, \ldots$),
and [cf. (7) and (8)]
(111)  $\lambda_1 = \min \frac{(\Delta w, \Delta w)}{(w, w)}$  (where $w = \partial w/\partial n = 0$ on $C$),
(112)  $\lambda_n = \min_{(w, w_i) = 0,\ i = 1, \ldots, n-1} \frac{(\Delta w, \Delta w)}{(w, w)}$  (where $w = \partial w/\partial n = 0$ on $C$, for $n = 2, 3, \ldots$).
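The inequalities (107) can be sanity-checked in a one-dimensional analogue, which is an assumption of this sketch and not a case treated in the text: for the string on $(0,1)$ with fixed ends, $u'' + \omega u = 0$ gives $\omega_n = (n\pi)^2$, while for the clamped beam, $w'''' - \lambda w = 0$ with $w = w' = 0$ at both ends gives $\lambda_n = \beta_n^4$, where $\beta_n$ is the $n$th positive root of $\cos\beta \cosh\beta = 1$. The helper names below are hypothetical.

```python
import math


def clamped_beam_beta(n: int, tol: float = 1e-13) -> float:
    """n-th positive root of cos(b) * cosh(b) = 1, bracketed in (n*pi, (n+1)*pi)."""
    def f(b: float) -> float:
        return math.cos(b) * math.cosh(b) - 1.0

    lo, hi = n * math.pi, (n + 1) * math.pi
    f_lo = f(lo)
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid  # root is in the upper half of the bracket
        else:
            hi = mid
    return 0.5 * (lo + hi)


def string_eigenvalue(n: int) -> float:
    """omega_n = (n*pi)**2 for u'' + omega*u = 0 on (0, 1), u(0) = u(1) = 0."""
    return (n * math.pi) ** 2


def beam_eigenvalue(n: int) -> float:
    """lambda_n = beta_n**4 for the clamped beam on (0, 1)."""
    return clamped_beam_beta(n) ** 4


# One-dimensional analogue of (107): omega_n**2 < lambda_n for the first few n.
comparison = [(string_eigenvalue(n) ** 2, beam_eigenvalue(n)) for n in range(1, 6)]
```

Since $\beta_n$ lies near $(n + \tfrac{1}{2})\pi$, each pair in `comparison` has its first entry strictly below its second, in line with (107).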
Two other facts will be needed in the proof. The first is Schwarz's inequality
(113)  $(v, -\Delta v)^2 \le (v, v)(\Delta v, \Delta v),$
where, if the function $v \not\equiv 0$, the equality sign holds if and only if there is a real constant $k$ such that
(114)  $\Delta v + k v = 0$  (on $D$).
This implies that, if $v \not\equiv 0$, then one has the inequality
(115)  $\left[ \frac{(v, -\Delta v)}{(v, v)} \right]^2 \le \frac{(\Delta v, \Delta v)}{(v, v)},$
with equality if and only if (114) holds for some constant $k$. The second fact needed is H. Weber's [79, specially p. 5] representation formula
(116)  $v(x, y) = \frac{1}{4} \int_C \left( v \frac{\partial Y_0}{\partial n} - Y_0 \frac{\partial v}{\partial n} \right) ds,$
for a solution $v$ of equation (114), where $(x, y)$ is any point of $D$, the directional derivative $\partial/\partial n$ is taken along the outer normal to $C$, the function $Y_0 = Y_0(r\sqrt{k})$ is Bessel's function of the second kind of order zero, and $r$ denotes the Euclidean distance from the point $(x, y)$. (Equation (116) is just "Green's third identity" for the partial differential equation (114), in the terminology of O. D. Kellogg [39, p. 219].)
The proof of (107) may now be readily carried out. Consider the case $n = 2, 3, \ldots$ first. There exist $n$ real constants $C_1, \ldots, C_n$, not all zero, such that the function
(117)  $v = C_1 w_1 + C_2 w_2 + \cdots + C_n w_n$
satisfies the $n - 1$ orthogonality relations
(118)  $(v, u_i) = 0$  (for $i = 1, \ldots, n - 1$).
The function $v$ of (117) satisfies the boundary conditions $v = \partial v/\partial n = 0$ on $C$. Further, the function $v$ is not identically zero, since the $n$ eigenfunctions $w_1, \ldots, w_n$ are linearly independent. Consequently, for this particular function $v$, inequality (115) holds in the strict sense [if the equality sign did hold, then $v$ would be identically zero, from (116) and the boundary conditions satisfied by $v$]. Thus, with $v$ given by (117), one has, in view of the recurrence definitions (109) to (112), that, for $n = 2, 3, \ldots$,
(119)  $\omega_n^2 \le \left[ \frac{(v, -\Delta v)}{(v, v)} \right]^2 < \frac{(\Delta v, \Delta v)}{(v, v)}$
[...] $(p_1, p_2, u)$, but we introduce the positive-definite norm
(7)  $S \cdot S = \int (p_1^2 + p_2^2 + c^2 u^2)\, dA,$
where $c^2$ is a positive constant. The spaces $L'$ and $L''$ are again defined by
(8)  $S' \leftrightarrow (p_1', p_2', u') \in L'$ if $p_1' = u'_{,1}$, $p_2' = u'_{,2}$ (in $A$), $u' = f$ (on $B$);
     $S'' \leftrightarrow (p_1'', p_2'', u'') \in L''$ if $p''_{1,1} + p''_{2,2} + k^2 u'' + Q = 0$ (in $A$).
Let $S'$ and $S''$ be given vectors in $L'$ and $L''$, respectively, and let
(9)  $S \leftrightarrow (u_{,1}, u_{,2}, u),$
where $u$ is the solution of (5). Then
(10)  $S'' - S' = (S'' - S) - (S' - S).$
Letting
(11)  $S'' - S' \leftrightarrow (p_1, p_2, u)$
and
(12)  $v = u' - u,$
COMPUTATION OF FORCED VIBRATIONS
we have
(13)  $S' - S \leftrightarrow (v_{,1}, v_{,2}, v)$
and
(14)  $-\Delta v - k^2 v = p_{1,1} + p_{2,2} + k^2 u$  (in $A$),
      $v = 0$  (on $B$).
Thus,
(15)  $v = \int G\,(p_{1,1} + p_{2,2} + k^2 u)\, dA,$
where $G$ is the Green's function associated with the problem (5). Furthermore
(16)  $(S' - S) \cdot (S' - S) = \int (v_{,1}^2 + v_{,2}^2 + c^2 v^2)\, dA$
and
(17)  $(S'' - S') \cdot (S'' - S') = \int (p_1^2 + p_2^2 + c^2 u^2)\, dA.$
A simple computation involving the expansion of Green's function in the eigenfunctions of (6) shows that
(18)  $\int (v_{,1}^2 + v_{,2}^2 + c^2 v^2)\, dA \le \max_i\, [2\lambda_i(\lambda_i + c^2)(\lambda_i - k^2)^{-2}] \int (p_1^2 + p_2^2)\, dA + \max_i\, [2k^4 c^{-2}(\lambda_i + c^2)(\lambda_i - k^2)^{-2}] \int c^2 u^2\, dA.$
Thus, we have obtained the inequality (2) with
(19)  $K = \sup \{ \max_i\, [2\lambda_i(\lambda_i + c^2)(\lambda_i - k^2)^{-2}],\ \max_i\, [2k^4 c^{-2}(\lambda_i + c^2)(\lambda_i - k^2)^{-2}] \}.$
It is easy to show that the maximum will occur when $\lambda_i$ is either the eigenvalue above $k^2$ nearest to $k^2$ or the eigenvalue below $k^2$ nearest to $k^2$. Furthermore, if a lower bound is used for the former and an upper bound for the latter, a larger value of $K$ is obtained, so that the inequality (2) still holds. Thus, it is only necessary to know the distance of $k^2$ from the spectrum of (6).
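The constant (19) is straightforward to evaluate once eigenvalues (or bounds for them) are available. The sketch below is illustrative only: the function name, the argument names, and the sample spectrum are assumptions, and the list of eigenvalues is taken to be finite and to exclude $k^2$.

```python
from typing import Sequence


def error_constant(eigenvalues: Sequence[float], k2: float, c2: float) -> float:
    """Evaluate K of (19) over a finite list of eigenvalues lambda_i.

    k2 stands for k**2 and c2 for c**2; k2 must not coincide with an eigenvalue.
    """
    if any(lam == k2 for lam in eigenvalues):
        raise ValueError("k**2 must not be an eigenvalue")
    first = max(2.0 * lam * (lam + c2) / (lam - k2) ** 2 for lam in eigenvalues)
    second = max(2.0 * k2 ** 2 / c2 * (lam + c2) / (lam - k2) ** 2
                 for lam in eigenvalues)
    return max(first, second)


# Hypothetical spectrum 1, 4, 9 with k**2 = 2 and c**2 = 1.
sample_k = error_constant([1.0, 4.0, 9.0], k2=2.0, c2=1.0)
```

For this sample the two bracketed maxima are attained at the eigenvalues adjacent to $k^2$ (at 4 for the first bracket and at 1 for the second), consistent with the remark following (19).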
We then have the inequalities (2) and (4), so that $S$ is approximated by both $S'$ and $S''$ in the norm (7). If the vector $S''$ is taken with $p_1'' = u''_{,1}$, $p_2'' = u''_{,2}$, then $u'' - u$ satisfies a homogeneous differential equation of second order. Therefore the inequality (4), unlike (2), leads to arbitrarily close bounds for the value of $u$ at a fixed interior point.
INSTITUTE FOR FLUID DYNAMICS AND APPLIED MATHEMATICS, UNIVERSITY OF MARYLAND, COLLEGE PARK, MD.
APPLICATIONS OF VARIATIONAL METHODS IN THE THEORY OF CONFORMAL MAPPING BY
M. M. SCHIFFER
1. Introduction. The theory of conformal mapping has attracted the attention of many analysts for various reasons. It is an essential tool in the general theory of analytic functions of a complex variable. It is of great use in boundary-value problems of two-dimensional potential theory and plays, therefore, a role in electrostatics, elasticity, and fluid dynamics. Finally, it may be considered as an interesting branch of functional analysis in which we study the various quantities involved in their dependence on curves and domains which are geometrically given. The problems in conformal mapping may be classed into three major groups. We have to deal with existence proofs for specified canonical mappings; we need constructive procedures to carry these mappings out; and finally, we wish to estimate the various quantities arising in conformal mapping by means of more easily accessible quantities which depend upon the geometry of the curves or domains considered. In all three types of problems, methods of the calculus of variations have been successfully applied. In so far as the problem of conformal mapping may be reduced to a particular boundary-value problem of potential theory, we may consider the Dirichlet principle as the oldest variational approach to
the theory. In the Dirichlet principle we characterize the solution of a boundary-value problem by a minimum property within the class of all differentiable functions with a finite Dirichlet integral and with the prescribed boundary values. The characterization is easy, since the class of competing functions is so large that it is easy to vary within the class and to find admissible neighbors. On the other hand, since the class is so wide, it is very difficult
to prove that there is an extremal function which would satisfy the characterization. This difficulty invalidated much of the intuitively obtained results of Riemann and Lord Kelvin; it necessitates careful analysis and elaborate limit procedures or Hilbert space arguments. On the other hand, it is well known that the class of univalent analytic functions in a fixed domain of the complex plane forms a normal family; that means that from each sequence of functions of the family a convergent subsequence can be selected [13]. This subsequence converges uniformly in each closed subdomain and has as its limit either a constant or a univalent function. By restricting the class of univalent functions by means of some normalization, one can exclude the possibility of a limit function which is constant, and one obtains in this manner a compact class of univalent functions. In such a class every extremum problem will determine at least one function of the class for which the extreme value is actually achieved. But now the class is already
so narrow that it is much harder to characterize the extremum function by comparison with neighbor functions of the class. In other words, by restricting ourselves to univalent functions, we simplify very much the existence problems
but increase the difficulties of the technique of variations. We are led to a new type of variational problem in which the univalency of the admissible function is the hardest side condition. It will be seen in the following pages that a variational procedure can be established which operates within the class of univalent functions. There exists a close relationship between the functions mapping a domain upon some canonical domain like a circular region and the Green's function of potential theory. We shall show how a variational formula for the Green's function of a domain can be established and how the theory of univalent functions can also be approached from this point of view. Since the Green's function is a functional of the domain considered, the functional-analytic aspect of the theory is most clearly exhibited by this method.
2. The group property of univalent functions. The possibility of varying univalent functions is based on the following obvious remark: If $w = f(z)$ is univalent in a domain $D_z$ in the $z$-plane and maps $D_z$ onto the domain $D_w$ in the $w$-plane; if, moreover, $W = F(w)$ is univalent in $D_w$, then $W = F[f(z)]$ is univalent in $D_z$. We shall refer to this fact as the group property of univalent functions.
Our next problem is to provide functions $F(w)$ which are univalent in the image domain $D_w$ and very near to the identity function $F_0(w) = w$. Clearly, $F[f(z)]$ will then be a neighbor function of $f(z)$ within the class of univalent functions. We assume that $D_z$ has as complement in the $z$-plane only proper continua; then the complement $\Delta_w$ of $D_w$ in the $w$-plane will likewise be a set of proper continua. We choose a point $w_0 \in \Delta_w$ and a finite continuum $\Gamma(w_0) \subset \Delta_w$ which contains it. The complement of $\Gamma(w_0)$ in the $w$-plane may be called $\tilde{\Gamma}(w_0)$; it can be mapped onto the domain $|W| > \rho$ by a univalent function which has near infinity the series development
(1)  $W = F(w) = w - w_0 + a_0 + \frac{a_1}{w - w_0} + \frac{a_2}{(w - w_0)^2} + \cdots$
The number $\rho$ is called the exterior radius of $\Gamma(w_0)$; it is a monotonic function of the set $\Gamma$ and can be made arbitrarily small by shrinking $\Gamma$ down to $w_0$.
The inverse function $w = F^{-1}(W)$ is univalent in the circular domain $|W| > \rho$, and its coefficients can be easily estimated by means of the classical area theorem. Working these estimates back to the original coefficients $a_\nu$, we can show that
(2)  $|a_\nu| \le (4\rho)^{\nu + 1}.$
Then putting $a_\nu = b_\nu \rho^{\nu + 1}$, we may assert that the function
(3)  $W = w + \rho b_0 + \frac{b_1 \rho^2}{w - w_0} + O(\rho^3)$
represents a small variation of the identity function and yields, by the group property of the univalent functions, a variation for the univalent function $f(z)$ in $D_z$. We may take any function $\Phi(W)$ which is univalent for $|W| > \rho$ and construct the new function $\Phi[F(w)]$, which is also univalent in $\tilde{\Gamma}(w_0)$ and, a fortiori, in $D_w$. Thus, we have for given $w_0$ and $\Gamma(w_0)$ a large class of functions which are univalent in $D_w$ and have a development (3) in powers of $w - w_0$. This development converges in the exterior of a circle around $w_0$ which encloses $\Gamma(w_0)$ and, for $\rho$ small enough, converges everywhere in $D_w$ except for a very small neighborhood of $w_0$. Thus, we can prove [18]:
THEOREM I. If $f(z)$ is univalent in $D_z$ and maps this domain onto $D_w$ in the $w$-plane and if $\Delta_w$ is the complement of $D_w$, then for each point $w_0 \in \Delta_w$ there exists an infinity of univalent functions in $D_z$ with arbitrarily small $\rho$ of the form
(4)  $f^*(z) = f(z) + \frac{b_1 \rho^2}{f(z) - w_0} + O(\rho^3) \quad (|b_1| \le 16),$
where the estimate $O(\rho^3)$ is uniform in each closed subdomain of $D_z$.
In general, we have a very great freedom in the choice of the coefficient $b_1$ of the varied function (4). Indeed, the domain $D_w$ must be of a very special form in order that we may not prescribe the sign of $b_1$ arbitrarily. This is shown by
THEOREM II. Let $\Gamma$ be a continuum in $\Delta_w$ and $s(w)$ be analytic on $\Gamma$. If we have, for every point $w_0 \in \Gamma$ and every function (4), the inequality
(5)  $\mathrm{Re}\, \{ b_1 s(w_0) \} \ge O(\rho),$
then $\Gamma$ must be an analytic arc $w(t)$ in the $w$-plane which depends on a real parameter $t$. We can choose this parameter such that $\Gamma$ satisfies the differential equation
(6)  $\left( \frac{dw}{dt} \right)^2 s(w) + 1 = 0.$
The proof of Theorem II is rather difficult and will not be given here. But we wish to point out the importance of the two theorems for the general calculus of variations within the class of univalent functions. The first theorem provides a formalism for varying a function, while the second allows us to draw conclusions from the extremum property and thus to characterize the extremum function. Let us illustrate the method by the following application: Let $D_z$ contain the point at infinity, and denote by $\mathfrak{F}$ the family of all functions which are univalent in $D_z$ and which have at infinity the development
(7)  $f(z) = z + c_0 + c_1 z^{-1} + \cdots$
If $z_0$ is a given fixed point in $D_z$, let us ask for the maximum and minimum value of $|f'(z_0)|$ within the family $\mathfrak{F}$.
We know from the compactness argument that an extremum function must exist, and we let $f(z)$ satisfy, say, the conditions of the maximum problem. If $D_w$ is the image domain and $\Delta_w$ its complement in the $w$-plane, we can obtain by Theorem I an infinity of competing functions (4), all in $\mathfrak{F}$. But the maximum property of $f(z)$ implies the inequality
(8)  $|f^{*\prime}(z_0)| = |f'(z_0)| \left| 1 - \frac{b_1 \rho^2}{[f(z_0) - w_0]^2} + O(\rho^3) \right| \le |f'(z_0)|.$
An easy transformation leads from (8) to the inequality
(9)  $\mathrm{Re}\, \{ b_1 [f(z_0) - w_0]^{-2} \} \ge O(\rho),$
which allows the immediate application of Theorem II. We recognize that $\Delta_w$ consists of analytic arcs each of which satisfies the differential equation
(10)  $\frac{w'(t)^2}{[f(z_0) - w(t)]^2} + 1 = 0.$
This equation can be readily integrated and yields
(11)  $w(t) = a + k_\nu e^{it}.$
The constant of integration $k_\nu$ depends on the particular component of $\Delta_w$ considered.
Thus, we have proved: There exists a conformal mapping of each domain $D_z$ by a function of $\mathfrak{F}$ which maps $D_z$ onto the complex plane slit along circular arcs around a common center $a$. We may prescribe arbitrarily the point $z_0$ in $D_z$ which shall go into $a$.
Similarly, we would have obtained the following result by considering the analogous minimum problem: There exists a conformal mapping of $D_z$ by a function of the family $\mathfrak{F}$ which maps $D_z$ onto the complex plane slit along linear segments which all point to a common point $a$. We may prescribe arbitrarily the point $z_0$ in $D_z$ which shall go into $a$.
We have thus obtained existence theorems for canonical slit mappings by proving appropriate extremum problems. It is clear that in the same way a great variety of existence theorems and canonical domains may be obtained. But, at the same time, our method yields also inequalities and estimates for conformal mappings. Indeed, let $D_z$ be the circular domain $|z| > 1$. $\mathfrak{F}$ is then the much-investigated class of univalent functions in the exterior of the unit circle which are represented in the entire domain by the power series (7). If we ask for the minimum value of $|f'(z_0)|$ for an arbitrary $|z_0| > 1$, we have to determine a function in $\mathfrak{F}$ which maps $|z| > 1$ onto the $w$-plane slit along a linear segment pointing in the direction of $f(z_0)$. Since $f(z)$ is obviously only determined up to an additive constant, we may assume $f(z_0) = 0$. All mappings of $|z| = 1$ into linear segments and which are of the family $\mathfrak{F}$ have the form
(12)  $f(z) = z + m + \frac{e^{2i\alpha}}{z}.$
Under these mappings the unit circle goes into a segment of length 4, centered at the point $m$, and having the direction $e^{i\alpha}$. If we wish that the segment should be radial, clearly $m = |m| e^{i\alpha}$. The sign itself is now determined by the requirement $f(z_0) = 0$, which yields
(13)  $-e^{i\alpha} = \mathrm{sgn}\, z_0.$
f'(zo) = 1 -
Izpl
and since this is the minimum value, we derive the general estimate (15)
TWI ? 1 -
zl
valid for all functions of . It is obvious that many estimates and inequalities can be obtained in this manner. The procedure is as easily applied in the case of arbitrary multiply connected domains as for the case of a circle, to which most alternative methods are particularly suited. The limitation of the method of variation described lies in the fact that it is not very well adapted to take care of many side conditions. If we are dealing, for example, with problems of interpolation in conformal mapping, we have to consider subclasses of univalent functions which satisfy at N given points z, , N) of the domain D. the conditions f (zv) = to,, with N given (v = 1, 2, numbers w.. If we want to vary in this subclass, the formalism provided by
Theorem I becomes hard to apply. A second disadvantage lies in the too restrictive limitation to analytic single-valued functions in the domain. Some of the most important functions of potential theory, like the Green's function, cannot be dealt with adequately in this way.
3. The method of interior variation. If we want to deal with the general problem of two-dimensional potential theory, it is advisable to focus our attention on the Green's function $g(z, \zeta)$. This function is harmonic and symmetric in both variables in the domain $D$. Only if $z = \zeta$ does it become infinite, but
(16)  $h(z, \zeta) = g(z, \zeta) + \log |z - \zeta|$
is regular harmonic if $z$ is near $\zeta$. If $z$ or $\zeta$ approaches the boundary $C$ of $D$, the Green's function tends to zero. The importance of $g(z, \zeta)$ lies in the fact that all important quantities connected with the potential theory of the domain $D$ are expressible in terms of $g$. For example, if $D$ is simply connected and if $\phi(z)$ maps $D$ upon the interior of the unit circle $|w| < 1$, we have
(17)  $g(z, \zeta) = \log \left| \frac{1 - \overline{\phi(\zeta)}\, \phi(z)}{\phi(z) - \phi(\zeta)} \right|.$
Thus, the knowledge of the Green's function allows a determination of the mapping function onto the circle. Similarly, in the case that $D$ is multiply connected, the functions mapping $D$ upon the important canonical domains like parallel-slit domain, circular and radial-slit domain, etc., are expressible in terms of $g(z, \zeta)$.
We want to derive now a formula which describes the change of the Green's function with a change of the domain $D$. This formula contains then, in principle, also a method for variation of univalent functions. For we may characterize the univalent functions by the domain $D$ upon which they map and express them in terms of the Green's function of the domain. Instead of varying the functions by an arithmetical expression (4), we can describe their variation by the deformation of their image domain and express the change of the function by means of the formula for the variation of the Green's function with the domain. In fact, both theorems of the preceding section can be deduced from the following theory of the Green's function.
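Formula (17) can be tried out on a domain whose Green's function is known in closed form. The sketch below takes $D$ to be the upper half-plane and $\phi$ to be the Cayley transform $\phi(z) = (z - i)/(z + i)$; this choice of domain, the function names, and the sample points are assumptions made for illustration. The value from (17) is compared with the classical half-plane expression $\log |(z - \bar{\zeta})/(z - \zeta)|$.

```python
import math


def cayley(z: complex) -> complex:
    """Maps the upper half-plane onto the unit disk."""
    return (z - 1j) / (z + 1j)


def green_via_disk_map(z: complex, zeta: complex, phi=cayley) -> float:
    """Green's function from (17), given a map phi of D onto |w| < 1."""
    pz, pzeta = phi(z), phi(zeta)
    return math.log(abs((1.0 - pzeta.conjugate() * pz) / (pz - pzeta)))


def green_half_plane(z: complex, zeta: complex) -> float:
    """Classical Green's function of the upper half-plane."""
    return math.log(abs((z - zeta.conjugate()) / (z - zeta)))


# Hypothetical sample points in the upper half-plane.
z_sample, zeta_sample = 0.3 + 1.2j, -0.8 + 0.5j
g_from_17 = green_via_disk_map(z_sample, zeta_sample)
g_closed_form = green_half_plane(z_sample, zeta_sample)
```

The two values agree to rounding error, and the same function exhibits the symmetry in $z$ and $\zeta$ and the vanishing on the boundary that the text attributes to $g$.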
Let us start at first with a domain $D$ bounded by $n$ analytic curves $C_\nu$ ($\nu = 1, \ldots, n$), which together form the boundary $C$. Let $\phi(z)$ be an analytic function defined on and near the boundary $C$ of $D$. We assume that there exists a curve system $\Gamma$ in $D$ which is homotopic to $C$ such that in the ring system bounded by $C$ and $\Gamma$ the function $\phi(z)$ is analytic. Consider now the mapping
(18)  $z^* = z + \epsilon \phi(z).$
It is easily seen that, if $|\epsilon|$ is small enough, the curves $C_\nu$ will be mapped in a one-to-one manner upon $n$ analytic curves $C_\nu^*$, which will in turn determine a domain $D^*$ in the $z$-plane. Let $\Delta$ denote the subdomain of $D$ bounded by the curve system $\Gamma$; if $|\epsilon|$ is sufficiently small, $\Delta$ will also be a subdomain of $D^*$, which will now be assumed. We denote by $g(z, \zeta)$ and $g^*(z, \zeta)$ the Green's functions with respect to $D$ and $D^*$. Let $p(z, \zeta)$ and $p^*(z, \zeta)$ be their analytic completions, i.e., analytic functions of $z$ whose real parts are these Green's functions. We consider the harmonic functions
(19)  $\Phi(z) = g(z, \zeta), \quad \Psi(z) = g^*(z + \epsilon\phi(z), \eta),$
which are regular in the domain $D - \Delta$ if $\zeta$ and $\eta$ are chosen in $\Delta$ and if $|\epsilon|$ is small enough. Observe that both functions vanish if $z$ tends to $C$, since $g^*$ vanishes on $C^*$ and the latter curve is obtained from $C$ by the mapping (18). By Green's identity, we have
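(20)  $\frac{1}{2\pi} \oint_{C + \Gamma} \left( \Phi \frac{\partial \Psi}{\partial n} - \Psi \frac{\partial \Phi}{\partial n} \right) ds = \frac{1}{2\pi} \oint_{\Gamma} \left( \Phi \frac{\partial \Psi}{\partial n} - \Psi \frac{\partial \Phi}{\partial n} \right) ds = 0.$
On the other hand, the functions $g(z, \zeta)$ and $g^*(z, \eta)$ are harmonic in $\Delta$ except for their poles at $\zeta$ and $\eta$. Hence, we have
(21)  $\frac{1}{2\pi} \oint_{\Gamma} \left[ g(z, \zeta) \frac{\partial g^*(z, \eta)}{\partial n} - g^*(z, \eta) \frac{\partial g(z, \zeta)}{\partial n} \right] ds = g(\zeta, \eta) - g^*(\zeta, \eta).$
Subtracting (21) from (20), we obtain
(22)  $\frac{1}{2\pi} \oint_{\Gamma} \left[ g(z, \zeta) \frac{\partial}{\partial n} [g^*(z + \epsilon\phi(z), \eta) - g^*(z, \eta)] - [g^*(z + \epsilon\phi(z), \eta) - g^*(z, \eta)] \frac{\partial g(z, \zeta)}{\partial n} \right] ds = g^*(\zeta, \eta) - g(\zeta, \eta).$
Using the complex functions $p$ and $p^*$, we can bring (22) into the simpler form
(23)  $g^*(\zeta, \eta) - g(\zeta, \eta) = \mathrm{Re} \left\{ \frac{1}{2\pi i} \oint_{\Gamma} [p^*(z + \epsilon\phi(z), \eta) - p^*(z, \eta)]\, p'(z, \zeta)\, dz \right\},$
where $p'(z, \zeta) = dp(z, \zeta)/dz$.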
The identity (23) is of fundamental importance in the theory of variation for the Green's function. It has been derived under the assumption that the boundary $C$ of the domain $D$ is an analytic curve system. But it connects only expressions of the Green's functions taken at points of $\Gamma$ and inside $\Delta$. Hence, if we approximate a given domain $D$ by a sequence of analytically bounded domains, we may use formula (23) at first for each domain of the sequence; because of the uniform convergence of the Green's functions and their derivatives in each closed subdomain of $D$, we can then conclude the validity of (23) for the Green's functions of $D$ and its varied domain $D^*$. Thus, (23) is now established for the most general plane domain $D$. We observe that (23) is an integral equation between $g$ and $g^*$ and can be used to compute $g(\zeta, \eta)$ as a series of iterated integrals involving $g^*$. Indeed, a rapidly converging computational procedure can be established for the calculation of the one Green's function in terms of the other. We do not pursue this subject further since we are interested in the case of very small $\epsilon$. We obtain from (23) by series development in $\epsilon$
(24)  $g^*(\zeta, \eta) - g(\zeta, \eta) = \mathrm{Re} \left\{ \frac{1}{2\pi i} \oint_{\Gamma} p^{*\prime}(z, \eta)\, p'(z, \zeta)\, \epsilon\phi(z)\, dz \right\} + O(\epsilon^2).$
The error term $O(\epsilon^2)$ is a harmonic function in $\zeta$ and $\eta$ which can be estimated uniformly in $\Delta$; from this we conclude easily that $p^{*\prime}(z, \eta) = p'(z, \eta) + O(\epsilon)$, and hence we replace (24) by the more useful result
(25)  $g^*(\zeta, \eta) - g(\zeta, \eta) = \mathrm{Re} \left\{ \frac{1}{2\pi i} \oint_{\Gamma} p'(z, \eta)\, p'(z, \zeta)\, \epsilon\phi(z)\, dz \right\} + O(\epsilon^2).$
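If the boundary $C$ of $D$ consists of smooth curves $C_\nu$, we may replace in (25) the path of integration $\Gamma$ by $C$ in view of the fact that the integrand is analytic in the ring domain $D - \Delta$. But on $C$ we clearly have
$$p'(z,\zeta)\,z' = -i\,\frac{\partial g(z,\zeta)}{\partial n}, \qquad \left(z' = \frac{dz}{ds}\right). \tag{26}$$
Hence (25) assumes the form
$$g^*(\zeta,\eta) - g(\zeta,\eta) = -\frac{1}{2\pi}\oint_{C}\frac{\partial g(z,\zeta)}{\partial n}\,\frac{\partial g(z,\eta)}{\partial n}\,\operatorname{Re}\left\{\frac{\varepsilon\phi(z)}{iz'}\right\}ds + O(\varepsilon^2). \tag{27}$$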
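Observe that
$$\operatorname{Re}\left\{\frac{\varepsilon\phi(z)}{iz'}\right\} = \delta n \tag{28}$$
describes the shift in the direction of the inner normal of each boundary point $z \in C$ under the variation (18). Thus, we arrive at the classical variational formula of Hadamard [11]:
$$\delta g(\zeta,\eta) = -\frac{1}{2\pi}\oint_{C}\frac{\partial g(z,\zeta)}{\partial n}\,\frac{\partial g(z,\eta)}{\partial n}\,\delta n\,ds. \tag{29}$$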
The importance of this formula in numerous applications is well known. Hadamard's formula was first applied to problems of conformal mapping by Julia [12]. It allows us to make statements on the monotonic change with the domain $D$ of numerous functionals which are connected with the Green's function and leads in this way to comparison theorems and estimates. But it is made precise and admits of an estimation for the error term by virtue of the integral equation (23), from which it has been derived. The limitation of (29) to the case of smooth boundaries is a serious one and restricts the usefulness of Hadamard's formula considerably.
Suppose, on the other hand, that $\phi(z)$ is meromorphic in $\Delta$; then (25) can be calculated explicitly by means of the residue theorem. For the sake of simplicity, we consider the case
$$\phi(z) = \frac{1}{z - z_0}, \qquad (z_0 \in \Delta). \tag{30}$$
Then (25) reduces to [20] (
(31)
Re S e
r Lp
P'(??'zo]}
77 -
+ O(e2).
Formula (31) gives the first variation of the Green's function for the case of the very special deformation (30) of the boundary. But it is valid for the most general domain $D$ and expresses the variation by the values of the Green's function at an interior point of $D$. Equation (31) is called the interior variational formula for the Green's function. It is most valuable if one has to characterize a domain by its extremum property. In this case, (31) leads to simple differential equations for the Green's function of the domain considered. The case $\phi(z)$
(30')
can be reduced to the preceding one, since the first variations superimpose linearly. On the other hand, formula (30') enables us to introduce so large a number of parameters as to enforce rather difficult side conditions for the variation.
We have given an explicit formula for the first variation of the Green's function under the interior variations (18) and (30). It is clear that we can calculate $g^*(\zeta,\eta)$ as a power series in $\varepsilon$ with coefficients depending on $g(\zeta,\eta)$ and that the variation of every order can be obtained if necessary. The knowledge of the fact that the second variation of a functional is always of a fixed sign for a given class of deformations leads often to useful convexity theorems, which have theoretical as well as numerical significance [5].
4. Applications of interior variation. Let $D$ be a domain containing the point at infinity and consider its Green's function, $g(z) = g(z,\infty)$. We have near infinity
$$g(z) = \log|z| - \gamma + O\!\left(\frac{1}{|z|}\right). \tag{32}$$
The quantity $\gamma$ is called the capacity constant of the domain $D$ (or of its boundary $C$) and is, indeed, closely related to the electrostatic capacity of cylindric conductors with the cross section $C$. In the case that $D$ is simply connected, we can also find a geometric interpretation for $\gamma$. We map $D$ upon the circular domain $|w| > 1$ by means of the univalent function
$$w = \phi(z) = \frac{z}{r} + a_0 + \frac{a_1}{z} + \cdots, \qquad (r > 0). \tag{33}$$
The constant $r$ is called the mapping radius of the domain $D$ or the exterior radius of its boundary $C$. In view of (17) we have
$$g(z) = \log|\phi(z)| = \log|z| + \log\frac{1}{r} + O\!\left(\frac{1}{|z|}\right), \tag{34}$$
which shows that
$$r = e^{\gamma}. \tag{35}$$
It is easily seen that, if $D_z$ is mapped upon a domain $D_w$ by means of a univalent function (7), the capacity constant $\gamma$ is unchanged.
We wish to apply our variational formulas in order to obtain various results concerning $\gamma$. In this way we will illustrate the potential theoretical significance and possibilities of the various formulas. We obtain from the definition (32) of $\gamma$ and from (29) the Hadamard-type variational formula
$$\delta\gamma = \frac{1}{2\pi}\oint_{C}\left(\frac{\partial g(z)}{\partial n}\right)^2\delta n\,ds, \tag{36}$$
which was given first by Poincaré [14]. On the other hand, (31) specializes to
$$\gamma^* = \gamma - \operatorname{Re}\{\varepsilon\,p'(z_0)^2\} + O(\varepsilon^2) \tag{37}$$
for the case of the variation
$$z^* = z + \frac{\varepsilon}{z - z_0}, \qquad (z_0 \in D). \tag{38}$$
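As a numerical illustration of the Hadamard-type formula (36), read here as $\delta\gamma = \frac{1}{2\pi}\oint_C (\partial g/\partial n)^2\,\delta n\,ds$ with $\delta n$ counted positive into $D$, the following sketch takes $C$ to be an ellipse with semi-axes $a$ and $b$. It assumes the standard fact that the exterior mapping radius of this ellipse is $(a+b)/2$, so that $\gamma = \log((a+b)/2)$ under the convention $r = e^{\gamma}$, and that $\partial g/\partial n = 1/|dz/dt|$ on $C$ for the parametrization $z = a\cos t + ib\sin t$. The function names are hypothetical and chosen only for this example.

```python
import math

def hadamard_dgamma_per_da(a, b, n=4096):
    """Evaluate (1/2pi) * integral of (dg/dn)^2 * dn * ds over the ellipse
    z(t) = a cos t + i b sin t, for the boundary shift produced by a -> a + da,
    per unit da (periodic trapezoidal rule)."""
    total = 0.0
    dt = 2.0 * math.pi / n
    for k in range(n):
        t = k * dt
        speed = math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)  # |dz/dt|
        dg_dn = 1.0 / speed                    # normal derivative of g on C
        dn = b * math.cos(t) ** 2 / speed      # normal shift per unit da
        ds = speed * dt                        # arc-length element
        total += dg_dn ** 2 * dn * ds
    return total / (2.0 * math.pi)

def exact_dgamma_per_da(a, b):
    """Derivative of gamma = log((a + b) / 2) with respect to a."""
    return 1.0 / (a + b)

if __name__ == "__main__":
    print(hadamard_dgamma_per_da(2.0, 1.0), exact_dgamma_per_da(2.0, 1.0))
```

The two printed numbers should agree closely, since the boundary integral reduces to $\frac{b}{2\pi}\int_0^{2\pi}\cos^2 t\,(a^2\sin^2 t + b^2\cos^2 t)^{-1}\,dt = 1/(a+b)$.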
We consider, for example, the following problem. Let $\mathfrak{F}$ be the family of all functions $f(z)$ of normalization (7) at infinity and univalent outside of the unit circle. We decompose the circumference $|z| = 1$ into two arcs $A$ and $B$ of angle $\alpha$ and $\beta$, respectively. Consider the set of image points corresponding to $A$; they form in the $w$-plane a continuum $\mathfrak{A}$ which determines a domain with capacity constant $\gamma_a$. We ask for the minimum value of $\gamma_a$ within all mappings in $\mathfrak{F}$. The existence of a minimum function $f(z)$ can be easily shown.
If $F(w)$ is univalent outside of $\mathfrak{A}$ and normalized at infinity, then the group property of univalent functions implies that $F[f(z)]$ lies also in $\mathfrak{F}$ and is also a minimum function, for the capacity constant $\gamma_a$ is unchanged under normalized mappings. We overcome the indeterminacy of the minimum function by asking in addition that the continuum $\mathfrak{A}$ be a circle in the $w$-plane. This can always be achieved by an admissible mapping in the $w$-plane. It is easily seen that the radius of the circle must be $r = e^{\gamma_a}$. We must characterize the extremum function by stating what happens to the image of $B$ if $A$ is mapped on the circle.
We know that $B$ will be mapped onto a continuum $\mathfrak{B}$ in the $w$-plane. Let $w_0$ be a point of $\mathfrak{B}$. By Theorem I of Sec. 2 there exists an infinity of functions (4) which are univalent and normalized. These are competing functions to the extremum function $f(z)$. But observe that the mapping
$$w^* = w + \frac{b_1\rho^2}{w - w_0} + O(\rho^3) \tag{39}$$
will change the capacity constant $\gamma_a$ of the circle $|w| = e^{\gamma_a}$. Since
$$g(w) = \log|w| - \gamma_a, \qquad p(w) = \log w - \gamma_a, \tag{40}$$
we find that the varied image of $A$ will have the capacity constant
$$\gamma_a^* = \gamma_a - \operatorname{Re}\{b_1\rho^2 w_0^{-2}\} + O(\rho^3). \tag{41}$$
Because of the minimum property of $f(z)$, we have necessarily $\gamma_a^* \ge \gamma_a$. Hence, for all admissible functions (4), we have
$$\operatorname{Re}\{b_1 w_0^{-2}\} \le O(\rho). \tag{42}$$
We can now apply Theorem II of Sec. 2 and conclude that the image of $B$ is an analytic arc $\mathfrak{B}$ which satisfies the differential equation
$$w'^2 w^{-2} = 1. \tag{43}$$
Thus, $\mathfrak{B}$ is a ray $w = te^{ix}$, where $e^{ix}$ is a fixed sign factor and $t$ the real parameter. Hence, we have proved: The minimum value for $\gamma_a$ is attained when the arc $A$ is mapped into a circle and the arc $B$ into a radial slit.
Once this geometric characterization of the minimum function $f(z)$ is obtained by variational methods, it is easy to express $f(z)$ explicitly in elementary functions. In particular, the calculation shows that in the minimum case
$$r_a = e^{\gamma_a} = \sin^2\frac{\alpha}{4}. \tag{44}$$
Hence, we have proved: Every function of the family $\mathfrak{F}$ maps an arc of the unit circumference of opening $\alpha$ into a continuum whose mapping radius $r$ satisfies the inequality
$$r \ge \sin^2\frac{\alpha}{4}. \tag{45}$$
Since $\alpha + \beta = 2\pi$ implies $\sin(\beta/4) = \cos(\alpha/4)$, we see that two complementary arcs $A$ and $B$ are mapped onto continua $\mathfrak{A}$ and $\mathfrak{B}$ whose mapping radii satisfy the inequality
$$r_a + r_b \ge 1. \tag{46}$$
Observe now that the unit circumference is mapped by a normalized mapping onto the continuum $\mathfrak{A} + \mathfrak{B}$ and that, by definition (33), its mapping radius is, therefore, exactly 1. It is easily seen that, if we start with a closed curve $C$ and subdivide it into two arcs $A$ and $B$, we have always
$$r_A \ge r_C\sin^2\frac{\alpha}{4}, \qquad r_B \ge r_C\sin^2\frac{\beta}{4}, \qquad r_A + r_B \ge r_C. \tag{47}$$
Here $\alpha$ and $\beta$ are the angles of the images of $A$ and $B$ if $C$ is mapped upon a full circumference, and $r_A$, $r_B$, $r_C$ are the exterior radii of the continua. The quantities $r_C$ and $\alpha$ play an important role in the lift theory of an infinite cylindrical wing with cross section $C$ which is subdivided by the stagnation points into subarcs $A$ and $B$. The inequalities (47) are of significance in this connection.
We have shown in the preceding example how one can combine the method of interior variation with the method of boundary variation of Sec. 2. We obtained the complete answer to our minimum problem and a number of interesting inequalities.
Let us show next an application of (36). We prescribe a closed curve $C$ and ask for another closed curve $C_1$ which shall contain $C$ in its interior, shall enclose a prescribed area $A$, and shall have a minimum value for its capacity constant $\gamma$. While it is possible to show that an extremal curve $C_1$ must exist, it is by no means sure that it will be smooth enough to admit the variational formula (36). For the sake of analysis, let us assume that a smooth extremum curve $C_1$ exists and try to characterize it by means of (36). We decompose $C_1$ into those arcs which are in touch with points of $C$ and
into "free" arcs which do not touch $C$ and can be shifted a little without violating the side condition that $C_1$ contain $C$. Under a variation $\delta n$ of the free arcs we have obviously
$$\delta A = -\oint_{C_1}\delta n\,ds. \tag{48}$$
If we keep $\delta A = 0$, we must necessarily have $\delta\gamma \ge 0$, and the fundamental lemma of the calculus of variations leads to the consequence
$$\frac{\partial g}{\partial n} = \text{constant}, \qquad (\text{on the free arcs}). \tag{49}$$
We can now give an interesting fluid-dynamical interpretation to the situation described by the above analysis. We may conceive $g(z)$ as the stream function of an incompressible and irrotational fluid flow. It is generated by a vortex at infinity and has $C_1$ as closed streamline. Along the free arcs of $C_1$ the stream velocity is constant, and hence, by Bernoulli's law, the pressure is constant along these arcs. In other words, $g(z)$ describes a circulation around the given obstacle $C$ which leaves parts of the fluid around $C$ at rest. The combined obstacle of $C$ and the stagnant fluid forms the body $C_1$; the condition of equal pressure in the fluid at rest is guaranteed by the extremum condition (49).
We have not proved the existence of such a flow pattern around the curve $C$, since our reasoning by means of (36) was purely heuristic. But once we have got an insight into the nature of the extremum function, we may apply the method of interior variation in order to analyze the extremum function in an exact manner and to prove the existence of the flow pattern from compactness theorems in analytic function theory. This method was applied successfully by Garabedian and Spencer [8,15] in order to prove the existence of cavitation in two-dimensional fluid dynamics, and it can be readily extended to more complicated situations [4].
The role of the variational formula (36) as a heuristic tool and of (37) as a less intuitive but more precise and applicable tool is well illustrated by the above example. It is often easy to foresee the answer to an extremum problem by means of Hadamard's formula, and afterwards one can give an exact proof by means of interior variations.
We mention finally the application of the method of interior variations to the coefficient problem for univalent functions [6,7,17]. One obtains immediately differential equations for the extremum functions; since the coefficients in these differential equations depend in turn on the unknown extremum function, one is led to an interesting functional problem which has been solved in several cases.
The method of interior variations is flexible enough to admit variations for important subclasses of univalent functions. One can preserve under variation the property that the function has real coefficients in its Taylor development at the origin and the condition that the function be bounded by a given value. V. Singh has recently considered the class of univalent functions whose image domain contains a fixed given circle; it is possible to vary in such a way as to preserve this property of univalent functions.
It is also possible to generalize the variational procedure to the case of potential theory on Riemann surfaces and to apply it to the theory of p-valent functions [19,24].
5. The Fredholm eigenvalues. We want to discuss in this section a fundamental problem of two-dimensional potential theory. If we have a domain $D$ bounded by a closed smooth curve $C$, we can apply the Poincaré-Fredholm theory and reduce the solution of every boundary-value problem to solving an inhomogeneous integral equation
$$f(z) = \phi(z) - \lambda\int_{C} k(z,\zeta)\,\phi(\zeta)\,ds_\zeta, \tag{50}$$
with
$$k(z,\zeta) = \frac{1}{\pi}\frac{\partial}{\partial n_\zeta}\log\frac{1}{|z - \zeta|}, \qquad (z \in C,\ \zeta \in C). \tag{51}$$
The integral equation with the transposed kernel leads to the solution of Neumann-type boundary-value problems. The same integral equations serve also to solve corresponding boundary-value problems for the complementary domain $\tilde{D}$, the outside of $C$.
It can be shown that the lowest eigenvalue $\lambda$ of the kernel $k(z,\zeta)$ which belongs to a nonconstant eigenfunction is larger than 1. Hence, a convergent Neumann-Liouville series for the resolvent kernel of $k(z,\zeta)$ can be derived, and (50) can be solved by successive approximation. However, the convergence of this iterative procedure depends strongly upon the value of $\lambda$ and is better if $\lambda$ is larger. It seems, therefore, important to study the eigenvalue $\lambda$ as a functional of the curve $C$ and to derive variational formulas for it. In this way, we shall obtain some results which are important if one wishes to apply the integral-equation technique to harmonic boundary-value problems.
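The dependence of the successive approximation on the lowest eigenvalue can be illustrated on a finite-dimensional stand-in. The sketch below is not a discretization of the actual kernel (51); it assumes a small symmetric matrix `K` whose Fredholm eigenvalues (the numbers $\lambda_k$ with $\phi = \lambda_k K\phi$) are prescribed, and it solves $\phi - \lambda K\phi = f$ with $\lambda = 1$ by iteration. The names `neumann_solve` and `toy_kernel` are hypothetical.

```python
import numpy as np

def neumann_solve(K, f, lam=1.0, tol=1e-12, max_iter=10_000):
    """Solve phi - lam * K @ phi = f by successive approximation.
    Returns the solution and the number of iterations used."""
    phi = f.copy()
    for n in range(1, max_iter + 1):
        new = f + lam * (K @ phi)
        if np.linalg.norm(new - phi) <= tol * np.linalg.norm(new):
            return new, n
        phi = new
    raise RuntimeError("successive approximation did not converge")

def toy_kernel(lowest_eigenvalue, size=6, seed=0):
    """Symmetric matrix whose Fredholm eigenvalues are
    lowest_eigenvalue * (1, 2, ..., size); its ordinary matrix
    eigenvalues are the reciprocals of these numbers."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    lams = lowest_eigenvalue * np.arange(1, size + 1)
    return Q @ np.diag(1.0 / lams) @ Q.T

if __name__ == "__main__":
    f = np.ones(6)
    for lowest in (1.25, 5.0):
        _, n_iter = neumann_solve(toy_kernel(lowest), f)
        print(f"lowest eigenvalue {lowest}: {n_iter} iterations")
```

In this toy setting the error shrinks roughly by the factor $1/\lambda_1$ per step, so a lowest eigenvalue only slightly above 1 gives slow convergence.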
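We start with the integral equation
$$\phi_\nu(z) = \lambda_\nu\int_{C} k(z,\zeta)\,\phi_\nu(\zeta)\,ds_\zeta, \qquad (z \in C), \tag{52}$$
which defines the eigenvalue $\lambda_\nu$ of $k(z,\zeta)$ and the eigenfunction $\phi_\nu(z)$ on $C$. We introduce the harmonic function
$$h_\nu(z) = \lambda_\nu\int_{C} k(z,\zeta)\,\phi_\nu(\zeta)\,ds_\zeta, \tag{53}$$
which is defined for $z \in D$. Because of the well-known discontinuity behavior of a double-layer potential, we have
$$\lim_{z \to z_1} h_\nu(z) = (1 + \lambda_\nu)\,\phi_\nu(z_1), \qquad (z_1 \in C). \tag{54}$$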
Thus, $h_\nu(z)$ is a harmonic function in $D$ whose boundary values are proportional to the eigenfunction $\phi_\nu(z)$, which is defined only on $C$. We next define the analytic function in $D$
$$v_\nu(z) = \frac{\partial}{\partial z}h_\nu(z). \tag{55}$$
The integral equation (52) can be expressed in terms of vy(z) in the elegant form (56)
tai
v°(z)
f ( - z)-'v,(i') df', o
(z t D).
Observe that the differentiation in (55) weeds out the constant eigenfunction and that all eigenvalues of (56) are larger than 1. It is easily seen that we can assume all eigenfunctions to be orthonormalized according to the condition (57)
f f vYv dT D
Similarly, we may define in the complementary domain D a harmonic function /Y(z) with an analytic derivative 'U (z). It is easily seen that (54')
(1 - X,)0,(z1),
lim 2-121
(z1 E C),
and that (56')
iUZ(z)
= tai
f(C
di',
(z E D).
Since the normal derivative of the double-layer potential passes continuously through the charged curve, we have (58)
an
h, (z) = an hv(z),
(for z z C),
while (54) and (54') imply (59)
h,(z) = 1 + aY hy(z).
If we differentiate the boundary relations (58) and (59), we obtain the following connection between vv(z) and 77),(z) on C: (60)
iv,(z)z' = (1 + X)-lv,(z)z' - A'(1 + X)-1vY z)z'.
On the other hand, we may use the right-hand side in (56) in order to define an analytic function V,(z) in U. By the well-known theorem of Plemelj for the jump of a function defined by a Cauchy integral, we find for z E C V '(Z) = vy(z) - X x'(7i)7i'2.
(61)
Comparing (60) with (61), we find (62)
(1 +
271i
f(-
z E D),
and analogously (62')
(1 -
27ri
c
( - z)-1
(for z e D).
Thus, the knowledge of one analytic eigenfunction leads immediately to a simple representation for the corresponding eigenfunction of the complementary domain. We assumed the eigenfunction $\phi_\nu(z)$ chosen in such a way that $v_\nu(z)$ is normalized. Since
fV2dr = 41 fc h, anv ds = 1,
(63)
D
the conditions (58) and (59) show that
ffI2dr=A, +-
(64)
1
8
We introduce, therefore, the normalized eigenfunction
v,(z) = i \Ix. + 1
(65)
ivv(z),
which also satisfies (56') and is related to vv(z) by the symmetric formulas (z) (66)
u,(z) =
z c D),
Ay 2Jri xy
2t 1/1-vAai
df,
(for z v- D).
Finally, we remark that the integral equation (56) may be expressed in the alternative form ( 67)
v(z) _ Ty
dr,,
ff
(z E D),
D
which leads to close relations to the theory of the Hilbert transform [1,2]. Let us perform now a variation (with zo E D).
(68)
It will transform the curve C into a curve C* with eigenfunctions vv (z) and eigenvalues A*. We write down integral equation (56) with respect to C*, but by the aid of (68) we can refer everything back to1D. Observe that (69)
v*
+z
(Z
E
Wy(z)
(1
Z0
J
(1
- (z
E
zp)2 / = W, (z)
/
Hence, we find
is regular analytic in D. (70)
E
l*
(z --z o)2/
2ai
1 Wv(J) d C
[1
(z -
zo)
z
Thus, $W_\nu(z)$ and $\lambda_\nu^*$ are eigenfunctions and eigenvalues for an integral equation with respect to the same domain but with a slightly changed kernel.
Let us suppose, for the sake of simplicity, that to the eigenvalue $\lambda_\nu$ there corresponds exactly one eigenfunction $v_\nu(z)$. Then, clearly, the corresponding eigenfunction $W_\nu(z)$ will be near to $v_\nu(z)$. Hence, using (56) and (66), we can transform (70) into (70')
Wy (z)
'!r
ff ( - Z)2 d7- D
(Z
+ 0(E). ZO)2
Finally, we multiply this identity by v;-(Z) and integrate over the domain D. Using (66) and (67), we obtain
ffwdi. = yX, ff v4Wy dT - ?E 1 -X,Xy 71v(zo)2 + 0(E). z
(71)
D
D
Now, Wy(z) is very near to vy(z), and vy(z) is normalized. real parts on both sides of (71), we find [2] (72)
Hence, taking the
X* - X. = Re {7rE(1 - A2)PJv(z0)2} + 0(E).
If we had used a variation of the form (68) but with zo E D, the symmetry of the functions v, and vy would have led to the analogous result: (73)
X* - ay = Re {ire(1 - X2)vy(zo)2} + 0(E).
In the case of an eigenvalue of higher multiplicity, we would have obtained
a secular equation for A*, the terms of which are expressible in an obvious manner in terms of the eigenfunctions v"(z) and v"(z) which belong to the degenerate eigenvalue. It is easy to derive a Hadamard-type variational formula, once the interior formulas (72) and (73) have been established. For this purpose we put (73) into the form (74)
2e
i (1 - A,) C c
SA" = Re
- zo d
(zo a D). J
Since zo a D, we have, by Cauchy's integral theorem and the fact [obvious from (66)] that v"(oo) = 0,
0 = Re
(74')
{(i - AY) C 2i ! e
2
v"
d },
(zo a D).
But using (60) and the definition (65), we see easily that on C (1 - A2 v2 '2 (75) X2 - 1 v2 '2 = A2 v2 '2 v2Y 12 - 2A Iv I2 is purely real. Thus, combining (74) with (74'), we obtain (76)
SA"
_-{
[A" Re
Iv"12] Re
ds.
Using the geometric interpretation (28) for the last factor, we arrive at (77)
X"""
[X" Re
Iv"12] an ds.
It can also be shown from the boundary relation (60) that (78)
A" Re
Iv"I2
= -(A" Re {p2 12} - Iv"I2).
Since the interior normal with respect to $D$ has the opposite sign to that with respect to $\tilde{D}$, we see the complete symmetry of (77). We have derived the variational formula (77) with respect to the particular kind of variation (68). However, we can obtain the most general $\delta n$ variation of $C$ by superposition of variations of this type, and, by a limit argument, we can establish (77) for the most general case admissible.
The eigenvalue variational formulas have been derived under the assumption that the curve $C$ is smooth. In the case that $C$ has a slit component, however small, it can be shown that its lowest eigenvalue is necessarily 1. This fact shows that the lowest eigenvalue depends in a highly unstable fashion upon the boundary, and consequently we shall have to frame extremum problems for this eigenvalue in a particularly careful manner. We shall be able to assert the existence of extremum curves $C$ and characterize them by variational procedures if we admit some class of analytic curves $C$ and vary within this class. A very useful extremum problem for the lowest eigenvalue is the following: Suppose that there exists an analytic function $p(z)$ which is defined and univalent in the ring domain
(r 1, the problem requires a maximization over Nk-dimensional space. The increase in grid points is exponential, rather than proportional, with a corresponding increase in computing time. Fortunately, the function 0(pN) is not an arbitrary function of N variables, but one arising from an iterative process. As a consequence, it possesses a specific structure. On metamathematical grounds, appealing to that most
useful of principles, the principle of wishful thinking, there should exist ways and means of taking advantage of the special features of the function so as to reduce the dimensionality of the problem and so to ease the analytic and computational difficulties. As we shall see below, this is indeed the case.
2. The principle of optimality. Let us begin by introducing some notation. Any admissible set of choice variables $(q_1, q_2, \ldots, q_N)$ will be called a policy, and a set which maximizes the criterion function $\phi(p_N)$ will be called an optimal policy.
The problem of maximizing $\phi(p_N)$ is equivalent to that of determining all optimal policies. These optimal policies may be characterized by the following simple and intuitive principle:
PRINCIPLE OF OPTIMALITY. An optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
As we shall see in the following pages, the analytic transliteration of this principle yields the functional equations which can be used to resolve the original maximization problem.
3. Functional equations. In order to solve the original problem, we begin by complicating it. In place of the original definite problem involving a given initial state $p$ and a fixed number of stages, we pose ourselves the problem of determining optimal policies for similar processes involving any number of stages, starting from an arbitrary initial position. In other words, we embed the original problem within a family of related problems. Furthermore, in place of determining the entire set of choices $(q_1, q_2, \ldots, q_N)$ which constitute an optimal policy, we endeavor to determine $q_1$, the first choice, as a function of $p$ and $N$. This information, when available for all $p$ and $N$, suffices to determine all optimal policies. Consistent with this idea, define the sequence of functions
$$f_N(p) = \max_{[q_1, q_2, \ldots, q_N]} \phi(p_N), \tag{3}$$
for $N = 1, 2, \ldots$, and for all $p$ within a prescribed region of variability $D$. Let us assume that $p \in D$ implies that $T(p,q) \in D$ for all $q$, and impose sufficient continuity restrictions to ensure the existence of the above maximum. The principle of optimality cited in Sec. 2 yields the recurrence relations
$$f_{N+1}(p) = \max_{q_1} f_N(T(p,q_1)), \tag{4}$$
for $N = 1, 2, \ldots$, with
$$f_1(p) = \max_{q_1} \phi(T(p,q_1)). \tag{5}$$
Starting with the computation of $f_1(p)$, involving a $k$-dimensional maximization, where $k$ is the dimension of the $q$-vectors, we compute $f_2(p)$, using (4) above, then $f_3(p)$, etc. In this way the original $Nk$-dimensional problem is reduced to a sequence of $N$ $k$-dimensional problems.
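The recurrence relations (4) and (5) can be tried out on a small discrete process. The sketch below is only an illustration under assumed data: the state region `STATES`, the decision set `DECISIONS`, the transformation `T`, and the criterion `phi` are all hypothetical choices, not taken from the text. It builds $f_1, f_2, \ldots$ stage by stage and compares the result with a direct maximization over all policies $(q_1, \ldots, q_N)$.

```python
from itertools import product

STATES = range(0, 11)       # assumed region D: integer states 0..10
DECISIONS = (-1, 0, 1)      # assumed admissible choices q

def T(p, q):
    """Hypothetical transformation; clipped so that T(p, q) stays in D."""
    return min(max(p + q, 0), 10)

def phi(p):
    """Hypothetical criterion function of the final state."""
    return -(p - 7) ** 2

def value_functions(n_stages):
    """Return [f_1, ..., f_N] as dicts, using recurrence (5) and then (4)."""
    f = {p: max(phi(T(p, q)) for q in DECISIONS) for p in STATES}      # (5)
    out = [f]
    for _ in range(n_stages - 1):
        f = {p: max(f[T(p, q)] for q in DECISIONS) for p in STATES}    # (4)
        out.append(f)
    return out

def brute_force(p, n_stages):
    """Maximize phi(p_N) directly over all 3**N policies."""
    best = None
    for policy in product(DECISIONS, repeat=n_stages):
        state = p
        for q in policy:
            state = T(state, q)
        value = phi(state)
        best = value if best is None else max(best, value)
    return best

if __name__ == "__main__":
    f4 = value_functions(4)[-1]
    print(all(f4[p] == brute_force(p, 4) for p in STATES))
```

The stagewise computation performs $N$ maximizations over a single decision for each state, whereas the direct search enumerates every policy; this is the reduction in dimensionality described above.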
4. Continuous decision processes and the calculus of variations. Let us now consider a continuous version of the above discrete process. In place of determining a sequence $\{q_k\}$ which maximizes, we wish to determine a function $y(t)$ which maximizes. In lieu of an abstract definition, which can involve us in a great deal of verbiage, most of it extraneous to the present question, let us consider a specific problem in the calculus of variations, a member of the general class we shall treat in succeeding sections, and interpret it to be a continuous decision process. We wish to maximize the functional
$$J(x) = \int_0^T F(x,x')\,dt \tag{6}$$
over all functions $x(t)$ satisfying the initial condition $x(0) = c$. Let us assume that $F$ satisfies appropriate conditions ensuring that the maximum is assumed by a function $x(t)$ possessing a derivative. Since $J(x)$ may be written
$$J(x) = \int_0^S + \int_S^T, \qquad (0 < S < T), \tag{7}$$
it is easy to see that, upon having chosen $x(t)$ over the initial interval $[0,S]$, we have a variational problem of precisely the same type as the original, with the distinction that the length of the interval is now $T - S$ and the initial value is the value of $x(t)$ at $t = S$. This suggests that we follow the same approach as above, and define, for $T \ge 0$ and all $c$,
$$f(c,T) = \max_{x} J(x). \tag{8}$$
The principle of optimality yields the functional equation
$$f(c,T) = \max_{x[0,S]}\left[\int_0^S F(x,x')\,dt + f(c(S), T - S)\right], \tag{9}$$
where $c(S) = x(S)$ and the maximization is over all functions $x(t)$ that are defined over all $0 \le t \le S$, that satisfy $x(0) = c$, and that possess derivatives. More precisely, we are maximizing over all values of $x'(t)$ over the interval $[0,S]$. Let us now assume that the solution of the variational problem is a continuous function of the state variables $c$ and $T$, and that $f(c,T)$ has the requisite continuous partial derivatives. Letting $S \to +0$, the limit of (9) is the nonlinear partial differential equation
$$\frac{\partial f}{\partial T} = \max_{v}\left[F(c,v) + v\frac{\partial f}{\partial c}\right], \tag{10}$$
where we have set $v = v(c,T) = x'(0)$.
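Equation (9) with a small step $S = \Delta$ suggests a direct numerical scheme, $f(c, T + \Delta) \approx \max_v [F(c,v)\Delta + f(c + v\Delta, T)]$, starting from $f(c,0) = 0$. The sketch below applies it to the assumed integrand $F(x,x') = -(x^2 + x'^2)$, chosen here only because (10) then has the closed-form solution $f(c,T) = -c^2\tanh T$ against which the grid values can be compared. The grid sizes, the bound on $v$, and the function names are illustrative assumptions.

```python
import numpy as np

def F(x, v):
    """Assumed integrand for this example: maximize J = -integral of (x^2 + x'^2)."""
    return -(x ** 2 + v ** 2)

def solve_by_dp(T=1.0, dt=0.02, c_max=2.0, n_c=401, v_max=3.0, n_v=61):
    """Approximate f(c, T) on a grid of initial values c by stepping
    f(c, t + dt) = max_v [F(c, v) dt + f(c + v dt, t)] from f(c, 0) = 0."""
    c = np.linspace(-c_max, c_max, n_c)     # grid of initial values c
    v = np.linspace(-v_max, v_max, n_v)     # grid of initial slopes v = x'(0)
    f = np.zeros(n_c)                       # f(c, 0) = 0
    C, V = np.meshgrid(c, v, indexing="ij")
    gain = F(C, V) * dt                     # contribution of the first short interval
    nxt = C + V * dt                        # state reached at the end of that interval
    for _ in range(int(round(T / dt))):
        # linear interpolation of the previous value function at the new states
        cont = np.interp(nxt.ravel(), c, f).reshape(nxt.shape)
        f = (gain + cont).max(axis=1)
    return c, f

if __name__ == "__main__":
    c, f = solve_by_dp()
    print(np.interp(1.0, c, f), -np.tanh(1.0))
```

The two printed values should agree to within the discretization error of the first-order time stepping and the grids.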
This is the analogue of the recurrence relation of (4). Once again we have reduced a global maximization to a local maximization. The classical approach seeks to determine the solution in the form of a function $x(t)$, while the dynamic-programming approach is to determine $x'$ as a function of $x$ and $T - t$. Geometrically, the classical approach regards a curve as a locus of points, whereas we are taking it to be an envelope of tangents. The duality will manifest itself in a moment in our derivation of the Euler equation from (10) above. Differentiating the right-hand side of (10) to determine a maximizing $v$, we see that (10) is equivalent to the two equations
$$\frac{\partial f}{\partial T} = F(c,v) + v\frac{\partial f}{\partial c}, \qquad 0 = \frac{\partial F}{\partial v} + \frac{\partial f}{\partial c}, \tag{11}$$
or
$$\frac{\partial f}{\partial c} = -\frac{\partial F}{\partial v}, \qquad \frac{\partial f}{\partial T} = F - v\frac{\partial F}{\partial v}. \tag{12}$$
Eliminating $f$ between these equations, we obtain the first-order quasi-linear partial differential equation for $v$,
$$-\frac{\partial^2 F}{\partial v^2}\,v_T = F_c - v\,v_c\,\frac{\partial^2 F}{\partial v^2} - v\,\frac{\partial^2 F}{\partial c\,\partial v}. \tag{13}$$
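The elimination of $f$ and the link to the Euler equation can be checked in a few lines. The following is a sketch only, assuming that $f$ has continuous second partial derivatives and writing $F_c$, $F_{cv}$, $F_{vv}$ for partial derivatives of $F(c,v)$ (a subscript shorthand introduced here for brevity).

```latex
% Cross-differentiate the two relations f_c = -F_v and f_T = F - v F_v:
\frac{\partial}{\partial T}\Bigl(\frac{\partial f}{\partial c}\Bigr) = -F_{vv}\,v_T ,
\qquad
\frac{\partial}{\partial c}\Bigl(\frac{\partial f}{\partial T}\Bigr)
   = F_c - v\,F_{cv} - v\,v_c\,F_{vv} .
% Equating the mixed derivatives gives the quasi-linear equation for v:
-F_{vv}\,v_T = F_c - v\,v_c\,F_{vv} - v\,F_{cv} .
% Along an optimal curve, x'(t) = v(x(t), T - t), hence
x'' = v_c\,x' - v_T = v\,v_c - v_T ,
% and the Euler expression becomes
F_x - \frac{d}{dt}F_{x'} = F_c - v\,F_{cv} - F_{vv}\,(v\,v_c - v_T) = 0 ,
% which is the same relation as the equation for v above.
```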
As is easily verified, the equations of the characteristics are equivalent to the usual Euler equation for the variational problem. A fuller discussion of this approach may be found in [5] and [7].
5. Constraints. If we consider the maximization problem of (6) subject to a constraint of the form
0 0. 0 < 1 xii(t) 1, with (41)
$$f(c_1, c_2, \ldots, c_N, 0) = F(c_1, c_2, \ldots, c_N).$$
Proceeding as in the previous part, we obtain the recurrence relation
$$f(c_1, c_2, \ldots, c_N, T + 1) = \max_{\{v_{ij}\}} f\bigl(r_1 + g_1(v_{11}, v_{21}, \ldots, v_{N1}),\ r_2 + g_2(v_{12}, v_{22}, \ldots, v_{N2}),\ \ldots,\ r_N + g_N(v_{1N}, v_{2N}, \ldots, v_{NN}),\ T\bigr), \tag{42}$$
where
$$0 \le \sum_{j=1}^{N} v_{ij} \le c_i, \qquad (v_{ij} \ge 0). \tag{43}$$
Starting with the determination of $f(c_1, c_2, \ldots, c_N, 0)$, we compute $f(c_1, c_2, \ldots, c_N, 1)$, using (42); then $f(c_1, c_2, \ldots, c_N, 2)$; etc. This approach overcomes difficulties b through e of Sec. 10 but faces grave difficulties, with present-day computing machines, as far as dimensionality is concerned. The value of $T$, no matter how large, causes no trouble; it is the value of $N$ which is significant.
For values of $N$ larger than 4, not only do we run up against memory problems, but the question of access time becomes significant.
12. Continuous version. To pass from the discrete case to a continuous allocation process, we consider allocations over the interval $[t, t + \Delta]$ having the form $x_{ij}(t)\Delta$. The quantity $x_{ij}(t)$ now represents a rate of allocation. In place of (35), we obtain a system of differential equations
$$\frac{dx_i}{dt} = h_i(x_{1i}, x_{2i}, \ldots, x_{Ni}), \qquad x_i(0) = c_i. \tag{44}$$
The inequalities of (34) reduce to (45)
0 0,
a constraint automatically satisfied in the discrete case. Subject to these constraints, we wish to maximize the function
$$F(x_1(T), x_2(T), \ldots, x_N(T)), \tag{47}$$
or, more generally, an integral of the form (48)
.,xx(t)) dG(t).
JoT
The continuous case appears to afford a much more attractive problem than the discrete case, because of the absence of many of the constraints. Unfortunately, precisely because of the absence of upper bounds, the maximum may not exist, unless we allow the use of delta functions. In precise terminology, it may be necessary to examine variational problems over distribution functions in the sense of L. Schwartz.
To avoid this, in order to obtain a solution of the type we wish to admit as an economic solution, it is customary to impose an upper bound on the rate of allocation:
$$0 \le x_{ij}(t) \le m_{ij} < \infty, \qquad (i, j = 1, 2, \ldots, N). \tag{49}$$
Under this condition, we can establish the existence of a solution of the variational problem, under reasonable assumptions concerning the functions appearing, using a "weak-convergence" argument.
13. Discussion of computational difficulties: continuous case. Many of the difficulties discussed above in Sec. 10 in connection with the application of classical variational methods to the discrete process are also attendant upon the continuous case. We shall list some of the difficulties and discuss only those which have not been considered above.
a. Two-point boundary conditions
b. Nonuniqueness of solution; local maxima
c. Noninterior maximum; constraints
d. Nondifferentiable functions
e. Stability analysis
The analogue of the finite-dimensional variational equations, obtained by equating partial derivatives to zero, is the Euler equation. This is, in general, a nonlinear differential equation. In order to illustrate in its simplest form what we mean by the difficulty of the two-point boundary condition, consider the problem of maximizing
$$J(x) = \int_0^T F(x,x')\,dt, \qquad \left(x' = \frac{dx}{dt}\right), \tag{50}$$
over all $x$ subject to $x(0) = c$. Proceeding purely formally, the Euler equation is
$$\frac{\partial F}{\partial x} - \frac{d}{dt}\frac{\partial F}{\partial x'} = 0. \tag{51}$$
This is to be solved subject to the two conditions
$$\text{a. } x(0) = c, \qquad \text{b. } \left.\frac{\partial F}{\partial x'}\right|_{t=T} = 0. \tag{52}$$
Observe that two conditions are required, since (51) is a second-order equation. The first condition (52a) is part of the data; the second condition is derived from the variation.
If (51) is nonlinear, as it is in general, there exists no uniform method for determining the solution of (51) satisfying (52), nor even for determining the existence and uniqueness of solutions.
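The two-point character of (51) and (52) can be seen in a one-dimensional sketch in which the unknown initial slope $x'(0)$ is adjusted until the terminal condition holds. The example assumes the integrand $F(x,x') = -(x^2 + x'^2)$, for which (51) reads $x'' = x$ and (52b) reads $x'(T) = 0$, so that the exact initial slope is $-c\tanh T$. The integrator, the bracketing interval, and the function names are illustrative assumptions, and the bisection relies on $x'(T)$ being monotone in the initial slope, which holds for this linear example but not in general.

```python
import math

def shoot(slope, c=1.0, T=1.0, n=200):
    """Integrate x'' = x from x(0) = c, x'(0) = slope with classical RK4
    and return the terminal slope x'(T)."""
    h = T / n
    x, v = c, slope
    for _ in range(n):
        k1x, k1v = v, x
        k2x, k2v = v + 0.5 * h * k1v, x + 0.5 * h * k1x
        k3x, k3v = v + 0.5 * h * k2v, x + 0.5 * h * k2x
        k4x, k4v = v + h * k3v, x + h * k3x
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return v

def find_initial_slope(c=1.0, T=1.0, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection on x'(0) until the terminal condition x'(T) = 0 is met.
    Assumes shoot(lo) and shoot(hi) have opposite signs."""
    f_lo = shoot(lo, c, T)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = shoot(mid, c, T)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(find_initial_slope(), -math.tanh(1.0))
```

In several dimensions the single unknown slope becomes a vector of unknowns, and a one-parameter search of this kind no longer applies directly.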
The usual approach, in any particular problem, is to take $x'(0)$ as an unknown and try a range of values until (52b) is satisfied. Although this method, with modifications, is efficient in one dimension, in multidimensional problems it breaks down. Let us also briefly mention some of the difficulties introduced by constraints. Assume that, in the above problem (50), we had imposed a constraint
0<x' 0,
(for 0 0, IG(x,y)l < ait + a2JxI,
(for 05y<x).2
C.
d.
(for 0 0, and it is plausible that of/ac is a monotone increasing function of T for t >_ 0, if we think of an economic interpretation of the analytic problem:
FIG. 1
It follows then that for small T, with c fixed, the maximum will be at v = 0. As T increases, the maximum stays at v = 0 until we reach a value of T where (71)
ac
For larger T, there will be a maximum inside the interval [v,c] until T becomes so large that (72 )
(;)= v
For larger values of $T$, the maximum remains at $v = c$. It follows that the "policy function," $v = v(c,T)$, has the form shown in Fig. 1 as a function of $c$ and $T$. Having determined the structure of the policy, we can now determine the precise boundary curves, employing (68), (71), and (72).
1 This condition is imposed to ensure the existence of a solution of (68a) for all $T \ge 0$.
A rigorous discussion of this problem, using classical variational methods, is presented in [13].
18. Linear functions and functionals. A particularly interesting class of problems derives from the general problem of Sec. 12 under the assumption that all functions involved are linear. Using vector-matrix notation, the problem is that of maximizing the inner product
$$J(y) = \int_0^T (x(t), a)\,dt, \tag{73}$$
where (74)
a.
b.
dx dt
x(0) = c,
= Ax + By, Cy 0,
(26)
unless x = 0. We therefore can introduce the left member of (26) as new norm in L; i.e., we introduce
(x,M()) = [x,]
(27)
as new scalar product in L. The linear space thus obtained is called H, and its completion H. Now, for $x \in H$, the functions $y = M(x)$, $y' = M_1(x)$ are continuous, so that they can be substituted in (1); in other words, for $x \in H$, the definition
$$i(x) = j(M(x)) \tag{28}$$
makes sense. We see then from (1), (2), (23), and (27) that
$$i(x) = \tfrac{1}{2}[x,x] + I(x), \tag{29}$$
where as
(30)
I( x) =
d
bi(t) dti dt +
al
fal a c(t,y) dt,
[(y = M(x))].
ti=1
It can be seen that i(x) can be extended to H.10 If
(31) ci(t,y) = $\left(\dfrac{\partial c}{\partial y_1}, \ldots, \dfrac{\partial c}{\partial y_n}\right)$
exists and if its components satisfy the assumptions previously made concerning c, then it can be easily verified that the first Fréchet differential $di(x,\xi)$ of $i(x)$ at the point $x$ belonging to the increment $\xi$ exists and is given by the formula
$$di(x,\xi) = [\xi, g(x)], \tag{32}$$
where the "Euler operator" g(x) is defined by (33)
g(x) = x + G(x),
G(x)
dt + ci(t,M(x)).11
10 For a proof, see [10, Lemma 3.1].
11 Here again the operator is first defined for $x \in H$ and is then extended. Cf. the preceding footnote.
APPLICATIONS OF FUNCTIONAL ANALYSIS
149
If in addition the second derivatives $\partial^2 c/\partial y_i\,\partial y_j$ exist and satisfy the conditions previously made on c, then it can be verified that the second Fréchet differential $d^2 i(x,\xi,\zeta)$ of $i(x)$ at the point $x$ corresponding to the increments $\xi$, $\zeta$ exists and is given by the formula
$$d^2 i(x,\xi,\zeta) = [\xi, k(x,\zeta)], \tag{34}$$
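where the "Jacobi operator" is defined by
$$K(x,\zeta) = (K_1, \ldots, K_n), \tag{35}$$
with
$$K_i = \sum_{j=1}^{n} \frac{\partial^2 c}{\partial y_i\,\partial y_j}\,\theta_j, \qquad \theta = (\theta_1, \ldots, \theta_n) = M(\zeta) \tag{36}$$
(see footnote 11).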
If we use the classical notation 9(17,77') -
/
IJiP,1707j + J 7nj17,gj],12
then direct computation shows that (37)
2 fat"
dt,
(7 = M( ))-
Now the advantage of introducing, instead of $y$, $\eta$, the "new variables" $x = L(y)$, $\zeta = L(\eta)$ lies in the fact that the operator $G(x)$ defined in (33) is completely continuous and that the operator $K(x,\zeta)$ defined in (35) is completely continuous in $x$ as well as in $\zeta$; in the latter variable it is, moreover, linear and symmetric.13 This allows us to draw nearly immediately the following conclusions from the theory of gradient mappings: From (32) we see that $g(x)$ is the gradient of $i(x)$, and comparison of (29) with (33) shows then that the completely continuous $G(x)$ is the gradient of $I(x)$. From this, the existence of a minimum for $i(x)$ in any ball $[x,x] < K^2$ follows [8, p. 430, Theorem 4.1]. If $c > 0$, we can conclude that, for $K$ big enough, the minimum is taken in an interior point $x_0$ of the ball and that $x_0$ is a critical point, i.e., that $g(x_0) = 0$ (cf. the corresponding remark in the last paragraph of Sec. 1). Moreover, from (34) and the spectral theorem for completely continuous symmetric operators, we see that the "index form" (37) can be written in
12 See, for example, [7, p. 7]. Note that, in our case, $f_{p_i y_j} = 0$.
13 As to the proof of this, we remark first that, on account of the Lipschitz assumptions for the first and second derivatives of $c$ with respect to the $y_i$, it is easily seen that the complete continuity of $G$ and $K$ follows from that of $y = M(x)$. The proof for the complete continuity of this latter operator is quite analogous to the proof of the first part of Theorem 2.1 in [10], to which we therefore refer. The symmetry of $K$ follows from (35) together with the symmetry of $d^2 i(x,\xi,\zeta)$ in $\xi$ and $\zeta$.
E. H. ROTHE
150
the form
$$d^2 i(x,\zeta,\zeta) = \sum_{\nu=1} [e_\nu, \zeta]^2 \lambda_\nu, \tag{38}$$
where $e_\nu$ and $\lambda_\nu$ are the eigenelements and corresponding eigenvalues of the problem
$$k(\zeta) = \lambda\zeta \tag{39}$$
and that at most a finite number of the $\lambda_\nu$ are negative. This follows from the fact that the numbers $\lambda_\nu - 1$ are the eigenvalues of the completely continuous (in $\zeta$) operator $K(x,\zeta)$; these are either finite in number or tend to zero as $\nu \to \infty$ so that, in the latter case, $\lambda_\nu \to +1$. Also to each eigenvalue belong a finite number of eigenelements $e_\nu$, since the $e_\nu$ are also eigenelements of the completely continuous $K(x,\zeta)$. For the same reason the "nullity" [i.e., the number of eigenelements to the eigenvalue 0 or, what is the same, the number of linearly independent solutions of the "Jacobi equation" $k(\zeta) = 0$ (see footnote 14)] is finite.
If $x = x_0$ is a critical point, then the index of $x_0$ is defined as the number of negative terms in the quadratic form (38) or, what is the same, the number of linearly independent eigenelements of (39) belonging to negative eigenvalues.
If the "Jacobi equation" has no (nontrivial) solutions, then the quadratic form (38) is nondegenerate, and it can be shown that then the critical point $x_0$ is isolated (see [10, Theorem 4.2]). In this case the index can be related to the topologically defined Morse-type numbers of real-valued functions in Hilbert space (see [10, Theorem 4.3]).
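The definitions of index and nullity can be tried out on a finite-dimensional stand-in. The sketch below assumes a two-dimensional space in which the second differential is the quadratic form with matrix I + K, where K is a symmetric 2 × 2 matrix. This is only an analogue of the Hilbert-space setting of the text, and the function names are hypothetical. Each eigenvalue of I + K is 1 plus an eigenvalue of K, mirroring the relation between the eigenvalues of (39) and those of the completely continuous operator.

```python
import math


def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]], smallest first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def index_and_nullity(a, b, d, tol=1e-12):
    """Index and nullity of the quadratic form with matrix I + K, K = [[a, b], [b, d]].

    Finite-dimensional stand-in (an assumption) for the second differential:
    each eigenvalue of I + K is 1 plus an eigenvalue of K.
    """
    lambdas = [1.0 + mu for mu in eigenvalues_sym2(a, b, d)]
    index = sum(1 for lam in lambdas if lam < -tol)
    nullity = sum(1 for lam in lambdas if abs(lam) <= tol)
    return index, nullity


if __name__ == "__main__":
    print(index_and_nullity(-3.0, 0.0, 0.5))  # one negative eigenvalue, nondegenerate
    print(index_and_nullity(-1.0, 0.0, 0.0))  # degenerate: one zero eigenvalue
```

In this analogue a zero nullity corresponds to the nondegenerate case, and an index of zero corresponds to a positive definite form.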
We finally remark that the fact that the index is the number of negative terms in the quadratic form (38) for the second differential makes it nearly evident that a critical point $x_0$ furnishes a (relative) minimum if and only if the index is zero, provided that we add a proper "differentiability condition"; e.g., if we assume that the third Fréchet differential exists at $x_0$.15 For a formal proof we have only to apply "Taylor's theorem" (with remainder term) up to terms of order three (see [3, Theorem 5]).
BIBLIOGRAPHY
1. R. Courant and D. Hilbert, Methoden der mathematischen Physik, vol. 2, Springer-Verlag OHG, Berlin, 1937.
2. K. Friedrichs, Spektraltheorie halbbeschränkter Operatoren und Anwendung auf die Spektralzerlegung von Differentialoperatoren, Math. Ann. vol. 109 (1934) pp. 465-487.
14 If $k = (k_1, \ldots, k_n)$, direct computation shows that (for $x \in H$), $-k_i(x) = \dfrac{d}{dt}\,\Omega_{\eta_i'} - \Omega_{\eta_i}$. Since $\eta(a_0) = \eta(a_1) = 0$, it follows that, if $\zeta$ is a solution of the "Jacobi equation" of the text, then $\eta = M(\zeta)$ is a solution of the classical Jacobi equation vanishing in the end points. If $\zeta$ is nontrivial, then obviously $\eta$ is nontrivial.
15 Sufficient for this is that the third derivatives of $c$ with respect to the $y_i$ exist and satisfy the assumptions previously made about $c$.
3. L. M. Graves, Riemann integration and Taylor's theorem in general analysis, Trans. Amer. Math. Soc. vol. 29 (1927) pp. 163-177.
4. M. R. Hestenes, Application of the theory of quadratic forms in Hilbert space to the calculus of variations, Pacific J. Math. vol. 1 (1951) pp. 525-581.
5. E. Kamke, Differentialgleichungen, Lösungsmethoden und Lösungen, vol. 1, Gewöhnliche Differentialgleichungen, Akademische Verlagsgesellschaft M.b.H., Leipzig, 3d ed., 1944.
6. T. Kato, Quadratic forms in Hilbert space and asymptotic perturbation series, Tech. Rept. No. 7, prepared under Contract DA-04-200-ORD-171, Task order 5, for Office of Ordnance Research, Department of Mathematics, University of California, Berkeley, Cal., April, 1955.
7. M. Morse, The calculus of variations in the large, Amer. Math. Soc. Colloquium Publ. vol. 18 (1934).
8. E. H. Rothe, Gradient mappings and extrema in Banach spaces, Duke Math. J. vol. 15 (1948) pp. 421-431.
9. E. H. Rothe, A note on the Banach spaces of Calkin and Morrey, Pacific J. Math. vol. 3 (1953) pp. 493-499.
10. E. H. Rothe, Remarks on the application of gradient mappings to the calculus of variations and the connected boundary value problems, Comm. Pure Appl. Math., N. Y. U. vol. 9 (1956) pp. 551-568.
11. R. G. Sanger, Functions of lines and the calculus of variations, Univ. of Chicago Dept. of Math. Contributions to the Calculus of Variations (1931-1932) pp. 191-293.
UNIVERSITY OF MICHIGAN, ANN ARBOR, MICH.
INDEX
Allocation processes, multistage, 121
Aronszajn, N., 54
Bottleneck process, 134
Caustic, 35
Characteristic-value problem, double, 139
Courant, 55
Decision processes, continuous, 118
  multistage, 115
Deformation theories, 12
Eiconal equation, 27, 40
Eiconal function, complex, 43
Eigenvalues, 105, 150
Electromagnetic vibrations, 85
Energy, complementary, 9
  potential, 9
Equilibrium-stress field, 8
Euler (differential) equations, 2
Existence proof, 143
Fermat's principle, 38, 41
Green's function, 97
Hadamard variational formula, 102, 109
Hilbert transform, 108
Index form, 147
Instability, 139
Interior variation, method of, 97
Malus, theorem of, 40
Mappings, gradient, 149
Metric, definite, 79
  indefinite, 79
Operators, half-bounded, 143
Optimal policy, 117
Perfectly plastic material, 18
Rate principle, complementary, 15
Rayleigh-Ritz method, 57
Rays, diffracted, 27, 28
  imaginary, 41
  normal congruences of, 40
Smoothing processes, 130
Snell's law, 29, 39
Spectral theorem, 143
Stationary principles, 79, 89
Strain-displacement field, 8
Univalent functions, group property of, 94
Variational principle, for displacements, 3
  for stresses, 4
Vibrating membrane, 80
Vibrations of an elastic body, 83
Weinstein's method, 60
Work hardening, 15
Work-hardening relations, 13