Methods in Electromagnetic Wave Propagation, 2nd Edition


Author: D. S. J. Jones



IEEE Series on Electromagnetic Wave Theory

The IEEE Series on Electromagnetic Wave Theory consists of new titles as well as reprintings and revisions of recognized classics that maintain long-term archival significance in electromagnetic waves and applications.

Series Editor Donald G. Dudley University of Arizona

Advisory Board Robert E. Collin Case Western Reserve University Akira Ishimaru University of Washington D. S. Jones University of Dundee

Associate Editors: Electromagnetic Theory, Scattering, and Diffraction, Ehud Heyman, Tel-Aviv University; Differential Equation Methods, Andreas C. Cangellaris, University of Arizona; Integral Equation Methods, Donald R. Wilton, University of Houston; Antennas, Propagation, and Microwaves, David R. Jackson, University of Houston

Books in the Series
Chew, W. C., Waves and Fields in Inhomogeneous Media
Christopoulos, C., The Transmission-Line Modeling Method: TLM
Collin, R. E., Field Theory of Guided Waves, Second Edition
Dudley, D. G., Mathematical Foundations for Electromagnetic Theory
Elliott, R. S., Electromagnetics: History, Theory, and Applications
Felsen, L. B. and Marcuvitz, N., Radiation and Scattering of Waves
Harrington, R. F., Field Computation by Moment Methods
Jones, D. S., Methods in Electromagnetic Wave Propagation, Second Edition
Lindell, I. V., Methods for Electromagnetic Field Analysis
Tai, C. T., Generalized Vector and Dyadic Analysis: Applied Mathematics in Field Theory
Tai, C. T., Dyadic Green Functions in Electromagnetic Theory, Second Edition
Van Bladel, J., Singular Electromagnetic Fields and Sources
Wait, J., Electromagnetic Waves in Stratified Media

METHODS IN ELECTROMAGNETIC WAVE PROPAGATION
SECOND EDITION

D. S. JONES
UNIVERSITY OF DUNDEE

IEEE
The Institute of Electrical and Electronics Engineers, Inc., New York

WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION

A NOTE TO THE READER This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.

IEEE PRESS, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331
IEEE Antennas and Propagation Society, Sponsor
© D. S. Jones, 1979, 1994
First edition 1979, Oxford University Press
Second edition 1994
Reissued 1995 jointly with IEEE Press

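$$x_+ = \begin{cases} x & (x > 0),\\ 0 & (x \le 0).\end{cases}$$

Thus $\{(3)_+\}^3 = 27$ and $\{(-3)_+\}^3 = 0$, whereas $(t - 5)_+ = t - 5$ if $t > 5$ but $0$ if $t \le 5$. Then, applying (1.9) for $i = 2, \dots, n$ and using an equally spaced partition with $x_{i+1} = ih$, we see that

$$S'' = \frac{2\beta_0}{h^2} + \frac{6\beta_1 x}{h^3} + \sum_{i=2}^{n} \frac{6\beta_i\{x - (i-1)h\}_+}{h^3} \qquad (0 \le x \le nh),$$

where the first two terms represent $S_1''$. After two integrations we obtain

$$S = \alpha_0 + \alpha_1\frac{x}{h} + \beta_0\Big(\frac{x}{h}\Big)^2 + \beta_1\Big(\frac{x}{h}\Big)^3 + \sum_{i=2}^{n} \beta_i\frac{\{x - (i-1)h\}_+^3}{h^3}. \qquad (1.10)$$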

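By construction $S$ and its first two derivatives are continuous; we make it take the value $f_m = f(mh)$ at $x = mh$ by requiring that

$$\alpha_0 = f_0, \qquad \alpha_0 + \alpha_1 m + \beta_0 m^2 + \beta_1 m^3 + \sum_{i=2}^{m} \beta_i (m - i + 1)^3 = f_m \quad (n \ge m \ge 1), \qquad (1.11)$$

the sum being empty when $m = 1$. Let us use the central difference operator $\delta$, defined so that

$$\delta f(x) = f(x + \tfrac12 h) - f(x - \tfrac12 h).$$

Then $\delta f(\tfrac12 h) = f(h) - f(0)$, or $\delta f_{1/2} = f_1 - f_0$. Similarly

$$\delta^2 f_m = f_{m+1} - 2f_m + f_{m-1}.$$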


ASPECTS OF NUMERICAL ANALYSIS

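Our equations can now be written as

$$\alpha_0 = f_0, \qquad \alpha_1 + \beta_0 + \beta_1 = \delta f_{1/2}, \qquad 2\beta_0 + 6\beta_1 + \beta_2 = \delta^2 f_1,$$
$$6\beta_1 + 5\beta_2 + \beta_3 = \delta^3 f_{3/2}, \qquad \beta_{m+2} + 4\beta_{m+1} + \beta_m = \delta^4 f_m \quad (m = 2, \dots, n - 2) \qquad (1.12)$$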

which are $(n + 1)$ equations governing the $(n + 3)$ coefficients $\alpha_0, \alpha_1, \beta_0, \dots, \beta_n$. Two of these coefficients may be chosen arbitrarily and the others then found from (1.11) or (1.12). Once eqns (1.11) or (1.12) have been solved, the coefficients in (1.10) are linear combinations of the values $f_i$ of $f$ at $x = ih$. Accordingly, (1.10) can be rewritten in the form

$$S = \sum_{i=0}^{n} f_i C_i(x) \qquad (1.13)$$

where the functions $C_i(x)$, which are cubic polynomials on each subinterval, can be determined. Clearly $C_i(jh) = 0$ $(j \ne i)$ and $C_i(ih) = 1$ for $i, j = 0, 1, \dots, n$. The functions $C_i(x)$ are known as cardinal splines. They can be regarded as basis functions for (1.13), but they are not satisfactory for many practical applications because they are non-zero over most of the interval. To overcome this difficulty, cubic splines which vanish identically outside an interval of length $4h$ have been constructed. Consider (taking $h = 1$) the function $B_i$ defined by

$$B_i(x) = \tfrac16\big[(x - i + 2)_+^3 - 4(x - i + 1)_+^3 + 6(x - i)_+^3 - 4(x - i - 1)_+^3 + (x - i - 2)_+^3\big]. \qquad (1.14)$$
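Definition (1.14) can be evaluated directly. The sketch below is a minimal Python transcription, assuming unit knot spacing ($h = 1$) and the factor $\tfrac16$ in front of the bracket; the helper names `truncated_cube` and `cubic_bspline` are illustrative, not from the text. With this normalization the values at the knots $i - 1, i, i + 1$ are $\tfrac16, \tfrac23, \tfrac16$, and the B-splines overlapping any point sum to 1.

```python
def truncated_cube(t):
    """(t)_+^3 in the notation of the text: t**3 for t > 0, otherwise 0."""
    return t ** 3 if t > 0 else 0.0


def cubic_bspline(i, x):
    """Cubic B-spline B_i(x) of (1.14): unit knot spacing, factor 1/6 (assumed)."""
    weights = (1, -4, 6, -4, 1)          # fourth-difference weights
    return sum(w * truncated_cube(x - i + 2 - k) for k, w in enumerate(weights)) / 6.0


# Values of B_0 at the knots -2, ..., 2, and the sum of all B_i overlapping x = 0.3.
knot_values = [cubic_bspline(0, x) for x in (-2, -1, 0, 1, 2)]
unity = sum(cubic_bspline(i, 0.3) for i in range(-3, 4))
```

For $x \ge i + 2$ the bracket is a fourth difference of the cubic $x^3$, so it vanishes up to rounding; `cubic_bspline(0, 3.7)` returns a number of order $10^{-13}$ rather than an exact zero.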

Notice first that $B_i$ vanishes identically for $x \le i - 2$ and is also identically zero for $x \ge i + 2$, since there the bracket in (1.14) is a fourth difference of a cubic. Also, since the first two derivatives of $x_+^3$ are continuous, the first two derivatives of $B_i$ are continuous and, in addition, vanish identically for $x \le i - 2$ and $x \ge i + 2$. Thus the $B_i$ are splines which are non-zero only in the interval $i - 2 < x < i + 2$; they are known as cubic B-splines, and each forms a bell-shaped curve. Special consideration may have to be given to the B-splines used at the ends of intervals. Often one will wish them to be lop-sided in order not to stray outside the given interval; sometimes taking half a bell is satisfactory. (There is additional information about B-splines in §6.8.) One reason why splines may be preferred to the polynomial approximations described earlier in this section is that the latter are subject to the Runge phenomenon. If one is given a function and, in a definite interval, one seeks to improve the approximation by increasing the number $n$ of equally spaced points where the given function and approximant agree, one finds for some functions that, although the separation between the points of agreement decreases, the maximum difference between the given function and approximant increases and, in fact, becomes infinite as $n \to \infty$ if the length of the interval exceeds a certain quantity. By using different

INTERPOLATION

AND APPROXIMATION


polynomials in adjacent intervals, as when splines are employed, this difficulty can be overcome. It is, of course, possible, once the splines have been constructed with specified knots, to ask that the given function be matched not at the knots but at data points chosen in some convenient way. For quadratic splines the error between the given function and approximant tends to have a ripple on it when the data points coincide with the knots. If, however, the data points are midway between the knots, the ripples die away, effectively by a factor of 6, as can be seen from the parabolic shape of cardinal splines. (For further information on splines see Ahlberg, Nilson, and Walsh (1967). Extensive tables of coefficients are given by Sard and Weintraub (1971).)

1.2 Inverse interpolation

Frequently the problem of determining where a function takes a specified value is met. In other words, given $y$, find an approximate value of $x$ such that $f(x) = y$ when $f$ is known only for certain values of $x$, perhaps corresponding to entries in a table. One method is to construct an interpolating polynomial $p(x)$ and then solve

$$p(x) = y. \qquad (1.15)$$

This is known as inverse interpolation. Inverse linear interpolation occurs when $p(x)$ is chosen to be linear. In this case the table is first inspected and two consecutive entries $x_1$ and $x_2$ are determined between which $x$ must lie. Then define

$$p(x) = \frac{(x_2 - x)f(x_1) + (x - x_1)f(x_2)}{x_2 - x_1}$$

and the solution of (1.15) is

$$x = \frac{\{f(x_2) - y\}x_1 + \{y - f(x_1)\}x_2}{f(x_2) - f(x_1)}.$$

If $p(x)$ is not chosen to be linear, more complicated methods must be used to solve (1.15); examples are Muller's method, the secant method, the method of false position, and the method of bisection described in §1.8. An alternative, when $f$ has an inverse over the range of the table, is to carry out interpolation on the inverse function, using the tabulated values of $f$ as arguments. In general this will be less reliable than inverse interpolation on $f$ because, although a polynomial may well be a good approximation to $f$, there is no guarantee that the inverse function can be represented equally well by a polynomial. For example, if $f(x) = x^2$, the inverse function $x = \sqrt{y}$ does not have a good representation as a polynomial near the origin $x = 0$, $y = 0$.
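The linear interpolant and its inverse translate directly into code. The sketch below is plain Python with illustrative function names. It reuses the table of Exercise 1 and assumes that the tabulated values are monotonic, so that a pair of consecutive entries bracketing $y$ exists.

```python
from bisect import bisect_right

# Table of Exercise 1: values of f at x = 0.1, 0.2, 0.3, 0.4.
xs = [0.1, 0.2, 0.3, 0.4]
fs = [1.10517, 1.22140, 1.34986, 1.49182]


def linear_interpolate(xs, fs, x):
    """Linear interpolant p(x) between the two entries that bracket x (xs increasing)."""
    k = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    x1, x2, f1, f2 = xs[k - 1], xs[k], fs[k - 1], fs[k]
    return ((x2 - x) * f1 + (x - x1) * f2) / (x2 - x1)


def inverse_linear_interpolate(xs, fs, y):
    """Solve p(x) = y, eqn (1.15), with p linear between consecutive entries bracketing y."""
    for x1, x2, f1, f2 in zip(xs, xs[1:], fs, fs[1:]):
        if min(f1, f2) <= y <= max(f1, f2) and f1 != f2:
            return ((f2 - y) * x1 + (y - f1) * x2) / (f2 - f1)
    raise ValueError("y is not bracketed by two consecutive table entries")


x_star = inverse_linear_interpolate(xs, fs, 1.3)   # where does f take the value 1.3?
```

Substituting `x_star` back into `linear_interpolate` recovers 1.3, which is the defining property of (1.15) when $p$ is linear.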

1.3 Interpolation in two dimensions

The problem of interpolation in two or more dimensions is much more complicated than for one variable. In part, this is due to the fact that functions may be specified on domains of highly irregular shape. It is usually assumed that any shape likely to arise in practice can be approximated to as high a degree of accuracy as required by a network of standard shapes, e.g. triangles or rectangles, provided that they are made sufficiently small. Therefore we restrict our attention to such shapes.

Fig. 1.4. Triangular interpolation.

Suppose that we want an approximation $F$ to $f(x, y)$ over the triangle shown in Fig. 1.4, and suppose that $F$ has the form

$$F(x, y) = \alpha + \beta x + \gamma y,$$

i.e. we make a linear approximation. If we impose the condition that $F$ and $f$ are to agree at the three vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, we discover that

$$F(x, y) = \alpha_1 f(x_1, y_1) + \alpha_2 f(x_2, y_2) + \alpha_3 f(x_3, y_3)$$

where

$$\Delta\alpha_1 = x_2y_3 - x_3y_2 + (y_2 - y_3)x - (x_2 - x_3)y,$$
$$\Delta\alpha_2 = x_3y_1 - x_1y_3 + (y_3 - y_1)x - (x_3 - x_1)y,$$
$$\Delta\alpha_3 = x_1y_2 - x_2y_1 + (y_1 - y_2)x - (x_1 - x_2)y,$$

and $\Delta$, twice the area of the triangle, is given by

$$\Delta = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1).$$

Take another triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_4, y_4)$ which does not overlap that of Fig. 1.4, and find a similar linear approximation $F_1$ to $f$ over this triangle. Then, since both $F$ and $F_1$ vary linearly along the side joining $(x_1, y_1)$ and $(x_2, y_2)$ and have the same values at the two vertices, they must be equal at every point of that side. In other words, $F$ and $F_1$ are continuous across the common side. In this way, by selecting non-overlapping triangles to cover the region of interest, we obtain a piecewise linear approximant which is continuous throughout the region.
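The weights $\alpha_1, \alpha_2, \alpha_3$ can be transcribed directly. In the Python sketch below (function names and the two triangles are hypothetical), $\Delta$ is the signed quantity defined above, so the formulas hold whichever way round the vertices are listed. Because $\alpha_3$ vanishes on the side joining $(x_1, y_1)$ and $(x_2, y_2)$, two triangles sharing that side give the same value on it, which is the continuity argument just made.

```python
def triangle_weights(vertices, x, y):
    """Weights (alpha_1, alpha_2, alpha_3) of the linear interpolant on a triangle."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    delta = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # twice the signed area
    a1 = (x2 * y3 - x3 * y2 + (y2 - y3) * x - (x2 - x3) * y) / delta
    a2 = (x3 * y1 - x1 * y3 + (y3 - y1) * x - (x3 - x1) * y) / delta
    a3 = (x1 * y2 - x2 * y1 + (y1 - y2) * x - (x1 - x2) * y) / delta
    return a1, a2, a3


def triangle_interpolate(vertices, values, x, y):
    """F(x, y) = alpha_1 f_1 + alpha_2 f_2 + alpha_3 f_3."""
    return sum(a * v for a, v in zip(triangle_weights(vertices, x, y), values))


def f(x, y):
    """A linear test function, which the interpolant must reproduce exactly."""
    return 3.0 + 2.0 * x - y


# Two hypothetical triangles sharing the side from (0, 0) to (2, 0).
upper = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
lower = [(0.0, 0.0), (2.0, 0.0), (1.0, -1.0)]
on_upper = triangle_interpolate(upper, [f(*v) for v in upper], 0.5, 0.25)
edge_upper = triangle_interpolate(upper, [3.0, 7.0, 2.0], 1.2, 0.0)
edge_lower = triangle_interpolate(lower, [3.0, 7.0, 10.0], 1.2, 0.0)
```

The third vertex values differ between the two triangles (2 and 10), yet `edge_upper` and `edge_lower` agree because only the shared vertices contribute on the common side.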

Fig. 1.5. Interpolation on a rectangle with vertices $(x_1, y_1)$, $(x_1 + h, y_1)$, $(x_1 + h, y_1 + k)$, $(x_1, y_1 + k)$.

If rectangular elements are employed (Fig. 1.5), we can try the approximation $F(x, y) = \alpha + \beta x + \gamma y + \delta xy$. If we require that $F = f$ at the four vertices, we have

$$F(x, y) = \alpha_1 + \beta_1(x - x_1) + \gamma_1(y - y_1) + \delta_1(x - x_1)(y - y_1)$$

where

$$\alpha_1 = f(x_1, y_1), \qquad \beta_1 = \{f(x_1 + h, y_1) - f(x_1, y_1)\}/h, \qquad \gamma_1 = \{f(x_1, y_1 + k) - f(x_1, y_1)\}/k,$$
$$\delta_1 = \{f(x_1 + h, y_1 + k) - f(x_1 + h, y_1) - f(x_1, y_1 + k) + f(x_1, y_1)\}/hk.$$

For fixed $y$, $F$ is a linear function of $x$ and, for fixed $x$, a linear function of $y$. Consequently, $F$ is known as a bilinear interpolant. On any side $F$ depends only on the values at the two vertices of that side so that, for two non-overlapping rectangles with a common side, the two bilinear interpolants take the same value on the common side. Thus bilinear interpolants yield a continuous approximant over the region covered by non-overlapping rectangles.
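The bilinear interpolant can be transcribed in the same way. The Python sketch below uses an illustrative function name and hypothetical test functions. A bilinear function is reproduced exactly, and two rectangles sharing the side $x = 1.5$ give the same value on it.

```python
def bilinear_interpolate(f, x1, y1, h, k, x, y):
    """Bilinear interpolant F on the rectangle with corners (x1, y1) and (x1 + h, y1 + k)."""
    alpha = f(x1, y1)
    beta = (f(x1 + h, y1) - f(x1, y1)) / h
    gamma = (f(x1, y1 + k) - f(x1, y1)) / k
    delta = (f(x1 + h, y1 + k) - f(x1 + h, y1) - f(x1, y1 + k) + f(x1, y1)) / (h * k)
    dx, dy = x - x1, y - y1
    return alpha + beta * dx + gamma * dy + delta * dx * dy


def g(x, y):
    """A bilinear test function, reproduced exactly by F."""
    return 1.0 + 2.0 * x + 3.0 * y + 4.0 * x * y


def q(x, y):
    """A non-bilinear test function, used only to check continuity across a side."""
    return x * x * y + y ** 3


exact = bilinear_interpolate(g, 1.0, 0.5, 0.5, 0.4, 1.3, 0.7)
left = bilinear_interpolate(q, 1.0, 0.5, 0.5, 0.4, 1.5, 0.65)    # rectangle [1, 1.5] x [0.5, 0.9]
right = bilinear_interpolate(q, 1.5, 0.5, 0.5, 0.4, 1.5, 0.65)   # rectangle [1.5, 2] x [0.5, 0.9]
```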

Exercises

1. The function $f(x)$ has the values shown:

x       0.1      0.2      0.3      0.4
f(x)    1.10517  1.22140  1.34986  1.49182

Using linear interpolation, determine an approximate value for $f(0.26)$.

2. If $f(x) = 3x^2 - 1$, find a piecewise linear interpolant which agrees with it at $x = 0, 0.1, 0.2, 0.3, 0.4, 0.5$. What approximation to $f(0.33)$ does it give?



3. If $f(x_i)$ and $f(x_{i+1})$ are increased by the small quantities $\varepsilon_1$ and $\varepsilon_2$ respectively, what is the change in the value of the linear interpolant for $f\{\tfrac12(x_i + x_{i+1})\}$?

4. If the approximation $F$ is linear on $[a, b]$ and agrees with $f$ at the end-points, show that there is some $c$ satisfying $a < c < b$ such that $f(x) - F(x) = \tfrac12(x - a)(x - b)f''(c)$ if $f \in C^1[a, b]$ and $f''$ exists. What accuracy does this suggest for linear interpolation in a table of (i) $\sin x$, (ii) $\ln x$ when $x$ is given at intervals of 0.01 between 1 and 2, while $f$ is given to 5 decimal places?

5. Find a polynomial $P(x)$ of degree 2 or less such that $P(1) = 1$, $P(2) = 1$, $P'(1) = 1$.

6. Show that there is no polynomial $P(x)$ of degree 2 or less such that $P(x) = a$, $P(x + h) = b$, $P'(x + \tfrac12 h)$ …

… $> 0$; otherwise replace $x_{i+1}$ by $x'$. If $x' < x_0$, put $x'$ for $x_0$ if $\{p_n(x') - f(x')\}\{p_n(x_0) - f(x_0)\} > 0$; otherwise replace $x_{n+1}$ by $x'$. Operate similarly if $x' > x_{n+1}$. Return now to (1.16) and repeat the calculation with the new set of points. Proceeding in this way we shall, after a finite number of steps (since there is a finite number of selections of $n + 2$ points), reach a polynomial $p_n$ for which the inequality of Theorem 1.4a is valid at all points of the set $S$. Acceleration of the convergence may sometimes be achieved by the second


algorithm of Remes. Since $p_n - f$ changes sign in each of the intervals $[x_0, x_1], [x_1, x_2], \dots, [x_n, x_{n+1}]$, it has at least one zero in each interval. Let $y_i$ be a typical zero in $[x_i, x_{i+1}]$. In each of the intervals $[a, y_0], [y_0, y_1], \dots, [y_n, b]$ find a value of $x$, say $z_i$, where $p_n(z_i) - f(z_i)$ is an extremum and has the same sign as $p_n(x_i) - f(x_i)$. If, for some $z_i$,

$$|p_n(z_i) - f(z_i)| = \max_{x \in S} |p_n(x) - f(x)|,$$

work with the set $z_0, \dots, z_{n+1}$; otherwise find $x'$ so that

$$|p_n(x') - f(x')| = \max_{x \in S} |p_n(x) - f(x)|$$

and replace one $z_i$ by $x'$ as in the preceding paragraph.
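Exercise 13 below asks for a program for the first algorithm of Remes. Eqn (1.16) is not reproduced in this section, so the sketch below assumes it is the usual levelled-error system $p_n(x_i) - f(x_i) = (-1)^i E$ on $n + 2$ reference points of $S$. The exchange step follows the rules quoted above: replace the neighbour with the same sign, or shift the set when $x'$ lies outside it. The function name `remes_first` is hypothetical. It uses NumPy and a power basis, which is adequate for low degrees but becomes ill-conditioned as $n$ grows.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def remes_first(f, S, n, max_iter=100):
    """Best maximum-norm polynomial of degree n on the finite point set S (exchange sketch).

    Assumes eqn (1.16) is the levelled system p_n(x_i) - f(x_i) = (-1)**i * E on a
    reference of n + 2 points. Returns power-basis coefficients (constant first) and |E|.
    """
    S = np.sort(np.asarray(S, dtype=float))
    fS = f(S)
    # Initial reference: n + 2 indices spread evenly through S (needs len(S) >= n + 2).
    ref = [int(i) for i in np.round(np.linspace(0, len(S) - 1, n + 2))]
    signs = (-1.0) ** np.arange(n + 2)
    for _ in range(max_iter):
        # Unknowns c_0..c_n and E:  sum_k c_k x_i**k - (-1)**i E = f(x_i).
        A = np.hstack([np.vander(S[ref], n + 1, increasing=True), -signs[:, None]])
        *coeffs, E = np.linalg.solve(A, fS[ref])
        err = P.polyval(S, coeffs) - fS                  # p_n - f on S
        j = int(np.argmax(np.abs(err)))                  # index of the point x'
        if j in ref or abs(err[j]) <= abs(E) * (1 + 1e-10) + 1e-14:
            return np.array(coeffs), abs(E)
        s = np.sign(err[j])
        if j < ref[0]:                                   # x' < x_0
            ref = [j] + (ref[1:] if s == np.sign(err[ref[0]]) else ref[:-1])
        elif j > ref[-1]:                                # x' > x_{n+1}
            ref = (ref[:-1] if s == np.sign(err[ref[-1]]) else ref[1:]) + [j]
        else:                                            # x_i < x' < x_{i+1}
            i = int(np.searchsorted(ref, j)) - 1
            ref[i if s == np.sign(err[ref[i]]) else i + 1] = j
    raise RuntimeError("exchange did not settle within max_iter steps")


coeffs, level = remes_first(np.exp, np.linspace(0.0, 1.0, 101), 1)
```

For $e^x$ on 101 equally spaced points of $[0, 1]$ with $n = 1$, one exchange suffices. The slope settles at $e - 1$ (both end-points stay in the reference), and the levelled error is about 0.106.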

Exercise 13. Construct a computer program to carry out the first algorithm of Remes and use it to determine some best approximation over a finite set of points.

1.5 $L_2$-norm approximation

The determination of the best polynomial in the $L_2$ or least squares norm involves considerations which are more conveniently handled in a rather more general setting. If $\int_a^b |f|^2\,dx$ exists we write $f \in L_2(a, b)$ or, more briefly, $f \in L_2$ when no confusion can arise. When $f \in L_2$ and $g \in L_2$ we can introduce the inner product $(f, g)$ by

$$(f, g) = \int_a^b f g^*\,dx \qquad (1.19)$$

where $g^*$ is the complex conjugate of $g$. Although we are only concerned with real functions at the moment, complex-valued ones will occur later, and it makes little difference to the analysis to cover both cases at once. We may verify that the right-hand side of (1.19) exists by deriving the Schwarz inequality. Clearly

$$\int_a^b (\lambda|f| + \mu|g|)^2\,dx \ge 0,$$

or

$$\lambda^2\int_a^b |f|^2\,dx + 2\lambda\mu\int_a^b |fg|\,dx + \mu^2\int_a^b |g|^2\,dx \ge 0$$

for any real $\lambda$ and $\mu$. The inequality on the quadratic form can hold only if

$$\Big(\int_a^b |fg|\,dx\Big)^2 \le \int_a^b |f|^2\,dx\int_a^b |g|^2\,dx,$$

whence

$$\Big|\int_a^b f g^*\,dx\Big|^2 \le \int_a^b |f|^2\,dx\int_a^b |g|^2\,dx,$$

which constitutes the Schwarz inequality. The norm $\|f\|$ of $f$ is defined by

$$\|f\| = (f, f)^{1/2}. \qquad (1.20)$$

(When other norms are considered, a suffix will be added to this norm to distinguish it from the others.) The norm is always positive unless $f = 0$ almost everywhere. Further consideration of norms will be found in §1.11. It may be remarked that, if $c$ is a complex constant,

$$(cf, g) = c(f, g), \qquad (f, cg) = c^*(f, g), \qquad \|cf\| = |c|\,\|f\|, \qquad (f, g) = (g, f)^*. \qquad (1.21)$$

From the Schwarz inequality

$$|(f, g)| \le \|f\|\,\|g\|. \qquad (1.22)$$

Also

$$\|f + g\|^2 \le \int_a^b (|f| + |g|)^2\,dx \le \Big\{\Big(\int_a^b |f|^2\,dx\Big)^{1/2} + \Big(\int_a^b |g|^2\,dx\Big)^{1/2}\Big\}^2 \qquad (1.23)$$

by the Schwarz inequality. This may be expressed as

$$\|f + g\| \le \|f\| + \|g\|. \qquad (1.24)$$

On replacing $f$ by $f_1 - f_2$ and $g$ by $f_2 - f_3$,

$$\|f_1 - f_3\| \le \|f_1 - f_2\| + \|f_2 - f_3\|. \qquad (1.25)$$

If the norm of $f$ is regarded as the length of $f$, (1.22) states that the modulus of the inner product of $f$ and $g$ is never greater than the product of their lengths. There is an obvious analogy with the scalar product of vectors and, if $(f, g) = 0$, we often say that $f$ and $g$ are orthogonal. Similarly, (1.25), expressed in terms of lengths, is the same as the triangle inequality of vectors. The distance between two functions $f_1$ and $f_2$ is $\|f_1 - f_2\|$ and is zero only when $f_1 = f_2$ almost everywhere. Approximation in the $L_2$-norm is an attempt to reduce the distance between two functions to a minimum, distance being understood in the sense above.

An important role is played by orthogonal elements. Suppose there is a finite or infinite set of functions $\phi_1, \phi_2, \dots$ of $L_2$ such that

$$(\phi_m, \phi_n) = 0 \quad (m \ne n), \qquad (1.26)$$
$$(\phi_n, \phi_n) = \|\phi_n\|^2 = 1. \qquad (1.27)$$


Such a set is said to be an orthonormal set, and (1.26) and (1.27) are often abbreviated to $(\phi_m, \phi_n) = \delta_{mn}$. Suppose we want to approximate a function $f \in L_2$ by means of an orthonormal set $\phi_1, \phi_2, \dots, \phi_N$ using the $L_2$-norm. Then we wish to choose the coefficients $c_n$ so that

$$\Big\|f - \sum_{n=1}^{N} c_n\phi_n\Big\|$$

is a minimum. Now, on account of (1.26) and (1.27),

$$\Big\|f - \sum_{n=1}^{N} c_n\phi_n\Big\|^2 = \|f\|^2 - \sum_{n=1}^{N}\{c_n^*(f, \phi_n) + c_n(\phi_n, f) - c_nc_n^*\} = \|f\|^2 - \sum_{n=1}^{N}|(f, \phi_n)|^2 + \sum_{n=1}^{N}|(f, \phi_n) - c_n|^2.$$

Only the third term contains the coefficients $c_n$ and, since no member of the series can be negative, it attains its smallest value of zero when

$$c_n = (f, \phi_n) \qquad (n = 1, 2, \dots, N). \qquad (1.28)$$

Thus (1.28) gives the rule for selecting the coefficients so that the norm is a minimum. When this choice is made,

$$\Big\|f - \sum_{n=1}^{N} c_n\phi_n\Big\|^2 = \|f\|^2 - \sum_{n=1}^{N}|c_n|^2. \qquad (1.29)$$

The left-hand side cannot be negative and so

$$\|f\|^2 \ge \sum_{n=1}^{N}|(f, \phi_n)|^2 = \sum_{n=1}^{N}|c_n|^2, \qquad (1.30)$$

which is known as Bessel's inequality. An orthonormal set is said to be complete if, for every $f \in L_2$, there is a linear combination $\sum c_n\phi_n$ such that the $L_2$-norm of the difference is arbitrarily small. If $(f, \phi_m) = 0$ for every $\phi_m$ of a complete orthonormal set, all the coefficients $c_m$ given by (1.28) are zero, so that, by (1.29), the norm of the difference cannot be made arbitrarily small unless $f = 0$ almost everywhere. There is no loss of generality in assuming that the number of elements in a complete orthonormal set is infinite. Letting $N \to \infty$ in Bessel's inequality (1.30), we obtain

$$\sum_{n=1}^{\infty}|(f, \phi_n)|^2 \le \|f\|^2, \qquad (1.31)$$

which shows that the series on the left-hand side is convergent. Therefore

$$\Big\|\sum_{k=m}^{n}(f, \phi_k)\phi_k\Big\|^2 = \sum_{k=m}^{n}|(f, \phi_k)|^2$$

must tend to zero as $m$ and $n$ tend to infinity. It follows (from the Riesz-Fischer theorem) that there is a $g \in L_2$ such that

$$\lim_{n\to\infty}\Big\|g - \sum_{k=1}^{n}(f, \phi_k)\phi_k\Big\| = 0.$$

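From the Schwarz inequality (1.22),

$$\Big|\Big(g - \sum_{k=1}^{n}(f, \phi_k)\phi_k,\ \phi_m\Big)\Big| \le \Big\|g - \sum_{k=1}^{n}(f, \phi_k)\phi_k\Big\| \to 0 \qquad (n \to \infty).$$

Hence

$$(g, \phi_m) = \lim_{n\to\infty}\sum_{k=1}^{n}(f, \phi_k)(\phi_k, \phi_m) = (f, \phi_m).$$

Consequently $(g - f, \phi_m) = 0$ for $m = 1, 2, \dots$, and since the orthonormal set is complete our earlier remarks entail $f = g$. We may summarize this by saying: if $\phi_1, \phi_2, \dots$ is a complete orthonormal set, every $f \in L_2$ can be expressed as

$$f = \sum_{n=1}^{\infty}(f, \phi_n)\phi_n,$$

the equality being understood to mean that

$$\lim_{n\to\infty}\Big\|f - \sum_{k=1}^{n}(f, \phi_k)\phi_k\Big\| = 0.$$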

It follows from (1.29) that, for a complete orthonormal set, $f = \sum_{n=1}^{\infty} c_n\phi_n$ implies that

$$\|f\|^2 = \sum_{n=1}^{\infty}|c_n|^2. \qquad (1.32)$$

If $g = \sum_{n=1}^{\infty} b_n\phi_n$ and we apply (1.32) to $f + g$, $f - g$, $f + ig$, $f - ig$ then, from the identity

$$\|f + g\|^2 - \|f - g\|^2 + i\|f + ig\|^2 - i\|f - ig\|^2 = 4(f, g),$$

is derived Parseval's formula

$$(f, g) = \sum_{n=1}^{\infty} c_nb_n^*.$$

Given a set of linearly independent elements $\psi_1, \psi_2, \dots$ whose linear combinations will approximate any $f \in L_2$ arbitrarily closely in the $L_2$-norm, we can always manufacture a complete orthonormal set by a method known as the Schmidt process. First define $\phi_1$ by $\phi_1 = \psi_1/\|\psi_1\|$. Then pick $\phi_2 = g_2/\|g_2\|$, where $g_2 = \psi_2 - (\psi_2, \phi_1)\phi_1$; $g_2$ cannot be zero because $\psi_1$ and $\psi_2$ are linearly independent. Clearly $(\phi_2, \phi_1) = 0$. In general, $\phi_n = g_n/\|g_n\|$, where

$$g_n = \psi_n - \sum_{k=1}^{n-1}(\psi_n, \phi_k)\phi_k.$$
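The Schmidt process can be sketched with NumPy's `Polynomial` class. The choices here are illustrative assumptions: the inner product (1.19) on $[-1, 1]$ for real polynomials, and $\psi_n = x^{n-1}$ as the starting set. With these choices the $\phi_n$ come out as multiples of the Legendre polynomials, for instance $\phi_3 = (5/8)^{1/2}(3x^2 - 1)$. The helper names `inner` and `schmidt` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def inner(f, g, a=-1.0, b=1.0):
    """(f, g) = integral from a to b of f g dx, eqn (1.19), for real polynomials."""
    antiderivative = (f * g).integ()
    return float(antiderivative(b) - antiderivative(a))


def schmidt(psis, a=-1.0, b=1.0):
    """phi_n = g_n / ||g_n||, with g_n = psi_n - sum_k (psi_n, phi_k) phi_k."""
    phis = []
    for psi in psis:
        g = psi
        for phi in phis:
            g = g - inner(psi, phi, a, b) * phi
        phis.append(g / inner(g, g, a, b) ** 0.5)
    return phis


monomials = [Polynomial([0.0] * k + [1.0]) for k in range(4)]   # 1, x, x^2, x^3
phis = schmidt(monomials)
```

Each `phis[n].coef` lists coefficients from the constant term upward. Changing `a` and `b`, or the starting set, gives other orthonormal families by the same construction.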

It is important to observe that in the whole of the preceding discussion concerning the minimization of the norm we have not used the specific form (1.19) but only properties of the inner product such as (1.20), (1.21), (1.22), and (1.24). Therefore we can draw the same conclusions if the inner product is defined in another way, so long as it has the properties (1.20), (1.21), (1.22), and (1.24). For instance, if we choose

$$(f, g) = \sum_{i=1}^{M} f(x_i)g^*(x_i)$$

for some fixed $x_i$, we can easily verify that the properties are valid, and so we may deduce (with the $\phi_n$ now orthonormal with respect to this inner product) that $\|f - \sum_{n=1}^{N} c_n\phi_n\|^2$, or $\sum_{i=1}^{M}|f(x_i) - \sum_{n=1}^{N} c_n\phi_n(x_i)|^2$, is a minimum when

$$c_n = (f, \phi_n) = \sum_{i=1}^{M} f(x_i)\phi_n^*(x_i).$$
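With the discrete inner product, (1.28) gives the least-squares coefficients directly once $1, x, \dots, x^d$ have been orthonormalized on the data points. The NumPy sketch below (function name and data hypothetical) uses the modified form of the Schmidt process, which projects the running remainder rather than the original $\psi_n$ and is usually the more stable choice in floating point. It then checks the fitted values against `numpy.polyfit`, and the residual against (1.29).

```python
import numpy as np


def orthonormal_least_squares(x, y, degree):
    """Least-squares polynomial fit via polynomials orthonormal on the data points.

    Inner product (f, g) = sum_i f(x_i) g(x_i); coefficients c_n = (y, phi_n) as in (1.28).
    Returns the coefficients c_n and the fitted values at the data points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    phis = []
    for k in range(degree + 1):
        g = x ** k
        for phi in phis:
            g = g - np.dot(g, phi) * phi      # modified Gram-Schmidt step
        phis.append(g / np.linalg.norm(g))
    coeffs = np.array([np.dot(y, phi) for phi in phis])
    fitted = sum(c * phi for c, phi in zip(coeffs, phis))
    return coeffs, fitted


x_data = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]          # hypothetical data
y_data = [1.0, 1.3, 1.9, 2.2, 3.1, 3.9]
c, fitted = orthonormal_least_squares(x_data, y_data, 2)
```

No normal equations are formed: the "matrix" is the identity, which is the computational advantage mentioned in the text.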

It is this kind of problem which arises in fitting data at a discrete number of points by the method of least squares. Note that it is frequently a computational advantage to employ orthonormal polynomials for least squares rather than expansions in non-orthogonal functions because the matrices tend to be diagonally dominant even when round-off error is present. Another possibility is to take

$$(f, g) = \int_a^b w(x)f(x)g^*(x)\,dx$$

where $w$ is a real non-negative function. This corresponds to varying the contribution from the various parts of the interval according to the weight function $w$. In this connection there is the following interesting result:

THEOREM 1.5. If $\phi_1, \phi_2, \dots$ is an infinite orthonormal set of polynomials on the finite interval $[a, b]$ with weight function $w$, i.e.

$$\int_a^b w(x)\phi_m(x)\phi_n^*(x)\,dx = \delta_{mn},$$

then the orthonormal set is complete.

Proof. Theorem 1.4 ensures that, for continuous $f$, there is a polynomial $p(x)$ such that $|f(x) - p(x)|$ …

… $\phi_1 = 1/\pi^{1/2}$, $\phi_2 = (2/\pi)^{1/2}x$, $\phi_3 = (2/\pi)^{1/2}(2x^2 - 1), \dots$,

which are multiples of the Chebyshev polynomials. The Chebyshev polynomial $T_n$ is defined by

$$T_n(x) = \cos(n\cos^{-1}x) = \frac{n!(-2)^n}{(2n)!}(1 - x^2)^{1/2}\frac{d^n}{dx^n}\big[(1 - x^2)^{n - 1/2}\big].$$

Some examples are

$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_2(x) = 2x^2 - 1, \qquad T_3(x) = 4x^3 - 3x.$$


The term of the highest power in $T_n$ is $2^{n-1}x^n$ $(n \ge 1)$. The Chebyshev polynomial has a celebrated property concerning the maximum norm, namely:

THEOREM 1.5a (CHEBYSHEV). Of all polynomials of degree $n$ in which the coefficient of the highest power is unity, the one with the smallest maximum norm on $[-1, 1]$ is $T_n(x)/2^{n-1}$, and

$$\|T_n/2^{n-1}\|_\infty = 1/2^{n-1}.$$

Here the notation $\|f\|_\infty$ is employed to signify the maximum norm, i.e. $\sup|f|$ over the appropriate interval which, in this case, is $[-1, 1]$.

Proof. Assume that there is a polynomial $p_n(x)$ of degree $n$ and with leading coefficient unity which is of smaller maximum norm than $T_n(x)/2^{n-1}$. Let

$$q(x) = p_n(x) - T_n(x)/2^{n-1}.$$

Then $q$ is a polynomial of degree at most $n - 1$. Since $p_n$ has a smaller norm than $T_n/2^{n-1}$, $q$ must be negative at the maxima of $T_n/2^{n-1}$ and positive at its minima. Now, putting $x = \cos\theta$, $T_n(\cos\theta) = \cos n\theta$, so that $T_n(x)$ has zeros at $x = \cos\{(2k - 1)\pi/2n\}$ for $k = 1, 2, \dots, n$ and attains its extreme values $\pm 1$ alternately at the $n + 1$ points $x = \cos(k\pi/n)$, $k = 0, 1, \dots, n$. Hence $q$ changes sign at least $n$ times and so must vanish at least $n$ times, which is contrary to its being a non-zero polynomial of degree at most $n - 1$. Thus the first part of the theorem is proved, and the second part follows from the form of $T_n$ when $x = \cos\theta$.

Another way of expressing Theorem 1.5a is to say that, of all polynomials of degree $n$ with maximum norm unity on $[-1, 1]$, $T_n(x)$ has the largest leading coefficient, namely $2^{n-1}$. Series of Chebyshev polynomials can be readily summed on the computer by taking advantage of the recurrence formula

$$T_{n+1}(x) - 2xT_n(x) + T_{n-1}(x) = 0.$$

For instance, if

$$f(x) = \sum_{n=0}^{N} a_nT_n(x),$$

define $b_{N+1} = 0$, $b_N = a_N$ and then calculate $b_{N-1}, \dots, b_1$ from

$$b_n = 2xb_{n+1} - b_{n+2} + a_n.$$

It follows from the recurrence formula for $T_n$ that $f(x) = a_0 - b_2 + xb_1$.

The round-off characteristics of this method are no worse than those of ordinary polynomial evaluation, and the same number of multiplications is used. In fact, the method can be used for any system of polynomials $p_n(x)$ which satisfies a recurrence relation of the form

$$p_{n+1}(x) - \alpha(x)p_n(x) + p_{n-1}(x) = 0$$

by putting

$$b_n = \alpha(x)b_{n+1} - b_{n+2} + a_n \qquad (b_{N+1} = 0,\ b_N = a_N)$$

and then

$$\sum_{n=0}^{N} a_np_n(x) = (a_0 - b_2)p_0(x) + b_1p_1(x).$$
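The Chebyshev case of this backward recurrence is short enough to state in full. A minimal Python sketch (function name illustrative) takes $b_n = 2xb_{n+1} - b_{n+2} + a_n$ with $b_{N+1} = b_{N+2} = 0$ and returns $a_0 - b_2 + xb_1$. It is checked against the definition $T_n(x) = \cos(n\cos^{-1}x)$.

```python
import math


def chebyshev_series(coeffs, x):
    """Sum a_0 T_0(x) + ... + a_N T_N(x) by the backward recurrence for b_n."""
    b1, b2 = 0.0, 0.0                      # b_{n+1} and b_{n+2}, both zero to start
    for a_n in reversed(coeffs[1:]):       # n = N, N - 1, ..., 1
        b1, b2 = 2.0 * x * b1 - b2 + a_n, b1
    return coeffs[0] - b2 + x * b1         # f(x) = a_0 - b_2 + x b_1


sample = [0.5, -1.0, 0.25, 2.0, -0.75]     # hypothetical a_0, ..., a_4
value = chebyshev_series(sample, 0.45)
direct = sum(a * math.cos(n * math.acos(0.45)) for n, a in enumerate(sample))
```

For another family obeying $p_{n+1} - \alpha(x)p_n + p_{n-1} = 0$, replace `2.0 * x` by $\alpha(x)$ and the final line by $(a_0 - b_2)p_0(x) + b_1p_1(x)$.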

Any power series can be expressed as an expansion in Chebyshev polynomials by employing formulae such as

$$1 = T_0(x), \qquad x = T_1(x), \qquad x^2 = \tfrac12\{T_0(x) + T_2(x)\}, \qquad x^3 = \tfrac14\{3T_1(x) + T_3(x)\}.$$

It is often possible to reduce the degree of an approximating polynomial, and thereby economize in computation, by exploiting the properties of Chebyshev polynomials. For example, if the function $f$ is approximated by the polynomial $p_{n+1}$, where

$$p_{n+1}(x) = \sum_{k=0}^{n+1} a_kx^k,$$

consider the polynomial $p_n$ defined by

$$p_n(x) = p_{n+1}(x) - a_{n+1}T_{n+1}(x)/2^n.$$

Then $p_n$ is of degree $n$ and

$$p_n - f = p_{n+1} - f - a_{n+1}T_{n+1}(x)/2^n.$$

Thus the error in $p_n$ does not exceed that in $p_{n+1}$ by more than $|a_{n+1}T_{n+1}(x)|/2^n$. Since $|T_{n+1}(x)| \le 1$ on $[-1, 1]$, this additional error can be quite small when $|a_{n+1}|/2^n$ is small enough. In other words, truncating the power series by removing the higher powers through subtraction of appropriate multiples of Chebyshev polynomials can lead to an effective measure of economization. Although the properties of Chebyshev polynomials have been described for the interval $[-1, 1]$, they can be extended to other finite intervals such as $[x_1, x_2]$ by first making the linear substitution $t = \{2x - (x_1 + x_2)\}/(x_2 - x_1)$, which maps $[x_1, x_2]$ onto $-1 \le t \le 1$.
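The economization step can be checked mechanically against Exercise 18. The sketch below uses Python with exact `fractions.Fraction` arithmetic; the function names are illustrative. It repeatedly removes the highest power by subtracting $a_kT_k(x)/2^{k-1}$, generating the power-basis coefficients of $T_k$ from $T_{k+1} = 2xT_k - T_{k-1}$. It also accumulates the bound $\sum |a_k|/2^{k-1}$ on the extra error over $[-1, 1]$. Starting from the Taylor series of $e^x$ to $x^5$, it reproduces $(382 + 383x + 208x^2 + 68x^3)/384$.

```python
from fractions import Fraction
from math import exp, factorial


def chebyshev_power_coeffs(k):
    """Power-basis coefficients of T_k (constant first) from T_{j+1} = 2x T_j - T_{j-1}."""
    t_prev, t_curr = [Fraction(1)], [Fraction(0), Fraction(1)]     # T_0, T_1
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        doubled = [Fraction(0)] + [2 * c for c in t_curr]                 # 2x T_j
        padded = t_prev + [Fraction(0)] * (len(doubled) - len(t_prev))     # T_{j-1}
        t_prev, t_curr = t_curr, [d - p for d, p in zip(doubled, padded)]
    return t_curr


def economize(coeffs, target_degree):
    """Lower the degree of sum coeffs[k] x^k on [-1, 1] by subtracting a_k T_k / 2^(k-1)."""
    coeffs = [Fraction(c) for c in coeffs]
    bound = Fraction(0)
    for k in range(len(coeffs) - 1, target_degree, -1):
        scale = coeffs[k] / 2 ** (k - 1)       # T_k has leading coefficient 2^(k-1)
        for j, t in enumerate(chebyshev_power_coeffs(k)):
            coeffs[j] -= scale * t
        bound += abs(scale)                    # |T_k| <= 1 on [-1, 1]
        coeffs.pop()                           # the x^k coefficient is now exactly zero
    return coeffs, bound


taylor = [Fraction(1, factorial(k)) for k in range(6)]     # e^x up to x^5
cubic, extra_error = economize(taylor, 3)
grid = [i / 50 - 1 for i in range(101)]
worst = max(abs(sum(float(c) * x ** k for k, c in enumerate(cubic)) - exp(x)) for x in grid)
```

Here `extra_error` is $11/1920 \approx 0.0057$. The neglected Taylor terms beyond $x^5$ add at most about 0.0016 on $[-1, 1]$, so the total stays below one unit in the second decimal place, as Exercise 18 states.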

Exercises

14. Express $1, x, \dots, x^5$ in terms of Legendre polynomials.

15. Find the polynomial of degree 2 which gives the best $L_2$-norm approximation to $e^x$ on $[0, 1]$.


16. The function $f(x)$ was determined experimentally and found to have the following values:

x       1.00   1.04   1.08   1.12   1.16   1.20
f(x)    8.41   8.63   8.82   9.00   9.17   9.32

Find the polynomial of degree 2 which gives the best approximation in the $L_2$-norm.

17. By making the substitution

$$x = \frac{(\sqrt2 + 1)y - 1}{(\sqrt2 - 1)y + 1}$$

express $\tan^{-1}y$ in terms of Chebyshev polynomials of $x$. If only those $T_n$ are retained for which $n \le 7$, show that the recurrence relation method gives $\tan^{-1}(1/\sqrt3) = 0.5235986$.

18. By starting from the Taylor series for $e^x$ up to powers of $x^5$, show that Chebyshev truncation leads to

$$e^x = (382 + 383x + 208x^2 + 68x^3)/384$$

with an error of not more than one unit in the second decimal place on $[-1, 1]$.

1.6 Rational approximation

Although Weierstrass's theorem tells us that any continuous function can be approximated by a polynomial as closely as we like on a finite interval, the degree of the polynomial may be unduly high for a specified level of accuracy. Again, the presence of a singularity in the complex plane near the real axis may render polynomial approximation awkward. For these reasons it is worth considering whether a rational function will give better accuracy as an approximant than a polynomial. It has been suggested (see, for example, Hart et al. (1968)) that for a given amount of computational effort rational functions give greater accuracy than polynomials. Consider the possibility of constructing a rational approximation to $f$ in a neighbourhood of the origin; there is no loss of generality in selecting the origin, since any other point can be converted to it by a simple change of variable. We try $p_m(x)/q_n(x)$, where $p_m$ and $q_n$ are polynomials of degree $m$ and $n$ respectively, and are supposed to have no common zero since, otherwise, it could be cancelled. One method of specifying $p_m$ and $q_n$ is to require that $p_m/q_n$ and its first $m + n$ derivatives agree with $f$ and its first $m + n$ derivatives at $x = 0$; it is then called a Padé approximant. For example, for a Padé approximant to $\ln(1 + x)$ with $m = 2$ and $n = 2$ we would want the coefficients in

$$(a_0 + a_1x + a_2x^2)/(b_0 + b_1x + b_2x^2)$$

chosen so that the expansion of the rational function near $x = 0$ was the same as $x - x^2/2 + \cdots$. To put it another way, we wish to make as many powers of

$x$ disappear from

$$a_0 + a_1x + a_2x^2 - (b_0 + b_1x + b_2x^2)(x - \tfrac12x^2 + \tfrac13x^3 - \tfrac14x^4 + \cdots)$$

as possible. Therefore select

$$a_0 = 0, \qquad a_1 = b_0, \qquad a_2 = b_1 - \tfrac12b_0, \qquad b_2 - \tfrac12b_1 + \tfrac13b_0 = 0, \qquad -\tfrac12b_2 + \tfrac13b_1 - \tfrac14b_0 = 0$$

so as to eliminate powers up to and including $x^4$; if we tried to remove $x^5$ as well, we should find $b_0 = b_1 = b_2 = 0$, which is obviously unacceptable. Since we have one more coefficient than equations, we normalize by putting $b_0 = 1$. Then $a_1 = 1$, $b_1 = 1$, $a_2 = \tfrac12$, $b_2 = \tfrac16$, and the Padé approximant to $\ln(1 + x)$ is

$$\frac{x + \tfrac12x^2}{1 + x + \tfrac16x^2},$$

agreeing with $\ln(1 + x)$ up to and including the power $x^4$. Other Padé approximants can, of course, be constructed by choosing different values of $m$ and $n$ but, as a matter of practice, it is usually found that the best approximations are obtained by taking $m = n$ or possibly $m = n + 1$, provided that $f$ has a Taylor expansion at the origin. An alternative form of rational approximation may be derived from Obreschkoff's formula

$$\sum_{k=0}^{n}(-1)^k\frac{n!(m + n - k)!}{(n - k)!(m + n)!}\frac{(x - x_1)^k}{k!}f^{(k)}(x) = \sum_{k=0}^{m}\frac{m!(m + n - k)!}{(m - k)!(m + n)!}\frac{(x - x_1)^k}{k!}f^{(k)}(x_1) + \frac{1}{(m + n)!}\int_{x_1}^{x}(x - t)^m(x_1 - t)^nf^{(m+n+1)}(t)\,dt$$

which may be verified by integrating the integral by parts $m + n + 1$ times. The integral is effectively of order $(x - x_1)^{m+n+1}$ and so can be ignored to a first approximation; its explicit form can be used to provide an estimate of the error made in such neglect. As an example, let $f(x) = x^\mu$ and $x_1 = 1$. Then, dropping the integral, we have with $m = n = 1$

$$x^\mu - \tfrac12(x - 1)\mu x^{\mu - 1} = 1 + \tfrac12(x - 1)\mu$$

or

$$x^\mu = \frac{x(2 - \mu + \mu x)}{\mu + (2 - \mu)x}$$

as a rational approximation valid near $x = 1$ for any real $\mu$.
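The Padé construction worked through for $\ln(1 + x)$ earlier in this section reduces to a small linear system. With $b_0 = 1$, the coefficients of $x^{m+1}, \dots, x^{m+n}$ in $q_n f - p_m$ must vanish, after which the $a_k$ follow by multiplication. The sketch below is Python with exact rational arithmetic; the function names are hypothetical, and the system is assumed non-singular.

```python
from fractions import Fraction
from math import factorial


def solve_exact(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic (non-singular system assumed)."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def pade(c, m, n):
    """[m/n] Pade coefficients (a, b) from Taylor coefficients c[0..m+n], with b_0 = 1."""
    def coef(k):
        return c[k] if k >= 0 else Fraction(0)

    # Coefficients of x^k, m < k <= m + n, in q_n f - p_m must vanish.
    matrix = [[coef(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    rhs = [-coef(k) for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + solve_exact(matrix, rhs)
    # a_k is the coefficient of x^k in q_n f for k <= m.
    a = [sum(b[j] * coef(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b


log_series = [Fraction(0)] + [Fraction((-1) ** (k + 1), k) for k in range(1, 5)]
a_log, b_log = pade(log_series, 2, 2)        # ln(1 + x)
exp_series = [Fraction(1, factorial(k)) for k in range(5)]
a_exp, b_exp = pade(exp_series, 2, 2)        # e^x
```

For $\ln(1 + x)$ this returns $a = (0, 1, \tfrac12)$ and $b = (1, 1, \tfrac16)$, as found by hand above. For $e^x$ with $m = n = 2$ it gives $(1 + \tfrac12x + \tfrac1{12}x^2)/(1 - \tfrac12x + \tfrac1{12}x^2)$.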


Padé approximants usually become increasingly inaccurate as $|x|$ increases. Consequently, attempts have been made to minimize the maximum of $|p_m/q_n - f|$ over an interval. Something like the second algorithm of Remes (§1.4) can be constructed, but the algorithm may not converge if the initial approximation is not sufficiently good and, in any case, the solution of non-linear equations is involved at each stage. A convenient method for evaluating rational functions is by continued fractions, which may also arise in other contexts in numerical work. (Expansions for numerous functions in polynomials, Chebyshev polynomials, rational functions, and continued fractions can be found in Abramowitz and Stegun (1965).) To fabricate a continued fraction, suppose we are given $m/n$. Divide $m$ by $n$; let $a_1$ be the quotient and $p$ the remainder, so that

$$\frac{m}{n} = a_1 + \frac{p}{n} = a_1 + \frac{1}{n/p}.$$

Divide $n$ by $p$; let $a_2$ be the quotient and $q$ the remainder; then

$$\frac{n}{p} = a_2 + \frac{q}{p} = a_2 + \frac{1}{p/q}.$$

Proceeding in this way we obtain

$$\frac{m}{n} = a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}} = a_1 + \frac{1}{a_2 +}\,\frac{1}{a_3 +}\cdots$$

More generally we can consider expressions of the form

$$b_0 + \frac{a_1}{b_1 +}\,\frac{a_2}{b_2 +}\cdots.$$

If the number of terms is finite, the expression is called a terminating continued fraction. Otherwise it is called an infinite continued fraction, and the terminating fraction

$$f_n = b_0 + \frac{a_1}{b_1 +}\,\frac{a_2}{b_2 +}\cdots\frac{a_n}{b_n}$$

is called the $n$th convergent. If $\lim_{n\to\infty} f_n$ exists, an infinite continued fraction is said to be convergent. It can be proved that, if $a_i = 1$ and the $b_i$ $(i \ge 1)$ are positive integers, convergence is always secured. If $f_n = A_n/B_n$, it may easily be verified that

$$A_n = b_nA_{n-1} + a_nA_{n-2}, \qquad (1.33)$$
$$B_n = b_nB_{n-1} + a_nB_{n-2}, \qquad (1.34)$$

subject to $A_{-1} = 1$, $A_0 = b_0$, $B_{-1} = 0$, $B_0 = 1$. Hence

$$f_{n+1} - f_n = -a_{n+1}B_{n-1}(f_n - f_{n-1})/B_{n+1}.$$

If the $a_i$ and $b_i$ are all positive, (1.34) indicates that $0 < a_{n+1}B_{n-1}/B_{n+1} < 1$. Thus $f_{n+1} - f_n$ is numerically less than, and of opposite sign to, $f_n - f_{n-1}$. Now, in this case, $b_0$ is less than the continued fraction since part is omitted, while the convergent $b_0 + a_1/b_1$ is greater than the continued fraction because the denominator is too small. Following this route we conclude that, when the $a_i$ and $b_i$ are positive, every convergent of odd order is greater than the continued fraction and every convergent of even order is less than the continued fraction; moreover

$$f_0 < f_2 < f_4 < \cdots < f_5 < f_3 < f_1,$$

so that the convergents of odd order steadily decrease while those of even order steadily increase. These properties make continued fractions very convenient for computation. Since an equivalent terminating continued fraction can be manufactured for any rational function (clearly, a terminating continued fraction in which the $a_i$ and $b_i$ are polynomials is equivalent to a rational function), the rational function may be evaluated in continued-fraction form more economically, as far as the number of arithmetical operations is concerned, than by calculating the numerator and denominator separately and then dividing. For the conversion of series the following terminating continued fractions may be noted:

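… (1.35)

$$\frac{1}{u_1} + \frac{1}{u_2} + \cdots + \frac{1}{u_n} = \frac{1}{u_1 -}\,\frac{u_1^2}{u_1 + u_2 -}\,\frac{u_2^2}{u_2 + u_3 -}\cdots\frac{u_{n-1}^2}{u_{n-1} + u_n}, \qquad (1.36)$$

$$\frac{1}{a_0} - \frac{x}{a_0a_1} + \frac{x^2}{a_0a_1a_2} - \cdots + \frac{(-1)^nx^n}{a_0a_1\cdots a_n} = \frac{1}{a_0 +}\,\frac{a_0x}{a_1 - x +}\,\frac{a_1x}{a_2 - x +}\cdots\frac{a_{n-1}x}{a_n - x}. \qquad (1.37)$$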

Infinite series may be handled via

$$\sum_{n=0}^{\infty}a_nx^n = \frac{\alpha_0}{1-}\,\frac{\alpha_1x}{1+\alpha_1x-}\,\frac{\alpha_2x}{1+\alpha_2x-}\cdots \tag{1.38}$$

where $\alpha_0 = a_0$, $\alpha_n = a_n/a_{n-1}$ $(n \ge 1)$. Alternative expressions can be derived by using the fact that the $n$th convergent can be written as

$$b_0 + \frac{c_1a_1}{c_1b_1+}\,\frac{c_1c_2a_2}{c_2b_2+}\,\frac{c_2c_3a_3}{c_3b_3+}\cdots\frac{c_{n-1}c_na_n}{c_nb_n}$$

for arbitrary non-zero $c_i$.
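As a check on (1.38), its terminating form with $n + 1$ terms reproduces the partial sum $\sum_{k=0}^{n}a_kx^k$ exactly. The sketch below evaluates that terminating continued fraction from the bottom up with exact fractions. `euler_cf` is a hypothetical helper name, and the sketch assumes $a_0, \ldots, a_{n-1}$ are non-zero (so that the ratios $\alpha_k$ exist) and that no intermediate denominator vanishes.

```python
from fractions import Fraction
from math import factorial


def euler_cf(coeffs, x):
    """Evaluate alpha_0/(1-) alpha_1 x/(1 + alpha_1 x -) ... alpha_n x/(1 + alpha_n x).

    coeffs = [a_0, ..., a_n]; alpha_0 = a_0 and alpha_k = a_k / a_{k-1} as in (1.38).
    """
    n = len(coeffs) - 1
    if n == 0:
        return coeffs[0]
    alpha = [coeffs[0]] + [coeffs[k] / coeffs[k - 1] for k in range(1, n + 1)]
    d = 1 + alpha[n] * x  # innermost denominator
    for k in range(n - 1, 0, -1):
        d = 1 + alpha[k] * x - alpha[k + 1] * x / d
    return alpha[0] / (1 - alpha[1] * x / d)


# Taylor coefficients of e^x, kept exact so the comparison is exact.
coeffs = [Fraction(1, factorial(k)) for k in range(8)]
x = Fraction(1, 2)
partial_sum = sum(c * x**k for k, c in enumerate(coeffs))
print(euler_cf(coeffs, x) == partial_sum)  # True
```

Working in `Fraction` rather than floating point makes the identity checkable with `==`; with floats the two sides would agree only to rounding error.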

Exercises

19. (i) Construct the Padé approximant with $m = n = 2$ for $e^x$ in the neighbourhood of the origin. (ii) Find the maximum norm of the difference between the Padé approximant and $e^x$ on $[0, 1]$. Compare your result with the polynomial of degree 5 obtained by the first algorithm of Remes with $S$ the set $0(0.1)1$.
20. Find the Padé approximants with (i) $m = 2$, $n = 2$, (ii) $m = 3$, $n = 2$ for $\sin x$ near the origin.
21. Use Obreschkoff's formula to obtain the approximations (i) $\ln(1 + x)$

=-

=

6(x

+ 2) 2 2 (2x + 1)

3x - 3),

-

x2

1

(ii)

$$e^x \approx \frac{1 + \frac12x + \frac1{12}x^2}{1 - \frac12x + \frac1{12}x^2}$$

near the origin. How does (ii) compare with 19(i)?

22. Find $a$, $b$, and $c$ so that

$$\max_{0\le x\le 1}\left|e^x - \frac{a + bx}{1 + cx}\right|$$

is a minimum. Compare the corresponding Padé approximant with $m = n = 1$.
23. Calculate successive convergents to
(i) $2 + \frac{1}{6+}\,\frac{1}{1+}\,\frac{1}{1+}\,\frac{1}{11+}\,\frac{1}{2}$,
(ii) $\frac{1}{2+}\,\frac{1}{2+}\,\frac{1}{3+}\,\frac{1}{1+}\,\frac{1}{4+}\,\frac{1}{2+}\,\frac{1}{6}$.
24. A metre equals 1.0936 yards. Find limits to the error in taking 222/203 yards as equivalent to a metre.
25. Show that
(i) $\tan x = \frac{x}{1-}\,\frac{x^2}{3-}\,\frac{x^2}{5-}\cdots$,
(ii) $\ln\dfrac{1+x}{1-x} = \frac{2x}{1-}\,\frac{x^2}{3-}\,\frac{(2x)^2}{5-}\,\frac{(3x)^2}{7-}\cdots$.
26. The numerator and denominator of a rational function, both of degree $n$, are expressed in terms of Chebyshev polynomials. Obtain the formulae converting it to a continued fraction of the form

$$a_0 + \frac{b_1}{x+a_1+}\,\frac{b_2}{x+a_2+}\cdots$$

1.7 Trigonometric interpolation

The approximation of a function $f$ on $[0, 2\pi]$ by a series of the form

$$\tfrac12a_0 + \sum_{n=1}^{N}(a_n\cos nx + b_n\sin nx)$$

is a particular case of the general theory developed in §1.5. Nevertheless some of the formulae are of interest and will be needed subsequently. By the general theory the best $L_2$-norm approximation to $f$ is obtained when $a_n = \alpha_n$ and $b_n = \beta_n$, where

$$\alpha_n = \frac1\pi\int_0^{2\pi}f(x)\cos nx\,dx, \qquad \beta_n = \frac1\pi\int_0^{2\pi}f(x)\sin nx\,dx.$$

The coefficients $\alpha_n$ and $\beta_n$ are, of course, those which would occur in the infinite Fourier series representation of $f$. This infinite series may not converge to $f$ but, if $f$ has only a finite number of discontinuities which are finite jumps, the series converges to $\tfrac12\{f(x+0) + f(x-0)\}$ at interior points and to $\tfrac12\{f(0+0) + f(2\pi-0)\}$ at $x = 0, 2\pi$ (when $f$ is piecewise smooth). However, since at the moment we are concerned with finite trigonometric series, the problem of convergence does not arise. Suppose now that we ask that the trigonometric expansion be specified not by the $L_2$-norm but by being required to agree with $f$ at certain points. Let the points be chosen as $kh$ $(k = 0, 1, \ldots, M)$, where $M$ is a positive integer and $h = 2\pi/M$. Then we try to find $a_n$ and $b_n$ so that

$$\tfrac12a_0 + \sum_{n=1}^{N}(a_n\cos nkh + b_n\sin nkh) = \begin{cases} f(kh) & (k = 1, \ldots, M-1),\\ \tfrac12\{f(0) + f(2\pi)\} & (k = 0, M).\end{cases} \tag{1.39}$$

Now

$$\sum_{k=1}^{M}e^{inkh} = \frac{e^{inh} - e^{i(M+1)nh}}{1 - e^{inh}}$$

unless $e^{inh} = 1$. But $e^{iMnh} = 1$, since $Mh = 2\pi$ and $n$ is an integer, and so the series is zero if $e^{inh} \neq 1$. If, however, $e^{inh} = 1$, each term in the series is 1 and so

$$\sum_{k=1}^{M}e^{inkh} = \begin{cases} M & (\text{if } n/M \text{ is an integer}),\\ 0 & (\text{otherwise}),\end{cases} \tag{1.40}$$

since $n/M$ being an integer is the condition for $e^{inh} = 1$. If $m$ and $n$ are integers we see from (1.40) that

$$\sum_{k=1}^{M}e^{i(m+n)kh} = M \quad (\text{if } (m+n)/M \text{ is an integer}), \qquad \sum_{k=1}^{M}e^{i(n-m)kh} = M \quad (\text{if } (n-m)/M \text{ is an integer}),$$

and otherwise the sum of each series is zero. Since $\cos nkh\cos mkh = \tfrac12\Re\{e^{i(m+n)kh} + e^{i(n-m)kh}\}$, with $\Re$ denoting the real part, it follows that

$$\sum_{k=1}^{M}\cos nkh\cos mkh = 0 \ \text{ or } \ \tfrac12M \ \text{ or } \ M \tag{1.41}$$

according as (a) neither $(n+m)/M$ nor $(n-m)/M$ is an integer, (b) one but not both of $(n+m)/M$ and $(n-m)/M$ is an integer, (c) both $(n+m)/M$ and $(n-m)/M$ are integers. Similarly, from

$$\sum_{k=1}^{M}\sin nkh\sin mkh = \tfrac12\Re\sum_{k=1}^{M}\{e^{i(n-m)kh} - e^{i(n+m)kh}\},$$

$$\sum_{k=1}^{M}\cos nkh\sin mkh = \tfrac12\Im\sum_{k=1}^{M}\{e^{i(n+m)kh} - e^{i(n-m)kh}\},$$

where $\Im$ denotes the imaginary part, we deduce that

$$\sum_{k=1}^{M}\sin nkh\sin mkh = 0 \ \text{ or } \ -\tfrac12M \ \text{ or } \ \tfrac12M \tag{1.42}$$

according as (a) both $(n+m)/M$ and $(n-m)/M$ are integers or neither is, (b) $(n+m)/M$ is an integer but $(n-m)/M$ is not, (c) $(n-m)/M$ is an integer but $(n+m)/M$ is not, and that

$$\sum_{k=1}^{M}\cos nkh\sin mkh = 0. \tag{1.43}$$

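Multiply the $k$th equation of (1.39) by $\cos mkh$, where $m$ is one of the integers $0, \ldots, N$, and add. Then

$$\sum_{k=1}^{M-1}f(kh)\cos mkh + \tfrac12\{f(0) + f(2\pi)\}\cos 2\pi m = \sum_{k=1}^{M}\Big\{\tfrac12a_0 + \sum_{n=1}^{N}(a_n\cos nkh + b_n\sin nkh)\Big\}\cos mkh. \tag{1.44}$$

Suppose now that $M$ is even; select $N = \tfrac12M$. Then, from (1.40), (1.41), and (1.43), the right-hand side of (1.44) is $\tfrac12Ma_m$ if $m \neq \tfrac12M$ and $Ma_N$ if $m = \tfrac12M = N$. In a similar way the right-hand side of

$$\sum_{k=1}^{M-1}f(kh)\sin mkh + \tfrac12\{f(0) + f(2\pi)\}\sin 2\pi m = \sum_{k=1}^{M}\Big\{\tfrac12a_0 + \sum_{n=1}^{N}(a_n\cos nkh + b_n\sin nkh)\Big\}\sin mkh$$

is $\tfrac12Mb_m$ when $m \neq 0, \tfrac12M$ (when $m = N$ both sides vanish, because $\sin Nkh = \sin k\pi = 0$, so no $\sin Nx$ term is needed). Thus the solution to our problem when $M$ is even is

$$\tfrac12a_0 + \tfrac12a_N\cos Nx + \sum_{n=1}^{N-1}(a_n\cos nx + b_n\sin nx),$$

where $N = \tfrac12M$ and

$$a_m = \frac2M\sum_{k=1}^{M}f(kh)\cos mkh, \tag{1.45}$$

$$b_m = \frac2M\sum_{k=1}^{M}f(kh)\sin mkh \tag{1.46}$$

(the factor $\tfrac12$ on $a_N$ arises because the right-hand side of (1.44) is $Ma_N$, not $\tfrac12Ma_N$, when $m = N$),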

with the understanding that $f(Mh)$ means $\tfrac12\{f(0) + f(2\pi)\}$. It will be observed that there is no other solution since the coefficients $a_m$ and $b_m$ vanish when $f$ is zero in (1.45) and (1.46). If $M$ is odd, an analogous procedure gives the expansion

$$\tfrac12a_0 + \sum_{n=1}^{N}(a_n\cos nx + b_n\sin nx),$$

where $N = \tfrac12(M - 1)$ and the coefficients $a_m$, $b_m$ are still given by (1.45) and (1.46).
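Formulae (1.45) and (1.46) translate directly into code. The sketch below, in Python with only the standard `math` module, builds the interpolant for even or odd $M$, using the convention that $f(Mh)$ stands for $\tfrac12\{f(0) + f(2\pi)\}$; the name `trig_interpolant` is illustrative.

```python
import math


def trig_interpolant(f, M):
    """Trigonometric interpolant of f at x_k = k h, h = 2*pi/M, via (1.45)-(1.46).

    As in the text, the value at k = M is taken as (f(0) + f(2*pi))/2.
    """
    h = 2 * math.pi / M
    vals = [f(k * h) for k in range(1, M)] + [0.5 * (f(0.0) + f(2 * math.pi))]
    N = M // 2                      # N = M/2 (M even) or (M - 1)/2 (M odd)
    even = M % 2 == 0

    def coeff(trig, m):
        return 2 / M * sum(v * trig(m * k * h) for k, v in enumerate(vals, start=1))

    a = [coeff(math.cos, m) for m in range(N + 1)]
    b = [coeff(math.sin, m) for m in range(N + 1)]

    def p(x):
        s = 0.5 * a[0]
        for n in range(1, N if even else N + 1):
            s += a[n] * math.cos(n * x) + b[n] * math.sin(n * x)
        if even:
            s += 0.5 * a[N] * math.cos(N * x)   # half weight on cos Nx, no sin Nx term
        return s

    return p


def g(x):
    return 1 + 2 * math.cos(x) + 3 * math.sin(2 * x)


p = trig_interpolant(g, 8)
print(abs(p(0.3) - g(0.3)) < 1e-12)  # True: g is a trig polynomial of degree 2 < N = 4
```

Because $g$ is itself a trigonometric polynomial of low degree, the interpolant reproduces it everywhere, not only at the nodes $kh$.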

The analysis of the inner product $\sum f(x_i)g^*(x_i)$ in §1.5 demonstrates that not only does the trigonometric polynomial agree with the function at the specified points, but it is also the same as would be obtained by the method of least squares in fitting the data by a trigonometric polynomial of degree $N$.

Exercises

27a. Find the trigonometric interpolant on $[0, 2\pi]$ for $f(x) = x$ with $M = 4$ and show that it is badly in error at the end-points.
27b. If $f(x) = x$ $(0 \le x \le \pi)$, $= 2\pi - x$ $(\pi \le x \le 2\pi)$, obtain the trigonometric interpolant when $M = 3$ and when $M = 10$. Compare the graphs of the interpolants with the original function.

SOLUTION OF EQUATIONS

1.8 Solution of an equation

Often one is faced with the problem of finding the values of $x$ which satisfy an equation of the form

$$f(x) = 0. \tag{1.47}$$

Such a value of $x$ is called a root of (1.47) or a zero of $f$. Since the number of equations which can be solved analytically is very limited, the devising of numerical techniques is of paramount importance. It is necessary to be aware right from the start that it will rarely be possible to find the roots of (1.47) exactly by numerical methods. There are several reasons for this. In the first place, unless $f$ is a very elementary function, it will usually have to be replaced by some approximant, perhaps one of the types discussed in preceding sections. Such replacement is bound to introduce some error. Secondly, any computation will usually involve round-off error. Thirdly, any computer can carry only a certain set of rational numbers, so that if a root of (1.47) is not a rational number, or is a rational number outside the computer set, its representation in the computer must inevitably be in error. Given that these sources of error are virtually inescapable, it is vital to arrange that techniques produce answers which can be related to the roots of (1.47) and, in particular, do not supply more or fewer zeros of $f$ than were originally present.

Suppose that $f$ is continuous for $a \le x \le b$ and that $f(a)$ and $f(b)$ have opposite signs, i.e. $f(a)f(b) < 0$. Then we know that $f(x) = 0$ has at least one root in $[a, b]$. In the bisection method we aim to locate a root by taking a sequence of intervals, each half the size of the previous one and each containing a root. The actual algorithm is: define $a_0 = a$, $b_0 = b$ and then form the numbers $a_1, b_1, a_2, b_2, \ldots$ successively by the following procedure. Put

$$c_r = \tfrac12(a_{r-1} + b_{r-1})$$

and calculate $f(c_r)$. If $f(c_r) = 0$ then $x = c_r$ is the root sought. If $f(c_r) \neq 0$ then either (i) $f(c_r)f(a_{r-1}) > 0$, and then we define $a_r = c_r$, $b_r = b_{r-1}$, or (ii) $f(c_r)f(a_{r-1}) < 0$, and then we define $a_r = a_{r-1}$, $b_r = c_r$. Stop the process when $|a_r - b_r| \le \epsilon$, where $\epsilon$ is some pre-assigned number. In general $\epsilon$ is selected so that the desired accuracy is attained or so as to keep the number of iterations down to a specified level. The convergence of the process is governed by Theorem 1.8.
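The bisection algorithm just described can be sketched in Python as follows. The function name and the choice to return the final bracketing interval $[a_r, b_r]$ are illustrative. Because floating-point numbers are discrete, the sketch also stops if the midpoint can no longer be separated from an end-point; otherwise a very small $\epsilon$ could make the loop run forever.

```python
def bisection(f, a, b, eps):
    """Return (a_r, b_r) with |a_r - b_r| <= eps bracketing a root of f."""
    fa = f(a)
    if fa * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while abs(b - a) > eps:
        c = 0.5 * (a + b)           # c_r = (a_{r-1} + b_{r-1}) / 2
        if c == a or c == b:        # interval can no longer be halved in floating point
            break
        fc = f(c)
        if fc == 0:
            return c, c             # exact root found
        if fc * fa > 0:             # case (i): a_r = c_r, b_r = b_{r-1}
            a, fa = c, fc
        else:                       # case (ii): a_r = a_{r-1}, b_r = c_r
            b = c
    return a, b


lo, hi = bisection(lambda x: x * x - 2, 1.0, 2.0, 1e-10)
print(0.5 * (lo + hi))  # close to sqrt(2)
```

Taking the midpoint of the final interval as the estimate gives the error bound in part (ii) of Theorem 1.8.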

THEOREM 1.8. Under the conditions of the algorithm

(i) $b_r - a_r = (b - a)/2^r$

and, if $x_0$ is a root of $f(x) = 0$ lying in every interval $[a_r, b_r]$,

(ii) $|x_0 - \tfrac12(a_r + b_r)| \le \tfrac12(b_r - a_r) = (b - a)/2^{r+1}$.

Proof. If (i) of the algorithm applies,

$$b_r - a_r = b_{r-1} - c_r = \tfrac12(b_{r-1} - a_{r-1}).$$

If (ii) applies,

$$b_r - a_r = c_r - a_{r-1} = \tfrac12(b_{r-1} - a_{r-1}).$$

…

THEOREM 1.8b. … If $x_0 - a < x_1 < x_0 + a$ the iteration (1.50) has the properties (i) $x_0 - a < x_r < x_0 + a$, (ii) $\lim_{r\to\infty}x_r = x_0$, (iii) $|x_{r+1} - x_0| \le M^r|x_2 - x_1|/(1 - M)$.

The result (i) ensures that all iterates stay within the given interval while (ii) shows that the iteration converges to the root. An estimate of the distance of an iterate from the root is supplied by (iii).

Proof. Assume firstly that, for some $r$, $x_0 - a < x_r < x_0 + a$. Then

$$|x_{r+1} - x_0| = |F(x_r) - F(x_0)| \le M|x_r - x_0| \tag{1.53}$$

from (1.52). Hence $|x_{r+1} - x_0| < a$. Therefore, if the result is true for $r$ it is true for $r + 1$. Since $|x_1 - x_0| < a$, the validity of (i) follows by induction. Inequality (1.53) implies that

$$|x_{r+1} - x_0| \le M^r|x_1 - x_0|,$$

whence $\lim_{r\to\infty}|x_{r+1} - x_0| = 0$ and (ii) is proved. Further,

$$|x_2 - x_0| = |F(x_1) - F(x_2) + F(x_2) - F(x_0)| \le M|x_1 - x_2| + M|x_2 - x_0|,$$

so that $|x_2 - x_0| \le M|x_1 - x_2|/(1 - M)$. From (1.53), $|x_{r+1} - x_0| \le M^{r-1}|x_2 - x_0|$, and the proof of the theorem is finished.

THEOREM 1.8c. If $F$ is continuous and differentiable on $[x_0 - a, x_0 + a]$, where $x_0 = F(x_0)$, and $|F'(x)| \le M < 1$, then Theorem 1.8b holds and

$$\lim_{r\to\infty}\frac{x_{r+1} - x_0}{x_r - x_0} = F'(x_0).$$

Proof. We have already seen that the differentiability of $F$ entails the conditions of Theorem 1.8b, so only the last part needs proof. Now

$$\lim_{r\to\infty}\frac{x_{r+1} - x_0}{x_r - x_0} = \lim_{r\to\infty}\frac{F(x_r) - F(x_0)}{x_r - x_0} = F'(x_0)$$

from the definition of a derivative and Theorem 1.8b(ii).

It should be remarked that Theorem 1.8c states that the iteration converges if $|F'| < 1$, but this does not imply that the iteration diverges if $|F'| \ge 1$. In fact, we could permit $F'(x_0) = 1$ without losing convergence. More generally, if $x - F(x) > 0$ and $F'(x) > 0$ for $x_0 < x \le x_0 + a$, then $x_0 < x_r \le x_0 + a$ has the consequence $x_{r+1} = F(x_r) < x_r$, while the mean value theorem

$$x_{r+1} - x_0 = (x_r - x_0)F'(c_r),$$

with $c_r$ between $x_r$ and $x_0$, shows that $x_{r+1} > x_0$. Therefore, if $x_0 < x_1 \le x_0 + a$, induction demonstrates that $x_0 < x_{r+1} < x_r$ for all $r$. Thus the sequence converges to a limit $L \ge x_0$. By continuity, $L = F(L)$, and so $L = x_0$ because $x - F(x) > 0$ for $x_0 < x \le x_0 + a$. Thus the sequence converges to $x_0$. Similarly, the conditions $F(x) - x > 0$, $F'(x) > 0$ for $x_0 - a \le x < x_0$ give a sequence converging to $x_0$ if $x_0 - a \le x_1 < x_0$.

Newton's method for finding $x_0$ so that $f(x_0) = 0$ may be derived in the following manner. Let $x_r$ be an approximation to $x_0$. Then

$$f(x_0) = f(x_r) + (x_0 - x_r)f'(x_r) + \tfrac12(x_0 - x_r)^2f''\{x_r + \theta(x_0 - x_r)\}, \tag{1.54}$$

where $0 < \theta < 1$. If $x_r$ is a good approximation to $x_0$, $x_0 - x_r$ can be expected to be small and then, if $f''$ is not too large, the last term can be neglected, i.e.

$$f(x_0) \approx f(x_r) + (x_0 - x_r)f'(x_r).$$

This will make $f(x_0)$ zero if $x_0 - x_r = -f(x_r)/f'(x_r)$. In other words, if $x_r$ is an approximation to $x_0$, $x_r - f(x_r)/f'(x_r)$ should be a better one. Calling this new approximation $x_{r+1}$ we have the iteration formula

$$x_{r+1} = x_r - \frac{f(x_r)}{f'(x_r)}. \tag{1.55}$$

Note that if $x_r$ converges we expect its limit to be a zero of $f$ if $f'$ does not vanish there. In fact, the iteration will also converge to a multiple zero, as will be seen later. Sometimes, to simplify the computation, $f'(x_r)$ is replaced by $f'(x_1)$, but we shall consider only the form (1.55).
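Formula (1.55) needs only a few lines of Python. In this sketch `newton` is an illustrative name, and the stopping rule $|x_{r+1} - x_r| \le$ `tol` is an assumption (the text does not prescribe one). Recording the iterates makes the rapid convergence near a simple root visible.

```python
def newton(f, fprime, x1, tol=1e-12, max_iter=50):
    """Iterate x_{r+1} = x_r - f(x_r)/f'(x_r), eqn (1.55), from x_1."""
    x = x1
    history = [x]
    for _ in range(max_iter):
        d = fprime(x)
        if d == 0:
            raise ZeroDivisionError("f'(x_r) vanishes; (1.55) is undefined")
        x_new = x - f(x) / d
        history.append(x_new)
        if abs(x_new - x) <= tol:
            return x_new, history
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")


root, hist = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
errors = [abs(x - 2 ** 0.5) for x in hist]
# Near a simple root e_{r+1}/e_r^2 approaches |f''/(2f')| = 1/(2*sqrt(2)), about 0.354.
print(errors[3] / errors[2] ** 2)
```

The ratio printed at the end anticipates the order-2 behaviour established later in this section for a simple root.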

The eqn (1.55) has the structure of (1.50) if

$$F(x) = x - f(x)/f'(x).$$

Hence

$$F'(x) = f(x)f''(x)/\{f'(x)\}^2,$$

and Theorem 1.8c tells us that Newton's method converges to a simple zero of $f$ if $|ff''/f'^2| < 1$ in a neighbourhood of the zero. Since $f$ is small near the zero, the basic assertion is that the method will converge if $x_1$ is close enough to the zero. However, it must not be concluded that, if $x_1$ is closer to one zero than another, the iteration will necessarily converge to the nearby zero. For example, the iteration for

$$f(x) = (x - 1)(x + 1)^3$$

will converge to $-1$ if $x_1 = \tfrac13$, even though $x_1$ is closer to 1 than to $-1$ (indeed the first step already gives $x_2 = -1$).

A modification of Newton's method is Cauchy's method, in which $\theta$ is placed equal to zero in (1.54). Then $x_{r+1}$ is chosen as the root of

$$\tfrac12(x_{r+1} - x_r)^2f''(x_r) + (x_{r+1} - x_r)f'(x_r) + f(x_r) = 0$$

for which $x_{r+1} - x_r$ has the smallest modulus. The obvious disadvantage of Cauchy's method is that it requires the calculation of two derivatives as well as the solution of a quadratic equation.

An iteration scheme which is a generalization of the secant method is Muller's method. For this, three starting values, say $x_1$, $x_2$, and $x_3$, are necessary. Then one constructs a polynomial of degree 2 which has the values $f(x_1)$, $f(x_2)$, and $f(x_3)$ at $x_1$, $x_2$, and $x_3$ respectively. The polynomial has two zeros; choose the one, $x_4$, for which $|x_4 - x_3|$ is smallest. Then repeat the process starting with $x_2$, $x_3$, and $x_4$. The polynomial always possesses a root unless $f(x_r) = f(x_{r+1}) = f(x_{r+2})$, when it represents a straight line parallel to the $x$-axis. Hence, provided that this situation is never met, the iteration can proceed. The advantage of Muller's method over Newton's is that no computation of a derivative has to be undertaken. Also Muller's method offers the possibility of finding complex roots, which are excluded by Newton's method when $f$ is real and the starting value is real.

To discuss the convergence of an iterative process we say that, if

$$\lim_{r\to\infty}\frac{|x_{r+1} - x_0|}{|x_r - x_0|^p} = b,$$

where $b$ is finite and non-zero, the iterative method is of order $p$. If

$$\sup_{r\ge s}\frac{|x_{r+1} - x_0|}{|x_r - x_0|^p} = B,$$

we have

$$|x_{r+s+1} - x_0| \le B|x_{r+s} - x_0|^p \le B^{1+p}|x_{r+s-1} - x_0|^{p^2}$$

and, continuing in this way, we obtain

$$|x_{r+s+1} - x_0| \le B^c|x_{s+1} - x_0|^{p^r},$$

where

$$c = 1 + p + p^2 + \cdots + p^{r-1}.$$

If $p = 1$, $c = r$ and

$$|x_{r+s+1} - x_0| \le B^r|x_{s+1} - x_0|, \tag{1.56}$$

whereas, if $p \neq 1$, $c = (p^r - 1)/(p - 1)$ and

$$|x_{r+s+1} - x_0| \le B^{-1/(p-1)}\{B^{1/(p-1)}|x_{s+1} - x_0|\}^{p^r}. \tag{1.57}$$
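Muller's method, described earlier in this section, can be sketched as follows. The quadratic through the three latest points is written about $x_3$, and the zero nearest $x_3$ is taken, as the text requires, by choosing the larger-modulus denominator in $x_3 - 2f(x_3)/(c_1 \pm \sqrt{c_1^2 - 4f(x_3)c_2})$. Complex square roots (`cmath.sqrt`) let the iterates leave the real axis. The function name, tolerance and iteration cap are assumptions.

```python
import cmath


def muller(f, x1, x2, x3, tol=1e-12, max_iter=100):
    """Muller's method: fit a quadratic through the last three points and step
    to its zero nearest x3.  Complex arithmetic allows complex roots."""
    for _ in range(max_iter):
        f1, f2, f3 = f(x1), f(x2), f(x3)
        if f3 == 0:
            return x3
        h1, h2 = x2 - x1, x3 - x2
        d1, d2 = (f2 - f1) / h1, (f3 - f2) / h2
        c2 = (d2 - d1) / (h1 + h2)          # second divided difference f[x1, x2, x3]
        c1 = d2 + h2 * c2                   # slope of the quadratic at x3
        disc = cmath.sqrt(c1 * c1 - 4 * f3 * c2)
        # q(x) = f3 + c1 (x - x3) + c2 (x - x3)^2; its zero nearest x3 is
        # x3 - 2 f3 / (c1 +/- disc) with the sign giving the larger denominator.
        den = c1 + disc if abs(c1 + disc) >= abs(c1 - disc) else c1 - disc
        if den == 0:
            raise ZeroDivisionError("quadratic degenerates to a constant")
        x4 = x3 - 2 * f3 / den
        if abs(x4 - x3) <= tol:
            return x4
        x1, x2, x3 = x2, x3, x4
    raise RuntimeError("no convergence within max_iter steps")


# Real starting values, complex root of x^2 + 1 = 0.
r = muller(lambda x: x * x + 1, 0.5, 1.0, 1.5)
print(abs(r - 1j) < 1e-12)  # True
```

With real $f$ and real starting values Newton's method (1.55) could never produce the root $i$; Muller's method reaches it because the fitted quadratic has complex zeros.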

It is evident that, if $p = 1$, convergence is relatively slow and only certain if $B < 1$. On the other hand, if $p > 1$ and $|x_{s+1} - x_0|B^{1/(p-1)}$ …

THEOREM … (i) If $F'(x)$ exists on $x_0 - a < x < x_0 + a$ and $F'(x_0) \neq 0$, the iterative method is of order 1. (ii) If $F'(x_0) = 0$ and $F''(x)$ is continuous on $x_0 - a < x < x_0 + a$, then the iterative method is of order 2 if $F''(x_0) \neq 0$.

Proof. As in Theorem 1.8c,

$$\lim_{r\to\infty}\left|\frac{x_{r+1} - x_0}{x_r - x_0}\right| = |F'(x_0)|,$$

so that, when $F'(x_0) \neq 0$, the method is of order 1. In case (ii), $\lim_{r\to\infty}x_r = x_0$ implies that all $x_r$ from some $r$ onwards will certainly lie between $x_0 - a$ and $x_0 + a$. For such $r$ Taylor's theorem gives

$$F(x_r) = F(x_0) + (x_r - x_0)F'(x_0) + \tfrac12(x_r - x_0)^2F''(c_r),$$

where $c_r$ is between $x_r$ and $x_0$. Since $F'(x_0) = 0$,

$$\lim_{r\to\infty}\frac{|x_{r+1} - x_0|}{(x_r - x_0)^2} = \lim_{r\to\infty}\frac{|F(x_r) - F(x_0)|}{(x_r - x_0)^2} = \lim_{r\to\infty}|\tfrac12F''(c_r)| = \tfrac12|F''(x_0)|,$$

because $F''$ is continuous and $c_r \to x_0$ since $x_r \to x_0$. Since $F''(x_0) \neq 0$, the proof of the theorem is complete.

In Newton's method $F'(x_0) = 0$ and $F''(x_0) = f''(x_0)/f'(x_0)$. Therefore, if $f''(x_0) \neq 0$, Newton's method is of order 2 for a simple root provided that $f'''$ is continuous on an interval including $x_0$. If $x_0$ is a $q$-fold root, where $f(x_0) = f'(x_0) = \cdots = f^{(q-1)}(x_0) = 0$ but $f^{(q)}(x_0) \neq 0$, Newton's method may still be shown to converge when $f^{(q)}$ is continuous in a neighbourhood of $x_0$. First, observe that

$$x_{r+1} - x_0 = \frac{(x_r - x_0)f'(x_r) - f(x_r)}{f'(x_r)}.$$

By Taylor's theorem

$$f(x_r) = (x_r - x_0)^qf^{(q)}(\xi_1)/q!, \qquad f'(x_r) = (x_r - x_0)^{q-1}f^{(q)}(\xi_2)/(q-1)!,$$

where both $\xi_1$ and $\xi_2$ lie between $x_r$ and $x_0$. Hence

$$x_{r+1} - x_0 = (x_r - x_0)\left\{1 - \frac{f^{(q)}(\xi_1)}{qf^{(q)}(\xi_2)}\right\}.$$

As $x_r \to x_0$, $\xi_1 \to x_0$ and $\xi_2 \to x_0$, so that, from the continuity of $f^{(q)}$,

$$x_{r+1} - x_0 \approx (x_r - x_0)(1 - 1/q).$$

This demonstrates that the convergence is only of order 1, much slower than in the case of a simple root, and can be very slow indeed if $q$ is large. For a multiple root the convergence of Newton's method can be improved by adopting the formula

$$x_{r+1} = x_r - qf(x_r)/f'(x_r). \tag{1.58}$$

Using the same technique as just above but taking one extra term in the Taylor expansions, we obtain

$$x_{r+1} - x_0 \approx \frac{(x_r - x_0)^2}{q(q+1)}\,\frac{f^{(q+1)}(x_0)}{f^{(q)}(x_0)},$$

so that the method is of order 2. However, one should be warned that if (1.58) is employed near a simple root convergence may fail. It can be demonstrated that the secant method is of order 1.62 approximately and Muller's method of order 1.84 approximately.

A standard scheme for accelerating the convergence of an iteration procedure is Aitken's $\delta^2$-method. In this method, starting from $x_r$ we generate $y_{r+1} = F(x_r)$, $y_{r+2} = F(y_{r+1})$ and then define

$$x_{r+1} = y_{r+2} - \frac{(y_{r+2} - y_{r+1})^2}{y_{r+2} - 2y_{r+1} + x_r}.$$

Analysis reveals that this scheme is of order 2 if $F'(x_0) \neq 1$ and neither $F'(x_0)$ nor $F''(x_0)$ is zero. If $F'(x_0) = 1$ the scheme is of order 1.
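A minimal Python sketch of the Aitken $\delta^2$ scheme exactly as written above (applied repeatedly in this way it is often called Steffensen's method). The test function $F(x) = \cos x$, the tolerance, and the guard for a vanishing denominator are illustrative assumptions.

```python
import math


def aitken(F, x, tol=1e-12, max_iter=50):
    """Aitken delta^2 acceleration of x_{r+1} = F(x_r), as described in the text."""
    for _ in range(max_iter):
        y1 = F(x)        # y_{r+1}
        y2 = F(y1)       # y_{r+2}
        denom = y2 - 2 * y1 + x
        if denom == 0:   # iterates have stopped changing (to rounding)
            return y2
        x_new = y2 - (y2 - y1) ** 2 / denom
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")


# Fixed point of x = cos x; plain iteration converges only linearly here,
# since F'(x_0) = -sin x_0 is about -0.67.
r = aitken(math.cos, 1.0)
print(abs(math.cos(r) - r) < 1e-12)  # True
```

Here $F'(x_0) \neq 1$ and neither $F'(x_0)$ nor $F''(x_0)$ vanishes, so the scheme is in the order-2 regime quoted above.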

Exercises

28. Use the bisection method to solve (i) $8x^3 - 4x - 5 = 0$, (ii) $2x = \tan x$, correct to two decimal places.
29. On $0 \le x \le \tfrac12$, $f(x) = \tfrac12$ and on $\tfrac12 \le x \le 1$, $f(x) = 6x - 1 - 6x^2$. Obtain the value of $c_{r+1}$ in the method of false position.
30. Solve $3\sin x = 2$ correct to three decimal places by the secant method.
31. Use Newton's method to find $\sqrt7$ correct to 2 decimal places, starting from $x_1 = 3$.
32. Obtain by Newton's method a root of (i) $x^3 - 2x^2 - 5x + 10 = 0$, starting from $x_1 = 3$, (ii) $x^3 - 6x^2 + 13x - 9 = 0$, starting from $x_1 = 2$.
33. Find the root of $27x^3 + 18x - 25 = 0$ between 0 and 1 using the iteration

$$x_{r+1} = (25 - 27x_r^3)/18,$$

checking whether Theorem 1.8c is satisfied. Is the iteration better?
34. Examine the iterations (i) $x_{r+1} = (x_r^2 + c)/b$, (ii) $x_{r+1} = b - c/x_r$ as possible schemes for determining the larger root of $x^2 - bx + c = 0$ when $b > 0$, $\tfrac14b^2 > c > 0$.
35. What happens when Newton's method is applied to $x^2 - 2x + 2 = 0$?
36. Solve $x^3 = 3$ by Cauchy's method starting from $x_1 = 3$.
37. Find a root of $\sin x + 2 = x$ by Muller's method starting with $x_1 = -1$, $x_2 = 0$, $x_3 = 1$.
38. If $F(x) = x + h(x)f(x)$, find $h$ so that the iteration method is of order 2.
39. To calculate $\sqrt a$ when $a > 0$ the following iteration is suggested:

$$x_{r+1} = \frac{x_r^3 + 3ax_r}{3x_r^2 + a}.$$

Show that it is of order 3.

1.9 Systems of non-linear equations

The solution of simultaneous non-linear equations is complicated and we shall be content to describe how Newton's method can be generalized. Suppose the values of $x$ and $y$ are required which simultaneously satisfy

$$f(x, y) = 0, \qquad g(x, y) = 0.$$

By Taylor's theorem, if we neglect second-order terms,

$$f(x_{r+1}, y_{r+1}) = f(x_r, y_r) + (x_{r+1} - x_r)f_x + (y_{r+1} - y_r)f_y,$$

$$g(x_{r+1}, y_{r+1}) = g(x_r, y_r) + (x_{r+1} - x_r)g_x + (y_{r+1} - y_r)g_y,$$

where $f_x$ denotes the partial derivative $\partial f/\partial x$ and all the partial derivatives are evaluated at $(x_r, y_r)$. If we hope that $(x_{r+1}, y_{r+1})$ is close to a zero we want the left-hand sides to be zero. This can be arranged by putting

$$x_{r+1} = x_r + (gf_y - fg_y)/J, \tag{1.59}$$

$$y_{r+1} = y_r + (fg_x - gf_x)/J, \tag{1.60}$$

where $J$ is the Jacobian defined by $J = f_xg_y - f_yg_x$. Eqns (1.59) and (1.60) constitute the generalization of Newton's method to two equations, all quantities on the right-hand side being calculated at $(x_r, y_r)$.
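Eqns (1.59) and (1.60) can be sketched in Python as below. The caller supplies $f$, $g$ and their four partial derivatives (passing the derivatives explicitly is a choice made here for clarity), and the example system $x^2 + y^2 = 4$, $xy = 1$ is illustrative only.

```python
def newton_2d(f, g, fx, fy, gx, gy, x, y, tol=1e-12, max_iter=50):
    """Newton's method for f(x, y) = 0, g(x, y) = 0 via eqns (1.59)-(1.60)."""
    for _ in range(max_iter):
        fv, gv = f(x, y), g(x, y)
        a, b = fx(x, y), fy(x, y)
        c, d = gx(x, y), gy(x, y)
        J = a * d - b * c                # Jacobian f_x g_y - f_y g_x
        if J == 0:
            raise ZeroDivisionError("Jacobian vanishes")
        dx = (gv * b - fv * d) / J       # (g f_y - f g_y)/J, eqn (1.59)
        dy = (fv * c - gv * a) / J       # (f g_x - g f_x)/J, eqn (1.60)
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) <= tol:
            return x, y
    raise RuntimeError("no convergence within max_iter steps")


# Illustrative system: x^2 + y^2 - 4 = 0, xy - 1 = 0, starting near (2, 0.5).
x, y = newton_2d(lambda x, y: x * x + y * y - 4, lambda x, y: x * y - 1,
                 lambda x, y: 2 * x, lambda x, y: 2 * y,
                 lambda x, y: y, lambda x, y: x,
                 2.0, 0.5)
print(x, y)
```

The root found here is $x = \sqrt{2 + \sqrt3}$, $y = 1/x$; other starting points may lead to one of the other three symmetric solutions.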

MATRICES

1.10 Matrices

It is assumed that the reader has some acquaintance with the theory of matrices so that the treatment here will be somewhat cursory (see, for example, Liebeck (1969)). A general matrix consists of $mn$ entries arranged in $m$ rows and $n$ columns, giving an $m \times n$ array, to be denoted by a capital letter such as $A$:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{pmatrix}.$$

The symbol $a_{ij}$ denotes the element in the $i$th row and $j$th column and often we shall abbreviate the notation by writing $A = (a_{ij})$. The matrix is called square and of order $n$ if $m = n$. If $n = 1$, so that the matrix consists of a single column, we shall call the matrix a column vector and signify its special nature by using bold type, e.g.

$$\mathbf{a} = \begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_m\end{pmatrix}.$$

The elements $a_{ii}$ for $i = 1, 2, \ldots, n$ in a square matrix are said to be the diagonal elements. The elementary rules of combination are:

$$A = B \text{ if and only if } a_{ij} = b_{ij} \text{ for all } i, j; \qquad A + B = (a_{ij} + b_{ij}); \qquad \alpha A = (\alpha a_{ij}).$$

Multiplication of $A$ and $B$ is possible only if $A$ has the same number of columns as $B$ has rows. If $A$ is $m \times n$ and $B$ is $n \times p$ then

$$AB = \left(\sum_{k=1}^{n}a_{ik}b_{kj}\right),$$

the result being an $m \times p$ matrix. In general, two matrices do not commute, i.e. $AB \neq BA$ even if both are square. The unit matrix $I$ of order $n$ is a square matrix all of whose elements are zero except the diagonal ones, which are unity. Thus $AI = A$. The transpose of an $m \times n$ matrix $A = (a_{ij})$ is the $n \times m$ matrix whose $ij$th element is $a_{ji}$. The symbol $A^T$ will be used to indicate the transpose. Note that the transpose $\mathbf{a}^T$ of a column matrix will be a row matrix, i.e. a matrix whose elements lie in a single row. There is no difficulty in verifying that $(A + B)^T = A^T + B^T$, $(A^T)^T = A$, $(AB)^T = B^TA^T$. If $A$ is $m \times n$ and $\mathbf{x}$ is a column matrix with $n$ elements, $A\mathbf{x}$ is a column matrix whose $i$th element is $\sum_{j=1}^{n}a_{ij}x_j$. Observe that $\mathbf{x}^TA^T$ is a row matrix. If $B$ is an $n \times m$ matrix such that $BA = I$ then $B$ is called a left-inverse of $A$. Similarly, if $C$ is $n \times m$ and $AC = I$ then $C$ is called a right-inverse of $A$. Suppose $A$ is square and has both a left-inverse and a right-inverse; then

$$B = BI = B(AC) = (BA)C = IC = C.$$


Thus there is only one left-inverse and only one right-inverse, and both are equal. This unique matrix is called the inverse of $A$ and denoted by $A^{-1}$. Clearly, $(A^{-1})^{-1} = A$, $(AB)^{-1} = B^{-1}A^{-1}$ but, in general, $(A + B)^{-1} \neq A^{-1} + B^{-1}$. A matrix is called symmetric if $A = A^T$ and anti-symmetric if $A = -A^T$. A matrix such that $A^{-1} = A^T$ is known as orthogonal. From now on we shall be concerned primarily with square matrices $A$. It will therefore be assumed that $A$ is square and of order $n$ unless otherwise specifically stated.

It is known that the equations $A\mathbf{x} = 0$ possess a solution with $\mathbf{x} \neq 0$ if and only if $\det A = 0$, where det signifies the determinant of the matrix. The quantities $\lambda_i$ for which

$$A\mathbf{x}_i = \lambda_i\mathbf{x}_i \tag{1.61}$$

has solutions $\mathbf{x}_i \neq 0$ are called the eigenvalues of $A$. The $\lambda_i$ are solutions of

$$\det(A - \lambda I) = 0$$

and are therefore $n$ in number, though some of them may be multiple roots. Since the determinant of the transpose of a matrix is the same as the determinant of the original matrix, $\det(A^T - \lambda_jI) = 0$. Consequently, there are $\mathbf{y}_j \neq 0$ such that

$$A^T\mathbf{y}_j = \lambda_j\mathbf{y}_j. \tag{1.62}$$

Hence $A$ and $A^T$ have the same eigenvalues. Multiply (1.61) by $\mathbf{y}_j^T$ and (1.62) by $\mathbf{x}_i^T$ and subtract. Then

$$\mathbf{y}_j^TA\mathbf{x}_i - \mathbf{x}_i^TA^T\mathbf{y}_j = \lambda_i\mathbf{y}_j^T\mathbf{x}_i - \lambda_j\mathbf{x}_i^T\mathbf{y}_j.$$

The left-hand side vanishes and so

$$(\lambda_i - \lambda_j)\mathbf{y}_j^T\mathbf{x}_i = 0.$$

If $\lambda_i \neq \lambda_j$ then $\mathbf{y}_j^T\mathbf{x}_i = 0$, i.e. the eigenvectors of $A$ and $A^T$ corresponding to distinct eigenvalues are orthogonal. Moreover, the eigenvectors corresponding to distinct eigenvalues of $A$ are linearly independent. Suppose, to the contrary, that $s$ of them are linearly dependent and that any smaller number are linearly independent. Then

$$\alpha_1\mathbf{x}_1 + \cdots + \alpha_s\mathbf{x}_s = 0,$$

where all the $\alpha_i$ are non-zero. On multiplying by $A$ we obtain

$$\alpha_1\lambda_1\mathbf{x}_1 + \cdots + \alpha_s\lambda_s\mathbf{x}_s = 0. \tag{1.63}$$

If $\lambda_1 = 0$, $s - 1$ vectors would be linearly dependent, contrary to our hypothesis. If $\lambda_1 \neq 0$, multiply the relation $\alpha_1\mathbf{x}_1 + \cdots + \alpha_s\mathbf{x}_s = 0$ by $\lambda_1$ and subtract it from (1.63); then

$$\alpha_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + \cdots + \alpha_s(\lambda_s - \lambda_1)\mathbf{x}_s = 0.$$

Since $\lambda_i - \lambda_1 \neq 0$ for $i = 2, \ldots, s$, this gives a linear relation between $s - 1$ vectors. Again, a contradiction occurs and the statement is proved.

One consequence is that, if $A$ has $n$ distinct eigenvalues, $\mathbf{y}_i^T\mathbf{x}_i \neq 0$. For, if this were not true, $\mathbf{y}_i$ would be orthogonal to the $n$ independent vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$, which is impossible because $\mathbf{y}_i \neq 0$. It is therefore always possible to select $\mathbf{y}_i$ so that $\mathbf{y}_i^T\mathbf{x}_i = 1$. Moreover, if $A$ has $n$ distinct eigenvalues, define $X$ as the matrix with columns $\mathbf{x}_i$, i.e. $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$. Then, with the $\mathbf{y}_i$ picked so that $\mathbf{y}_i^T\mathbf{x}_i = 1$,

$$X^{-1} = \begin{pmatrix}\mathbf{y}_1^T\\ \vdots\\ \mathbf{y}_n^T\end{pmatrix}$$

because of the orthogonal relations. Hence

$$X^{-1}AX = X^{-1}(\lambda_1\mathbf{x}_1, \lambda_2\mathbf{x}_2, \ldots, \lambda_n\mathbf{x}_n) = \operatorname{diag}(\lambda_i), \tag{1.64}$$

where diag is used to denote a diagonal matrix, i.e. a matrix whose non-diagonal elements are all zero. Two matrices $A$ and $B$ are said to be similar if there is a non-singular matrix $R$ (i.e. $\det R \neq 0$) such that $B = R^{-1}AR$. Sometimes, $A$ is said to have undergone a similarity transformation. The eigenvalues of similar matrices are the same because $A\mathbf{x} = \lambda\mathbf{x}$ can be written as

$$(R^{-1}AR)R^{-1}\mathbf{x} = \lambda R^{-1}\mathbf{x},$$

showing that $R^{-1}\mathbf{x}$ is an eigenvector of $R^{-1}AR$. What has been demonstrated above is that, if $A$ has $n$ distinct eigenvalues, $A$ is similar to a diagonal matrix whose entries are the eigenvalues of $A$. If $A$ is also (real) symmetric and the $\mathbf{x}_i$ are normalized so that $\mathbf{x}_i^T\mathbf{x}_i = 1$, then $\mathbf{y}_i = \mathbf{x}_i$ and $X^{-1} = X^T$ so that, in this case, there is an orthogonal similarity transformation converting $A$ to diagonal form. If $A$ is symmetric with multiple eigenvalues it can be shown that there is still an orthogonal similarity transformation which changes $A$ to $\operatorname{diag}(\lambda_i)$. If, however, $A$ has multiple eigenvalues but is not symmetric, the situation is more complicated. What can be demonstrated is that there is a non-singular $R$ such that

$$R^{-1}AR = J, \tag{1.65}$$


where $J$ is the Jordan canonical form of $A$ and has the following structure: $J$ is a block-diagonal matrix

$$J = \begin{pmatrix} J_1 & & 0\\ & \ddots & \\ 0 & & J_k\end{pmatrix},$$

where each $J_i$ is either the number $\lambda_i$ or a matrix of the form

$$J_i = \begin{pmatrix} \lambda_i & 1 & & 0\\ & \lambda_i & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda_i\end{pmatrix}, \tag{1.66}$$

which is an upper triangular matrix since all the elements below the diagonal are zero. The Jordan canonical form is the most compact form to which a general matrix can be reduced by a similarity transformation. The same eigenvalue may occur in different $J_i$, but the total number of times that a given eigenvalue occurs in the diagonal of $J$ is the same as the multiplicity of the eigenvalue. The number of linearly independent eigenvectors of $A$ is $k$, i.e. the number of Jordan blocks in the canonical form. In particular, if $J_i$ is $m_i \times m_i$ and $\mathbf{r}_i$ is the $i$th column of $R$, then $\mathbf{r}_1, \mathbf{r}_{m_1+1}, \ldots, \mathbf{r}_{m_1+\cdots+m_{k-1}+1}$ are the eigenvectors of $A$.

If the elements of $A$ are changed continuously, then $\det(A - \lambda I)$ varies continuously and so the eigenvalues of $A$ change continuously. In general, however, the eigenvectors do not alter continuously. If $p(t) = a_0 + a_1t + \cdots + a_mt^m$ is a polynomial in $t$, a corresponding matrix polynomial $p(A)$ can be defined by

$$p(A) = a_0I + a_1A + \cdots + a_mA^m,$$

where, of course, $A^r = AA^{r-1}$. It is immediate that any eigenvector of $A$ is an eigenvector of $p(A)$ with eigenvalue $p(\lambda_i)$. If $A$ has an inverse, the eigenvalues of $A^{-1}$ are $1/\lambda_i$. If the elements of $A$ are complex, the matrix $A^*$ is obtained by replacing each element of $A$ by its complex conjugate. Write $A^H = A^{*T}$. Then a matrix is said to be unitary if $A^HA = I$. It is called Hermitian if $A^H = A$. Note that the Hermitian matrices include the real symmetric matrices. If $\lambda$ is an eigenvalue of a Hermitian matrix $A$, so that $A\mathbf{x} = \lambda\mathbf{x}$, then

$$\mathbf{x}^HA\mathbf{x} = \lambda\mathbf{x}^H\mathbf{x}. \tag{1.67}$$

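Now $(\mathbf{x}^HA\mathbf{x})^* = (\mathbf{x}^TA^*\mathbf{x}^*)^T = \mathbf{x}^HA^H\mathbf{x} = \mathbf{x}^HA\mathbf{x}$, so that $\mathbf{x}^HA\mathbf{x}$ is real. Since $\mathbf{x}^H\mathbf{x}$ is real and non-zero, it follows from (1.67) that $\lambda$ is real, i.e. the eigenvalues of a Hermitian matrix are real. Furthermore, if $\mathbf{x}_i$ and $\mathbf{x}_j$ are two eigenvectors,

$$\lambda_i\mathbf{x}_j^H\mathbf{x}_i = \mathbf{x}_j^HA\mathbf{x}_i = (\mathbf{x}_i^HA\mathbf{x}_j)^* = \lambda_j(\mathbf{x}_i^H\mathbf{x}_j)^* = \lambda_j\mathbf{x}_j^H\mathbf{x}_i,$$

from which we deduce that $\mathbf{x}_j^H\mathbf{x}_i = 0$, i.e. the vectors are orthogonal, if $\lambda_i \neq \lambda_j$. It can be shown that if $A$ is Hermitian there is a unitary matrix $U$ such that

$$U^HAU = \operatorname{diag}(\lambda_i). \tag{1.68}$$

Moreover, the eigenvectors can be arranged to be mutually orthogonal. Consequently, any vector $\mathbf{y}$ can be expressed in the form

$$\mathbf{y} = \sum_{i=1}^{n}\alpha_i\mathbf{x}_i.$$

Hence

$$\mathbf{y}^HA\mathbf{y} = \mathbf{y}^H\sum_{i=1}^{n}\alpha_i\lambda_i\mathbf{x}_i = \sum_{i=1}^{n}\lambda_i|\alpha_i|^2,$$

since $\mathbf{x}_j^H\mathbf{x}_i = 0$ $(i \neq j)$ and the magnitude may be made to satisfy $\mathbf{x}_i^H\mathbf{x}_i = 1$. Put the eigenvalues in order so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then

$$\lambda_n\sum_{i=1}^{n}|\alpha_i|^2 \le \mathbf{y}^HA\mathbf{y} \le \lambda_1\sum_{i=1}^{n}|\alpha_i|^2.$$

In other words, for arbitrary $\mathbf{y}$,

$$\lambda_n\mathbf{y}^H\mathbf{y} \le \mathbf{y}^HA\mathbf{y} \le \lambda_1\mathbf{y}^H\mathbf{y} \tag{1.69}$$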

when $A$ is Hermitian and $\lambda_1$, $\lambda_n$ are the largest and least eigenvalues respectively of $A$. A Hermitian matrix is said to be positive definite if $\mathbf{x}^HA\mathbf{x} > 0$ for every $\mathbf{x} \neq 0$ and positive semi-definite if $\mathbf{x}^HA\mathbf{x} \ge 0$ for every $\mathbf{x} \neq 0$. A deduction from (1.67) and (1.69) is that a Hermitian matrix is positive definite if, and only if, all its eigenvalues are positive. It is positive semi-definite if and only if all its eigenvalues are non-negative.

A measure of the eigenvalues of a matrix is provided by the trace $\operatorname{Tr}A$ defined by

$$\operatorname{Tr}A = a_{11} + a_{22} + \cdots + a_{nn}. \tag{1.70}$$

Obviously

$$\operatorname{Tr}(kA) = k\operatorname{Tr}A, \qquad \operatorname{Tr}(A + B) = \operatorname{Tr}A + \operatorname{Tr}B. \tag{1.71}$$

Also

$$\operatorname{Tr}(AB) = \sum_{i=1}^{n}\sum_{k=1}^{n}a_{ik}b_{ki} = \sum_{k=1}^{n}\sum_{i=1}^{n}b_{ki}a_{ik} = \operatorname{Tr}(BA). \tag{1.72}$$

A deduction from (1.72) is that

$$\operatorname{Tr}(R^{-1}AR) = \operatorname{Tr}(ARR^{-1}) = \operatorname{Tr}A.$$

It therefore follows from the Jordan canonical form (1.65) and (1.66) that

$$\operatorname{Tr}A = \lambda_1 + \lambda_2 + \cdots + \lambda_n. \tag{1.73}$$
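The identities (1.72) and (1.73) are easy to check numerically. The pure-Python sketch below uses exact fractions to test $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ and the invariance of the trace under a similarity transformation, and it obtains the eigenvalues of a $2 \times 2$ matrix from $\lambda^2 - (\operatorname{Tr}A)\lambda + \det A = 0$. The helper names and example matrices are illustrative, and the inverse and eigenvalue helpers are limited to the $2 \times 2$ case.

```python
from fractions import Fraction
import cmath


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def inverse_2x2(R):
    (p, q), (r, s) = R
    det = p * s - q * r
    if det == 0:
        raise ValueError("R is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def eigenvalues_2x2(A):
    # Roots of det(A - lambda I) = lambda^2 - (Tr A) lambda + det A = 0.
    t = complex(trace(A))
    d = complex(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    root = cmath.sqrt(t * t - 4 * d)
    return (t + root) / 2, (t - root) / 2


F = Fraction
A = [[F(1), F(2)], [F(3), F(4)]]
B = [[F(0), F(1)], [F(5), F(-2)]]
R = [[F(2), F(1)], [F(1), F(1)]]
print(trace(matmul(A, B)) == trace(matmul(B, A)))               # True, (1.72)
print(trace(matmul(matmul(inverse_2x2(R), A), R)) == trace(A))  # True
print(abs(sum(eigenvalues_2x2(A)) - 5) < 1e-12)                 # True, (1.73)
```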

Exercises 40. Find the eigenvalues, eigenvectors, and Jordan canonical form of

(i)

C~)

(ii)

(

-10 0) 1 0

-1

1 1 -2

41. If $A$ and $B$ are symmetric, prove that $AB$ is symmetric if and only if $AB = BA$.
42. Show that $A$ and $A^T$ are similar.
43. If $\det A \neq 0$, prove that $A^HA$ is positive definite.
44. Prove that the eigenvalues of $A^m(A + \mu I)^{-1}$ are $\lambda_i^m(\lambda_i + \mu)^{-1}$, given $\mu \neq -\lambda_i$ for any $i$.
45. Prove that the eigenvalues of

$$\begin{pmatrix} 1 + 2^{-r}\cos 2^{r+1} & -2^{-r}\sin 2^{r+1}\\ -2^{-r}\sin 2^{r+1} & 1 - 2^{-r}\cos 2^{r+1}\end{pmatrix}$$

are $1 \pm 2^{-r}$. Deduce that the eigenvalues tend to 1 as $r \to \infty$ but that the eigenvectors do not have a limit.
46. $A$ is real positive semi-definite and $R$ is an orthogonal matrix such that $R^TAR = \operatorname{diag}(\lambda_i)$. If $B = \operatorname{diag}(\sqrt{\lambda_i})$ and $C = RBR^T$, prove that $C^2 = A$, so that a square root of $A$ may be defined as $A^{1/2} = C$.
47. Show that the Hermitian matrix $A$ is positive semi-definite if and only if there is a matrix $B$ such that $A = BB^H$.
48. Show that (i) $\operatorname{Tr}(AA^H) > 0$ if $A \neq 0$, (ii) if $A$ is anti-symmetric, $\operatorname{Tr}A = 0$.

1.11 Matrix norms

The modulus of a complex number gives an idea of its size and it is desirable to have a single number which plays a similar role for matrices and vectors. This quantity will be known as a norm (see also §1.5). We define the norm in terms of its properties, and not by means of a specific formula. In this way it is possible to define many different kinds of norm associated with a vector. In fact, any formula for the norm $\|\mathbf{x}\|$ of a vector $\mathbf{x}$ will be acceptable if it has the properties (a) $\|\mathbf{x}\| > 0$ if $\mathbf{x} \neq 0$; $\|\mathbf{x}\| = 0$ only if $\mathbf{x} = 0$; (b) $\|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|$ for any complex number $\alpha$; (c) $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$.

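If $\mathbf{x}$ has elements $x_1, \ldots, x_n$, standard norms are the $l_p$-norms defined by $(1 \le p$ … (a) $(\mathbf{x}, \mathbf{x}) > 0$ if $\mathbf{x} \neq 0$; $(\mathbf{x}, \mathbf{x}) = 0$ only if $\mathbf{x} = 0$; (b) $(\mathbf{x}, \mathbf{y}) = (\mathbf{y}, \mathbf{x})^*$; (c) $(\mathbf{x} + \mathbf{y}, \mathbf{z}) = (\mathbf{x}, \mathbf{z}) + (\mathbf{y}, \mathbf{z})$, $(\alpha\mathbf{x}, \mathbf{y}) = \alpha^*(\mathbf{x}, \mathbf{y})$. An inner product supplies a norm via $\|\mathbf{x}\| = (\mathbf{x}, \mathbf{x})^{1/2}$, and the Schwarz inequality

$$|(\mathbf{x}, \mathbf{y})| \le \|\mathbf{x}\|\,\|\mathbf{y}\|$$

always holds. The $l_2$-norm is often known as the Euclidean norm since it stems from the inner product $\mathbf{x}^H\mathbf{y}$. Note that in inner product notation

$$\mathbf{x}^HA\mathbf{y} = (\mathbf{x}, A\mathbf{y}) = (A^H\mathbf{x}, \mathbf{y}).$$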

A matrix norm can also be introduced by asking that it has the properties (a) (b) (c) (d)

IIAII > 0 if A =1= 0; IIAII = 0 only if A = 0; IlaA II = lalll A" for any complex number et; IIA + BII ~ IIAII + IIBII; IIABII ~ IIAIIIIBII.

If II II' is a matrix norm and are said to be compatible if

II II is a

IIAxl1

vector form, the matrix and vector norms ~

IIAI/'I/xl/

(1.74)

A matrix norm can be constructed from a vector norm by defining

IIAII =

sup

lIxll =

1

IIAxll;

(1.75)

such a matrix norm is said to be subordinate to the given vector norm. It is obvious that the subordinate norm is compatible. From (1.75) can be seen by putting A = 1 that any subordinate norm has the property III II = 1. From now on the only matrix norm which will be considered is the one



specified by (1.75). Corresponding to the vector $l_p$-norms we have
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|, \tag{1.76}$$
$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|, \tag{1.77}$$
$$\|A\|_2 = \mu^{1/2}, \tag{1.78}$$
where $\mu$ is the largest eigenvalue of $A^H A$. Sometimes $\|A\|_2$ is known as the spectral norm of $A$. From (1.76) and (1.77), $\|A\|_1 = \|A^T\|_\infty$. To prove these results we remark that
$$\|Ax\|_1 = \sum_{i=1}^{n}\Big|\sum_{j=1}^{n} a_{ij}x_j\Big| \le \sum_{j=1}^{n}|x_j|\sum_{i=1}^{n}|a_{ij}| \le \Big(\max_{1 \le j \le n}\sum_{i=1}^{n}|a_{ij}|\Big)\|x\|_1 .$$
This inequality shows that the norm certainly does not exceed the value given in (1.76). Moreover, if $\sum_{i=1}^{n} |a_{ij}|$ is largest when $j = k$, choose $x_i = 0$ ($i \neq k$), $x_k = 1$; then the value in (1.76) is actually attained, and so (1.76) is proved. The proof for $\|A\|_\infty$ is similar except that in the last stage, if $\sum_{j=1}^{n} |a_{ij}|$ is greatest for $i = k$, we choose $x_j = a_{kj}^*/|a_{kj}|$ ($a_{kj} \neq 0$), $x_j = 1$ ($a_{kj} = 0$) to achieve the supremum. For $\|A\|_2$ we remark that $\|Ax\|_2 = (x^H A^H A x)^{1/2}$, and the result follows from (1.69).
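The formulae (1.76)-(1.78) are easy to check numerically. In this sketch (plain Python, real matrices only) $\|A\|_2$ is estimated by power iteration on $A^T A$, which is an assumption of the example rather than a method given in the text, and the supremum in (1.75) is approximated by sampling unit vectors in the plane.

```python
import math

def norm_1(A):
    """(1.76): largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_inf(A):
    """(1.77): largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_2(A, iters=500):
    """(1.78): sqrt of the largest eigenvalue of A^T A, estimated by power iteration."""
    M = matmul(transpose(A), A)
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n                    # unit starting vector
    mu = 0.0
    for _ in range(iters):
        w = matvec(M, v)
        mu = math.sqrt(sum(wi * wi for wi in w))    # ||M v|| with ||v|| = 1
        if mu == 0.0:                               # A = 0
            return 0.0
        v = [wi / mu for wi in w]
    return math.sqrt(mu)

A = [[1.0, 2.0], [3.0, 4.0]]
print(norm_1(A), norm_inf(A), norm_2(A))
assert norm_1(A) == norm_inf(transpose(A))          # ||A||_1 = ||A^T||_inf

# (1.75) directly: maximise ||Ax||_2 over sampled unit vectors x = (cos t, sin t)
sampled = max(math.hypot(*matvec(A, [math.cos(t), math.sin(t)]))
              for t in (2 * math.pi * s / 20000 for s in range(20000)))
assert sampled <= norm_2(A) * (1 + 1e-12)
```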

If $A$ is Hermitian, (1.78) implies that
$$\|A\|_2 = \max_{1 \le i \le n} |\lambda_i|,$$
where the $\lambda_i$ are the eigenvalues of $A$. The spectral radius $\rho(A)$ of a matrix is defined by
$$\rho(A) = \max_{1 \le i \le n} |\lambda_i|.$$
Thus $\|A\|_2 = \{\rho(A^H A)\}^{1/2}$, which simplifies, if $A$ is Hermitian, to $\|A\|_2 = \rho(A)$. In general, if $x$ is an eigenvector of $A$ with eigenvalue $\lambda$,
$$\|Ax\| = \|\lambda x\| = |\lambda|\,\|x\|,$$
which demonstrates via (1.75) that $|\lambda| \le \|A\|$, i.e.
$$\rho(A) \le \|A\| \tag{1.79}$$

for any norm of $A$. However, if the norm is badly chosen, the norm and spectral radius need not be close; for example, a non-zero strictly upper triangular matrix has a positive norm, but its spectral radius is zero. In contrast, it can be shown that there is always some norm which is arbitrarily close to the spectral radius. A useful theorem is:

THEOREM 1.11. $\lim_{r \to \infty} A^r = 0$ if and only if $\rho(A) < 1$.

$$|a_{ii} - \lambda| > {\sum_{j=1}^{n}}' |a_{ij}| \qquad (i = 1, \ldots, n).$$
Hence $A - \lambda I$ is strictly diagonally dominant and hence, by Theorem 1.11b, has an inverse. But this is impossible because $\lambda$ is an eigenvalue, and the theorem is proved. One consequence of Gerschgorin's theorem is that
$$\rho(A) \le \min\Big(\max_i \sum_j |a_{ij}|,\ \max_j \sum_i |a_{ij}|\Big).$$

Gerschgorin's theorem and (1.69) provide rules for locating the positions of eigenvalues. While they may not always be very precise they do at any rate limit the possibilities. A type of matrix often encountered with difference equations is a Stieltjes matrix which is a real positive definite matrix with all its off-diagonal elements non-positive.
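A small sketch of the bound (1.79) and of Theorem 1.11 for $2 \times 2$ matrices, whose eigenvalues can be found from the characteristic quadratic. The matrices are arbitrary examples chosen for illustration: one has $\rho(A)$ below the Gerschgorin bound just stated, a nilpotent one has a large norm but zero spectral radius, and a third has $\|C\|_\infty > 1$ yet $C^r \to 0$ because $\rho(C) < 1$.

```python
import cmath

def eig2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic quadratic."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def spectral_radius(A):
    return max(abs(lam) for lam in eig2(A))

def norm_inf(A):                      # largest absolute row sum, (1.77)
    return max(sum(abs(x) for x in row) for row in A)

def norm_1(A):                        # largest absolute column sum, (1.76)
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# rho(A) <= min(||A||_inf, ||A||_1), the consequence of Gerschgorin's theorem
A = [[0.5, 0.4], [0.1, 0.3]]
assert spectral_radius(A) <= min(norm_inf(A), norm_1(A))

# a badly matched norm: large norm, zero spectral radius
N = [[0.0, 10.0], [0.0, 0.0]]
print(norm_inf(N), spectral_radius(N))          # 10.0 0.0

# Theorem 1.11: rho(C) = 0.5 < 1, so C^r -> 0 even though ||C||_inf = 2.5
C = [[0.5, 2.0], [0.0, 0.2]]
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):
    P = matmul(P, C)
print(spectral_radius(C), norm_inf(C), norm_inf(P))
```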

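Exercises
50. Prove that $\|x\|_1 \le n\|x\|_\infty$ and $\|x\|_2 \le \sqrt{n}\,\|x\|_\infty$. 51. If $U$ is unitary prove that (i) $\|Ux\|_2 = \|x\|_2$, (ii) $\|UA\|_2 = \|A\|_2$, (iii) $\|UAU^H\|_2 = \|A\|_2$. 52. If $a$ is real and non-zero and
$$A = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix},$$
show that
$$\|A^r\|_2 = |a|^r \Big[1 + \frac{r^2}{2a^2}\Big\{1 + \Big(1 + \frac{4a^2}{r^2}\Big)^{1/2}\Big\}\Big]^{1/2}.$$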

LINEAR EQUATIONS


53. Prove that $\max|a_{ij}| \le \|A\|_2 \le n\max|a_{ij}|$, the maximum being taken over all $i$ and $j$. 54. Prove $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$. 55. If $a$ is real show that the spectral radius and spectral norm of

:

(~ :)

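are equal only if $a = 0$. 56. Show that (i) $[\rho\{(A^H A)^{-1}\}]^{-1} \le |\lambda_i|^2 \le \rho(A^H A)$, (ii) $\sum_{i=1}^{n} |\lambda_i|^2 \le \sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|^2$. 57. For the tridiagonal matrix of order $n$
$$\begin{pmatrix} a & b & 0 & \cdots & 0 & 0 \\ c & a & b & \cdots & 0 & 0 \\ 0 & c & a & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & a & b \\ 0 & 0 & 0 & \cdots & c & a \end{pmatrix}$$
show that
$$\lambda_j = a + 2(bc)^{1/2}\cos\{j\pi/(n + 1)\} \qquad (j = 1, \ldots, n).$$
Check that this satisfies Gerschgorin's theorem. 58. In the tridiagonal matrix
$$\begin{pmatrix} a_1 & d_1 & & & \\ d_1 & a_2 & d_2 & & \\ & \ddots & \ddots & \ddots & \\ & & d_{n-2} & a_{n-1} & d_{n-1} \\ & & & d_{n-1} & a_n \end{pmatrix},$$
$d_i = (b_i c_i)^{1/2}$ where $b_i c_i > 0$, $a_i \ge |b_i| + |c_{i-1}|$ $(i = 2, \ldots, n - 1)$, $a_1 > |b_1|$ and $a_n > |c_{n-1}|$. Prove that it is positive definite.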
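Exercises 52 and 57 can be sanity-checked numerically before they are proved. The sketch below compares the formula of Exercise 52 with $\|A^r\|_2$ computed directly from the larger eigenvalue of $(A^r)^T A^r$, and evaluates the characteristic polynomial of the tridiagonal matrix of Exercise 57 at the claimed eigenvalues, using the standard three-term recurrence for tridiagonal determinants (an assumption of this sketch, not something stated in the text). The parameter values are arbitrary test choices.

```python
import math

def spectral_norm_2x2(M):
    """||M||_2 for a real 2x2 matrix: square root of the larger eigenvalue of M^T M."""
    (m11, m12), (m21, m22) = M
    p, q, s = m11**2 + m21**2, m11 * m12 + m21 * m22, m12**2 + m22**2   # M^T M = [[p, q], [q, s]]
    return math.sqrt((p + s) / 2 + math.sqrt(((p - s) / 2) ** 2 + q * q))

def ex52_formula(a, r):
    """Right-hand side of Exercise 52 (a real and non-zero)."""
    return abs(a) ** r * math.sqrt(1 + r * r / (2 * a * a) * (1 + math.sqrt(1 + 4 * a * a / (r * r))))

# Exercise 52: with N = [[0, 1], [0, 0]], N^2 = 0, so A^r = a^r I + r a^(r-1) N
for a in (0.9, -1.5, 2.0):
    for r in (1, 3, 10):
        Ar = [[a ** r, r * a ** (r - 1)], [0.0, a ** r]]
        assert math.isclose(spectral_norm_2x2(Ar), ex52_formula(a, r), rel_tol=1e-12)

def char_poly_tridiag(a, b, c, n, lam):
    """det(T - lam I) for the n x n tridiagonal T of Exercise 57 (a on the diagonal,
    b above, c below), from D_k = (a - lam) D_{k-1} - bc D_{k-2}, D_0 = 1."""
    d_prev, d = 1.0, a - lam
    for _ in range(n - 1):
        d_prev, d = d, (a - lam) * d - b * c * d_prev
    return d

# Exercise 57: every lambda_j is a root and lies in the union of the Gerschgorin
# discs, which for this matrix is |z - a| <= |b| + |c|
a, b, c, n = 2.0, 1.0, 0.25, 8
for j in range(1, n + 1):
    lam = a + 2 * math.sqrt(b * c) * math.cos(j * math.pi / (n + 1))
    assert abs(char_poly_tridiag(a, b, c, n, lam)) < 1e-12
    assert abs(lam - a) <= abs(b) + abs(c)
```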

1.12 Linear equations: direct methods
The solution of the system of linear equations $Ax = b$, where $A$ is non-singular, is simple in principle. In fact, the solution can be written as $x = A^{-1}b$. When $A^{-1}$ can be calculated easily by analytical means this is often satisfactory. However, for numerical work it must be recognized that simple analytical formulae for $A^{-1}$ usually involve ratios of determinants, and the computation of determinants of order 4 or higher is a complex task. Therefore, if we are to realize efficient numerical methods, we must seek other ways of finding a solution.



In practice, the systems of equations which arise are frequently of two types: (i) the matrix $A$ may be of moderate order, say $n < 100$, and dense, i.e. nearly all its elements are non-zero; (ii) $A$ may be of large order, say $n > 1000$, and sparse, i.e. it contains a large number of zero elements. The type of matrix is of prime importance in deciding on a method of solution. For dense matrices direct methods are appropriate, and they will be described in this section. Sparse matrices should be treated by iterative techniques (see next section), and it may be possible to economize in computer storage by retaining only the non-zero elements. Perhaps the most popular direct method for the numerical solution of a system of linear equations is Gaussian elimination. It has two parts: an elimination or triangularization procedure, and back substitution. Its principle is simple and can be illustrated by the problem of finding the unknowns $x_1$, $x_2$, and $x_3$ in
$$\begin{aligned} x_1 + x_2 + 4x_3 &= 7, \\ x_1 - 2x_2 + 6x_3 &= 15, \\ 2x_1 + x_2 - x_3 &= 7. \end{aligned}$$
Subtract the first equation from the second; then subtract twice the first from the third. There results
$$\begin{aligned} x_1 + x_2 + 4x_3 &= 7, \\ -3x_2 + 2x_3 &= 8, \\ -x_2 - 9x_3 &= -7. \end{aligned}$$
The first equation, which was used to remove $x_1$ from the other two equations, is known as the pivot equation, and the coefficient of $x_1$ in it is called the pivot. Now we make the new second equation the pivot equation and use it to eliminate $x_2$ from the third. Thus, by subtracting $\tfrac{1}{3}$ of the second from the last, we reach
$$\begin{aligned} x_1 + x_2 + 4x_3 &= 7, \\ -3x_2 + 2x_3 &= 8, \\ -29x_3/3 &= -29/3. \end{aligned}$$
If these were written in matrix form, the matrix on the left would be upper triangular, which explains why the process is sometimes called triangularization. Clearly, if we reversed our steps we should recover the original equations, so the two systems are equivalent. The final step of back substitution is now undertaken. From the last equation $x_3 = 1$. Substituting this value in the second we obtain $x_2 = -2$. Then the first equation gives $x_1 = 5$. The method can obviously be generalized to a system of $n$ equations in $n$ unknowns and is very easy to program. A simple count of the operations

involved reveals that about $\tfrac{1}{3}n^3$ multiplications and additions are required to solve a system. By taking $b$ as a unit vector and by employing the special properties of unit vectors we find that the inverse of $A$ can be found in $n^3$ (and not $\tfrac{1}{3}n^4$ as might be expected) multiplications and additions. If $m_n$ multiplications are required for a determinant of order $n$ and additions are ignored, expansion in co-factors gives $m_{n+1} = (n + 1)m_n$, whence $m_n$ is of the order of $n!$. Thus Gaussian elimination gives a dramatic improvement over Cramer's rule. Even if more sophisticated methods of evaluating determinants are adopted this statement remains true (see, for example, Kunz (1957)). It may happen that during the elimination the normal pivot is zero. In that case two equations are interchanged so that the pivot is non-zero (there must be at least one equation with a non-zero pivot so long as $A$ is non-singular). If, however, the pivot is non-zero but very small compared with other coefficients in its column, numerical instability can arise. This is caused by the fact that computers have a finite word length. As an example, suppose that the computer can store only three significant digits in floating point and is working in base 10. Let the equations be (Forsythe and Moler (1967))
$$\begin{aligned} 1.00 \times 10^{-4}\,x_1 + 1.00\,x_2 &= 1.00, \\ 1.00\,x_1 + 1.00\,x_2 &= 2.00. \end{aligned}$$

Gaussian elimination, taking account of the limitation of word length, supplies
$$\begin{aligned} 1.00 \times 10^{-4}\,x_1 + 1.00\,x_2 &= 1.00, \\ -1.00 \times 10^{4}\,x_2 &= -1.00 \times 10^{4}. \end{aligned}$$
From back substitution, $x_2 = 1.00$ and $x_1 = 0.00$, which is obviously incorrect. By reversing the order of the original equations and performing Gaussian elimination we obtain $x_2 = 1.00$ and $x_1 = 1.00$, which is acceptable. The general rule, therefore, in eliminating $x_r$ from some equations, is to select as the pivot the coefficient of $x_r$ which has the largest magnitude; this is termed partial pivoting. If, however, the element of largest magnitude in both rows and columns is chosen as pivot, the process is known as complete pivoting. According to Wilkinson (1965) and Ralston and Wilf (1967) it is doubtful whether complete pivoting warrants the additional complication and computer time. Wilkinson also shows that partial pivoting, though sufficient, is not necessary for numerical stability in certain cases, e.g. if $A$ is a real symmetric positive definite matrix, or if $A$ is strictly diagonally dominant.

Although pivotal strategy can control the difficulty of large multipliers in Gaussian elimination, it still leaves open the possibility that the solution is very sensitive to small changes in the coefficients, i.e. that the system is ill-conditioned. Suppose that $A$ (which is non-singular) is perturbed to $A + B$ and $b$ to $b + c$.
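The three-digit experiment of Forsythe and Moler described above can be reproduced with Python's `decimal` module, whose context precision rounds every arithmetic operation to a fixed number of significant digits. This is a sketch under that assumption (decimal rounding of each operation rather than a particular machine's floating-point unit); `gauss_solve` is a helper written for this example, not a library routine.

```python
from decimal import Decimal, localcontext

def gauss_solve(A, b, pivot=True, digits=3):
    """Gaussian elimination and back substitution in which every arithmetic
    operation is rounded to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = len(A)
        M = [[Decimal(v) for v in row] + [Decimal(bi)] for row, bi in zip(A, b)]
        for k in range(n - 1):
            if pivot:
                # partial pivoting: largest |coefficient of x_k| among rows k..n-1
                p = max(range(k, n), key=lambda i: abs(M[i][k]))
                M[k], M[p] = M[p], M[k]
            for i in range(k + 1, n):
                m = M[i][k] / M[k][k]                 # multiplier
                for j in range(k, n + 1):
                    M[i][j] = M[i][j] - m * M[k][j]
        x = [Decimal(0)] * n
        for i in reversed(range(n)):                  # back substitution
            s = M[i][n]
            for j in range(i + 1, n):
                s = s - M[i][j] * x[j]
            x[i] = s / M[i][i]
        return [float(v) for v in x]

A = [["1.00E-4", "1.00"], ["1.00", "1.00"]]
b = ["1.00", "2.00"]
print(gauss_solve(A, b, pivot=False))   # [0.0, 1.0]
print(gauss_solve(A, b, pivot=True))    # [1.0, 1.0]
```

With `digits` raised (say to 28) the same routine reproduces the hand elimination of the $3 \times 3$ example earlier in this section.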



These perturbations cause a change in $x$, altering it to $x + y$ (say). Then
$$(A + B)(x + y) = b + c.$$
A bound for $y$ is provided by:

THEOREM 1.12. If $\|B\|\,\|A^{-1}\| < 1$ then
$$\|y\| \le C\|A^{-1}\|(\|c\| + \|B\|\,\|x\|)$$
and
$$\frac{\|y\|}{\|x\|} \le C\kappa\Big(\frac{\|c\|}{\|b\|} + \frac{\|B\|}{\|A\|}\Big),$$
where $\kappa = \|A\|\,\|A^{-1}\|$ and $C = \|I\| + \|A^{-1}B\|/(1 - \|A^{-1}B\|)$.

Proof. Since $A$ is non-singular,
$$(I + A^{-1}B)y = A^{-1}(c - Bx).$$
From (1.79) and Theorem 1.11a, $\|A^{-1}B\| \le \|A^{-1}\|\,\|B\| < 1$ implies that $I + A^{-1}B$ has an inverse. Consequently, $y = (I + A^{-1}B)^{-1}A^{-1}(c - Bx)$, whence
$$\|y\| \le \|(I + A^{-1}B)^{-1}\|\,\|A^{-1}\|\,(\|c\| + \|B\|\,\|x\|).$$
Because of the expansion in Theorem 1.11a,
$$\|(I + A^{-1}B)^{-1}\| \le \|I\| + \|A^{-1}B\|(1 - \|A^{-1}B\|)^{-1};$$
also $\|b\| \le \|A\|\,\|x\|$, and the result stated in the theorem follows.
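To see Theorem 1.12 at work, the following sketch uses the $\infty$-norm (1.77), which is subordinate to $\|x\|_\infty$ so that $\|I\| = 1$, on a nearly singular $2 \times 2$ system. The matrix and the perturbations $B$ and $c$ are invented for illustration; the point is that $\kappa$ is large and the observed relative change in $x$, although large, stays below the bound of the theorem.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(A):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def vnorm(x):                  # ||x||_inf
    return max(abs(v) for v in x)

def mnorm(A):                  # ||A||_inf, (1.77): subordinate to vnorm
    return max(sum(abs(v) for v in row) for row in A)

A = [[1.0, 1.0], [1.0, 1.0001]]          # nearly singular
x = [1.0, 1.0]
b = matvec(A, x)                          # b = [2, 2.0001]
B = [[0.0, 0.0], [0.0, 1e-5]]            # perturbation of A
c = [0.0, 1e-4]                           # perturbation of b

Ainv = inv2(A)
kappa = mnorm(A) * mnorm(Ainv)
assert mnorm(B) * mnorm(Ainv) < 1         # hypothesis of Theorem 1.12

t = mnorm(matmul(Ainv, B))                # ||A^{-1} B||
C = 1.0 + t / (1.0 - t)                   # ||I|| = 1 for a subordinate norm
bound = C * kappa * (vnorm(c) / vnorm(b) + mnorm(B) / mnorm(A))

AB = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
x_new = matvec(inv2(AB), [bi + ci for bi, ci in zip(b, c)])
rel_change = vnorm([xn - xi for xn, xi in zip(x_new, x)]) / vnorm(x)
print(f"kappa = {kappa:.5g}, ||y||/||x|| = {rel_change:.3g}, bound = {bound:.3g}")
assert rel_change <= bound
```

A relative change of about $10^{-4}$ in the data produces a relative change of order one in $x$, which is exactly the behaviour a large $\kappa$ warns of.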

If $\kappa$ is small then small changes in $b$ and $A$ will produce only small changes in $x$, and the equations can be regarded as well-conditioned. However, a large $\kappa$ does not necessarily mean that the system is ill-conditioned, because only upper bounds occur in Theorem 1.12. Nevertheless, we cannot improve those bounds, when $B = 0$ at any rate, since examples are known (see, for example, Forsythe and Moler (1967)) in which equality is achieved in Theorem 1.12. We call $\kappa$ the condition number of the system. Its precise value depends upon the choice of norm. If the spectral norm $\|\cdot\|_2$ is selected then, from (1.78), $\kappa = (\mu_1/\mu_n)^{1/2}$, where $\mu_1$ and $\mu_n$ are the largest and smallest eigenvalues of $A^H A$; this $\kappa$ is sometimes described as the spectral condition number. The condition number can be altered by scaling, i.e. by multiplying each equation in $Ax = b$ by some integer power of 10 in the decimal system, though the same power need not be used for each row. If $\kappa$ is large, whatever scale factors are employed, the equations are ill-conditioned. A small value of $\kappa$ indicates a well-conditioned system. It is desirable to have a systematic technique for scaling that makes $\kappa$ as small as possible. Unfortunately, no method is known which applies to arbitrary matrices and arbitrary norms. One practical method is to attempt to arrange that $n$ elements of $A$ are of order unity, no two of these elements being in the same row or column, all other

elements of $A$ being less than unity in magnitude. Round-off error may be reduced by using the power of the machine number base closest to the largest element. It should be remarked that a small value of $\det A$ does not necessarily signify ill-conditioning. Examples are available in which $\det A$ is small and $\|y\|_2/\|x\|_2 = \|c\|_2/\|b\|_2$. If, by scaling, it is arranged that $\|A\|_2 = 1$ then $\kappa = 1/\mu_n^{1/2}$, and the spectral condition number is large if and only if $\mu_n$ is small. Let $\det A \to 0$; then $\mu_n \to 0$ provided that $\mu_1$ is fixed. In other words, if $A$ is normalized so as to keep $\mu_1$ fixed, then $\det A$ is closely related to the condition of $A$. But, in general, the largeness of the condition number is more significant than the smallness of $\det A$ as a criterion for determining ill-conditioning. Gaussian elimination is applicable to complex equations if the computer has a facility for complex arithmetic. Otherwise the equations must be separated into their real and imaginary parts and the resulting real equations solved. There are other inversion algorithms. A popular one is triangular decomposition, in which the aim is to write $A$ in the form
$$A = LU, \tag{1.80}$$
where $L$ is a lower triangular matrix (i.e. all elements above the diagonal are zero) and $U$ is an upper triangular matrix. If this can be done in such a way that $L$ and $U$ are non-singular then, by putting $Ux = y$, we have to solve the two systems
$$Ly = b, \qquad Ux = y.$$
Since both systems are triangular, the first can be solved for $y$ by forward substitution and then $x$ can be determined from the second by back substitution. The effort involved in these substitutions is substantially less than that in the triangular decomposition. Conditions which permit (1.80) are contained in:

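THEOREM 1.12a. Let $A_k$ be the matrix formed by the first $k$ rows and columns of $A$. If $A_1, A_2, \ldots, A_{n-1}, A$ are all non-singular, $A = LU$, and the decomposition is unique if the diagonal elements of either $L$ or $U$ are specified.

Proof. Only the situation in which all the diagonal elements of $L$ are chosen to be unity will be considered, the general case being left to the reader. With
$$L = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix}$$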


we require that $A = LU$. To get the first row of $A$ right we need
$$u_{1j} = a_{1j} \qquad (j = 1, \ldots, n).$$
The first column of $A$ will be given correctly if
$$l_{i1} = a_{i1}/u_{11} \qquad (i = 2, \ldots, n),$$
which is possible since $u_{11} = a_{11} \neq 0$ by assumption. Now the second row of $A$ is obtained by taking
$$u_{2j} = a_{2j} - l_{21}u_{1j} \qquad (j = 2, \ldots, n),$$
and then the second column may be realized by
$$l_{i2} = (a_{i2} - l_{i1}u_{12})/u_{22} \qquad (i = 3, \ldots, n).$$
The last formula is legitimate provided that $u_{22} \neq 0$, i.e. $a_{22} - a_{21}a_{12}/a_{11} \neq 0$, which is true since $\det A_2 \neq 0$. Proceeding in this way, a row and a column at a time, we construct $L$ and $U$, and the construction obviously leads to unique elements. Remark that, since $A$ is non-singular, neither $L$ nor $U$ can be singular. If $A$ is real, symmetric, and positive definite we can show by the same procedure that there is a real lower triangular matrix $L$ such that
$$A = LL^T.$$

This is known as Cholesky decomposition. The algorithm, in this case, is highly stable. If $A$ is Hermitian positive definite then $A = LL^H$, where the diagonal elements of $L$ are positive. Finally, we observe that it is sometimes possible to improve the accuracy of a computed solution to a system of linear equations by iteration. If $x^{(1)}$ is the first approximation, calculate the residual
$$r^{(1)} = b - Ax^{(1)}$$
as accurately as possible. Then solve the system $Ay = r^{(1)}$ and take $x^{(2)} = x^{(1)} + y$. Clearly this is the first stage of an iterative procedure which, under suitable circumstances, will lead to more accurate numerical values.
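A sketch of the triangular decomposition built a row of $U$ and a column of $L$ at a time, as in the proof of Theorem 1.12a, followed by forward and back substitution and one step of the refinement just described. No pivoting is done, so the leading submatrices must be non-singular; forming the residual exactly with `fractions.Fraction` is one assumed way of computing it "as accurately as possible". The test system is the $3 \times 3$ example solved by Gaussian elimination earlier in this section.

```python
from fractions import Fraction

def lu_doolittle(A):
    """A = LU with units on the diagonal of L, constructed a row of U and then a
    column of L at a time.  No pivoting: A_1, ..., A_{n-1} must be non-singular."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):                                   # k-th row of U
            U[k][j] = A[k][j] - sum(L[k][s] * U[s][j] for s in range(k))
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: a leading submatrix is singular")
        for i in range(k + 1, n):                               # k-th column of L
            L[i][k] = (A[i][k] - sum(L[i][s] * U[s][k] for s in range(k))) / U[k][k]
    return L, U

def lu_solve(L, U, b):
    """Solve Ly = b by forward substitution, then Ux = y by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][s] * y[s] for s in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][s] * x[s] for s in range(i + 1, n))) / U[i][i]
    return x

def refine(A, L, U, b, x):
    """One refinement step: exact residual r = b - Ax, solve Ay = r, return x + y."""
    r = [Fraction(bi) - sum(Fraction(aij) * Fraction(xj) for aij, xj in zip(row, x))
         for row, bi in zip(A, b)]
    y = lu_solve(L, U, [float(ri) for ri in r])
    return [xi + yi for xi, yi in zip(x, y)]

A = [[1.0, 1.0, 4.0], [1.0, -2.0, 6.0], [2.0, 1.0, -1.0]]
b = [7.0, 15.0, 7.0]
L, U = lu_doolittle(A)
x = refine(A, L, U, b, lu_solve(L, U, b))
print(x)       # close to [5, -2, 1]
```

The factors reproduce the elimination by hand: $U$ is the final upper triangular system (with $u_{33} = -29/3$) and the multipliers $1$, $2$, $\tfrac{1}{3}$ appear below the diagonal of $L$.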


Exercises 59. Is the system with

10- 4 A

=(

0.1

-2 x 100.1 1.0

0.2 -10- 4

4 )

-10- 4

0.2

badly scaled? 60. Find the triangular decomposition of

2 4

(:

-2 4 0

61. Suppose there are two triangular decompositions $A = L_1U_1 = L_2U_2$ with $L_1$ and $L_2$ having units on the diagonal. If $A$ is non-singular prove, without using Theorem 1.12a, that $U_1$ and $U_2$ are not singular. Deduce from $L_1^{-1}L_2 = U_1U_2^{-1}$ that the decomposition is, in fact, unique.

1.13 Iterative methods
Iterative methods, which are appropriate for large sparse matrices, are based upon the idea of starting with an initial guess $x^{(0)}$ to the solution of $Ax = b$ and then deriving a sequence $x^{(1)}, x^{(2)}, \ldots$ which converges to the exact solution. All of the methods are based on rewriting $A$ in the form
$$A = L + D + U,$$
where $D$ is a diagonal matrix with diagonal elements the same as those of $A$, $L$ is lower triangular with zeros on the diagonal, and $U$ is upper triangular with zeros on the diagonal. For instance, if we express $Ax = b$ as $Dx = b - (L + U)x$, this suggests the iterative scheme
$$x^{(r+1)} = D^{-1}b - D^{-1}(L + U)x^{(r)},$$
which is known as the Jacobi method. A necessary condition for the application of this method is that all the diagonal elements of $A$ are non-zero. Alternatively, the form $(L + D)x = b - Ux$ suggests the iteration
$$x^{(r+1)} = (L + D)^{-1}b - (L + D)^{-1}Ux^{(r)},$$

which is the Gauss-Seidel method. Again, by introducing the non-zero scalar parameter $\omega$ and writing
$$(D + \omega L)x = \{-\omega U + (1 - \omega)D\}x + \omega b,$$
we derive the procedure
$$x^{(r+1)} = (D + \omega L)^{-1}\{-\omega U + (1 - \omega)D\}x^{(r)} + \omega(D + \omega L)^{-1}b.$$
This is the method of successive over-relaxation (SOR). It reduces to the Gauss-Seidel method if $\omega = 1$. Of course, one does not in practice calculate the inverse matrices on the right-hand sides of the iterative schemes; instead one solves the linear system which arises before the application of the inverse matrix. One advantage is evident: zero elements of the matrix need not be stored, and successive vectors can be overwritten on their predecessors, so that considerable economy of computer storage can be achieved. The iterations are all examples of taking an equation
$$x = Bx + c$$
and replacing it by
$$x^{(r+1)} = Bx^{(r)} + c.$$
Let $e^{(r)} = x^{(r)} - x$ be the error at a particular stage. Then, by subtraction, we see that $e^{(r+1)} = Be^{(r)}$, whence
$$e^{(r)} = B^r e^{(0)}.$$
Now $x^{(r)}$ converges to $x$ if and only if $e^{(r)}$ approaches zero, which can happen for every choice of $e^{(0)}$ if and only if $B^r \to 0$. From Theorem 1.11 we deduce:

THEOREM. The iteration $x^{(r+1)} = Bx^{(r)} + c$ converges to $x$ for every choice of $x^{(0)}$ if and only if $\rho(B) < 1$.
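The following sketch applies the Jacobi and SOR iterations (SOR with $\omega = 1$ being Gauss-Seidel) to a small strictly diagonally dominant system, working row by row rather than forming inverse matrices, as recommended above. For this matrix the Jacobi iteration matrix $-D^{-1}(L + U)$ has spectral radius $\sqrt{2}/4 < 1$, so the theorem guarantees convergence; the matrix, right-hand side, tolerance and $\omega = 1.03$ are illustrative assumptions.

```python
def jacobi(A, b, tol=1e-12, max_iter=10_000):
    """x^(r+1) = D^{-1} b - D^{-1}(L + U) x^(r), computed row by row."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")

def sor(A, b, omega=1.0, tol=1e-12, max_iter=10_000):
    """(D + omega L) x^(r+1) = {-omega U + (1 - omega) D} x^(r) + omega b, solved by
    overwriting x component by component; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)   # new x_j for j < i
            new = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            return x, it
    raise RuntimeError("SOR iteration did not converge")

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]   # strictly diagonally dominant
b = [2.0, 4.0, 10.0]                                           # exact solution (1, 2, 3)
for name, (x, its) in [("Jacobi", jacobi(A, b)),
                       ("Gauss-Seidel", sor(A, b, 1.0)),
                       ("SOR, omega = 1.03", sor(A, b, 1.03))]:
    print(f"{name:18s} {its:3d} iterations  x = {[round(v, 10) for v in x]}")
```

Only the current vector is stored in `sor`, which is the storage economy mentioned above; `jacobi` needs the previous vector as well.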
n. In the former case the linear system possesses an infinite number of solutions while,

GENERALIZED INVERSE


in the latter, the system may have no solution at all because the equations are inconsistent with one another. Yet it may be important for the application to identify a single entity which one is prepared to accept as 'the solution' of the system. One method which suggests itself is that of least squares. There are at least two ways in which this could be applied. We could consider minimizing the sum of the squares of the residuals, $r^T r$ where $r = b - Ax$, or we might try minimizing $x^T x$. Let us deal with residuals first. The rank of a matrix is the order of the largest non-singular submatrix in the matrix. It is not difficult to confirm that, when $A$ is real, $A$ and $A^T A$ have the same rank. With that notion we can formulate

THEOREM 1.15. If real $A$ is $m \times n$ with $m > n$ and of rank $n$, the solution of (1.81) which minimizes $r^T r$ is given by
$$x = (A^T A)^{-1}A^T b.$$
Proof. Since $A^T A$ is $n \times n$ and of rank $n$, it is non-singular and so the formula makes sense. Also
$$r^T r = b^T b - x^T A^T b - b^T A x + x^T A^T A x,$$
so that $\partial(r^T r)/\partial x_i = 0$ for $i = 1, \ldots, n$ leads to
$$A^T A x = A^T b \tag{1.82}$$

and the theorem is proved. For the case $m < n$ we have

THEOREM 1.15a. If real $A$ is $m \times n$ with $m < n$ and of rank $m$, the solution of (1.81) which minimizes $x^T x$ is given by
$$x = A^T(AA^T)^{-1}b.$$
Proof. The formula makes sense because $AA^T$ is $m \times m$ of rank $m$ and therefore non-singular. The minimum of $x^T x$ subject to $Ax = b$ is found by Lagrange multipliers, i.e. by minimizing $S = x^T x + 2\lambda^T(b - Ax)$, where $\lambda$ is a column vector with $m$ elements. From $\partial S/\partial x_i = 0$, $i = 1, \ldots, n$, we obtain $x = A^T\lambda$, and from $\partial S/\partial\lambda_j = 0$, $j = 1, \ldots, m$, we have $Ax = b$. Hence
$$AA^T\lambda = Ax = b,$$
which can be solved for $\lambda$, and the theorem follows. It is desirable to relax the conditions on rank in Theorems 1.15 and 1.15a and find a single formula which encompasses all possibilities. To this end we



examine whether there is a matrix $A^+$ such that $x = A^+b$. In the case of Theorem 1.15 we have
$$A^+ = (A^T A)^{-1}A^T,$$
and we remark that
$$A^+AA^+ = (A^T A)^{-1}A^T A(A^T A)^{-1}A^T = (A^T A)^{-1}A^T = A^+.$$
Similarly $AA^+A = A$. Also $AA^+$ and $A^+A$ are symmetric. Now, for Theorem 1.15a, $A^+ = A^T(AA^T)^{-1}$, and a check reveals that it has the same three properties. This prompts:

DEFINITION. A matrix $A^+$ with the properties (i) $A^+AA^+ = A^+$, (ii) $AA^+A = A$, (iii) $AA^+$ and $A^+A$ are symmetric, is called the generalized inverse of $A$. If $A$ is complex, replace symmetric in (iii) by Hermitian.

It has already been verified that the inverses of Theorems 1.15 and 1.15a comply with this definition. If $A$ is square and non-singular, the inverse $A^{-1}$ obviously satisfies it. Moreover, we can show that $A$ possesses only one generalized inverse, so that $A^+ = A^{-1}$ when $A^{-1}$ exists. Suppose, in fact, that real $A$ had a second generalized inverse $B^+$. Then, from property (ii),
$$A^+A = A^+AB^+A,$$
and then property (iii) implies that $A^+A = B^+AA^+A = B^+A$, whence $B^+ = B^+AB^+ = A^+AB^+$. Similarly, $AA^+ = AB^+$, from which $A^+ = A^+AA^+ = A^+AB^+$, and hence $A^+ = B^+$, so that the generalized inverse is unique.

By taking the transpose of the quantities in the definition we deduce that $(A^+)^T = (A^T)^+$, so that if $A$ is symmetric so is $A^+$. Obviously, $(A^+)^+ = A$, and $A^+$ has the same rank as $A$. To obtain an explicit formula for $A^+$ it is convenient to derive first some properties of complex $m \times n$ matrices. If $\mu$ is a non-zero number and there are

vectors $u$, $v$ such that
$$Au = \mu v, \qquad A^H v = \mu u,$$
then $\mu$ is known as a singular value of $A$, and $u$, $v$ as the corresponding pair of singular vectors. Now
$$A^H A u = \mu A^H v = \mu^2 u.$$
Since $A^H A$ is positive semi-definite, the values of $\mu^2$ are real and non-negative. Also $A^H A$, which is of order $n \times n$, possesses $n$ linearly independent eigenvectors $u_1, \ldots, u_n$, which can be arranged to be orthonormal. If the rank of $A$ is $k$ so is the rank of $A^H A$, and precisely $k$ of the values of $\mu^2$ are non-zero. Pick the order of the eigenvectors so that $u_1, u_2, \ldots, u_k$ correspond to the non-zero

eigenvalues $\mu_1^2, \ldots, \mu_k^2$. Note that $u_{k+1}, \ldots, u_n$ can be chosen to satisfy $Au_i = 0$. Define $v_i$ for $i = 1, \ldots, k$ by $v_i = Au_i/\mu_i$, with $\mu_i$ the positive square root of $\mu_i^2$. Then $AA^H v_i = AA^H Au_i/\mu_i = \mu_i Au_i = \mu_i^2 v_i$, so that $v_i$ is an eigenvector of $AA^H$ and, since $A^H v_i = \mu_i u_i$, $\mu_1, \ldots, \mu_k$ are the positive singular values of $A$ and $u_i$, $v_i$ the corresponding singular vectors. The vectors $v_i$ are orthonormal because
$$\mu_i\mu_j(v_i, v_j) = (Au_i, Au_j) = (u_i, A^H Au_j) = \mu_j^2(u_i, u_j).$$
The set may be completed by adding on $m - k$ orthonormal vectors satisfying $A^H v = 0$; they are automatically eigenvectors of $AA^H$ corresponding to the eigenvalue zero. Define the $n \times n$ matrix $U$ and the $m \times m$ matrix $V$ by
$$U = (u_1, \ldots, u_n), \qquad V = (v_1, \ldots, v_m).$$
Then we have

THEOREM 1.15b. If $A$ is of rank $k$ there are unitary matrices $U$ and $V$ such that
$$V^H A U = \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix},$$
where $\Lambda$ is a diagonal matrix of order $k$ whose diagonal elements are the singular values of $A$.

Proof. By construction $U^H U$ is the $n \times n$ unit matrix, so that $U$ is unitary. Similarly $V$ is unitary. Also
$$AU = (\mu_1 v_1, \ldots, \mu_k v_k, 0, \ldots, 0),$$
and the theorem follows from the orthonormal property of the $v_i$.

Since the diagonal elements of $\Lambda$ are non-zero, $\Lambda^{-1}$ exists. This fact enables us to state

THEOREM 1.15c. If $A$ is of rank $k$, its generalized inverse is given by
$$A^+ = U\begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & 0 \end{pmatrix}V^H.$$
Proof.
$$A^+AA^+ = U\begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & 0 \end{pmatrix}V^H$$
from Theorem 1.15b. Property (i) of the Definition follows at once. Further,

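since $U$ and $V$ are unitary,
$$A = V\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}U^H,$$
so that
$$AA^+ = V\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}V^H, \qquad A^+A = U\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}U^H,$$
which show that property (iii) is satisfied. Finally,
$$AA^+A = V\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}U^H = A,$$
and the proof is complete. Remark now that
$$A^H AA^+ = A^H(AA^+)^H = (AA^+A)^H = A^H,$$
so that $x = A^+b$ satisfies (1.82) with the affix $T$ replaced by $H$. Moreover, in the analogue of Theorem 1.15a, we need a solution of $AA^H\lambda = b$. But $AA^H$ is Hermitian and so has the structure of Theorem 1.15b with $V = U$, whence $(AA^H)^+ = (A^H)^+A^+$. Therefore $x = A^H(A^H)^+A^+b = A^+b$. Thus we have proved

THEOREM 1.15d. If $A$ is a complex $m \times n$ matrix of rank $k$, the vector $x$ which minimizes (a) $(b - Ax, b - Ax)$ and (b) $(x, x)$ subject to $Ax = b$ is given by $x = A^+b$, where $A^+$ is specified in Theorem 1.15c.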
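Theorem 1.15c can be tried out on real $2 \times 2$ matrices, where the eigenvectors of $A^T A$ are given by a single plane rotation. The sketch below builds $A^+$ as $\sum_{\mu_i > 0} u_i v_i^T/\mu_i$, which is the same matrix as $U\,\mathrm{diag}(\Lambda^{-1}, 0)\,V^T$, and checks properties (i)-(iii) of the Definition for a non-singular, a rank-one and the zero matrix. The relative threshold used to decide that a singular value is zero is an assumption of the example.

```python
import math

def pinv_2x2(A, tol=1e-12):
    """Generalized inverse of a real 2x2 matrix following Theorem 1.15c:
    A+ = U diag(1/mu_1, ..., 1/mu_k, 0, ...) V^T = sum over mu_i > 0 of u_i v_i^T / mu_i."""
    (a, b), (c, d) = A
    p, q, s = a * a + c * c, a * b + c * d, b * b + d * d     # A^T A = [[p, q], [q, s]]
    theta = 0.5 * math.atan2(2 * q, p - s)                    # rotation diagonalizing A^T A
    us = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
    lams = [p * u0 * u0 + 2 * q * u0 * u1 + s * u1 * u1 for u0, u1 in us]   # mu_i^2
    lam_max = max(lams)
    P = [[0.0, 0.0], [0.0, 0.0]]
    for (u0, u1), lam in zip(us, lams):
        if lam <= tol * lam_max:         # treat as a zero singular value (A u_i = 0)
            continue
        mu = math.sqrt(lam)
        v0, v1 = (a * u0 + b * u1) / mu, (c * u0 + d * u1) / mu     # v_i = A u_i / mu_i
        for i, ui in enumerate((u0, u1)):
            for j, vj in enumerate((v0, v1)):
                P[i][j] += ui * vj / mu
    return P

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def penrose_ok(A, P, tol=1e-9):
    """Properties (i)-(iii) of the Definition, for real matrices."""
    def close(X, Y):
        return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
    def T(X):
        return [list(r) for r in zip(*X)]
    AP, PA = matmul(A, P), matmul(P, A)
    return (close(matmul(PA, P), P) and close(matmul(AP, A), A)
            and close(AP, T(AP)) and close(PA, T(PA)))

for A in ([[3.0, 1.0], [1.0, 2.0]],       # non-singular: A+ = A^{-1}
          [[1.0, 2.0], [2.4, 4.8]],       # rank one
          [[0.0, 0.0], [0.0, 0.0]]):      # zero matrix: A+ = 0
    P = pinv_2x2(A)
    assert penrose_ok(A, P)
    print([[round(v, 6) for v in row] for row in P])
```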

It is sometimes possible to derive formulae for $A^+$ which do not involve finding $U$ and $V$ (see exercises). In practice, the system (1.82) may be ill-conditioned and, indeed, worse conditioned than the original system. For consider the case when $A$ is square. Then, if $Ax = b$ is ill-conditioned, $\det A$ is likely to be small, and $\det(A^T A) = (\det A)^2$ will be much smaller again. There are similar arguments if $A$ is not square. For this reason, when $A$ is real, advantage is sometimes taken of the result derived in the previous section that $A = QU_1$, where $Q$ is orthogonal and $U_1$ upper triangular, to solve instead
$$U_1 x = Q^T b.$$
The condition of the system may then be considerably improved.


Exercises
73. If $u$ and $v$ are non-zero real vectors, prove that (i) $u^+ = (u^T u)^{-1}u^T$, (ii) $A^+ = A^T/\{(v^T v)(u^T u)\}$ where $A = uv^T$. 74. If $A = BC$, where $B$ is $m \times k$, $C$ is $k \times n$ and all three matrices are of rank $k$, prove that $A^+ = C^T(CC^T)^{-1}(B^TB)^{-1}B^T$. 75. Give an example in which $(AB)^+ \neq B^+A^+$.

-1

76. Calculate the singular values of

-2

2

77. If $Ax = b$ is a consistent system, prove that
$$x = \sum_{i=1}^{k} (v_i, b)u_i/\mu_i + \sum_{i=k+1}^{n} \xi_i u_i$$
in the notation of this section, the $\xi_i$ being arbitrary constants. 78. If $A$ is Hermitian with eigenvalues $\lambda_i$ and orthonormal eigenvectors $x_i$, show that
$$A = \sum_{i=1}^{n} \lambda_i x_i x_i^H.$$

79. Prove from (1.83) that $A^+ = (A^H A)^{-1}A^H$ if $A^H A$ is non-singular. 80. If $A^H A$ is non-singular, prove that the vector $x$ which minimizes $r^H B r$, where $B$ is positive definite, is $x = (A^H BA)^{-1}A^H Bb$.

2 WAVEGUIDES AND DIFFERENCE EQUATIONS
2.1 Introduction
The phenomena of electromagnetism are governed by Maxwell's equations, which may be expressed as
$$\operatorname{curl} E + \partial B/\partial t = 0, \tag{2.1}$$
$$\operatorname{curl} H - \partial D/\partial t = J, \tag{2.2}$$
$$\operatorname{div} D = \rho, \tag{2.3}$$
$$\operatorname{div} B = 0, \tag{2.4}$$
where the vector $E$ is the electric intensity, $B$ is the magnetic flux density, $D$ is the electric flux density, $H$ is the magnetic intensity, $J$ is the current density, and $\rho$ is the charge density. The charge density and current density are connected by the equation of continuity, or conservation of charge,
$$\operatorname{div} J + \partial\rho/\partial t = 0. \tag{2.5}$$

Equations (2.1)-(2.5) are insufficient to determine the electromagnetic field and have to be supplemented by constitutive equations showing how the field is related to the properties of the medium. The simplest constitutive equations occur in free space, where
$$D = \varepsilon_0 E, \qquad B = \mu_0 H, \tag{2.6}$$
where $\mu_0$ and $\varepsilon_0$ are certain constants. They are related by
$$c = 1/(\mu_0\varepsilon_0)^{1/2},$$
where $c$ is the speed of light. In the SI system of units, which will be employed here, $\mu_0 = 4\pi \times 10^{-7}$ henry/metre, $c$ is about $3 \times 10^8$ m/s and $\varepsilon_0$ is about $(1/36\pi) \times 10^{-9}$ farad/metre. For many bodies the laws
$$D = \varepsilon E, \qquad B = \mu H \tag{2.7}$$
are reasonable. The ratios $\varepsilon/\varepsilon_0$ and $\mu/\mu_0$ are often known as the dielectric constant (or relative permittivity) and the relative permeability respectively. The dielectric constant is never less than unity but the relative permeability can be, though it is very close to unity for many substances.


There are other constitutive laws for bodies such as ferrites and ferromagnets, but they will not concern us in the present context. (See, for example, Jones (1986), where many subsequent statements in this section are also substantiated.) In a conductor, Ohm's law holds in the form
$$J = \sigma E, \tag{2.8}$$
where $\sigma$ is known as the conductivity. Many metals possess a high conductivity, and so it is often a theoretical convenience to regard a metal as an ideal perfect conductor in which $\sigma$ is taken to be infinite. At a boundary which separates one medium from another the parameters $\varepsilon$, $\mu$, and $\sigma$ will often change sharply, and it is necessary to have formulae which connect the fields on the two sides of the boundary. These boundary conditions are as follows. (i) The normal component of $B$ is continuous. (ii) Each tangential component of $E$ is continuous. (iii) If the conductivities of both media are finite, each tangential component of $H$ is continuous. If one medium is a perfect conductor, (iii) is not valid and (ii) becomes (ii)': each tangential component of $E$ vanishes on a perfect conductor.

There are two further results which can be helpful. (iv) The change in the normal component of $D$ is equal to the surface charge density. (v) At a perfect conductor $n \wedge H = J_s$, where $n$ is a unit vector normal to the boundary and $J_s$ is a surface current density. In many important cases (ii) implies (i), and it is sufficient to impose only conditions (ii) and (iii) (or only (ii)' for a perfect conductor). Fields which are produced by currents and charges whose variation with time $t$ is simple harmonic are of considerable importance. If $\varepsilon$, $\mu$, and $\sigma$ are independent of time, Maxwell's equations and the constitutive laws are linear. It is therefore possible to consider writing
$$E = E_h e^{i\omega t}, \qquad H = H_h e^{i\omega t},$$
where $i = \sqrt{-1}$ and $\omega$ is a constant, $\omega/2\pi$ being known as the frequency. Here it is understood that $E_h$ and $H_h$ do not involve $t$, and the real part of the complex expressions is taken at the end of the analysis. This will be legitimate provided that there are no extraneous circumstances or boundary conditions which introduce non-linearities. The boundary conditions (i)-(iii) and (ii)' are linear, so the suggested procedure is certainly acceptable for them. Accordingly, for harmonic fields subject to (2.7) the governing equations

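can be taken as
$$\operatorname{curl} E_h + i\omega\mu H_h = 0, \tag{2.9}$$
$$\operatorname{curl} H_h - i\omega\varepsilon E_h = J_h, \tag{2.10}$$
$$\operatorname{div}(\varepsilon E_h) = \rho_h, \tag{2.11}$$
$$\operatorname{div}(\mu H_h) = 0, \tag{2.12}$$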

from which all time variations have disappeared. If $\omega \neq 0$, the divergence of (2.9) gives (2.12), and the divergence of (2.10) gives (2.11) when the equation of continuity (2.5) is taken into account. Therefore, for harmonic fields which are non-static, it is sufficient to employ (2.9) and (2.10) plus the harmonic form of the equation of continuity. Also the boundary condition (i) will be covered automatically by (ii). Conductivity would also be allowed for by replacing $\varepsilon$ by $\varepsilon - i\sigma/\omega$. It can be shown that in these circumstances the solution of the electromagnetic problem is unique, except possibly if the medium extends to infinity. To ensure uniqueness when the medium is infinite it is necessary to impose extra conditions to guarantee that a source at the origin produces a field which is radiating, or outgoing, at infinity. If $\mu$ and $\varepsilon$ are constant near infinity, introduce $Z_0 = (\mu/\varepsilon)^{1/2}$, the impedance of the medium. Then if $r$ is the radial vector from the origin and $l_r$ a unit vector in the same direction, so that $r = r\,l_r$, the radiation conditions are
$$rE_h,\ rH_h \ \text{bounded}, \qquad r(E_h - Z_0 H_h \wedge l_r) \to 0, \qquad r\{H_h - (1/Z_0)\,l_r \wedge E_h\} \to 0 \tag{2.13}$$
as $r \to \infty$.

v;

(v: + with v;.) (k 2 < v;').

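and then a possible electromagnetic field is given by (2.17) and (2.18) as
$$E_z = \phi_m \exp(-i\lambda_m z), \qquad E_t = -(i\lambda_m/\nu_m^2)\exp(-i\lambda_m z)\operatorname{grad}\phi_m, \qquad H_t = -(i\omega\varepsilon/\nu_m^2)\exp(-i\lambda_m z)\,\mathbf{k} \wedge \operatorname{grad}\phi_m.$$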

It is characterized by having no component of the magnetic field in the direction of propagation along the $z$ axis and is therefore known as a transverse magnetic or TM mode. Similarly, if $\psi_m$ satisfies (2.26) with $\partial\psi_m/\partial n = 0$ on $C$ and $\kappa_m^2 = k^2 - \mu_m^2$, a transverse electric or TE mode is
$$E_z = 0, \qquad E_t = (i\omega\mu/\mu_m^2)\exp(-i\kappa_m z)\,\mathbf{k} \wedge \operatorname{grad}\psi_m,$$
$$H_z = \psi_m\exp(-i\kappa_m z), \qquad H_t = -(i\kappa_m/\mu_m^2)\exp(-i\kappa_m z)\operatorname{grad}\psi_m.$$

A possible solution occurs when $\mu_m$ is zero and $\psi_m$ is a constant. However, (2.24) then forces the constant to be zero. Therefore $\mu_1$ is taken as the first non-zero eigenvalue which occurs. Finally, there is the case $\kappa^2 = k^2$, in which both $E_z$ and $H_z$ vanish; this is a TEM mode and
$$E_z = 0, \qquad E_t = \exp(-ikz)\operatorname{grad}\psi, \qquad H_z = 0, \qquad H_t = (k/\omega\mu)\exp(-ikz)\,\mathbf{k} \wedge \operatorname{grad}\psi.$$

It can be shown (see, for example, Jones (1986)) that any electromagnetic field in a waveguide can be expressed as a sum of constant multiples of all the modes propagating in the positive and negative $z$ directions. A TE mode propagates without attenuation only if $k^2 > \mu_m^2$; otherwise it is exponentially damped. Therefore every TE mode is damped if $k^2 < \mu_1^2$. Similarly every TM mode is damped if $k^2 < \nu_1^2$. Since it is known that $\mu_1^2 \le \nu_1^2$, there are no propagating TE or TM modes if $k^2 < \mu_1^2$. For this reason $\mu_1$ is often known as the cut-off wavenumber and $2\pi/\mu_1$ as the cut-off wavelength. The TE mode corresponding to $\mu_1$ is called the fundamental or dominant mode. The TEM mode propagates without attenuation for all values of $k$. However, it may not always exist. If the cross-section of the guide is simply connected, the boundary consists of one connected piece and $\psi$ must be a constant; then the field disappears. If the cross-section is multiply connected there may be a TEM mode (e.g. in a coaxial cable, where the boundary consists of two separate concentric circles).
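As an illustration of cut-off, the sketch below lists the propagating TE modes of an air-filled rectangular guide. It assumes the eigenfunctions of Exercise 2 below, so that $\mu_{mn}^2 = (m\pi/a)^2 + (n\pi/b)^2$, uses the text's values of $\mu_0$ and $\varepsilon_0$, and takes $k = \omega/c$; the guide dimensions and the frequency are illustrative choices. TM modes (from $\sin\sin$ eigenfunctions, with $m, n \ge 1$) are not included.

```python
import math

MU0 = 4 * math.pi * 1e-7            # henry/metre, as quoted in the text
EPS0 = 1e-9 / (36 * math.pi)        # farad/metre, the text's approximate value
C0 = 1 / math.sqrt(MU0 * EPS0)      # c = 1/(mu0 eps0)^{1/2}, about 3e8 m/s

def rectangular_te_modes(a, b, freq, max_index=6):
    """Propagating TE modes of an air-filled a x b rectangular guide, assuming
    psi_mn = cos(m pi x/a) cos(n pi y/b), so mu_mn^2 = (m pi/a)^2 + (n pi/b)^2.
    Returns k and a list of ((m, n), mu_mn, kappa_mn) sorted by mu_mn."""
    k = 2 * math.pi * freq / C0                 # k = omega/c in free space
    modes = []
    for m in range(max_index + 1):
        for n in range(max_index + 1):
            if m == 0 and n == 0:               # constant psi: excluded, as in the text
                continue
            mu = math.pi * math.hypot(m / a, n / b)
            if k > mu:                          # k^2 > mu^2: propagates unattenuated
                modes.append(((m, n), mu, math.sqrt(k * k - mu * mu)))
    return k, sorted(modes, key=lambda t: t[1])

a, b = 22.86e-3, 10.16e-3                       # illustrative guide dimensions (metres)
k, modes = rectangular_te_modes(a, b, 10e9)
for (m, n), mu, kappa in modes:
    print(f"TE{m}{n}: cut-off wavelength {2 * math.pi / mu * 1e3:.2f} mm, kappa = {kappa:.1f} /m")
```

At 10 GHz only the mode with $(m, n) = (1, 0)$ propagates in this guide; its cut-off wavelength $2\pi/\mu_{10}$ equals $2a$.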



Unless otherwise stated, the cross-section will be assumed from now on to be simply connected, so that the TEM mode does not arise. Thus our prime objective becomes that of finding $\mu_m$, $\nu_m$, $\phi_m$, and $\psi_m$. If there are known charges and currents present this is still true, since the main modification is to add known terms to the right-hand sides of (2.25) and (2.26). The number of boundaries $C$ for which analytic solutions can be found is strictly limited. In numerical work, however, we can take advantage of approximating functions by their values on a discrete point set. Operations can then be reduced to simple arithmetic forms which are convenient for digital computer programs. If the approximation becomes more accurate as the spacing between the points diminishes, a satisfactory procedure will have been derived. Accordingly, we turn to the question of formulating finite-difference approximations for (2.25) and (2.26).

Exercises

1. The boundaries of a waveguide are the planes $y = 0$ and $y = b$, and the field is independent of $x$. Show that the $y$ dependence of $H_z$ in a TE mode is $\cos(m\pi y/b)$ and find the mode. Is there a TEM mode? If so, find it.

2. A rectangular waveguide has boundaries $x = 0, a$ and $y = 0, b$. Show that the modes are derived from

$$\psi_{mn} = \cos(m\pi x/a)\cos(n\pi y/b), \qquad \phi_{mn} = \sin(m\pi x/a)\sin(n\pi y/b),$$

indicating any restrictions on the integers $m$ and $n$. What is the cut-off wavenumber?

3. In a circular waveguide of radius $a$, prove that in cylindrical polar coordinates $r, \phi, z$ the modes are derived from

$$\psi_{mn} = J_m(j'_{mn}r/a)\cos m\phi, \qquad \phi_{mn} = J_m(j_{mn}r/a)\cos m\phi$$

(or with $\sin m\phi$ in place of $\cos m\phi$), where $J_m$ is the customary Bessel function and $J_m(j_{mn}) = 0$, $J'_m(j'_{mn}) = 0$. What modes occur when the perfectly conducting sheets $\phi = 0$ and $\phi$ = in are added?

4. The cross-section of a coaxial line consists of the concentric circles $r = b$ and $r = a$ ($a > b$). Show that there is a TEM mode with $\psi = A + B\ln r$.

5. The cross-section of a waveguide is an equilateral triangle of side $a$. Take the centre of the triangle as origin with the $x$ axis parallel to one side. Find the modes by considering products of trigonometric functions whose arguments are constant multiples of $\sqrt{3}x + y + 4a$ and $-\sqrt{3}x + 3y$.

2.3 Numerical derivatives

A derivative $f'$ of a function $f$ is defined by

$$f'(a) = \lim_{h\to 0}\frac{f(a+h) - f(a)}{h}.$$

This suggests that a possible definition for the numerical derivative at $a$, which will be denoted by $F'_a$, might be

$$F'_a = \frac{f(a+h) - f(a)}{h}.$$

The formula is very simple and requires the evaluation of $f$ at two points. For small enough $h$, one would expect it to be a reasonably good approximation. It is known as the forward difference formula. Since there is no change to the analytical formula when the sign of $h$ is altered, we could equally well adopt the backward difference formula

$$F'_a = \frac{f(a) - f(a-h)}{h}$$

for the numerical derivative. A glance at Fig. 2.2 will reveal that the two numerical derivatives are not equal in general. The formulae give the slopes of the two chords on opposite sides of the point under consideration. Since the correct direction is that of the tangent, in the case illustrated the forward difference gives too small a slope and the backward difference too large a slope. The average value might be better, i.e.

$$F'_a = \frac{f(a+h) - f(a-h)}{2h},$$

which is the central difference formula. Because it is the slope of the chord joining the points of the graph at $a - h$ and $a + h$, it certainly looks more satisfactory. A general justification of the superiority of the central difference formula stems from Taylor's theorem. For

$$f(a+h) = f(a) + hf'(a) + \tfrac{1}{2}h^2f''(\xi),$$

where $a < \xi < a + h$. Hence, for the forward difference formula,

$$F'_a - f'(a) = \tfrac{1}{2}hf''(\xi). \tag{2.27}$$

Fig. 2.2. Backward and forward difference formulae.

Consequently, when $f''$ is bounded, the error in the forward difference formula is $O(h)$. Similarly, the error in the backward difference formula is $O(h)$. On the other hand, from

$$f(a+h) = f(a) + hf'(a) + \tfrac{1}{2}h^2f''(a) + \tfrac{1}{6}h^3f'''(\xi_1), \tag{2.28}$$

$$f(a-h) = f(a) - hf'(a) + \tfrac{1}{2}h^2f''(a) - \tfrac{1}{6}h^3f'''(\xi_2), \tag{2.29}$$

where $a < \xi_1 < a + h$ and $a - h < \xi_2 < a$,

we see that the central difference formula satisfies

$$F'_a - f'(a) = \tfrac{1}{6}h^2f'''(\eta), \qquad a - h < \eta < a + h, \tag{2.30}$$

so that the error is $O(h^2)$. So long as the derivatives of $f$ are well behaved and $h$ is not too large, the central difference should be the most accurate of the three. Moreover, its error decreases quadratically as $h$ is decreased, whereas the decrease is only linear for the other two formulae. In general, therefore, the central difference formula is to be preferred. One may, however, be obliged to use one of the other two if data are not available on both sides of the point where the numerical derivative is to be calculated. The impression may be gained that any desired accuracy can be achieved by making $h$ small enough. However, this is not true, since there is a limit to the accuracy in the evaluation of $f$. If this limit is $\varepsilon$, the error in $F'_a$ in the forward difference formula may be $2\varepsilon/h$ in absolute value. It follows from (2.27) that the total error $F'_a - f'(a)$ may be as high as

$$\tfrac{1}{2}hM + 2\varepsilon/h,$$

where $M$ is a bound for $|f''|$. Since this has a minimum for $h^2 = 4\varepsilon/M$, there is no point in making $h$ any smaller than $2(\varepsilon/M)^{1/2}$. The right-hand sides of (2.27) and (2.30) are known as the local truncation errors of the appropriate difference formulae. By making the local truncation error small we hope to make our difference scheme accurate. Nevertheless, it must not be assumed that the accuracy all over the region will be good after the scheme has been applied successively over a large number of points, as is required for partial differential equations. That requires an investigation of the global accuracy, which is usually a very complicated matter. With this reservation, difference formulae of higher orders of accuracy can be constructed. For example,

$$F'_a = \frac{-\tfrac{1}{3}f(a-h) - \tfrac{1}{2}f(a) + f(a+h) - \tfrac{1}{6}f(a+2h)}{h}$$

has an error of $O(h^3)$.
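The error orders quoted above can be observed directly. The following is a minimal Python sketch, assuming a smooth test function ($\exp$ at $a = 1$, an arbitrary choice). It halves $h$ and estimates the exponent $p$ in $|\text{error}| \approx Ch^p$ from $\log_2$ of the error ratio. It also evaluates the smallest useful forward-difference step $2(\varepsilon/M)^{1/2}$. All function names are illustrative, and $\varepsilon \approx 10^{-16}$ is an assumed double-precision evaluation error.

```python
import math

def forward(f, a, h):
    """Forward difference {f(a+h) - f(a)}/h, error O(h)."""
    return (f(a + h) - f(a)) / h

def backward(f, a, h):
    """Backward difference {f(a) - f(a-h)}/h, error O(h)."""
    return (f(a) - f(a - h)) / h

def central(f, a, h):
    """Central difference {f(a+h) - f(a-h)}/2h, error O(h^2)."""
    return (f(a + h) - f(a - h)) / (2.0 * h)

def third_order(f, a, h):
    """The four-point formula quoted in the text, error O(h^3)."""
    return (-f(a - h) / 3.0 - f(a) / 2.0 + f(a + h) - f(a + 2.0 * h) / 6.0) / h

def observed_order(rule, f, df, a, h):
    """Estimate p in |error| ~ C h^p from the errors at steps h and h/2."""
    e_h = abs(rule(f, a, h) - df(a))
    e_half = abs(rule(f, a, h / 2.0) - df(a))
    return math.log2(e_h / e_half)

def optimal_step(eps, M):
    """Minimiser of hM/2 + 2*eps/h (forward formula): h = 2*(eps/M)**0.5."""
    return 2.0 * math.sqrt(eps / M)

if __name__ == "__main__":
    for name, rule in (("forward", forward), ("backward", backward),
                       ("central", central), ("third-order", third_order)):
        p = observed_order(rule, math.exp, math.exp, 1.0, 0.01)
        print(f"{name:12s} observed order {p:.2f}")
    # Assumed evaluation error eps ~ 1e-16 and M = max|f''| ~ e near a = 1:
    print("smallest useful forward-difference step:", optimal_step(1e-16, math.e))
```

The observed orders come out close to 1, 1, 2 and 3. Pushing $h$ well below the printed step makes the forward-difference result worse, not better, because the rounding term $2\varepsilon/h$ takes over.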



Higher derivatives may be handled in a similar manner. For instance

$$F''_a = \frac{f(a+h) - 2f(a) + f(a-h)}{h^2} \tag{2.31}$$

is a suitable formula for the second derivative. By extending the expansions in (2.28) and (2.29) by an additional term, we may verify that the error is $O(h^2)$. Difference formulae are connected with the theory of interpolation. If the parabola

$$y = \alpha(x-a)^2 + \beta(x-a) + \gamma$$

is required to interpolate $f$ at $x = a - h, a, a + h$, it is found that

$$\alpha = \frac{f(a+h) - 2f(a) + f(a-h)}{2h^2}, \qquad \beta = \frac{f(a+h) - f(a-h)}{2h}, \qquad \gamma = f(a).$$

Then

$$y'(a) = F'_a, \qquad y''(a) = F''_a,$$

where $F'_a$ and $F''_a$ are given by the central difference formula and (2.31) respectively. Thus numerical derivatives are actually specified by fitting a polynomial to the data and then taking the derivatives of the polynomial. The higher-order formulae need higher-order polynomials. As has been seen in Chapter 1, these may provide very oscillatory interpolants, to the consequent detriment of the numerical derivative. To find stationary points, i.e. points where $f'(x) = 0$, we can either discover the zeros of the interpolating polynomial just described by the methods of §1.8, or carry out inverse interpolation on the values of $F'$ at the data points by the method of §1.2. It may happen that the data points are not equally spaced. Since interpolating polynomials are still available, difference formulae can still be generated. An example, for data at $a - h$, $a$, and $a + bh$, is

$$F''_a = \frac{2}{b(b+1)h^2}\{bf(a-h) - (1+b)f(a) + f(a+bh)\}. \tag{2.32}$$
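The interpolation view of difference formulae can be checked numerically. The following is a minimal Python/NumPy sketch. It fits the quadratic through three data points with `numpy.polyfit` and differentiates it at $a$. For equal spacing this reproduces the central formula and (2.31); for data at $a - h$, $a$, $a + bh$ the second derivative reproduces (2.32). The test function ($\sin$) and the values $a = 0.3$, $h = 0.1$, $b = 1.5$ are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np

def parabola_derivatives(xs, ys, a):
    """Interpolate three points by y = alpha(x-a)^2 + beta(x-a) + gamma
    and return (y'(a), y''(a)) = (beta, 2*alpha)."""
    # polyfit returns the highest power first; three points and degree 2 give exact interpolation.
    alpha, beta, _gamma = np.polyfit(np.asarray(xs, dtype=float) - a, ys, 2)
    return beta, 2.0 * alpha

def second_derivative_unequal(f, a, h, b):
    """Formula (2.32) for data at a - h, a and a + b*h."""
    return 2.0 / (b * (b + 1.0) * h * h) * (b * f(a - h) - (1.0 + b) * f(a) + f(a + b * h))

if __name__ == "__main__":
    a, h = 0.3, 0.1
    # Equal spacing: the parabola reproduces the central formula and (2.31).
    xs = [a - h, a, a + h]
    d1, d2 = parabola_derivatives(xs, np.sin(xs), a)
    print(d1, (np.sin(a + h) - np.sin(a - h)) / (2 * h))
    print(d2, (np.sin(a + h) - 2 * np.sin(a) + np.sin(a - h)) / h**2)
    # Unequal spacing (b = 1.5): the parabola reproduces (2.32).
    xs = [a - h, a, a + 1.5 * h]
    print(parabola_derivatives(xs, np.sin(xs), a)[1],
          second_derivative_unequal(np.sin, a, h, 1.5))
```

Each printed pair agrees to rounding level. This confirms that the formulae are simply derivatives of the interpolating parabola.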

The accuracy of such formulae is usually not so high as that of formulae with equally spaced data points. All of these ideas can be generalized without difficulty to several variables and partial derivatives. Thus, possible difference formulae in two dimensions are

$$F'_a = \frac{f(a+h, b) - f(a-h, b)}{2h}, \qquad F'_b = \frac{f(a, b+k) - f(a, b-k)}{2k},$$

$$F''_{aa} = \frac{f(a+h, b) - 2f(a, b) + f(a-h, b)}{h^2}, \qquad F''_{bb} = \frac{f(a, b+k) - 2f(a, b) + f(a, b-k)}{k^2}.$$

If $h = k$, the difference approximation for Laplace's operator is

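$$\frac{f(a, b+h) + f(a-h, b) + f(a+h, b) + f(a, b-h) - 4f(a, b)}{h^2}.$$

Since five points are involved it is known as the five-point difference scheme. By means of Taylor's expansion it may be demonstrated that the local truncation error is of the form

$$\frac{h^2}{12}\left(\frac{\partial^4 f}{\partial x^4} + \frac{\partial^4 f}{\partial y^4}\right) + O(h^4),$$

with the derivatives evaluated at $(a, b)$.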
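The $h^2$ behaviour of the five-point truncation error can be seen numerically. The following is a minimal Python sketch. It uses the test function $\exp(x + 2y)$ (an arbitrary choice), whose Laplacian is $5\exp(x+2y)$ and for which $\partial^4 f/\partial x^4 + \partial^4 f/\partial y^4 = 17\exp(x+2y)$. It divides the error by $h^2$ and compares the result with the predicted limit $17/12$ at the origin. The function names are hypothetical.

```python
import math

def five_point_laplacian(f, a, b, h):
    """Five-point approximation to f_xx + f_yy at (a, b) with equal mesh length h."""
    return (f(a, b + h) + f(a - h, b) + f(a + h, b) + f(a, b - h) - 4.0 * f(a, b)) / (h * h)

def scaled_truncation_error(f, laplacian, a, b, h):
    """(numerical - exact)/h^2, which should tend to (f_xxxx + f_yyyy)/12 as h -> 0."""
    return (five_point_laplacian(f, a, b, h) - laplacian(a, b)) / (h * h)

if __name__ == "__main__":
    # Illustrative test function: exp(x + 2y), Laplacian 5 exp(x + 2y).
    f = lambda x, y: math.exp(x + 2.0 * y)
    lap = lambda x, y: 5.0 * math.exp(x + 2.0 * y)
    for h in (0.1, 0.05, 0.025):
        print(h, scaled_truncation_error(f, lap, 0.0, 0.0, h), 17.0 / 12.0)
```

As $h$ decreases, the scaled error approaches $17/12 \approx 1.417$, consistent with the leading term $\tfrac{1}{12}h^2(\partial^4 f/\partial x^4 + \partial^4 f/\partial y^4)$.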

Exercises

6. Show that the errors in

$$F'_a = \frac{-\tfrac{3}{2}f(a) + 2f(a+h) - \tfrac{1}{2}f(a+2h)}{h}, \qquad F'_a = \frac{\tfrac{1}{2}f(a-2h) - 2f(a-h) + \tfrac{3}{2}f(a)}{h}$$

are both $O(h^2)$.

7. Show that the error in

$$F'_a = \frac{f(a-2h) - 8f(a-h) + 8f(a+h) - f(a+2h)}{12h}$$

is $O(h^4)$.

8. Prove that the local truncation error in

$$F''_a = \frac{-f(a-2h) + 16f(a-h) - 30f(a) + 16f(a+h) - f(a+2h)}{12h^2}$$

is $h^4f^{(6)}(\xi)/90$.

9. Given the table

x      0.40   0.44   0.48   0.52   0.56
f(x)   0.389  0.426  0.462  0.497  0.531

determine $f'(0.48)$.


10. Find the approximate value of $x$ where $f$ is a maximum for the table

x      1.46   1.50   1.54   1.58   1.62   1.66
f(x)   1.994  1.998  2.000  2.000  1.999  1.996

11. Show that a possible difference formula for Laplace's operator in two dimensions is

$$\alpha f(a-h_1, b)/h_1 + \alpha f(a+h_2, b)/h_2 + \beta f(a, b-k_1)/k_1 + \beta f(a, b+k_2)/k_2 - 2\left(\frac{1}{h_1h_2} + \frac{1}{k_1k_2}\right)f(a, b),$$

where $\alpha = 2/(h_1 + h_2)$, $\beta = 2/(k_1 + k_2)$.

12. Show that a nine-point difference formula for Laplace's operator in two dimensions is

$$\frac{1}{6h^2}\bigl[4\{f(a+h, b) + f(a-h, b) + f(a, b+h) + f(a, b-h)\} - 20f(a, b) + f(a+h, b+h) + f(a+h, b-h) + f(a-h, b+h) + f(a-h, b-h)\bigr]$$

and that the local truncation error is $O(h^6)$.

2.4 Properties of difference equations

Formulae for numerical derivatives are not difficult to write down, but assessing their value for solving partial differential equations is another matter. We are not concerned here with determining analytical solutions of difference equations, or even with their asymptotic behaviour, though many useful results have been obtained in this direction (see, for example, Milne-Thomson (1960); Dingle and Morgan (1967)). Instead our attention is centred on how solutions of the discrete system which replaces the continuous one are related to those of the partial differential equation. Our first demand is obviously that the difference scheme should approach the partial differential equation as the distances between the data points tend to zero. This is usually easy to verify by examination of the local truncation error. In addition, there may also be boundary conditions or initial values that have to be checked. Next, it will be desirable to know how closely the numerical solution $U$ approximates the theoretical solution $u$ of the partial differential equation. For example, does $u(mh, nh) - U(mh, nh)$ tend to zero as $h \to 0$? For this purpose, it will not be sufficient to hold $m$ and $n$ fixed as $h \to 0$, otherwise conclusions will be restricted to the behaviour at the origin. Thus, convergence as $h \to 0$, $m \to \infty$, $n \to \infty$ in such a way that $mh$ and $nh$ remain fixed will be relevant; this is sometimes known as fixed-station convergence. Again, if a difference scheme can produce a solution which is not bounded, the numerical effect can be dramatically unpleasant. Any error may then be quickly amplified from step to step and the sought solution rapidly swamped, with consequent instability. However, it may be that a growing solution is looked for and that an increasing error term might be tolerated so long as it

remains substantially smaller than the growing solution. Nevertheless, it will generally be wiser to reformulate the problem so that the required solution does not get too large, and to insist that the numerical procedure is stable. To illustrate the ideas involved, let us consider the solution of Poisson's equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \tag{2.33}$$

subject to the boundary condition $u = g(x, y)$ on $C$, when the five-point difference scheme of the preceding section for Laplace's operator is employed. Draw a mesh of squares of side $h$ and let $(mh, nh)$ be a typical point of intersection or node (Fig. 2.3). Nodes inside $C$ whose four nearest neighbouring nodes are all either inside or on $C$ are called regular points. Let the set of regular points, denoted by crosses in Fig. 2.3, be designated $R_h$. Let $R'_h$ be the set of nodes inside or on $C$ which are not regular points but which are one mesh length from at least one regular point. They are marked by dots in Fig. 2.3. The reader may care to visualize $R_h$ and $R'_h$ as the discrete analogues of a region and its boundary respectively. The difference replacement is

$$U_{m,n+1} + U_{m-1,n} + U_{m+1,n} + U_{m,n-1} - 4U_{m,n} = h^2 f_{m,n}, \tag{2.34}$$

where $U_{m,n} \equiv U(mh, nh)$ and $f_{m,n} = f(mh, nh)$. Strictly, it applies only at the regular points, since at points of $R'_h$ it would necessitate asserting that $u$ satisfies Poisson's equation outside $C$. This difficulty will be dealt with later.

Fig. 2.3. Grid for Poisson's equation: × points of $R_h$; • points of $R'_h$.


If the right-hand side of (2.34) is zero, the Laplace difference equation

$$U_{m,n+1} + U_{m-1,n} + U_{m+1,n} + U_{m,n-1} - 4U_{m,n} = 0 \tag{2.35}$$

is obtained. An important maximum principle is valid for it, because the equation asserts that at a regular point $U_{m,n}$ coincides with the average of its values at the four neighbouring nodes. Therefore, not all the neighbouring values can exceed $U_{m,n}$, nor can every one of them be less than $U_{m,n}$. Thus $U_{m,n}$ cannot be a strict maximum or a strict minimum at a regular point. It follows that, if $U$ vanishes at every point of $R'_h$, $U$ is zero at every regular point. For if $U$ differed from zero on $R_h$, it would have to have either a positive maximum or a negative minimum there, contrary to our previous conclusion. An immediate consequence is that the solution of the Laplace difference equation is unique if the values of $U$ are specified on $R'_h$. More generally, let $w(x, y)$ be any function such that

$$w_{m,n+1} + w_{m-1,n} + w_{m+1,n} + w_{m,n-1} - 4w_{m,n} \le 0 \tag{2.36}$$

on $R_h$. Then $w$ cannot have a strict minimum on $R_h$. Accordingly, if $w \ge 0$ on $R'_h$ then $w \ge 0$ on $R_h$. This result enables the estimate of a bound on a general function $e(x, y)$. For let $w \ge 0$ on $R'_h$, denote the left-hand side of (2.36) by $L(w)$, and suppose that $L(w) < 0$ on $R_h$. Consider

$$v(x, y) = w(x, y)\max_{R_h}\{-|L(e)|/L(w)\} + \max_{R'_h}|e(x, y)|.$$

Since the last term on the right-hand side is a constant,

$$L(v) = L(w)\max_{R_h}\{-|L(e)|/L(w)\} \le -|L(e)|$$

on account of (2.36). This implies that $L(v - e) \le 0$. Since $v \ge |e|$ on $R'_h$, the result just established then gives $v \ge e$ on $R_h$. Also $L(v + e) \le 0$ gives $v \ge -e$ on $R_h$. Consequently $|e| \le v$ on $R_h + R'_h$. In other words, for arbitrary $e$,

$$|e(x, y)| \le w_0\max_{R_h}\{-|L(e)|/L(w)\} + \max_{R'_h}|e(x, y)| \tag{2.37}$$

on $R_h + R'_h$,

with $w_0$ the maximum of $|w|$ on $R_h + R'_h$. Now, suppose that the fourth partial derivatives of $u$ are bounded by $M$. Then, by analysis of the local truncation error,

$$u_{m,n+1} + u_{m-1,n} + u_{m+1,n} + u_{m,n-1} - 4u_{m,n} = h^2 f_{m,n} + O(h^4),$$

where the order term is, in fact, bounded by $\tfrac{1}{6}Mh^4$. We then subtract (2.34).

Then the error $U - u$ in the numerical solution satisfies

$$L(U - u) = O(h^4), \tag{2.38}$$

from which can be deduced

$$|L(U - u)| \le \tfrac{1}{6}Mh^4 \quad \text{on } R_h. \tag{2.39}$$

Choose $w$ above as

$$w = \frac{a^2 - (x - x_0)^2 - (y - y_0)^2}{4a^2},$$

where $a$, $x_0$, and $y_0$ are picked conveniently so that $R_h + R'_h$ lies within the circle whose perimeter is $w = 0$. Note that $w \ge 0$, $L(w) = -h^2/a^2$, and we can certainly take $w_0 \le 1$. Hence, from (2.37) and (2.39),

$$|U - u| \le \tfrac{1}{6}Ma^2h^2 + \max_{R'_h}|U - u| \tag{2.40}$$

on $R_h + R'_h$. The inequality (2.40) provides a bound on the error of the numerical solution. It is evident that the error tends to zero as $h \to 0$, provided that the errors on $R'_h$ tend to zero. If we ensure that points of $R'_h$ tend to points of $C$ and that $U$ takes the specified values of $u$ there, we can be certain that as the mesh size tends to zero our numerical solution will approach the analytical one. One way of doing this is by the following strategy. If a point $P$ of $R'_h$ is on $C$, put $U(P) = g(P)$. If $P$ is not on $C$, select the point $Q$ of $C$ on the grid lines through $P$ which is nearest to $P$, and then impose $U(P) = g(Q)$. Only the points of $R'_h$ not on $C$ need be considered. For them

$$|U(P) - u(P)| = |u(Q) - u(P)| \le M_1 d,$$

where $d$ is the distance between $P$ and $Q$, and $M_1$ is a bound for the first partial derivatives of $u$. By construction $d \le h$, and so

$$\max_{R'_h}|U - u| \le M_1 h.$$
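The $O(h^2)$ convergence promised by (2.40) can be demonstrated on a simple case. The following is a minimal Python/NumPy sketch under these assumptions:

- the region is the unit square, so every node of $R'_h$ lies on $C$ and $U = g$ exactly there;
- the test solution is $u = \sin\pi x\,\sin\pi y$, which gives $f = -2\pi^2\sin\pi x\,\sin\pi y$;
- dense linear algebra is used, which is adequate only for small grids.

The sketch assembles (2.34) at the regular points, solves the resulting system, and reports the largest nodal error. Function names are hypothetical.

```python
import numpy as np

def solve_poisson_five_point(n, f, g):
    """Solve u_xx + u_yy = f on the unit square with u = g on the boundary, using the
    five-point scheme (2.34) with h = 1/n. Returns X, Y and U with U[m, k] ~ u(m*h, k*h)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    U = np.array(g(X, Y), dtype=float)              # boundary values; interior overwritten below
    nodes = [(m, k) for m in range(1, n) for k in range(1, n)]   # regular points R_h
    index = {node: i for i, node in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    rhs = np.zeros(len(nodes))
    for (m, k), i in index.items():
        A[i, i] = -4.0
        rhs[i] = h * h * f(m * h, k * h)
        for dm, dk in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (m + dm, k + dk)
            if nb in index:
                A[i, index[nb]] = 1.0
            else:                                    # neighbour in R'_h: known value moves to the RHS
                rhs[i] -= U[nb]
    U_interior = np.linalg.solve(A, rhs)
    for (m, k), i in index.items():
        U[m, k] = U_interior[i]
    return X, Y, U

def max_error(n):
    """Largest nodal error for the illustrative solution u = sin(pi x) sin(pi y)."""
    exact = lambda x, y: np.sin(np.pi * x) * np.sin(np.pi * y)
    source = lambda x, y: -2.0 * np.pi**2 * np.sin(np.pi * x) * np.sin(np.pi * y)
    X, Y, U = solve_poisson_five_point(n, source, exact)
    return np.max(np.abs(U - exact(X, Y)))

if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, max_error(n))    # the error falls by roughly 4 each time h is halved
```

Because the boundary values are exact here, only the truncation term of (2.40) survives, and the error ratio between successive grids is close to 4. With a curved boundary and the nearest-point rule above, the $M_1h$ boundary term would eventually dominate.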

Thus, our rule does ensure that the numerical solution approaches the theoretical one as $h \to 0$. Observe, however, that the treatment of the boundary condition can introduce a larger error ($O(h)$) than that due to truncation ($O(h^2)$). The above analysis assumes that the solution to the difference equation is obtained exactly. In practice it will not be, because of round-off error. If only $l$ decimal places are retained in the computation of $U_{m,n}$, the effect of round-off is to add a term on the right-hand side of (2.38) which possesses a bound of the form $10^{-l}M$. There is a corresponding addition in (2.39), and then (2.40) becomes

$$|U - u| \le Ma^2\left(\frac{h^2}{6} + \frac{10^{-l}}{h^2}\right) + \max_{R'_h}|U - u|.$$

The numerical answer now converges to the exact solution as the mesh size tends to zero only if the number $l$ of decimal places kept in the computation goes to infinity fast enough. The formula suggests that there is little point in choosing $h$ smaller than the value $h_0$ which gives the minimum of the right-hand side, i.e. $h_0^4 = 6 \times 10^{-l}$. Remark also that, for a fixed choice of $l$, the round-off contribution $Ma^2 10^{-l}/h^2$ grows like $a^2/h^2$, which is proportional to the number of difference equations or mesh points that have been employed. The boundary condition adopted above is that of Dirichlet, in which $u$ is specified on $C$. Often the Neumann boundary condition, in which the normal derivative of $u$ is given on $C$, is relevant. More generally, a condition such as

$$\alpha(p)u(p) + \beta(p)\frac{\partial u(p)}{\partial n} = g(p) \tag{2.41}$$

may be imposed at each point $p$ of $C$. If $\beta \equiv 0$, $\alpha \ne 0$ this is the Dirichlet problem, and if $\alpha \equiv 0$, $\beta \ne 0$ it is the Neumann problem. The derivation of (2.40) is unaffected by the boundary condition, but the magnitude of the last term depends upon how the boundary condition is implemented. If the boundary has horizontal or vertical sides, a relatively simple technique may be adopted. For example, in Fig. 2.4 the normal derivative at $p$ can be approximated by the backward difference formula, and (2.41) becomes

$$\alpha(p)U(p) + \beta(p)\{U(p) - U(q)\}/h = g(p).$$

It is probably more accurate to take into account that the partial differential equation is satisfied on the boundary as well as inside it. To do this, introduce an extra node $s$ outside the region and use the central difference formula for the normal derivative. Then (2.41) gives

$$\alpha(p)U(p) + \beta(p)\{U(s) - U(q)\}/(2h) = g(p). \tag{2.42}$$

To remove the unknown $U(s)$ we treat $p$ as a regular point, so that

$$U(s) + U(t) + U(q) + U(r) - 4U(p) = h^2 f(p). \tag{2.43}$$

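Elimination of $U(s)$ from (2.42) and (2.43) gives

$$\{2h\alpha(p) + 4\beta(p)\}U(p) - \beta(p)\{2U(q) + U(r) + U(t)\} = 2hg(p) - h^2\beta(p)f(p) \tag{2.44}$$

as the equation to be applied at the node $p$.

Fig. 2.4. Vertical boundary (nodes $t$ and $r$ lie above and below $p$ on the boundary; $q$ is inside and $s$ outside the region).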
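The elimination leading to (2.44) can be spot-checked numerically. The following is a minimal Python sketch; the function name and the randomly drawn sample values are illustrative assumptions. For arbitrary data it computes the ghost value $U(s)$ from (2.43), chooses $g(p)$ so that (2.42) holds exactly, and returns the residual of (2.44), which should vanish up to rounding.

```python
import random

def residual_of_2_44(alpha, beta, h, f, Up, Uq, Ur, Ut):
    """Residual of (2.44) when U(s) and g(p) are made consistent with (2.42)-(2.43)."""
    # (2.43): U(s) + U(t) + U(q) + U(r) - 4U(p) = h^2 f(p), solved for the ghost value U(s).
    Us = h * h * f - Ut - Uq - Ur + 4.0 * Up
    # (2.42): central-difference boundary condition, used here to define g(p).
    g = alpha * Up + beta * (Us - Uq) / (2.0 * h)
    # (2.44): the combined equation, which no longer contains U(s).
    lhs = (2.0 * h * alpha + 4.0 * beta) * Up - beta * (2.0 * Uq + Ur + Ut)
    rhs = 2.0 * h * g - h * h * beta * f
    return lhs - rhs

if __name__ == "__main__":
    rng = random.Random(1)
    worst = 0.0
    for _ in range(1000):
        alpha, beta, f, Up, Uq, Ur, Ut = (rng.uniform(-2.0, 2.0) for _ in range(7))
        h = rng.uniform(0.05, 1.0)          # keep the mesh length away from zero
        worst = max(worst, abs(residual_of_2_44(alpha, beta, h, f, Up, Uq, Ur, Ut)))
    print("largest residual of (2.44):", worst)   # rounding level only
```

A non-negligible residual would point to an algebra slip in the boundary equation. Checking hand-derived stencil modifications this way is cheap insurance before they enter a large linear system.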

Fig. 2.5. General curved boundary.

When the boundary is curved, the situation is more awkward. A typical situation is indicated in Fig. 2.5, where the normal at $p$ intersects the grid line between $a$ and $b$ at $d$; lengths such as $pd$ and $ab$ denote distances between the points named. The normal derivative is approximated by the one-sided difference formula $\partial u(p)/\partial n = \{u(p) - u(d)\}/pd$, and substitution in (2.41) then gives a relation between $u(p)$ and $u(d)$. The value of $u(d)$ is estimated by linear interpolation, so that

$$u(d) = \{db\,u(a) + ad\,u(b)\}/ab,$$

and substitution in (2.41) permits $u(p)$ to be expressed in terms of $u(a)$ and $u(b)$. Finally, we apply the difference replacement for Laplace's operator with unequal mesh lengths (Exercise 11) at $b$ to obtain

$$\frac{2}{cp}\left\{\frac{U(p)}{pb} + \frac{U(c)}{h}\right\} + \frac{2}{aq}\left\{\frac{U(q)}{bq} + \frac{U(a)}{h}\right\} - \frac{2}{h}\left\{\frac{1}{pb} + \frac{1}{bq}\right\}U(b) = f(b).$$

Inserting the derived value for $u(p)$ and a similar one for $u(q)$, we obtain a relation to be satisfied at $b$ by the values of $u$ at $a$, $b$, and $c$. In general, one can anticipate that less accuracy will occur for a given mesh size for normal derivative problems than for Dirichlet conditions, since the representation of the normal derivative is somewhat inexact. There is another point which has to be watched when the boundary condition is pure Neumann, i.e. $\alpha \equiv 0$ on $C$. No loss of generality is incurred by putting $\beta = 1$. Then $g$ must satisfy a consistency condition if $u$ is to exist, namely

$$\int_C g\,ds = \int_C \frac{\partial u}{\partial n}\,ds = \iint_A \nabla^2 u\,dx\,dy = \iint_A f\,dx\,dy, \tag{2.45}$$

where $A$ is the region bounded by $C$.

The matrix of the associated finite difference scheme will be singular and the solution not unique. Even if $g$ is compatible with (2.45), the discrete analogue of (2.45) may not be satisfied unless special steps are taken. If the discrete condition is not satisfied, one can expect that a numerical solution will fail to meet the discrete boundary conditions.

An implicit assumption in the foregoing has been that $u$ possesses bounded partial derivatives up to at least the fourth order. If the boundary possesses a re-entrant corner this will not be true, and first derivatives will become infinite as the corner is approached. There is some evidence (Fox 1962) that it pays to work with the nine-point difference formula (Exercise 12) in these circumstances rather than the five-point difference formula, though the opposite opinion has also been supported (Duncan 1967). The preceding ideas can be generalized in a straightforward manner to the elliptic equation

$$A\frac{\partial^2 u}{\partial x^2} + C\frac{\partial^2 u}{\partial y^2} + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu = f, \tag{2.46}$$

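where the functions $A$, $C$, and $F$ satisfy the conditions $A > 0$, $C > 0$, and $F \le 0$ throughout the domain under consideration. One can arrange that the discrete operator takes the form

$$\alpha_1U_{m+1,n} + \alpha_2U_{m,n+1} + \alpha_3U_{m-1,n} + \alpha_4U_{m,n-1} - \alpha_0U_{m,n}, \tag{2.47}$$

where $\alpha_0, \alpha_1, \ldots, \alpha_4$ are all positive and

$$\alpha_0 \ge \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4.$$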

The theory leading to (2.40) can be adapted to this case with very little modification if $A + C > a(|D| + |E|)$. If this inequality does not hold, corresponding results may still be derived, but the analysis is more complicated since more complex comparison functions $w$ are involved. If there is also a mixed derivative term $2B\,\partial^2u/\partial x\,\partial y$ in (2.46) but $AC > B^2$, so that the equation is still elliptic, the term can be removed in principle by a suitable change of coordinates. Should this be impracticable, one has to accept that the discrete operator will lose the simple form (2.47) and its desirable properties. An interesting feature of difference equations is the way in which they reflect whether boundary or initial conditions have been correctly posed. Suppose, for instance, the discrete analogue of Laplace's equation is taken in the form

$$\frac{1}{h^2}(U_{m+1,n} - 2U_{m,n} + U_{m-1,n}) + \frac{1}{k^2}(U_{m,n+1} - 2U_{m,n} + U_{m,n-1}) = 0, \tag{2.48}$$

where $h$ and $k$ are the increments in mesh in the $x$ and $y$ directions respectively. Set up an initial value problem by specifying, say, $U_{m,0}$ and $U_{m,1}$. By representing the initial data as a Fourier series and concentrating on a single term, we are seeking a solution of (2.48) in which $U_{m,0} = e^{imb}$, where $b$ is some known real number. Try as a solution of (2.48)

$$U_{m,n} = \exp\{i(mb + nc)\}. \tag{2.49}$$

Then $b$ and $c$ must satisfy

$$\sin^2\tfrac{1}{2}c + \frac{k^2}{h^2}\sin^2\tfrac{1}{2}b = 0. \tag{2.50}$$

It is evident at once that $c$ cannot be real. The roots must, in fact, occur in

conjugate complex pairs, and therefore at least one of them has a negative imaginary part. The resulting solution (2.49) of the difference equations grows exponentially as $n \to \infty$. At a fixed station, however, $n \to \infty$ as $k \to 0$, and so $U_{m,n}$ can take arbitrarily large values even for very small initial data. Consequently, $U_{m,n}$ is unstable in its dependence on initial values. Although the instability is due to trying to solve an initial value problem for an elliptic equation, the device by which it was detected can be applied to other problems. Suppose that the partial differential equation was the one-dimensional wave equation

$$\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = 0 \tag{2.51}$$

instead of Laplace's equation. Then (2.48) will be appropriate so long as the sign of the second set of parentheses is altered. In place of (2.50)

$$\sin^2\tfrac{1}{2}c = \frac{k^2}{h^2}\sin^2\tfrac{1}{2}b$$

is obtained. If $k/h > 1$ there are values of $b$ such that the right-hand side is greater than unity, and again instability occurs despite the problem being correctly posed. If, however, $k/h \le 1$, all the values of $c$ are real and (2.49) remains bounded as $n \to \infty$. This is an illustration of numerical instability depending on the mesh size. It demonstrates that, to hope for stability and convergence of the numerical solution to the theoretical solution as $h \to 0$, one must insist that at least $k/h \le 1$. In fact, this is not sufficient, for when $k = h$ the difference equation has the solution

$$U_{m,n} = (-1)^{m+n}n,$$

which displays instability though not exponential growth. Nevertheless, the notions described form the basis of the von Neumann criterion, a necessary condition for numerical stability. In it, all exponential solutions of the difference equation are examined and the system is declared unstable unless every possible solution remains bounded. The von Neumann criterion can be expressed in matrix form. Suppose that we have started from a vector differential equation, so that we are concerned with a vector difference $\mathbf{U}_{m,n}$. Suppose further that it has been arranged that in the difference scheme $\mathbf{U}_{m,n+1}$ can be expressed as a linear combination of the $\mathbf{U}_{m+r,n}$, with $r$ taking a certain finite number of integer values. Specifically, we assume

$$\mathbf{U}_{m,n+1} = \sum_r A_r(h, k)\mathbf{U}_{m+r,n},$$

the $A_r$ being matrices in general. Make the substitution

$$\mathbf{U}_{m,n} = \mathbf{V}_n e^{imb},$$

where $\mathbf{V}_n$ is independent of $m$. Then

$$\mathbf{V}_{n+1} = G\mathbf{V}_n, \qquad \text{where} \qquad G = \sum_r A_r(h, k)e^{ibr}.$$

Since $\mathbf{V}_n = G^n\mathbf{V}_0$, $G$ is called the amplification matrix. Obviously, if the solution is to remain bounded, $G^n$ must not be allowed to grow. In general, only points in some finite domain will be under consideration. There, take the following as the stability requirement: for some positive $k_0$ there is a constant $C$ such that

$$\|G^n\| \le C \quad \text{for } 0 < k < k_0,\ 0 \le nk \le K, \text{ and all real } b.$$
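The exponential-solution test applied earlier to the wave-equation analogue of (2.48) can be carried out numerically. Substituting $U_{m,n} = \exp\{i(mb + nc)\}$ into (2.48) with the sign of its second term reversed, and writing $z = e^{ic}$, gives the quadratic $z^2 - \{2 - 4(k/h)^2\sin^2\tfrac{1}{2}b\}z + 1 = 0$. Both roots lie on the unit circle exactly when $(k/h)^2\sin^2\tfrac{1}{2}b \le 1$.

The following Python/NumPy sketch returns the largest root modulus over $b$. The function name is hypothetical, and sampling $b$ on a finite grid only approximates "all real $b$". The check detects exponential growth only. The non-exponential instability noted at $k = h$ produces roots of modulus exactly 1 (a double root $-1$ at $b = \pi$) and is not flagged.

```python
import numpy as np

def max_amplification(r, num_b=721):
    """Largest |exp(ic)| over sampled b for the wave-equation analogue of (2.48), r = k/h.

    With z = exp(ic) the exponential trial solution requires
        z^2 - (2 - 4 r^2 sin^2(b/2)) z + 1 = 0.
    """
    worst = 0.0
    for b in np.linspace(0.0, 2.0 * np.pi, num_b):
        s = 2.0 - 4.0 * r * r * np.sin(0.5 * b) ** 2
        roots = np.roots([1.0, -s, 1.0])
        worst = max(worst, float(np.max(np.abs(roots))))
    return worst

if __name__ == "__main__":
    for r in (0.5, 0.9, 1.1, 1.5):
        print(f"k/h = {r}: largest |exp(ic)| = {max_amplification(r):.6f}")
```

For $k/h < 1$ the printed moduli equal 1 to rounding. For $k/h > 1$ some roots lie outside the unit circle (for $k/h = 1.1$ the worst case, at $b = \pi$, exceeds 2), which signals exponential growth.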

The spectral radius $\rho(G)$ (§1.11) has the property $\{\rho(G)\}^n \le \|G^n\|$. On taking $n = K/k$, a necessary condition for stability is therefore that $\rho(G) \le C^{k/K}$. There is no loss of generality in assuming $C \ge 1$; then $C^{k/K} \le 1 + (k/K)C^{k_0/K}\ln C$ for $0 < k < k_0$. Hence we have the von Neumann necessary condition of stability: there must be a constant $C_1$ such that the modulus of every eigenvalue of the amplification matrix does not exceed $1 + C_1k$ for $0 < k < k_0$ and all real $b$. A sufficient condition for stability is that, for some $M$,

$$\|G\| \le 1 + Mk \quad \text{for } 0 < k < k_0.$$

For then $\|G^n\| \le \|G\|^n \le e^{Mnk} \le e^{MK}$, and so $\|G^n\|$ is bounded. If $G$ is a normal matrix, i.e. $G^HG = GG^H$, then $\rho(G) = \|G\|$ in the spectral norm, and we see that the von Neumann condition is both necessary and sufficient for stability. This includes the cases when $G$ is Hermitian or unitary. The general investigation of necessary and sufficient conditions is involved and culminates in the Kreiss-Buchanan theorem, details of which will be found elsewhere (Richtmyer and Morton 1967). As an illustration, consider again the wave equation (2.51), but this time put $v = \partial u/\partial y$ and $w = \partial u/\partial x$, so that the system

$$\frac{\partial v}{\partial y} = \frac{\partial w}{\partial x}, \qquad \frac{\partial v}{\partial x} = \frac{\partial w}{\partial y}$$

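is acquired. Replace this system by the difference equations (see also Exercise 19)

$$V_{m,n+1} - V_{m,n} = k(W_{m+1,n} - W_{m-1,n})/(2h), \tag{2.52}$$

$$W_{m-1,n+1} - W_{m-1,n} = k(V_{m,n+1} - V_{m-2,n+1})/(2h). \tag{2.53}$$

Make the substitution $V_{m,n} = \hat{V}_n e^{imb}$, $W_{m,n} = \hat{W}_n e^{imb}$. Then, after slight manipulation, the equation

$$\begin{pmatrix}\hat{V}_{n+1}\\ \hat{W}_{n+1}\end{pmatrix} = \begin{pmatrix}1 & ic\\ ic & 1 - c^2\end{pmatrix}\begin{pmatrix}\hat{V}_n\\ \hat{W}_n\end{pmatrix} = G\begin{pmatrix}\hat{V}_n\\ \hat{W}_n\end{pmatrix},$$

where $c = (k/h)\sin b$, arises. The characteristic equation of $G$ is

$$\lambda^2 - \lambda(2 - c^2) + 1 = 0.$$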

If $c^2 \le 4$, both roots of this equation have absolute value 1. Therefore, if we hold $k/h$ constant as $k \to 0$, the von Neumann necessary condition requires that $k/(2h) \le 1$. If $c^2 > 4$, the von Neumann condition cannot be met with $k/h$ kept constant. Since $GG^H \ne G^HG$, $G$ is not normal, and it cannot be asserted that the von Neumann condition is also sufficient. In fact, instability occurs when $k = 2h$. An alternative, more stable difference scheme is furnished by

$$V_{m,n+1} - V_{m,n} = k\{W_{m+1,n} - W_{m-1,n} + W_{m+1,n+1} - W_{m-1,n+1}\}/(4h),$$

$$W_{m-1,n+1} - W_{m-1,n} = k\{V_{m,n+1} - V_{m-2,n+1} + V_{m,n} - V_{m-2,n}\}/(4h).$$

In this case

$$(1 + \tfrac{1}{4}c^2)G = \begin{pmatrix}1 - \tfrac{1}{4}c^2 & ic\\ ic & 1 - \tfrac{1}{4}c^2\end{pmatrix}.$$
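The two amplification matrices can be compared numerically. The following is a minimal Python/NumPy sketch; the function names are hypothetical, and $b$ is sampled on a finite grid. It builds $G$ for scheme (2.52)-(2.53) and for the averaged scheme, with $c = (k/h)\sin b$. It reports the largest eigenvalue modulus over the sampled $b$ and checks unitarity via $G^HG = I$.

```python
import numpy as np

def G_explicit(c):
    """Amplification matrix of scheme (2.52)-(2.53)."""
    return np.array([[1.0, 1j * c], [1j * c, 1.0 - c * c]])

def G_averaged(c):
    """Amplification matrix of the averaged scheme: (1 + c^2/4) G = [[1 - c^2/4, ic], [ic, 1 - c^2/4]]."""
    d = 1.0 - 0.25 * c * c
    return np.array([[d, 1j * c], [1j * c, d]]) / (1.0 + 0.25 * c * c)

def spectral_radius_over_b(G, r, num_b=721):
    """Largest eigenvalue modulus of G((k/h) sin b) over sampled real b, with r = k/h."""
    return max(np.max(np.abs(np.linalg.eigvals(G(r * np.sin(b)))))
               for b in np.linspace(0.0, 2.0 * np.pi, num_b))

def is_unitary(M, tol=1e-12):
    """True if M^H M equals the identity to within tol."""
    return np.allclose(M.conj().T @ M, np.eye(M.shape[0]), atol=tol)

if __name__ == "__main__":
    for r in (1.5, 2.5):
        print(f"k/h = {r}: explicit {spectral_radius_over_b(G_explicit, r):.6f}, "
              f"averaged {spectral_radius_over_b(G_averaged, r):.6f}")
    print("averaged G unitary for c = 10:", is_unitary(G_averaged(10.0)))
    print("explicit G unitary for c = 1:", is_unitary(G_explicit(1.0)))
```

For $k/h = 2.5$ the first scheme has an eigenvalue of modulus 4 (at $b = \tfrac{1}{2}\pi$), whereas the averaged matrix remains unitary for every $c$ tried. This is consistent with unconditional stability.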

Now $G$ is a unitary matrix and both its eigenvalues have absolute value 1. The von Neumann condition is both necessary and sufficient, and it is met without any restriction on the mesh size: the system is unconditionally stable. The stability criteria may also be applied when the coefficients are variable. The coefficients are frozen at their values at some particular point of interest, and the resulting constant-coefficient problem is investigated.

Finally, there is an interpretation of Laplace's difference equations via random walks which is worth remarking. Suppose that a particle moves at random in such a fashion that, when it leaves a point of $R_h$, the probability of it stepping to each one of the four adjacent mesh points is $\tfrac{1}{4}$. In addition, the particle comes to rest whenever it reaches a point of $R'_h$. Let $P_{m,n}$ be the probability that the path of the particle starting at the grid point $(mh, nh)$ terminates at one of the set $S$ of points of $R'_h$. Clearly, $P_{m,n} = 1$ when $(mh, nh)$ lies in $S$, whereas $P_{m,n} = 0$ when $(mh, nh)$ is located elsewhere in $R'_h$. If $(mh, nh)$ is in $R_h$, the probability is a sum over the four adjacent mesh points. Each term is the probability $\tfrac{1}{4}$ of stepping to that point multiplied by the probability of a path terminating in $S$ from there. Hence

$$P_{m,n} = \tfrac{1}{4}(P_{m+1,n} + P_{m,n+1} + P_{m-1,n} + P_{m,n-1}).$$

Consequently, $P_{m,n}$ solves Laplace's difference equation subject to taking the value 1 on $S$ and 0 on $R'_h - S$. On this basis, a statistical procedure, known as the Monte Carlo method, may be devised for calculating the solution of a Dirichlet problem.
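The random-walk interpretation leads directly to a Monte Carlo estimate. If each stopping point carries the boundary value $g$ (rather than the indicator of a set $S$), the average of $g$ over many walks estimates the solution of the Laplace difference equation at the starting node. The following is a minimal Python sketch under these assumptions:

- the region is the unit square with $h = 1/n$;
- the boundary data are $g = x^2 - y^2$, chosen because this function satisfies (2.35) exactly, so the discrete solution is known;
- the number of walks and the seed are arbitrary.

The function name is hypothetical.

```python
import random

def monte_carlo_dirichlet(n, g, m0, k0, walks=20000, seed=0):
    """Estimate the solution of the Laplace difference equation (2.35) at node (m0, k0)
    of the unit square (mesh length h = 1/n) with boundary values g(x, y).

    Each walk steps to one of the four neighbours with probability 1/4 until it
    reaches a boundary node; the mean of g at the stopping points is the estimate.
    """
    rng = random.Random(seed)
    h = 1.0 / n
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(walks):
        m, k = m0, k0
        while 0 < m < n and 0 < k < n:        # still at a regular point
            dm, dk = rng.choice(steps)
            m, k = m + dm, k + dk
        total += g(m * h, k * h)              # particle has come to rest on the boundary
    return total / walks

if __name__ == "__main__":
    g = lambda x, y: x * x - y * y            # satisfies (2.35) exactly on any square mesh
    print(monte_carlo_dirichlet(8, g, 4, 2), "vs exact", 0.5**2 - 0.25**2)
```

The statistical error decreases only like $(\text{number of walks})^{-1/2}$. The method is therefore mainly attractive when the solution is wanted at a few points rather than over the whole mesh.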

Exercises

t

13. Set up difference equations with h = for solving Laplace's equation on the unit square with u = 1 on y = 1 and u = 0 on the other three sides.



14. In a calculation of the solution of (2.35) the computed value V~,n agrees exactly with Vm,n on R~ but is found not to satisfy (2.35) exactly because L(U') = 0 subject to u = 0 on y = 0, - 1 ~ x ~ 1 and ou/iJn = x on the remainder of the boundary by taking h = !. 18. The function u satisfies

au au

p ou

ox

x ox

2

2

-+ 2 ----=0 iJy2

P being a positive constant, and u is specified on the boundary which lies entirely in x > o. Show that on Ria + R~ lu - VI ~ !Mah 2J2(1 + tal +

max Rh

Iu - VI

where M is a bound for the partial derivatives of u. Hint: Consider w

= -hM{a 2 -

(x - XO)2 - (y - YO)2}

+ iM(1 + ta)(J2a + x - xo -

y

+ Yo).

19. If the right-hand side of (2.53) is replaced by $k(V_{m,n} - V_{m-2,n})/(2h)$, show that stability requires $k/h^2$ to be bounded as $k \to 0$, which is undesirably demanding.

20. In the discrete analogue of

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 u}{\partial t^2}$$

a formula of type (2.31) is used for each derivative, with $k$ for increments in $t$ and $h$ for increments in both $x$ and $y$. Prove that stability requires $h \ge k\sqrt{2}$.

2.5 TEM modes

The fields in TEM modes are expressed entirely in terms of a solution $\psi$ of Laplace's equation which takes constant values on the boundary. Therefore the Laplace difference equation of the preceding section is especially relevant. If, for example, the boundary of the cross-section of the guide consists of two distinct portions, as in the coaxial line, put $\psi = 0$ on one connected part and $\psi = 1$ on the other. The difference scheme will then supply numerical values for $\psi$. The field components may then be deduced from the formula for a TEM mode. Note, however, that they may be less accurate than $\psi$, because numerical derivatives are involved in determining them. The only matter which merits further attention is how the numerical solution of the difference equations is to be accomplished. The most common practice is to carry it out iteratively using the successive over-relaxation (SOR) method described in §1.13, because the matrices for difference equations are sparse and usually large. The practical implementation of SOR entails the determination of the parameter $\omega$ which occurs. Theorem 1.13a shows that $\omega$ must be limited to the range $0 < \omega < 2$. In Exercise 64 of §1.13 an optimal choice of $\omega$ is given when the matrix $A$ of the linear system of equations is tridiagonal. Actually, the relation given there between the eigenvalues $\lambda$ and $\mu$ of the Jacobi and SOR methods is valid for a wider class of matrices, though the explicit formula for the optimal $\omega$ holds only if all $\lambda$ are real. The wider class consists of matrices which are suitably sparse and whose non-zero elements follow a certain pattern. Choose a number of rows from $A$ and call them $S_1$ rows; the corresponding columns are called $S_1$ columns. Term the remaining rows and corresponding columns $S_2$ rows and $S_2$ columns respectively. There will, of course, be no $S_2$ rows if $S_1$ includes all the rows of $A$. Then we have the following.

YOUNG'S PROPERTY A. A matrix has Property A if every non-zero off-diagonal element lies either in an $S_1$ row and an $S_2$ column or in an $S_2$ row and an $S_1$ column.

If there are no $S_2$ rows then a matrix with Property A is diagonal. It is advisable to know conditions under which it can be affirmed that, for a matrix with Property A, the optimal formula

$$\omega_0 = 2\{1 + (1 - \lambda_1^2)^{1/2}\}^{-1} \tag{2.54}$$

is correct. Now, a matrix with Property A can be converted by permutations of the rows and columns into a consistently ordered matrix whose properties will be described later. Then, if $A$ is a positive definite consistently ordered matrix, the formula (2.54) can be confirmed. Although this is a sufficient but not a necessary condition, it is highly relevant to difference methods. Broadly speaking, given a selection of difference equations for the same partial differential equation, it is wise to restrict oneself to those which have Property A and preferable to pick one which is positive definite when consistently ordered. The question then arises of how to arrange that a matrix is consistently ordered; in other words, if the $U_{mn}$ are components of $x$, which $U_{mn}$ is to be identified with a particular $x_i$. For five-point difference schemes the natural ordering, in which one goes along a mesh line from left to right, then moves vertically one increment and repeats the process, achieves consistent ordering. Thus, for nine mesh points arranged on a square of side 3, $x_1 = U_{11}$, $x_2 = U_{12}$, $x_3 = U_{13}$, $x_4 = U_{21}$, $x_5 = U_{22}$, …. Another possibility is to take $U_{mn}$ before $U_{pq}$ if $m + n < p + q$, which orders by diagonals. Yet another is to place all $U_{mn}$ with $m + n$ even ahead of those with $m + n$ odd. After organizing the difference scheme suitably, there still remains the task of deriving a pragmatic estimate for $\lambda_1$ in (2.54). One device is to select a value of $\omega$ believed to be less than $\omega_0$ and compute

$$b_r = \|x_{r+2} - x_{r+1}\|_\infty \,/\, \|x_{r+1} - x_r\|_\infty$$

where $x_r$ is the $r$th iterate of the vector $x$ above with components $U_{mn}$. Then, as in Theorem 1.14, $b_r$ should converge to the spectral radius of the SOR iteration matrix and so, when $b_r$ has stabilized at $b_0$ (say), we make the estimate

$$\lambda_1 = (b_0 + \omega - 1)/(\omega\, b_0^{1/2}).$$
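A minimal Python sketch of this device, under stated assumptions: the test problem (Laplace's equation on an $n \times n$ square mesh with zero boundary values and a random starting vector, so every iterate is pure error) and the names `sor_sweep` and `estimate_omega0` are illustrative choices, not taken from the text. For the empty square the exact Jacobi spectral radius is $\cos(\pi/n)$, which gives a check on the estimate.

```python
import numpy as np


def sor_sweep(u, omega):
    """One in-place SOR sweep of the five-point Laplace equation.

    Points are visited in the natural (row-by-row) ordering, which is
    consistently ordered; the outer ring of u holds fixed boundary values.
    """
    rows, cols = u.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            avg = 0.25 * (u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1])
            u[i, j] += omega * (avg - u[i, j])


def estimate_omega0(n=10, omega=1.2, sweeps=150, seed=0):
    """Estimate lambda_1 from the ratio b_r and then omega_0 from (2.54).

    Hypothetical test problem: zero boundary data on an n x n square, so the
    exact solution is zero and the iterates decay towards it.
    """
    rng = np.random.default_rng(seed)
    u = np.zeros((n + 1, n + 1))
    u[1:-1, 1:-1] = rng.standard_normal((n - 1, n - 1))
    x_r = u.copy()
    sor_sweep(u, omega)
    x_r1 = u.copy()
    b_r = 0.0
    for _ in range(sweeps):
        sor_sweep(u, omega)
        x_r2 = u.copy()
        # b_r = ||x_{r+2} - x_{r+1}||_inf / ||x_{r+1} - x_r||_inf
        b_r = np.max(np.abs(x_r2 - x_r1)) / np.max(np.abs(x_r1 - x_r))
        x_r, x_r1 = x_r1, x_r2
    b0 = b_r  # treated as the stabilised value
    lam1 = (b0 + omega - 1.0) / (omega * np.sqrt(b0))
    omega0 = 2.0 / (1.0 + np.sqrt(1.0 - lam1 ** 2))
    return lam1, omega0


lam1, omega0 = estimate_omega0()
print(lam1, np.cos(np.pi / 10), omega0)
```

Here $\omega = 1.2$ lies below the optimum; had $\omega$ been chosen too large, the oscillation of $b_r$ mentioned in the text would show it.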

With the value of $\omega_0$ so gained the process is repeated. If $\omega$ has been chosen too large initially, this will be evident from the oscillation of $b_r$. Practical schemes based on this idea have been reported by a number of authors (Kulsrud 1961; Carre 1961; Reid 1966). Another version uses the $\ell_2$ norm in the definition of $b_r$ and switches from $\omega$ to $\omega_0$ only when the two differ significantly. It may happen that the difference scheme $Ax = b$ can be partitioned so that it can be manipulated into an equivalent system

$$x^{(1)} = A_1 x^{(2)} + b_1, \qquad x^{(2)} = A_2 x^{(1)} + b_2.$$

There is then the possibility of using SOR with parameter $\omega$ for the first equation of the system and $\omega'$ for the second. The method, known as modified SOR, might be expected to give improved convergence. For details, and for information about symmetric SOR, which has better characteristics than SOR in some circumstances, the reader is referred elsewhere (Young 1971). SOR is simple for the computer to handle and it is quite feasible to contemplate having 10 000 mesh points on the cross-section of the waveguide. Some advantage accrues from commencing with a relatively coarse mesh. Then the mesh length is halved and the field values already found are used as a starting approximation. The mesh halving can be repeated until a specified accuracy is attained. Usually this gives quicker results than adopting the final mesh at the beginning, and it automatically arrives at the mesh size for the desired accuracy. By incorporating it in the computer program, complete calculations can be carried out in at most a minute or two of computer time. The rate of convergence is controlled to some extent by the boundary conditions (discussed in more detail in the next section) and, generally speaking, the rate is poorer the more curved the boundary. In the Dirichlet problem the mesh points carrying boundary values end up on the right-hand side of the equations and the matrix $A$ is symmetric. For the Neumann problem, however, $A$ is slightly non-symmetric because of the grid points near the boundary. For this reason it may be beneficial to construct the difference scheme via a variational formula (Chap. 4), since then the resulting matrix is symmetric.
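As a rough illustration of this section, the Python sketch below solves the Laplace difference equation for a hypothetical square "coaxial" guide (outer square at $\psi = 0$, inner square conductor at $\psi = 1$) by SOR in natural ordering. The value of $\omega$ comes from (2.54) with $\lambda_1 = \cos(\pi/n)$, the Jacobi radius of the empty square; that is only an approximation once the inner conductor is present, but any $0 < \omega < 2$ still converges because the matrix is symmetric positive definite. The mesh size, conductor position, tolerance and the name `tem_potential` are assumptions.

```python
import numpy as np


def tem_potential(n=20, inner=(8, 12), tol=1e-10, max_sweeps=10_000):
    """SOR solution of Laplace's equation between two square conductors.

    Hypothetical geometry: psi = 0 on the edge of an (n+1) x (n+1) mesh and
    psi = 1 on the inner square occupying indices inner[0]..inner[1].
    """
    psi = np.zeros((n + 1, n + 1))
    fixed = np.zeros_like(psi, dtype=bool)
    fixed[0, :] = fixed[-1, :] = fixed[:, 0] = fixed[:, -1] = True
    a, b = inner
    psi[a:b + 1, a:b + 1] = 1.0
    fixed[a:b + 1, a:b + 1] = True

    # (2.54) using the empty-square Jacobi radius as an estimate of lambda_1
    lam1 = np.cos(np.pi / n)
    omega = 2.0 / (1.0 + np.sqrt(1.0 - lam1 ** 2))

    for sweep in range(1, max_sweeps + 1):
        biggest = 0.0
        for i in range(1, n):
            for j in range(1, n):
                if fixed[i, j]:
                    continue  # conductor or outer boundary: value prescribed
                avg = 0.25 * (psi[i + 1, j] + psi[i - 1, j] + psi[i, j + 1] + psi[i, j - 1])
                change = omega * (avg - psi[i, j])
                psi[i, j] += change
                biggest = max(biggest, abs(change))
        if biggest < tol:  # stop when a whole sweep changes nothing appreciably
            return psi, fixed, sweep
    raise RuntimeError("SOR did not converge")
```

The field components would then follow from numerical derivatives of `psi`, with the loss of accuracy the text warns about.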



Exercise 21. Find the TEM mode of a waveguide the boundary of whose cross-section consists of (a) two rectangles with the same centre and parallel sides, (b) the same as (a) except that one side of the inner rectangle is zero so that it is a strip, (c) a rectangle containing two circles symmetrically placed with their line of centres parallel to the longer side of the rectangle, and (d) two circles, one containing the other, but not necessarily concentric.

2.6 The dominant mode

To unearth the dominant TE mode requires the solution of

$$(\nabla_t^2 + \mu_1^2)\psi_1 = 0 \tag{2.55}$$

subject to $\partial\psi_1/\partial n = 0$ on $C$. The field components will, in addition, require numerical derivatives when numerical values for $\psi_1$ have been ascertained. To avoid this process it has been suggested (Harrington 1968) that (2.55) should be replaced by the system

$$\partial\psi_1/\partial x = -\mu_1 u, \qquad \partial\psi_1/\partial y = -\mu_1 v, \qquad \partial u/\partial x + \partial v/\partial y = \mu_1\psi_1$$

so that the fields are evaluated to the same accuracy as $\psi_1$. However, the matrices are now tripled in size and, although some benefit may be drawn in faster convergence because the equations are first order instead of second order, it is not at all evident that improvement occurs for the general problem. Accordingly, we shall concentrate on (2.55). The problem is very similar to that of the preceding section, with the extra feature that $\mu_1$ has to be found. A way of performing this which has been successful in practice (Davies and Muilwyk 1966) will now be described. Let the difference replacement of (2.55) be

$$(A - \lambda I)\Psi = 0.$$

If we can find an eigenvalue $\lambda$ we expect, if the mesh is fine enough, that it will be related to a corresponding eigenvalue of the continuous equation by $\lambda = \mu_1^2 h^2$. First a coarse mesh is superimposed on the cross-section of the guide and the eigenvalues of the resulting $A$ are computed. One should be zero, as has been seen in §2.2, but it is not wanted; if that one is ignored, the smallest eigenvalue is chosen as an approximation to $\lambda$; denote it by $\lambda^{(0)}$. Then solve

$$A\Psi^{(1)} = \lambda^{(0)}\Psi^{(0)}$$

by the technique of the foregoing section, starting from any convenient mesh. The approximation $\Psi^{(0)}$ may be chosen to suit one's taste, e.g. with components assuming the values 1 and $-1$ on alternate grid points (putting the components all equal to 1 is inadvisable since the constant eigenfunction is to be rejected). Having found $\Psi^{(1)}$, a new value of $\lambda$ is prepared from

$$\lambda^{(1)} = \Psi^{(1)T} A \Psi^{(1)} \,/\, \Psi^{(1)T}\Psi^{(1)}. \tag{2.56}$$



Next, the approximation $\Psi^{(2)}$ is determined from $A\Psi^{(2)} = \lambda^{(1)}\Psi^{(1)}$, and an iteration procedure has been set up. It may be expected to have good qualities because of the maximum principle established in §2.4 (cf. Exercise 14) and the properties of the Rayleigh quotient (2.56), since $A$ is positive semidefinite (§1.10). Acceleration of the convergence of $\lambda^{(k)}$ from three consecutive values by Aitken's $\delta^2$ method (§§1.8 and 1.14) should be considered; it will produce substantial improvement if the error is proportional to a constant power of $h$. To guarantee that the procedure stays away from the constant eigenfunction, the weighted average of $\Psi$ can be subtracted from each value of $\Psi$ after each sweep of the mesh, thereby complying with (2.24). So far little has been said about the approximation to the boundary. The requisite action can be undertaken by the computer and essentially consists of replacing the boundary by a polygon all of whose sides are horizontal, vertical, or inclined at 45° to the horizontal. On each horizontal mesh line the mesh points nearest to the boundary and consistent with the above are chosen as the points of the polygonal perimeter (Fig. 2.6). Since the polygon is neither wholly inside nor wholly outside, but varies in a somewhat random fashion, the effects of the perturbations from the true boundary should tend to cancel out, especially as the mesh is refined. Greater accuracy will, of course, be obtained if the curved boundary is retained and the procedure of §2.4 adopted, but at the expense of some increase in complexity. It may well be best to leave this possibility on one side until preliminary results from the polygon are available. The method can also be applied to calculate the first TM mode. Thus the cut-off frequencies (easily determined from $\mu_1$ and $\nu_1$) of both types of mode are found. Also the mode impedances $Z_0(1 - \mu_1^2/k^2)^{-1/2}$ and $Z_0(1 - \nu_1^2/k^2)^{1/2}$, where

Fig. 2.6. Polygonal approximation of boundary.


WA VEGUIDES AND DIFFERENCE EQUATIONS

$Z_0 = (\mu_0/\varepsilon_0)^{1/2}$, follow without difficulty. However, the method has certain deficiencies when it comes to higher-order modes, as we shall see in the next section.
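The iteration of this section can be sketched in Python as follows. As assumptions, a small one-dimensional Neumann matrix (the symmetric path Laplacian, whose null vector is the constant) stands in for the two-dimensional $A$, a dense least-squares solve replaces the relaxation sweeps used to solve $A\Psi^{(k+1)} = \lambda^{(k)}\Psi^{(k)}$, and the plain mean is used as the average that is subtracted. The helper `aitken` applies Aitken's $\delta^2$ formula to three consecutive values; all function names are hypothetical.

```python
import numpy as np


def neumann_chain(n):
    """Hypothetical 1-D stand-in for A: symmetric, positive semidefinite,
    with the constant vector as its null vector."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    return A


def dominant_mode(A, iterations=60):
    """Iterate A psi^(k+1) = lambda^(k) psi^(k) with the Rayleigh quotient (2.56)."""
    n = A.shape[0]
    psi = np.linspace(-1.0, 1.0, n)          # no constant component to begin with
    lam = psi @ A @ psi / (psi @ psi)
    history = [lam]
    for _ in range(iterations):
        # A is singular, but the system is consistent because the right-hand
        # side has zero mean; lstsq returns its minimum-norm solution
        psi = np.linalg.lstsq(A, lam * psi, rcond=None)[0]
        psi -= psi.mean()                    # stay away from the constant eigenfunction
        lam = psi @ A @ psi / (psi @ psi)    # Rayleigh quotient (2.56)
        history.append(lam)
    return lam, psi, history


def aitken(l0, l1, l2):
    """Aitken delta^2 extrapolation from three consecutive values."""
    denom = l2 - 2.0 * l1 + l0
    if denom == 0.0:
        return l2
    return l2 - (l2 - l1) ** 2 / denom
```

For this chain the exact answer is $\lambda_1 = 2 - 2\cos(\pi/n)$. When the mesh is halved repeatedly and the error behaves like $Ch^p$, successive estimates form a geometric sequence, which `aitken` extrapolates exactly; for instance `aitken(2.0, 1.5, 1.25)` gives 1.0.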

Exercises 22. Find the cut-off frequency of the first TM mode by the preceding method in (a) a rectangle of sides $d$, $4d/9$, (b) a square of side $d$, (c) a circle of diameter $d$, and (d) an ellipse with axes $d$ and $\tfrac12 d$. (The theoretical answers are $d\nu_1 = 7.7352$, 4.4429, 4.8100, and 7.5543 respectively.) 23. Find the dominant mode of a circular waveguide of diameter $d$ ($d\mu_1 = 3.6820$). 24. A ridge waveguide has boundaries y = !d, −td ≤ x ≤ td; x = ±!d, 0 ≤ y ≤ !d; x = ±id, 0 ≤ y ≤ ~d; y = !d, −!d ≤ x ≤ !d; y = 0, −td ≤ x ≤ −id and id ≤ x ≤ id. Find the dominant mode. (Experimentally $d\mu_1 \approx 2.25$.) 25. A lunar waveguide consists of a circle of radius $\tfrac12 d$ containing a circle of radius $0.286d$, with the centres displaced so that a piece of metal of length $0.055d$ along the line of centres joins the two circles. Show that $d\mu_1$ is about 1.95 in the fundamental mode.

2.7 Higher modes

A first attempt to find the second or a higher mode might be to start from a reasonable approximation to it and go through the procedure of the preceding section. Such an attempt can be expected to fail, since the numerical process would almost certainly introduce a component of the first mode which would become accentuated as the iteration advanced, because (2.56) provides a bound only for the first eigenvalue. To circumvent this, the orthogonality properties of the modes can be invoked. Suppose that the first numerical mode $\Psi_1$ has been determined and that the vector $\Psi^{(r)}$ for the second mode has been reached after a relaxation sweep across the guide. Then pick $b$ so that $\Psi_1^T(\Psi^{(r)} - b\Psi_1) = 0$ and use $\Psi^{(r)} - b\Psi_1$ for the next stage of the iteration. If this is done after each sweep, the vector should be kept orthogonal to $\Psi_1$ and (2.56) should approach the relevant eigenvalue. The method has been implemented (Pontoppidan 1969) but needs quite a bit of extra computer time and storage. As an alternative (Beaubien and Wexler 1968; Silvester 1970), rewrite the governing partial differential equation for $\psi_n$ as

$$(\nabla_t^2 + \mu_{n-1}^2)\psi_n = (\mu_{n-1}^2 - \mu_n^2)\psi_n.$$

The positive definite character of the operator has now been lost, and this renders impotent most of the methods for determining the optimal parameter $\omega$ in SOR. But a further application of the operator on the left gives

$$(\nabla_t^2 + \mu_{n-1}^2)^2\psi_n = (\mu_{n-1}^2 - \mu_n^2)^2\psi_n \tag{2.57}$$

and the property of positive semi-definiteness has been regained.
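The orthogonalization device described at the start of this section (choosing $b$ so that $\Psi_1^T(\Psi^{(r)} - b\Psi_1) = 0$ after every stage) can be sketched in Python. Assumptions: a dense solve replaces the relaxation sweeps, a 1-D Neumann chain stands in for $A$, the constant null vector is treated as one more known mode, and the names `path_laplacian` and `next_mode` are hypothetical.

```python
import numpy as np


def path_laplacian(n):
    """Hypothetical 1-D Neumann stand-in for the difference matrix A."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    return A


def next_mode(A, known, iterations=100, seed=1):
    """Iteration kept orthogonal to the (mutually orthogonal) known modes."""
    psi = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(iterations):
        for q in known:
            psi = psi - (q @ psi) / (q @ q) * q   # b = q^T psi / q^T q
        lam = psi @ A @ psi / (psi @ psi)         # Rayleigh quotient (2.56)
        psi = np.linalg.lstsq(A, lam * psi, rcond=None)[0]
    for q in known:
        psi = psi - (q @ psi) / (q @ q) * q
    return psi @ A @ psi / (psi @ psi), psi


A = path_laplacian(10)
const = np.ones(10)
lam1, psi1 = next_mode(A, [const])          # first mode: only the constant removed
lam2, psi2 = next_mode(A, [const, psi1])    # second mode: psi1 removed as well
```

Omitting `psi1` from `known` in the second call makes the iteration converge to $\lambda_1$ again, which is the failure the text anticipates.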



Before pondering difference replacements for (2.57), we must examine whether there are other solutions to (2.57) besides $\psi_n$. Since (2.57) can be expressed as

$$(\nabla_t^2 + \mu_n^2)(\nabla_t^2 + 2\mu_{n-1}^2 - \mu_n^2)\chi = 0$$

we have whence where

x = - _1_ 2/-l;

(x 01/1" + y 01/1,,) + X ox iJy

I .

If $\mu_{n-1} \ne \mu_n$ and $\chi$ satisfies the same boundary conditions as the sought function $\psi_n$, then so must $\chi_1$; then $2\mu_{n-1}^2 - \mu_n^2$ would have to be an eigenvalue. So, precluding the case where $2\mu_{n-1}^2 - \mu_n^2$ is an eigenvalue, one would hope that only $\psi_n$ is generated. If $\mu_{n-1} = \mu_n$, i.e. the eigenvalue is degenerate with at least two possible distinct modal fields, the situation is more complicated. Even though $\chi_1$ may now be a multiple of $\psi_n$, there is no guarantee that different starting approximations for different modal structures will remain disentangled as the iteration proceeds. Indeed, they may tend to the same final result. In that case, recourse must be had to the more complex techniques of Chapter 1. Assuming that no degeneracy is involved we may, after resorting to a coarse mesh for an initial $\lambda$ and eigenvector, adopt the iteration of §2.6, provided that we have the difference formula for $A$ which now emanates from the left-hand side of (2.57). The new ingredient is $\nabla_t^4$, the biharmonic operator. A difference replacement can be realized by applying the Laplacian twice. When the five-point Laplacian is employed a 13-point formula is supplied, while the nine-point Laplacian originates a 25-point formula. Unfortunately neither of these formulae possesses Young's Property A, which has been seen in §2.5 to be a desirable concomitant for SOR purposes. Providentially, 17-point biharmonic difference formulae are known (Tee 1963) which do have Young's Property A and whose local truncation error is $O(h^2)$. They are

$$\nabla^4 U \approx \frac{1}{12h^4}\Big\{U_{m+3,n} + U_{m,n+3} + U_{m-3,n} + U_{m,n-3} + 3(U_{m+2,n+1} + U_{m+1,n+2} + U_{m-1,n+2} + U_{m-2,n+1} + U_{m-2,n-1} + U_{m-1,n-2} + U_{m+1,n-2} + U_{m+2,n-1}) - 39(U_{m,n+1} + U_{m-1,n} + U_{m,n-1} + U_{m+1,n}) + 128U_{m,n}\Big\}, \tag{2.58}$$

$$\nabla^4 U \approx \frac{1}{108h^4}\Big\{3(U_{m+3,n+2} + U_{m+2,n+3} + U_{m-2,n+3} + U_{m-3,n+2} + U_{m-3,n-2} + U_{m-2,n-3} + U_{m+2,n-3} + U_{m+3,n-2}) + 11(U_{m+3,n} + U_{m,n+3} + U_{m-3,n} + U_{m,n-3}) - 177(U_{m,n+1} + U_{m-1,n} + U_{m,n-1} + U_{m+1,n}) + 640U_{m,n}\Big\}. \tag{2.59}$$

Of these, (2.59) has greater diagonal dominance than (2.58) and will therefore be preferred in general. If (2.59) is used in conjunction with the five-point Laplacian for the term $2\mu_{n-1}^2\nabla_t^2$ on the left-hand side of (2.57), the appropriate difference formula is prepared. Provided that good starting values are available it should supply higher modes so long as degeneracies do not arise, but then all methods appear to experience difficulty.
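The two 17-point stencils can be checked numerically. The Python sketch below stores each as a table of offsets and weights, confirms that each has 17 points, that every off-centre point has $m + n$ odd (so the unknowns split into "red" and "black" sets, which is Property A), and that each reproduces $\nabla^4 (x^2 + y^2)^2 = 64$, since the formulae are exact for polynomials of degree four. The four axial points at distance $3h$ with weight 1 in (2.58), and the divisors $12h^4$ and $108h^4$, are assumptions here: they are the values that make the moments of the printed weights reproduce $\nabla^4$ exactly. The helper names are hypothetical.

```python
def orbit(p, q):
    """Images of the offset (p, q) under reflections and the swap m <-> n."""
    return {(sa * a, sb * b) for a, b in ((p, q), (q, p)) for sa in (1, -1) for sb in (1, -1)}


def stencil(groups, scale):
    """Build ({(dm, dn): weight}, scale) from (representative offset, weight) pairs."""
    weights = {}
    for (p, q), w in groups:
        for offset in orbit(p, q):
            weights[offset] = w
    return weights, scale


# Weights of (2.58) (divided by 12 h^4) and (2.59) (divided by 108 h^4).
S258 = stencil([((3, 0), 1), ((2, 1), 3), ((1, 0), -39), ((0, 0), 128)], 12)
S259 = stencil([((3, 2), 3), ((3, 0), 11), ((1, 0), -177), ((0, 0), 640)], 108)


def biharmonic(st, u, x, y, h):
    """Approximate the biharmonic of the function u at (x, y) with mesh length h."""
    weights, scale = st
    total = sum(w * u(x + dm * h, y + dn * h) for (dm, dn), w in weights.items())
    return total / (scale * h ** 4)


def dominance(st):
    """Ratio of the centre weight to the sum of the off-centre magnitudes."""
    weights, _ = st
    off = sum(abs(w) for offset, w in weights.items() if offset != (0, 0))
    return weights[(0, 0)] / off
```

`dominance` gives 128/184 for (2.58) and 640/776 for (2.59), in line with the preference for (2.59) stated above.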

Exercise 26. Find the first four modes of each type in the waveguides of Exercises 22-25.

2.8 Direct methods

So far attention has been concentrated on the iterative procedure of SOR. There are other iterative techniques, such as ADI (alternating direction implicit), but since we now wish to discuss a direct method we refer the reader elsewhere for details (Mitchell 1969; Mitchell and Griffiths 1980). The basic idea is to take advantage of the fast algorithms which are available for computing Fourier series (see §2.14). If the cross-section is a rectangle this can be accomplished in a satisfactory manner (Hockney 1965, 1970; Buneman 1969), and such methods are currently the fastest available. For other cross-sections one tries to approximate by rectangles and then, by a suitable matrix decomposition, transforms the equations into a form to which Fourier methods are applicable. The method was first described by Buzbee and co-workers (Buzbee et al. 1970; Buzbee et al. 1971) but has been substantially enlarged and extended (Nokes et al. 1974; Nokes 1974). To fix ideas, consider the problem of solving Poisson's equation in a rectangle with sides $a$ and $b$ parallel to the $x$ and $y$ axes respectively. Superimpose a rectangular mesh, the sides of the rectangle being mesh lines. The mesh lines are equally spaced at a distance $h$ along the $x$ axis, but the vertical spacing may be variable, being $k_j$ between rows $j$ and $j + 1$ (Fig. 2.7). Let there be $m$ small rectangles along a horizontal row and $n$ along a vertical column. At a mesh point $(i, j)$ in the interior of the rectangle the five-point difference equation is (Exercise 11)

$$U_{i+1,j} + U_{i-1,j} + \delta_j U_{i,j+1} + \gamma_j U_{i,j-1} - \varepsilon_j U_{i,j} = h^2 f_{i,j} \tag{2.60}$$


Fig. 2.7. Rectangular mesh for Fourier method (horizontal spacing $h$, vertical spacing $k_j$ between rows $j$ and $j + 1$; sides $a$ and $b$).

where

$$\gamma_j = 2h^2/\{k_{j-1}(k_{j-1} + k_j)\}, \qquad \delta_j = 2h^2/\{k_j(k_{j-1} + k_j)\}, \qquad \varepsilon_j = 2 + 2h^2/(k_{j-1}k_j)$$

for $j = 1, \dots, n - 1$. For the Dirichlet problem the values of $U_{i,j}$ at mesh points on the boundary are known and may be transferred to the right-hand side. If these are assumed to be absorbed in the $f_{i,j}$, then we are confronted by the problem of solving (2.60) subject to $U$ vanishing on the boundary. This suggests an expansion in a Fourier sine series, say

$$U_{i,j} = \sum_{k=1}^{m-1} U_j(k)\sin(\pi k i/m).$$

If the right-hand side of (2.60) is similarly expanded (cf. §1.7), then equating the coefficients of each Fourier harmonic leads to

$$\delta_j U_{j+1}(k) + \{2\cos(\pi k/m) - \varepsilon_j\}U_j(k) + \gamma_j U_{j-1}(k) = h^2 f_j(k).$$

Thus for each Fourier mode $k$ a tridiagonal matrix equation for the unknown $U_j$ is obtained. Each tridiagonal matrix equation may be solved in turn (see Chapter 1), and the cyclic reduction methods of Hockney and Buneman can be considered. However, we wish to take a somewhat different point of view. Return to the original system (2.60) and express it in matrix form. Adopt the natural ordering of mesh points, so that the unknown vector $x$ has components $x_1 = U_{1,1}$, $x_2 = U_{2,1}$, …, $x_m = U_{1,2}$, …. The matrix equation is $Ax = b$, where $x$ is a vector with $(m - 1)(n - 1)$ components and $A$ is a square matrix of the same order. The matrix $A$ can be partitioned into $(m - 1) \times (m - 1)$ matrices by grouping together all the equations from a single mesh row. The result is

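that $A$ has block tridiagonal form, namely

$$A = \begin{pmatrix} L_1 & N_1 & & & \\ M_2 & L_2 & N_2 & & \\ & \ddots & \ddots & \ddots & \\ & & M_{n-2} & L_{n-2} & N_{n-2} \\ & & & M_{n-1} & L_{n-1} \end{pmatrix} \tag{2.61}$$

where the $(m - 1) \times (m - 1)$ matrices $L_j$, $M_j$, $N_j$ are defined by

$$L_j = \begin{pmatrix} -\varepsilon_j & 1 & & \\ 1 & -\varepsilon_j & 1 & \\ & \ddots & \ddots & \ddots \\ & & 1 & -\varepsilon_j \end{pmatrix}, \qquad M_j = \gamma_j I_{m-1}, \qquad N_j = \delta_j I_{m-1}, \tag{2.62}$$

$I_{m-1}$ being the unit matrix of order $(m - 1) \times (m - 1)$. The array for $A$ contains $(n - 1) \times (n - 1)$ blocks, but all of them except the $L$, $M$, and $N$ are zero matrices of order $(m - 1) \times (m - 1)$. Now $L_i$, being symmetric, can be reduced to diagonal form by a similarity transformation. In fact, if $\lambda_{ij}$ is the $j$th eigenvalue of $L_i$, we have (Exercise 57 of Chapter 1)

$$\lambda_{ij} = -\varepsilon_i + 2\cos(j\pi/m).$$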

If the matrix $Q$ is such that $Q^{-1}L_iQ$ is in diagonal form, the $kj$th component of $Q$ is given by

$$Q_{kj} = (2/m)^{1/2}\sin(\pi kj/m) \tag{2.63}$$

and $Q^{-1}$ has the same elements (§1.7). It will be noted that $Q$ does not involve $\varepsilon_i$, so that only a single $Q$ is required for all the $L_i$. Also $Q^{-1}M_iQ = M_i$ and $Q^{-1}N_iQ = N_i$. Partition $x$ into $n - 1$ vectors $x_1, x_2, \dots, x_{n-1}$, each with $m - 1$ elements, and partition $b$ similarly. Put $x_j = QX_j$ and $b_j = QB_j$. Then the matrix equation for the $X_j$ will be the same as that for the $x_j$, except that each $L_i$ will be replaced by the diagonal matrix similar to it and the right-hand side will have $B_j$ instead of $b_j$. Consequently, define new vectors $\tilde X_j$ and $\tilde B_j$ so that

$$(\tilde X_j)_k = (X_k)_j, \qquad (\tilde B_j)_k = (B_k)_j,$$


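where $(X_k)_j$ signifies the $j$th element of the vector $X_k$. Then

$$A_j\tilde X_j = \tilde B_j \tag{2.64}$$

where

$$A_j = \begin{pmatrix} \lambda_{1j} & \delta_1 & & \\ \gamma_2 & \lambda_{2j} & \delta_2 & \\ & \ddots & \ddots & \ddots \\ & & \gamma_{m-1} & \lambda_{m-1,j} \end{pmatrix}.$$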

Thus three steps occur: (i) form $B_j = Q^{-1}b_j$; (ii) solve the tridiagonal system (2.64); (iii) form $x_j = QX_j$. The solution of the tridiagonal system can be carried out by any convenient algorithm. One based on Gaussian elimination is as follows: (a) put $u_1 = 1/\lambda_{1j}$ and then calculate recursively $u_r = (\lambda_{rj} - \gamma_r\delta_{r-1}u_{r-1})^{-1}$ for $r = 2, \dots, m - 1$; (b) put $v_1 = (\tilde B_j)_1$ and then form $v_r = (\tilde B_j)_r - \gamma_r u_{r-1}v_{r-1}$ for $r = 2, \dots, m - 1$; (c) set $(\tilde X_j)_{m-1} = u_{m-1}v_{m-1}$ and then calculate

$$(\tilde X_j)_{m-1-r} = u_{m-1-r}\{v_{m-1-r} - \delta_{m-1-r}(\tilde X_j)_{m-r}\}$$

recursively for $r = 1, \dots, m - 2$. On account of (2.63), steps (i) and (iii) in fact involve the summation of Fourier sine series and, in effect, recover the expansions made earlier. The advantage of the latest approach, however, is that it can be generalized in a relatively straightforward fashion: whenever $A$ has the block tridiagonal structure (2.61) one can follow the same procedure, even if $L_i$, $M_i$, and $N_i$ do not have the explicit form (2.62), provided that a single matrix $Q$ can be found which makes them all diagonal in a similarity transformation. Suppose, for example, that the Dirichlet condition on the top side is changed to a Neumann condition in which $\partial u/\partial n$ is given. The Neumann condition is implemented in the form analogous to (2.44), namely that

$$U_{i+1,n} + U_{i-1,n} + \gamma_n U_{i,n-1} - \varepsilon_n U_{i,n}$$

is given, where $\gamma_n = 2h^2/k_{n-1}^2$ and $\varepsilon_n = 2 + 2h^2/k_{n-1}^2$; this boils down to the standard interior difference operator applied at a mesh point on the boundary, with the value at the point outside equal to that at its reflection in the boundary. There are now $n(m - 1)$ unknowns, so that $A$ is now an $n \times n$ block array, but otherwise its structure is precisely the same as (2.61) and no change in $Q$ is necessary to operate the procedure. Similarly, if a Neumann condition is imposed on the right-hand side while Dirichlet conditions apply on the other three sides, there are $(n - 1)m$ unknowns. $A$ is still an $(n - 1) \times (n - 1)$ block array, though the submatrices $L_i$, $M_i$, and $N_i$ are $m \times m$; apart from that, the only change is that the 1 in the lowest row of $L_i$ becomes 2. In this case

$$\lambda_{ij} = -\varepsilon_i + 2\cos\{(j - \tfrac12)\pi/m\}$$

and, although $L_i$ is no longer symmetric, $Q$ exists with

$$Q_{kj} = (2/m)^{1/2}\sin\{k(j - \tfrac12)\pi/m\},$$

$$(Q^{-1})_{kj} = (2/m)^{1/2}\sin\{j(k - \tfrac12)\pi/m\} \quad (j \ne m), \qquad (Q^{-1})_{km} = \tfrac12(2/m)^{1/2}\sin\{(k - \tfrac12)\pi\}.$$
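Steps (i)–(iii), with the elimination (a)–(c), can be sketched in Python for the Dirichlet rectangle. Assumptions: the vertical spacing $k$ is taken constant (so $\gamma_j = \delta_j = h^2/k^2$ and $\varepsilon_j = 2 + 2h^2/k^2$), $Q$ is applied as a dense matrix product rather than by a fast sine transform, and the array layout `f[j-1, i-1]` and the names `thomas` and `fast_poisson` are illustrative.

```python
import numpy as np


def thomas(lower, diag, upper, rhs):
    """Steps (a)-(c): elimination for a tridiagonal system.

    lower[r] multiplies x[r-1] (lower[0] is unused); upper[r] multiplies x[r+1].
    """
    size = len(diag)
    u = np.empty(size)
    v = np.empty(size)
    x = np.empty(size)
    u[0] = 1.0 / diag[0]
    v[0] = rhs[0]
    for r in range(1, size):
        u[r] = 1.0 / (diag[r] - lower[r] * upper[r - 1] * u[r - 1])  # (a)
        v[r] = rhs[r] - lower[r] * u[r - 1] * v[r - 1]              # (b)
    x[-1] = u[-1] * v[-1]                                           # (c)
    for r in range(size - 2, -1, -1):
        x[r] = u[r] * (v[r] - upper[r] * x[r + 1])
    return x


def fast_poisson(f, h, k):
    """Five-point solution of u_xx + u_yy = f with u = 0 on the rectangle's boundary.

    f[j-1, i-1] is f at interior point (i, j); rows run in y, columns in x.
    """
    rows, cols = f.shape                          # n - 1 and m - 1
    m = cols + 1
    idx = np.arange(1, m)
    Q = np.sqrt(2.0 / m) * np.sin(np.pi * np.outer(idx, idx) / m)  # (2.63); Q^-1 = Q
    gamma = delta = h ** 2 / k ** 2               # 2h^2/{k(k + k)} for constant k
    eps = 2.0 + 2.0 * h ** 2 / k ** 2
    lam = -eps + 2.0 * np.cos(np.pi * idx / m)    # eigenvalues of every L_j
    B = (h ** 2 * f) @ Q                          # (i): each row becomes (Q^-1 b_j)^T
    X = np.empty_like(B)
    lower = np.full(rows, gamma)
    upper = np.full(rows, delta)
    for col in range(cols):                       # (ii): one tridiagonal system per harmonic
        X[:, col] = thomas(lower, np.full(rows, lam[col]), upper, B[:, col])
    return X @ Q                                  # (iii): x_j = Q X_j
```

With a fast sine transform in place of the two dense products, steps (i) and (iii) cost $O(mn\log m)$ operations instead of $O(m^2 n)$.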

Again, the procedure may be followed through. These are typical problems for which fast Fourier techniques are relevant, so long as we choose the denominators in the arguments of the Fourier series to be powers of 2. Let us now consider the Dirichlet problem for the rather more complicated boundary of Fig. 2.8. Continue to draw the mesh so that each side of the boundary lies along a grid line. The mesh points are placed in their natural order, but with those on the interface (shown broken) between the two rectangles omitted. Let the non-interface grid points form the vector $x_1$ of order $M$, say, and let the interface mesh points in their natural order make the vector $x_2$ of order $N$, say. Then the unknowns can be represented by the partitioned vector

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

and the difference equations may now be written as

$$\begin{pmatrix} A & B \\ B^T & D \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \tag{2.65}$$

The matrix $A$, coming from the interior points of the rectangles, is of order $M \times M$ and has the same block structure that has been described above. The matrix $B$, of order $M \times N$, represents the influence of interface points on adjacent interior points, and therefore consists mainly of zeros. The matrix $D$, of order $N \times N$, is drawn from the coefficients for the interface points in the difference equations which hold on the interface.

Fig. 2.8. Composite region for Fourier method.

The aim is to isolate $A$ so that the fast Fourier transform method may be applied to it. Assume $A$ is non-singular. Then there is a matrix $E$ and a vector $y$ such that $AE = B$, $Ay = b_1$. Consequently,

$$x_1 = y - Ex_2 \quad\text{and}\quad Cx_2 = b_2 - B^Ty \tag{2.66}$$

where $C = D - B^TE$. Therefore (2.66) is first solved for $x_2$ and then $x_1$ is determined from $Ax_1 = b_1 - Bx_2$. The success of this strategy depends upon $C$, known as the capacitance matrix, which is independent of $b_1$ and $b_2$, being solely a function of the differential equation and the geometry. Once it is known it does not need to be recalculated when repeatedly solving the same differential equation in the same region. Usually it can be computed without too great an effort or too large a demand on storage. The second of equations (2.66) can be solved by triangular decomposition of $C$ (§1.12). The calculation of $C$ and its triangular factors is often known as preprocessing.

Next, turn to the topic of what happens when part of the boundary is not rectilinear, as when a corner in the upper rectangle in Fig. 2.8 is torn off (see Fig. 2.9). The artifice now is to use the same rectangular mesh as in Fig. 2.8 and introduce, as new points to be taken into account in the difference formula, the intersections of the mesh lines with the curved boundary. Set up the usual equations for the region, employing as necessary the formula for arbitrary spacing (Exercise 11) for interior points near the curvilinear boundary. Do the same for the additional region between the arch and the broken straight lines, prescribing $U$ on the broken lines in any convenient fashion. These last equations solve a Poisson problem outside the domain of interest and are therefore strictly unnecessary. However, by adding them to the first set we are able to preserve the rectangular structure, and that outweighs the cost of solving two Poisson problems simultaneously, one of which is superfluous. Actually, the rectangular configuration has not quite been conserved because of the irregular spacing formula adopted near the curved portion of the boundary. To overcome this, some artificial variables are added. It will fix ideas if a simple example is studied. Suppose that a regular equation has the form $U_1 + U_2 = f_1$ and an irregular one the form $\alpha U_1 + \beta U_2 = f_1$. The latter can be brought into the guise of the former by inserting a new variable $U_3$ and writing

$$U_1 + U_2 = f_1 - U_3, \qquad U_3 = (\alpha - 1)U_1 + (\beta - 1)U_2.$$

Fig. 2.9. Non-rectilinear boundary.

Thus an additional equation is involved, and the artificial variable also contributes to the right-hand side of the regular equations. Hence, in our problem, sufficient artificial variables (and their attendant extra equations) are added to render all the irregular equations regular. Denote the artificial variables by the vector $x_3$. Then the system is

$$\begin{pmatrix} A & B & F \\ B^T & D & 0 \\ G & H & I \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \tag{2.67}$$

The matrix $F$ represents the addition of the artificial variables to the standard equations, whereas $G$, $H$, and $b_3$ are concerned with the extra equations required by the artificial variables placed on the right-hand side. Now define new matrices by

$$B_1 = (B \;\; F), \qquad B_2 = \begin{pmatrix} B^T \\ G \end{pmatrix}, \qquad D_1 = \begin{pmatrix} D & 0 \\ H & I \end{pmatrix}$$

and new vectors by

$$z = \begin{pmatrix} x_2 \\ x_3 \end{pmatrix}, \qquad c = \begin{pmatrix} b_2 \\ b_3 \end{pmatrix}.$$

Then (2.67) can be expressed as

$$\begin{pmatrix} A & B_1 \\ B_2 & D_1 \end{pmatrix}\begin{pmatrix} x_1 \\ z \end{pmatrix} = \begin{pmatrix} b_1 \\ c \end{pmatrix}$$

and the same path as traced for resolving (2.65) may be followed. Accordingly, solve first

$$Cz = c - B_2y \tag{2.68}$$

where now $C = D_1 - B_2E_1$, $AE_1 = B_1$ (and, as before, $Ay = b_1$), and then find $x_1$ from $Ax_1 = b_1 - B_1z$. Notice that, in fact, the artificial variables are confined to (2.68). Again, the problem has been reduced to dealing with a capacitance matrix followed by fast Fourier transform techniques. It is clear that, although attention has been concentrated on the Dirichlet problem, other boundary conditions can be handled because of the generality of the procedure. There is, however, one exceptional case, and that is when all the boundary conditions are of Neumann type, for then the matrix $A$ is singular (cf. §2.4), contrary to the assumption made in solving (2.65). Nevertheless, it turns out that by suitable modification it is still possible to devise a similar procedure. For details the reader is referred to the papers already cited.
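A Python sketch of the capacitance strategy, written for the general partitioned form used for (2.68); for (2.65) take $B_1 = B$, $B_2 = B^T$, $D_1 = D$. Assumptions: `solve_A` is a placeholder for the fast Fourier solver of the rectangular part (here any callable applying $A^{-1}$), `np.linalg.solve` on $C$ stands in for reusing precomputed triangular factors, and the random test matrices are illustrative only.

```python
import numpy as np


def preprocess(solve_A, B1, B2, D1):
    """Capacitance matrix C = D1 - B2 E1 with A E1 = B1 (done once per geometry)."""
    E1 = solve_A(B1)
    return D1 - B2 @ E1


def capacitance_solve(solve_A, B1, B2, C, b1, c):
    """Solve [[A, B1], [B2, D1]] [x1; z] = [b1; c] given the capacitance matrix C."""
    y = solve_A(b1)                          # A y = b1
    z = np.linalg.solve(C, c - B2 @ y)       # (2.66) / (2.68)
    x1 = solve_A(b1 - B1 @ z)                # A x1 = b1 - B1 z
    return x1, z


rng = np.random.default_rng(0)
M, N = 12, 3
full = rng.standard_normal((M + N, M + N)) + 10.0 * np.eye(M + N)
A, B1, B2, D1 = full[:M, :M], full[:M, M:], full[M:, :M], full[M:, M:]
solve_A = lambda rhs: np.linalg.solve(A, rhs)
C = preprocess(solve_A, B1, B2, D1)          # reused for every right-hand side
```

Once `C` is available, each new right-hand side costs only two applications of `solve_A` and one small solve with `C`.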


Eigenvalues may also be tackled in this way. If the difference equation for Poisson's equation is $Ax = b$, then the corresponding eigenvalue problem for Helmholtz's equation is $Ax = \lambda x$. Inverse iteration (§1.14) will then supply the desired answer via

$$(A - \lambda' I)x^{(r)} = x^{(r-1)} \tag{2.69}$$

where $\lambda'$ is some approximation to the sought eigenvalue and the process is initiated by some $x^{(0)}$. Considerable saving in computer time in grappling with (2.69) is achieved by omitting the intermediate fast Fourier transforms between successive iterations. For, according to (iii) following (2.64), an iteration will end by manufacturing $x_j^{(r)} = QX_j^{(r)}$, and the next stage will commence with (i), in which $X_j^{(r)} = Q^{-1}x_j^{(r)}$ is formed. Therefore the $X_j$ found in step (ii) of one iteration can be carried straight over to step (ii) of the next iteration. The relative merits of SOR and the direct method of this section are difficult to assess for general domains. Broadly speaking, it seems reasonable to expect that the more nearly a cross-section approximates to a union of rectangles, the more advantageous the direct method is likely to be. For a highly crooked boundary it is not at all obvious which will be superior.
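A sketch of (2.69) in Python, with assumptions stated: a dense solve stands in for the fast direct method (so the saving from skipping the transforms between iterations is not shown), and the test matrix is the five-point Dirichlet matrix for a square, scaled by $-h^2$, whose eigenvalues $4 - 2\cos(a\pi/(p+1)) - 2\cos(b\pi/(p+1))$ are known in closed form. The shift 1.0 and the function names are arbitrary choices.

```python
import numpy as np


def dirichlet_square(p):
    """Five-point matrix (scaled by -h^2) for the Dirichlet problem on a square
    with p interior points per side, in natural ordering."""
    T = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
    I = np.eye(p)
    return np.kron(I, T) + np.kron(T, I)


def inverse_iteration(A, shift, iterations=50, seed=0):
    """(A - shift I) x^(r) = x^(r-1) as in (2.69), normalising each iterate."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    M = A - shift * np.eye(A.shape[0])
    for _ in range(iterations):
        x = np.linalg.solve(M, x)
        x /= np.linalg.norm(x)
    return x @ A @ x, x            # Rayleigh quotient of the unit vector


lam, x = inverse_iteration(dirichlet_square(8), shift=1.0)
```

The iteration converges to the eigenvalue nearest the shift, at a rate set by the ratio of its distance from the shift to that of the next nearest eigenvalue.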

Exercises 27. For a rectangular domain with the Neumann condition on the left side and the Dirichlet condition on the right side, show that

$$\lambda_{ij} = -\varepsilon_i + 2\cos\{(j - \tfrac12)\pi/m\}$$

and that

$$Q_{kj} = (2/m)^{1/2}\sin\{\pi(m + 1 - k)(j - \tfrac12)/m\}, \qquad (Q^{-1})_{kj} = e_j(2/m)^{1/2}\sin\{\pi(m + 1 - j)(k - \tfrac12)/m\}$$

where $e_1 = \tfrac12$ and $e_j = 1$ if $j \ne 1$. If the Dirichlet condition is replaced by a Neumann condition, prove that the corresponding results are

$$\lambda_{ij} = -\varepsilon_i + 2\cos\{(j - 1)\pi/m\}, \qquad Q_{kj} = (Q^{-1})_{kj} = (2/m)^{1/2}f_j\cos\{(j - 1)(k - 1)\pi/m\}$$

where $f_1 = f_{m+1} = \tfrac12$ and $f_j = 1$ if $j \ne 1$ or $m + 1$. 28. If periodic boundary conditions are applied to the rectangle, i.e. $U$ takes the same values on the left and right sides, prove that

$$\lambda_{ij} = -\varepsilon_i + 2\cos\{2\pi(j - 1)/m\},$$

$$Q_{kj} = \begin{cases} (2/m)^{1/2}g_j\cos\{2\pi(k - 1)(j - 1)/m\} & (1 \le j \le 1 + \tfrac12 m),\\ (2/m)^{1/2}\sin\{2\pi(j - \tfrac12 m - 1)(k - 1)/m\} & (1 + \tfrac12 m < j \le m), \end{cases}$$

where $g_1 = g_{1 + m/2} = 1/\sqrt{2}$ and $g_j = 1$ if $j \ne 1$ or $1 + \tfrac12 m$. For $(Q^{-1})_{kj}$ interchange $j$ and $k$ on the right-hand side except in $g_j$. 29. Explain the detailed steps that would be necessary to undertake preprocessing for (2.67).


30. Calculate the number of operations required for preprocessing and also for the complete direct method. 31. Obtain the form of the matrix equations when the boundary condition is (2.41). 32. Find the eigenvalues of the first five TM modes in an L-shaped waveguide.

2.9 Other equations

Although the theory so far developed has been discussed in the context of two-dimensional equations there is no difficulty in principle in taking it over to problems in higher dimensions, e.g. Poisson's equation in a box. The storage and time requirements are greater but the principles are unaffected. Also there is no reason why the coefficients in the partial differential equation should not depend upon the coordinates though the direct method effectively excludes dependence on x. Nor is the theory restricted to Cartesian coordinates, though the order in which to take the nodes of the mesh to supply desired matrix properties may not be quite so transparent. For example, the Laplacian with axial symmetry

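$$\frac{\partial^2 u}{\partial z^2} + \frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r}$$

has the discrete analogue

$$\frac{1}{h^2}(U_{m+1,n} + U_{m-1,n}) + \frac{1}{k^2}\Big\{\Big(1 + \frac{k}{2r_n}\Big)U_{m,n+1} + \Big(1 - \frac{k}{2r_n}\Big)U_{m,n-1}\Big\} - \frac{2(h^2 + k^2)}{h^2k^2}U_{m,n}$$

on a rectangular mesh, $h$ and $k$ being the mesh lengths parallel to $z$ and $r$ respectively. There is nothing fresh to be said about SOR but, for the direct method, the Fourier analysis would have to be parallel to the $z$ axis. For two-dimensional polar coordinates, where the Laplacian is

$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\theta^2},$$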

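a mesh of radial lines through the origin and concentric circles can be tried. The mesh length parallel to $r$ can be selected as a constant $k$, but that in the $\theta$ direction will depend on $r$, being $r_n\theta_0$ on $r = r_n$, where $\theta_0$ is the angle between adjacent radial lines. A discrete analogue in this case is

$$\frac{1}{k^2}\Big\{\Big(1 + \frac{k}{2r_n}\Big)U_{m,n+1} + \Big(1 - \frac{k}{2r_n}\Big)U_{m,n-1}\Big\} + \frac{1}{r_n^2\theta_0^2}(U_{m+1,n} + U_{m-1,n}) - 2\Big(\frac{1}{k^2} + \frac{1}{r_n^2\theta_0^2}\Big)U_{m,n}.$$

The Fourier analysis in the direct method is now along the $\theta$ direction, with periodic boundary conditions if the whole range $0 \le \theta \le 2\pi$ is involved. When the origin lies in the domain the appropriate formula there is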

$$\frac{4}{r_1^2}\Big\{\frac{1}{s}(U_{1,1} + U_{2,1} + \dots + U_{s,1}) - U_{0,0}\Big\}$$

where $s$ is the number of grid points on the innermost mesh circle.

2.10 Conformal mapping

The convenience of rectangular boundaries for numerical methods suggests that it may be worth while transforming the curved boundary of a waveguide into a rectangle. In fact, this method was attempted quite early on (Meinke, Lange, and Ruger 1963; Meinke and Baier 1966; see also Howe 1973). Define the complex variable $\zeta$ by $\zeta = x + iy$. Then the conformal mapping $w = F(\zeta)$ will convert the cross-section in the $\zeta$ plane into another one in the $w$ plane. Further, eqn (2.26) becomes

$$\nabla_w^2\psi_m + \mu_m^2\psi_m/|F'|^2 = 0$$

where $\nabla_w^2$ is the Laplacian in the $w$ plane and $F' = dF/d\zeta$. The equation has to be solved under the condition that the normal derivative of $\psi_m$ vanishes on the transformed boundary. Further, $\phi_m$ satisfies a similar equation with the apposite boundary condition. If $F$ is such that the new boundary is rectangular, the original problem of a waveguide of given shape containing a uniform isotropic medium has been converted to one for a rectangular guide filled with a uniaxial transversely inhomogeneous medium. When an explicit formula is available for $F$, the numerical procedures delineated earlier can be considered for deriving the solution. If the initial boundary is highly curved this may be beneficial, since some of the errors associated with boundary fitting will be eliminated. If this is not so, or a closed form is not to hand, the value of conformal mapping before starting numerical work is dubious. In particular, if $F$ has to be calculated numerically it is fairly certain that the conformal mapping should be abandoned.

2.11 Waveguides containing dielectric

While the interior of a waveguide is often composed of a single homogeneous dielectric, such as air, this is not by any means always true. When, however, the cross-section consists of two or more separate media, as in Fig. 2.10, the form of the modes is changed in general. Assume that the $z$ dependence is still $e^{-i\kappa z}$. Then eqns (2.17)–(2.20) continue to hold between $C$ and $C_1$. They are, in addition, valid inside $C_1$ provided that $\mu$, $\varepsilon$, and $k$ are replaced by $\mu_1$, $\varepsilon_1$, and

$$k_1 = \omega(\mu_1\varepsilon_1)^{1/2}.$$

Fig. 2.10. A dielectric cylinder within a waveguide.

As far as TEM modes are concerned, they would require $\kappa^2 = k^2$ and $\kappa^2 = k_1^2$, which is possible only if $\mu\varepsilon = \mu_1\varepsilon_1$. In general this will not be true, and so TEM modes can be ignored in waveguides containing two or more dielectric media. If they are indeed present, then Laplace's equation is satisfied throughout with extra boundary conditions on $C_1$. Since these are particular cases of those to be dealt with later, it is not difficult to devise the relevant numerical techniques (see, for example, Seeger 1968), and so we shall not consider TEM modes further. The boundary conditions are that $E_z$ and $\partial H_z/\partial n$ are zero on $C$, while the tangential components of $\mathbf{E}$ and $\mathbf{H}$ are continuous across $C_1$. This implies four conditions on $C_1$: $E_z$, $H_z$, and the two transverse field components tangential to $C_1$ (expressed through eqns (2.17)-(2.20)) must be continuous across $C_1$.

First examine whether TE and TM modes are possible. If $E_z = 0$, then $H_z$ and $\{\kappa/(k^2-\kappa^2)\}(\partial H_z/\partial s_1)$ must be continuous across $C_1$. Since $\partial H_z/\partial s_1$ is a tangential derivative, this is possible only if $k = k_1$ or $\partial H_z/\partial s_1 = 0$. Therefore, apart from these special exceptions, no TE modes can exist. Similarly, no TM modes exist unless $k = k_1$ or $\partial E_z/\partial s_1 = 0$. Consequently, it can be safely accepted that in general the field will not split into TE and TM modes. As a result, two partial differential equations, coupled through the boundary conditions on $C_1$, have to be dealt with in each region simultaneously. The eigenvalue, effectively $\kappa$, appears in both partial differential equations as well as in the boundary conditions on $C_1$. To ensure that the unknowns have the same dimensions, it is convenient to put $\psi = H_z$ and $\phi = (\varepsilon_0/\mu_0)^{1/2}E_z$. If a square mesh is employed, then in one region both $\psi$ and $\phi$ satisfy

$$\psi_{m,n+1} + \psi_{m,n-1} + \psi_{m+1,n} + \psi_{m-1,n} - 4\psi_{m,n} + \nu\psi_{m,n} = 0$$

Fig. 2.11. Nodal points for a horizontal dielectric interface (neighbours N, E, S, W of the interface point P; medium $\mu, \varepsilon$ above, $\mu_1, \varepsilon_1$ below).

where $\nu = k^2 - \kappa^2$. In the second region they satisfy the same equation with $\nu$ replaced by $\nu/r$, where $r = (k^2-\kappa^2)/(k_1^2-\kappa^2)$. For a point on the dielectric interface, consider for simplicity the horizontal boundary shown in Fig. 2.11. Then $\psi(P) = \psi^{(1)}(P)$ and $\phi(P) = \phi^{(1)}(P)$, where the superscript (1) indicates values appropriate to the lower region. Also

$$(1+\mu' r)\{\psi(E)+\psi(W)\} + 2\mu' r\,\psi^{(1)}(S) + 2\psi(N) - \kappa'(r-1)\{\phi(E)-\phi(W)\} - 4(1+\mu' r)\psi(P) + \nu(1+\mu')\psi(P) = 0,$$

$$(1+\varepsilon' r)\{\phi(E)+\phi(W)\} + 2\varepsilon' r\,\phi^{(1)}(S) + 2\phi(N) - \kappa''(1-r)\{\psi(E)-\psi(W)\} - 4(1+\varepsilon' r)\phi(P) + \nu(1+\varepsilon')\phi(P) = 0,$$

where $\kappa' = \kappa\sqrt{\mu_0}/(\omega\mu\sqrt{\varepsilon_0})$, $\kappa'' = \kappa\sqrt{\varepsilon_0}/(\omega\varepsilon\sqrt{\mu_0}) = \mu\varepsilon_0\kappa'/(\mu_0\varepsilon)$, $\mu' = \mu_1/\mu$ and $\varepsilon' = \varepsilon_1/\varepsilon$. Let the unknowns be ordered so that all $\phi$ except interface values are taken first, then all $\psi$ apart from interface values, and finally the interface values are added. The matrix equation then takes the form (2.70) when the dielectric has rectangular sides parallel to the walls of a rectangular guide. Excellent methods based on Sturm sequences (§1.14) are known for equations of the type (2.70) (Peters and Wilkinson 1969). They are not necessary when $D_2$ is diagonal, however. In that case we put $\mathbf{y} = D_2^{1/2}\mathbf{x}$, where $D_2^{1/2}$ is the square root of $D_2$ (Exercise 46, §1.10), and (2.70) goes over to $D_2^{-1/2}AD_2^{-1/2}\mathbf{y} = \nu\mathbf{y}$. Further multiplication by diagonal matrices may be executed if modification to elements stemming from Neumann conditions is desired. The equation has now been cast into a shape in which the application of previous procedures can be considered.
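The reduction with diagonal $D_2$ can be tried on a small example. The sketch below makes two assumptions. First, it takes (2.70) to have the generalized form $A\mathbf{x} = \nu D_2\mathbf{x}$ with $A$ symmetric, which is what the substitution $\mathbf{y} = D_2^{1/2}\mathbf{x}$ implies. Second, its matrices are random stand-ins (hypothetical data), not the finite-difference matrices of an actual guide. The helper name `eig_diag_generalized` is illustrative only.

```python
import numpy as np

def eig_diag_generalized(A, d):
    """Solve A x = nu * diag(d) x for symmetric A and positive d.

    The substitution y = D^{1/2} x turns the problem into the standard
    symmetric eigenproblem D^{-1/2} A D^{-1/2} y = nu y.
    """
    s = 1.0 / np.sqrt(d)                  # diagonal of D^{-1/2}
    B = (s[:, None] * A) * s[None, :]     # D^{-1/2} A D^{-1/2}, still symmetric
    nu, Y = np.linalg.eigh(B)             # real eigenvalues, orthonormal y
    X = s[:, None] * Y                    # back-transform: x = D^{-1/2} y
    return nu, X

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T                               # symmetric stand-in for A
d = rng.uniform(0.5, 2.0, size=n)         # positive diagonal of D_2

nu, X = eig_diag_generalized(A, d)
residual = A @ X - (d[:, None] * X) * nu[None, :]   # A x - nu D x, column by column
print(f"max residual: {np.max(np.abs(residual)):.1e}")
```

Because $B$ is symmetric, a standard symmetric solver can be used. The returned eigenvectors then satisfy $X^{\mathsf T}D_2X = I$, that is, they are orthonormal with respect to $D_2$.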

Exercises

33. A rectangular guide of sides $a$ and $b$ ($a > b$) has permittivity $\varepsilon_1$ for $0 \le y \le d$ and permittivity $\varepsilon$ for $d \le y \le b$. If $\mu = \mu_1$, check whether the matrices above have Young's Property A.

34. A circular guide of radius $a$ contains a concentric circular cylinder of dielectric of radius $b$. Find the first four modes.

Fig. 2.12. Microstrip transmission lines, panels (a) to (d).

2.12 Microstrip transmission lines

The name microstrip is an abbreviation describing a microwave circuit fabricated by printed-circuit techniques. The microstrip transmission line usually consists of a conducting ribbon or ribbons mounted on a dielectric substrate, which is often backed by a large conducting plane. Some typical examples are shown in Fig. 2.12. Note that Figs. 2.12(a) and (b) are essentially equivalent. When the line is totally enclosed, as in Figs. 2.12(c) and (d), it is often referred to as shielded microstrip. Such lines are simple to manufacture, which carries obvious economic merit as well as advantages in size and reliability. Designing them so that reflections, loss, and spurious coupling are kept to a tolerable level is, however, far from trivial. Even when the frequency is low enough (say, at the bottom end of the gigahertz range) for the propagation to be regarded as near enough to TEM for a quasistatic approximation to be acceptable, theoretical analysis is difficult. In spite of that, the first attempts were in this direction (Wheeler 1965; Stinehelfer 1968). Nevertheless, it was not long before the problems were tackled by means of difference equations, and there are now several papers on the subject (Green 1965; Bryant and Weiss 1968; Whiting 1968; Silvester 1968; Cermak and Silvester 1968; Gelder 1970; Hornsby and Gopinath 1969; Corr and Davies 1972). From our point of view, the problems for shielded microstrip lines are natural extensions of those of the preceding section. There are now extra boundary conditions to be satisfied on the conducting strip(s), where $\phi$ must vanish. If these conditions are added at the end of the other equations, the structure of (2.70) is unaltered, so that the numerical methods applicable there are also available here. It has been suggested (Mittra and Itoh 1971) that it is profitable to tackle microstrip lines by a hybrid approach, in which a certain amount of analytical preprocessing is carried out before any numerical work is tried. One would expect this to pay off when a good portion of the problem can be resolved exactly by analysis and some numerical difficulties are thereby removed.

Fig. 2.13. Quasistatic problem for the microstrip of Fig. 2.12(a): $\phi = 1$ on the strip, $\phi = 0$ on the ground plane and the upper shield, $\partial\phi/\partial x = 0$ on the centre line $x = 0$.

As an illustration, consider the problem of Fig. 2.12(a) with a shielding conductor above, operating at a frequency where the quasistatic approximation is valid. Confining attention to fields which are symmetric about the centre line, we have to find a solution of Laplace's equation which complies with the boundary conditions depicted in Fig. 2.13. The following Fourier series expansions for $\phi$ may then be assumed:

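$$\phi = \frac{y}{d} + \sum_{n=1}^{\infty}a_n\sin(\beta_ny)\cosh(\beta_nx) \qquad (0<x<l,\ 0<y<d)$$

$$\phantom{\phi} = \frac{b-y}{t} + \sum_{n=1}^{\infty}b_n\sin\{\gamma_n(b-y)\}\cosh(\gamma_nx) \qquad (0<x<l,\ d<y<b)$$

$$\phantom{\phi} = \sum_{n=1}^{\infty}c_n\sin(\alpha_ny)\exp\{-\alpha_n(x-l)\} \qquad (x>l,\ 0<y<d)$$

$$\phantom{\phi} = \sum_{n=1}^{\infty}c_n\frac{\sin\alpha_nd}{\sin\alpha_nt}\sin\{\alpha_n(b-y)\}\exp\{-\alpha_n(x-l)\} \qquad (x>l,\ d<y<b)$$

Here $t = b - d$, and the $\alpha_n$ are the positive solutions of

$$\varepsilon_1\cos(\alpha_nd)\sin(\alpha_nt) + \varepsilon\sin(\alpha_nd)\cos(\alpha_nt) = 0$$

or, equivalently,

$$(\varepsilon_1+\varepsilon)\sin\alpha_nb + (\varepsilon-\varepsilon_1)\sin\alpha_n(d-t) = 0. \qquad (2.71)$$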
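The $\alpha_n$ of (2.71) have to be found numerically before the matching equations can be used. A minimal sketch follows, using a sign-change scan followed by bisection (a generic root finder chosen here, not a method prescribed in the text). The geometry $d = 0.3$, $t = 0.7$ and the relative permittivities $\varepsilon_1 = 9.6$, $\varepsilon = 1$ are hypothetical values. The code also checks numerically that the two forms of (2.71) agree, the second being twice the first because $b = d + t$.

```python
import math

def dispersion(alpha, d, t, eps1, eps):
    """First form of (2.71): eps1 cos(alpha d) sin(alpha t) + eps sin(alpha d) cos(alpha t)."""
    return (eps1 * math.cos(alpha * d) * math.sin(alpha * t)
            + eps * math.sin(alpha * d) * math.cos(alpha * t))

def dispersion_alt(alpha, d, t, eps1, eps):
    """Second form of (2.71); equal to twice the first form because b = d + t."""
    b = d + t
    return ((eps1 + eps) * math.sin(alpha * b)
            + (eps - eps1) * math.sin(alpha * (d - t)))

def positive_roots(f, count, step=1e-3, tol=1e-12):
    """First `count` positive roots of f: scan for sign changes, then bisect."""
    roots = []
    a, fa = step, f(step)
    while len(roots) < count:
        c = a + step
        fc = f(c)
        if fa == 0.0:
            roots.append(a)
        elif fa * fc < 0.0:
            lo, hi, flo = a, c, fa
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                fm = f(mid)
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            roots.append(0.5 * (lo + hi))
        a, fa = c, fc
    return roots

# Hypothetical geometry and permittivities (b = d + t = 1).
d, t = 0.3, 0.7
eps1, eps = 9.6, 1.0
alphas = positive_roots(lambda a: dispersion(a, d, t, eps1, eps), 5)
for a in alphas:
    print(f"alpha = {a:.8f}, second form = {dispersion_alt(a, d, t, eps1, eps):.1e}")
```

The step size must be smaller than the spacing of neighbouring roots, which is roughly $\pi/b$; otherwise a pair of roots inside one step would be missed.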

It will be observed that these expressions automatically satisfy the boundary conditions on $y = 0, d, b$ and $x = 0$, and behave properly as $x \to \infty$. The continuity of $\phi$ and $\partial\phi/\partial x$ on $x = l$ has still to be guaranteed. This can be achieved by requiring that

$$\int_0^d\phi\sin\beta_ny\,dy, \qquad \int_d^b\phi\sin\gamma_n(b-y)\,dy,$$

and the corresponding quantities with $\partial\phi/\partial x$ in place of $\phi$, are the same as $x$ approaches $l$ from above and below, $\beta_n$ being $n\pi/d$ and $\gamma_n$ being $n\pi/t$. Hence

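$$-\frac{1}{\beta_n} + \tfrac{1}{2}d(-1)^na_n\cosh(\beta_nl) = \sum_{m=1}^{\infty}\frac{C_m\beta_n}{\alpha_m^2-\beta_n^2}, \qquad (2.72)$$

$$\tfrac{1}{2}d(-1)^na_n\sinh(\beta_nl) = -\sum_{m=1}^{\infty}\frac{C_m\alpha_m}{\alpha_m^2-\beta_n^2}, \qquad (2.73)$$

$$-\frac{1}{\gamma_n} + \tfrac{1}{2}t(-1)^nb_n\cosh(\gamma_nl) = \sum_{m=1}^{\infty}\frac{C_m\gamma_n}{\alpha_m^2-\gamma_n^2}, \qquad (2.74)$$

$$\tfrac{1}{2}t(-1)^nb_n\sinh(\gamma_nl) = -\sum_{m=1}^{\infty}\frac{C_m\alpha_m}{\alpha_m^2-\gamma_n^2}, \qquad (2.75)$$

where $C_m = c_m\sin\alpha_md$. Eliminating $a_n$ from (2.72) and (2.73), and $b_n$ from (2.74) and (2.75), we obtain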

$$\frac{\xi_n-1}{\beta_n} = \sum_{m=1}^{\infty}C_m\left(\frac{1}{\alpha_m-\beta_n}+\frac{\xi_n}{\alpha_m+\beta_n}\right), \qquad (2.76)$$

$$\frac{\eta_n-1}{\gamma_n} = \sum_{m=1}^{\infty}C_m\left(\frac{1}{\alpha_m-\gamma_n}+\frac{\eta_n}{\alpha_m+\gamma_n}\right), \qquad (2.77)$$

where $\xi_n = \exp(-2\beta_nl)$ and $\eta_n = \exp(-2\gamma_nl)$. To tackle (2.76) and (2.77), let $z$ be a complex variable and consider the meromorphic function $f(z)$ defined by

$$f(z) = Kg(z)h(z), \qquad (2.78)$$

where

$$g(z) = \frac{1}{z}\exp\left[\frac{z}{\pi}\{d\ln(b/d)+t\ln(b/t)\}\right]\prod_{m=1}^{\infty}\frac{(1-z/\beta_m)e^{zd/m\pi}\,(1-z/\gamma_m)e^{zt/m\pi}}{(1-z/\alpha_m)e^{zb/m\pi}},$$

$$h(z) = 1+\sum_{m=1}^{\infty}\left(\frac{A_m}{1-z/\beta_m}+\frac{B_m}{1-z/\gamma_m}\right),$$

and $K$, $A_m$, and $B_m$ are constants to be determined. As $|z|\to\infty$, the dependence of $g$ on $z$ can be derived by supposing $m$ to be large, so that $\alpha_m$ can be approximated by $m\pi/b$. Then the formula

$$\frac{1}{z!} = e^{\gamma z}\prod_{m=1}^{\infty}\left\{(1+z/m)e^{-z/m}\right\}, \qquad (2.79)$$

where $\gamma$ is Euler's constant, implies that the asymptotic behaviour of the infinite product in $g$ is

$$\frac{(-zb/\pi)!}{(-zd/\pi)!\,(-zt/\pi)!}$$

apart from a multiplicative constant. Stirling's formula

$$z! \sim (2\pi)^{1/2}\exp\{(z+\tfrac{1}{2})\ln z - z\} \qquad (2.80)$$

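then supplies

$$|z|^{-1/2}\exp\left\{\frac{z}{\pi}(d\ln d + t\ln t - b\ln b)\right\}$$

unless $z$ is positive real. Since $d + t = b$, this exponential is cancelled by the exponential factor in $g$, and the factor $1/z$ contributes a further $|z|^{-1}$. Consequently $g(z) \sim |z|^{-3/2}$ as $|z|\to\infty$, except possibly on the real axis. It follows that $f$ has the same asymptotic behaviour provided that $h$ is bounded away from its poles. Also, if $C$ is a closed contour enclosing $z = \pm\beta_n$ and the origin,

$$\frac{1}{2\pi i}\oint_C\left(\frac{1}{z-\beta_n}+\frac{\xi_n}{z+\beta_n}\right)f(z)\,dz = \sum_{m}R(\alpha_m)\left(\frac{1}{\alpha_m-\beta_n}+\frac{\xi_n}{\alpha_m+\beta_n}\right) + R(0)\frac{\xi_n-1}{\beta_n} + f(\beta_n) + \xi_nf(-\beta_n), \qquad (2.81)$$

where $R(z_0)$ is the residue of $f(z)$ at $z = z_0$ and the upper limit of summation is governed by the number of poles $\alpha_m$ within $C$. If $C$ moves off to infinity so as to embrace the whole plane, the left-hand side tends to zero. This follows from the behaviour of $f$ at infinity: the integrand is $O(|z|^{-5/2})$ on $C$, while the length of $C$ grows only like $|z|$. Therefore, if the constants at our disposal are chosen to make

$$R(0) = -1, \qquad f(\beta_n) + \xi_nf(-\beta_n) = 0, \qquad (2.82)$$

we recover (2.76) and can find $C_m$ from $C_m = R(\alpha_m)$. Similarly, (2.77) is coped with if

$$f(\gamma_n) + \eta_nf(-\gamma_n) = 0. \qquad (2.83)$$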
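The key step above is that the product ratio behaves like $|z|^{-1/2}$ times an exponential, which the prefactor in $g$ cancels. This can be checked numerically along the negative real axis $z = -x$, where every factorial has a positive argument. The sketch below checks only the Gamma-function asymptotics (2.79)-(2.80) for hypothetical heights $d = 0.4$, $t = 0.6$. It does not use the actual $\alpha_m$. After the predicted exponential and the $-\tfrac12$ power are removed, the residual should settle to the constant $\tfrac12\ln\{b/(2dt)\}$, which follows from Stirling's formula.

```python
import math

def log_ratio(x, b, d, t):
    """ln[(bx/pi)! / ((dx/pi)! (tx/pi)!)], the product ratio at z = -x."""
    w = x / math.pi
    return math.lgamma(1 + b * w) - math.lgamma(1 + d * w) - math.lgamma(1 + t * w)

def residual(x, b, d, t):
    """Remove the predicted exponential growth and the |z|^(-1/2) factor."""
    expo = (x / math.pi) * (b * math.log(b) - d * math.log(d) - t * math.log(t))
    return log_ratio(x, b, d, t) - expo + 0.5 * math.log(x)

d, t = 0.4, 0.6                 # hypothetical substrate and gap heights
b = d + t
limit = 0.5 * math.log(b / (2 * d * t))
for x in (50.0, 500.0, 5000.0):
    print(f"x = {x:7.1f}   residual = {residual(x, b, d, t):.6f}")
print(f"predicted limit = {limit:.6f}")
```

The residual approaches the limit with an $O(1/x)$ error, which comes from the next term of Stirling's series.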

The problem has thus been converted into discovering $A_m$, $B_m$, and $K$ so that (2.82) and (2.83) are complied with. Now $\xi_n$ decreases exponentially as $n$ increases, so that $f(\beta_n)$, and hence $A_n$, is exponentially small as $n\to\infty$. Similarly, $B_n$ decays exponentially. So, as $|z|\to\infty$, $h$ is bounded, $f \sim |z|^{-3/2}$, and a self-consistent scheme has been arrived at. Therefore, for a good approximation, it should be sufficient to ignore any $A_m$ or $B_m$ after the first 10 or so and just solve the first few of (2.82) and (2.83). The advantage of this process evidently lies in the rapid convergence of the series in $A_m$ and $B_m$, as compared with those for $C_m$ in (2.76) and (2.77). Note also that replacing $\xi_n$ by 1 in (2.81) leads, via (2.73) and (2.82), to

As a consequence, the charge density on the lower surface of the strip $y = d$, $x < l$, is

…

… $= 0$, show that the resulting equations can still be solved by a pertinent $f(z)$.

39. Make a critical comparison, including numerical values, of methods for the boxed microstrip line.

40. The boundary condition on $x = 0$ in Fig. 2.13 is changed from $\partial\phi/\partial x = 0$ to $\phi = 0$, corresponding to antisymmetric modes. Find the $f(z)$ which is germane to this problem.

41. If, in Fig. 2.13, the metal strip is removed from $0 < x < l$, $y = d$, and a strip is added on $l < x$, $y = d$, a slot line is produced. Can you discover an appropriate $f(z)$ in this case?

42. In the coupled line of Fig. 2.12(d) the side walls are taken to infinity. Show that two functions $f_1(z)$ and $f_2(z)$ are needed to resolve the matching equations.

2.13 Other methods for guides

This chapter has concentrated on a group of methods for tackling the propagation of waves in guiding structures. These are by no means the only ways which have been devised for attacking these problems. The next chapter will be concerned with a variety of approaches which may be classed under the general heading of variational methods, and some of these are relevant to waveguide calculations. Before turning to these matters, however, we shall devote the next section to the basic facts about fast Fourier transform techniques.

2.14 The fast Fourier transform

There are many occasions when one is faced with calculating a Fourier series or Fourier transform of a function. Let us deal with the case of series (Cooley and Tukey 1965) first, so that we require

$$X(j) = \sum_{k=0}^{N-1}a(k)\exp(2\pi ijk/N) \qquad (2.84)$$

for $j = 0, 1, \ldots, N-1$ with given complex coefficients $a(k)$. Suppose that $N$ is a composite number, so that it has integer factors $n_1$, $n_2$ such that $N = n_1n_2$. There are integers $j_0$, $j_1$ such that $j = j_1n_1 + j_0$ with $0 \le j_0 \le n_1 - 1$, 0 …

… 0, so $\lambda$ must be real, i.e. the eigenvalues of a self-adjoint operator are real. Also, if $x_j$ is an eigenvector corresponding to the eigenvalue $\lambda_j$,

since $\lambda_k$ is real. If $\lambda_j \ne \lambda_k$, then $(x_j, x_k) = 0$, i.e. the eigenvectors of distinct eigenvalues are orthogonal. If $\lambda_j = \lambda_k$, linearly independent eigenvectors can be arranged to be orthogonal by the Schmidt process of §1.5. Thus any countable set of eigenvectors of a self-adjoint operator can be regarded as forming an orthonormal set. A linear operator is said to be compact, or completely continuous, if and only if for every infinite sequence $\{y_j\}$ of bounded elements (i.e. $\|y_j\| \le c$ for all $j$) the sequence $\{Ty_j\}$ has a convergent subsequence. A compact operator is necessarily bounded. For, if $T$ were not bounded, there would be a sequence $\{y_j\}$ with $\|y_j\| = 1$ such that $\|Ty_j\| > j$ for $j = 1, 2, \ldots$, and there could be no convergent subsequence of $\{Ty_j\}$. A compact self-adjoint linear operator will be denoted by $C$. We now wish to show that, if $C$ is on $H$ into $H$,

$$\|C\| = \sup_{\|x\|=1}|(Cx, x)|.$$

Firstly, for $\|x\| = 1$,

$$|(Cx, x)| \le \|Cx\| \le \|C\|,$$

so that $\sup|(Cx, x)| \le \|C\|$. Secondly,

$$\|Cx\|^2 = (Cx, Cx) = (C^2x, x) = (C^2x/a, ax),$$

where $a^2 = \|Cx\|$. Hence, for $\|x\| = 1$,

$$\begin{aligned}
\|Cx\|^2 &= \tfrac{1}{4}\left(C[\{ax + Cx/a\} - \{ax - Cx/a\}],\ \{ax + Cx/a\} + \{ax - Cx/a\}\right)\\
&= \tfrac{1}{4}\left[(C\{ax + Cx/a\}, ax + Cx/a) - (C\{ax - Cx/a\}, ax - Cx/a)\right]\\
&\le \tfrac{1}{4}\left\{\|ax + Cx/a\|^2 + \|ax - Cx/a\|^2\right\}\sup_{\|x\|=1}|(Cx, x)|\\
&= \tfrac{1}{2}\left\{a^2\|x\|^2 + \|Cx\|^2/a^2\right\}\sup_{\|x\|=1}|(Cx, x)|\\
&= \|Cx\|\sup_{\|x\|=1}|(Cx, x)|,
\end{aligned}$$

whence $\sup|(Cx, x)| \ge \|Cx\|$, so that $\sup|(Cx, x)| \ge \|C\|$. Combining the two inequalities, we have

$$\|C\| = \sup_{\|x\|=1}|(Cx, x)|.$$

As a consequence, there is a sequence $\{y_n\}$ with $\|y_n\| = 1$ such that $|(Cy_n, y_n)| \to \|C\|$ as $n\to\infty$. However, $|(Cy_n, y_n)| \le \|Cy_n\| \le \|C\|$, and therefore $\|Cy_n\| \to \|C\|$ also. Passing to a subsequence if necessary, suppose that the real numbers $(Cy_n, y_n)$ converge, and denote their limit by $\lambda_1$. Then $\lambda_1 = \pm\|C\|$. Also

$$\|Cy_n - \lambda_1y_n\|^2 = \|Cy_n\|^2 - 2\lambda_1(Cy_n, y_n) + \lambda_1^2 \to 0$$

as $n\to\infty$. Therefore $Cy_n - \lambda_1y_n$ converges to zero. Since $C$ is compact, the sequence $\{Cy_n\}$ contains a subsequence $\{Cy_{n_k}\}$ which converges to an element of $H$. On account of the result just proved, $\{\lambda_1y_{n_k}\}$ converges to the same element. Hence, $\lambda_1$ being non-zero (for $C \ne 0$), $\{y_{n_k}\}$ converges to an element $x$ (say) with $\|x\| = 1$. Since $C$ is bounded, it is continuous, and so $\{Cy_{n_k}\}$ converges to $Cx$. Hence $Cx = \lambda_1x$, i.e. $x$ is an eigenvector, and $\lambda_1$, with $|\lambda_1| = \|C\|$, is the corresponding eigenvalue. Accordingly, we have
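In finite dimensions every linear operator is compact, so the extremum characterization can be illustrated with a symmetric matrix. The sketch below builds a matrix with a prescribed (hypothetical) spectrum. It then maximizes $|(Cx, x)|$ by power iteration, which is one convenient way of driving the Rayleigh quotient to its extremum; the text does not prescribe it. The second call repeats the process on the subspace orthogonal to the first eigenvector, as in the deflation argument that follows Theorem 3.3. The helper name `extremal_eigenpair` is illustrative only.

```python
import numpy as np

def extremal_eigenpair(C, orth=(), iters=500, seed=0):
    """Maximize |(Cx, x)| over unit x orthogonal to the vectors in `orth`.

    Power iteration restricted to the orthogonal complement of `orth`;
    returns (lambda, x) with lambda = (Cx, x).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(C.shape[0])
    for _ in range(iters):
        for q in orth:                   # remove components along earlier eigenvectors
            x = x - (q @ x) * q
        x = C @ x
        x = x / np.linalg.norm(x)
    for q in orth:                       # final clean-up projection
        x = x - (q @ x) * q
    x = x / np.linalg.norm(x)
    return x @ C @ x, x

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))       # orthonormal eigenvectors
spectrum = np.array([3.0, -2.0, 1.5, 0.7, -0.3, 0.1])  # hypothetical eigenvalues
C = Q @ np.diag(spectrum) @ Q.T                        # self-adjoint (symmetric) C

lam1, x1 = extremal_eigenpair(C)
lam2, x2 = extremal_eigenpair(C, orth=(x1,))
print(np.linalg.norm(C, 2), lam1, lam2)   # ||C||, lambda_1, lambda_2
```

With these eigenvalues, $\|C\| = |\lambda_1| = 3$. The eigenvalue of largest modulus in the orthogonal complement is $-2$, which illustrates $|\lambda_1| \ge |\lambda_2| \ge \cdots$.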

THEOREM 3.3. For a compact self-adjoint linear operator $C$ on $H$ into $H$, the extremum problem

$$|(Cx, x)|/\|x\|^2 = \text{maximum}$$

has solutions. Each solution is an eigenvector, and the eigenvalue is equal in modulus to the maximum attained. Every such compact operator (other than the zero operator) has at least one eigenvalue different from zero.

Alternatively, $|\lambda_1| = \|C\| = \max\|Cx\|/\|x\|$. Again, if $\lambda_1 > 0$ it will be the maximum of $(Cx, x)$ under $\|x\| = 1$; correspondingly, if $\lambda_1 < 0$ it will be the minimum of $(Cx, x)$. Let $x_1$ be one of the eigenvectors determined by Theorem 3.3, and consider the same extremum problem with elements $x$ orthogonal to $x_1$. The entire argument may be repeated, except that we work in a subspace of $H$. The process may be carried out again and again. Thus the eigenvector $x_n$ is a solution of the extremum problem

$$|(Cx, x)|/\|x\|^2 = \text{maximum}$$

under $(x, x_m) = 0$ ($m = 1, 2, \ldots, n-1$), and the corresponding eigenvalue $\lambda_n$ is equal in modulus to the maximum. Obviously $|\lambda_1| \ge |\lambda_2| \ge \cdots$. When $H$ is of finite dimension, the process stops when the finite number of eigenvalues has been discovered. When $H$ is of infinite dimension, the possibility of an infinite number of eigenvectors for a single eigenvalue must be examined. Suppose that the eigenvectors form an orthonormal set $\{x^{(j)}\}$ (which we know to be permissible) and that the eigenvalue $\lambda \ne 0$. Then the sequence $\{x^{(j)}/\lambda\}$ has $\|x^{(j)}/\lambda\| \le 1/|\lambda|$. It must therefore have a subsequence such that $\{Cx^{(j)}/\lambda\}$, i.e. $\{x^{(j)}\}$, converges to an element of $H$. However, this is impossible, since $\|x^{(j)} - x^{(k)}\|^2 = 2$ for $j \ne k$. Hence, corresponding to a non-zero eigenvalue, there is only a finite number of linearly independent eigenvectors. The number of linearly independent eigenvectors for a single eigenvalue is called its multiplicity; if there is only one eigenvector, the eigenvalue is called simple. What has just been established is that every non-zero eigenvalue is of finite multiplicity. Suppose now that $\lambda_n$ does not tend to zero as $n\to\infty$. Then, if $x_n$ is the corresponding orthonormal eigenvector, the argument of the previous paragraph may be applied to the sequence $\{x_n/\lambda_n\}$ to show that a contradiction arises. Hence, in an infinite dimensional $H$, it is necessary that $\lambda_n\to 0$ as $n\to\infty$. This last result demonstrates that $C$ cannot possess a bounded inverse in an infinite dimensional $H$, for (3.10) is violated. It is not, of course, assumed that $C$ has an infinite number of non-zero eigenvalues. It may happen that $\lambda_n = 0$ for all but a finite set of integers $n$.

There is an important expansion theorem associated with compact self-adjoint operators, which we consider for infinite dimensional $H$. Let $x$ be any element of $H$ and put

$$y_n = x - \sum_{m=1}^{n}(x, x_m)x_m$$

with the eigenvectors orthonormal. Then

$$(y_n, x_m) = 0 \qquad (m = 1, \ldots, n)$$

and therefore

$$\|Cy_n\| \le |\lambda_{n+1}|\,\|y_n\|.$$

Since $(y_n, y_n) = (y_n, x)$, we have $\|y_n\| \le \|x\|$. Also $\lambda_{n+1}\to 0$ as $n\to\infty$, i.e. $Cy_n$ converges to zero, and it follows that

$$Cx = \sum_{m=1}^{\infty}\lambda_m(x, x_m)x_m.$$

Now, by Bessel's inequality (§1.5), $\sum_{m=1}^{\infty}|(x, x_m)|^2 \le \|x\|^2$, so that the series of numbers on the left is convergent. Hence

$$\Big\|\sum_{k=m}^{n}(x, x_k)x_k\Big\|^2 = \sum_{k=m}^{n}|(x, x_k)|^2$$

must tend to zero as $m$ and $n$ tend to infinity. Hence the completeness of $H$ guarantees that there is $y \in H$ such that

$$\Big\|y - \sum_{m=1}^{n}(x, x_m)x_m\Big\|$$

can be made as small as we like by choosing $n$ large enough. Because of the boundedness of $C$,

$$\Big\|Cy - \sum_{m=1}^{n}\lambda_m(x, x_m)x_m\Big\|$$

is correspondingly small. In other words,

$$Cy = \sum_{m=1}^{\infty}\lambda_m(x, x_m)x_m.$$

We conclude that $Cx = Cy$, or $x = y + z$ where $Cz = 0$. Consequently,

$$x = z + \sum_{m=1}^{\infty}(x, x_m)x_m.$$

In summary, we have

THEOREM 3.3a. The eigenvectors form an orthonormal set such that, for any $x \in H$,

$$Cx = \sum_{m=1}^{\infty}\lambda_m(x, x_m)x_m, \qquad x = z + \sum_{m=1}^{\infty}(x, x_m)x_m,$$

where $Cz = 0$. If $Cz \ne 0$ for every $z \ne 0$, the set is complete and

$$x = \sum_{m=1}^{\infty}(x, x_m)x_m.$$

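One particular consequence of Theorem 3.3a is important. Suppose that $\lambda$ is neither zero nor an eigenvalue. Then, from the first equation of Theorem 3.3a and $(Cx, x_m) = \lambda_m(x, x_m)$,

$$Cx = \sum_{m=1}^{\infty}\frac{\lambda_m}{\lambda_m-\lambda}(Cx - \lambda x, x_m)x_m,$$

which may be rewritten as

$$x = -\frac{1}{\lambda}(Cx - \lambda x) + \frac{1}{\lambda}\sum_{m=1}^{\infty}\frac{\lambda_m}{\lambda_m-\lambda}(Cx - \lambda x, x_m)x_m.$$

The definition of $\lambda$ entails the existence of $(C - \lambda)^{-1}$. Therefore, if $Cx - \lambda x = y$, we have

$$x = -\frac{y}{\lambda} + \frac{1}{\lambda}\sum_{m=1}^{\infty}\frac{\lambda_m}{\lambda_m-\lambda}(y, x_m)x_m. \qquad (3.12)$$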
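Formula (3.12) can be checked against a direct solve in a small finite-dimensional example. The symmetric matrix below is built from a hypothetical spectrum that includes a zero eigenvalue, to show that null-space components need no special treatment. Those components contribute nothing to the sum, since $\lambda_m = 0$ there, and are carried by the $-y/\lambda$ term. The value $\lambda = 0.7$ is an assumed choice that is neither zero nor an eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # columns: orthonormal eigenvectors x_m
lam_m = np.array([2.0, 1.0, 0.0, -0.5, -1.5])      # hypothetical eigenvalues (one is zero)
C = Q @ np.diag(lam_m) @ Q.T                       # symmetric (self-adjoint) C

lam = 0.7                                          # neither zero nor an eigenvalue
y = rng.standard_normal(5)

# (3.12): x = -y/lam + (1/lam) * sum_m lam_m/(lam_m - lam) * (y, x_m) x_m
coeffs = Q.T @ y                                   # inner products (y, x_m)
x_series = -y / lam + Q @ (lam_m / (lam_m - lam) * coeffs) / lam

# Direct solution of C x - lam x = y for comparison
x_direct = np.linalg.solve(C - lam * np.eye(5), y)
print(np.max(np.abs(x_series - x_direct)))
```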

Another offspring of Theorem 3.3a is that, if $(x, x_m) = a_m$,

$$\frac{(x, Cx)}{\|Cx\|^2} = \frac{\sum_{m=1}^{\infty}\lambda_m|a_m|^2}{\sum_{m=1}^{\infty}\lambda_m^2|a_m|^2}.$$

Hence, if all the eigenvalues are positive,

$$\frac{1}{\lambda_1} = \min\frac{(x, Cx)}{\|Cx\|^2} \qquad (3.13)$$

and other eigenvalues are given by minimizing the same quantity with $x$ orthogonal to the earlier eigenvectors. These theorems have far-reaching applications, but they are only available when it can be proved that an operator is both compact and self-adjoint. Since compactness implies boundedness, the discussion is limited to bounded operators. The demonstration of self-adjointness is usually the easier of the two. Often it is relatively easy to prove that $T$ is symmetric, perhaps not for the whole of $H$ but only for a dense subset $S$ of it. However, given a bounded operator $T$ on a dense subset, there is a unique bounded operator $T_0$ on $H$ such that $T_0x = Tx$ for $x \in S$ and $\|T_0\| = \|T\|$. Effectively, $T_0$ is defined by saying that $T_0x = Tx$ when $x \in S$. When $x \notin S$, we select a sequence $\{x_n\}$ from $S$ such that $\|x - x_n\| \to 0$ as $n\to\infty$ and define $T_0x$ as $\lim_{n\to\infty}Tx_n$, i.e. $\|T_0x - Tx_n\| \to 0$. By this device, if $T$ is symmetric on $S$, $T_0$ is symmetric on $H$ and hence self-adjoint. Therefore, to establish the self-adjointness of a bounded operator, it is sufficient to check that it is symmetric on a dense subspace of $H$. To deal with compactness, we must either show that the operator satisfies the original definition or use one of the two following sufficient conditions. The first is that, if for every $\epsilon > 0$ there is a compact operator $T_\epsilon$ such that

$$\|Tx - T_\epsilon x\| \le \epsilon\|x\|$$

for all $x \in H$, then $T$ is compact. The second says that, if $T$ is a bounded operator on $H$ and $y_j$, $z_k$ are two complete orthonormal sets, then $T$ is compact if

$$\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}|(Ty_j, z_k)|^2$$

is finite.
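In finite dimensions every operator is compact, so the second criterion cannot be tested there. What can be illustrated is that the double sum does not depend on which pair of complete orthonormal sets is used: it always equals the squared Frobenius (Hilbert-Schmidt) norm. For the integral operators of §3.4 this quantity is bounded by $\iint|k(s, t)|^2\,ds\,dt$. In the sketch, the matrix and the two bases are random (hypothetical) data, and the helper `random_unitary` is illustrative only.

```python
import numpy as np

def random_unitary(n, rng):
    """QR factor of a complex Gaussian matrix: its columns form an orthonormal set."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(A)
    return Q

rng = np.random.default_rng(3)
n = 7
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = random_unitary(n, rng)   # columns y_j
Z = random_unitary(n, rng)   # columns z_k

# (T y_j, z_k) = sum_i (T y_j)_i conj((z_k)_i) is entry [k, j] of Z^H T Y
double_sum = np.sum(np.abs(Z.conj().T @ T @ Y) ** 2)
frobenius_sq = np.linalg.norm(T, "fro") ** 2
print(double_sum, frobenius_sq)
```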

3.4 Integral and partial differential equations

Let us first examine an integral operator

$$Tx = \int k(s, t)x(t)\,dt$$

with inner product

$$(x, y) = \int x(t)y^*(t)\,dt.$$

Although only one variable of integration is shown explicitly, the integrals may, in fact, be over an $n$-dimensional region. The Hilbert space is composed of those functions such that $\int|x(t)|^2\,dt$ is finite. Assume that $\iint|k(s, t)|^2\,ds\,dt$ is finite. Then we already know that $T$ is self-adjoint if and only if $k(s, t) = k^*(t, s)$. As to compactness, consider the criterion given at the end of the last section, since $T$ is clearly bounded. Suppose that we have two complete orthonormal sets $y_j$, $z_k$; such sets can be constructed using, for example, polynomials or Fourier series as building blocks. Then, by eqn (1.32),

$$\sum_{k=1}^{\infty}|(Ty_j, z_k)|^2 = \|Ty_j\|^2 = \int\left|\int k(s, t)y_j(t)\,dt\right|^2ds.$$

But, by Bessel's inequality,

$$\sum_{j=1}^{\infty}\left|\int k(s, t)y_j(t)\,dt\right|^2 = \sum_{j=1}^{\infty}\left|\int k^*(s, t)y_j^*(t)\,dt\right|^2 \le \int|k(s, t)|^2\,dt,$$

by regarding the integral as the inner product of $k^*$ and $y_j$. Hence

$$\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}|(Ty_j, z_k)|^2 \le \iint|k(s, t)|^2\,dt\,ds < \infty.$$

Consequently, $T$ is compact. Notice that there is no necessity in this proof for $T$ to be self-adjoint. If $k(s, t) = k^*(t, s)$, so that $T$ is self-adjoint as well as compact, the theorems of the preceding section are available. Thus, the integral equation

$$\int k(s, t)x(t)\,dt = \lambda x(s)$$

will have square-integrable solutions only when $\lambda$ is an eigenvalue, and for each non-zero eigenvalue there will be only a finite number of linearly independent eigenvectors. According to Theorem 3.3a, if $X(t)$ is any square-integrable function,

$$\int k(s, t)X(t)\,dt = \sum_{m=1}^{\infty}\lambda_mx_m(s)\int X(t)x_m^*(t)\,dt, \qquad (3.14)$$

where $x_m$ is a typical eigenvector of the integral equation. If, in addition, the only solution of

$$\int k(s, t)z(t)\,dt = 0$$

is $z = 0$, then

$$X(s) = \sum_{m=1}^{\infty}x_m(s)\int X(t)x_m^*(t)\,dt. \qquad (3.15)$$
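The eigenvalue problem for a square-integrable symmetric kernel can be approximated by replacing the integral with a quadrature rule (the Nyström method, a standard technique not described in the text). The kernel $k(s, t) = \min(s, t) - st$ on $(0, 1)$ is chosen here as a hypothetical example, because its eigenvalues $1/(m\pi)^2$ and eigenfunctions $\sqrt{2}\sin m\pi s$ are known in closed form. Only $z = 0$ satisfies $\int k(s, t)z(t)\,dt = 0$, so the eigenfunctions of this kernel are complete and (3.15) applies. The midpoint rule with 800 nodes is an arbitrary choice.

```python
import numpy as np

n = 800
h = 1.0 / n
s = (np.arange(n) + 0.5) * h                   # midpoint nodes on (0, 1)
S, Tn = np.meshgrid(s, s, indexing="ij")
K = (np.minimum(S, Tn) - S * Tn) * h           # k(s_i, t_j) times the quadrature weight h

lam, V = np.linalg.eigh(K)                     # K is symmetric, so eigenvalues are real
lam, V = lam[::-1], V[:, ::-1]                 # largest eigenvalue first
exact = 1.0 / (np.pi * np.arange(1, 6)) ** 2   # 1/(m pi)^2 for m = 1, ..., 5
print(lam[:5])
print(exact)

# Discrete eigenvector scaled so that sum(x^2) * h = 1, compared with sqrt(2) sin(pi s)
x1 = np.abs(V[:, 0]) / np.sqrt(h)
print(np.max(np.abs(x1 - np.sqrt(2) * np.sin(np.pi * s))))
```

The discrete eigenvalues tend to zero, mirroring the result $\lambda_n \to 0$ for compact operators. The first few agree with $1/(m\pi)^2$ to within the $O(h^2)$ error of the quadrature.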

Furthermore, the solution of

$$\int k(s, t)x(t)\,dt - \lambda x(s) = y(s), \qquad (3.16)$$

where $y \in H$, is, by (3.12),

$$x(s) = -\frac{1}{\lambda}y(s) + \frac{1}{\lambda}\sum_{m=1}^{\infty}\frac{\lambda_mx_m(s)}{\lambda_m-\lambda}\int y(t)x_m^*(t)\,dt \qquad (3.17)$$

if $\lambda$ is neither zero nor an eigenvalue. There are naturally applications in which integral operators are not square integrable, e.g. potential theory in higher dimensions. Here it may be on the cards to write $k = k_1 + k_2$, where $k_1$ leads to a compact self-adjoint operator on the above $H$ whereas $k_2$ leads to a bounded self-adjoint operator (this does not force $k_2$ to be a bounded function, of course). Then $k$ gives rise to a bounded self-adjoint operator. Suppose, further, that the split can be arranged to depend on a parameter which can be chosen so that, given $\epsilon > 0$, the norm of the operator generated by $k_2$ does not exceed $\epsilon$. Then $k$ creates a compact operator by the criterion at the end of the last section. This can be done for kernels in potential theory by selecting the parameter as the radius of a sphere surrounding the singularity and taking $k_1$, $k_2$ as the kernel outside and inside the sphere respectively. Allowing the radius to tend to zero supplies the desired result. (See also §6.16.)

Next consider partial differential equations. Start with the equation

$$\nabla_n^2u + \lambda ru = 0,$$

where $\nabla_n^2$ is the Laplacian in $n$-dimensional space and $r$ is a known function. It will be assumed that the equation holds on an open simply connected point set $G$ of Euclidean space. The boundary of $G$ will be denoted by $\partial G$, and $\bar G$ will be used to denote the union of the sets $G$ and $\partial G$. A basic assumption will be that $G$ is such that the divergence (or Green's) theorem holds. There is no difficulty in extending the results to regions which can be split up into a finite number of regions of the assumed type.
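The eigenvalue problem $\nabla_n^2u + \lambda ru = 0$ with $u = 0$ on $\partial G$ can be explored with the same difference methods as Chapter 2. The sketch below rests on several assumptions: $G$ is the unit square, the five-point scheme is used, and $r = 1$ by default, so that the exact eigenvalues are $(p^2 + q^2)\pi^2$. A positive weight $r$ is handled by the same diagonal symmetrization used for (2.70). The helper name `dirichlet_eigs` and the grid size are illustrative.

```python
import numpy as np

def dirichlet_eigs(n, r=None, k=4):
    """Smallest k eigenvalues of -lap u = lam * r * u on the unit square, u = 0 on the boundary.

    Five-point differences on an n-by-n interior grid with spacing h = 1/(n + 1).
    r: positive weight sampled on the interior grid, flattened (defaults to 1).
    """
    h = 1.0 / (n + 1)
    off = -np.ones(n - 1)
    D1 = (np.diag(2.0 * np.ones(n)) + np.diag(off, 1) + np.diag(off, -1)) / h**2  # -d2/dx2
    I = np.eye(n)
    L = np.kron(D1, I) + np.kron(I, D1)                # five-point approximation to -Laplacian
    if r is None:
        r = np.ones(n * n)
    s = 1.0 / np.sqrt(r)
    B = (s[:, None] * L) * s[None, :]                  # symmetric form r^{-1/2} L r^{-1/2}
    return np.sort(np.linalg.eigvalsh(B))[:k]

ev = dirichlet_eigs(30)
print(ev)
print(np.pi**2 * np.array([2, 5, 5, 8]))               # exact values for r = 1
```

The computed values are positive and increasing, as the theory below asserts for $\lambda_1 \le \lambda_2 \le \cdots$. They fall slightly below the continuous values, by a relative amount of order $h^2$.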

The function $r$ will be taken to be real and positive on $\bar G$. In addition, $r$ and its first partial derivatives will be supposed to be continuous on $\bar G$. The Hilbert space $H$ will be chosen real and consists of those real functions $u$ such that

$$\int_G|u(t)|^2r(t)\,dt < \infty;$$

the inner product … 0 for all $u \ne 0$. Put $r^{1/2}u = v$ and …

$$\int_G k(s, t)v(t)\,dt.$$

Then the eigenvalues $\lambda_1, \lambda_2, \ldots$ of the partial differential equation are such that $0 < \lambda_1 \le \lambda_2 \le \cdots$ and

$$\frac{1}{\lambda_1} = \max\frac{-\int_G u\nabla_n^2u\,dt}{\int_G r^{-1}(\nabla_n^2u)^2\,dt} = \max\frac{(u, Tu)}{\|Tu\|^2}, \qquad (3.23)$$

the maximum being taken over functions $u$ which vanish on $\partial G$. Alternatively, we could adopt (3.13) for $K_1$, since all the eigenvalues are positive, and then

$$\lambda_1 = \min\frac{-\int_G u\nabla_n^2u\,dt}{\int_G ru^2\,dt} = \min\frac{(u, Tu)}{\|u\|^2}, \qquad (3.24)$$

the functions in the minimizing process being subject to the boundary condition $u = 0$. Because of the boundary condition, $-\int_G u\nabla_n^2u\,dt$ in both (3.23) and (3.24) can be replaced by $\int_G\operatorname{grad}^2u\,dt$. The competing functions in the minimum may then be chosen from the more extensive class of continuous functions on $\bar G$ (with boundary values zero) which have piecewise continuous first derivatives, because this class contains functions with continuous first derivatives as a dense subset. The $v_m$ form a complete orthonormal set with respect to the inner product $(\cdot,\cdot)$, because the integral equation has no non-trivial solution when $K = 0$. Now, for any $u \in H$, $(r^{1/2}u, r^{1/2}u) < \infty$, and so by Theorem 3.3a

$$r^{1/2}u = \sum_{m=1}^{\infty}(r^{1/2}u, v_m)v_m.$$

Hence

$$u = \sum_{m=1}^{\infty}(u, w_m)w_m,$$

showing that any function in $H$ can be expanded in terms of the eigenvectors of the partial differential equation. We now enquire whether a similar expansion for derivatives exists, namely

$$\operatorname{grad}u = \sum_{m=1}^{\infty}b_m\operatorname{grad}w_m,$$

assuming, of course, that $u$ possesses derivatives. Remark, firstly, that

$$\int_G\operatorname{grad}w_m\cdot\operatorname{grad}w_s\,dt = -\int_G w_m\nabla_n^2w_s\,dt = \lambda_s\langle w_m, w_s\rangle = \lambda_s\delta_{ms}$$

because the eigenvectors $w_m$ are zero on $\partial G$. This suggests

$$\lambda_mb_m = \int_G\operatorname{grad}u\cdot\operatorname{grad}w_m\,dt, \qquad (3.25)$$

there being no difficulty about division by $\lambda_m$, since it is never zero. The series then converges in $L_2$ norm if $\int_G\operatorname{grad}^2u\,dt < \infty$, and so the expansion can fail only if there is a non-zero $u$ for which all $b_m$ vanish. When this happens, (3.18) gives, provided that $\nabla_n^2u \in H$,

$$\int_G w_m\nabla_n^2u\,dt = 0,$$

since $w_m = 0$ on $\partial G$. Hence $\nabla_n^2u$ is orthogonal to all $w_m$ and must be zero on account of their being complete. Thus, for any $u$ such that $\nabla_n^2u \in H$,

$$\operatorname{grad}u = \operatorname{curl}\mathbf{A} + \sum_{m=1}^{\infty}b_m\operatorname{grad}w_m, \qquad (3.26)$$

where $\mathbf{A}$ is arbitrary and $b_m$ is given by (3.25). Thus the $\operatorname{grad}w_m$ are orthogonal (with $\int_G\operatorname{grad}^2w_m\,dt = \lambda_m$) but not complete. Impose now the extra condition that $u = 0$ on $\partial G$. Then (3.25) supplies $b_m = (u, w_m)$, so that $b_m$ can be zero for all $m$ only if $u$ vanishes identically. Accordingly, if $\int_G\operatorname{grad}^2u\,dt < \infty$ and $u = 0$ on $\partial G$, then

$$\operatorname{grad}u = \sum_{m=1}^{\infty}(u, w_m)\operatorname{grad}w_m. \qquad (3.27)$$
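A one-dimensional case makes (3.25)-(3.27) concrete. Take, as a hypothetical example, $G = (0, 1)$ and $r = 1$, with Dirichlet eigenfunctions $w_m = \sqrt{2}\sin m\pi x$ and $\lambda_m = (m\pi)^2$. For $u = x(1 - x)$, which vanishes on the boundary, $b_m = (u, w_m)$. Since the $\operatorname{grad}w_m$ are orthogonal with squared norm $\lambda_m$, (3.27) implies $\int_0^1 u'^2\,dx = \sum_m\lambda_mb_m^2$; both sides equal $1/3$. The same $u$ also gives a Rayleigh quotient of $10$ in (3.24), an upper bound for $\lambda_1 = \pi^2$. The midpoint quadrature and the truncation at $M = 199$ terms are arbitrary choices.

```python
import numpy as np

n = 20000
h = 1.0 / n
x = (np.arange(n) + 0.5) * h             # midpoint quadrature nodes on (0, 1)
u = x * (1 - x)                           # test function with u = 0 on the boundary
grad_sq = h * np.sum((1 - 2 * x) ** 2)    # integral of grad^2 u, exactly 1/3

M = 199
total = 0.0
for m in range(1, M + 1):
    w = np.sqrt(2.0) * np.sin(m * np.pi * x)   # Dirichlet eigenfunction w_m
    b_m = h * np.sum(u * w)                    # b_m = (u, w_m), as (3.25) gives when u = 0 on the boundary
    total += (m * np.pi) ** 2 * b_m ** 2       # ||b_m grad w_m||^2 = lambda_m b_m^2

print(grad_sq, total)                     # both close to 1/3
rayleigh = grad_sq / (h * np.sum(u ** 2))
print(rayleigh, np.pi ** 2)               # 10 versus pi^2, the minimum in (3.24)
```

Only odd $m$ contribute. Analytically $b_m = 4\sqrt{2}/(m\pi)^3$, so the sum is $(32/\pi^4)\sum_{m\ \mathrm{odd}}m^{-4} = 1/3$.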

When the boundary condition for the partial differential equation involves the normal derivative, the direct application of the preceding technique encounters a difficulty, because the existence of $T^{-1}$ cannot be asserted unless $\sigma > 0$. A simple way around this is to redefine $T$ by

$$Tu = -r^{-1}(\nabla_n^2u - \alpha ru),$$

where $\alpha$ is a positive constant whose value will be specified presently. The partial differential equation remains unaltered if $\lambda$ is replaced by $\lambda + \alpha$. Clearly, the domain and symmetry of $T$ are unaffected by this change, but

$$(u, Tu) = \int_G(\operatorname{grad}^2u + \alpha ru^2)\,dt + \int_{\partial G}\sigma u^2\,dS.$$

Evidently $(u, Tu) > 0$ if $\sigma \ge 0$. It will now be shown that, even if $\sigma$ is not restricted in sign, $\alpha$ can be picked to make the inner product positive. Let $|\sigma| \le \sigma_0$ on $\partial G$. Let $\mathbf{h}$ be a function on $\bar G$ such that $\mathbf{h} = \mathbf{n}$ on $\partial G$; such $\mathbf{h}$, continuous with continuous first derivatives on $\bar G$, exist if the boundary $\partial G$ is sufficiently smooth. Then

$$\left|\int_{\partial G}\sigma u^2\,dS\right| \le \sigma_0\int_{\partial G}u^2\,dS = \sigma_0\int_{\partial G}\mathbf{n}\cdot\mathbf{h}\,u^2\,dS = \sigma_0\int_G(2u\mathbf{h}\cdot\operatorname{grad}u + u^2\operatorname{div}\mathbf{h})\,dt.$$

Let $\alpha_1$, $\alpha_2$ be bounds for $|\mathbf{h}|$ and $\operatorname{div}\mathbf{h}$ on the region. Then

$$2|u\mathbf{h}\cdot\operatorname{grad}u| \le 2\alpha_1|u||\operatorname{grad}u| \le \alpha_1(\eta\operatorname{grad}^2u + u^2/\eta)$$

for every $\eta > 0$. Hence

$$\left|\int_{\partial G}\sigma u^2\,dS\right| \le \sigma_0\int_G\{\alpha_1(\eta\operatorname{grad}^2u + u^2/\eta) + 2\alpha_2u^2\}\,dt$$


and

$$(u, Tu) \ge \int_G\{(1 - \sigma_0\alpha_1\eta)\operatorname{grad}^2u + (\alpha r - 2\sigma_0\alpha_2 - \sigma_0\alpha_1/\eta)u^2\}\,dt.$$

First make $\eta$ small enough for $\sigma_0\alpha_1\eta < 1$, and then select $\alpha$ large enough for $\alpha r > 2\sigma_0\alpha_2 + \sigma_0\alpha_1/\eta$ (possible since $r$ is bounded away from zero). In this way the right-hand side is positive, and the desired result has been achieved. Having ensured the existence of the inverse of $T$ (cf. (3.10)), we can repeat our analysis. For example, corresponding to (3.23) and (3.24) we have

$$\frac{1}{\lambda_1 + \alpha} = \max\frac{-\int_G u(\nabla_n^2u - \alpha ru)\,dt}{\int_G r^{-1}(\nabla_n^2u - \alpha ru)^2\,dt}$$

and

$$\lambda_1 = \min\frac{-\int_G u\nabla_n^2u\,dt}{\int_G ru^2\,dt}.$$

The permissible $u$ in these formulae are required to satisfy the boundary condition $\partial u/\partial n + \sigma u = 0$ on $\partial G$. Therefore, the upper integral in the last formula can be converted to

$$\int_G\operatorname{grad}^2u\,dt + \int_{\partial G}\sigma u^2\,dS.$$

To enlarge the class of competing functions, let

$$\mu_1 = \min\left(\int_G\operatorname{grad}^2u\,dt + \int_{\partial G}\sigma u^2\,dS\right)\Big/\int_G ru^2\,dt$$

when no boundary condition is imposed on $u$. In particular, put $u = w_1 + \epsilon f$, where $w_1$ gives the minimum, $f$ has piecewise continuous first derivatives, and $\epsilon$ is an arbitrary constant. Then

$$\int_G\operatorname{grad}^2u\,dt + \int_{\partial G}\sigma u^2\,dS - \mu_1\int_G ru^2\,dt \ge 0$$

implies that

$$2\epsilon\left(\int_G\operatorname{grad}w_1\cdot\operatorname{grad}f\,dt + \int_{\partial G}\sigma w_1f\,dS - \mu_1\int_G rw_1f\,dt\right) + \epsilon^2\left\{\int_G\operatorname{grad}^2f\,dt + \int_{\partial G}\sigma f^2\,dS - \mu_1\int_G rf^2\,dt\right\} \ge 0.$$

By making $\epsilon$ small enough, the term in $\epsilon^2$ will be negligible in comparison with that in $\epsilon$. By an appropriate choice of the sign of $\epsilon$, the inequality is then violated unless

$$\int_G\operatorname{grad}w_1\cdot\operatorname{grad}f\,dt + \int_{\partial G}\sigma w_1f\,dS - \mu_1\int_G rw_1f\,dt = 0.$$

Applying the divergence theorem, we obtain

$$\int_{\partial G}f\left(\frac{\partial w_1}{\partial n} + \sigma w_1\right)dS - \int_G f(\nabla^2w_1 + \mu_1rw_1)\,dt = 0.$$

Since $f$ is at our disposal, we may elect to have it vanish on $\partial G$. Since such $f$ are dense in $H$, we deduce that

$$\nabla^2w_1 + \mu_1rw_1 = 0.$$

Once that has been established, $f$ may be allowed to take arbitrary values on $\partial G$, so that we must have

$$\frac{\partial w_1}{\partial n} + \sigma w_1 = 0 \qquad (3.28)$$

on $\partial G$. Thus $w_1$ must be the first eigenvector of our original problem, and $\mu_1$ can be identified with $\lambda_1$. It has therefore been demonstrated that

$$\lambda_1 = \min\left(\int_G\operatorname{grad}^2u\,dt + \int_{\partial G}\sigma u^2\,dS\right)\Big/\int_G ru^2\,dt \qquad (3.29)$$

subject only to $u$ being continuous with piecewise continuous first derivatives. Higher eigenvalues can be found by minimizing the same expression, provided that the $u$ are orthogonal to the earlier eigenvectors. Because no boundary condition is imposed on the trial functions $u$ in (3.29), the condition (3.28) is known as a natural boundary condition. The eigenvectors for the boundary condition (3.28) form a complete orthonormal set, so that the standard expansion for any $u \in H$ holds. For the derivatives there are some slight differences. Consider, firstly, the Neumann problem, so that $\sigma$ is identically zero. The orthogonality properties of the $\operatorname{grad}w_m$ are the same as those for the Dirichlet problem, and (3.25) is still true, though $b_1$ is zero because $\lambda_1 = 0$ and $w_1$ is a constant. Because $\partial w_m/\partial n$ is zero on $\partial G$, we always have $b_m = (u, w_m)$ ($m \ne 1$), so that the vanishing of the $b_m$ entails $u$ being constant. Consequently, for the Neumann eigenvectors,

a..

00

L

grad u =

(u,

m=2

\V m )

grad

\4)m

so long as grad u is square integrable. If (J is non-zero, the property of orthogonality now takes the shape

$$\int_G \operatorname{grad}\psi_m\cdot\operatorname{grad}\psi_s\,d\tau + \int_{\partial G}\sigma\psi_m\psi_s\,dS = 0 \qquad (m \ne s)$$

and the resultant formula for $b_m$ is

$$b_m = \int_G \operatorname{grad} u\cdot\operatorname{grad}\psi_m\,d\tau - \int_{\partial G}\cdots\,dS$$

$$\cdots \qquad (3.36)$$

$\cdots = 0$. Since

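$$\int_G E_n\cdot\operatorname{curl}\operatorname{curl}E_m\,d\tau = \int_G \operatorname{curl}E_n\cdot\operatorname{curl}E_m\,d\tau = k_m^2\int_G E_n\cdot E_m\,d\tau \qquad (3.37)$$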

it can be seen that $\{E_m\}$ can be arranged as an orthonormal set, i.e.

$$\int_G E_m\cdot E_n\,d\tau = \delta_{mn}. \qquad (3.38)$$

This orthonormal set is complete.



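An immediate deduction from (3.37) and (3.38) is that

$$\cdots \qquad (3.39)$$

Let $e$ be such that $\operatorname{div} e = 0$ in $G$ and $\int_G e\cdot e^*\,d\tau$ …

… $\lambda \ge \lambda_2$ or, indeed, for $\lambda_m > \lambda \ge \lambda_{m+1}$. Next, if $\lambda > \mu$,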

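the summation being over those $m$ for which $\lambda \ge \lambda_m > \mu$. Thus $E_\lambda - E_\mu$ is a positive operator for $\lambda > \mu$. Further, if $\lambda \ge \mu$,

$$E_\lambda E_\mu x = E_\lambda\sum(x, x_m)x_m = \sum(x, x_m)x_m = E_\mu x,$$

the summation being over those $m$ for which $\lambda_m \le \mu$, since

$$E_\lambda x_m = \begin{cases} x_m & (\lambda_m \le \lambda),\\ 0 & (\lambda_m > \lambda).\end{cases}$$

This shows that $E_\lambda E_\mu = E_\mu$ for $\lambda \ge \mu$. Finally, from Theorem 3.3a,

$$Tx = \sum_{m=1}^{\infty}\lambda_m(E_{\lambda_m} - E_{\lambda_m - 0})x \qquad (3.46)$$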

where $E_{\lambda - 0}$ means the limit of $E_{\lambda - \varepsilon}$ as the positive $\varepsilon$ tends to zero. These are the ideas that we wish to generalize for self-adjoint operators which are not compact. The generalization is achieved by means of the Stieltjes integral. The Stieltjes integral



is defined as the limit of

$$\sum_{k=1}^{n} f(\xi_k)\{g(p_k) - g(p_{k-1})\}$$

as $\max_k(p_k - p_{k-1}) \to 0$, where $a = p_0 < p_1 < \cdots < p_n = b$ and $p_{k-1} \le \xi_k \le p_k$. Clearly, there is no contribution to the integral from intervals on which $g$ is constant. That observation allows us to write (3.46) as

$$Tx = \int_{-\infty}^{\infty}\lambda\,dE_\lambda x \qquad (3.47)$$

with $\lambda$ as the variable of integration. Actually, the limits of integration could have been chosen finite for (3.46) because $E_\lambda x$ is constant for $\lambda > \lambda_1$ and for $\lambda < 0$, but the infinite limits are more convenient for future developments. The next step is to release $E_\lambda$ from the obligation of being constant except for discontinuous changes. Instead, $E_\lambda$ will be required to be such that $E_\lambda E_\mu = E_\mu$ for $\lambda \ge \mu$ and $E_\lambda - E_\mu$ is a positive operator for $\lambda > \mu$. These requirements are the same as the properties in the compact case, but with the necessity for constancy in $E_\lambda$ dropped. It is still possible to have a discontinuity at a value of $\lambda$ where $E_{\lambda - 0} \ne E_\lambda$; such values of $\lambda$ are eigenvalues. Points where $E_\lambda$ is continuous but not constant form the continuous spectrum. On the other hand, intervals where $E_\lambda$ is constant are not in the spectrum. For instance, let $H$ consist of functions $h(t)$ which are of integrable square for $t$ on the interval $(0, 1)$. Suppose that $Th(t) = th(t)$, so that $\|T\| = 1$. Take $E_\lambda = 0$ for $\lambda < -1$, $E_\lambda = I$ for $\lambda > 1$ and otherwise

$$E_\lambda h(t) = \begin{cases} h(t) & (\lambda \ge t),\\ 0 & (\lambda < t).\end{cases}$$

It can be verified that $E_\lambda$ has the requisite properties and that (3.47) holds. The verification will be left as an exercise (remember that $h(t)$ does not involve $\lambda$). In this case there is no point spectrum because $E_\lambda$ is not discontinuous; the spectrum is entirely continuous. The formula (3.47) can be shown to be valid for any self-adjoint operator, whether bounded or not. Other formulae which can be obtained as generalizations of the compact case are

$$(Tx, y) = \int \lambda\,d(E_\lambda x, y), \qquad (3.48)$$

$$\|Tx\|^2 = \int \lambda^2\,d\|E_\lambda x\|^2, \qquad (3.49)$$

$$f(T)x = \int f(\lambda)\,dE_\lambda x, \qquad (3.50)$$

$f(\lambda)$ being a continuous function of $\lambda$. These results suggest that it is reasonable to write

$$T = \int_{-\infty}^{\infty}\lambda\,dE_\lambda. \qquad (3.51)$$
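The example $Th(t) = th(t)$ on $(0, 1)$ can be checked numerically. The sketch below forms the Stieltjes sum that approximates (3.47), using the $E_\lambda$ given in the text. It assumes a uniform partition of $[-1, 1]$ with tags $\xi_k = p_k$, and samples $h$ at points of $(0, 1)$; these choices are ours. Each sample point $t$ picks up exactly one increment $E_{p_k}h - E_{p_{k-1}}h$, namely the one with $p_{k-1} < t \le p_k$, so the error is at most the mesh width times $\max|h|$.

```python
import numpy as np

def stieltjes_sum(h, t, n_parts=1000):
    """Approximate T h = int lambda dE_lambda h for T h(t) = t h(t) by
    sum_k p_k (E_{p_k} - E_{p_{k-1}}) h over a uniform partition of [-1, 1].

    E_lambda h(t) = h(t) for t <= lambda and 0 for t > lambda.
    h, t: samples of h and the sample points (all t in (0, 1))."""
    p = np.linspace(-1.0, 1.0, n_parts + 1)

    def E(lam):
        return np.where(t <= lam, h, 0.0)

    total = np.zeros(len(t))
    for k in range(1, n_parts + 1):
        total += p[k] * (E(p[k]) - E(p[k - 1]))
    return total

t = np.linspace(0.001, 0.999, 500)
h = np.cos(3.0 * t) + t**2               # an arbitrary square-integrable h
approx = stieltjes_sum(h, t)
err = np.max(np.abs(approx - t * h))
print(err)
```

No single increment carries a finite jump that survives refinement. This matches the statement that the spectrum of this $T$ is purely continuous.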

Exercises

14. $H$ consists of the functions which are square integrable on $(0, 1)$ and $Tx(t) = tx(t)$. Show that $T^{1/2}x(t) = t^{1/2}x(t)$.

15. For the operator $T$ of Exercise 14 verify that the $E_\lambda$ given in the text has the required properties.

16. If $H$ consists of the functions which are square integrable on $(-\infty, \infty)$ and $Tx(t) = -i(d/dt)x(t)$, then

$$-i\frac{d}{dt} = \int_{-\infty}^{\infty}\lambda\,d(UE_\lambda U^{-1})$$

where

$$E_\lambda x(t) = \begin{cases} x(t) & (t \le \lambda),\\ 0 & (t > \lambda)\end{cases}$$

and

$$Ux(t) = \frac{1}{(2\pi)^{1/2}}\int_{-\infty}^{\infty} e^{ist}x(s)\,ds.$$

3.7 Approximation theorems

It has already been indicated in Exercise 13 how the general theory can lead to techniques which are of practical value in determining the eigenvalues of an operator. Our aim in this section is to derive two theorems which are of wide applicability. We commence by establishing

LEMMA 3.7. For real $\lambda$, $b$ such that $|\lambda - b| < m$, the self-adjoint operator $T - \lambda I$ has a bounded inverse if, and only if, $\|(T - bI)x\| \ge m\|x\|$ for every $x \in H$.

Proof. It is obvious that

$$\|(T - bI)x\| \le \|(T - \lambda I)x\| + |\lambda - b|\,\|x\|.$$

Therefore, if $\|(T - bI)x\| \ge m\|x\|$, then $\|(T - \lambda I)x\| \ge (m - |\lambda - b|)\|x\|$, and the inequality $|\lambda - b| < m$ ensures that $T - \lambda I$ has a bounded inverse. Conversely, if $T - \lambda I$ has a bounded inverse for $\lambda$ satisfying $|\lambda - b| < m$, the choice $\lambda = b$ shows that $T - bI$ has a bounded inverse. Assume that there is an $x$ such that $\|(T - bI)x\| < m\|x\|$. Then $\|(T - bI)^{-1}\| > 1/m$. Denoting $\|(T - bI)^{-1}\|$ by $M$ there is, as in §3.3, a sequence $\{y_j\}$ with $\|y_j\| = 1$ such that

$$((T - bI)^{-1}y_j, y_j) \to M, \qquad \|(T - bI)^{-1}y_j\| \to M.$$



Putting $x_j = (T - bI)^{-1}y_j$ we have $\|(T - bI)x_j\| = 1$ and

$$(x_j, (T - bI)x_j) \to M, \qquad \|x_j\| \to M.$$

Therefore

$$\|(T - \lambda I)x_j\|^2 = \|(T - bI)x_j\|^2 + 2(b - \lambda)(x_j, (T - bI)x_j) + (\lambda - b)^2\|x_j\|^2 \to \{1 + (b - \lambda)M\}^2.$$

The choice $\lambda - b = 1/M$ gives $|\lambda - b| < m$ and makes the right-hand side zero. This is impossible since $T - \lambda I$ has a bounded inverse for such a $\lambda$: a bounded inverse would force $\|x_j\| \to 0$, whereas $\|x_j\| \to M > 0$. On account of the contradiction the initial assumption must be false. Consequently, it must be true that $\|(T - bI)x\| \ge m\|x\|$ for all $x$. The Lemma is proved. A useful corollary is

LEMMA 3.7a. If no point of the spectrum of $T$ lies in $(a, c)$ then

$$((T - aI)(T - cI)x, x) \ge 0.$$

Proof. Take $b = \tfrac12(a + c)$ and $m = \tfrac12(c - a)$. Then the absence of the spectrum from $(a, c)$ means that $T - \lambda I$ has a bounded inverse for $|\lambda - b| < m$. Therefore, Lemma 3.7 enforces

$$\|Tx - \tfrac12(a + c)x\|^2 \ge \tfrac14(c - a)^2\|x\|^2,$$

which, on expanding the left-hand side and using $b^2 - m^2 = ac$, is another way of writing the inequality in the Lemma. It is now possible to demonstrate the following theorem.

THEOREM 3.7 (TEMPLE-KATO). Let $T$ be self-adjoint and $x$ a non-zero element of $H$. Suppose that $b > (x, Tx)/\|x\|^2 > a$ and that there is no point of the spectrum of $T$ in $(a, b)$ other than the isolated eigenvalue $\lambda_0$. Then

$$\rho - \frac{\sigma^2 - \rho^2}{b - \rho} \le \lambda_0 \le \rho + \frac{\sigma^2 - \rho^2}{\rho - a}$$

where $\rho = (x, Tx)/\|x\|^2$ and $\sigma = \|Tx\|/\|x\|$.

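Proof. If $\varepsilon$ is sufficiently small there is no point of the spectrum in $(a, \lambda_0 - \varepsilon)$ and so, by Lemma 3.7a, $((T - aI)(T - (\lambda_0 - \varepsilon)I)x, x) \ge 0$, whence $(\lambda_0 - \varepsilon)(\rho - a) \le \sigma^2 - a\rho$. By hypothesis, $\rho > a$, and since $\sigma^2 - a\rho = \sigma^2 - \rho^2 + \rho(\rho - a)$ the upper bound on $\lambda_0$ follows by letting $\varepsilon \to 0$. A similar procedure for the interval $(\lambda_0 + \varepsilon, b)$ gives the lower bound and the proof is complete.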


As an illustration of the utility of the theorem suppose that $\lambda_0$ is the lowest eigenvalue of $T$ and $\lambda_1$ the next highest. Assume that it is known that $\lambda_1 \ge c$ and that $\lambda_0$ is the only point of the spectrum below $\lambda_1$. Then choose $a = -\infty$, $b = c$ in Theorem 3.7. For any $\rho < c$, bounds on $\lambda_0$ are provided by

$$\rho - \frac{\sigma^2 - \rho^2}{c - \rho} \le \lambda_0 \le \rho.$$

The main difficulty in this method is obtaining an estimate of $c$, though this can be done by any means which supplies one-sided bounds. Theorem 3.7 caters for the case when the existence of an isolated eigenvalue is known and bounds on its location are required. The next theorem specifies an interval which is guaranteed to contain an eigenvalue.

THEOREM 3.7a (KRYLOV-WEINSTEIN). If $T$ is self-adjoint then there is at least one eigenvalue $\lambda_0$ such that

$$\rho - (\sigma^2 - \rho^2)^{1/2} \le \lambda_0 \le \rho + (\sigma^2 - \rho^2)^{1/2}$$

where $\sigma$ and $\rho$ are defined in Theorem 3.7.

Proof. The proof will be given for compact $T$; the proof for general $T$ follows similar lines but uses the representation (3.47). When $T$ is compact we can write $x = \sum a_m x_m$ in terms of the orthonormal set $\{x_m\}$. Hence

$$\sigma^2 - \rho^2 = \frac{\|(T - \rho I)x\|^2}{\|x\|^2} = \frac{\sum_m(\lambda_m - \rho)^2 a_m^2}{\sum_m a_m^2} \ge \min_m(\lambda_m - \rho)^2 = (\lambda_0 - \rho)^2$$

for some $\lambda_0$. The result stated in the theorem is an immediate conclusion.
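Both bounds are easy to evaluate for the matrix of Exercise 17, whose leading entry is read here as 3.4. For that matrix one can check that $\det(A - I) = 0$, so the lowest eigenvalue is exactly 1, and the next one is about 3.70, above the bound $c = 3.6$ quoted in the exercise. The sketch below uses the trial vector $x = (1, 1, 1)$, which is our choice. It gives $\rho = 3.4/3 \approx 1.133$, consistent with the exercise's upper bound of 1.14. The Temple-Kato lower bound it produces, about 0.85, is weaker than the exercise's 0.96, so a better trial vector is needed for that figure.

```python
import numpy as np

# Matrix of Exercise 17 (leading entry read as 3.4)
A = np.array([[3.4, -2.0, 0.0],
              [-2.0, 4.0, -2.0],
              [0.0, -2.0, 4.0]])

def rho_sigma(T, x):
    """rho = (x, Tx)/||x||^2 and sigma = ||Tx||/||x||, as in Theorem 3.7."""
    Tx = T @ x
    xx = x @ x
    return (x @ Tx) / xx, np.sqrt((Tx @ Tx) / xx)

def temple_kato_lowest(T, x, c):
    """Bounds on the lowest eigenvalue when the next one is known to be >= c
    (a = -infinity, b = c in Theorem 3.7). Requires rho < c."""
    rho, sigma = rho_sigma(T, x)
    if not rho < c:
        raise ValueError("Temple-Kato bound needs rho < c")
    return rho - (sigma**2 - rho**2) / (c - rho), rho

def krylov_weinstein(T, x):
    """Interval rho -/+ (sigma^2 - rho^2)^(1/2) containing an eigenvalue (Theorem 3.7a)."""
    rho, sigma = rho_sigma(T, x)
    r = np.sqrt(max(sigma**2 - rho**2, 0.0))
    return rho - r, rho + r

x = np.ones(3)                          # trial vector (our choice)
tk = temple_kato_lowest(A, x, c=3.6)    # c: lower bound for the second eigenvalue
kw = krylov_weinstein(A, x)
eigs = np.linalg.eigvalsh(A)            # exact values, for comparison only
print(tk, kw, eigs)
```

The Krylov-Weinstein interval, roughly $(0.30, 1.97)$ for this vector, is much wider. It needs no information about $c$, which is the trade-off the text describes.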

Exercises

17. For the matrix

$$\begin{pmatrix} 3.4 & -2 & 0\\ -2 & 4 & -2\\ 0 & -2 & 4\end{pmatrix}$$

use $\rho$ to show that an upper bound for the lowest eigenvalue is 1.14. Show that a lower bound for the next eigenvalue is 3.6 and deduce that a lower bound for the first eigenvalue is 0.96. Will the Krylov-Weinstein theorem help?

18. If $C$ is a compact self-adjoint operator, prove that

$$\max\,(x, Cx)/(x, x) \ge \lambda_r,$$

the maximum being taken over those $x$ for which $(x, y_j) = 0$ for $j = 1, \ldots, r - 1$, the $y_j$ being some fixed elements of $H$. Deduce that

$$\lambda_r = \min\{\max\,(x, Cx)/(x, x)\},$$

the minimum being taken over all possible $y_j$.

19. The matrix $A_1$ is formed from the Hermitian matrix $A$ by deleting the $r$th row and the $r$th column. By means of Exercise 18, prove that the eigenvalues of $A_1$ separate those of $A$. This is the basis of a method of estimating the positions of eigenvalues from those of a matrix of lower order.
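The separation property of Exercise 19 can be checked numerically before it is proved. The sketch below uses a random Hermitian matrix of our own choosing. It deletes each row-and-column pair in turn and checks that, with eigenvalues in ascending order, $\lambda_k(A) \le \lambda_k(A_1) \le \lambda_{k+1}(A)$.

```python
import numpy as np

def delete_row_col(A, r):
    """A_1 of Exercise 19: A with its r-th row and column removed (0-based r)."""
    keep = [i for i in range(A.shape[0]) if i != r]
    return A[np.ix_(keep, keep)]

def separates(A, r, tol=1e-10):
    """True if lam_k(A) <= lam_k(A_1) <= lam_{k+1}(A), eigenvalues ascending."""
    lam = np.linalg.eigvalsh(A)
    mu = np.linalg.eigvalsh(delete_row_col(A, r))
    return bool(np.all(lam[:-1] <= mu + tol) and np.all(mu <= lam[1:] + tol))

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = 0.5 * (B + B.conj().T)              # a random Hermitian test matrix
print([separates(H, r) for r in range(6)])
```

Used in reverse, the property brackets each eigenvalue of $A$ between consecutive eigenvalues of the smaller matrix $A_1$.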

3.8 Point matching

Consider the problem of finding the eigenvectors of

$$\sum_{j=1}^{n}\frac{\partial}{\partial x_j}\Big(p_j\frac{\partial u}{\partial x_j}\Big) + \lambda r u = 0 \qquad (3.52)$$

in a domain $G$ with a Dirichlet boundary condition on $\partial G$. While it may be possible to construct solutions of (3.52) (by separation of variables, for example), often they will not comply with the specified boundary condition. An expansion of $u$ in terms of such functions then cannot be employed in some of the preceding methods, because the trial functions must comply with the boundary condition. However, one might ask that the coefficients in the expansion be determined by imposing the boundary condition at a sufficient number of points. This is known as point matching, and the question arises as to whether it is likely to generate accurate answers. Suppose, for example, that when $\lambda = \lambda'$ the solution $u = u'$ of (3.52) can be found. If $u'$ vanishes on $\partial G$ then an eigenvalue and eigenvector have been found. If not, write

$$\int_G u'^2 r\,d\tau = \|u'\|^2.$$

Then the following assertion can be made (Fox, Henrici, and Moler 1967).

THEOREM 3.8. If $\eta = \max_{s\in\partial G}|u'(s)|\,\big(\int_G r\,d\tau\big)^{1/2}/\|u'\|$ and $\eta < 1$, there is an eigenvalue $\lambda$ such that …

Proof. Let

$$Tu = -\frac{1}{r}\sum_{j=1}^{n}\frac{\partial}{\partial x_j}\Big(p_j\frac{\partial u}{\partial x_j}\Big)$$

and define …

… 1 (see Exercise 8). It is possible to apply Theorem 4.2 to each component $f_j$ of $f$, but then the $t_j$ need not all be the same, so that the component formulae cannot be combined into a single equation of the type in Theorem 4.2. It is, however, possible to obtain an upper bound of wide applicability.

THEOREM 4.2a. Let $X$ and $Y$ be normed linear spaces. Let $T$ be an operator from $X$ to $Y$ which possesses a Fréchet derivative at each point of a convex set $D_0$ in $D(T)$. If, for given $x, y \in D_0$,

$$\sup_{0\le t\le 1}\|T'(x + t(y - x))\| \le M,$$

then

$$\|T(y) - T(x)\| \le M\|y - x\|.$$

Proof. For given $\varepsilon > 0$, let $B$ be the set of $t \in [0, 1]$ for which

$$\|T(x + t(y - x)) - T(x)\| \le Mt\|y - x\| + \varepsilon t\|y - x\|. \qquad (4.5)$$

Obviously, $0 \in B$, so that $\sup B$ is well defined. Because $T$ has a Fréchet derivative, $T(x + t(y - x))$ is continuous in $t$ and so

$$\|T(x + t_0(y - x)) - T(x)\| \le (M + \varepsilon)t_0\|y - x\| \qquad (4.6)$$

where $t_0 = \sup B$. Evidently, since $\varepsilon$ is arbitrary, the theorem will be proved if $t_0 = 1$. Suppose that $t_0 < 1$. Then the existence of $T'$ implies that there is $t_1$ with $t_0 < t_1 < 1$ such that

$$\|T(x + t_1(y - x)) - T(x + t_0(y - x)) - T'(x + t_0(y - x))\cdot(t_1 - t_0)(y - x)\| \le \varepsilon(t_1 - t_0)\|y - x\|,$$

whence

$$\|T(x + t_1(y - x)) - T(x + t_0(y - x))\| \le (M + \varepsilon)(t_1 - t_0)\|y - x\|.$$

It follows from (4.6) that $\|T(x + t_1(y - x)) - T(x)\| \le (M + \varepsilon)t_1\|y - x\|$. From (4.5) this implies that $t_1 \in B$, contradicting the definition of $t_0$. Hence $t_0 = 1$ and the proof is complete.


A useful extension is

COROLLARY 4.2a. Under the conditions of Theorem 4.2a, if there is a bounded linear operator $S$ such that

$$\|T'(x + t(y - x)) - S\| \le M_1$$

for all $t \in [0, 1]$, then

$$\|T(y) - T(x) - S(y - x)\| \le M_1\|y - x\|.$$

Proof. Define $f(x) = T(x) - S(x)$. Then $f$ is continuous on $D_0$ and has a Fréchet derivative $f'(x) = T'(x) - S$ for all $x \in D_0$. Therefore $\|f'(x + t(y - x))\| \le M_1$ and Theorem 4.2a gives

$$\|T(y) - T(x) - S(y - x)\| = \|f(y) - f(x)\| \le M_1\|y - x\|.$$

The proof is concluded. By choosing $S = T'(z)$, where $z \in D_0$, we can deduce from Corollary 4.2a that

$$\|T(y) - T(x) - T'(z)\cdot(y - x)\| \le \sup_{0\le t\le 1}\|T'(x + t(y - x)) - T'(z)\|\,\|y - x\|.$$
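Theorem 4.2a and the estimate just obtained with $S = T'(z)$ can be checked numerically for a map from $\mathbb{R}^2$ to $\mathbb{R}^2$. In the sketch below, the map, the points $x$, $y$, $z$, and the approximation of each supremum by sampling $t$ on a fine grid are all illustrative assumptions. The Fréchet derivative is the Jacobian matrix, and its operator norm is the spectral norm.

```python
import numpy as np

def T(x):
    # Hypothetical smooth map from R^2 to R^2, used only for illustration
    return np.array([np.sin(x[0]) + x[1] ** 2, x[0] * x[1]])

def T_prime(x):
    # Its Frechet derivative: the Jacobian matrix at x
    return np.array([[np.cos(x[0]), 2.0 * x[1]],
                     [x[1], x[0]]])

def sup_on_segment(fun, x, y, samples=2001):
    """Approximate sup over 0 <= t <= 1 of fun(x + t(y - x)) by sampling."""
    return max(fun(x + t * (y - x)) for t in np.linspace(0.0, 1.0, samples))

x, y, z = np.array([0.1, -0.3]), np.array([1.2, 0.8]), np.array([0.5, 0.2])
d = y - x
M = sup_on_segment(lambda p: np.linalg.norm(T_prime(p), 2), x, y)
M1 = sup_on_segment(lambda p: np.linalg.norm(T_prime(p) - T_prime(z), 2), x, y)

lhs_thm = np.linalg.norm(T(y) - T(x))                    # Theorem 4.2a
lhs_cor = np.linalg.norm(T(y) - T(x) - T_prime(z) @ d)   # with S = T'(z)
print(lhs_thm, M * np.linalg.norm(d), lhs_cor, M1 * np.linalg.norm(d))
```

The second estimate is the smaller of the two whenever $T'$ varies little along the segment. This is why choosing $S = T'(z)$ with $z$ near the segment is useful.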

Another approach leading to mean-value theorems is via integration. First, however, a suitable definition of integration is needed (cf. §3.6). Let $t \in [0, 1]$ and let $T_t \in Y$, $Y$ being a normed linear space. Partition $[0, 1]$ by choosing points $t_0, t_1, \ldots, t_n$ so that $0 = t_0 < t_1 < \cdots < t_n = 1$. If, given $\varepsilon > 0$, there are a $y \in Y$ and a $\delta > 0$ such that

$$\Big\|y - \sum_{j=1}^{n}(t_j - t_{j-1})T_{\tau_j}\Big\| < \varepsilon \qquad (4.7)$$

for any partition with $\max_j(t_j - t_{j-1}) \le \delta$ and $\tau_j \in [t_{j-1}, t_j]$, then $y$ is called the Riemann integral of $T$ from 0 to 1 and we write

$$y = \int_0^1 T(t)\,dt.$$

Now, suppose that $X$ is a normed linear space and that

$$\int_0^1 T(x_0 + t(x_1 - x_0))\,dt$$

exists for given $x_0, x_1 \in X$ in accordance with the definition just given. Then it is known as the Riemann integral of $T$ from $x_0$ to $x_1$ and denoted by

$$\int_{x_0}^{x_1} T(x)\,dx.$$


This definition differs slightly from the conventional one for real-valued functions of a real variable. For, let $X = Y = \mathbb{R}$; then, with $T$ the function $f$,

$$\int_{x_0}^{x_1} f(x)\,dx = \int_0^1 f(x_0 + t(x_1 - x_0))\,dt = \frac{1}{x_1 - x_0}\int_{x_0}^{x_1} f(s)\,ds$$

on making the change of variable $s = x_0 + t(x_1 - x_0)$. The integral on the right is the usual one, so the new definition differs from the customary one by the factor $x_1 - x_0$ in the denominator; the two definitions thus agree for conventional integrals only when $x_1 - x_0 = 1$. Nevertheless, this should occasion no difficulty in the following. If $T$ is a continuous operator, the same argument as is used for conventional integrals may be adopted to show that $\int_{x_0}^{x_1} T(x)\,dx$ exists for all $x_0, x_1 \in X$ when $Y$ is complete, i.e. when $Y$ is a Banach space. If $\varphi(t)$ is real and $\|T(x_0 + t(x_1 - x_0))\| \le \varphi(t)$ for $0 \le t \le 1$ then, from (4.7),

$$\|y\| \le \varepsilon + \sum_{j=1}^{n}(t_j - t_{j-1})\varphi(\tau_j).$$

An immediate deduction is

THEOREM 4.2b. If $\|T(x_0 + t(x_1 - x_0))\| \le \varphi(t)$ for $0 \le t \le 1$ then

$$\Big\|\int_{x_0}^{x_1} T(x)\,dx\Big\| \le \int_0^1 \varphi(t)\,dt$$

provided the integrals exist. In particular

$$\Big\|\int_{x_0}^{x_1} T(x)\,dx\Big\| \le \int_{x_0}^{x_1}\|T(x)\|\,dx.$$

The mean-value theorem can now be established.

THEOREM 4.2c. If $T$ has a continuous Fréchet derivative and $Y$ is a Banach space then

$$T(x_1) - T(x_0) = \int_{x_0}^{x_1} T'(x)\cdot(x_1 - x_0)\,dx$$

for all $x_1, x_0 \in X$.

Proof. Let $S(t) = T(x_0 + t(x_1 - x_0))$, so that, on account of Theorem 4.1a, $S'(t) = T'(x_0 + t(x_1 - x_0))\cdot(x_1 - x_0)$. Pick the partition of $[0, 1]$ in which $t_j = j/n$. Then

$$S(1) - S(0) - \sum_{j=1}^{n}S'(t_j)(t_j - t_{j-1}) = \sum_{j=1}^{n}\{S(t_j) - S(t_{j-1}) - (t_j - t_{j-1})S'(t_j)\}.$$

The definition of a derivative implies that, given $\varepsilon > 0$, there is $N_1$ such that

$$\|S(t_j) - S(t_{j-1}) - (t_j - t_{j-1})S'(t_j)\| \le \varepsilon/2n$$

for $n \ge N_1$. Hence, for $n \ge N_1$,

$$\Big\|S(1) - S(0) - \sum_{j=1}^{n}S'(t_j)(t_j - t_{j-1})\Big\| \le \tfrac12\varepsilon.$$

From the definition of an integral there is $N_2$ such that

$$\Big\|\sum_{j=1}^{n}S'(t_j)(t_j - t_{j-1}) - \int_0^1 S'(t)\,dt\Big\| \le \tfrac12\varepsilon$$

for $n \ge N_2$. Hence, if $n$ exceeds the larger of $N_1$ and $N_2$,

$$\Big\|S(1) - S(0) - \int_0^1 S'(t)\,dt\Big\| \le \varepsilon,$$

which proves the theorem. One consequence of this theorem is

THEOREM 4.2d. If $Y$ is a Banach space and

$$\|T'(x) - T'(y)\| \le K\|x - y\|^p$$

for all $x, y$ in a convex set $D_0$ with $p > 0$, then, for $x_0, x_1 \in D_0$,

$$\|T(x_1) - T(x_0) - T'(x_0)\cdot(x_1 - x_0)\| \le K\|x_1 - x_0\|^{p+1}/(p + 1).$$

Proof. Since $p > 0$ the hypothesis on $T'$ signifies that it is continuous. Therefore, by Theorem 4.2c,

$$T(x_1) - T(x_0) = \int_0^1 T'(x_0 + t(x_1 - x_0))\cdot(x_1 - x_0)\,dt.$$

Hence, Theorem 4.2b gives

$$\|T(x_1) - T(x_0) - T'(x_0)\cdot(x_1 - x_0)\| = \Big\|\int_0^1\{T'(x_0 + t(x_1 - x_0)) - T'(x_0)\}\cdot(x_1 - x_0)\,dt\Big\| \le \int_0^1 K t^p\|x_1 - x_0\|^{p+1}\,dt$$

and the theorem follows at once, since $\int_0^1 t^p\,dt = 1/(p + 1)$.
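The integral form of the mean-value theorem (Theorem 4.2c) and the remainder bound of Theorem 4.2d can be checked numerically. The sketch below uses the illustrative map $T(x) = (\sin x_1, \cos x_2)$, whose Jacobian is diagonal with $\|T'(x) - T'(y)\| \le \|x - y\|$, so $K = 1$ and $p = 1$. The integral is approximated by a midpoint Riemann sum in the sense of (4.7). The last part shows why an integral is needed at all. For the hypothetical map $f(x) = (x_1^2, x_2^3)$, which is of the kind Exercise 8 has in mind, no single $t$ reproduces $f(1, 1) - f(0, 0)$.

```python
import numpy as np

def T(x):
    return np.array([np.sin(x[0]), np.cos(x[1])])

def T_prime(x):
    # Jacobian of T; ||T'(x) - T'(y)|| <= ||x - y||, so K = 1, p = 1 in Theorem 4.2d
    return np.diag([np.cos(x[0]), -np.sin(x[1])])

def integral_of_derivative(Tp, x0, x1, n=2000):
    """Midpoint Riemann sum for int_0^1 T'(x0 + t(x1 - x0)).(x1 - x0) dt."""
    d = x1 - x0
    taus = (np.arange(n) + 0.5) / n
    return sum(Tp(x0 + t * d) @ d for t in taus) / n

x0, x1 = np.array([0.3, -1.0]), np.array([1.7, 0.4])
d = x1 - x0

# Theorem 4.2c: T(x1) - T(x0) equals the integral of the derivative
mvt_error = np.linalg.norm(T(x1) - T(x0) - integral_of_derivative(T_prime, x0, x1))

# Theorem 4.2d with K = 1, p = 1: remainder <= ||x1 - x0||^2 / 2
remainder = np.linalg.norm(T(x1) - T(x0) - T_prime(x0) @ d)
bound = 0.5 * np.linalg.norm(d) ** 2

# No single t: f(1,1) - f(0,0) = (1, 1) but f'(t,t).(1,1) = (2t, 3t^2)
ts = np.linspace(0.0, 1.0, 100001)
closest = np.min(np.hypot(2 * ts - 1, 3 * ts**2 - 1))
print(mvt_error, remainder, bound, closest)
```

The closest approach of $(2t, 3t^2)$ to $(1, 1)$ is about 0.13. Meanwhile the integral form holds to the accuracy of the quadrature.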


Exercises

8. If $f$, from $\mathbb{R}^2$ to $\mathbb{R}^2$, has components $f_1(x) = x_1^{\ldots}$, $f_2(x) = x_2^{\ldots}$, show that there is no $t \in [0, 1]$ such that

$$f(y) - f(x) = f'(x + t(y - x))\cdot(y - x)$$

when $x = 0$ and $y$ is the point $(1, 1)$.

9. If $T$ has a Gâteaux derivative at each $y$ such that $\|y - a\| < 1$ and $\delta T$ is continuous at $a$, show by Corollary 4.2a that $T$ has a Fréchet derivative at $a$.

10. Show that Theorem 4.2d is valid for $p \ge 0$ if the additional assumption that $T'$ is continuous on $D_0$ is incorporated.

4.3 Higher derivatives

The Fréchet derivative $T'$ is such that $T'\cdot h$, for fixed $h \in X$, assigns the element $T'(x)\cdot h$ to $x \in X$. It may happen that this operator has a Fréchet derivative at $x = a$. If that is so, this second derivative is written as $T''(a)\cdot h$. Acting on $k \in X$ it produces the element $(T''(a)\cdot h)\cdot k$ of $Y$. It is more convenient to rewrite this as $T''(a)\cdot(h, k)$. Since both $h$ and $k$ are in $X$, $(h, k)$ is an element of the product space $X \times X$, and so $T''(a)$ is an operator with domain in $X \times X$ and range in $Y$. By the definition of a Fréchet derivative, $T''$ is linear for changes in $h$ and for changes in $k$. Therefore $T''$ is a bilinear operator from $X \times X$ to $Y$, and $T''(a)$ is known as the second Fréchet derivative of $T$ at $a$. Remark that the existence of a Fréchet derivative requires the operator to be continuous, so that a necessary condition for the existence of $T''(a)$ is that $T'(x)\cdot h$ be continuous in $x$ at $a$. The norm of $T''$ can be calculated in a straightforward manner via

$$\|T''(a)\cdot h\| = \sup_{\|k\|=1}\|(T''(a)\cdot h)\cdot k\|$$

so that

$$\|T''(a)\| = \sup_{\|h\|=1}\|T''(a)\cdot h\| = \sup_{\|h\|=1}\sup_{\|k\|=1}\|T''(a)\cdot(h, k)\|.$$

As an example let $f$ be a functional from $\mathbb{R}^n$ to $\mathbb{R}$. By the definition of the second derivative

$$\lim_{|k|\to 0}\frac{|f'(a + k)\cdot h - f'(a)\cdot h - f''(a)\cdot(h, k)|}{|k|} = 0.$$

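From §4.1,

$$f'(a)\cdot h = \sum_{j=1}^{n}h_j\frac{\partial f}{\partial a_j}$$

when $a$ and $h$ have components $a_1, \ldots, a_n$ and $h_1, \ldots, h_n$ respectively. Further

$$\frac{f'(a_1, \ldots, a_i + k_i, \ldots, a_n)\cdot h - f'(a)\cdot h}{k_i} \to \sum_{j=1}^{n}\frac{\partial^2 f}{\partial a_i\,\partial a_j}h_j.$$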



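Hence

$$f''(a)\cdot(h, k) = k^{\mathrm T}Hh$$

in matrix notation, $H$ being the $n \times n$ Hessian matrix

$$H = \begin{pmatrix} \partial^2 f/\partial a_1^2 & \partial^2 f/\partial a_1\partial a_2 & \cdots\\ \partial^2 f/\partial a_2\partial a_1 & \partial^2 f/\partial a_2^2 & \cdots\\ \vdots & \vdots & \ddots \end{pmatrix}.$$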
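The identification $f''(a)\cdot(h, k) = k^{\mathrm T}Hh$ can be checked numerically. The sketch also shows the common practical device of building $H$ column by column from differences of gradients and then symmetrizing. The test functional $f(a) = e^{a_1}a_2^2 + \sin(a_2 a_3)$, the point $a$, the directions $h$ and $k$, and the step $2^{-26}$ (roughly the square root of double-precision machine epsilon) are all our own illustrative choices.

```python
import numpy as np

def grad_f(a):
    # Gradient of the hypothetical functional f(a) = exp(a1)*a2^2 + sin(a2*a3)
    x, y, z = a
    return np.array([np.exp(x) * y**2,
                     2.0 * np.exp(x) * y + z * np.cos(y * z),
                     y * np.cos(y * z)])

def exact_hessian(a):
    x, y, z = a
    c, s = np.cos(y * z), np.sin(y * z)
    return np.array([[np.exp(x) * y**2, 2.0 * np.exp(x) * y, 0.0],
                     [2.0 * np.exp(x) * y, 2.0 * np.exp(x) - z**2 * s, c - y * z * s],
                     [0.0, c - y * z * s, -y**2 * s]])

def hessian_from_gradients(grad, a, h=2.0**-26):
    """Entry (i, j) from (g_i(a + h e_j) - g_i(a))/h, then symmetrised as (G + G^T)/2."""
    n = a.size
    g0 = grad(a)
    G = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        G[:, j] = (grad(a + h * e) - g0) / h
    return 0.5 * (G + G.T)

a = np.array([0.3, -0.7, 1.1])
H = hessian_from_gradients(grad_f, a)

# f''(a).(h, k) = k^T H h versus a central difference of f'(.).h along k
h_dir, k_dir = np.array([1.0, 2.0, -1.0]), np.array([0.5, -1.0, 3.0])
eps = 1e-5
quotient = (grad_f(a + eps * k_dir) @ h_dir - grad_f(a - eps * k_dir) @ h_dir) / (2 * eps)
print(np.max(np.abs(H - exact_hessian(a))), k_dir @ exact_hessian(a) @ h_dir, quotient)
```

The bilinear form and the difference quotient agree, which is the content of the identification above.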

Thus the second Fréchet derivative carries with it all the information contained in the Hessian matrix. The second Fréchet derivative has a property of symmetry, as described in the following theorem.

THEOREM 4.3. $T''(a)\cdot(h, k) = T''(a)\cdot(k, h)$.

Proof. Given $\varepsilon > 0$, choose $\delta$ so that $T'(x)$ exists for $\|x - a\| < \delta$. …

… Next, we want to ensure that we can take a non-zero step in the chosen direction, i.e. $\alpha_m \ne 0$. One possible condition is

$$|(g_{m+1}, p_m)| \le \varepsilon_2|(g_m, p_m)| \qquad (4.29)$$

for some selected $\varepsilon_2$ such that $1 > \varepsilon_2 > 0$. Another is

$$|f(x_m) + \alpha_m(g_m, p_m) - f(x_{m+1})| > \varepsilon_3\alpha_m|(g_m, p_m)| \qquad (4.30)$$

with $\varepsilon_3 > 0$. If $\varepsilon_2$ is about $\tfrac12$, $\alpha_m$ should turn out to be of reasonable size. Finally, to make sure that there is a non-zero change in $f$, the restriction

$$f(x_m) - f(x_{m+1}) \ge -\varepsilon_4\alpha_m(p_m, g_m) \qquad (4.31)$$

is imposed, the right-hand side being positive when (4.28) holds. Algorithms in which $f(x_{m+1}) \le f(x_m)$ are often known as descent methods. One of the earliest was the method of steepest descent, in which $p_m = -g_m$ and $\alpha_m$ is varied until $f(x_{m+1})$ is a minimum. The steepest descent method has extremely slow convergence in general, primarily because it makes no allowance for the curvature of $f$, and so it is now rarely used. Once the direction of search has been settled, the choice of $\alpha_m$ has to be considered. If it is to be chosen so that $f(x_m + \alpha_m p_m)$ is a minimum, the question arises of how this is to be done numerically. Although it is a problem in a single variable only, one cannot usually make a reliable estimate of the minimizer from a few values of $f$. Contrariwise, one does not wish to calculate $f$ too often, so that frequently a compromise is involved. At any rate, $\alpha_m$ will seldom be located at the precise minimum of $f(x_{m+1})$ in practice, though theoretical investigations often assume that an exact line search can be undertaken. Inexact line searches can cause algorithms to behave differently from one another when theory would predict that they give the same results.



A practical method for computing $\alpha_m$ is to estimate a value $\alpha'$ for which it is certain that $\alpha_m < \alpha'$. Then choose $\alpha_m = \tfrac12\alpha'$ and, if $f$ is, say, decreasing for $\alpha_m > \tfrac12\alpha'$, test $f$ at $\tfrac12\alpha' + j\beta$ for $j = 1, 2, \ldots$ until a point $\alpha''$ is found at which $f$ begins to increase. Then repeat the procedure starting at $\alpha''$ with a smaller step length $\beta_1$. More information on line searches can be found in Fletcher (1987). A useful theorem (see also Wolfe 1971) in connection with descent methods is

THEOREM 4.6. If there are finite $L$ and $M$ such that $f(x) \ge L$ for all $x$ and $\|G\| \le M$ for all $x$ for which $f(x) \le f(x_1)$, the descent method (4.27) will terminate at a point where $\|g\| < \varepsilon_0$ under conditions (4.28), (4.31), and either of (4.29) and (4.30).

Since the customary $L_2$ inner product on $\mathbb{R}^n$ has been adopted, the condition on $\|G\|$ is equivalent to specifying that all the eigenvalues of $G$ lie between $-M$ and $M$.

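Proof. From Theorem 4.2c,

$$(g_{m+1}, p_m) = (g_m, p_m) + \Big(p_m, \int_0^1 G(x_m + t(x_{m+1} - x_m))\,\alpha_m p_m\,dt\Big).$$

Hence

$$|(g_{m+1}, p_m)| \ge |(g_m, p_m)| - M\|p_m\|^2\alpha_m.$$

Then (4.29) implies that

$$(1 - \varepsilon_2)|(g_m, p_m)| \le M\alpha_m\|p_m\|^2.$$

Therefore, it follows from (4.28) and (4.31) that

$$f(x_m) - f(x_{m+1}) \ge \frac{\varepsilon_4(1 - \varepsilon_2)|(g_m, p_m)|^2}{M\|p_m\|^2} \ge \frac{\varepsilon_4\varepsilon_1^2(1 - \varepsilon_2)\|g_m\|^2}{M}.$$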

Consequently, if $\|g_m\| \ge \varepsilon_0$, $f(x_m) - f(x_{m+1})$ cannot be less than $\varepsilon_4\varepsilon_1^2\varepsilon_0^2(1 - \varepsilon_2)/M$. Since $f$ is bounded below, at most a finite number of such steps is possible, i.e. after a finite number of iterations a point at which $\|g\| < \varepsilon_0$ must be reached. The theorem has been proved when (4.28), (4.29), and (4.31) are valid. If (4.30) is invoked instead of (4.29), we infer from Theorem 4.3a that

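$$f(x_{m+1}) = f(x_m) + \alpha_m(g_m, p_m) + \alpha_m^2\Big(p_m, \int_0^1(1 - t)G(x_m + t(x_{m+1} - x_m))p_m\,dt\Big)$$

and so, from Theorem 4.2b,

$$|f(x_{m+1}) - f(x_m) - \alpha_m(g_m, p_m)| \le \tfrac12\alpha_m^2 M\|p_m\|^2.$$

We deduce from (4.30) that

$$\tfrac12\alpha_m M\|p_m\|^2 \ge \varepsilon_3|(g_m, p_m)|;$$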


then (4.28) and (4.31) give

$$f(x_m) - f(x_{m+1}) \ge \frac{2\varepsilon_4\varepsilon_3\varepsilon_1^2\|g_m\|^2}{M}.$$

Again only a finite number of steps with $\|g_m\| \ge \varepsilon_0$ are possible and the proof is finished.

How near to a point where $\|g\| = 0$ the iteration terminates depends upon the flatness of $f$, i.e. on how large the set on which $\|g\| < \varepsilon_0$ is. Remark that the proof does not require (4.28)-(4.31) to be applied at consecutive steps. Therefore we have the following corollary.

COROLLARY 4.6. The conclusion of Theorem 4.6 will hold after a finite number of iterations at which (4.28), (4.31) and one of (4.29), (4.30) are imposed, provided that on other steps $f(x_{m+1}) \le f(x_m)$.
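The slow convergence of steepest descent, and the termination test $\|g\| < \varepsilon_0$ of Theorem 4.6, show up clearly on an ill-conditioned quadratic. The sketch below pairs steepest descent with a rough rendering of the step-length recipe described earlier. That recipe starts at $\tfrac12\alpha'$, walks in steps of $\beta$ while $f$ keeps decreasing, then repeats with a smaller step. The quadratic, $\alpha' = 2$, $\beta = 0.1$, the shrink factor, the number of refinement rounds, and the final halving safeguard that keeps $f(x_{m+1}) \le f(x_m)$ are all our own illustrative choices.

```python
import numpy as np

A = np.diag([1.0, 50.0])                 # ill-conditioned quadratic (test case)

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def line_search(phi, alpha_max=2.0, beta=0.1, shrink=0.1, rounds=4):
    """Start at alpha_max/2, walk in steps of beta while phi keeps decreasing,
    then repeat from the point reached with a smaller step."""
    alpha = 0.5 * alpha_max
    tiny = 1e-12 * alpha_max
    for _ in range(rounds):
        for step in (beta, -beta):
            moved = False
            while tiny < alpha + step < alpha_max and phi(alpha + step) < phi(alpha):
                alpha += step
                moved = True
            if moved:
                break
        beta *= shrink
    while phi(alpha) > phi(0.0) and alpha > tiny:   # safeguard: keep it a descent step
        alpha *= 0.5
    return alpha

def steepest_descent(x, eps0=1e-6, max_iter=10_000):
    history = [f(x)]
    for m in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps0:               # termination test of Theorem 4.6
            return x, m, history
        p = -g                                      # steepest-descent direction
        alpha = line_search(lambda a: f(x + a * p))
        x = x + alpha * p
        history.append(f(x))
    raise RuntimeError("no convergence within max_iter")

x_final, iterations, history = steepest_descent(np.array([50.0, 1.0]))
print(iterations, np.linalg.norm(grad(x_final)))
```

From this starting point the iterates zigzag, and several hundred steps are needed even though the function is only a two-variable quadratic. This is the curvature problem mentioned in the text.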

In a large number of algorithms the search direction is taken as $p_m = -H_m g_m$, where $H_m$ is some suitable matrix. For such directions (4.28) can be confirmed under certain conditions.

THEOREM 4.6a. If $\lambda_{\min}^{(m)}$ and $\lambda_{\max}^{(m)}$ are the smallest and largest eigenvalues respectively of the positive definite $H_m$, (4.28) is valid when $\lambda_{\min}^{(m)}/\lambda_{\max}^{(m)}$ is uniformly bounded below by a positive constant.

Proof. Since $(g_m, H_m g_m) \ge \lambda_{\min}^{(m)}\|g_m\|^2$ and

$$\|H_m g_m\| \le \|H_m\|\,\|g_m\| \le \lambda_{\max}^{(m)}\|g_m\|,$$

$$-\frac{(p_m, g_m)}{\|p_m\|\,\|g_m\|} \ge \frac{\lambda_{\min}^{(m)}}{\lambda_{\max}^{(m)}} \ge a,$$

where $a$ is the positive lower bound. Thus (4.28) is verified with $\varepsilon_1 = a$ and the proof is complete.

On account of Theorem 4.6a it is not unusual in many algorithms to redefine $H_m$ if it is discovered that $\lambda_{\min}^{(m)}/\lambda_{\max}^{(m)}$ is becoming unduly small. The advantage of working with $H_m$ instead of $G_m^{-1}$ is that $G_m^{-1}$ requires the calculation of the second partial derivatives of $f$, whereas this can be avoided for $H_m$ so long as it is a good enough approximation to $G_m^{-1}$. One simple method is to use differences of gradients, i.e. to replace the $ij$ component of $G_m$ by

$$\{g_i(x_m + he_j) - g_i(x_m)\}/h$$

where $e_j$ is the $j$th coordinate vector and $h$ is a suitable step length. The new matrix may be made symmetric by replacing $G_m$ by $\tfrac12(G_m + G_m^{\mathrm T})$; the symmetric matrix may not be positive definite, which will entail further modification. Numerical experimentation (Gill, Murray, and Picken 1972) would appear to suggest that $h = 2^{-t/2}$, where $t$ is the number of bits in the word length of the



computer, will keep round-off error at a tolerable level. It may, of course, not be worth updating $G_m$ if $\|x_m - x_{m-1}\| < h$. Gradients can, however, be employed for purposes other than estimating the elements of $G$. For example, let

$$\delta_m = x_{m+1} - x_m, \qquad y_m = g_{m+1} - g_m$$

and then form the iterative scheme for H as

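$$H_{m+1} = H_m - \frac{H_m y_m y_m^{\mathrm T}H_m}{(y_m, H_m y_m)} + \frac{\delta_m\delta_m^{\mathrm T}}{(\delta_m, y_m)} + \varphi_m u_m u_m^{\mathrm T} \qquad (4.32)$$

starting from some arbitrarily chosen positive definite $H_1$, though the unit matrix is the one most often chosen. In (4.32)

$$u_m = \frac{\delta_m}{(\delta_m, y_m)} - \frac{H_m y_m}{(y_m, H_m y_m)} \qquad (4.33)$$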

and $(\delta_m, y_m) > 0$ and $\varphi_m \ge 0$. With perfect line searches $(g_{m+1}, \delta_m) = 0$, so that

$$(\delta_m, y_m) = \alpha_m(g_m, H_m g_m)$$

and the condition on $(\delta_m, y_m)$ is achieved when $H_m$ is positive definite. In fact, $(h, H_{m+1}h)$ can vanish when … 0. It can therefore be concluded that $H_m$ is positive definite for all $m$ when $H_1$ is positive definite, provided that all line searches are perfect. Thirdly, it can be proved that, when $f$ is a quadratic function, the iteration reaches the minimum of $f$ after $n$ perfect line searches (Powell 1971a, b) when $G$ is positive definite. Even if $G$ is not positive definite, the iteration either diverges to $-\infty$ or arrives at the stationary point (Jones 1973) after $n$ perfect line searches. In both cases it is assumed that any iteration on which the line search is not perfect is such that $f$ is not increased. Powell also shows that the iteration converges to the minimum of $f$ when $f$ is a convex functional and all line searches are perfect. Fourthly, the quasi-Newton algorithms have the extraordinary property (Dixon 1972, 1973) that, starting from given $x_1$ and $H_1$, the same sequence of points $x_m$ is generated whatever choice is made for $\varphi_m$.
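Both the termination after $n$ perfect line searches on a quadratic and the independence of the points from $\varphi_m$ can be observed directly. The sketch below implements the update (4.32) with $u_m$ as in (4.33). It uses an exact line search, which is available in closed form for $f(x) = \tfrac12 x^{\mathrm T}Gx - b^{\mathrm T}x$, and compares $\varphi_m = 0$ (the DFP formula) with $\varphi_m = (y_m, H_m y_m)$. With this unscaled $u_m$, the latter choice gives the BFGS formula. The random test problem is our own choice.

```python
import numpy as np

def broyden_update(H, delta, y, phi):
    """(4.32) with u_m from (4.33): phi = 0 is DFP; phi = (y, H y) gives BFGS."""
    Hy = H @ y
    yHy = y @ Hy
    dy = delta @ y                       # positive under perfect line searches
    u = delta / dy - Hy / yHy
    return (H - np.outer(Hy, Hy) / yHy + np.outer(delta, delta) / dy
            + phi * np.outer(u, u))

def quasi_newton_on_quadratic(G, b, x, phi_rule, steps):
    """Minimise f(x) = x^T G x / 2 - b^T x with p_m = -H_m g_m and perfect line searches."""
    H = np.eye(len(b))                   # H_1 = unit matrix
    g = G @ x - b
    points = [x.copy()]
    for _ in range(steps):
        p = -H @ g
        alpha = -(g @ p) / (p @ G @ p)   # exact minimiser along p for a quadratic
        x_new = x + alpha * p
        g_new = G @ x_new - b
        delta, y = x_new - x, g_new - g
        H = broyden_update(H, delta, y, phi_rule(H, y))
        x, g = x_new, g_new
        points.append(x.copy())
    return np.array(points), H

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
G = B @ B.T + n * np.eye(n)              # positive definite test Hessian
b = rng.standard_normal(n)
x1 = rng.standard_normal(n)

dfp_points, H_dfp = quasi_newton_on_quadratic(G, b, x1, lambda H, y: 0.0, n)
bfgs_points, H_bfgs = quasi_newton_on_quadratic(G, b, x1, lambda H, y: y @ H @ y, n)
print(np.max(np.abs(dfp_points - bfgs_points)), np.linalg.norm(G @ dfp_points[-1] - b))
```

The two point sequences agree to rounding error. After $n$ steps the gradient vanishes and $H_{n+1}$ coincides with $G^{-1}$.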

Interchanging $(z, \nu)$ and $(y, \mu)$ merely reverses the sign of the left-hand side. This contradiction renders the initial assumption erroneous. Thus $z = y$ and $\nu = \mu$ must hold, and uniqueness is proved. The solution of eqns (4.64) and (4.65) has been converted, by the above analysis, to two variational problems which are connected with each other. One problem involves maximizing some functional, the other requires minimization of a related functional, and the maximum and minimum values of the two functionals respectively are the same. Two such variational principles are said to be complementary or, sometimes, dual. Complementary variational principles are of considerable practical importance when the common stationary value is a quantity of physical interest. Trial functions can be substituted in the complementary functionals; they yield upper and lower bounds at once on the physical quantity sought. Usually, the insertion of trial functions will be far simpler than attempting to solve the governing equations (4.64) and (4.65) and then computing the quantity. Even if the common stationary value is not of direct engineering significance, the separation of the complementary bounds can be a guide to how closely the trial functions approximate solutions of the governing equations. To indicate how the last sentence can be justified, take $X$ to be given by $X(x, \lambda)$ …

Then $\nabla_x X = \nabla F$, $\nabla_\lambda X = -\lambda$ … $= F(x) -$ … 0 and $q(t) > 0$ on $[a, b]$. There is no difficulty in checking that $X$ is strictly saddle-shaped, so that there is a unique solution. The complementary variational principles are: minimize

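$$\frac12\int_a^b\Big\{\frac{\lambda^2}{p} + \frac1q\Big(\frac{d\lambda}{dt} + g\Big)^2\Big\}dt - c\lambda(a)$$

subject to $\lambda(b) = -dp(b)$, and maximize

$$\int_a^b -\frac12\Big\{p\Big(\frac{dx}{dt}\Big)^2 + qx^2 + 2gx\Big\}dt + dp(b)x(b)$$

subject to $x(a) = c$.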

4.9.4 Poisson's equation

Suppose that one is concerned with solving Poisson's equation … for the scalar function $\varphi$ in a domain $\tau$ surrounded by a boundary $\sigma$ on which $\varphi$ has to satisfy certain conditions. Explicitly, assume that $\sigma$ consists of two parts $\sigma_1$ and $\sigma_2$; on one of these $\varphi$ is specified and on the other the normal derivative of $\varphi$ is known. Thus $\varphi = g$ on $\sigma_1$ and $n\cdot\operatorname{grad}\varphi = h$ on $\sigma_2$, $n$ being the unit outward normal to $\sigma$. Then, if $u = -\operatorname{grad}\varphi$, we have $-\operatorname{div}u$ … into three parts, $\varphi(\tau)$ associated with the domain $\tau$ and $\varphi(\sigma_1)$, $\varphi(\sigma_2)$ from the two portions of the boundary.



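Then define

$$T\varphi = \begin{pmatrix}\operatorname{grad}\varphi\\ -\varphi\,n\\ 0\end{pmatrix},$$

the three rows corresponding to $\tau$, $\sigma_1$, and $\sigma_2$ respectively. For an inner product take

$$(u, v) = \int_\tau u(\tau)\cdot v(\tau)\,d\tau + \int_{\sigma_1}u(\sigma_1)\cdot v(\sigma_1)\,d\sigma + \int_{\sigma_2}u(\sigma_2)\cdot v(\sigma_2)\,d\sigma.$$

Then

$$(u, T\varphi) = \int_\tau u\cdot\operatorname{grad}\varphi\,d\tau - \int_{\sigma_1}u\cdot n\,\varphi\,d\sigma = \int_{\sigma_2}u\cdot n\,\varphi\,d\sigma - \int_\tau\varphi\operatorname{div}u\,d\tau$$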

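by the divergence theorem. Defining

$$T^*u = \begin{pmatrix}-\operatorname{div}u\\ 0\\ n\cdot u\end{pmatrix}$$

we have $(u, T\varphi) = (T^*u, \varphi)$. Consequently, $T^*$ is the adjoint of $T$. Eqn (4.90) can now be written as

$$T\varphi = \begin{pmatrix}F - u\\ -gn\\ 0\end{pmatrix}, \qquad T^*u = \begin{pmatrix}\eta\\ 0\\ -h\end{pmatrix},$$

which can be expressed in the form (4.64) and (4.65) on taking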

$$X(\varphi, u) = \int_\tau(F\cdot u + \varphi\eta - \tfrac12u^2)\,d\tau - \int_{\sigma_1}g\,n\cdot u\,d\sigma - \int_{\sigma_2}h\varphi\,d\sigma.$$

The analogue of the left-hand side of (4.66), with $\varphi, u, \psi, v$ in place of $x, \lambda, y, \mu$, can be reduced to

$$\int_\tau(u - v)^2\,d\tau$$

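which is positive unless $u = v$. Therefore $X$ is saddle-shaped and the complementary variational principles are as follows.

(i) Minimize

$$\int_\tau(\tfrac12u^2 - F\cdot u)\,d\tau + \int_{\sigma_1}g\,n\cdot u\,d\sigma \qquad (4.91)$$

subject to $-\operatorname{div}u = \eta$ in $\tau$, and $n\cdot u = -h$ on $\sigma_2$.

(ii) Maximize

$$-\int_\tau\{\tfrac12(F - \operatorname{grad}\varphi)^2 + \eta\varphi\}\,d\tau + \int_{\sigma_2}h\varphi\,d\sigma \qquad (4.92)$$

subject to $\varphi = g$ on $\sigma_1$.

It will be remarked that (ii) takes the customary form of Dirichlet's principle when $F = 0$ and $\sigma_2$ is absent. To see what happens when inequalities are introduced consider the problem of solving

$$u = F - \operatorname{grad}\varphi \ (\text{in } \tau), \qquad \varphi = g \ (\text{on } \sigma_1) \qquad (4.93)$$

under the restrictions

$$-\operatorname{div}u \le \eta \ (\text{in } \tau), \qquad n\cdot u \le -h \ (\text{on } \sigma_2), \qquad (4.94)$$

$$\varphi \ge 0 \ (\text{in } \tau \text{ and on } \sigma_2), \qquad (4.95)$$

$$\int_\tau\varphi(\operatorname{div}u + \eta)\,d\tau - \int_{\sigma_2}\varphi(n\cdot u + h)\,d\sigma = 0.$$

In this case (4.91) is minimized subject to (4.94) while (4.92) is maximized subject to $\varphi = g$ on $\sigma_1$ and (4.95).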
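The bracketing property can be seen on a one-dimensional instance. The sketch below assumes $\tau = (0, 1)$, $\sigma_1 = \{0, 1\}$ with $g = 0$, $\sigma_2$ absent, $F = 0$ and $\eta = -2$, all choices of ours. The problem is then $\varphi'' = \eta$ with solution $\varphi = x(1 - x)$, and the common stationary value is $\tfrac16$. The minimized functional is $\int \tfrac12 u^2\,dx$ over fluxes obeying $-u' = \eta$, that is $u = 2x + C$. The maximized functional is $-\int(\tfrac12\varphi'^2 + \eta\varphi)\,dx$ over potentials vanishing at both ends, here $\varphi = A\sin\pi x$. The difference of the two values is $\tfrac12\int(u + \varphi')^2\,dx$, which measures how far the pair of trial functions is from satisfying $u = -\varphi'$.

```python
import numpy as np

N = 20_000
x = (np.arange(N) + 0.5) / N             # midpoint-rule nodes on (0, 1)
dx = 1.0 / N
eta = -2.0                               # phi'' = eta has solution x(1 - x)

def upper_bound(C):
    # Trial flux u = 2x + C satisfies -du/dx = eta for every C
    u = 2.0 * x + C
    return np.sum(0.5 * u**2) * dx       # minimised functional with F = 0, g = 0

def lower_bound(amp):
    # Trial potential phi = amp*sin(pi x) satisfies phi = 0 on sigma_1
    phi = amp * np.sin(np.pi * x)
    dphi = amp * np.pi * np.cos(np.pi * x)
    return -np.sum(0.5 * dphi**2 + eta * phi) * dx   # maximised functional

exact = 1.0 / 6.0
C, amp = -0.9, 8.0 / np.pi**3            # amp = 8/pi^3 maximises over this family
gap = upper_bound(C) - lower_bound(amp)
mismatch = np.sum(0.5 * (2.0 * x + C + amp * np.pi * np.cos(np.pi * x)) ** 2) * dx
print(lower_bound(amp), exact, upper_bound(C), gap, mismatch)
```

With these trial functions the bounds are about 0.1643 and 0.1717 on either side of $\tfrac16$. The gap equals the mismatch integral, which is the sense in which the separation of the bounds guides the quality of the trial functions.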

Exercises

28. Obtain complementary variational principles for the Poisson-Boltzmann equation

$$\frac{d^2x}{dt^2} = e^x - e^{-x}$$

on $0 < t < 1$, subject to $x(0) = 0$, $x(1) = 1$. With the trial function $x = \sinh at/\sinh a$, show that $a = 1.46$ is optimal and that $\|x - \sinh at/\sinh a\| \le 0.009$ with this choice of $a$.

29. In communication theory the integral equation

$$\frac{1}{x(t)} = \int_0^{\ldots}\frac{\sin(t - s)}{t - s}\,x(s)\,ds$$

occurs. Formulate complementary variational principles (assuming that the operator is self-adjoint and positive definite). Show that $\|x(t) - 1.36 - 0.06t^2\| \le 0.040$.

30. If $P$ is a positive operator with decomposition $T^*T$ and

$$Px + \nu x = g,$$

where $\nu \ne 0$ and $g$ are given, prove that a variational principle is to maximize

$$-\tfrac12(Px, x) - \tfrac12\nu(x, x) + (g, x).$$

Derive a complementary variational principle and express it in terms of the same trial function. If $z$ be used as a trial function in both expressions prove that their difference is $\tfrac12\|Pz + \nu z - g\|^2/\nu$.

31. In the scheme (4.70) the iteration $T^*Tx_{n+1} + \nabla F(x_n) = 0$ is introduced. Examine whether the complementary variational expressions converge to one another for the sequence $\{x_n\}$ of trial functions, assuming that the sequence itself converges to a solution of (4.71).

32. Obtain complementary variational principles for the partial differential equation

$$\nabla^2\psi = 4\pi(\psi - 1 + 1/r)^{3/2}$$

in three dimensions, subject to $\psi \sim 1 - 1/r$ as $r \to \infty$.

33. If $W(x, \lambda)$ is saddle-shaped and …, show that complementary principles are provided by minimizing $(x, \nabla_x W) - W$ and by maximizing …

… $b$ and

$$\kappa_0 = (k^2 - \pi^2/a^2)^{1/2},$$

it being assumed that the frequency is such that $\kappa_0$ is positive. In view of the $x$ dependence of the incident field and the fact that the electric intensity tangential to the iris must vanish, the field produced by the iris must have the same $x$ dependence. Also no $E_x$ can be generated, so the modal structure must ensure this. Expressing the total field in terms of such modes, we assume an expansion

$$E_y = \Big\{\exp(-i\kappa_0 z) + \sum_{n=0}^{\infty}a_n\exp(-i\kappa_n|z|)\cos(n\pi y/b)\Big\}\sin(\pi x/a),$$

$$-i\omega\mu_0 H_x = \Big\{i\kappa_0\exp(-i\kappa_0 z) - \sum_{n=0}^{\infty}\frac{\kappa_0^2}{i\kappa_n}a_n\exp(-i\kappa_n|z|)\cos(n\pi y/b)\operatorname{sgn}z\Big\}\sin(\pi x/a)$$

and the other transverse components $E_x$ and $H_y$ as zero. As usual, $\operatorname{sgn}z$ is 1 if $z > 0$ and $-1$ if $z < 0$. Also

$$\kappa_n = (k^2 - \pi^2/a^2 - n^2\pi^2/b^2)^{1/2}$$

with $\kappa_n$ negative imaginary when the quantity inside the radical is negative. The determination of the complex constants $a_n$ leads to the complete field.



However, in many circumstances, and in particular if only the fundamental mode can propagate, it will be sufficient to find $a_0$, for $a_0$ is the complex amplitude of the reflected wave and this will be the only significant wave away from the iris in single-mode operation. From now on it will be assumed that only the fundamental mode propagates, so that $\kappa_n$ is pure imaginary for $n \ge 1$. If, for the moment, only the fundamental mode is retained and $[H_x]$ denotes the discontinuity in $H_x$ across $z = 0$,

$$\frac{E_y}{[H_x]} = -\frac{(1 + a_0)\,\omega\mu_0}{2\kappa_0 a_0}$$

at $z = 0$. On account of the continuity of $E_y$ this may be regarded as a shunt impedance $Z$, where

$$\frac{Z}{Z_0} = -\frac{1 + a_0}{2a_0}, \qquad Z_0 = \frac{\omega\mu_0}{\kappa_0}, \qquad (4.96)$$

placed across the line at z = 0 in the equivalent circuit. Therefore, it will be sufficient for our purposes to evaluate Z (a o being obtained as a by-product). The aim is to derive an integral equation on z = 0 which will permit the evaluation of the field. However, there are two ways in which this can be approached. One will involve the field on the aperture, i.e. the portion of z = 0 where there is no metal, and the other will concern the current induced in the metal strip. It will be discovered that these two integral equations lead to complementary variational principles for the quantity sought. Let S be the perfectly conducting metal portion of the iris and A the aperture section. On z = 0, write E; as E(y) sin(nx/a). Then, from the theory of Fourier series, for n ~ 1

$$a_n = \frac{2}{b}\int_0^b E(t)\cos\frac{n\pi t}{b}\,dt = \frac{2}{b}\int_A E(t)\cos\frac{n\pi t}{b}\,dt$$
since $E_y$ vanishes on $S$. Similarly
$$1 + a_0 = \frac{1}{b}\int_A E(t)\,dt. \qquad (4.97)$$

The tangential field $H_x$ is continuous across the aperture, and so
$$\sum_{n=0}^{\infty}\frac{a_n}{\kappa_n}\cos\frac{n\pi y}{b} = 0 \quad (y \text{ in } A).$$
On substituting the integral formulae for the $a_n$ we obtain
$$\frac{1}{\kappa_0}\Big\{\frac{1}{b}\int_A E(t)\,dt - 1\Big\} + \frac{2}{b}\int_A E(t)\sum_{n=1}^{\infty}\frac{1}{\kappa_n}\cos\frac{n\pi y}{b}\cos\frac{n\pi t}{b}\,dt = 0 \quad (y \text{ in } A),$$
which constitutes an integral equation to determine the tangential electric intensity in the aperture. It can be converted into a more convenient version


by making the substitution $E(t) = \tfrac12 b i a_0 g(t)$ and using (4.97) in the first term. Then
$$\int_A g(t)\sum_{n=1}^{\infty}\frac{1}{i\kappa_n}\cos\frac{n\pi y}{b}\cos\frac{n\pi t}{b}\,dt = \frac{1}{\kappa_0} \quad (y \text{ in } A). \qquad (4.98)$$

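Once $g$ has been found from this integral equation, $a_0$ can be determined from
$$1 + a_0 = \tfrac12 i a_0\int_A g(t)\,dt$$
as a consequence of (4.97). It follows from (4.96), with $Z_0 = \omega\mu_0/\kappa_0$ the wave impedance of the fundamental mode, that
$$Z/Z_0 = -\tfrac14 i\int_A g(t)\,dt. \qquad (4.99)$$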

Since $\kappa_0$ and $i\kappa_n$ ($n \ge 1$) are real and positive, the operator in (4.98) is real and so $g$ is real. Thus (4.99) implies that $Z$ is pure imaginary. Also, multiply (4.98) by $g$ and integrate over $A$. The left-hand side then becomes $\sum_{n=1}^{\infty}(i\kappa_n)^{-1}\{\int_A g(t)\cos(n\pi t/b)\,dt\}^2 \ge 0$, so the integral in (4.99) is positive. Consequently, $Z$ is negative imaginary and the iris is capacitive.

The second integral equation can be inferred from a consideration of the current in the metal strip, which is proportional to the discontinuity in $H_x$ across the strip. Let $J(y)\sin(\pi x/a) = i\omega\mu_0\{(H_x)_{z=+0} - (H_x)_{z=-0}\}$, so that
$$J(y) = 2\sum_{n=0}^{\infty}\frac{\kappa_0^2}{i\kappa_n}\,a_n\cos\frac{n\pi y}{b}.$$

Since there is no current in the aperture, we deduce that
$$a_n = \frac{i\kappa_n}{\kappa_0^2 b}\int_S J(t)\cos\frac{n\pi t}{b}\,dt \quad (n \ge 1), \qquad a_0 = \frac{i}{2\kappa_0 b}\int_S J(t)\,dt.$$

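The application of the boundary condition that $E_y$ vanishes on $S$ gives
$$1 + \sum_{n=0}^{\infty}a_n\cos\frac{n\pi y}{b} = 0 \quad (y \text{ in } S),$$
whence
$$1 + a_0 + \sum_{n=1}^{\infty}\frac{i\kappa_n}{\kappa_0^2 b}\int_S J(t)\cos\frac{n\pi y}{b}\cos\frac{n\pi t}{b}\,dt = 0 \quad (y \text{ in } S).$$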



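Putting $J(t) = \kappa_0^2 b(1 + a_0)g_1(t)$, we obtain
$$1 + \sum_{n=1}^{\infty}i\kappa_n\int_S g_1(t)\cos\frac{n\pi t}{b}\cos\frac{n\pi y}{b}\,dt = 0 \quad (y \text{ in } S) \qquad (4.100)$$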

as the integral equation to determine $g_1$ and hence the field. In this case
$$a_0 = \tfrac12 i(1 + a_0)\kappa_0\int_S g_1(t)\,dt$$
and
$$Z_0/Z = -i\kappa_0\int_S g_1(t)\,dt. \qquad (4.101)$$

Again, it is clear that $g_1$ is real and $Z$ negative imaginary. Variational principles of a rather different type from the ones already discussed can be derived for both (4.98) and (4.100). However, they are particular cases of a more general theory, so their consideration will be temporarily postponed.
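Equation (4.98), in the form given above, can be explored numerically. The following is a minimal sketch under stated assumptions, not the author's procedure. It applies a Galerkin method with $g$ piecewise constant on the aperture and truncates the modal sum at a finite number of terms. The geometry (metal strip $S = (0, d)$, aperture $A = (d, b)$), the guide dimensions and the wavenumber are hypothetical values, chosen so that only the fundamental mode propagates. The sketch illustrates the two conclusions drawn above: $g$ comes out real, and $\int_A g\,dt$ is positive.

```python
import numpy as np


def solve_aperture_equation(a=1.0, b=0.5, k=4.0, d=0.25, cells=40, modes=2000):
    """Galerkin solution of (4.98) with g piecewise constant on the aperture.

    Hypothetical geometry: metal strip S = (0, d), aperture A = (d, b).
    Returns kappa_0, the cell values of g, and the integral of g over A.
    """
    if k <= np.pi / a:
        raise ValueError("the fundamental mode must propagate (k > pi/a)")
    kappa0 = np.sqrt(k**2 - (np.pi / a) ** 2)
    n = np.arange(1, modes + 1)
    arg = (np.pi / a) ** 2 + (n * np.pi / b) ** 2 - k**2
    if arg[0] <= 0:
        raise ValueError("a higher mode propagates; choose a smaller k")
    ikappa = np.sqrt(arg)                  # i*kappa_n, real and positive for n >= 1

    edges = np.linspace(d, b, cells + 1)
    h = edges[1] - edges[0]                # cell width
    # C[n-1, j] = exact integral of cos(n*pi*t/b) over cell j
    s = np.sin(np.outer(n, edges) * np.pi / b)
    C = (b / (n[:, None] * np.pi)) * (s[:, 1:] - s[:, :-1])
    # Galerkin matrix M[i, j] = sum_n C[n, i] C[n, j] / (i*kappa_n): real and symmetric
    M = (C.T / ikappa) @ C
    rhs = np.full(cells, h / kappa0)       # integral of 1/kappa_0 over each cell
    g = np.linalg.solve(M, rhs)
    return kappa0, g, h * g.sum()


kappa0, g, integral_g = solve_aperture_equation()
print(f"kappa_0 = {kappa0:.4f}, integral of g over A = {integral_g:.4f}")
```

The Galerkin matrix is real, symmetric and positive definite. Hence $g^{T}Mg = (h/\kappa_0)\sum_j g_j$, with $h$ the cell width, forces the integral of $g$ to be positive, mirroring the argument in the text. Refining `cells` and `modes` is the natural convergence check.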

4.11 Another form of variational principle

In operator form the integral equations are
$$Tg = f \qquad (4.102)$$
where $T$ is a linear operator. For the impedance in (4.99) and (4.101) it is sufficient to determine the integral of $g$ rather than $g$ itself. In the present language, this means that the determination of $(g, h)$, where $h$ is known, will be enough for our purposes. In fact, taking $h$ as unity would be adequate for (4.99) and (4.101). Let $g'$ be such that
$$T^A g' = h. \qquad (4.103)$$
Then
$$(g, h) = (g, T^A g') = (Tg, g') = (f, g') \qquad (4.104)$$
which may be viewed as a sort of reciprocity theorem. On account of (4.104) we have
$$(g, h) = \frac{(g, h)(f, g')}{(Tg, g')}$$
which will now be demonstrated to have variational properties. Make a variation in the expression on the right-hand side by replacing $g$ by $g + \varepsilon g_0$. Here $\varepsilon$ is considered to be small, and $g_0$ is any element in the space under consideration so long as $g + \varepsilon g_0$ is in the space. The right-hand side becomes
$$\frac{(f, g')\{(g, h) + \varepsilon(g_0, h)\}}{(Tg, g') + \varepsilon(Tg_0, g')}$$
…

… $x \succ y$ or $y \prec x$ to indicate that the vector $x$ is as good as or better than $y$. Obviously, it is desirable to have $x$ as good as itself, i.e. $x \succ x$ should be true. Also, if $x$ is better than $y$ and $y$ is better than $z$, it ought to follow that $x$ is better than $z$. That is, $x \succ y$ and $y \succ z$ should imply $x \succ z$. Such a relation is said to be reflexive and transitive. A further desirable property is that the pair $x \succ y$ and $y \succ x$ should enforce $x = y$.

Usually, in optimization, examples are based on spaces called cones. A convex cone $C$ is a convex set such that, if $x$ is a member of $C$ and the scalar $\alpha \ge 0$, then $\alpha x$ is also a member of $C$. It might happen that both $x$ and $-x$ were members of $C$, which would be undesirable from the present point of view. Therefore, a convex semi-cone is introduced: a cone in which there is no $x$ (except $x = 0$) such that $x$ and $-x$ are both members. If the linear space $X$ contains a convex semi-cone $C$, the relation $x \succ y$ can be defined by saying that $x \succ y$ when $x - y$ is a member of $C$. An easy check confirms that the properties set out above follow from this definition. In the plane of points $(x_1, x_2)$ a convex semi-cone is formed by the points for which $x_1 \ge 0$ and $x_2 \ge 0$ are both true. Then $x \succ y$ when $x_1 \ge y_1$ and $x_2 \ge y_2$ are both valid. An example which does not involve points is provided by the $n \times n$ positive semi-definite real matrices, which form a convex semi-cone in the space of real symmetric $n \times n$ matrices. Then $x \succ y$ when $x - y$ is positive semi-definite.
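The stationary character of $(g, h)(f, g')/(Tg, g')$ from §4.11 can be checked in finite dimensions. There $T$ is a matrix, $(u, v) = v^{T}u$ for real vectors, and $T^A$ is the transpose. The sketch below is an illustration with hypothetical random data, not an example from the text.

Perturbing $g$ alone leaves the expression unchanged when $g'$ is exact, since $(Tx, g') = (x, h)$. The sketch therefore perturbs both $g$ and $g'$ by errors of size $\varepsilon$. The direct estimate $(g + \varepsilon g_0, h)$ is then wrong by $O(\varepsilon)$, whereas the variational estimate is wrong only by $O(\varepsilon^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
T = 10.0 * np.eye(n) + rng.normal(size=(n, n))   # non-symmetric, well-conditioned operator
f = np.ones(n) + 0.1 * rng.normal(size=n)
h = np.ones(n)                                   # h = 1, as for the impedance integrals

g = np.linalg.solve(T, f)                        # Tg = f,      (4.102)
gp = np.linalg.solve(T.T, h)                     # T^A g' = h,  (4.103); T^A is the transpose here
exact = g @ h                                    # (g, h) = (f, g'),  (4.104)


def variational(x, xp):
    """(x, h)(f, x') / (Tx, x') for trial vectors x and x'."""
    return (x @ h) * (f @ xp) / ((T @ x) @ xp)


g0 = rng.normal(size=n)                          # error direction in the trial g
g1 = rng.normal(size=n)                          # error direction in the trial g'
errors = {}
for eps in (1e-3, 1e-4):
    direct = abs((g + eps * g0) @ h - exact)
    stationary = abs(variational(g + eps * g0, gp + eps * g1) - exact)
    errors[eps] = (direct, stationary)
    print(f"eps = {eps:g}: direct error {direct:.2e}, variational error {stationary:.2e}")

ratio_direct = errors[1e-3][0] / errors[1e-4][0]   # about 10: first-order error
ratio_var = errors[1e-3][1] / errors[1e-4][1]      # about 100: second-order error
```

Expanding the quotient shows that the first-order terms cancel. This happens because $(Tg_0, g') = (g_0, h)$ and $(Tg, g_1) = (f, g_1)$, which is precisely the reciprocity relation (4.104).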


The interpretation above of the symbol $\succ$ indicates that $x_0$ should be regarded as best in a set of $x$ whenever $x \succ x_0$ entails $x = x_0$. Of course, there may be several best or none at all in a given set. More generally, one can say that a vector $v(t)$ is best at a point $t_0$ when $v(t) \succ v(t_0)$ forces $v(t) = v(t_0)$. By analogy with the terminology for a maximum, one might call $t_0$ a stationary point of $v(t)$.

Consider points $(t_1, t_2)$ in the set $t_1 \ge 0$, $t_2 \ge 0$, $t_1 t_2 \ge 1$. Let $v(t) = (t_1, t_2)^T$. Then $t_0 = (t_1, 1/t_1)$ is a stationary point of $v(t)$ for any finite non-zero $t_1$. For, suppose $v(\tau) \succ v(t_0)$, which means $\tau_1 \le t_1$ and $\tau_2 \le 1/t_1$. Then $\tau_1\tau_2 \le 1$, which is not permitted except with equality. Thus $\tau_2 = 1/\tau_1$, and this substitution in $v(\tau) \succ v(t_0)$ leads to an inconsistency unless $\tau_1 = t_1$, whence $\tau_2 = 1/t_1$. The proof is complete.

This example reveals that there can be an unlimited number of stationary points in vector optimization. Theorems about the existence of stationary points and other properties are available but would take us too far afield. The interested reader is referred to the books already quoted. For an application to an antenna problem see Angell and Kirsch (1992).

4.14 Sobolev spaces

The definition of a one-dimensional Sobolev space was mentioned in §3.1. It is now convenient to say something about generalizations and about a property analogous to that of the gradient in §4.1. Let $\Omega$ be a domain, i.e. an open set in the real Euclidean space $\mathbb{R}^n$. The space $L_p(\Omega)$ with $1 \le p < \infty$ consists of those functions $x(t)$ such that
$$\|x\|_p = \Big\{\int_\Omega |x(t)|^p\,dt\Big\}^{1/p}$$
is finite …

…
$$\lambda_1^{(n)} = \min\frac{\|\sum_{j=1}^{n}t_j\phi_j\|^2}{\sum_{j=1}^{n}|t_j|^2} \qquad (5.12)$$
over vectors with components $t_1, \ldots, t_n$. Evidently $\lambda_1^{(n+1)} \le \lambda_1^{(n)}$, as can be seen by placing $t_{n+1} = 0$. In general $\lambda_m^{(n+1)} \le \lambda_m^{(n)}$ from the minimization procedure, so that $\lambda_m^{(n)}$ for fixed $m$ does not increase as $n$ increases. (Note that this is only true for fixed $m$. A similar process shows that $\lambda_{n+1}^{(n+1)} \ge \lambda_n^{(n)}$, so that $\lambda_n^{(n)}$ cannot decrease.) It follows that if $\lambda_1^{(n)}$ does not become zero as $n$ increases, the eigenvalues of the $n \times n$ matrix of (5.12) are positive for any $n$. For this reason we introduce the following definition.

DEFINITION 5.2a. The system $\{\phi_m\}$ is called strongly minimal if $\lim_{n\to\infty}\lambda_1^{(n)} > 0$.

This definition is appropriate if $\{\phi_m\}$ is an infinite set. If $\{\phi_m\}$ is a finite set with $N$ elements, it will be called strongly minimal if …

… In a similar way to the above
$$\lambda_0\sum_{k=1}^{m}|a_k^{(m)} - a_k^{(n)}|^2 \le \|u_m - u_n\|^2,$$
which ensures … Let $m \to \infty$ with $n$ fixed; then $a_k^{(m)} \to a_k$ and $u_m \to u$. Hence
$$\lambda_0\sum_{k=1}^{\infty}|a_k - a_k^{(n)}|^2 \le \|u - u_n\|^2.$$
Now letting $n \to \infty$ gives the last result of the theorem, and the proof is finished.

5.3 Positive-definite operators

Let $P$ be a positive operator such that $(Pu, u) \ge c\|u\|^2$ for some $c > 0$, so that $(Pu, u) > 0$ if $u \ne 0$. In such a case $P$ will be called a positive-definite operator. Define a new inner product $[\cdot, \cdot]$ by $[u, v] = (Pu, v)$.

With the new inner product comes a norm, to be denoted by $\|\cdot\|_P$, which satisfies
$$\|u\|_P^2 = [u, u] = (Pu, u) = \|P^{1/2}u\|^2$$
where $P^{1/2}$ is the square root of $P$ (§3.6). The Hilbert space based on this new inner product and norm will be designated $H_P$. It consists of all elements in the domain of $P^{1/2}$. It will be assumed that $H_P$ is separable. Starting from the equation
$$Pg = f \qquad (5.18)$$
with an approximation
$$g_n = \sum_{k=1}^{n}a_k^{(n)}\phi_k$$

the Galerkin process leads to
$$\sum_{k=1}^{n}[\phi_k, \phi_j]\,a_k^{(n)} = (f, \phi_j). \qquad (5.19)$$
The right-hand side can be written as $[P^{-1}f, \phi_j]$ or $[g, \phi_j]$, because $P^{-1}$ is a bounded operator since $P$ is positive-definite (§3.3). Eqns (5.19) then become the same as (5.16), with a change of inner product and with the best approximation to $g$ being sought instead of $u$. Hence we deduce from Theorems 5.2c and 5.2d

THEOREM 5.3. If $\{\phi_m\}$ is complete and minimal in $H_P$, there exist constants $a_k$ such that, in the Galerkin process for (5.18),
$$\lim_{n\to\infty}a_k^{(n)} = a_k \qquad (k = 1, 2, \ldots)$$
and the convergence is uniform with respect to $k$ if the biorthonormal set $\{\psi_m\}$ in $H_P$ satisfies $\|\psi_m\|_P \le c$. If $\{\phi_m\}$ is also strongly minimal in $H_P$, then $\sum_{k=1}^{\infty}|a_k|^2 < \infty$ and $\lim_{n\to\infty}\sum_{k=1}^{n}|a_k - a_k^{(n)}|^2 = 0$.

If $\{\phi_m\}$ is non-minimal in $H_P$, the limit of $a_k^{(n)}$ may not exist as $n \to \infty$. Indeed, the values of $a_k^{(n)}$ may oscillate widely as $n$ varies. Since minimality is specified in $H_P$ rather than $H$, it is desirable to have theorems which permit one to move from one space to another. A convenient procedure is by imbedding. While a general definition is available, the following is sufficient for our purposes. If all the elements of $H_1$ lie in $H_2$ and there is a constant $C$ such that
$$\|u\|_2 \le C\|u\|_1 \qquad (5.20)$$
for every $u \in H_1$, $\|\cdot\|_k$ being the norm of $H_k$, then $H_1$ is said to be imbedded in $H_2$.

THEOREM 5.3a. Let $\{\phi_m\}$ lie in $H_1$ and let $H_1$ be imbedded in $H_2$. If $\{\phi_m\}$ is (strongly) minimal in $H_2$, it is (strongly) minimal in $H_1$.

Proof. If $\{\phi_m\}$ is non-minimal in $H_1$, there is a $\phi_j$ such that
$$\Big\|\phi_j - \sum_{k=1,\,k\ne j}^{n}\alpha_k\phi_k\Big\|_1 < \varepsilon$$
for arbitrary positive $\varepsilon$. Hence, from (5.20),
$$\Big\|\phi_j - \sum_{k=1,\,k\ne j}^{n}\alpha_k\phi_k\Big\|_2 < C\varepsilon$$
and $\{\phi_m\}$ is non-minimal in $H_2$. Thus, the theorem concerning minimality is proved.


For strong minimality let $\lambda_m^{(n)}$ and $\mu_m^{(n)}$ be the eigenvalues of the matrices of (5.12) formed in $H_1$ and $H_2$ respectively. Then $\mu_1^{(n)} \ge \mu_0 > 0$. Also, from (5.12),
$$\lambda_1^{(n)} = \min\frac{\|\sum_{j=1}^{n}t_j\phi_j\|_1^2}{\sum_{j=1}^{n}|t_j|^2} = \min\frac{\|\sum_{j=1}^{n}t_j\phi_j\|_1^2}{\|\sum_{j=1}^{n}t_j\phi_j\|_2^2}\cdot\frac{\|\sum_{j=1}^{n}t_j\phi_j\|_2^2}{\sum_{j=1}^{n}|t_j|^2} \ge \frac{\mu_1^{(n)}}{C^2} \ge \frac{\mu_0}{C^2}$$
from (5.20). Consequently, strong minimality in $H_1$ is shown and the proof is complete.

THEOREM 5.3b. If $H_1$ and $H_2$ can be imbedded in each other and $\{\phi_m\}$ is strong in one, it is strong in the other.

Proof. If $\mu_1^{(n)}$ is bounded below, Theorem 5.3a implies that $\lambda_1^{(n)}$ is. So it has only to be demonstrated that $\lambda_n^{(n)}$ is bounded above if $\mu_n^{(n)} < M_0$. Now
$$\lambda_n^{(n)} = \max\frac{\|\sum_{j=1}^{n}t_j\phi_j\|_1^2}{\sum_{j=1}^{n}|t_j|^2} \le C_1^2\max\frac{\|\sum_{j=1}^{n}t_j\phi_j\|_2^2}{\sum_{j=1}^{n}|t_j|^2} \le M_0C_1^2$$
since $\|u\|_1 \le C_1\|u\|_2$ in this case. The theorem is proved.

The essence of the proof in Theorem 5.3b lies in being able to make the assertion (5.20). Now suppose that $H_P$ is contained in $H$. Then, since $P$ is a positive-definite operator, for $u \in H_P$
$$\|u\|_P^2 = (Pu, u) \ge c\|u\|^2,$$
which is an analogue of (5.20). Hence we have from Theorem 5.3a

COROLLARY 5.3a. If $H_P$ is contained in $H$, and $\{\phi_m\}$ is in $H_P$ and is (strongly) minimal in $H$, then $\{\phi_m\}$ is (strongly) minimal in $H_P$.

The advantage of Corollary 5.3a is that minimality need only be demonstrated in $H$. For example, suppose that on the interval $(0, 1)$ we have $Pu = -d^2u/dt^2$ subject to $u(0) = u(1) = 0$. Then
$$[u, v] = \int_0^1\frac{du}{dt}\frac{dv^*}{dt}\,dt$$
and $H_P$ consists of functions which are absolutely continuous, vanish at the endpoints, and have first derivatives of integrable square. Take $H$ as the space of functions of integrable square on $(0, 1)$. Then $\{\sin m\pi t\}$ is in $H_P$ and is orthogonal in $H$, each element having norm $2^{-1/2}$. Accordingly, $\{\sin m\pi t\}$ is strongly minimal in $H_P$. A somewhat more general result can be obtained from the following theorem.
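The quantity $\lambda_1^{(n)}$ in (5.12) is the smallest eigenvalue of the Gram matrix $(\phi_k, \phi_j)$, so strong minimality can be probed numerically. The following sketch is an illustration only. It computes $\lambda_1^{(n)}$ in $H = L^2(0, 1)$ with Gauss-Legendre quadrature for two sets:

- $\{\sin m\pi t\}$, where it stays at $1/2$;
- the hypothetical comparison set $\{t^m(1 - t)\}$, where it falls rapidly as $n$ grows.

The behaviour of the second set is in line with the fact that such polynomial systems are not strongly minimal.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def smallest_gram_eigenvalue(basis, n, nodes=64):
    """lambda_1^(n) of (5.12): smallest eigenvalue of the Gram matrix (phi_k, phi_j) in L2(0, 1)."""
    x, w = leggauss(nodes)                 # Gauss-Legendre nodes and weights on [-1, 1]
    t, w = 0.5 * (x + 1.0), 0.5 * w        # mapped to [0, 1]
    Phi = np.array([basis(m, t) for m in range(1, n + 1)])   # shape (n, nodes)
    gram = (Phi * w) @ Phi.T
    return np.linalg.eigvalsh(gram)[0]     # eigenvalues come back in ascending order


def sine(m, t):
    return np.sin(m * np.pi * t)


def poly(m, t):
    # hypothetical comparison set t^m (1 - t), m = 1, 2, ...
    return t**m * (1.0 - t)


for n in (3, 4, 5):
    print(n, smallest_gram_eigenvalue(sine, n), smallest_gram_eigenvalue(poly, n))
```

The decrease of the second column with $n$ is exactly the monotonicity $\lambda_1^{(n+1)} \le \lambda_1^{(n)}$ noted after (5.12).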


THEOREM 5.3c. If $P$ and $Q$ are positive-definite operators and $H_P$ is contained in $H_Q$, there is a $c > 0$ such that
$$\|u\|_P \ge c\|u\|_Q$$
for $u \in H_P$.

Proof. $P^{-1/2}$ is bounded and self-adjoint, so that its domain is $H$. Hence, for any $u \in H$, $P^{-1/2}u$ is in the domain of $P^{1/2}$ and therefore in the domain of $Q^{1/2}$. Consequently $Q^{1/2}P^{-1/2}u$ is well defined. Also, suppose that $u_n \to 0$ and $Q^{1/2}P^{-1/2}u_n \to v$. Since $P^{-1/2}$ is bounded, $P^{-1/2}u_n \to P^{-1/2}0 = 0$. Because $Q^{1/2}$ is self-adjoint it is closed, and $Q^{1/2}(P^{-1/2}u_n) \to Q^{1/2}0 = 0$. Therefore $v = 0$ and $Q^{1/2}P^{-1/2}$ is closed. Since its domain is $H$, it follows (by the closed graph theorem) that it is bounded. Consequently, there is $c > 0$ such that $\|Q^{1/2}P^{-1/2}u\| \le \|u\|/c$ for $u \in H$. Putting $P^{-1/2}u = w$, so that $w$ is in the domain of $P^{1/2}$, gives $\|Q^{1/2}w\| \le \|P^{1/2}w\|/c$, and the proof is terminated.

An immediate conclusion from Theorem 5.3a is

COROLLARY 5.3c. Under the conditions of Theorem 5.3c, $\{\phi_m\}$ is strongly minimal …

… because the stability theorems rely on the exact calculation of numbers. Should they, for example, exceed the capacity of the machine, numerical instability could arise even though analytical stability would hold. Numerical instability could also occur despite analytical stability when $\lambda_1^{(n)}$ remains positive but becomes too small to be recognized as positive by the computer. Should iterative procedures (§1.13) be adopted for the solution of (5.19) for fixed $n$, it will be necessary to ensure that the spectral radius of the appropriate matrix is less than unity. In certain circumstances it may be possible to relate this to $\lambda_1^{(n)}$ and $\lambda_n^{(n)}$.

Theorems 5.4 and 5.4a provide information about the stability of the numerical process, but make no statement about whether the approximate $Pu_n$ converges to $Pu$. Indeed, there is no guarantee that $Pu_n$ converges to $f$. Actually, if $Pu_n$ converges to $f$ for arbitrary $\{\phi_m\}$, then $P$ must be a bounded operator. Therefore, one can be sure that for unbounded $P$ only special choices of $\{\phi_m\}$ …

THEOREM 5.4b. … $\{\phi_m\}$ consists of the eigenelements of $Q$, then $\|Pu_n - f\| \to 0$ as $n \to \infty$.

Proof. Let $\mu_k$ be the eigenvalue of $Q$ corresponding to $\phi_k$, i.e. $Q\phi_k = \mu_k\phi_k$, and arrange that $\mu_1 \le \mu_2 \le \cdots$. Let $E_n$ be the operator which, for any $v \in H$, gives
$$E_nv = v - \sum_{k=1}^{n}(v, \phi_k)\phi_k.$$

Since the $\{\phi_m\}$ can be taken as orthonormal in $H$, the approximation $u_n$ to the exact solution $u$ of $Pu = f$ can, from (5.16), be expressed as $u - E_nu$. Hence …

Now, for any $v \in H$, $E_n^2v = E_nv$, so that $E_n^2 = E_n$. Also $QE_nv = E_nQv$, so that $QE_n = E_nQ$. Therefore …

Also
$$\|Q^{-1/2}E_nv\|^2 = \sum_{k=n+1}^{\infty}|(v, \phi_k)|^2/\mu_k \le \|v\|^2/\mu_{n+1}$$



with equality when $v = \phi_{n+1}$. Hence $\|Q^{-1/2}E_n\| = 1/\mu_{n+1}^{1/2}$ and
$$\|u - u_n\|_P \le \|P^{1/2}Q^{-1/2}\|\,\|E_nQu\|/\mu_{n+1}^{1/2}.$$
Consequently
$$\|Q^{1/2}(u - u_n)\| \le \|Q^{1/2}P^{-1/2}\|\,\|P^{1/2}(u - u_n)\| \le K\|E_nQu\|/\mu_{n+1}^{1/2}$$
where $K = \|Q^{1/2}P^{-1/2}\|\,\|P^{1/2}Q^{-1/2}\|$. But
$$Q(u - u_n) = Q(I - E_n + E_n)(u - u_n) = Q^{1/2}(I - E_n)Q^{1/2}(u - u_n) + QE_nu$$
since $Q^{1/2}E_n = E_nQ^{1/2}$ and $E_nu_n = 0$. Moreover $\|Q^{1/2}(I - E_n)\| = \mu_n^{1/2}$, as may be proved in a similar way to that for $Q^{-1/2}E_n$. So
$$\|Q(u - u_n)\| \le K(\mu_n/\mu_{n+1})^{1/2}\|E_nQu\| + \|E_nQu\| \le (K + 1)\|E_nQu\|$$
because $\mu_n \le \mu_{n+1}$. Thus
$$\|Pu_n - f\| = \|P(u_n - u)\| \le \|PQ^{-1}\|(K + 1)\|E_nQu\|.$$
Since $\|E_nv\| \to 0$ as $n \to \infty$ for any fixed $v \in H$, because $\{\phi_k\}$ is complete, the theorem is proved.

The main difficulty in the application of this theorem is the discovery of an operator $Q$ with the relevant properties. Obviously, if $P$ itself has a discrete spectrum, the appropriate choice is $Q = P$. Theorem 5.4b then assures us that the Galerkin process with the eigenelements of $P$ will converge to the correct solution. Furthermore, the fact that $\{\phi_m\}$ is orthonormal makes certain, via Theorems 5.3a, 5.4, and 5.4a, that the numerical procedures are stable. In general, there are no rules for determining $Q$, and each case has to be treated on its merits. As an illustration, consider the problem of solving

$$-\frac{d}{dt}\Big(p(t)\frac{du}{dt}\Big) + q(t)u = f(t) \qquad (5.26)$$
on $0 \le t \le 1$ subject to the boundary conditions $u(0) = 0$ and $u(1) = 0$. Define
$$Pu = -\frac{d}{dt}\Big(p(t)\frac{du}{dt}\Big) + q(t)u.$$
Then $P$ is positive definite, with
$$\|u\|_P^2 = \int_0^1\{p(t)|du/dt|^2 + q(t)|u|^2\}\,dt,$$
if $p(t) \ge p_0 > 0$ and $q$ always exceeds $-\nu_1$, where $\nu_1$ is the smallest eigenvalue of $P$ when $q \equiv 0$.


In this case $H_P$ consists of functions which are absolutely continuous for $0 \le t \le 1$, vanish at the endpoints, and have first derivatives of integrable square. In contrast, the domain of $P$ coincides with the functions which vanish at the endpoints, have absolutely continuous first derivatives, and have second derivatives of integrable square. The choice to make in these circumstances is $Qu = -d^2u/dt^2$, for then the domain of $Q$ is the same as that of $P$, because $q$ plays no role in the specification of the domain of $P$. Also, the spectrum of $Q$ with the boundary conditions $u(0) = u(1) = 0$ is discrete, with eigenelements proportional to $\sin n\pi t$. It follows that an expansion in terms of $\{\sin m\pi t\}$ will provide a Galerkin process which converges in a numerically stable way to the solution of (5.26). Indeed, rather more can be said, because Theorem 5.4b asserts that

$$\|p(u_n'' - u'') + p'(u_n' - u') - q(u_n - u)\| \to 0$$
as $n \to \infty$. Since $\|u_n - u\| \to 0$ and $\|u_n' - u'\| \to 0$, it follows that $\|u_n'' - u''\| \to 0$ because $p$ is bounded below. So the derivatives of the approximate solution converge to the derivatives of the exact solution. The analogous result in higher dimensions is that, for the elliptic equation
$$-\sum_{j=1}^{r}\sum_{k=1}^{r}\frac{\partial}{\partial x_j}\Big(p_{jk}\frac{\partial u}{\partial x_k}\Big) + qu = f \qquad (5.27)$$
subject to $u = 0$ on the boundary, a suitable …

… $> 0$. Then $c_1$ and $c_2$ become $c_1/c$ and $c_2/c$ respectively. By picking $c = 2c_1c_2/(c_1 + c_2)$ the right-hand side of (5.28) attains its lowest value and

II

Uo -

(c,

+ C2)U 1 II

2CtC2

Q

~

(C2 -

c 1)/Iu 1 1I Q

2C 1C 2

(5.29)

.

By interchanging the roles of P and Q we can obtain a similar inequality involving II lip. Exercises

6. If $\|D^{(n)}\| \le C_1(\lambda_1^{(n)})^{1+\nu}$ and $\|b^{(n)} - \ldots\| \le C_2(\lambda_1^{(n)})^{1/2+\nu}$ with $\nu > 0$, and $\{\phi_m\}$ is not strongly minimal, show that, even if … is non-zero, … vanishes in the limit.

… $\|B - A_n\|_\infty \to 0$ as $n \to \infty$. Now, for any $n \times n$ matrix $C$, … ($\max|a_j| \le 1$). By taking $a_j = \pm 1/\beta$ according as $\sum_{i=1}^{n}C_{ij}\phi_i$ is positive or negative, we conclude that … With … we see, by choosing $t = t_i$ ($i = 1, \ldots, n$), that … whence …


Therefore, on taking $C = (I - A_n)^{-1}$,
$$\|(I - A_n)^{-1}\|_\infty \le \beta\|(I - A_n)^{-1}\|_\phi \le \beta\|(I - F_nK)^{-1}F_n\| \le \beta\|(I - F_nK)^{-1}\|\,\|F_n\|,$$
the second step following by (5.37). But $\|F_n\|$ is bounded, and so is $\|(I - F_nK)^{-1}\|$ for large enough $n$ (by (5.33)). Consequently $\|(I - A_n)^{-1}\|_\infty$ is bounded for large enough $n$. An immediate deduction is that, for sufficiently large $n$,
$$\|(I - A_n)^{-1}(B - A_n)\|_\infty < 1$$
because $\|B - A_n\|_\infty \to 0$. Since
$$(I - B)^{-1} - (I - A_n)^{-1} = \{I - (I - A_n)^{-1}(B - A_n)\}^{-1}(I - A_n)^{-1}(B - A_n)(I - A_n)^{-1},$$
it follows that $(I - B)^{-1}$ exists when $n$ is big enough and
$$\|(I - B)^{-1} - (I - A_n)^{-1}\|_\infty \to 0$$
as $n \to \infty$. The point-matching approximation is $\sum_{i=1}^{n}c_i\phi_i$, where $c_i$ replaces $u(t_i)$ in (5.42). Hence
$$\|a^{(n)} - c\|_\infty = \|\{(I - B)^{-1} - (I - A_n)^{-1}\}f\|_\infty \le \|(I - B)^{-1} - (I - A_n)^{-1}\|_\infty\|f\|_\infty \qquad (5.45)$$

and the right-hand side tends to zero as $n \to \infty$ by what has just been proved. Since we know that $u_n \to u$, the proof is terminated.

It will be noted that the interpolation formula used in the theorem is not necessarily the same as (5.41). On the other hand, the proof of the theorem demonstrates that replacing the coefficients of (5.42) by the more complicated ones of (5.43) makes little difference to the approximation when $n$ is large enough. The inequality (5.45) provides a measure of the difference. Whether one or the other is more accurate for smaller values of $n$ is more difficult to determine, since the theorem does not supply a bound on the error between the exact and approximate solutions. However, an error bound involving $u_n$ can be derived from (5.32). For we have seen that $\|(I - F_nK)^{-1}\|$ is bounded, uniformly in $n$, by $B_1$ (say), and so
$$\|u - u_n\| \le B_1\|F_nu - u\|. \qquad (5.46)$$
Although the right-hand side of (5.46) embodies the unknown exact solution $u$, it may be possible to make estimates for it which render (5.46) useful.

The main burden in implementing Theorem 5.6 is proving that $\|F_nv - v\| \to 0$, since most of the other conditions are usually verified readily. As a concrete example, let the $t_j$ be equally spaced throughout the interval; specifically, pick $t_j = a - h + jh$ with $h = (b - a)/(n - 1)$. A possible choice for $\phi_j$ is provided



by linear interpolation, namely
$$\phi_j(t) = \begin{cases}(t - t_{j-1})/h & (t_{j-1} \le t \le t_j),\\ (t_{j+1} - t)/h & (t_j \le t \le t_{j+1}),\end{cases}$$
and zero elsewhere, with obvious adjustments for $\phi_1$ and $\phi_n$. The total interval where $\phi_j$ is non-zero is $2h$, which tends to zero as $n \to \infty$, so the understood assumption of Theorem 5.6 is met. Also, from (5.44), $\alpha_j = h$. Moreover, $\|F_n\| = 1$. Further, the maximum condition in Theorem 5.6 is clearly satisfied with $\beta = 1$. Finally,

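$$\|F_nv - v\| = \max_{a \le t \le b}\Big|\sum_{j=1}^{n}v(t_j)\phi_j(t) - v(t)\Big|, \qquad (5.47)$$
and, for $t_i \le t \le t_{i+1}$,
$$\Big|\sum_{j=1}^{n}v(t_j)\phi_j(t) - v(t)\Big| = \frac{|(t_{i+1} - t)\{v(t_i) - v(t)\} + (t - t_i)\{v(t_{i+1}) - v(t)\}|}{h} \le \max\{|v(t_i) - v(t)|, |v(t_{i+1}) - v(t)|\}. \qquad (5.48)$$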

Since $t_{i+1} - t_i \to 0$ as $n \to \infty$ and $v$ is continuous, the right-hand side of (5.48), and hence of (5.47), tends to zero as $n \to \infty$. Thus Theorem 5.6 is applicable when $\phi_j$ is given by linear interpolation and $\alpha_j = h$.

Better results can be obtained if one is prepared to assume that $v$ has a continuous second derivative. In this case choose any fixed $t_0$ such that $t_j < t_0 < t_{j+1}$ and consider
$$\frac{h\{v(t) - v(t_j)\} - (t - t_j)\{v(t_{j+1}) - v(t_j)\}}{h} - \frac{(t - t_j)(t - t_{j+1})}{(t_0 - t_j)(t_0 - t_{j+1})}\cdot\frac{h\{v(t_0) - v(t_j)\} - (t_0 - t_j)\{v(t_{j+1}) - v(t_j)\}}{h}.$$
This function vanishes for $t = t_j, t_0, t_{j+1}$. Therefore, by Rolle's theorem, its derivative is zero at $t = c_1, c_2$, where $t_j < c_1 < t_0 < c_2 < t_{j+1}$. Hence the second derivative vanishes at $t = c$, where $c_1 < c < c_2$. Accordingly
$$(t_0 - t_j)(t_0 - t_{j+1})v''(c) - \frac{2}{h}\big[h\{v(t_0) - v(t_j)\} - (t_0 - t_j)\{v(t_{j+1}) - v(t_j)\}\big] = 0.$$
Since $t_0$ was selected arbitrarily, we can say that

$$\sum_{k}v(t_k)\phi_k(t) - v(t) = -\tfrac12(t - t_j)(t - t_{j+1})v''(c) \qquad (t_j \le t \le t_{j+1}),$$
where $c$, which depends on $t$, lies between $t_j$ and $t_{j+1}$ …

… $(r_2, LTr_1) = (r_2, LTr_1 + \beta_2Lp_1) = (r_2, Lp_2) = 0$ by (5.74). Therefore, for $n = 2$, the relations are verified. Now suppose they are true for $n = 2, 3, \ldots, N$. Then $(Lp_{N+1}, Lp_N) = 0$ by (5.76) and, for $m = 1, \ldots, N - 1$,
$$(Lp_{N+1}, Lp_m) = (LTr_N + \beta_{N+1}Lp_N, Lp_m) = (LTr_N, Lp_m) = (LTr_N, (r_{m-1} - r_m)/\alpha_m) = 0$$
by assumption. Further,
$$(r_{N+1}, Lp_N) = (r_N - \alpha_{N+1}Lp_{N+1}, Lp_N) = 0$$
and
$$(r_{N+1}, LTr_N) = (r_{N+1}, LTr_N + \beta_{N+1}Lp_N) = (r_{N+1}, Lp_{N+1}) = 0$$
by (5.74) and (5.76). Moreover, for $m = 0, 1, \ldots, N - 1$,
$$(r_{N+1}, LTr_m) = (r_N - \alpha_{N+1}Lp_{N+1}, LTr_m) = -\alpha_{N+1}(Lp_{N+1}, Lp_{m+1} - \beta_{m+1}Lp_m) = 0$$
by what has been proved already. Hence, the relations are valid for $N + 1$ if they are for $N$. Since they hold for $n = 2$, induction completes the proof.


Convergence is covered by

THEOREM 5.7a. If $\alpha_m \ne 0$, then $\|r_m\| < \|r_{m-1}\|$. If, further, $LT$ has a bounded inverse such that
$$|(y, LTy)|\,\|(LT)^{-1}\| \ge \|y\|^2,$$
then
$$\|r_m\|^2 \le \big[1 - \{\|(LT)^{-1}\|\,\|LT\|\}^{-2}\big]\|r_{m-1}\|^2.$$

Proof.
$$\|r_m\|^2 = (r_m, r_{m-1} - \alpha_mLp_m) = (r_m, r_{m-1}) = \|r_{m-1}\|^2 - |(r_{m-1}, Lp_m)|^2/\|Lp_m\|^2$$
from (5.75). The first part of the theorem has been demonstrated. For the second part,
$$|(r_{m-1}, Lp_m)|^2 = |(r_{m-1}, LTr_{m-1})|^2 \ge \|r_{m-1}\|^4/\|(LT)^{-1}\|^2$$
by hypothesis. Also
$$\|Lp_m\|^2 = (Lp_m, LTr_{m-1}) = \|LTr_{m-1}\|^2 + \beta_m(Lp_{m-1}, LTr_{m-1}) = \|LTr_{m-1}\|^2 - |(Lp_{m-1}, LTr_{m-1})|^2/\|Lp_{m-1}\|^2$$
by (5.77). Thus
$$\|Lp_m\|^2 \le \|LTr_{m-1}\|^2 \le \|LT\|^2\|r_{m-1}\|^2,$$
and the inequality of the theorem now follows. The proof is finished.

The inequality of Theorem 5.7a suggests that a good convergence rate is achieved by making $\|(LT)^{-1}\|\,\|LT\|$ as close to unity as is feasible. The straightforward choices $T = I$ and $T = L^A$ can usually be improved on. For instance, $T = PP^AL^A$ gives $\|LT\| = \|LP\|^2 = \|P^AL^A\|^2$ (§3.3), and one can aim to have $P^AL^A$ or $LP$ a reasonable approximation to the identity. This choice also ensures the non-vanishing of $\alpha_m$. For further details of practical applications see van den Berg (1984), van den Berg and Kleinman (1988), Kleinman et al. (1990), Zwamborn and van den Berg (1991), Sarkar (1991), and Xu (1992).
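The recurrences appearing in the proofs above can be coded directly for a real matrix $L$. They are $r_m = r_{m-1} - \alpha_mLp_m$ and $p_{m+1} = Tr_m + \beta_{m+1}p_m$, with $\alpha_m = (r_{m-1}, Lp_m)/\|Lp_m\|^2$ and $\beta_{m+1} = -(LTr_m, Lp_m)/\|Lp_m\|^2$. Setting $x_m = x_{m-1} + \alpha_mp_m$ keeps $r_m = f - Lx_m$.

The sketch below rests on assumptions not visible in this excerpt:

- the starting values $x_0 = 0$, $r_0 = f$, $p_1 = Tr_0$;
- the choice $T = L^A$ (the transpose), for which $LT = LL^A$ is self-adjoint.

It checks that $\|r_m\|$ never increases, and that each step respects the contraction factor of Theorem 5.7a.

```python
import numpy as np


def residual_iteration(L, f, T, max_iter, tol=1e-12):
    """Residual-reducing iteration built from the recurrences of this section."""
    x = np.zeros_like(f)
    r = f.copy()                           # r_0 = f - L x_0 with x_0 = 0 (assumed start)
    p = T @ r                              # p_1 = T r_0 (assumed start)
    norms = [np.linalg.norm(r)]
    for _ in range(max_iter):
        Lp = L @ p
        alpha = (r @ Lp) / (Lp @ Lp)       # alpha_m = (r_{m-1}, L p_m) / ||L p_m||^2
        x = x + alpha * p                  # keeps r_m = f - L x_m
        r = r - alpha * Lp                 # r_m = r_{m-1} - alpha_m L p_m
        norms.append(np.linalg.norm(r))
        if norms[-1] <= tol * norms[0]:
            break
        LTr = L @ (T @ r)
        beta = -(LTr @ Lp) / (Lp @ Lp)     # beta_{m+1} = -(L T r_m, L p_m) / ||L p_m||^2
        p = T @ r + beta * p               # p_{m+1} = T r_m + beta_{m+1} p_m
    return x, np.array(norms)


rng = np.random.default_rng(1)
n = 8
L = 10.0 * np.eye(n) + rng.normal(size=(n, n))
f = rng.normal(size=n)
x, norms = residual_iteration(L, f, T=L.T, max_iter=4 * n)

# Contraction factor of Theorem 5.7a when LT = L L^T: 1 - (sigma_min / sigma_max)^4
sv = np.linalg.svd(L, compute_uv=False)
bound = 1.0 - (sv[-1] / sv[0]) ** 4
ratios = (norms[1:] / norms[:-1]) ** 2
print(f"{len(norms) - 1} steps, final residual {norms[-1]:.1e}, bound {bound:.3f}")
```

With $T = L^A$ the scheme amounts to a conjugate-residual iteration on $LL^Ay = f$ with $x = L^Ay$. The observed per-step reductions are typically much better than the worst-case factor of the theorem.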

Exercises

15. A perfectly conducting lamina of sides $2a$ and $2b$ is maintained at unit potential. The total charge density $\sigma$ satisfies
$$\int_{-a}^{a}\int_{-b}^{b}\frac{\sigma(s, t)\,ds\,dt}{\{(x - s)^2 + (y - t)^2\}^{1/2}} = 1$$
on the lamina. Find $\sigma$ by collocation, approximating $\sigma$ by piecewise constant functions, and determine the total charge on the lamina.


If the potential is represented by the double layer
$$\frac{1}{2\pi}\int v\,\frac{\partial}{\partial n}\Big(\frac{1}{r}\Big)\,dS,$$
where $r$ is the distance from a point on the lamina to the point of observation, the integral equation
$$\frac{1}{2\pi}\int v\,\frac{\partial}{\partial n}\Big(\frac{1}{r}\Big)\,dS \;\ldots\; v = 1$$
results. Compare the total charge obtained by this and by a variational method with the previous one (Noble 1960, 1971).

16. Investigate the potential of a rectangular solid in a similar way to Exercise 15.

17. Construct a sequence $\{x_m\}$ with $x_m \to x$ on the same lines as in the text, but using the inner product $(\cdot, \cdot)$ and norm $\|x\|^2 = (x, x)$. The conditions to be satisfied are $(x - x_m, p_m) = 0$ and $(p_m, p_{m-1}) = 0$. Show that, for $n = 2, 3, \ldots$, $(p_n, p_m) = 0$ for $m = 1, \ldots, n - 1$ and $(x - x_n, Tr_m) = 0$ for $m = 0, 1, \ldots, n - 1$. Show also that
$$\alpha_m = (x - x_{m-1}, p_m)/\|p_m\|^2 = (L^{-1}r_{m-1}, Tr_{m-1})/\|p_m\|^2, \qquad \beta_m = -(Tr_{m-1}, p_{m-1})/\|p_{m-1}\|^2.$$
Deduce that $\|x - x_m\| < \|x - x_{m-1}\|$ if $\alpha_m \ne 0$.

18. Solve the integral equations in Exercises 15 and 16 by iteration.
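One way of attacking the first part of Exercise 15 is sketched below. It is an illustration under stated assumptions, not a solution from the text:

- the lamina is taken to be square with $a = b = 1/2$ (a hypothetical choice);
- $\sigma$ is constant on each of $N \times N$ square cells, and the equation is imposed at the cell centres;
- the self-cell integral uses the exact value $4h\ln(1 + \sqrt2)$ of $\iint dx\,dy/r$ over a square of side $h$ about its centre;
- all other cells use the one-point (midpoint) rule.

The total charge is the sum of $\sigma$ times the cell areas.

```python
import numpy as np


def lamina_charge(a=0.5, b=0.5, N=20):
    """Point collocation for Exercise 15 with sigma constant on each of N x N cells."""
    if not np.isclose(a, b):
        raise ValueError("this sketch assumes a square lamina so that the cells are square")
    h = 2.0 * a / N
    centres = -a + (np.arange(N) + 0.5) * h
    X, Y = np.meshgrid(centres, centres, indexing="ij")
    x, y = X.ravel(), Y.ravel()
    r = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    np.fill_diagonal(r, 1.0)                   # avoid 0/0; the diagonal is replaced below
    A = h * h / r                              # one-point rule for the other cells
    np.fill_diagonal(A, 4.0 * h * np.log(1.0 + np.sqrt(2.0)))  # exact self-cell integral of 1/r
    sigma = np.linalg.solve(A, np.ones(N * N)) # unit potential at every cell centre
    return sigma.reshape(N, N), h * h * sigma.sum()


sigma, Q = lamina_charge()
print(f"total charge on the unit square lamina: {Q:.4f}")
```

Increasing `N` and watching `Q` settle is the simplest convergence check. The computed density is largest at the corners, as expected for the edge behaviour of a charged plate.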

NUMERICAL TRIAL FUNCTIONS

5.8 Finite elements

One way of constructing trial functions is the method of finite elements. In this method the domain under consideration is split up into a finite number of elements, and on each element an approximation is prepared which depends upon a number of parameters. These parameters are then determined by one of the procedures already described. In a sense, therefore, the method is subsumed under the foregoing. However, the name is usually reserved for approximants which are polynomials and for elements of particular shape. These are usually triangles or rectangles in two dimensions, and tetrahedra or hexahedra (i.e. solids with six faces) in three dimensions. As a rule, every effort is made to have the sides of the triangle straight or the faces of the tetrahedron flat in the respective cases. This is not always feasible when the element is adjacent to a boundary, and triangles with a single curved side may have to be accepted. It is not, however, possible to organize a tetrahedral division in three dimensions so that no tetrahedron has more than one curved side. When dealing with a triangular division in two dimensions, it is common practice to regard each triangle as having been transformed into a standard triangle. By this device it is only necessary to quote results for the standard triangle, the formulae for the original division being obtained by transformation.

Fig. 5.1. The standard triangle (vertex 1 at (1, 0), vertex 2 at (0, 1), vertex 3 at the origin of the (p, q) plane).

The standard triangle is taken to be right-angled, with the two sides adjoining the right angle of unit length (Fig. 5.1), and placed in the $(p, q)$ plane. A triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ in the $(x, y)$ plane can be gained from the standard triangle by the transformation
$$x = x_3 + (x_1 - x_3)p + (x_2 - x_3)q, \qquad y = y_3 + (y_1 - y_3)p + (y_2 - y_3)q. \qquad (5.78)$$
The inverse transformation is
$$Sp = x_2y_3 - x_3y_2 + x(y_2 - y_3) - y(x_2 - x_3), \qquad Sq = x_3y_1 - x_1y_3 + x(y_3 - y_1) - y(x_3 - x_1) \qquad (5.79)$$
where
$$S = (x_1 - x_3)(y_2 - y_3) - (x_2 - x_3)(y_1 - y_3) \qquad (5.80)$$
is, apart from sign, twice the area of the triangle in the $(x, y)$ plane. So long as the triangle is genuine, i.e. its vertices are not collinear, $S$ is non-zero and $p$, $q$ can be determined. When the same basis functions are used both for interpolating the function and for describing the geometry, $p$ and $q$ are often known as isoparametric co-ordinates.

Let the vertices of the standard triangle be numbered 1, 2, 3 as shown in Fig. 5.1. Suppose that it is desired to interpolate the function $u$ so that it takes the correct values $u_1$, $u_2$, $u_3$ at the vertices. Then the interpolant $U$ is given by
$$U = pu_1 + qu_2 + (1 - p - q)u_3.$$
By means of the transformation (5.79) we can return to the original triangle. On the 23 side of this triangle $U$ will vary linearly from $u_2$ to $u_3$. This will also be true if the same process is carried out for the neighbouring triangle with the same side (Fig. 5.2). Since the linear variation is unique, it follows that the interpolant built in this way is continuous over the whole triangular network. Thus a continuous trial function has been manufactured in which the unknowns are the values of the function at the vertices of the triangles.
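The transformations (5.78) and (5.79) and the linear interpolant lend themselves to a quick check. The sketch below uses a hypothetical triangle and does three things:

- it maps a point of the standard triangle forward with (5.78) and recovers $(p, q)$ with (5.79);
- it confirms that $|S|$ from (5.80) equals twice the area computed by the shoelace formula;
- it verifies that the interpolant reproduces a linear function exactly.

```python
def to_physical(p, q, xv, yv):
    """Transformation (5.78): standard triangle -> triangle with vertices (xv[i], yv[i])."""
    x1, x2, x3 = xv
    y1, y2, y3 = yv
    return x3 + (x1 - x3) * p + (x2 - x3) * q, y3 + (y1 - y3) * p + (y2 - y3) * q


def to_standard(x, y, xv, yv):
    """Inverse transformation (5.79); also returns S of (5.80)."""
    x1, x2, x3 = xv
    y1, y2, y3 = yv
    S = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    if S == 0:
        raise ValueError("vertices are collinear")
    p = (x2 * y3 - x3 * y2 + x * (y2 - y3) - y * (x2 - x3)) / S
    q = (x3 * y1 - x1 * y3 + x * (y3 - y1) - y * (x3 - x1)) / S
    return p, q, S


def linear_interpolant(p, q, u):
    """U = p u1 + q u2 + (1 - p - q) u3 on the standard triangle."""
    return p * u[0] + q * u[1] + (1.0 - p - q) * u[2]


def u_exact(x, y):
    return 2.0 * x - 3.0 * y + 1.0            # any linear function is reproduced exactly


xv, yv = (4.0, 1.0, 0.5), (0.5, 3.0, 0.0)        # hypothetical vertices 1, 2, 3
x, y = to_physical(0.2, 0.3, xv, yv)
p, q, S = to_standard(x, y, xv, yv)
area = 0.5 * abs(xv[0] * (yv[1] - yv[2]) + xv[1] * (yv[2] - yv[0]) + xv[2] * (yv[0] - yv[1]))
u_vertices = [u_exact(xi, yi) for xi, yi in zip(xv, yv)]
print(p, q, S, 2 * area, linear_interpolant(p, q, u_vertices), u_exact(x, y))
```

The same forward and inverse maps are what allow results quoted for the standard triangle to be transferred to every element of the network.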

Fig. 5.2. Linear interpolation on adjacent triangles.

Fig. 5.3. Quadratic interpolation.

Polynomials of higher degree allow unknown values to be inserted at points other than the vertices. For example, with quadratics the mid-points 4, 5, 6 (Fig. 5.3) of the sides of the standard triangle can be allowed for. Now
$$U = p(2p - 1)u_1 + q(2q - 1)u_2 + r(2r - 1)u_3 + 4pqu_4 + 4qru_5 + 4rpu_6 \qquad (5.81)$$
where $r = 1 - p - q$. Again this provides a trial function which is continuous over the whole triangular network. Cubics permit values at the points of trisection of the sides and one at the centroid of the triangle (Fig. 5.4). In this case
$$U = \tfrac12 p(3p - 1)(3p - 2)u_1 + \cdot + \cdot + \tfrac92 pq(3p - 1)u_4 + \tfrac92 pq(3q - 1)u_5 + \cdot + \cdot + \cdot + \cdot + 27pqru_{10}$$

the dots indicating terms derived in an obvious cyclic manner. One may wish to take the values of some derivatives as unknowns instead of relying solely on function values. A quadratic possibility for Fig. 5.3 is
$$\begin{aligned} U ={}& \tfrac12(p + q + p^2 - 2pq - q^2)u_1 + \tfrac12(p + q - p^2 - 2pq + q^2)u_2 + (2pq + r)u_3\\ &+ q(1 - q)\Big(\frac{\partial u}{\partial q}\Big)_6 + p(1 - p)\Big(\frac{\partial u}{\partial p}\Big)_5 - 2^{-1/2}(p + q - p^2 - 2pq - q^2)\Big(\frac{\partial u}{\partial n}\Big)_4 \end{aligned}$$
where $\partial u/\partial n$ is a derivative normal to the 12 side, directed out of the triangle. This formula is based on the simple difference formula for the derivative; thus $(\partial u/\partial q)_6$ arises from the difference of values at $(\tfrac12, \tfrac12)$ and $(\tfrac12, -\tfrac12)$. The significance of the derivatives in the $(x, y)$ plane can, of course, be deduced from (5.78) and (5.79).

Fig. 5.4. Cubic interpolation.

A cubic formula for Fig. 5.4 is
$$\begin{aligned} U ={}& \{p^2(3 - 2p) - 7pqr\}u_1 + p\{(x_1 - x_2)q(r - p) - (x_3 - x_1)r(q - p)\}\Big(\frac{\partial u}{\partial x}\Big)_1\\ &+ p\{(y_1 - y_2)q(r - p) - (y_3 - y_1)r(q - p)\}\Big(\frac{\partial u}{\partial y}\Big)_1 + \cdot + \cdot + \cdot + \cdot + \cdot + 27pqru_{10} \end{aligned} \qquad (5.82)$$

the dots representing terms obtained by cyclic interchange. All the interpolants so far supply continuity over the whole triangular network. If basis functions are required that also have continuous derivatives over the complete network, then it is necessary to have quintic polynomials at the very least. The formulae are lengthy and the reader is referred elsewhere (see, for example, Mitchell and Wait 1977) for details. In three dimensions the natural analogue of the triangle is the tetrahedron, and again a standard tetrahedron can be introduced in a transparent way. Perhaps a more valuable element, however, is the hexahedron. This is first reduced to a standard cube (Fig. 5.5) in the (p, q, r) space by the transformation

$$x = pqr\,x_1 + (1-p)qr\,x_2 + (1-p)(1-q)r\,x_3 + p(1-q)r\,x_4 + pq(1-r)\,x_5 + (1-p)q(1-r)\,x_6 + (1-p)(1-q)(1-r)\,x_7 + p(1-q)(1-r)\,x_8$$

with similar expressions for y and z. An interpolant can be obtained by replacing x by u and x_i by u_i. Higher-order polynomials involving nodes on the faces of the cube can also be derived.

One important problem still deserves attention, namely that of coping with elements with curved sides. In principle, one can conceive of a


,

3

4""-'---.-..-J

7r-----1t-------J--_ 6 q

8

5

p

Fig. S.S. The standard cube.

transformation which converts such a triangle into the standard triangle but that will rarely be feasible in practice. Assume that the 13 and 23 sides of the triangle are straight but that the 12 side is curved. Consider

$$2\delta l = (y_2 - y_3)x - (x_2 - x_3)y + x_2y_3 - x_3y_2, \qquad 2\delta m = (y_3 - y_1)x - (x_3 - x_1)y + x_3y_1 - x_1y_3,$$

where δ is the area of the triangle with straight sides which has the same vertices as the curved triangle; an explicit expression is given in (5.76). Then l = 0 is the straight 23 side and l = 1 at vertex 1. Similarly m = 0 is the 13 side and m = 1 at vertex 2. Therefore, in the (l, m) plane the triangle has sides of unit length along the coordinate axes together with a curved side (Fig. 5.6). The equation of the curved side may be written as f(l, m) = 0. Since f does not vanish at the origin, it may be multiplied by a constant so that f(0, 0) = 1 without altering the curve. We do not wish to contemplate mapping from the (l, m) plane to the standard triangle for arbitrary f(l, m). We can, however, examine what befalls when the standard triangle is moved to the (l, m) plane by the connection

$$l = p + 2(2L - 1)pq, \qquad m = q + 2(2M - 1)pq. \tag{5.83}$$

The sides p = 0 and q = 0 go into the sides l = 0 and m = 0 respectively, the image of a point being at the same distance from the origin. Also the mid-point

Fig. 5.6. Triangle with one curved side.

of the third side of the standard triangle (p = ½, q = ½) transforms into the point (L, M), which can be selected as any convenient point of the third side (Fig. 5.6). In general, the aim will be to choose (L, M) so that the image of the third side of the standard triangle is acceptably close to f(l, m) = 0. However, we can be sure of only three points being obtained exactly. The equation of the image of the third side is

$$\alpha(\alpha + \beta l - \alpha m)^2 + (\alpha + \beta)^2 l = (\alpha + \beta)(1 + \alpha)(\alpha + \beta l - \alpha m)$$

where α = 2(2L − 1) and β = 2(2M − 1). This is a parabola with axis parallel to the line βl = αm. Thus (5.83) approximates the curved side in the (l, m) plane by a parabola. Since the relation between (l, m) and (x, y) is linear, the approximant will differ from the true curved side unless the latter is a parabola in the (x, y) plane. If (5.81) is used as an interpolant, p and q are known as isoparametric coordinates. The trial function is then continuous over a network composed of straight-sided triangles adjacent to triangles with two straight sides and one curved side around the perimeter of the region. The perimeter is, however, being approximated by a series of parabolas. When f is a quadratic polynomial in l and m, an interpolant can be found directly in the (l, m) plane without the intervention of isoparametric coordinates (see Exercise 24). Cubic interpolation can also be employed with isoparametric coordinates. Here Fig. 5.4 is the starting point and

$$l = p + \tfrac92 pq(6l_{10} - l_4 - l_5 - 1) + \tfrac{27}{2}p^2q(l_4 - 2l_{10}) + \tfrac{27}{2}pq^2(l_5 - 2l_{10} + \tfrac13),$$
$$m = q + \tfrac92 pq(6m_{10} - m_4 - m_5 - 1) + \tfrac{27}{2}p^2q(m_4 - 2m_{10} + \tfrac13) + \tfrac{27}{2}pq^2(m_5 - 2m_{10}).$$

The points 1, 2, 3, 6, 7, 8, and 9 in the (p, q) plane go over into the same points in the (l, m) plane. In contrast, 4, 5, and 10 become (l₄, m₄), (l₅, m₅), and (l₁₀, m₁₀). The first two can be chosen to be any convenient pair of points on the curved side, while the third can be made any suitable interior point. Now the approximation to the curved side will be a cubic curve which passes through the two end-points and two other points of the side. In certain circumstances the image of the 12 side has a simpler equation than in general. For instance, this happens when l₄ = l₅ + ⅓, m₅ = m₄ + ⅓ (when it is a parabola), or if l₁₀ = ½l₄, m₁₀ = ½m₅. Isoparametric coordinates are easy to use for curved elements, but they can be dangerous because they replace a bend by a polynomial curve. The deviation may be significant and generate large numerical errors. In fact, very serious discrepancies may be concealed by the simplicity of the approach. One is apt to visualize the triangle of Fig. 5.3 being mapped into a triangle topologically similar to Fig. 5.6, and this need not be so. To see this, evaluate the Jacobian of the transformation (5.83). It will be discovered that

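$$\begin{vmatrix} \partial l/\partial p & \partial l/\partial q \\ \partial m/\partial p & \partial m/\partial q \end{vmatrix} = 1 + \alpha q + \beta p.$$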

The Jacobian is of one sign only if the straight line 1 + αq + βp = 0 does not intersect the standard triangle. The line will have a point in common with the triangle if L ≤ ¼ or M ≤ ¼. When the Jacobian does vanish inside the triangle, the transformation (5.83) is not one-to-one. Indeed, there are points (l, m) for which there are two corresponding points in the standard triangle. As an illustration let L = M = 1/6. Then corresponding to l = ½, m = 0 are p = ½, q = 0 and p = ¾, q = ¼. The second point has p + q = 1 and so lies on the side 12 of the standard triangle. In other words, the curved side in the (l, m) plane cuts the l axis twice, as in Fig. 5.7. Far from having a triangle of the type in Fig. 5.6, we have a figure composed of three separate parts, each a sort of crescent shape. Consequently, there is a need for basis functions which deal with curved sides exactly. Investigations of the geometrical problems involved have been undertaken (Wachspress 1973; McLeod and Mitchell 1972), but it seems that substantial difficulties with integration can arise.

Estimates of the error incurred in the finite-element method are not easy to come by. Frequently, they are stated in terms of Sobolev space (§§3.1 and 4.14), and then only when all the elements have straight sides or flat faces. For example, in two-dimensional (x, y) space the norm of the Sobolev space W^{m,2}

Fig. 5.7. Multiple-valued mapping.

is defined by

$$\|w\|_{m,2}^2 = \sum_{s=0}^{m}\sum_{t=0}^{s}\iint \left|\frac{\partial^s w}{\partial x^t\,\partial y^{s-t}}\right|^2 dx\,dy,$$

the integration being over the domain under consideration. Suppose that u satisfies an elliptic partial differential equation of order 2m or is given by a variational principle of positive-definite type and order m. Then trial functions in W^{m,2} are considered. Suppose also that all the triangles have straight sides and that the trial functions u_n are piecewise polynomials of degree p (p ≥ m) which are complete. Then the rate of convergence satisfies […], where h is a geometrical parameter (effectively the longest side occurring among the elements) which tends to zero as n → ∞. The corresponding result in the L² norm is […], but little is known about the maximum norm. One may wish to employ trial functions that are not in W^{m,2} in order to reduce the constraints in their construction. The finite elements are then said to be non-conforming, and the above error bounds are no longer available. The bounds also assume that integration is carried out exactly. They must therefore be used with circumspection when the evaluation is by numerical quadrature.
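The fold-over of the isoparametric map described above can be checked numerically. The sketch below evaluates the map (5.83) and its Jacobian 1 + αq + βp over a grid covering the closed standard triangle. A negative minimum signals that the map is not one-to-one. The grid resolution and the trial values of L = M are illustrative assumptions, not taken from the text.

```python
import numpy as np

def iso_map(p, q, L, M):
    """Map (5.83) from the standard triangle to the (l, m) plane."""
    alpha, beta = 2.0 * (2.0 * L - 1.0), 2.0 * (2.0 * M - 1.0)
    return p + alpha * p * q, q + beta * p * q

def jacobian(p, q, L, M):
    """Jacobian determinant of (5.83), which reduces to 1 + alpha*q + beta*p."""
    alpha, beta = 2.0 * (2.0 * L - 1.0), 2.0 * (2.0 * M - 1.0)
    return 1.0 + alpha * q + beta * p

def min_jacobian(L, M, n=200):
    """Smallest Jacobian on a grid over p, q >= 0, p + q <= 1 (the closed triangle)."""
    s = np.linspace(0.0, 1.0, n + 1)
    p, q = np.meshgrid(s, s)
    inside = p + q <= 1.0 + 1e-12
    return jacobian(p[inside], q[inside], L, M).min()

if __name__ == "__main__":
    for L in (0.5, 0.3, 0.25, 1.0 / 6.0):
        print(f"L = M = {L:.4f}: minimum Jacobian {min_jacobian(L, L):+.4f}")
    # Two distinct points of the standard triangle sharing one image when L = M = 1/6.
    print(iso_map(0.5, 0.0, 1 / 6, 1 / 6), iso_map(0.75, 0.25, 1 / 6, 1 / 6))
```

The Jacobian is linear in p and q, so its minimum over the triangle sits at a vertex. This is why the threshold in the text depends only on the values 1 + α and 1 + β at the vertices (1, 0) and (0, 1).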

Exercises

19. Show that (5.82) is, on a side of a triangle, a cubic which is uniquely determined by the values of the function and its first partial derivatives at the vertices at the ends of the side. What does this imply about the continuity of the trial function over the triangular network?
20. Find the analogue of (5.78) for transforming a tetrahedron to the standard tetrahedron.
21. If f(l, m) = 0 is a conic, show that

$$f(l, m) = al^2 + blm + cm^2 - (a + 1)l - (c + 1)m + 1$$

to satisfy the normalization conditions. If the conic is a hyperbola with a = c = 0 passing through (L, M), show that b = (L + M − 1)/LM so long as neither L nor M is unity.
22. Suppose W₃(l, m) is unity at (0, 0), varies linearly along l = 0 and m = 0, but vanishes on the curved side of Fig. 5.6. Prove that

$$u = W_1u_1 + W_2u_2 + W_3u_3 + W_4u_4$$

is an interpolant, where

$$W_1 = \frac{(1 - M)l + Lm - L + LW_3}{1 - L - M}, \qquad W_2 = \frac{Ml + (1 - L)m - M + MW_3}{1 - L - M}, \qquad W_4 = \frac{1 - l - m - W_3}{1 - L - M},$$

the point 4 being (L, M). When the curved side is the hyperbola of Exercise 21, prove that W₃ = blm − l − m + 1 and deduce the forms of W₁, W₂, and W₄. Hence demonstrate that this process provides a continuous interpolant across the curved side common to two adjoining triangles.
23. By taking W₃ = 1 − l − m − (a + c − b)lm/(1 − al − cm) in Exercise 22, obtain an interpolant in rational functions for the conic of Exercise 21.
24. For a quadratic interpolant add u₅W₅ + u₆W₆ to the formula of Exercise 22, 5 and 6 being the mid-points of the straight sides. For the hyperbola of Exercise 21 put W₃ = (1 − 2l − 2m)f(l, m) and show that

$$W_1 = l(1 - 2f - m/M), \qquad W_4 = lm/LM, \qquad W_5 = 4mf.$$

25. Study the possibility of a cubic interpolant as in Exercise 24 with the points 5, 6, 7, 8 the points of trisection of the straight sides.
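Exercise 19 turns on the structure of (5.82). As a quick consistency check, the sketch below evaluates (5.82) with the terms obtained by cyclic interchange written out explicitly. Vertex i uses its own areal coordinate in the role of p, and those of the next two vertices (in cyclic order) in the roles of q and r. The check confirms that the formula reproduces a linear function exactly and returns u₁₀ at the centroid. The triangle and the test function are arbitrary choices for illustration.

```python
import numpy as np

def areal(x, y, X, Y):
    """Areal coordinates (p, q, r) of (x, y) relative to vertices (X[i], Y[i])."""
    T = np.array([X, Y, [1.0, 1.0, 1.0]], dtype=float)
    return np.linalg.solve(T, np.array([x, y, 1.0]))

def cubic_582(x, y, X, Y, u, ux, uy, u10):
    """Evaluate the cubic (5.82), writing out the terms obtained by cyclic interchange.

    u, ux, uy hold the value and first derivatives at vertices 1, 2, 3;
    u10 is the value at the centroid."""
    lam = areal(x, y, X, Y)
    total = 27.0 * lam[0] * lam[1] * lam[2] * u10
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3        # cyclic successors of vertex i
        a, b, c = lam[i], lam[j], lam[k]       # roles of p, q, r in (5.82)
        total += (a * (3 * a - 2 * a * a) - 7 * a * b * c) * u[i]
        total += a * ((X[i] - X[j]) * b * (c - a) - (X[k] - X[i]) * c * (b - a)) * ux[i]
        total += a * ((Y[i] - Y[j]) * b * (c - a) - (Y[k] - Y[i]) * c * (b - a)) * uy[i]
    return total

# Usage: a linear function is reproduced exactly.
X, Y = [0.0, 2.0, 0.5], [0.0, 0.3, 1.5]
lin = lambda x, y: 2.0 + 3.0 * x - y
u = [lin(X[i], Y[i]) for i in range(3)]
u10 = lin(sum(X) / 3, sum(Y) / 3)
print(cubic_582(0.7, 0.4, X, Y, u, [3.0] * 3, [-1.0] * 3, u10), lin(0.7, 0.4))
```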

Constructing conforming elements which represent vectors with prescribed continuity of components is a more intricate matter. Some of a group which have been devised by Nedelec (1980, 1982) will be described here. It is convenient to start with the representation for a standard tetrahedron. A caret will be used to distinguish quantities associated with the standard tetrahedron from those for any other tetrahedron. Thus x̂ stands for a vector of coordinates for the standard tetrahedron, while v̂ will be a vector in terms of these coordinates. The vertices of the standard tetrahedron are the origin and (1, 0, 0), (0, 1, 0), (0, 0, 1). The vector v̂ is to be represented by an interpolant V̂ which has continuous tangential components across faces. A representation in terms of polynomials of the first degree is

$$\hat V = a + b \wedge \hat x \tag{5.84}$$

where a and b are constant vectors. The six conditions necessary to determine the coefficients in (5.84) are

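$$a_1 = \int_0^1 \hat v_1(t, 0, 0)\,dt, \qquad a_2 = \int_0^1 \hat v_2(0, t, 0)\,dt, \qquad a_3 = \int_0^1 \hat v_3(0, 0, t)\,dt,$$
$$b_1 = a_2 - a_3 + \int_0^1 \{\hat v_3(0, 1-t, t) - \hat v_2(0, 1-t, t)\}\,dt,$$
$$b_2 = a_3 - a_1 + \int_0^1 \{\hat v_1(t, 0, 1-t) - \hat v_3(t, 0, 1-t)\}\,dt,$$
$$b_3 = a_1 - a_2 + \int_0^1 \{\hat v_2(1-t, t, 0) - \hat v_1(1-t, t, 0)\}\,dt.$$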
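The six conditions translate directly into code. The sketch below assumes the field v̂ is supplied as a Python function of the point x̂. Each edge integral is evaluated by an 8-point Gauss-Legendre rule, which is our choice and is exact for the polynomial fields used in the check. The symbol ∧ is the ordinary cross product. All three b-integrals are taken in the same cyclic pattern; in particular the third uses v̂₂ − v̂₁, which is what makes a field already of the form (5.84) come back unchanged. Applied to the gradient of x̂₁², the same routine returns b = 0, so the interpolant is itself a gradient.

```python
import numpy as np

# 8-point Gauss-Legendre rule mapped to [0, 1]; exact for the linear integrands below.
_nodes, _weights = np.polynomial.legendre.leggauss(8)
_nodes, _weights = 0.5 * (_nodes + 1.0), 0.5 * _weights

def edge_integral(g):
    """Approximate the integral of the scalar function g(t) over 0 <= t <= 1."""
    return sum(wt * g(t) for t, wt in zip(_nodes, _weights))

def nedelec_coefficients(v):
    """Vectors a, b of the interpolant V = a + b ^ x (5.84) on the standard tetrahedron.

    v takes a point x (length-3 array) and returns the field there (length-3 array)."""
    pt = lambda *c: np.array(c, dtype=float)
    a1 = edge_integral(lambda t: v(pt(t, 0, 0))[0])
    a2 = edge_integral(lambda t: v(pt(0, t, 0))[1])
    a3 = edge_integral(lambda t: v(pt(0, 0, t))[2])
    b1 = a2 - a3 + edge_integral(lambda t: v(pt(0, 1 - t, t))[2] - v(pt(0, 1 - t, t))[1])
    b2 = a3 - a1 + edge_integral(lambda t: v(pt(t, 0, 1 - t))[0] - v(pt(t, 0, 1 - t))[2])
    b3 = a1 - a2 + edge_integral(lambda t: v(pt(1 - t, t, 0))[1] - v(pt(1 - t, t, 0))[0])
    return np.array([a1, a2, a3]), np.array([b1, b2, b3])

# Usage: a field already of the form (5.84) is reproduced exactly.
a0, b0 = np.array([1.0, -2.0, 0.5]), np.array([0.3, 1.2, -0.7])
a, b = nedelec_coefficients(lambda x: a0 + np.cross(b0, x))
print(a, b)
```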

The method of Nedelec is not restricted to polynomials of the first degree, but the number of coefficients to be found grows rapidly with degree. For instance, 20 coefficients are needed for quadratic interpolants and 45 for cubics. The tangential components of V̂ on x̂₂ = 0 are V̂₁ = a₁ + b₂x̂₃ and V̂₃ = a₃ − b₂x̂₁. From the definitions of a and b it is evident that these components involve only the tangential components of v̂. Therefore, if the tangential components of v̂ are continuous, an adjacent tetrahedron which has the same face will produce an interpolant which has the same tangential components as V̂. It can be checked that this property is valid for each face, and so (5.84) provides an interpolant with continuous tangential components. Another feature of (5.84) concerns the curl. Since curl(b ∧ x̂) = 2b, curl V̂ = 0 forces b = 0, and then

$$\hat V = \operatorname{grad}(a \cdot \hat x).$$

This result, that V̂ is a gradient when its curl vanishes, carries over to the interpolants of higher degree. Now suppose that a representation is required for v on an arbitrary tetrahedron. The tetrahedron is mapped into the standard tetrahedron by the linear transformation

$$x = B\hat x + c \tag{5.85}$$

where the constant vector c and matrix B are known as soon as the position of the arbitrary tetrahedron is fixed. The matrix B is non-singular so long as the arbitrary tetrahedron is a genuine tetrahedron, i.e. has non-zero volume. The connection between a vector for the arbitrary tetrahedron and its counterpart for the standard tetrahedron is fixed by

$$\hat v = B^{\mathrm T} v. \tag{5.86}$$

The purpose of this relation is to ensure that certain properties are true for both tetrahedra. Let C be the 3 × 3 matrix whose ij component is

$$\frac{\partial v_i}{\partial x_j} - \frac{\partial v_j}{\partial x_i}$$

and Ĉ the corresponding matrix in which v, x are replaced by v̂, x̂ respectively. Then (5.85) and (5.86) imply that

$$\hat C = B^{\mathrm T} C B.$$

Consequently, when the curl of a vector is zero in one of the tetrahedra, the curl of its counterpart in the other tetrahedron also vanishes. If f is a scalar, the gradient of f with respect to x̂ equals Bᵀ grad f. Now let n be the normal to a face of the arbitrary tetrahedron. The gradient relation shows that the normal n̂ to the corresponding face of the standard tetrahedron is parallel to Bᵀn. If t̂ is a vector tangential to that face, so that t̂·n̂ = 0, it follows that t̂·Bᵀn = 0, i.e. Bt̂·n = 0. Thus the corresponding tangent t is parallel to Bt̂. Hence v·t is a scalar multiple of v̂·t̂. In other words, vectors which are tangential to a face of one tetrahedron remain tangential on the other under the transformations (5.85) and (5.86). Therefore, all the properties which have been ascribed to the standard tetrahedron can now be attributed to the arbitrary tetrahedron. To put it another way, an interpolant with continuous tangential components has been obtained for the arbitrary tetrahedron.

Because these elements treat tangential components specially, it is easy to arrange that an interpolant has such components zero on a face. Accordingly, they are very convenient for boundary value problems where n ∧ v = 0 on a surface, provided that the surface can be approximated reasonably by tetrahedra. Nedelec has extended the theory to representations with continuous tangential components on the standard cube (Fig. 5.5) and derived convergence results (see also Girault and Raviart 1986; Monk 1991). Continuity of normal, instead of tangential, components can be secured by replacing (5.84) by

$$\hat V = a + b\,\hat x \tag{5.87}$$

where each of the four conditions to fix a and b (b now being a scalar) is of the form

$$\int (\hat v - \hat V)\cdot\hat n\,dS = 0$$

with the integral over one of the faces. The mapping of the arbitrary tetrahedron to the standard one is still given by (5.85), but the vectors are related by

$$v = B\hat v \tag{5.88}$$

instead of (5.86). It follows then that v̂·n̂ is a scalar multiple of v·n, so that normal components remain normal under the transformation. Furthermore

$$\widehat{\operatorname{div}}\,\hat v = \operatorname{div} v, \tag{5.89}$$

the divergence on the left being taken with respect to x̂.
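The matrix identities behind these transformation rules can be checked with random matrices. The sketch assumes that (5.86) is the covariant rule v̂ = Bᵀv and that (5.88) is the rule v = Bv̂; these are the forms required by the properties quoted in the text (Ĉ = BᵀCB, tangents carried into tangents, div̂ v̂ = div v). It uses linear fields v = Ax + d, so that every derivative is a constant matrix. The random seed and matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # x = B xhat + c, with B non-singular
A = rng.normal(size=(3, 3))                      # linear field v(x) = A x + d, so grad v = A

# Covariant rule vhat(xhat) = B^T v(B xhat + c): its gradient matrix is B^T A B.
grad_cov = B.T @ A @ B
C = A - A.T                                      # C_ij = dv_i/dx_j - dv_j/dx_i
C_hat = grad_cov - grad_cov.T
curl_ok = np.allclose(C_hat, B.T @ C @ B)

# Rule v = B vhat, i.e. vhat = B^{-1} v(B xhat + c): its gradient matrix is B^{-1} A B.
grad_div = np.linalg.solve(B, A @ B)
div_ok = np.isclose(np.trace(grad_div), np.trace(A))

# Tangents: with n_hat parallel to B^T n and t_hat orthogonal to n_hat,
# the vector B t_hat is tangential to the face with normal n.
n = rng.normal(size=3)
n_hat = B.T @ n
t_hat = np.cross(n_hat, rng.normal(size=3))      # some vector orthogonal to n_hat
tangent_ok = np.isclose(np.dot(B @ t_hat, n), 0.0)

print(curl_ok, div_ok, tangent_ok)
```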

These conforming elements are suitable for problems in which the normal component of a vector vanishes on the boundary. As an illustration consider Maxwell's equations of §2.1 in a perfectly conducting cavity resonator. Assume that all functions occurring from now on are in L² and let

$$(u, v) = \int u\cdot v$$

with integration over the interior of the resonator. If curl u is in L², (2.2) gives

$$(H, \operatorname{curl} u) - \Big(\varepsilon\frac{\partial E}{\partial t}, u\Big) = (J, u)$$

when n ∧ u = 0 on the surface of the resonator. On the other hand, for any w, (2.1) supplies

$$(\operatorname{curl} E, w) + \Big(\mu\frac{\partial H}{\partial t}, w\Big) = 0.$$

If, now, the resonator is approximated by tetrahedra, we can try expressing E in the form derived via (5.84) to (5.86), as long as its tangential components vanish on the boundary of the resonator. Likewise, u can be chosen to be an arbitrary element of the same type. For H and w an obvious choice is from (5.87) and (5.88), but it may not be the best. There is some evidence that making H and w piecewise constant vectors is more efficacious. At any rate, the arbitrariness of u and w leads to a set of linear ordinary differential equations for the coefficients in E and H. They are subject to whatever initial conditions have been prescribed (in a format to match the foregoing). If the domain is appropriate, cubes can be used instead of tetrahedra. Then one obtains differential equations which bear some resemblance to those proposed by Yee (1966), who replaced the space derivatives in (2.1) and (2.2) by centred differences.

5.9 Finite differences

Variational principles furnish a method of deriving finite difference approximations to differential equations, whether ordinary or partial. It will be sufficient to demonstrate the technique for

$$-\frac{d^2u}{dt^2} + \sigma(t)u = f(t)$$

on a ≤ t ≤ b, subject to the boundary conditions u(a) = α, u(b) = β. The associated variational principle is to minimize

$$\int_a^b \Big\{\Big(\frac{dw}{dt}\Big)^2 + \sigma w^2 - 2fw\Big\}\,dt$$

over functions w which have piecewise continuous first derivatives on [a, b] and which satisfy w(a) = α, w(b) = β. Select points t₀ = a, t₁, …, t_n, t_{n+1} = b on [a, b]. Now approximate the

integral by making central difference approximations, so that the replacements

$$\int_{t_j}^{t_{j+1}}\Big(\frac{dw}{dt}\Big)^2 dt \approx \frac{(w_{j+1} - w_j)^2}{t_{j+1} - t_j}$$

[…] and β₂ > 0. There is now no restriction on the behaviour of the trial function at the end-points. The path already described may be followed

though there are now two extra parameters w₀ and w_{n+1} to be varied. The equations which result from their variation correspond to certain finite-difference replacements for the boundary conditions. Since the discrete analogue stems from a variational principle, it will reproduce any positive-definite properties of the original. This ensures that boundary elements are treated in a way which does not destroy this property, a facility which may not be easily available if a direct attack is made as in Chapter 2. If a differential equation is not self-adjoint, so that a variational principle is not to hand, difference equations can be provided by integrating the equation once and then using central differences for derivatives and integrals in the manner already indicated.
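The recipe can be sketched for the model problem −u'' + σu = f with u(a) = α, u(b) = β, on a possibly non-uniform set of nodes. The functional ∫{(w')² + σw² − 2fw} dt is the standard one for this equation. Its derivative term is replaced, as in the text, by (w_{j+1} − w_j)²/(t_{j+1} − t_j). Replacing the σw² and fw terms by the trapezoidal rule is our assumption. Setting the derivative of the discrete functional with respect to each interior w_j to zero gives

(w_j − w_{j−1})/h_{j−1} − (w_{j+1} − w_j)/h_j + ½(h_{j−1} + h_j)(σ_j w_j − f_j) = 0,

a symmetric tridiagonal system which is positive definite when σ ≥ 0.

```python
import numpy as np

def variational_fd(t, sigma, f, alpha, beta):
    """Discrete Euler equations of the functional for -u'' + sigma u = f, u(a)=alpha, u(b)=beta.

    t holds the nodes t_0 = a < t_1 < ... < t_{n+1} = b (not necessarily equally spaced)."""
    t = np.asarray(t, dtype=float)
    h = np.diff(t)                        # h[j] = t_{j+1} - t_j
    n = len(t) - 2                        # number of interior unknowns w_1..w_n
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    for j in range(1, n + 1):
        k = j - 1                         # row of the linear system for node j
        weight = 0.5 * (h[j - 1] + h[j])  # trapezoidal weight attached to node j
        A[k, k] = 1.0 / h[j - 1] + 1.0 / h[j] + weight * sigma(t[j])
        rhs[k] = weight * f(t[j])
        if k > 0:
            A[k, k - 1] = -1.0 / h[j - 1]
        else:
            rhs[k] += alpha / h[j - 1]    # known boundary value w_0 = alpha
        if k < n - 1:
            A[k, k + 1] = -1.0 / h[j]
        else:
            rhs[k] += beta / h[j]         # known boundary value w_{n+1} = beta
    w = np.empty_like(t)
    w[0], w[-1] = alpha, beta
    w[1:-1] = np.linalg.solve(A, rhs)
    return w

# Usage: sigma = 1 and f chosen so that the exact solution is u = sin(pi t) on [0, 1].
t = np.linspace(0.0, 1.0, 101)
w = variational_fd(t, lambda s: 1.0, lambda s: (np.pi ** 2 + 1.0) * np.sin(np.pi * s), 0.0, 0.0)
print(np.max(np.abs(w - np.sin(np.pi * t))))
```

On a uniform grid this reduces to the familiar three-point difference scheme. The variational route simply delivers it, together with the non-uniform version, in symmetric form.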

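Exercises

26. Generalize the technique to the equation

$$-\frac{d}{dt}\Big\{p(t)\frac{du}{dt}\Big\} + \sigma u = f.$$

27. Extend the method to the partial differential equation

$$-\frac{\partial}{\partial x}\Big\{p(x, y)\frac{\partial u}{\partial x}\Big\} - \frac{\partial}{\partial y}\Big\{p(x, y)\frac{\partial u}{\partial y}\Big\} + \sigma u = f$$

on a finite domain subject to α(x, y)u + β ∂u/∂n = γ on the boundary.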

5.10 Comparison between finite difference and finite element

In the preceding section it has been remarked that the equations obtained by finite differences and finite elements may bear a distinct similarity. This is not always true, but it raises the question of whether one method might be preferred to the other. A brief comparison between the methods may therefore be worthwhile and assist the reader in choosing a procedure. In finite differences a regular grid is usually adopted, except possibly near the boundary, and the differential equation is replaced by a local difference approximation. In finite elements, by contrast, the region is divided into elements (usually triangular) and the unknown is approximated by a polynomial (containing parameters) over each element. Equations for the parameters are obtained on a global basis by Galerkin's process or the variational method. The accuracy of finite differences is usually improved by using more extensive formulae, e.g. by substituting a nine-point approximation for a five-point approximation. In contrast, for finite elements the common technique is to increase the degree of the interpolating polynomial. The algebraic equations obtained are usually suitable for iterative procedures when derived via finite differences, but need not be so for finite elements, so that direct methods are more likely to be relevant to finite elements. On the other hand, finite elements will usually cope with boundary conditions automatically, whereas interpolatory approximation is often adopted in finite differences. Finally, while irregular grids are relatively uncommon in finite differences, they are habitual in finite elements. Broadly speaking, therefore, finite elements are probably better at coping with complex geometries, whereas finite differences are good for initial value problems and simple geometries. When non-linearities are present there is probably not much to choose between them; the backing of variational principles may well not be available.

5.11 Eigenvalues

From time to time it is desirable to determine eigenvalues by a Galerkin process, or to find values of λ for which λu − Ku = 0. Actually, this spreads its net more widely, since it aims at the spectrum (§3.6) of K rather than the eigenvalues alone. The theory tends to be elaborate, since eigenvalue problems are non-linear even when the operator is linear. Nevertheless, it can be shown that, if K is compact and conditions A are satisfied, any open set which contains the spectrum of K will also contain the spectrum of F_nK when n is sufficiently large. Indeed, if a sequence of eigenvalues and unit eigenvectors of F_nK converges to λ and u as n → ∞, then λ is an eigenvalue of K and u is a corresponding unit eigenvector. For a positive definite operator it is sufficient (Dovbysh 1962, 1965), for both eigenvalues and eigenvectors, that {φ_m} be strongly minimal in H_P.
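A small illustration of eigenvalues obtained by a Galerkin process follows. It is not an example from the text. The sketch treats −u'' = λu on [0, 1] with u(0) = u(1) = 0, whose smallest eigenvalue is π², using the hypothetical polynomial basis φ_k = x^k(1 − x), k = 1, …, N. The operator is positive definite and the trial spaces are nested. The Galerkin (here Rayleigh-Ritz) approximation to the smallest eigenvalue therefore decreases towards π² from above as N grows.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def _integral_01(c):
    """Exact integral over [0, 1] of the polynomial with coefficients c (lowest first)."""
    ci = P.polyint(c)
    return P.polyval(1.0, ci) - P.polyval(0.0, ci)

def galerkin_eigenvalues(n_basis):
    """Galerkin eigenvalues of -u'' = lambda u, u(0) = u(1) = 0, with phi_k = x^k (1 - x)."""
    basis = [P.polymul([0.0] * k + [1.0], [1.0, -1.0]) for k in range(1, n_basis + 1)]
    K = np.empty((n_basis, n_basis))    # stiffness matrix: integral of phi_i' phi_j'
    M = np.empty((n_basis, n_basis))    # mass matrix: integral of phi_i phi_j
    for i, pi in enumerate(basis):
        for j, pj in enumerate(basis):
            K[i, j] = _integral_01(P.polymul(P.polyder(pi), P.polyder(pj)))
            M[i, j] = _integral_01(P.polymul(pi, pj))
    # Generalized problem K c = lambda M c, reduced to an ordinary eigenproblem.
    lam = np.linalg.eigvals(np.linalg.solve(M, K))
    return np.sort(lam.real)

for n in (1, 2, 3, 4):
    print(n, galerkin_eigenvalues(n)[0], np.pi ** 2)
```

With a single basis function the estimate is exactly 10. Adding x²(1 − x)² to the symmetric part of the trial space brings it within about 2 × 10⁻⁴ of π².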

NUMERICAL INTEGRATION

5.12 Quadrature

Frequently, the Galerkin process applied to a practical problem involves numerical integration to calculate the coefficients in the algebraic system. It will not, therefore, be out of place at this juncture to say something about the relative merits of methods for quadrature. In its simplest form quadrature consists of specifying n + 1 interpolation points x₀, x₁, …, x_n on an interval [a, b] and then determining weights w₀, …, w_n so that

$$\int_a^b f(x)\,dx \approx \sum_{r=0}^{n} w_r f(x_r) \tag{5.91}$$

to some acceptable degree of accuracy. If (5.91) holds exactly for all polynomials of degree p or less but not for a polynomial of degree p + 1, the method is said to be of order p. To find the w_r to achieve order p we solve the p + 1 linear algebraic equations

$$\int_a^b (x - \xi)^s\,dx = \sum_{r=0}^{n} w_r(x_r - \xi)^s \qquad (s = 0, 1, \ldots, p) \tag{5.92}$$

where ξ is any convenient constant; this device is known as the method of undetermined weights.
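The method of undetermined weights amounts to solving a small linear system. The sketch takes p = n, so that the n + 1 equations s = 0, …, n fix the n + 1 weights. The node sets in the usage lines are illustrative. The system is a Vandermonde matrix, which becomes badly conditioned for many nodes, so the approach suits the low-order rules considered here.

```python
import numpy as np

def undetermined_weights(a, b, nodes, xi=0.0):
    """Weights w_r from (5.92) with s = 0, 1, ..., n, for the n + 1 given nodes."""
    x = np.asarray(nodes, dtype=float)
    s = np.arange(len(x))
    V = (x[None, :] - xi) ** s[:, None]                      # V[s, r] = (x_r - xi)^s
    moments = ((b - xi) ** (s + 1) - (a - xi) ** (s + 1)) / (s + 1)
    return np.linalg.solve(V, moments)

print(undetermined_weights(0.0, 1.0, [0.0, 0.5, 1.0]))        # Simpson: 1/6, 2/3, 1/6
print(undetermined_weights(0.0, 1.0, [0.0, 1.0, 2.0]))        # five-eight: 5/12, 8/12, -1/12
```

The choice of ξ does not change the weights (only the conditioning), which the second assertion below confirms for ξ = ½.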

For example, suppose n = 0 and take ξ = 0. Then (5.92) is satisfied with s = 0 if w₀ = b − a. With this choice of w₀ it is satisfied for s = 1 only if x₀ = ½(a + b). If we choose x₀ = a, we obtain the forward rectangle rule

$$\int_a^b f(x)\,dx \approx (b - a)f(a),$$

while x₀ = b gives the backward rectangle rule

$$\int_a^b f(x)\,dx \approx (b - a)f(b).$$

Both these methods are of order 0. By comparison, x₀ = ½(a + b) yields a method of order 1, the mid-point rule

$$\int_a^b f(x)\,dx \approx (b - a)f\{\tfrac12(a + b)\}.$$

By increasing the number of interpolating points, methods of higher order may be attained, but the more profuse the points the more intricate the formulae become. For the moment, suppose the points are equally spaced. Then, for n = 1, we have the trapezoidal rule

$$\int_a^b f(x)\,dx \approx \tfrac12(b - a)\{f(a) + f(b)\}$$

which is of order 1. For n = 2, there is the five-eight rule

$$\int_a^b f(x)\,dx \approx \tfrac1{12}(b - a)\{5f(a) + 8f(b) - f(2b - a)\}$$

of order 2, and Simpson's rule

$$\int_a^b f(x)\,dx \approx \tfrac16(b - a)\{f(a) + 4f(\tfrac12(a + b)) + f(b)\}$$

of order 3. A method of order 5 is

$$\int_a^b f(x)\,dx \approx \tfrac1{180}(b - a)\Big\{14f(a) + 64f\Big(\frac{3a + b}{4}\Big) + 24f\Big(\frac{a + b}{2}\Big) + 64f\Big(\frac{a + 3b}{4}\Big) + 14f(b)\Big\},$$

sometimes known as the Newton-Cotes rule.
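The orders quoted above can be confirmed mechanically. The sketch applies each rule to the monomials x^d on [0, 1] in exact rational arithmetic and reports the largest d for which every monomial up to that degree is integrated exactly. The interval [0, 1] is used because order is unaffected by an affine change of variable.

```python
from fractions import Fraction

def order_of(rule, max_degree=12):
    """Largest p for which rule(f) equals the exact integral of x^0, ..., x^p on [0, 1]."""
    for d in range(max_degree + 1):
        if rule(lambda x: x ** d) != Fraction(1, d + 1):
            return d - 1
    return max_degree

a, b = Fraction(0), Fraction(1)     # exact rational arithmetic on [a, b] = [0, 1]
rules = {
    "forward rectangle": lambda f: (b - a) * f(a),
    "backward rectangle": lambda f: (b - a) * f(b),
    "mid-point": lambda f: (b - a) * f((a + b) / 2),
    "trapezoidal": lambda f: (b - a) * (f(a) + f(b)) / 2,
    "five-eight": lambda f: (b - a) * (5 * f(a) + 8 * f(b) - f(2 * b - a)) / 12,
    "Simpson": lambda f: (b - a) * (f(a) + 4 * f((a + b) / 2) + f(b)) / 6,
    "order-5 rule": lambda f: (b - a) * (14 * f(a) + 64 * f((3 * a + b) / 4)
                                         + 24 * f((a + b) / 2)
                                         + 64 * f((a + 3 * b) / 4) + 14 * f(b)) / 180,
}

for name, rule in rules.items():
    print(f"{name:18s} order {order_of(rule)}")
```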



There are two courses open for integrating over an interval to a high degree of accuracy. Having subdivided the interval by interpolating points, one can base the rule on a single high-order interpolating polynomial. This is usually unsatisfactory and, indeed, there is no guarantee that the result will converge, even for a continuous function, as the number of subdivisions grows. Alternatively, different polynomials can be employed on different subintervals. For instance, if b − a = nh with n even, applying Simpson's rule in this way gives

$$\int_a^b f(x)\,dx \approx \tfrac13 h\{f(a) + 4f(a + h) + 2f(a + 2h) + 4f(a + 3h) + \cdots + 4f(b - h) + f(b)\}.$$
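The composite rule can be sketched in a few lines. The test integral ∫₀^π sin x dx = 2 is an arbitrary choice. Since Simpson's rule is of order 3, its composite error behaves like h⁴, so halving h should reduce the error by a factor close to 16.

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson rule with n (even) equal subintervals of width h = (b - a)/n."""
    if n % 2:
        raise ValueError("composite Simpson needs an even number of subintervals")
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * h) for k in range(1, n, 2))   # odd-numbered points
    total += 2 * sum(f(a + k * h) for k in range(2, n, 2))   # interior even-numbered points
    return h * total / 3

for n in (4, 8, 16, 32):
    print(n, composite_simpson(math.sin, 0.0, math.pi, n) - 2.0)
```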

In general, this procedure furnishes tolerable accuracy and is much less objectionable than its alternative. Error estimates are customarily expressed in terms of a derivative of f. If one does not know the derivative, the estimate may be of little value. It may nevertheless indicate how convergence will improve with finer subdivision, so long as the existence of the derivative can be assumed. A posteriori estimates might be more helpful but are harder to come by. In settling the error it will be assumed that all the interpolation points lie in [a, b] or to the right of b, though the method can be adapted straightforwardly to the more general case. Suppose that all the interpolation points are contained in [a, β] with β ≥ b. Then by Taylor's theorem

$$f(x) = f(a) + (x - a)f'(a) + \cdots$$

[…]

Our target is to show that, in fact, a method of order 2n + 1 has been derived by this choice. Let P(y) be any polynomial of degree 2n + 1 or less. Then by subtracting appropriate constant multiples of y^m P_{n+1}(y) from P(y) we can write

$$P(y) = Q(y)P_{n+1}(y) + R(y)$$

where Q and R are polynomials of degree n. Now, since the y_r are the zeros of P_{n+1}, we have P(y_r) = R(y_r), and

$$\int_{-1}^{1} R(y)\,dy = \sum_{r=0}^{n} w_r R(y_r)$$

by what has been proved already. Further, Q, being of degree n, can be expanded as a linear combination of P₀, …, P_n, and then the orthogonal properties of Legendre polynomials (§1.5) enforce

$$\int_{-1}^{1} Q(y)P_{n+1}(y)\,dy = 0.$$

Thus it has been established that Gaussian quadrature is of order 2n + 1. If n = 0, y₀ = 0 and w₀ = 2. If n = 1, y₀ = −1/√3, y₁ = 1/√3, and w₀ = w₁ = 1. If n = 2, y₀ = −(3/5)^{1/2}, y₁ = 0, y₂ = (3/5)^{1/2}, and w₀ = w₂ = 5/9, w₁ = 8/9. For higher values of n, the reader should consult the relevant tables. The error estimate (5.94) is still valid, and it is found that

$$\int_{-1}^{1} g(y)\,dy - \sum_{r=0}^{n} w_r g(y_r) = \frac{2^{2n+3}\{(n+1)!\}^4}{(2n+3)\{(2n+2)!\}^3}\,g^{(2n+2)}(\xi)$$

for some ξ in (−1, 1).
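NumPy's `numpy.polynomial.legendre.leggauss(n + 1)` returns the n + 1 nodes y_r and weights w_r of this rule on [−1, 1]. The sketch confirms three things: the order 2n + 1, the tabulated values for n = 2, and the error coefficient above. The last check applies the rule to g(y) = y^{2n+2}, whose derivative g^{(2n+2)} is the constant (2n + 2)!.

```python
import math
import numpy as np

def gauss_order(n):
    """Highest degree integrated exactly on [-1, 1] by the (n + 1)-point Gauss rule."""
    y, w = np.polynomial.legendre.leggauss(n + 1)
    d = 0
    while True:
        exact = (1.0 - (-1.0) ** (d + 1)) / (d + 1)    # integral of y^d over [-1, 1]
        if abs(np.dot(w, y ** d) - exact) > 1e-12:
            return d - 1
        d += 1

def gauss_error_coefficient(n):
    """The factor multiplying g^(2n+2)(xi) in the error formula for n + 1 points."""
    return 2 ** (2 * n + 3) * math.factorial(n + 1) ** 4 / (
        (2 * n + 3) * math.factorial(2 * n + 2) ** 3)

for n in range(4):
    print(n, gauss_order(n), gauss_error_coefficient(n))
```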

To obtain the original integral replace g by f and 2^{2n+3} by (b − a)^{2n+3}. An alternative point of view is to fix w₀ = w₁ = ⋯ = w_n = w and then prosecute a search for the y_r which gives a quadrature formula of order n + 1. This is known as Chebyshev quadrature. It will be discovered that the y_r can be real only if n ≠ 7 and 0 ≤ n ≤ 8. For odd values of n the order is n + 2 rather than n + 1. So far all the quadrature rules have been stated in terms of function values alone, but one can also allow derivatives to appear. A convenient formula can be derived by introducing the Bernoulli polynomials B_n(t) defined by

$$\frac{z(e^{tz} - 1)}{e^z - 1} = \sum_{n=1}^{\infty}\frac{z^n}{n!}B_n(t) \tag{5.95}$$

for |z| < 2π. There is no difficulty in deducing the first few polynomials, namely

$$B_1(t) = t, \qquad B_2(t) = t^2 - t, \qquad B_3(t) = t^3 - \tfrac32 t^2 + \tfrac12 t.$$

By taking a derivative of (5.95) with respect to t we obtain […]. An expansion in powers of z when |z| < 2π gives […] (5.96), where the B_n are Bernoulli numbers: B₁ = 1/6, B₂ = 1/30, B₃ = 1/42, B₄ = 1/30, …. Hence […] (5.97) […] (5.98) for n ≥ 1. Now define the functions C_{2n} and S_{2n+1} for n ≥ 1 by

$$C_{2n}(t) = \frac{(-1)^{n+1}\{B_{2n}(t) - (-1)^n B_n\}}{(2n)!}, \qquad S_{2n+1}(t) = \frac{(-1)^{n+1}B_{2n+1}(t)}{(2n+1)!}$$

on 0 ≤ t < 1, and by requiring them to be periodic with period 1 for other values of t. Then, if k is a non-negative integer, […]

by integration by parts, since S_{2n+1} vanishes at the end-points because B_{2n+1}(0) = 0 (put t = 0 in (5.95)). Advantage has also been taken of (5.98). Moreover, C_{2n}(k) = C_{2n}(k + 1) = B_n/(2n)!, so that another integration by parts provides

$$\int_k^{k+1} S_{2n+1}(t)f^{(2n+1)}(a + ht)\,dt = \frac{B_n}{(2n)!\,h^2}\big\{f^{(2n-1)}(a + kh) - f^{(2n-1)}(a + (k+1)h)\big\} - \frac{1}{h^2}\int_k^{k+1} S_{2n-1}(t)f^{(2n-1)}(a + ht)\,dt.$$


Proceeding in this way we eventually arrive at

$$\int_k^{k+1}(t - k - \tfrac12)f'(a + ht)\,dt,$$

since C₂′(t) = t − ½ on [0, 1). But this integral is

$$\frac{1}{2h}\{f(a + (k+1)h) + f(a + kh)\} - \frac1h\int_k^{k+1} f(a + ht)\,dt.$$

Hence

$$\int_k^{k+1} S_{2n+1}(t)f^{(2n+1)}(a + ht)\,dt = \sum_{r=1}^{n}\frac{(-1)^{n+r}B_r}{(2r)!\,h^{2n+2-2r}}\big\{f^{(2r-1)}(a + kh) - f^{(2r-1)}(a + (k+1)h)\big\} - \frac{(-1)^n}{2h^{2n+1}}\{f(a + (k+1)h) + f(a + kh)\} + \frac{(-1)^n}{h^{2n+1}}\int_k^{k+1} f(a + ht)\,dt.$$

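It follows, therefore, that for positive integer m

$$\int_0^m S_{2n+1}(t)f^{(2n+1)}(a + ht)\,dt = \sum_{r=1}^{n}\frac{(-1)^{n+r}B_r}{(2r)!\,h^{2n+2-2r}}\big\{f^{(2r-1)}(a) - f^{(2r-1)}(a + mh)\big\} - \frac{(-1)^n}{2h^{2n+1}}\Big\{f(a) + f(a + mh) + 2\sum_{r=1}^{m-1} f(a + rh)\Big\} + \frac{(-1)^n}{h^{2n+1}}\int_0^m f(a + ht)\,dt.$$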

Consequently,

$$\int_a^{a+mh} f(t)\,dt = h\Big\{\tfrac12 f(a) + \sum_{r=1}^{m-1} f(a + rh) + \tfrac12 f(a + mh)\Big\} + \sum_{r=1}^{n}\frac{(-1)^r B_r h^{2r}}{(2r)!}\big\{f^{(2r-1)}(a + mh) - f^{(2r-1)}(a)\big\} + (-1)^n h^{2n+2}\int_0^m S_{2n+1}(t)f^{(2n+1)}(a + ht)\,dt \tag{5.99}$$

for n ≥ 1, which is known as the Euler-Maclaurin summation formula. The same formula may be used for n = 0 on the understanding that the sum involving the derivatives is absent and S₁ is defined by periodicity from ½ − t. If the integral on the right of (5.99) can be regarded as small, (5.99) provides a formula for estimating the integral on the left. Alternatively, it can be conceived of as an approximation for the sum on the right. When f is a polynomial the integral can be made zero by choosing n large enough, but it would be wrong to imagine that the same result carries over for arbitrary f. Even the assumption that f is analytic is not sufficient to guarantee the validity, as can be seen by the particular example a = 0, b = 2π, f(t) = 1 + cos 4t. In fact, the last term in (5.99) can be expressed as

$$(-1)^{n+1}\frac{(b - a)h^{2n+2}B_{n+1}}{(2n+2)!}f^{(2n+2)}(\xi)$$

for some ξ ∈ (a, b), and this remainder term need not tend to zero as n → ∞. In spite of this deficiency, (5.99) without the remainder term is commonly used in the summation of series and the evaluation of integrals.

It has been tacitly assumed in the foregoing that f and a sufficient number of its derivatives are continuous. When this assumption fails special treatment may be necessary. If f or one of its derivatives has a finite discontinuity, all that is needed is to split the range of integration at the point of discontinuity and apply the previous quadrature formulae to the two subintervals separately. When f possesses an infinity in the interval of integration a more elaborate manipulation must be undertaken. We may suppose, by splitting the original range if necessary, that the singularity occurs at the end of an interval. We may also suppose that the integral has the form ∫_a^b g(t)S(t) dt, where S is singular at a, e.g. S(t) = (t − a)^{−1/2}. On the other hand, g and a sufficient number of its derivatives will be assumed to be continuous. Two methods of attack immediately offer themselves. In the first the interval is split at a + ε. The integral over (a + ε, b) may be handled by earlier rules. For the interval (a, a + ε), expand g about t = a in a Taylor series with remainder term. Suppose the individual terms can be evaluated explicitly by analytical means. It then only remains to estimate the remainder by a mean-value theorem and combine this with the previous error estimates for (a + ε, b) to see whether ε can be adjusted so that tolerable accuracy is achieved. The second approach is to take G(t) as the first m terms (say) of the Taylor expansion of g about a and then write the integral as

$$\int_a^b \{g(t) - G(t)\}S(t)\,dt + \int_a^b G(t)S(t)\,dt.$$
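The second approach is easy to try. The sketch below uses the illustrative choices a = 0, b = 1, g(t) = cos t and S(t) = t^{−1/2}. It subtracts G(t) = 1 − t²/2, integrates the remaining smooth part by 20-point Gauss-Legendre, and adds ∫₀¹ G(t)t^{−1/2} dt = Σ c_r/(r + ½) exactly. As an independent check, the substitution t = s², which removes this particular singularity, gives 2∫₀¹ cos(s²) ds. A plain Gauss rule applied to the singular integrand as it stands is far less accurate.

```python
import numpy as np

def gauss01(h, npts=20):
    """npts-point Gauss-Legendre approximation to the integral of h over [0, 1]."""
    y, w = np.polynomial.legendre.leggauss(npts)
    t = 0.5 * (y + 1.0)                       # map nodes from [-1, 1] to [0, 1]
    return 0.5 * np.dot(w, h(t))

def singular_by_subtraction(g, taylor, npts=20):
    """Integral over [0, 1] of g(t) t^(-1/2), splitting off G(t) = sum_r taylor[r] t^r."""
    G = lambda t: sum(c * t ** r for r, c in enumerate(taylor))
    smooth = gauss01(lambda t: (g(t) - G(t)) / np.sqrt(t), npts)   # behaves like t^(7/2)
    explicit = sum(c / (r + 0.5) for r, c in enumerate(taylor))    # integral of t^(r - 1/2)
    return smooth + explicit

approx = singular_by_subtraction(np.cos, [1.0, 0.0, -0.5])   # G(t) = 1 - t^2/2
check = gauss01(lambda s: 2.0 * np.cos(s * s))               # after the substitution t = s^2
naive = gauss01(lambda t: np.cos(t) / np.sqrt(t))            # singular integrand as it stands
print(approx, check, naive)
```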

The first term now contains no singularity and may be tackled by standard quadrature methods. The second term may possibly be evaluated explicitly. Once again we have a method, so long as ∫_a^b (t − a)^r S(t) dt can be calculated explicitly. Certain singularities may be managed by a change of variable. For example, the substitution t = a + s^{1/(1−ρ)} gives

$$\int_a^b \frac{g(t)}{(t - a)^{\rho}}\,dt = \frac{1}{1 - \rho}\int_0^{(b-a)^{1-\rho}} g\big(a + s^{1/(1-\rho)}\big)\,ds$$

with 0 < ρ < 1 […]
10 and for the phase when |d| > 100. On the basis of (6.28) and (6.25)

$$I(z) = \frac{4\pi V\exp(-ikz)\,f_1(z)}{Z_0} \tag{6.29}$$

if the strength of the magnetic ring is multiplied by −2V and Z₀ = (μ₀/ε₀)^{1/2}

WIRE ANTENNAS

is the impedance of free space. The admittance Y(z) is given by

$$Y(z) = 4\pi f_1(z)/Z_0 \sim 4\pi/(2Z_0\ln d). \tag{6.30}$$

The current wave decays slowly with distance along the rod. It travels at nearly the speed predicted by Pocklington for an infinitesimally thin wire. The slow diminution in amplitude means that the wave is guided by the wire over long distances, gradually radiating energy. The radiation field can be calculated from (6.1) and

$$H(x) = H^i(x) + \operatorname{curl}\int_{-\infty}^{\infty} I(\zeta)\,\psi(x, \zeta)\,ds. \tag{6.31}$$

Thus, when r > b, (6.19), (6.23), and (6.11) supply Z₀H_φ […]

Fig. 6.14. Radiation pattern of a tA. sphere plus monopole: solid curve, measured (Lyngby); chain curve, computed (Lyngby); broken curve, computed (AMP); dotted curve, computed (RAE).

power should be calculated by integration of the radiation pattern because this procedure is relatively insensitive to small errors.

Exercises

24. A plane wave is incident along the axis of a circular sheet of perfectly conducting metal. Find the radar cross-section as the radius varies and compare your results with those of Richmond (1966).
25. A square antenna contains a central rectangular slot parallel to the sides. Find the radar cross-section for a plane wave incident perpendicular to the plane of the antenna and linearly polarized (a) parallel to the longer side of the rectangle, (b) in the perpendicular direction (see also Miller and Morton 1970).
26. Determine the radiation pattern for Fig. 6.12 when the length of the monopole is tA. (compare also Tesche and Neureuther 1970).
27. Find the radar cross-section of a spheroid for incidence along a principal axis and check against the results of Oshiro et al. (1966).
28. Model a cone with a flat base by radial wires on the base and their continuations along generators of the slant sides. Determine the scattered pattern for a plane wave incident along the axis towards (a) the apex, (b) the base.
29. A quarter-wave monopole is added to the base of the cone of Exercise 28 and lies along the axis away from the apex. It is fed by a small magnetic frill surrounding it in the base. Calculate the radiation pattern produced by this transmitting antenna (Thiele et al. 1969).


ANTENNAS AND INTEGRAL EQUATIONS

30. Model a paraboloidal antenna by a wire grid and calculate its transmitting characteristics for excitation from a small dipole at its focus (Poggio and Miller 1970).

31. A cone-sphere antenna is manufactured by placing a hemisphere on the base of a cone so that the perimeters of the base and hemisphere coincide. Compute the scattering characteristics for a wave incident along the axis (Mautz and Harrington 1969).

6.12 The electric-field integral equation

The derivation of integral equations for antennas of arbitrary shape commences from a standard representation of electromagnetic fields in terms of surface integrals. Let $S$ be a closed surface which can be the boundary of the antenna but may be some other conveniently disposed surface. The infinite region outside $S$ will be denoted by $S_+$ while the bounded interior will be called $S_-$ (Fig. 6.15).

Fig. 6.15. The antenna configuration.

Points in $S_+$ or $S_-$ will be distinguished by capital letters such as $P$ and $Q$; moreover, the notation will be simplified so that, for example, $\psi(P,Q) = \psi(\mathbf{x}_P,\mathbf{x}_Q)$ where the right-hand side is defined by (6.2). Lower-case letters such as $p$ and $q$ will signify points of $S$. The radiation conditions to be imposed on scattered fields, assuming the medium outside $S$ to have constant $\mu$ and $\varepsilon$, are

$$R\mathbf{E},\ R\mathbf{H}\ \text{bounded}, \qquad R(\mathbf{E} - Z\mathbf{H}\wedge\hat{\mathbf{R}})\to 0, \qquad R(\mathbf{H} - \hat{\mathbf{R}}\wedge\mathbf{E}/Z)\to 0 \qquad (6.73)$$

as $R\to\infty$, $\hat{\mathbf{R}}$ being a unit vector along the radius from the origin and $Z = (\mu/\varepsilon)^{1/2}$ the impedance of the medium. The problem for which a solution is sought is to satisfy Maxwell's equations in $S_+$ and the radiation conditions (6.73) at infinity, and to have a prescribed tangential electric intensity on $S$; i.e. if $\mathbf{n}_p$ is a unit normal from the point $p$ of $S$ into $S_+$ it is required that

$$\mathbf{n}_p\wedge\mathbf{E}(p) = \mathbf{n}_p\wedge\mathbf{E}_0(p) \qquad (6.74)$$


SOLID ANTENNAS

on $S$, where $\mathbf{n}\wedge\mathbf{E}_0$ is a known field. The field $\mathbf{E}_0$ might originate from sources inside $S$, as when $S$ encloses a waveguide with radiating slots. In that case (6.74) is effectively the condition for a transmitting antenna. On the other hand, $\mathbf{E}_0$ might be due to a signal whose origins lie outside $S$. Then (6.74) corresponds to $S$ being perfectly conducting and $\mathbf{E}$ is the field scattered by such a receiving antenna. It may be desirable to specify alternative boundary conditions on $S$, e.g. of impedance type, but that will not alter the essence of the principles involved, and so attention will be concentrated on (6.74). Observe that the field $\mathbf{E}$, $\mathbf{H}$ does not have any sources in $S_+$; they must all be on $S$ or in $S_-$.

Having formulated the fundamental antenna or exterior problem, one has to consider what techniques are to be adopted to solve it. The approach here will be confined to integral equations and some of the considerations in solving them, including finite-element methods. For a method based on an infinite system of state-space differential equations see Hizal (1974). On the assumption that $\mathbf{E}$ and $\mathbf{H}$ satisfy the radiation conditions (6.73) and Maxwell's equations

$$\operatorname{curl}\mathbf{E} + \mathrm{i}kZ\mathbf{H} = 0, \qquad \operatorname{curl}Z\mathbf{H} - \mathrm{i}k\mathbf{E} = 0, \qquad \operatorname{div}\mathbf{E} = 0, \qquad \operatorname{div}\mathbf{H} = 0,$$

the basic field representation can be written as (see for example Jones 1986)

$$\int_S\Big[\{\mathbf{n}_q\wedge\mathbf{E}(q)\}\wedge\operatorname{grad}_q\psi(P,q) + \{\mathbf{n}_q\cdot\mathbf{E}(q)\}\operatorname{grad}_q\psi(P,q) - \mathrm{i}\omega\mu\{\mathbf{n}_q\wedge\mathbf{H}(q)\}\psi(P,q)\Big]\mathrm{d}S_q = \begin{cases}\mathbf{E}(P) & (P\in S_+),\\ \mathbf{0} & (P\in S_-),\end{cases} \qquad (6.75)$$

$$\int_S\Big[\{\mathbf{n}_q\wedge\mathbf{H}(q)\}\wedge\operatorname{grad}_q\psi(P,q) + \{\mathbf{n}_q\cdot\mathbf{H}(q)\}\operatorname{grad}_q\psi(P,q) + \mathrm{i}\omega\varepsilon\{\mathbf{n}_q\wedge\mathbf{E}(q)\}\psi(P,q)\Big]\mathrm{d}S_q = \begin{cases}\mathbf{H}(P) & (P\in S_+),\\ \mathbf{0} & (P\in S_-),\end{cases} \qquad (6.76)$$

where, now, $k^2 = \omega^2\mu\varepsilon$ and the time dependence $\mathrm{e}^{\mathrm{i}\omega t}$ has been suppressed. The symbol $\mathbf{n}_q$ indicates the unit normal at $q$ of $S$ into $S_+$, and the subscript on a vector operator shows the variable to which it is applied, i.e.

$$\operatorname{grad}_q\psi(P,q) = \left(\mathbf{i}\frac{\partial}{\partial x_q} + \mathbf{j}\frac{\partial}{\partial y_q} + \mathbf{k}\frac{\partial}{\partial z_q}\right)\psi(P,q)$$

where $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are unit vectors along the Cartesian axes. The integrals in (6.75) and (6.76) contain the normal components $\mathbf{n}\cdot\mathbf{E}$ and $\mathbf{n}\cdot\mathbf{H}$, but the boundary condition (6.74) involves only tangential components. It is therefore advantageous to express the normal components in terms of tangential ones. This may be accomplished by introducing the surface divergence of a tangential vector $\mathbf{u}$ on $S$. A number of results about operators on surfaces are collected in the appendix to this chapter; references to equations there are distinguished by the letter A. The proofs in the appendix differ from the following.

Fig. 6.16. The calculation of surface divergence.

Draw a small curve $C$ on $S$ and let $\mathbf{v}$ be the unit vector which is perpendicular to both $\mathbf{n}$ and $C$, as well as pointing out of the portion of $S$ containing $\mathbf{n}$ (Fig. 6.16). If the area of this portion of $S$ enclosed by $C$ is $A$, the surface divergence $\operatorname{Div}\mathbf{u}$ of the tangential vector $\mathbf{u}$ is defined by

$$\operatorname{Div}\mathbf{u} = \lim\frac{1}{A}\oint_C\mathbf{u}\cdot\mathbf{v}\,\mathrm{d}s$$

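where the integration is around $C$ and the limit is taken as $C$ contracts to the point where $\mathbf{n}$ is drawn. If $C$ is arbitrary it follows, by covering the enclosed part of $S$ by a grid of curves each surrounding a small area, that

$$\int_{S_C}\operatorname{Div}\mathbf{u}\,\mathrm{d}S = \oint_C\mathbf{u}\cdot\mathbf{v}\,\mathrm{d}s \qquad (6.77)$$

where $S_C$ is the portion of $S$ with $C$ as the rim out of which $\mathbf{v}$ is pointing. If $S$ is closed then $C$ can be allowed to contract to a point in such a way that $S_C$ becomes $S$. Hence

$$\int_S\operatorname{Div}\mathbf{u}\,\mathrm{d}S = 0 \qquad (6.78)$$

when $S$ is closed. The formulae (6.77) and (6.78) are in agreement with (A.17) and (A.18). The element $\mathrm{d}\mathbf{s}$ of arc of $C$ satisfies $\mathrm{d}\mathbf{s} = \mathbf{n}\wedge\mathbf{v}\,\mathrm{d}s$. Hence

$$\operatorname{Div}(\mathbf{n}\wedge\mathbf{E}) = \lim\frac{1}{A}\oint_C(\mathbf{n}\wedge\mathbf{E})\cdot\mathbf{v}\,\mathrm{d}s = -\lim\frac{1}{A}\oint_C\mathbf{E}\cdot\mathrm{d}\mathbf{s} = -\lim\frac{1}{A}\int_{S_C}(\mathbf{n}\cdot\operatorname{curl}\mathbf{E})\,\mathrm{d}S$$

by Stokes's theorem.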

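By Maxwell's equations, $-\mathrm{i}\omega\mu\mathbf{H}$ may be substituted for $\operatorname{curl}\mathbf{E}$. The integrand may be regarded as constant if $C$ is sufficiently small and hence

$$\operatorname{Div}(\mathbf{n}\wedge\mathbf{E}) = \mathrm{i}\omega\mu\,\mathbf{n}\cdot\mathbf{H}. \qquad (6.79)$$

It may be shown similarly that

$$\operatorname{Div}(\mathbf{n}\wedge\mathbf{H}) = -\mathrm{i}\omega\varepsilon\,\mathbf{n}\cdot\mathbf{E}. \qquad (6.80)$$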
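The contour definition of $\operatorname{Div}\mathbf{u}$ and the identity (6.78) can both be checked numerically. The following is a minimal sketch in Python with NumPy (an assumption of this illustration, not part of the text). On the plane $z = 0$ it evaluates $(1/A)\oint_C\mathbf{u}\cdot\mathbf{v}\,\mathrm{d}s$ on a small circle for the illustrative tangential field $\mathbf{u} = (x^2, xy)$, whose surface divergence is $3x$. On the unit sphere it integrates $\operatorname{Div}\mathbf{u}$ for $\mathbf{u} = \{\mathbf{c} - (\mathbf{c}\cdot\mathbf{n})\mathbf{n}\}\mathrm{e}^{\mathrm{i}k\,\mathbf{n}\cdot\mathbf{d}}$, using $\operatorname{Div}\{\mathbf{c} - (\mathbf{c}\cdot\mathbf{n})\mathbf{n}\} = -2\,\mathbf{c}\cdot\mathbf{n}$ on a sphere of unit radius; the vectors $\mathbf{c}$, $\mathbf{d}$ and the value of $k$ are arbitrary choices.

```python
import numpy as np

def contour_div(field, x0, y0, r=1e-2, m=64):
    """(1/A) times the contour integral of u.v ds on a circle of radius r about (x0, y0) in z = 0."""
    theta = 2.0 * np.pi * np.arange(m) / m
    xs, ys = x0 + r * np.cos(theta), y0 + r * np.sin(theta)
    v = np.array([np.cos(theta), np.sin(theta)])          # in the surface, normal to C, pointing outwards
    u_dot_v = np.sum(field(xs, ys) * v, axis=0)
    line_integral = np.sum(u_dot_v) * r * (2.0 * np.pi / m)  # trapezoidal rule, ds = r d(theta)
    return line_integral / (np.pi * r**2)

def plane_field(x, y):
    """Illustrative tangential field u = (x^2, x y) on z = 0; its surface divergence is 3x."""
    return np.array([x**2, x * y])

def sphere_div_integrals(k=3.0, c=(0.3, -0.5, 0.8), d=(0.0, 0.6, 0.8), n_mu=40, n_phi=80):
    """Integrate Div u over the unit sphere for u = {c - (c.n)n} exp(ik n.d)."""
    mu, w_mu = np.polynomial.legendre.leggauss(n_mu)      # Gauss-Legendre nodes in cos(theta)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi           # uniform nodes in phi
    MU, PHI = np.meshgrid(mu, phi, indexing="ij")
    s = np.sqrt(1.0 - MU**2)
    n = np.stack([s * np.cos(PHI), s * np.sin(PHI), MU])   # point of S = outward unit normal
    c = np.asarray(c, dtype=float).reshape(3, 1, 1)
    d = np.asarray(d, dtype=float).reshape(3, 1, 1)
    c_n = np.sum(c * n, axis=0)
    t = c - c_n * n                                        # tangential part of c
    phase = np.exp(1j * k * np.sum(n * d, axis=0))
    # Div(t e) = e Div t + t . Grad e, with Div t = -2 c.n and Grad e = ik e (tangential part of d)
    div_u = phase * (-2.0 * c_n + 1j * k * np.sum(t * d, axis=0))
    weight = w_mu[:, None] * (2.0 * np.pi / n_phi)         # dS = d(cos theta) d(phi)
    total = np.sum(div_u * weight)
    one_term = np.sum(phase * (-2.0 * c_n) * weight)       # a single term on its own does not vanish
    return total, one_term

print(contour_div(plane_field, 0.7, -0.2), 3 * 0.7)       # both 2.1 up to rounding
total, one_term = sphere_div_integrals()
print(abs(total), abs(one_term))                           # ~1e-15 against a value of order 1
```

Because the divergence of the plane field is linear, the circle average reproduces $3x_0$ exactly for any radius; on the sphere, only the complete surface divergence integrates to zero, as (6.78) requires.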

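These results are implied also by (A.25). The surface divergence requires only derivatives along the surface and so (6.79) and (6.80) permit the determination of normal components from tangential ones. Write

$$\mathbf{n}\wedge\mathbf{H} = -\mathbf{j}, \qquad -\varepsilon\,\mathbf{n}\cdot\mathbf{E} = \rho. \qquad (6.81)$$

Then, from (6.80),

$$\operatorname{Div}\mathbf{j} + \mathrm{i}\omega\rho = 0. \qquad (6.82)$$

The vector $\mathbf{j}$ may be thought of as a surface electric current while $\rho$ is a surface electric charge. Thus (6.82) is a continuity equation on the surface. Surface magnetic currents and charges may also be introduced via

$$\mathbf{n}\wedge\mathbf{E} = \mathbf{j}_m, \qquad -\mu\,\mathbf{n}\cdot\mathbf{H} = \rho_m, \qquad (6.83)$$

with the surface continuity equation

$$\operatorname{Div}\mathbf{j}_m + \mathrm{i}\omega\rho_m = 0 \qquad (6.84)$$

from (6.79). For our immediate purpose, the substitution (6.83) will not be made since $\mathbf{n}\wedge\mathbf{E}$ is known on $S$ from (6.74). Therefore the following electromagnetic field will be considered:

$$\mathbf{E}(P) = \int_S\Big[\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}\wedge\operatorname{grad}_q\psi(P,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(P,q) + \mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(P,q)\Big]\mathrm{d}S_q, \qquad (6.85)$$

$$\mathbf{H}(P) = \int_S\Big[-\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(P,q) + \frac{\operatorname{Div}\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}}{\mathrm{i}\omega\mu}\operatorname{grad}_q\psi(P,q) + \mathrm{i}\omega\varepsilon\,\mathbf{n}_q\wedge\mathbf{E}_0(q)\,\psi(P,q)\Big]\mathrm{d}S_q. \qquad (6.86)$$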

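In (6.85) and (6.86) it is assumed that (6.82) is valid so that $\rho$ is known once $\mathbf{j}$ has been specified. Thus the integrals of (6.85) and (6.86) involve only the single unknown tangential surface current $\mathbf{j}$. No matter how $\mathbf{j}$ is chosen, so long as (6.82) is satisfied, (6.85) and (6.86) are solutions of Maxwell's equations in $S_+$. The verification is straightforward, so only the hardest step need be considered in detail. Put $\mathbf{n}_q\wedge\mathbf{E}_0(q) = \mathbf{a}$. Then

$$\operatorname{curl}_p\int_S\mathbf{a}\wedge\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q = \int_S\{\mathbf{a}\operatorname{div}_p\operatorname{grad}_q\psi(P,q) - (\mathbf{a}\cdot\operatorname{grad}_p)\operatorname{grad}_q\psi(P,q)\}\,\mathrm{d}S_q = \int_S\{(\mathbf{a}\cdot\operatorname{grad}_q)\operatorname{grad}_q\psi(P,q) - \mathbf{a}\nabla_q^2\psi(P,q)\}\,\mathrm{d}S_q$$

because $\psi$ depends only on $|\mathbf{x}_p - \mathbf{x}_q|$. Since $\psi$ satisfies Helmholtz's equation in $S_+$, the last term reproduces the final term of (6.86) (there $-\nabla_q^2\psi = k^2\psi$, and $\mathbf{H} = (\mathrm{i}/\omega\mu)\operatorname{curl}\mathbf{E}$). Further

$$\int_S(\mathbf{a}\cdot\operatorname{grad}_q)\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q = -\operatorname{grad}_p\int_S\mathbf{a}\cdot\operatorname{grad}_q\psi\,\mathrm{d}S_q = -\operatorname{grad}_p\int_S\{\operatorname{Div}(\mathbf{a}\psi) - \psi\operatorname{Div}\mathbf{a}\}\,\mathrm{d}S_q = -\int_S\operatorname{Div}\mathbf{a}\,\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q \qquad (6.87)$$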

from (6.78) because $\psi\mathbf{a}$ is a tangential vector. The second term of (6.86) has consequently been recovered. A similar procedure is necessary in evaluating $\operatorname{curl}\mathbf{H}$ but no other terms give any difficulty. Note that the argument applies equally if $P$ is in $S_-$, i.e. (6.85) and (6.86) are also solutions of Maxwell's equations inside $S$. In contrast to (6.75) and (6.76), it cannot be guaranteed that a free choice of $\mathbf{j}$ will lead to an identically zero field in $S_-$. Of course, if the right choice is made it is expected that the field will be zero in the interior. Again, (6.85) and (6.86) comply with the radiation conditions (6.73) for arbitrary choice of $\mathbf{j}$. This may be confirmed by incorporating the asymptotic behaviour of $\psi$ as $P\to\infty$ (which is permissible because $S$ is bounded) and by noting that

$$\int_S(\operatorname{Div}\mathbf{j} + \mathrm{i}k\,\mathbf{j}\cdot\hat{\mathbf{x}}_p)\exp(\mathrm{i}k\,\mathbf{x}_q\cdot\hat{\mathbf{x}}_p)\,\mathrm{d}S_q = \int_S\operatorname{Div}\{\mathbf{j}\exp(\mathrm{i}k\,\mathbf{x}_q\cdot\hat{\mathbf{x}}_p)\}\,\mathrm{d}S_q = 0$$

from (6.78), since $S$ is closed; here $\hat{\mathbf{x}}_p$ denotes the unit vector in the direction of $\mathbf{x}_p$. Consequently, the representation (6.85) and (6.86) satisfies all the conditions which have been demanded of the solution with the sole exception of (6.74). The next aim is to rectify this deficiency by letting $P$ tend to a point of $S$. Some care has to be exercised in this process because $\psi$ becomes singular as $P$ approaches $S$. While a rigorous analysis can be undertaken (Kellogg 1929), we shall be content with a plausible investigation which gives the same results. For more extensive properties see Colton and Kress (1983). The singularity of $\psi(p,q)$ is integrable so that, if $f$ is reasonably continuous on $S$,

$$\lim_{P\to p}\int_S f(q)\psi(P,q)\,\mathrm{d}S_q = \int_S f(q)\psi(p,q)\,\mathrm{d}S_q. \qquad (6.88)$$

Fig. 6.17. Determination of the discontinuities of surface integrals.

Integrals with derivatives of $\psi$, however, display discontinuities. Split $S$ into two sections, $S_1$ and $S_2$, of which $S_2$ is a small region surrounding $p$ and $S_1$ the remainder of $S$ (Fig. 6.17). If $S$ has a continuously turning tangent plane on a neighbourhood of $p$, it may be supposed that $S_2$ is effectively a circular disc of radius $\delta$ with centre $p$. The $z$ axis may also be taken along the outward normal at $p$. An integral over $S_1$ is continuous as $P\to p$ and causes no trouble; in the limit as $\delta\to 0$ it will lead to a principal value. With regard to $S_2$, assume firstly that $P$ is on the $z$ axis, a small distance $z$ from $p$. Then the phase of $\psi$ can be ignored and

$$\int_{S_2}f(q)\psi(P,q)\,\mathrm{d}S_q \approx \frac{f(p)}{4\pi}\int_0^{2\pi}\!\!\int_0^{\delta}\frac{t\,\mathrm{d}t\,\mathrm{d}\phi}{(t^2+z^2)^{1/2}} = \tfrac{1}{2}f(p)\{(\delta^2+z^2)^{1/2} - z\}. \qquad (6.89)$$

As $z\to 0$ this may be approximated by $\tfrac{1}{2}f(p)\{\delta + \tfrac{1}{2}z^2/\delta - z\}$. $P$ may now be moved a small distance $r$ off the $z$ axis so long as a solution of Laplace's equation is retained. Thus, in general

A derivative with respect to $z$ along the normal gives a non-zero value as $P\to p$, whereas tangential derivatives supply no contribution in the limit. Hence

$$-\lim_{P\to p}\operatorname{grad}_p\int_S f(q)\psi(P,q)\,\mathrm{d}S_q = \lim_{P\to p}\int_S f(q)\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q = \tfrac{1}{2}f(p)\mathbf{n}_p + {-}\!\!\!\!\!\!\int_S f(q)\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q \qquad (6.90)$$

where the bar on the integral sign signifies a principal value. If $P$ is inside $S$, at $P_i$ say, the only difference is that $-z$ in (6.89) is changed to $z$. Therefore

$$\lim_{P_i\to p}\int_S f(q)\operatorname{grad}_q\psi(P_i,q)\,\mathrm{d}S_q = -\tfrac{1}{2}f(p)\mathbf{n}_p + {-}\!\!\!\!\!\!\int_S f(q)\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q. \qquad (6.91)$$
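The half-residue $\pm\tfrac{1}{2}f(p)\mathbf{n}_p$ in (6.90) and (6.91) can be seen numerically from the disc $S_2$ alone. The sketch below is only an illustration under stated assumptions: it takes $f\equiv 1$, ignores the phase of $\psi$ as in (6.89) (so $\psi = 1/4\pi R$), and uses an arbitrary disc radius $\delta = 0.1$. It evaluates by quadrature both the disc integral of $\psi$ in (6.89) and the normal component of $\int_{S_2}\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q$, which should approach $+\tfrac{1}{2}$ above the disc and $-\tfrac{1}{2}$ below it.

```python
import numpy as np
from scipy.integrate import quad

DELTA = 0.1   # disc radius (illustrative)

def disc_potential(z, delta=DELTA):
    """Integral of psi = 1/(4 pi R) over a disc of radius delta, field point on the axis at height z."""
    integrand = lambda t: 2.0 * np.pi * t / (4.0 * np.pi * np.sqrt(t**2 + z**2))
    value, _ = quad(integrand, 0.0, delta, points=[abs(z)], limit=200)
    return value

def disc_normal_gradient(z, delta=DELTA):
    """Normal (z) component of the disc integral of grad_q psi; on the disc d(psi)/dz_q = z / (4 pi R^3)."""
    integrand = lambda t: 2.0 * np.pi * t * z / (4.0 * np.pi * (t**2 + z**2) ** 1.5)
    value, _ = quad(integrand, 0.0, delta, points=[abs(z)], limit=200)
    return value

for z in (1e-2, 1e-3, 1e-4, -1e-4):
    exact_6_89 = 0.5 * (np.sqrt(DELTA**2 + z**2) - abs(z))   # -z becomes +z below the disc
    print(f"z={z:+.0e}  (6.89): {disc_potential(z):.6f} vs {exact_6_89:.6f}"
          f"   normal part: {disc_normal_gradient(z):+.4f}")
```

The single-layer integral is continuous through the disc, while the normal part of the gradient integral jumps from about $+\tfrac{1}{2}$ to about $-\tfrac{1}{2}$, which is the discontinuity recorded in (6.90) and (6.91).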

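From (6.88) and (6.90) it follows that

$$\lim_{P\to p}\mathbf{E}(P) = \tfrac{1}{2}\{\mathbf{n}_p\wedge\mathbf{E}_0(p)\}\wedge\mathbf{n}_p - \frac{\rho(p)}{2\varepsilon}\mathbf{n}_p + {-}\!\!\!\!\!\!\int_S\Big[\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}\wedge\operatorname{grad}_q\psi(p,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(p,q) + \mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(p,q)\Big]\mathrm{d}S_q, \qquad (6.92)$$

$$\lim_{P\to p}\mathbf{H}(P) = -\tfrac{1}{2}\mathbf{j}(p)\wedge\mathbf{n}_p + \frac{\operatorname{Div}\{\mathbf{n}_p\wedge\mathbf{E}_0(p)\}}{2\mathrm{i}\omega\mu}\mathbf{n}_p + {-}\!\!\!\!\!\!\int_S\Big[-\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(p,q) + \frac{\operatorname{Div}\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}}{\mathrm{i}\omega\mu}\operatorname{grad}_q\psi(p,q) + \mathrm{i}\omega\varepsilon\,\mathbf{n}_q\wedge\mathbf{E}_0(q)\,\psi(p,q)\Big]\mathrm{d}S_q. \qquad (6.93)$$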

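Now apply the boundary condition (6.74) to (6.92). Then

$$\mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\Big\{\mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(p,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(p,q)\Big\}\mathrm{d}S_q = \tfrac{1}{2}\mathbf{n}_p\wedge\mathbf{E}_0(p) - \mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q. \qquad (6.94)$$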

The right-hand side of (6.94) is known in principle and so (6.94) constitutes an integral equation to determine the unknown $\mathbf{j}$. It is known as the electric-field integral equation (EFIE). For an elongated antenna it effectively degenerates to one of the forms discussed in connection with wires. The integral equation (6.94) applies only at points where the tangent plane is continuously turning. At edges or conical points the formulae (6.90) and (6.91) are no longer valid and neither is (6.94). However, (6.74) cannot be strictly applied at such points since $\mathbf{n}$ is not well defined there. Therefore, although (6.94) can be imposed arbitrarily close to an edge, it must be avoided actually on an edge, an aspect to be watched when employing point matching in numerical techniques. When $\mathbf{E}_0$ is due to an electromagnetic field arriving from outside $S$, the integration on the right-hand side of (6.94) can be avoided. The incident field has no sources inside $S$ and so

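$$\int_S\Big[\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}\wedge\operatorname{grad}_q\psi(P_i,q) + \{\mathbf{n}_q\cdot\mathbf{E}_0(q)\}\operatorname{grad}_q\psi(P_i,q) - \mathrm{i}\omega\mu\{\mathbf{n}_q\wedge\mathbf{H}_0(q)\}\psi(P_i,q)\Big]\mathrm{d}S_q = -\mathbf{E}_0(P_i) \qquad (6.95)$$

from (6.75). Let $P_i\to p$; then, on account of (6.91), (6.95) becomes

$${-}\!\!\!\!\!\!\int_S\Big[\mathrm{i}\omega\mu\{\mathbf{n}_q\wedge\mathbf{H}_0(q)\}\psi(p,q) - \{\mathbf{n}_q\cdot\mathbf{E}_0(q)\}\operatorname{grad}_q\psi(p,q)\Big]\mathrm{d}S_q = \tfrac{1}{2}\mathbf{E}_0(p) + {-}\!\!\!\!\!\!\int_S\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q. \qquad (6.96)$$

Multiply (6.96) vectorially by $\mathbf{n}_p$ and add to (6.94). Relabel $\mathbf{j} + \mathbf{n}_q\wedge\mathbf{H}_0$ and $\rho + \varepsilon\,\mathbf{n}_q\cdot\mathbf{E}_0$ as $\mathbf{j}$ and $\rho$ respectively; no violation of (6.82) occurs by virtue of (6.80). Hence the electric-field integral equation

$$\mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\Big\{\mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(p,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(p,q)\Big\}\mathrm{d}S_q = \mathbf{n}_p\wedge\mathbf{E}_0(p) \qquad (6.97)$$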

is obtained when the sources of excitation are in $S_+$.

6.13 Uniqueness

Despite the fact that it has been demonstrated that a solution to our problem satisfies (6.94) (or (6.97)), it has not been shown that solving (6.94) necessarily provides the answer to our problem. If (6.94) possesses only a single solution the difficulty is resolved, but if (6.94) has several solutions the desired one has to be identified. Now, if (6.94) has two solutions, their difference must satisfy

$$\mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\Big\{\mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(p,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(p,q)\Big\}\mathrm{d}S_q = \mathbf{0} \qquad (6.98)$$

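and the question may be reformulated to ask whether (6.98) holds for any $\mathbf{j}\neq\mathbf{0}$. It will be shown now that there are such solutions, though only for exceptional values of $k$. Consider the electromagnetic field $\mathbf{E}^0$, $\mathbf{H}^0$ defined by

$$\mathbf{E}^0(P) = \int_S\Big\{\mathrm{i}\omega\mu\,\mathbf{j}(q)\psi(P,q) - \frac{\rho(q)}{\varepsilon}\operatorname{grad}_q\psi(P,q)\Big\}\mathrm{d}S_q, \qquad \mathbf{H}^0(P) = -\int_S\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q.$$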

Owing to (6.90) and (6.98), $\lim_{P\to p}\mathbf{n}_p\wedge\mathbf{E}^0(P) = \mathbf{0}$. Accordingly, $\mathbf{E}^0$, $\mathbf{H}^0$ is an electromagnetic field in $S_+$ which satisfies the radiation conditions and has vanishing tangential electric components on $S$. By the standard uniqueness theorem for radiating fields (Jones 1986), $\mathbf{E}^0\equiv\mathbf{0}$ and $\mathbf{H}^0\equiv\mathbf{0}$ in $S_+$. By a similar argument based on (6.91), we can assert that the tangential component of $\mathbf{E}^0(P_i)$ vanishes as $P_i$ tends to $S$. It does not follow that $\mathbf{E}^0$ is identically zero in $S_-$, because there exist values of $k$ for which electric modes of oscillation take place inside $S$, which is then acting as a cavity resonator (§3.5). Now from (6.92) and (6.93)

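$$\mathbf{E}^0_+(p) - \mathbf{E}^0_-(p) = -\frac{\rho(p)}{\varepsilon}\mathbf{n}_p, \qquad (6.99)$$

$$\mathbf{H}^0_+(p) - \mathbf{H}^0_-(p) = -\mathbf{j}(p)\wedge\mathbf{n}_p \qquad (6.100)$$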

where the subscripts $+$ and $-$ signify the values obtained as $S$ is approached from $S_+$ and $S_-$ respectively. It has been shown already that $\mathbf{E}^0_+(p) = \mathbf{0}$, $\mathbf{H}^0_+(p) = \mathbf{0}$. Therefore, if $\mathbf{E}^0(P_i)$ is identically zero and thereby $\mathbf{H}^0(P_i)\equiv\mathbf{0}$, $\rho$ and $\mathbf{j}$ must be identically zero. Consequently, (6.94) has a unique solution unless $k$ has a value at which the interior of $S$ resonates in an electric mode. When $k$ does have a resonant value there is more than one solution of (6.94). For suppose $\mathbf{E}_i$, $\mathbf{H}_i$ is an interior electric mode. Choose $\mathbf{j}(q) = -\mathbf{n}_q\wedge\mathbf{H}_i(q)$ and define $\rho$ from (6.82). Then, by letting $P_i\to p$, we can check the validity of (6.98) and confirm the existence of a non-trivial solution.

While (6.94) fails only for isolated values of $k$, the shortcoming can be disastrous for the application of numerical methods. The matrix of the algebraic system which replaces the integral equation must be singular or nearly so when $k$ is in a neighbourhood of a point of non-uniqueness. How far from such a point $k$ must be to eliminate the uncertainty is less easy to say. Figure 6.18 shows the relative error as the wavenumber varies for a sphere in an acoustic field, where a similar phenomenon rears its ugly head. Clearly, errors start to make themselves manifest at wavenumbers as small as half the first resonant wavenumber. The situation is worse at the higher wavenumbers because the interior eigenvalues tend to cluster with increasing frequency. It may therefore be concluded that numerical results founded on the EFIE must be viewed with suspicion unless the dimensions of the body do not exceed half a wavelength, in which case they are probably acceptable provided that errors have not crept in from sources other than interior resonance.

[Figure 6.18 plot: relative error on a logarithmic scale (0.1 to 100) against wave number times radius, $ka$, from $0.4\pi$ to $\pi$.]

Fig. 6.18. The relative error in surface pressure on a vibrating sphere: solid curve, no correction for interior resonance; broken curve, exact theory.

The EFIE has one virtue which does not seem to have been exploited to any extent. It is evident from Fig. 6.18 that it exposes distinctly where resonance occurs, and so it could supply a means of finding the wavenumbers of electric modes of oscillation.

Exercise 32. A perfectly conducting sphere of radius $a$ has its centre at the origin. In terms of the spherical polar co-ordinates $R$, $\theta$, $\phi$ an electromagnetic field is required outside the sphere with

$$(\mathbf{E}_0)_\theta = 0, \qquad (\mathbf{E}_0)_\phi = h_1^{(2)}(ka)\sin\theta,$$

where $h_1^{(2)}$ is the spherical Hankel function defined by $h_1^{(2)}(z) = (\pi/2z)^{1/2}H^{(2)}_{3/2}(z)$. Assume that $j_\theta = 0$ and $j_\phi = C\sin\theta$ where $C$ is a constant. Show that the EFIE (6.94) is satisfied independently of $C$ whenever $k$ satisfies $j_1(ka) = 0$, where $j_1$ is the spherical Bessel function.
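The exceptional values of $ka$ in Exercise 32 are the zeros of $j_1$, and they are easily located numerically. The sketch below assumes SciPy is available and uses its `scipy.special.spherical_jn` and `scipy.optimize.brentq`: sign changes of $j_1$ are bracketed on a grid and then refined. The search range $0.5 < ka < 12$ and the grid step are arbitrary choices. Since $ka = 2\pi a/\lambda$, the script also reports the corresponding radius in wavelengths.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import spherical_jn

def j1_zeros(x_min=0.5, x_max=12.0, step=0.01):
    """Zeros of the spherical Bessel function j_1 in (x_min, x_max): sign change, then Brent's method."""
    xs = np.arange(x_min, x_max, step)
    values = spherical_jn(1, xs)
    zeros = []
    for a, b, fa, fb in zip(xs[:-1], xs[1:], values[:-1], values[1:]):
        if fa * fb < 0.0:
            zeros.append(brentq(lambda x: spherical_jn(1, x), a, b))
    return zeros

for ka in j1_zeros():
    print(f"ka = {ka:.4f}   a/lambda = {ka / (2 * np.pi):.4f}")
```

The first zero, $ka \approx 4.4934$, corresponds to a sphere of radius roughly $0.72\lambda$; near this and the following zeros a discretized EFIE for the configuration of the exercise should be expected to become nearly singular.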

6.14 The magnetic-field integral equation

The downfall of the EFIE cannot be attributed to posing the original problem incorrectly, because that does have a unique solution. The fault must lie with the representation of $\mathbf{E}$. Perhaps that for $\mathbf{H}$ is better.

Suppose we let $P\to p$ in (6.86). We are immediately faced with the problem that we have no boundary condition on $\mathbf{n}\wedge\mathbf{H}$ and $\mathbf{j}$ is unknown. However, from (6.93) (cf. (6.100))

$$\mathbf{n}_p\wedge\{\mathbf{H}_+(p) - \mathbf{H}_-(p)\} = -\mathbf{j}(p).$$

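It is desirable that the representation should have the same property as (6.76), since that is exact. Therefore $\mathbf{j}$ ought to be such that $\mathbf{n}_p\wedge\mathbf{H}_-(p) = \mathbf{0}$. Accepting this condition leads to the integral equation

$$-\tfrac{1}{2}\mathbf{j}(p) + \mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q = \mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\Big[\frac{\operatorname{Div}\{\mathbf{n}_q\wedge\mathbf{E}_0(q)\}}{\mathrm{i}\omega\mu}\operatorname{grad}_q\psi(p,q) + \mathrm{i}\omega\varepsilon\,\mathbf{n}_q\wedge\mathbf{E}_0(q)\,\psi(p,q)\Big]\mathrm{d}S_q \qquad (6.101)$$

which is called the magnetic-field integral equation (MFIE). If $\mathbf{E}_0$ is caused by a field incident from outside $S$, the right-hand side of (6.101) may be replaced by $-\mathbf{n}_p\wedge\mathbf{H}_0(p)$ in a similar way to (6.97), provided that $\mathbf{j} + \mathbf{n}_q\wedge\mathbf{H}_0$ is again relabelled as $\mathbf{j}$. Whether (6.101) has a unique solution turns on whether

$$-\tfrac{1}{2}\mathbf{j}(p) + \mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q = \mathbf{0} \qquad (6.102)$$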

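is satisfied by a non-trivial $\mathbf{j}$. Define an inner product (§3.2) for tangential vectors $\mathbf{j}$ and $\mathbf{h}$ by

$$(\mathbf{j},\mathbf{h}) = \int_S\mathbf{j}\cdot\mathbf{h}^*\,\mathrm{d}S_q. \qquad (6.103)$$

Then the adjoint (§3.2) of (6.102) is

$$\tfrac{1}{2}\mathbf{h}(p) + {-}\!\!\!\!\!\!\int_S\{\mathbf{n}_q\wedge\mathbf{h}(q)\}\wedge\operatorname{grad}_q\psi^*(p,q)\,\mathrm{d}S_q = \mathbf{0}. \qquad (6.104)$$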

By the Fredholm alternative (to be discussed in the following section), whenever (6.102) possesses a non-trivial solution so does (6.104), and vice versa. Therefore construction of a non-trivial solution of (6.104) will imply non-uniqueness of (6.102). Take the complex conjugate of (6.104) and then multiply vectorially by $\mathbf{n}_p$. With $\mathbf{j}_m(p) = \mathbf{n}_p\wedge\mathbf{h}^*(p)$ there results

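$$\tfrac{1}{2}\mathbf{j}_m(p) + \mathbf{n}_p\wedge{-}\!\!\!\!\!\!\int_S\mathbf{j}_m(q)\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q = \mathbf{0}. \qquad (6.105)$$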

Thus to every non-trivial solution of (6.104) corresponds a non-trivial solution of (6.105). Conversely, when (6.105) is satisfied non-trivially so is (6.104), as may be confirmed by defining $\mathbf{h}^*(p) = \int_S\mathbf{j}_m(q)\wedge\operatorname{grad}_q\psi(p,q)\,\mathrm{d}S_q$, whence $\tfrac{1}{2}\mathbf{j}_m = -\mathbf{n}\wedge\mathbf{h}^*$, and the result follows by taking the complex conjugate. Hence it is sufficient for our purposes to demonstrate the non-uniqueness of (6.105). Consider the electromagnetic field

$$\mathbf{E}'(P) = \int_S\mathbf{j}_m(q)\wedge\operatorname{grad}_q\psi(P,q)\,\mathrm{d}S_q, \qquad (6.106)$$

$$\mathbf{H}'(P) = \int_S\Big\{\mathrm{i}\omega\varepsilon\,\mathbf{j}_m(q)\psi(P,q) - \frac{\rho_m(q)}{\mu}\operatorname{grad}_q\psi(P,q)\Big\}\mathrm{d}S_q \qquad (6.107)$$

where $\rho_m$ is given by (6.84). This field satisfies Maxwell's equations in $S_+$ and the radiation conditions, and $\mathbf{n}\wedge\mathbf{E}'_+(p) = \mathbf{0}$ on account of (6.90) and (6.105). Hence it is identically zero in $S_+$ and $\mathbf{n}\wedge\mathbf{H}'_+(p) = \mathbf{0}$. An immediate consequence, by virtue of (6.91), is that $\mathbf{n}\wedge\mathbf{H}'_-(p) = \mathbf{0}$. Thus (6.105) can have non-trivial solutions only when there are interior magnetic modes of oscillation. On the other hand, if $\mathbf{E}_m$, $\mathbf{H}_m$ is a magnetic mode of resonance, choose $\mathbf{j}_m = \mathbf{n}\wedge\mathbf{E}_m$; then the application of (6.90) and (6.91) to (6.106) implies (6.105). The deduction is that the uniqueness of (6.101) fails precisely for those values of $k$ for which there are interior modes of magnetic resonance. Thus the MFIE suffers from the same disadvantage as the EFIE, and acceptance of numerical results based on the MFIE should be tempered with prudence. Notwithstanding, if a choice has to be made between the MFIE and the EFIE, the better gambit is the MFIE. One reason is that the EFIE is a singular integral equation of the first kind, whereas the MFIE is one of the second kind, for which the theory has sounder foundations. The second motive for the selection is numerical in origin. Often the algebraic system substituting for an integral equation is increased in size by subdividing a mesh on the surface with the aim of improving the accuracy. However, as the size of a mesh element diminishes, the value of the integral over it tends to decrease. Thus the elements of the matrix for the EFIE become smaller as its order becomes larger, and there will be a tendency for the matrix to become ill conditioned. In contrast, the MFIE contains a term $-\tfrac{1}{2}\mathbf{j}(p)$ which does not involve integration. Its matrix is therefore likely to exhibit more and more diagonal dominance with steadily decreasing mesh size. Hence reduction in mesh size should not cause the deterioration of accuracy for the MFIE that would be expected with the EFIE.
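The conditioning argument can be illustrated with a one-dimensional toy problem. This is an analogue only, not a discretization of the EFIE or MFIE: an illustrative continuous kernel $K(x,y) = \mathrm{e}^{-|x-y|}$ on $[0,1]$ is discretized with the midpoint rule, so every integral entry carries the factor $h = 1/n$. The first-kind matrix $hK$ has entries that shrink as $n$ grows, while the second-kind matrix $\tfrac{1}{2}I + hK$ keeps a free term that does not involve integration, playing the part of the term $-\tfrac{1}{2}\mathbf{j}$ (the sign is immaterial here).

```python
import numpy as np

def condition_numbers(n):
    """Condition numbers of first- and second-kind discretizations of a toy kernel (midpoint rule)."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h
    K = np.exp(-np.abs(x[:, None] - x[None, :]))   # illustrative kernel, not an antenna kernel
    first_kind = h * K                               # no free term: entries shrink like h
    second_kind = 0.5 * np.eye(n) + h * K            # free term survives mesh refinement
    return np.linalg.cond(first_kind), np.linalg.cond(second_kind)

for n in (10, 20, 40, 80):
    c1, c2 = condition_numbers(n)
    print(f"n = {n:3d}   first kind: {c1:10.1f}   second kind: {c2:5.2f}")
```

For this kernel the first-kind condition number grows roughly like $n^2$, whereas the second-kind one stays below 3, because the eigenvalues of the symmetric positive definite matrix $hK$ lie in $(0, 1]$.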
Before concluding this section, there is one further implication of the Fredholm alternative that needs checking. This is the assertion that, at a resonant value of $k$, (6.101) possesses a solution if and only if the right-hand side is orthogonal (with respect to the inner product (6.103)) to every non-trivial solution of (6.104). For the receiving antenna the right-hand side is a multiple of $\mathbf{n}\wedge\mathbf{H}_0$ and the requirement is

$$\int_S\{\mathbf{n}_q\wedge\mathbf{H}_0(q)\}\cdot\mathbf{h}^*(q)\,\mathrm{d}S_q = 0. \qquad (6.108)$$

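Now

$$\int_S\{\mathbf{n}_q\wedge\mathbf{H}_0(q)\}\cdot\mathbf{h}^*(q)\,\mathrm{d}S_q = -\int_S\mathbf{H}_0(q)\cdot\mathbf{j}_m(q)\,\mathrm{d}S_q = \int_S\{\mathbf{H}_m(q)\cdot\mathbf{n}_q\wedge\mathbf{E}_0(q) - \mathbf{H}_0(q)\cdot\mathbf{n}_q\wedge\mathbf{E}_m(q)\}\,\mathrm{d}S_q$$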

since $\mathbf{j}_m$ stems from a magnetic mode of oscillation, for which $\mathbf{n}\wedge\mathbf{H}_m = \mathbf{0}$ on $S$. The last integral is zero because the divergence theorem can be applied in $S_-$, since neither field has any singularity there. Hence (6.108) is verified for a receiving antenna with no sources inside $S$.
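The vanishing of the last integral is an instance of the reciprocity relation $\int_S\mathbf{n}\cdot(\mathbf{E}_0\wedge\mathbf{H}_m - \mathbf{E}_m\wedge\mathbf{H}_0)\,\mathrm{d}S = 0$ for two fields that satisfy Maxwell's equations with the same $k$ and have no sources inside $S$. The sketch below checks it on the unit sphere with two plane waves $\mathbf{E} = \mathbf{p}\,\mathrm{e}^{-\mathrm{i}k\,\mathbf{d}\cdot\mathbf{x}}$, $Z\mathbf{H} = \mathbf{d}\wedge\mathbf{E}$ (travelling along $\mathbf{d}$ for the time factor $\mathrm{e}^{\mathrm{i}\omega t}$). The plane waves are stand-ins chosen purely for illustration; a cavity mode is not a plane wave, but the relation needs only source-free fields in $S_-$. Directions, polarizations, $k = 3$ and $Z = 1$ are arbitrary assumptions.

```python
import numpy as np

def plane_wave(x, p, d, k, Z=1.0):
    """E = p exp(-ik d.x), H = (d ^ E)/Z at points x of shape (3, ...); requires p.d = 0 and |d| = 1."""
    p = np.asarray(p, dtype=float).reshape(3, 1, 1)
    d = np.asarray(d, dtype=float).reshape(3, 1, 1)
    E = p * np.exp(-1j * k * np.sum(d * x, axis=0))
    H = np.cross(d, E, axis=0) / Z
    return E, H

def reciprocity_check(k=3.0, n_mu=40, n_phi=80):
    """Integrate H_m.(n ^ E_0) - H_0.(n ^ E_m) over the unit sphere; also return the first term alone."""
    mu, w_mu = np.polynomial.legendre.leggauss(n_mu)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi
    MU, PHI = np.meshgrid(mu, phi, indexing="ij")
    s = np.sqrt(1.0 - MU**2)
    n = np.stack([s * np.cos(PHI), s * np.sin(PHI), MU])      # points of the unit sphere = unit normal
    weight = w_mu[:, None] * (2.0 * np.pi / n_phi)             # dS = d(cos theta) d(phi)
    E0, H0 = plane_wave(n, p=(0.0, 1.0, 0.0), d=(0.0, 0.0, 1.0), k=k)   # stand-in for E_0, H_0
    Em, Hm = plane_wave(n, p=(0.0, 1.0, 0.0), d=(0.6, 0.0, 0.8), k=k)   # stand-in for E_m, H_m
    term_a = np.sum(Hm * np.cross(n, E0, axis=0), axis=0)      # H_m . (n ^ E_0)
    term_b = np.sum(H0 * np.cross(n, Em, axis=0), axis=0)      # H_0 . (n ^ E_m)
    return np.sum((term_a - term_b) * weight), np.sum(term_a * weight)

total, single = reciprocity_check()
print(abs(total), abs(single))   # ~1e-15 against a value of order 1
```

Each term on its own is of order unity; only their difference vanishes, which is what the argument for (6.108) uses.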

6.15 The Fredholm alternative

This section is a digression from the theory of antennas: its aim is to establish the Fredholm alternative for operators sufficiently general to cover the needs of the preceding section. The notation of Chapter 3 will be used but, since (6.102) and (6.104) are plainly not self-adjoint, a generalized version of the theory of that chapter will be necessary. The proof will be restricted to operators defined on Hilbert spaces but, in fact, the results hold for normed linear spaces, though the proofs have to be changed (see Colton and Kress 1983).

Let $T$ be a compact operator on a complex Hilbert space $H$ into $H$. By §3.3 $T$ is both bounded and continuous. Let $N_1$ be the set of $x$ such that $Tx = x$. It is obvious that $N_1$ is a linear subspace of $H$. Indeed, $N_1$ is of finite dimension and thereby closed (§3.1). For, if $N_1$ contained an infinite basis, an infinite orthonormal set $\{x_n\}$ could be constructed by the Schmidt process (§1.5). Then $Tx_n = x_n$ and so the definition of a compact operator demands that $\{x_n\}$ has a convergent subsequence. However, $\|x_m - x_n\|^2 = 2$ prevents the existence of any such convergent subsequence. Hence $N_1$ is of finite dimension.

Since $T$ is into $H$, $T^2$ can be defined by $T^2x = T(Tx)$, and higher powers of $T$ follow in a similar manner. Now if $\{x_j\}$ is a bounded sequence so is $\{Tx_j\}$, since $T$ is bounded. Hence $\{T(Tx_j)\}$ has a convergent subsequence because $T$ is compact. Consequently, $T^2$ is compact and, in general, so is $T^n$. Let $N_m$ be the set of $x$ such that $(T - I)^m x = 0$, $I$ being the identity operator. Expanding by the binomial theorem, $(T - I)^m = (-1)^m(I - T_m)$ where $T_m$ is a linear combination of $T, T^2,\dots,T^m$ and is therefore compact; it follows from the result for $N_1$, applied to $T_m$, that $N_m$ is of finite dimension. Clearly, if $x\in N_m$, then $x\in N_{m+1}$, and the question arises whether the dimension of $N_m$ increases without limit as $m$ grows. Suppose there is $x\in N_{m+1}$ which is not in $N_m$.
Then $(T - I)^{m-n}x\in N_{n+1}$ for $n\le m$ but does not belong to $N_n$. Therefore, either $N_{n+1}$ and $N_n$ are different spaces for all $n$, or they coincide for some $n$ and do not change for any further increase in $n$. To show that the first possibility cannot occur, choose in each $N_n$ an element $x_n$ which is orthogonal to all members of $N_{n-1}$ and such that $\|x_n\| = 1$. Then, for $n > m$,

$$Tx_n - Tx_m = \{(T - I)x_n - (T - I)x_m - x_m\} + x_n = y + x_n$$

where $y\in N_{n-1}$. Since $x_n$ is orthogonal to $y$,

$$\|Tx_n - Tx_m\|^2 = \|y\|^2 + \|x_n\|^2\ge 1.$$

Therefore $\{Tx_n\}$ does not possess a convergent subsequence, contradicting $T$ being compact. We deduce that there is a finite integer $\nu$ such that $N_\nu = N_{\nu+1} = N_{\nu+2} = \cdots$.

An associated space is $M_n$, which consists of all those $y$ which can be written as $y = (T - I)^n x$ for some $x\in H$, i.e. $M_n$ is the range of the operator $(T - I)^n$. Evidently, $M_n$ is linear and it is, in fact, a closed linear subspace of $H$. This will be proved for $M_1$ and the general demonstration left to the reader. Let $\{y_m\}$ be a convergent sequence in $M_1$. If $y_m = (T - I)x_m$, put $x_m = u_m + v_m$ where $u_m\in N_1$ and $v_m$ is orthogonal to $N_1$. Then $y_m = (T - I)v_m$. To show that $\|v_m\|$ is bounded, suppose that there is a subsequence on which $\|v_m\|\to\infty$. Then $(T - I)v_m/\|v_m\| = y_m/\|v_m\|\to 0$, since $\{y_m\}$ is convergent and hence bounded. The sequence $\{Tv_m/\|v_m\|\}$ contains a subsequence $\{Tv_{m_j}/\|v_{m_j}\|\}$ which converges. Therefore $\{Tv_{m_j} - (T - I)v_{m_j}\}/\|v_{m_j}\|$, i.e. $\{v_{m_j}/\|v_{m_j}\|\}$, is also convergent. If its limit is $v$ then

$$(T - I)v = \lim_{j\to\infty}\frac{(T - I)v_{m_j}}{\|v_{m_j}\|} = 0,$$

i.e. $v\in N_1$. However, $v_{m_j}$ is orthogonal to $N_1$ and so

$$\left\|v - \frac{v_{m_j}}{\|v_{m_j}\|}\right\|^2 = \|v\|^2 + 1 = 2,$$

which contradicts the convergence. Hence $\|v_m\|$ is bounded. Consequently, $\{Tv_m\}$ possesses a convergent subsequence $\{Tv_{m_k}\}$. Since $\{y_m\}$ is convergent, the sequence $\{Tv_{m_k} - (T - I)v_{m_k}\}$, i.e. $\{v_{m_k}\}$, converges also. Let its limit be $v_0$. Then

$$y_{m_k} = (T - I)v_{m_k}\to(T - I)v_0$$

and so the limit of $\{y_m\}$ is in $M_1$, i.e. $M_1$ is closed.

Obviously an element of $M_{n+1}$ is an element of $M_n$, but the spaces cannot all be different. If they were, we could pick for each $n$ an element $y_n$ of $M_n$ which is orthogonal to $M_{n+1}$ and of unit norm. A similar sort of argument to that applied to $N_m$ demonstrates that this is impossible. Accordingly, there is a finite integer $\mu$ such that $M_\mu = M_{\mu+1} = M_{\mu+2} = \cdots$. The remarkable fact is that $\mu = \nu$. The common value is known as the Riesz number of $T$. In order to prove this, the spaces are first supplemented by $M_0$, which is to be the same as $H$, and $N_0$, which contains only the zero element. These spaces are consistent with the earlier definitions of $M_n$ and $N_m$ since $(T - I)^0$ can be regarded as the identity operator.

Next, the aim is to show that any $x\in H$ can be expressed as $x = y + z$ where $y\in M_\mu$ and $z\in N_\mu$, and that the only element $M_\mu$ and $N_\mu$ have in common is the zero element. Briefly, $H = M_\mu\oplus N_\mu$. Now $(T - I)^\mu x$ is in $M_\mu$ and therefore in $M_{2\mu}$, since $M_{2\mu} = M_\mu$. Hence there is some $w\in H$ such that $(T - I)^\mu x = (T - I)^{2\mu}w$. Therefore if we write

$$x = \{x - (T - I)^\mu w\} + (T - I)^\mu w$$

the first member is an element of $N_\mu$ whereas the second lies in $M_\mu$. The desired decomposition has therefore been achieved if zero is the only common element of $M_\mu$ and $N_\mu$. Suppose that there is a $z\neq 0$ which is in both $M_\mu$ and $N_m$ ($m\ge 0$). For $n\ge\mu$, $z\in M_n$ and so, for each such $n$, there is a $z_n$ such that $z = (T - I)^n z_n$. Since $z\in N_m$ it follows that $z_n\in N_{m+n}$ but is not in $N_n$, contrary to the properties of $N_n$ as soon as $n\ge\nu$. Hence the only possible common element of $N_m$ and $M_\mu$ is zero, which is a rather stronger result than was stated. At any rate $H = M_\mu\oplus N_\mu$.

One consequence of this result is that the solution of $(T - I)x = y$ is unique when $x, y\in M_\mu$. There is a solution $x\in M_\mu$ since $M_{\mu+1} = M_\mu$, and it will be unique if $(T - I)x = 0$ implies that $x = 0$. However, this is true by the preceding paragraphs because $x$ is in both $M_\mu$ and $N_1$, whose only common element is zero. Let now $x\in N_{\mu+1}$, so that $(T - I)^{\mu+1}x = 0$. The equation $(T - I)y = 0$ has the solution $y = (T - I)^\mu x$, which is in $M_\mu$, and so $y = 0$. Therefore $x\in N_\mu$, i.e. $N_{\mu+1} = N_\mu$, whence $\mu\ge\nu$.

An instant deduction is that $\nu = 0$ when $\mu = 0$. If $\mu\ge 1$, choose $x$ so that $y = (T - I)^{\mu-1}x\in M_{\mu-1}$ is not in $M_\mu$ (possible because $\mu$ is the least index with $M_\mu = M_{\mu+1}$). Since $z = (T - I)^\mu x\in M_\mu$, the equation $(T - I)^\mu w = z$ has the unique solution $w_0$ (say) in $M_\mu$. Hence $x - w_0\in N_\mu$. On the other hand, $(T - I)^{\mu-1}(x - w_0) = y - (T - I)^{\mu-1}w_0\neq 0$, since $(T - I)^{\mu-1}w_0\in M_\mu$ but $y$ is not in $M_\mu$. Hence $x - w_0$ is not in $N_{\mu-1}$ and so $N_{\mu-1}\neq N_\mu$. Therefore $\mu\le\nu$. Combining the last two paragraphs we conclude that $\mu = \nu$. There is no longer any necessity to distinguish between $\mu$ and $\nu$.

Suppose that $\mu = 0$. Since $M_1 = M_0 = H$ and $N_1 = N_0 = \{0\}$, the equation $(T - I)x = y$ has a unique solution for any $y\in H$. The correspondence could be expressed as $x = (T - I)^{-1}y$, and the inverse is, in fact, bounded. For, if this were not so, there would be a sequence $\{y_n\}$ such that $\|x_n\|/\|y_n\|$ was unbounded. Put $z_n = x_n/\|x_n\|$; then, along a subsequence, $(T - I)z_n = y_n/\|x_n\|\to 0$.
On account of $\|z_n\| = 1$, the sequence $\{Tz_n\}$ contains a convergent subsequence $\{Tz_{n_k}\}$ with limit, say, $z_0$. Then

$$z_{n_k} = Tz_{n_k} - (T - I)z_{n_k}\to z_0.$$

By virtue of the continuity of $T - I$, $(T - I)z_{n_k}\to(T - I)z_0$. The limit of the left-hand side is zero and hence $z_0 = 0$ by uniqueness. This contradicts $\|z_0\| = \lim\|z_{n_k}\| = 1$, and so there is a finite $C$ such that $\|(T - I)^{-1}\| < C$. Further, if $T^{A}y = y$, then $0 = (x, T^{A}y - y) = (Tx - x, y)$ for all $x\in H$. Since $\mu = 0$, $Tx - x$ ranges over $M_1 = M_0 = H$, so $y$ is orthogonal to every element of $H$ and therefore $y = 0$. Accordingly, when $\mu = 0$ the homogeneous adjoint equation possesses only the trivial solution and the inverse $(T^{A} - I)^{-1}$ exists. It is indeed bounded by the same constant as $(T - I)^{-1}$ because, in general,

$$\|T^{A}y\|^2 = (T^{A}y, T^{A}y) = (TT^{A}y, y)\le\|TT^{A}y\|\,\|y\|\le\|T\|\,\|T^{A}y\|\,\|y\|.$$
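In finite dimensions every linear operator is compact, so a matrix gives the simplest setting in which the Riesz number and the decomposition $H = M_\mu\oplus N_\mu$ can be computed. The sketch below uses NumPy and an arbitrary test matrix containing a $2\times 2$ Jordan block for the eigenvalue 1. It finds the least $\nu$ with $N_\nu = N_{\nu+1}$ by comparing ranks of powers of $T - I$ (since $\dim N_m = n - \operatorname{rank}(T - I)^m$ and $N_m\subseteq N_{m+1}$), then checks that bases of $M_\nu$ and $N_\nu$ together span the whole space.

```python
import numpy as np

def riesz_number(T, tol=1e-10):
    """Least nu with N_nu = N_(nu+1), N_m being the null space of (T - I)^m; also returns (T - I)^nu."""
    n = T.shape[0]
    A = T - np.eye(n)
    P = np.eye(n)                      # (T - I)^0 = I
    nu = 0
    while True:
        P_next = A @ P
        # N_m only grows with m, so equal ranks of successive powers mean equal null spaces
        if np.linalg.matrix_rank(P_next, tol=tol) == np.linalg.matrix_rank(P, tol=tol):
            return nu, P
        P, nu = P_next, nu + 1

def range_and_null_bases(P, tol=1e-10):
    """Orthonormal bases of the range (M) and the null space (N) of P, from the SVD."""
    U, s, Vh = np.linalg.svd(P)
    r = int(np.sum(s > tol))
    return U[:, :r], Vh[r:].conj().T

T = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
nu, P = riesz_number(T)
M, N = range_and_null_bases(P)
print(nu, M.shape[1], N.shape[1])                            # 2 2 2
print(np.linalg.matrix_rank(np.hstack([M, N]), tol=1e-10))   # 4, so H = M_nu (+) N_nu
```

A diagonal matrix without the eigenvalue 1 gives Riesz number 0, and one with the eigenvalue 1 but no Jordan block gives 1, in line with the two cases $\mu = 0$ and $\mu\ge 1$ treated in the text.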

If $\mu\ge 1$ the situation is different because the equation $Tx = x$ always admits the solution $x = (T - I)^{\mu-1}z$, where $z\in N_\mu$ but is not in $N_{\mu-1}$. Thus $T - I$ has no inverse. However, it has been proved above that $Tx - x = y$ has a unique solution when both $x$ and $y$ are in $M_\mu$. Arguing as in the case $\mu = 0$, we deduce that there is a $C$ such that $\|x\|\le C\|(T - I)x\|$ for $x\in M_\mu$. Since $(T - I)^n x\in M_\mu$, it follows that $\|x\|\le C^\mu\|(T - I)^\mu x\|$ for $x\in M_\mu$. When the Riesz number is non-zero the equation $Tx - x = y$ may or may not have a solution. If there is a solution it will not be unique in general. It has been seen that, if $y$ is in $M_\mu$, there is a solution, but any $x$ satisfying $Tx = x$ could be added unless $x$ is forced to be in $M_\mu$. For arbitrary $y$ note that, when

$$T^{A}w = w, \qquad (y, w) = (Tx - x, w) = (x, T^{A}w - w) = 0,$$

which is a necessary condition that $y$ must obey for a solution to exist. It turns out to be a sufficient condition as well. In fact, since $\lambda T$ is compact when $\lambda$ is complex, the following theorem can be demonstrated.

THEOREM 6.15 (THE FREDHOLM ALTERNATIVE). If $T$ is a compact operator from $H$ into $H$, EITHER the equations $x - \lambda Tx = 0$ and $w - \lambda^*T^{A}w = 0$ have only the solutions $x = 0$ and $w = 0$, and $x - \lambda Tx = y$ possesses a unique solution, OR the equations $x - \lambda Tx = 0$ and $w - \lambda^*T^{A}w = 0$ each have the same number of linearly independent solutions $x^{(1)},\dots,x^{(r)}$ and $w^{(1)},\dots,w^{(r)}$. Then $x - \lambda Tx = y$ has a solution if and only if $(w^{(j)}, y) = 0$ for $j = 1,\dots,r$.

One important conclusion from this theorem is that, if it can be shown that $x - \lambda Tx = 0$ has no solution other than $x = 0$, the existence of a solution to $x - \lambda Tx = y$ is guaranteed.
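Theorem 6.15 can be watched at work on a matrix, since every operator on a finite-dimensional space is compact. With the inner product $(x, y) = \sum_i x_i y_i^*$, the adjoint $T^{A}$ of a matrix is its conjugate transpose. The sketch below (NumPy; the upper-triangular $T$, $\lambda = \tfrac{1}{2}$ and both right-hand sides are illustrative assumptions) picks $\lambda$ so that $1/\lambda$ is an eigenvalue of $T$, which puts us in the second alternative. It then confirms that the homogeneous equation and its adjoint have the same number of independent solutions, and that $x - \lambda Tx = y$ is solvable exactly when $y$ is orthogonal to every $w^{(j)}$.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of A, from the SVD."""
    _, s, Vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T

def solvable(A, y, tol=1e-10):
    """True if A x = y has an exact solution (least-squares residual ~ 0)."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return bool(np.linalg.norm(A @ x - y) < tol)

T = np.array([[2.0, 1.0, 0.3],
              [0.0, 0.5, 1.0],
              [0.0, 0.0, 3.0]])
lam = 0.5                                         # 1/lam = 2 is an eigenvalue of T: second alternative
A = np.eye(3) - lam * T                           # x - lam T x
A_adj = np.eye(3) - np.conj(lam) * T.conj().T     # w - lam* T^A w, T^A = conjugate transpose

X = null_space(A)
W = null_space(A_adj)
print(X.shape[1], W.shape[1])                     # 1 1: the same number r of independent solutions

y_good = A @ np.array([1.0, -2.0, 0.5])           # in the range of A
y_bad = np.array([1.0, 0.0, 0.0])
for y in (y_good, y_bad):
    orthogonal = bool(np.allclose(W.conj().T @ y, 0.0))   # (y, w^(j)) = 0 for every j
    print(solvable(A, y), orthogonal)             # True True, then False False
```

Solvability and orthogonality agree in both cases, as the second alternative asserts.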

When the second alternative is relevant and $y$ has the necessary orthogonality, the solution of $x - \lambda Tx = y$ can be written as

$$x = z + \sum_{m=1}^{r} b_m x^{(m)},$$

where $z$ is a particular solution and the $b_m$ are arbitrary constants. It will be observed that Theorem 6.15 also resolves an eigenvalue problem. In particular, any eigenvalue $\lambda$ (which may be complex) is of finite multiplicity by the second part of the theorem. It may be deduced that the eigenvalues $\lambda_n$ are countable and have no limit point in the finite part of the complex plane. Often applications provide an operator $T$ which is bounded but not compact, though some positive integer power $T^n$ is compact. In that case, Theorem 6.15 still stands.

When the Riesz number of $\lambda T$ is non-zero the second alternative of Theorem 6.15 is pertinent. Any solution of $(I - \lambda T)^2 x = 0$ must be of the form

$$x - \lambda Tx = \sum_{m=1}^{r} b_m x^{(m)}.$$

If this equation can be solved for $x$ with at least one of the constants $b_m$ non-zero, the Riesz number must be two or greater, since then $x - \lambda Tx \ne 0$. If no such solution exists the Riesz number must be one. The existence of a solution entails

$$\sum_{m=1}^{r} b_m\,(x^{(m)}, w^{(j)}) = 0 \qquad (j = 1, \ldots, r).$$

Consequently, the Riesz number is either one or greater than one according as the $r \times r$ matrix with entries $(x^{(m)}, w^{(j)})$ is non-singular or singular. When the matrix is non-singular it is possible to choose $X_k = \sum_{m=1}^{r} a_{km}x^{(m)}$ such that $(X_k, w^{(j)}) = \delta_{kj}$, where $\delta_{kj}$ is the Kronecker symbol. Since $X_k$ is also an eigenfunction, this means that there are eigenfunctions for which the matrix is the unit matrix.

COROLLARY 6.15. The Riesz number of $\lambda T$ is one if, and only if, the matrix with entries $(x^{(m)}, w^{(j)})$ is non-singular or, equivalently, there are linearly independent eigenfunctions $X_k$ such that $(X_k, w^{(j)})$ is the unit matrix.
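Theorem 6.15 and Corollary 6.15 can be checked numerically in finite dimensions, where every matrix is a compact operator. The sketch below (Python with NumPy; the helper names `null_space`, `fredholm_data`, `is_solvable` and `riesz_number_is_one` are illustrative and not from the text) assumes the inner product $(u, v) = \sum_i u_i v_i^*$, so that $T^{A}$ is the conjugate transpose. It obtains orthonormal bases for the $x^{(m)}$ and $w^{(j)}$ from the singular value decomposition of $I - \lambda T$, tests the orthogonality condition for solvability, and forms the matrix $(x^{(m)}, w^{(j)})$ of the corollary. Rank decisions use a fixed tolerance, an assumption suited only to well-conditioned examples.

```python
import numpy as np


def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of the square matrix M."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol * max(1.0, s[0])))
    return vh[rank:].conj().T


def fredholm_data(T, lam):
    """Return I - lam*T and bases for the solutions of x - lam*T x = 0 and of
    w - conj(lam) * T^A w = 0, where T^A is the conjugate transpose of T."""
    A = np.eye(T.shape[0]) - lam * T
    X = null_space(A)             # columns x^(1), ..., x^(r)
    W = null_space(A.conj().T)    # columns w^(1), ..., w^(r)
    return A, X, W


def is_solvable(T, lam, y, tol=1e-10):
    """Fredholm alternative: x - lam*T x = y is solvable iff (y, w^(j)) = 0 for all j."""
    _, _, W = fredholm_data(T, lam)
    inner = W.conj().T @ y        # entries (y, w^(j)) = sum_i y_i conj(w_i^(j))
    return bool(np.all(np.abs(inner) < tol))


def riesz_number_is_one(T, lam, tol=1e-10):
    """Corollary 6.15: the Riesz number is one iff the matrix (x^(m), w^(j)) is non-singular."""
    _, X, W = fredholm_data(T, lam)
    if X.shape[1] == 0:
        raise ValueError("x - lam*T x = 0 has only the trivial solution")
    G = W.conj().T @ X            # G[j, m] = (x^(m), w^(j))
    return bool(np.linalg.svd(G, compute_uv=False).min() > tol)


if __name__ == "__main__":
    T_jordan = np.array([[1.0, 1.0], [0.0, 1.0]])             # (I - T)^2 = 0 but I - T != 0
    print(riesz_number_is_one(T_jordan, 1.0))                 # False: Riesz number two
    print(is_solvable(T_jordan, 1.0, np.array([1.0, 0.0])))   # True
```

For the Jordan block the single solution $x^{(1)}$ and the single adjoint solution $w^{(1)}$ are orthogonal, so the $1 \times 1$ matrix of the corollary vanishes and the Riesz number exceeds one, whereas a diagonalizable $T$ gives a non-singular matrix.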

As an application of Theorem 6.15 we establish the interrelations between the solutions of (6.102), (6.105), and its adjoint

$$\tfrac12\mathbf{k}(p) - \int_S\{\mathbf{n}_q\wedge\mathbf{k}(q)\}\wedge\operatorname{grad}_q\psi^*(p,q)\,dS_q = 0, \qquad (6.109)$$

assuming that the operators are compact, as will be verified shortly. If $\mathbf{j}$ is any solution of (6.102), let $\mathbf{E}_2$, $\mathbf{H}_2$ be the electromagnetic field in which $\mathbf{E}_2$ is defined by

$$\mathbf{E}_2(P) = \int_S\big[i\omega\mu\,\mathbf{j}(q)\psi(P,q) - \{\rho(q)/\varepsilon\}\operatorname{grad}_q\psi(P,q)\big]\,dS_q.$$

By virtue of (6.102), $\mathbf{n}\wedge\mathbf{H}_2^- = 0$ for the interior of $S$. Hence, application of the analogue of (6.75) provides

$$\mathbf{n}_p\wedge\Big[\tfrac12\mathbf{E}_2(p) + \int_S\{\mathbf{n}_q\wedge\mathbf{E}_2(q)\}\wedge\operatorname{grad}_q\psi(p,q)\,dS_q\Big] = 0.$$

Thus $\mathbf{n}_p\wedge\mathbf{E}_2(p)$ is a solution of (6.105). To every linearly independent $\mathbf{j}$ there corresponds an $\mathbf{n}_p\wedge\mathbf{E}_2(p)$. If the $\mathbf{n}_p\wedge\mathbf{E}_2(p)$ were not also linearly independent there would be some non-zero $\mathbf{j}$ for which $\mathbf{n}_p\wedge\mathbf{E}_2(p) = 0$. Since $\mathbf{n}_p\wedge\mathbf{E}_2^+(p) = \mathbf{n}_p\wedge\mathbf{E}_2^-(p) = \mathbf{n}_p\wedge\mathbf{E}_2(p)$, the uniqueness theorem of the exterior problem would imply that the field was identically zero in $S_+$ and so $\mathbf{n}_p\wedge\mathbf{H}_2^+(p) = 0$. But, since $\mathbf{n}_p\wedge\mathbf{H}_2^-(p) = 0$, this would necessitate $\mathbf{j} = 0$ and a zero field, contrary to assumption. Hence there are as many linearly independent $\mathbf{n}_p\wedge\mathbf{E}_2(p)$ as there are $\mathbf{j}$. But (6.102) and (6.105) have the same number of linearly independent solutions by Theorem 6.15. Accordingly, every solution of (6.105) can be expressed in the form $\mathbf{n}_p\wedge\mathbf{E}_2(p)$ derived from a solution of (6.102).

Next, it will be shown that $\mathbf{j}_m^*$ is a solution of (6.105) whenever $\mathbf{j}_m$ is. Let $\mathbf{j}_m$ generate the electromagnetic field $\mathbf{E}_m$, $\mathbf{H}_m$ inside $S$. Then $\mathbf{n}_p\wedge\mathbf{H}_m^- = 0$ and $\mathbf{j}_m = \mathbf{n}_p\wedge\mathbf{E}_m^-$. However, the field $\mathbf{E}_m^*$, $-\mathbf{H}_m^*$ also satisfies Maxwell's equations in $S_-$ and has a tangential magnetic intensity which is zero on $S$. Therefore $\mathbf{j}_m^*$ satisfies (6.105). In view of the conclusion of the preceding paragraph, to each solution $\mathbf{j}$ of (6.102) is related a solution $\mathbf{j}_1$ of (6.105) such that

$$\mathbf{j}_1^*(p) = \mathbf{n}_p\wedge\mathbf{E}_2(p).$$

On account of the way in which (6.105) was constructed from (6.104), we infer that every solution of (6.104), the adjoint of (6.102), satisfies (6.110).

The result enables us to show that the Riesz number of (6.102) is one when it possesses non-trivial solutions. If it were not one, then $\mathbf{j}$ would be orthogonal to all solutions of the adjoint (6.104) by Corollary 6.15. In particular, it would be orthogonal to the $\mathbf{h}$ of (6.110). Since $\mathbf{j} = -\mathbf{n}\wedge\mathbf{H}_2^+$ this means

$$\int_S\mathbf{n}\wedge\mathbf{H}_2^+\cdot\mathbf{E}_2^*\,dS_p = 0.$$

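But then the exterior uniqueness theorem would require $\mathbf{n}_p\wedge\mathbf{H}_2^+ = 0$ on $S$, or $\mathbf{j} = 0$, in contradiction to $\mathbf{j}$ being non-trivial. Consequently, the Riesz number must be one. An immediate inference is that the Riesz number of (6.105) is one. Accordingly, by Corollary 6.15, there are linearly independent $\mathbf{j}_r$ spanning the space of solutions of (6.105) and linearly independent $\mathbf{k}_s$, solutions of the adjoint (6.109), such that

$$(\mathbf{j}_r, \mathbf{k}_s) = \delta_{rs}. \qquad (6.111)$$

An explicit formula for $\mathbf{j}_r$ in terms of the $\mathbf{k}_s$ can be derived. Multiply (6.109) vectorially by $\mathbf{n}_p$ and take the complex conjugate. It is obvious then that $\mathbf{n}\wedge\mathbf{k}^*$ satisfies (6.102). Hence $\mathbf{n}\wedge\mathbf{k}_s^*$ can be used to generate a field of the type $\mathbf{E}_2$. By what has been proved above, $\mathbf{j}_r = \mathbf{n}\wedge\mathbf{E}_2$ for some $\mathbf{E}_2$. Therefore, when there are $N$ of the $\mathbf{k}_s$, there are constants $a_{rs}$ such that

$$\mathbf{j}_r(p) = \mathbf{n}_p\wedge\sum_{s=1}^{N}a_{rs}\int_S\Big\{i\omega\mu\,\mathbf{n}_q\wedge\mathbf{k}_s^*(q)\,\psi(p,q) + \frac{1}{i\omega\varepsilon}\operatorname{Div}(\mathbf{n}_q\wedge\mathbf{k}_s^*)\operatorname{grad}_q\psi(p,q)\Big\}\,dS_q. \qquad (6.112)$$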

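6.16 Compactness and other properties of the MFIE

To apply the Fredholm alternative to (6.102) it is necessary to show that the operator $T$, defined by

$$T\mathbf{j} = \mathbf{n}_p\wedge\int_S\mathbf{j}(q)\wedge\operatorname{grad}_q\psi(p,q)\,dS_q,$$

is compact. The theory at the beginning of §3.4 cannot be cited because the kernel is not square integrable. Instead, advantage is taken of the device described after eqn (3.17) of that section. Define $\boldsymbol{\Psi}_1$ and $\boldsymbol{\Psi}_2$ by

$$\boldsymbol{\Psi}_1 = \begin{cases}\operatorname{grad}_q\psi(p,q) & (|\mathbf{x}_p - \mathbf{x}_q| \ge \delta),\\ 0 & (|\mathbf{x}_p - \mathbf{x}_q| < \delta),\end{cases} \qquad \boldsymbol{\Psi}_2 = \begin{cases}0 & (|\mathbf{x}_p - \mathbf{x}_q| \ge \delta),\\ \operatorname{grad}_q\psi(p,q) & (|\mathbf{x}_p - \mathbf{x}_q| < \delta).\end{cases}$$

Then $T = T_1 + T_2$, where

$$T_1\mathbf{j} = \mathbf{n}_p\wedge\int_S\mathbf{j}(q)\wedge\boldsymbol{\Psi}_1\,dS_q, \qquad T_2\mathbf{j} = \mathbf{n}_p\wedge\int_S\mathbf{j}(q)\wedge\boldsymbol{\Psi}_2\,dS_q.$$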

The kernel for $T_1$ has no singularity and is square integrable. Therefore, by §3.4, $T_1$ is compact. Hence $T$ will be compact, by the first criterion at the end of §3.3, if it can be shown that the norm of $T_2$ can be made arbitrarily small. For given $\mathbf{x}_p$ select temporary axes with the origin at $\mathbf{x}_p$ and the $z$ axis along $\mathbf{n}_p$, assuming that $S$ has a continuously turning tangent plane near $\mathbf{x}_p$. The integral in $T_2$ will then be effectively limited to $x^2 + y^2 \le \delta^2$, and the equation of the surface will be approximately $2z = ax^2 + by^2$ if the $x$ and $y$ axes are arranged to lie in the planes of principal curvature.

Fig. 6.19. The geometrical configuration for the norm of $T_2$.

If $\mathbf{j}$ has components $j_1$, $j_2$, and $j_3$ parallel to the coordinate axes,

$$j_3 = axj_1 + byj_2 \qquad (6.113)$$

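because $\mathbf{j}$ is tangential to the surface. Also

$$T_2\mathbf{j} = \int\big\{(\mathbf{k}\cdot\boldsymbol{\Psi}_2)\mathbf{j} - (\mathbf{k}\cdot\mathbf{j})\boldsymbol{\Psi}_2\big\}\,dS_q,$$

where $\mathbf{k}$ is the unit vector along the $z$ axis, and now $\psi = e^{-ikr}/4\pi r$ where $r^2 = x^2 + y^2 + z^2$. Hence

$$|T_2\mathbf{j}| \le \int_{r\le\delta}\Big\{\Big|\frac{\partial\psi}{\partial z}\Big|\big(|j_1| + |j_2|\big) + |j_3|\Big(\Big|\frac{\partial\psi}{\partial x}\Big| + \Big|\frac{\partial\psi}{\partial y}\Big|\Big)\Big\}\,dS_q.$$

Now $|z|/r^2 \le |a| + |b|$ and so, if (6.113) is substituted,

$$|T_2\mathbf{j}| \le B\int_{r\le\delta}\frac{|\mathbf{j}|}{r}\,dS_q$$

for some finite constant $B$. Hence, in general,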

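$$|T_2\mathbf{j}|^2 \le B^2\Big\{\int_{|\mathbf{x}_p - \mathbf{x}_q|\le\delta}\frac{|\mathbf{j}|}{|\mathbf{x}_p - \mathbf{x}_q|}\,dS_q\Big\}^2 \le B^2\int_{|\mathbf{x}_p - \mathbf{x}_q|\le\delta}\frac{|\mathbf{j}|^2}{|\mathbf{x}_p - \mathbf{x}_q|}\,dS_q\int_{|\mathbf{x}_q|\le\delta}\frac{dS_q}{|\mathbf{x}_q|}$$

by the Schwarz inequality. Consequently, on interchanging the order of integration in the double integral,

$$\|T_2\mathbf{j}\|^2 = \int_S|T_2\mathbf{j}|^2\,dS_p \le B^2\Big\{\int_S\int_{|\mathbf{x}_p - \mathbf{x}_q|\le\delta}\frac{|\mathbf{j}(q)|^2}{|\mathbf{x}_p - \mathbf{x}_q|}\,dS_q\,dS_p\Big\}\int_{|\mathbf{x}_q|\le\delta}\frac{dS_q}{|\mathbf{x}_q|} \le B^2\|\mathbf{j}\|^2\Big\{\int_{|\mathbf{x}_q|\le\delta}\frac{dS_q}{|\mathbf{x}_q|}\Big\}^2.$$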
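The bound just obtained controls $\|T_2\mathbf{j}\|$ by $B\|\mathbf{j}\|$ times the local integral $\int_{|\mathbf{x}_q|\le\delta} dS_q/|\mathbf{x}_q|$, and that integral is elementary. As a sketch, assuming that inside the disc the surface element may be taken as $\rho\,d\rho\,d\phi$ in polar coordinates on the tangent plane (the correction being of relative order $\delta^2$ for the paraboloid $2z = ax^2 + by^2$) and using $|\mathbf{x}_q| \ge \rho$:

$$
\int_{|\mathbf{x}_q|\le\delta}\frac{dS_q}{|\mathbf{x}_q|}
\lesssim \int_0^{2\pi}\!\!\int_0^{\delta}\frac{\rho\,d\rho\,d\phi}{\rho} = 2\pi\delta,
\qquad\text{so that}\qquad
\|T_2\mathbf{j}\| \lesssim 2\pi B\delta\,\|\mathbf{j}\|.
$$

Thus $\|T_2\| = O(\delta)$ can be made arbitrarily small by shrinking $\delta$, while $T_1$ remains compact for every $\delta > 0$, which is what the criterion of §3.3 requires for $T$ to be compact.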

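… $|\mathbf{x}| > R_0$, a scattered wave has the representation

$$p = e^{-ikR}\sum_{n=0}^{\infty}\frac{p_n(\theta,\phi)}{R^{n+1}}, \qquad (6.140)$$

where the $p_n$ satisfy the recurrence relation

$$2ik(n+1)p_{n+1} + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\,\frac{\partial p_n}{\partial\theta}\Big) + \frac{1}{\sin^2\theta}\frac{\partial^2 p_n}{\partial\phi^2} + n(n+1)p_n = 0 \qquad (n = 0, 1, \ldots) \qquad (6.141)$$

in order to comply with (6.139) (Jones 1986). Remark that, if $p_0 = 0$, the field $p$ is identically zero outside $S$. From (6.140)

$$\frac{\partial p}{\partial R} + \Big(ik + \frac1R\Big)p = -e^{-ikR}\sum_{n=0}^{\infty}\frac{np_n}{R^{n+2}} = -e^{-ikR}\sum_{n=0}^{\infty}\frac{(n+1)p_{n+1}}{R^{n+3}}.$$

Therefore

$$2\Big(ik + \frac1R\Big)\Big\{\frac{\partial p}{\partial R} + \Big(ik + \frac1R\Big)p\Big\} = -e^{-ikR}\sum_{n=0}^{\infty}\frac{2ik(n+1)p_{n+1} + n(n+1)p_n - n(n-1)p_n}{R^{n+3}}.$$

The last term in the numerator vanishes at $n = 0$ and $n = 1$, so its omission causes an error of $O(1/R^5)$. The remaining terms can be modified by (6.141). Hence

$$2\Big(ik + \frac1R\Big)\Big\{\frac{\partial p}{\partial R} + \Big(ik + \frac1R\Big)p\Big\} = \frac{1}{R^2}\Big\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\,\frac{\partial p}{\partial\theta}\Big) + \frac{1}{\sin^2\theta}\frac{\partial^2 p}{\partial\phi^2}\Big\} \qquad (6.142)$$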
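A consistency check of (6.142), in the $e^{-ikR}$ convention used here, is to substitute an exact outgoing solution $p = h_n^{(2)}(kR)\,Y(\theta,\phi)$, with $Y$ a surface harmonic of degree $n$. Its expansion (6.140) terminates, and the derivation above shows that the residual of (6.142) is exactly the omitted sum $e^{-ikR}\sum n(n-1)p_n/R^{n+3}$. The residual should therefore vanish for $n = 0$ and $n = 1$, and for $n = 2$ (where $p_2 = 3i/k^3$) have magnitude $6/(k^3R^5)$. The sketch below is plain Python; the function names are hypothetical, and the upward recurrence used for the spherical Hankel functions is adequate only for the small orders tried here.

```python
import cmath


def h2(n, x):
    """Spherical Hankel function of the second kind h_n^(2)(x), by upward recurrence
    from h_{-1}^(2)(x) = e^{-ix}/x and h_0^(2)(x) = i e^{-ix}/x."""
    prev = cmath.exp(-1j * x) / x
    cur = 1j * cmath.exp(-1j * x) / x
    for m in range(n):
        prev, cur = cur, (2 * m + 1) / x * cur - prev
    return cur


def h2_prime(n, x):
    """Derivative of h_n^(2): (n/x) h_n^(2)(x) - h_{n+1}^(2)(x)."""
    return n / x * h2(n, x) - h2(n + 1, x)


def abc_residual(n, k, R):
    """Left side minus right side of (6.142) for p = h_n^(2)(kR) Y(theta, phi).

    The angular operator in (6.142) multiplies a degree-n surface harmonic by
    -n(n+1), so only the radial factor has to be evaluated."""
    p = h2(n, k * R)
    dp_dR = k * h2_prime(n, k * R)
    lhs = 2 * (1j * k + 1 / R) * (dp_dR + (1j * k + 1 / R) * p)
    rhs = -n * (n + 1) * p / R**2
    return lhs - rhs
```

For instance, with $k = 1$, doubling $R$ from 10 to 20 reduces the $n = 2$ residual by a factor of 32, the $R^{-5}$ behaviour claimed for (6.142).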

The relation (6.142) holds with an error of $O(1/R^5)$. Equation (6.142) constitutes an absorbing boundary condition for a spherical artificial boundary. Since the right-hand side involves only tangential derivatives of $p$, the relation is, in essence, one between the values of $p$ on a sphere and its normal derivative. Formulae which commit even smaller errors have been constructed by Bayliss et al. (1982) by applying the operator $\partial/\partial R$. They can be converted to relations between tangential and normal derivatives by invoking (6.139). However, their increased accuracy is offset by substantial complication in the equation, so that they are rarely used in practice. It is not always convenient to employ a sphere as the artificial boundary. For other shapes a relation between tangential and normal derivatives analogous to (6.142) is still feasible, though it has to be expressed in terms of the surface operators Div, Grad, Curl defined in the appendix to this chapter. It is
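(Jones 1992b)

$$\Big\{1 - \frac{i}{k}\big[H + (H^2 - K)^{1/2}\big]\Big\}\Big\{\frac{\partial p}{\partial n} + (ik + H)p\Big\} = \frac{1}{2ik}\Big\{1 + \frac{i}{k}\big[H - (H^2 - K)^{1/2}\big]\Big\}\big\{(H^2 - K)p + \operatorname{Div}\operatorname{Grad}p\big\} + \frac{1}{2ik}\operatorname{Div}\big[\operatorname{Curl}(\mathbf{n}\wedge\operatorname{Grad}p)\big]_t, \qquad (6.143)$$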

where $H$, $K$ are the mean and Gaussian curvatures respectively. The notation $[\ ]_t$ signifies the tangential component of the vector. The unit vector $\mathbf{n}$ is the outward normal to the boundary and $\partial p/\partial n$ is the normal derivative. A simpler version, which nevertheless seems tolerably accurate, was proposed by Jones (1988a). In two dimensions an alternative has been suggested by Mittra et al. (1989). Another form, for Laplace's equation in three dimensions, has been given by Khebir et al. (1990). A comparison of the predictions of the two-dimensional version of (6.142) with an exact solution has been made by Mittra et al. (1989). They conclude that the absorbing boundary condition is acceptable provided that the scattered field does not include, to any great extent, cylindrical harmonics above the first 40. If the surface of the scatterer has too many significant wrinkles, predictions from the absorbing boundary condition can be expected to display appreciable errors.

Next, absorbing boundary conditions for Maxwell's equations will be derived. Since the Cartesian components of $\mathbf{E}$ satisfy (6.139) we can state that

$$\mathbf{E} = e^{-ikR}\sum_{n=0}^{\infty}\frac{\mathbf{c}_n(\theta,\phi)}{R^{n+1}}$$

… 1; in general $\kappa = 0$ will not be a solution. When $m = 1$ we have

$$\frac{H_1^{(2)\prime}(\kappa_0 a)}{H_1^{(2)}(\kappa_0 a)} \approx -\frac{1}{\kappa_0 a}\big\{1 + (\kappa_0 a)^2\ln\kappa_0 a\big\}.$$

The highest-order singularity $1/(\kappa_0 a)^4$ is again removed by cancellation. The next-largest terms are of order $(\ln\kappa_0 a)/(\kappa_0 a)^2$ and these cannot be eliminated unless $\kappa J_1(\kappa a)$ is small. To agree with this, one possibility is

$$\omega^2(\mu\varepsilon - \mu_0\varepsilon_0) \approx 0,$$

and there is then no cut-off frequency for the hybrid wave with $m = 1$.

In general, (6.154) has to be solved numerically, though a graphical approach is a helpful adjunct. For example, $\kappa a$ and $i\kappa_0 a$ are taken as Cartesian coordinates and the Hankel function is replaced via $H_m^{(2)}(z) = (2i/\pi)\exp(\tfrac12 m\pi i)K_m(iz)$. Then the curve whose equation is (6.154) is drawn. The curve $(\kappa a)^2 + (i\kappa_0 a)^2 = \omega^2a^2(\mu\varepsilon - \mu_0\varepsilon_0)$ is then traced. The required values of $\alpha$ may be deduced from the intersections of the two curves. There is no difficulty in confirming that for all these modes the Poynting vector has no component normal to the cylindrical surface when $\kappa_0$ is purely imaginary. Consequently there is no net energy flow out of the cylinder. The modes can therefore be called surface waves, i.e. waves which propagate along the interface between two different media without any transfer of energy across the dividing surface (other than that necessary to make good resistive losses).

The possibility of complex $\alpha$ should not be ignored. That such values are feasible may be recognized by returning to the approximation (6.158) and allowing $\kappa_0$ to be arbitrarily complex, but keeping $\eta$ as a real constant. Let $\kappa_0 a = d\,e^{i\delta}$ where $d \ll 1$. For $0 \le \delta \le \tfrac12\pi$, (6.158) becomes

$$d^2e^{2i\delta}\big(\tfrac12\pi i + \delta i + \ln\tfrac12 d\big) = -\frac{2}{\eta}$$

if the term involving $\gamma$ is dropped. The real and imaginary parts of this equation may be rearranged as

$$\tfrac12 d = \exp\{-(\tfrac12\pi + \delta)\cot 2\delta\}, \qquad (6.160)$$

$$d^2 = \frac{4\sin 2\delta}{\eta(\pi + 2\delta)}, \qquad (6.161)$$

so that $j_{1,p} < \kappa_1 a < j_{0,p+1}$ and any solutions occur in a different regime from those of (6.159). The curve of (6.160) increases monotonically with $\delta$, starting from the value 0 at $\delta = 0$. In contrast, that of (6.161) increases steadily from 0 at $\delta = 0$, passes through a maximum for some $\delta < \tfrac12\pi$, and then falls steadily to zero at $\delta = \tfrac12\pi$. Hence there is precisely one solution, and it lies in the interval $0 < \delta < \tfrac14\pi$; as $\eta \to \infty$ it converges on $\delta = 0$ while simultaneously $d \to 0$. In this wave $\kappa_0$ has a positive imaginary part and so the field grows as $r \to \infty$. Such waves have, on occasion, been deemed leaky waves. As $\kappa_1 a \to j_{0,p+1}$, $\eta \to \infty$ and $\delta \to 0$. This suggests that the leaky wave which originates for $\kappa_1 a$ just below $j_{0,p+1}$ converts to the $(p+1)$th surface wave as $\kappa_1 a$ traverses the surface-mode cut-off frequency. Furthermore, $\cos 2\delta$ is positive at the solution of (6.160) and (6.161), so that $\operatorname{Re}\alpha^2 < \omega^2\mu_0\varepsilon_0$. For $\delta$ and $d$ near zero, $\operatorname{Im}\alpha$ will be of the order of $d^2$ and we deduce that $\operatorname{Re}\alpha < \omega(\mu_0\varepsilon_0)^{1/2}$. Consequently, the leaky wave travels with a phase speed faster than light along the $z$ axis. There is, in addition, a radiation loss associated with this fast wave.
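The single leaky-wave root described above can be located numerically. Separating the real and imaginary parts of $d^2e^{2i\delta}(\tfrac12\pi i + \delta i + \ln\tfrac12 d) = -2/\eta$ gives $\tfrac12 d = \exp\{-(\tfrac12\pi+\delta)\cot 2\delta\}$ and $d^2 = 4\sin 2\delta/\{\eta(\pi + 2\delta)\}$. The sketch below (plain Python; the function names are hypothetical) bisects the difference of the two expressions for $d^2$ on $0 < \delta < \tfrac14\pi$, where it is negative near $\delta = 0$ and positive at $\delta = \tfrac14\pi$ whenever $\eta > 2/(3\pi)$. It then substitutes $\kappa_0 a = d\,e^{i\delta}$ back into the complex equation as a check. Only real $\eta > 0$ is handled, and because (6.158) assumes $d \ll 1$ the results are physically meaningful only for large $\eta$.

```python
import cmath
import math


def leaky_wave(eta, iterations=200):
    """Return (delta, d) with kappa_0 a = d exp(i delta) solving (6.160) and (6.161), eta > 0."""

    def d_from_imag(delta):
        # (6.160): d = 2 exp{-(pi/2 + delta) cot 2 delta}; underflows harmlessly to 0 near delta = 0
        return 2.0 * math.exp(-(0.5 * math.pi + delta) / math.tan(2.0 * delta))

    def g(delta):
        # (6.160) squared minus (6.161): changes sign once on (0, pi/4)
        return d_from_imag(delta) ** 2 - 4.0 * math.sin(2.0 * delta) / (eta * (math.pi + 2.0 * delta))

    lo, hi = 1e-9, 0.25 * math.pi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    return delta, d_from_imag(delta)


def residual(eta, delta, d):
    """Residual of z^2 {ln(z/2) + i pi/2} + 2/eta = 0 with z = d exp(i delta)."""
    z = d * cmath.exp(1j * delta)
    return z * z * (cmath.log(z / 2.0) + 0.5j * math.pi) + 2.0 / eta
```

As $\eta$ is increased the computed $\delta$ and $d$ both decrease, in line with the statement that the root converges on $\delta = 0$ while $d \to 0$.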

Exercises

63. An optical fibre consists of a circularly cylindrical inner dielectric core of refractive index $N_1$ surrounded by an annular cladding of smaller refractive index $N_2$. Surrounding the cladding is a dielectric with a refractive index $N_3$ not far from $N_2$. Set up the equation governing the modes of propagation. If the outer medium is constructed of black glass so as to be highly lossy while the core and cladding are only slightly lossy, are there any convenient approximations which can be made? At a wavelength of 0.9 µm with $N_1 = 1.61$, extensive numerical investigations for (a) core radius 20 µm, cladding thickness 5 µm, $N_2 = N_3 = 0.99N_1$; (b) core radius 40 µm, cladding thickness 5 µm, $N_2 = N_3 = 0.96N_1$ have been carried out by Roberts (1972, 1973, 1975). Further details can be found in Snyder and Love (1983).

64. Tackle the problem of propagation along a circular dielectric rod by assuming that the exterior field satisfies a surface radiation condition on the surface of the rod. Compare the modes so obtained with the exact ones (Jones 1989).

6.26 Modal excitation

The initiation of modes on an infinite dielectric rod by a given source will be the topic investigated within this section. To fix ideas, the source will be taken as the magnetic frill of §6.3 (Figs 6.3 and 6.4). The incident field is then expressed as in eqns (6.17)-(6.22) with $\kappa$ replaced by $\kappa_0$ to conform to the notation of the preceding section. The scattered field may be assumed to be represented by similar integrals but with the integrands being expansions of the type at the beginning of §6.25. In fact, for the excitation considered here, it is sufficient to limit the series to $m = 0$ and put $b_0 = 0$ and $d_0 = 0$. Apply the boundary condition of the continuity of the tangential components of the field at $r = a$. Then it will be discovered that the total field (i.e. the sum of the incident and scattered fields) is given in $r > b$ by (cf. Becker and Meister 1973)

$$H_\phi = -\tfrac14\omega\varepsilon_0 b\int_{-\infty}^{\infty}\Big[\frac{H_1^{(2)}(\kappa_0 b)\{\varepsilon_0\kappa J_1(\kappa_0 a)J_0(\kappa a) - \kappa_0\varepsilon J_0(\kappa_0 a)J_1(\kappa a)\}}{\varepsilon\kappa_0 J_1(\kappa a)H_0^{(2)}(\kappa_0 a) - \varepsilon_0\kappa H_1^{(2)}(\kappa_0 a)J_0(\kappa a)} + J_1(\kappa_0 b)\Big]H_1^{(2)}(\kappa_0 r)\exp(-i\alpha z)\,d\alpha, \qquad (6.162)$$

while on $r = a$ it is given by (6.163).

At first glance it may look as though (6.162) and (6.163) have branch points at the zeros of $\kappa$ and $\kappa_0$. However, study of the numerator and denominator reveals that the zeros of $\kappa$ do not correspond to branch points, so that the only singularities are due to $\kappa_0$ (and any poles which the denominator may produce). To ensure that no poles lie on the path of integration (Fig. 6.5), slight dissipation is assumed to be present, so that poles for $\operatorname{Re}\alpha > 0$ are displaced downwards and those for $\operatorname{Re}\alpha < 0$ upwards. One might venture to calculate the field on the rod approximately by deforming the contour of integration into that of Fig. 6.6. The drawback to such a procedure is that, in the deformation to the position shown by broken lines, poles corresponding to leaky waves might be passed over. To avoid their consideration it is convenient to redraw the branch line from $\alpha = k$ ($k^2 = \omega^2\mu_0\varepsilon_0$) as shown in Fig. 6.21 and deform the contour of integration into $C_1$. Of course, such a deformation will introduce contributions from poles answerable for surface waves. For large $|z|$ the important part of $C_1$ is near $\alpha = k$ and so, to a first approximation, $\kappa$ may be treated as the constant $\kappa_1$. Thus (6.163) becomes approximately

$$H_\phi \approx -\frac{b\,\omega\varepsilon_0\eta_1}{4\pi}\int_{C_1}\frac{H_1^{(2)}(\kappa_0 b)\exp(-i\alpha z)}{a\kappa_0\eta_1H_0^{(2)}(\kappa_0 a) - H_1^{(2)}(\kappa_0 a)}\,d\alpha + i\omega\varepsilon\varepsilon_0 b\sum_p\Big[\frac{H_1^{(2)}(\kappa_0 b)J_1(\kappa a)\exp(-i\alpha z)}{dL/d\alpha}\Big]_{\alpha=\alpha_p}, \qquad (6.164)$$

where (6.165) supplies the definitions and $\alpha_p$ is a typical zero of $L$ which becomes real when the dissipation is removed. For the surface waves $\kappa_0$ is negative imaginary and the Hankel function may be befittingly replaced by the modified Bessel function $K$ as in the previous section.

DIELECTRIC ANTENNAS

Fig. 6.21. Contour to avoid leaky waves.

When $b = a$, the integral in (6.164) can be simplified by introducing the variable of integration $t = \kappa_0/k$ and combining the integrals on the two sides of the branch cut. The result is

$$H_\phi = -\frac{2\omega\varepsilon_0\eta_1^2ka^2}{\pi^2}\int_0^\infty\frac{t\exp\{-ikz(1-t^2)^{1/2}\}}{(1-t^2)^{1/2}M}\,dt + i\omega\varepsilon\varepsilon_0 a\sum_p\Big[\frac{H_1^{(2)}(\kappa_0 a)J_1(\kappa a)\exp(-i\alpha z)}{dL/d\alpha}\Big]_{\alpha=\alpha_p}, \qquad (6.166)$$

where

$$M = \{akt\,\eta_1J_0(akt) - J_1(akt)\}^2 + \{akt\,\eta_1Y_0(akt) - Y_1(akt)\}^2 \qquad (6.167)$$

and $(1 - t^2)^{1/2}$ is negative imaginary when $t > 1$. The far field where $kr \gg 1$ may be evaluated from (6.162) by means of the method of stationary phase, the same device as was utilized for (6.33) being adopted. Any poles encountered in moving the contour to the path of stationary phase can be neglected because they will have $\operatorname{Im}\kappa_0 < 0$ and so make a contribution exponentially small in comparison with that of the point of stationary phase. Hence one obtains (6.168), where now $\kappa = (\omega^2\mu\varepsilon - k^2\cos^2\theta)^{1/2}$. Any consistent choice of the radical in $\kappa$ may be employed since no branch point is involved. If $b = a$, (6.168) simplifies to

$$Z_0H_\phi \approx -\frac{i}{\pi}\frac{\exp(-ikR)}{R}\,\varepsilon kJ_1(\kappa a)\big\{\varepsilon k\sin\theta\,J_1(\kappa a)H_0^{(2)}(ka\sin\theta) - \varepsilon_0\kappa H_1^{(2)}(ka\sin\theta)J_0(\kappa a)\big\}^{-1}. \qquad (6.169)$$

The radiation conductance $G$ may, as pointed out at the end of §6.3, be calculated from the real part of (6.166) with $z = 0$. Thus $G$ is given by (6.170).

Some general observations can be made on the strength of these formulae, especially when $\varepsilon\mu \gg \varepsilon_0\mu_0$. Suppose, firstly, that $ka(\varepsilon\mu - \varepsilon_0\mu_0)^{1/2} \ll (\varepsilon_0\mu_0)^{1/2}$. Then no surface wave or leaky wave can be initiated and the residue terms disappear from (6.166) and (6.170). Also (6.169) shows that $Z_0H_\phi$ is then roughly proportional to $(ka)^2\sin\theta\,\exp(-ikR)/R$, while (6.170) simplifies on approximating $M$ by $4/(\pi kat)^2$. Consequently, the pattern is that of a small loop, the main effect of the dielectric being to alter the magnitude of the field. For $1 < ka(\varepsilon\mu/\varepsilon_0\mu_0 - 1)^{1/2} < j_{01} \approx 2.4$, $ka$ will be quite small. The denominator of (6.169) will have a complex zero corresponding to a solution of (6.158) and the phenomenon of a leaky wave is manifest. When $\theta$ is near the real part of the complex angle, the radiation pattern will have a maximum. The rod is acting as a fairly efficient radiator, the leaky wave on the rod decaying exponentially with distance from the magnetic frill. Confirmation of the radiation is further provided by $G$, which is still essentially equal to the integral term in (6.170). As $ka(\varepsilon\mu/\varepsilon_0\mu_0 - 1)^{1/2}$ approaches 2.4, $|\eta_1| \to \infty$ and the cut-off frequency of the surface wave is near. The radiation pattern and conductance transform to those of a metal wire of the same diameter. The current now falls very slowly along the rod and the radiation of energy is inefficient. In fact, the conductance behaves as if the rod were metal for quite a wide range on either side of 2.4. With $2.4 < ka(\varepsilon\mu/\varepsilon_0\mu_0 - 1)^{1/2}$ …

… (6.171)

where $Y_\infty$ is the admittance of the infinite antenna and, from (6.166),

$$Y(z) = -2(\eta_1ka)^2\int_0^\infty\frac{t\exp\{-ikz(1-t^2)^{1/2}\}}{(1-t^2)^{1/2}M}\,dt.$$

Exercise 67. If there is a well-established surface wave, … can be regarded as independent of $l$. Determine the simplification which occurs in (6.171).

6.28 General shapes

When the dielectric is not a circular cylinder there is little hope of finding analytical solutions unless the obstacle has a very simple shape like a sphere or special conditions (e.g. low frequencies or small refractive index) are applicable (Jones 1986). General shapes, therefore, can be tackled only by numerical techniques. In this section, some of the various methods will be set out. It will be presumed that the dielectric occupies the region $S_-$ of Fig. 6.15 and that $\mu$ and $\varepsilon$ supply measures of its permeability and permittivity respectively. In $S_+$ the corresponding quantities are taken as the constants $\mu_0$ and $\varepsilon_0$ respectively. To accommodate fundamental solutions in both regions it will be convenient to make a slight change of notation and put $k^2 = \omega^2\mu\varepsilon$, $k_0^2 = \omega^2\mu_0\varepsilon_0$, with

$$\psi_0(\mathbf{x}, \boldsymbol{\xi}) = \frac{\exp(-ik_0|\mathbf{x} - \boldsymbol{\xi}|)}{4\pi|\mathbf{x} - \boldsymbol{\xi}|}.$$

An incident wave $\mathbf{E}_i$, $\mathbf{H}_i$ strikes the dielectric obstacle from $S_+$. As a result, the body generates in $S_+$ a scattered field $\mathbf{E}_s$, $\mathbf{H}_s$ which has to satisfy the radiation conditions at infinity. In addition, a field $\mathbf{E}_t$, $\mathbf{H}_t$ is transmitted into $S_-$. The tangential components of the total field are to be continuous across $S$, i.e.

$$\mathbf{n}\wedge(\mathbf{E}_i + \mathbf{E}_s)_+ = \mathbf{n}\wedge(\mathbf{E}_t)_-, \qquad \mathbf{n}\wedge(\mathbf{H}_i + \mathbf{H}_s)_+ = \mathbf{n}\wedge(\mathbf{H}_t)_-. \qquad (6.172)$$

Inside the dielectric Maxwell's equations can be written as

$$\operatorname{curl}\mathbf{E} + i\omega\mu_0\mathbf{H} = i\omega(\mu_0 - \mu)\mathbf{H}, \qquad \operatorname{curl}\mathbf{H} - i\omega\varepsilon_0\mathbf{E} = i\omega(\varepsilon - \varepsilon_0)\mathbf{E}.$$

In this context, the influence of the dielectric is represented as attributable to volume distributions of electric and magnetic current placed in the same medium as the exterior. Consequently, the total field $\mathbf{E}$, $\mathbf{H}$ can be expressed as

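$$\mathbf{E}(P) = \mathbf{E}_i(P) + (\operatorname{grad}\operatorname{div} + k_0^2)\int_{S_-}\Big(\frac{\varepsilon}{\varepsilon_0} - 1\Big)\mathbf{E}(Q)\psi_0(P,Q)\,dx_Q - i\omega\operatorname{curl}\int_{S_-}(\mu - \mu_0)\mathbf{H}(Q)\psi_0(P,Q)\,dx_Q, \qquad (6.173)$$

$$\mathbf{H}(P) = \mathbf{H}_i(P) + (\operatorname{grad}\operatorname{div} + k_0^2)\int_{S_-}\Big(\frac{\mu}{\mu_0} - 1\Big)\mathbf{H}(Q)\psi_0(P,Q)\,dx_Q + i\omega\operatorname{curl}\int_{S_-}(\varepsilon - \varepsilon_0)\mathbf{E}(Q)\psi_0(P,Q)\,dx_Q. \qquad (6.174)$$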
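With $P$ placed in $S_-$, (6.173) and (6.174) are integral equations of the second kind, and replacing $\mathbf{E}$, $\mathbf{H}$ by the incident field inside the integrals (the Rayleigh approximation) is the first step of their fixed-point, or Neumann-series, iteration. The sketch below is a one-dimensional scalar analogue, not the vector equations themselves: a slab $0 < x < L$ of constant contrast $\chi = \varepsilon/\varepsilon_0 - 1$ in free space, with total field $u = u_i + k_0^2\int_0^L\chi\,g(x,x')u(x')\,dx'$ and $g = e^{-ik_0|x - x'|}/(2ik_0)$, the outgoing Green's function of $u'' + k_0^2u = -\delta$ under the $e^{i\omega t}$ convention. The slab data, the midpoint-rule discretization and the function names are assumptions made purely for illustration (Python with NumPy).

```python
import numpy as np


def slab_operator(k0=1.0, chi=0.1, length=1.0, n=200):
    """Midpoint-rule discretization of u = u_i + k0^2 int_0^L chi g(x, x') u(x') dx'
    for a hypothetical slab of constant contrast chi."""
    h = length / n
    x = (np.arange(n) + 0.5) * h                                       # cell midpoints
    g = np.exp(-1j * k0 * np.abs(x[:, None] - x[None, :])) / (2j * k0)  # outgoing 1D Green's function
    K = k0**2 * chi * g * h                                            # discretized volume operator
    u_inc = np.exp(-1j * k0 * x)                                       # incident wave travelling in +x
    return K, u_inc


def born_iterate(K, u_inc, n_iter=50):
    """Fixed-point iteration u_{m+1} = u_i + K u_m; the first iterate is the Rayleigh/Born term."""
    u = u_inc.copy()                      # zeroth iterate: incident field inside the slab
    for _ in range(n_iter):
        u = u_inc + K @ u
    return u
```

For `chi=0.1` the spectral radius of `K` is well below one and fifty iterations reproduce `np.linalg.solve(np.eye(n) - K, u_inc)` to rounding error; for `chi=10.0` it exceeds one and the same iteration diverges, which illustrates why the combination of frequency and material deviation must be suitably low.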

This representation has continuous tangential components and therefore automatically satisfies the boundary conditions (6.172). The application of (6.173) and (6.174) with $P$ in $S_-$ delivers integral equations to determine $\mathbf{E}$ and $\mathbf{H}$ inside the dielectric. Substitution of these values in the integrals of (6.173) and (6.174) when $P$ is in $S_+$ then provides the field outside the dielectric and the problem is solved. One tremendous advantage of these volume integral equations is that there is no necessity for $\varepsilon$ and $\mu$ to be constant. They are therefore available for arbitrary inhomogeneous dielectrics. Indeed, replacing $\mathbf{E}$, $\mathbf{H}$ by $\mathbf{E}_i$, $\mathbf{H}_i$ in the integrals is the standard way of obtaining Rayleigh scattering as a first approximation to an iteration procedure which may be expected to converge if the combination of frequency and material deviation is suitably low. Unfortunately, (6.173) and (6.174) suffer from a grave defect from a practical point of view. In effect, six scalar three-dimensional integral equations have to be solved simultaneously. The effort demanded of the computer is therefore at least two orders of magnitude greater than that for a metallic scatterer. For this reason a good deal of attention has been devoted to finding other formulations. Here the methods appropriate to arbitrary isotropic non-conducting dielectrics will be discussed, though the principles carry over to more general dielectrics with much elaboration of detail. The special case of the homogeneous isotropic dielectric is studied in the next section.

Fig. 6.22. Scattering by a dielectric.

Basically, all the formulations are the same initially. A boundary $B$ is drawn enclosing the dielectric (Fig. 6.22) and inside $B$ solutions are generated by finite differences or finite elements. The methods differ in the location and shape of $B$ as well as in the treatment of the field on it. In the first case $B$ is taken sufficiently far away from $S$ for an absorbing boundary condition to be applicable. Often the shape of $B$ is circular or spherical, but it can be adjusted to accommodate any special features of $S$. The problem has now become a normal interior one for finite differences, with boundary conditions on $S$ and $B$. As regards finite elements, suppose that (6.153) is being employed. Select representative functions $v_1, v_2, \ldots, v_N$ and assume

$$p = \sum_{n=1}^{N}a_nv_n.$$

Then, putting $v = v_1, \ldots, v_N$ successively in (6.153) leads to a matrix equation for the $a_n$ which can be tackled by any suitable method such as conjugate gradients. If $p$ vanishes on $S$ the $v_j$ will be chosen to be zero on $S$ so that the assumed expansion for $p$ satisfies the boundary condition; in addition, the integral over $S$ will disappear from (6.153). The integral is removed automatically if the normal derivative of $p$ vanishes on $S$. When the scatterer is penetrable the $v_j$ should be selected to be continuous through $S$ when $p$ is; then the integral over $S$ can be transformed into a volume integral over $S_-$ (similar to that over $T$) by applying the boundary conditions satisfied by $p$ in the transition through $S$. More complicated arrangements may be necessary to attain the transformation when $p$ is not continuous across $S$. The procedure can be adapted to vector fields.

The second approach makes $B$ coincide with $S$ and applies a surface radiation condition. For instance, we could require $\mathbf{E}_s$, $\mathbf{H}_s$ to satisfy (6.152) on $S$. Substitution from (6.172) furnishes a relation between the tangential components of $\mathbf{E}_t$, $\mathbf{H}_t$. Thus, the problem has been converted to finding a solution $\mathbf{E}_t$, $\mathbf{H}_t$ inside $S_-$ which complies with this relation. The converted problem is amenable to finite differences or finite elements.

As has been pointed out, the absorbing boundary condition is not entirely successful in keeping out unwanted reflections. Methods have been suggested which avoid using it. One of these is the unimoment method (Mei 1974; Chang and Mei 1974, 1976; Morgan and Mei 1974, 1979; for dielectric obstacles in waveguides see Mur et al. 1976). For simplicity of description $B$ will be taken as spherical, but other choices are permissible. Moreover, only solutions of Helmholtz's equation will be discussed; the principles carry over to Maxwell's equations. The fundamental idea is to find out what happens to a basis function defined on $B$ in a truly radiating field. For simplicity again the spherical harmonic will be chosen as the basis function, but selections from other complete sets are perfectly acceptable. With $p = Y_n^m$ on $B$ the interior problem is solved by finite differences or finite elements. It will be assumed that there are no non-trivial solutions when $p = 0$ on $B$, so that the question of non-uniqueness does not have to be raised. This assumption would have to be verified in practice, since it is not obvious that a dielectric-loaded sphere would not resonate at inconvenient frequencies. Given a unique solution, the values of the interior normal derivative on $B$ can be derived, say $\partial p/\partial n = q_{nm}$. It may be that $q_{nm}$ will emerge as a constant multiple of $Y_n^m$ but, in general, it will contain contributions from other spherical harmonics. Now suppose that the total field on $B$ can be represented adequately by

$$p = \sum_{n,m}P_{nm}Y_n^m, \qquad \frac{\partial p}{\partial n} = \sum_{n,m}P_{nm}q_{nm} \qquad (6.175)$$

as determined by the internal calculation. The total field on $B$ is known to consist of the incident field $p^i$ and a radiating field $p^s$. If both of these are expanded in terms of $Y_n^m$ we must have

$$P^s_{nm} = P_{nm} - P^i_{nm}$$

in an obvious notation. However, since $p^s$ is radiating,

$$p^s = \sum_{n,m}P^s_{nm}Y_n^m\,\frac{h_n^{(2)}(kR)}{h_n^{(2)}(kR_0)}$$

outside $B$, $R_0$ being the radius of $B$. Continuity of the normal derivatives through $B$ demands

$$\sum_{n,m}P_{nm}q_{nm} = \sum_{n,m}\Big\{kP^s_{nm}\frac{h_n^{(2)\prime}(kR_0)}{h_n^{(2)}(kR_0)} + \Big[\frac{\partial P^i_{nm}}{\partial R}\Big]_{R=R_0}\Big\}Y_n^m.$$

Sufficient information has been acquired to enable the determination of the $P_{nm}$ or $P^s_{nm}$.
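The unimoment bookkeeping can be exercised on a case where every step is separable. The sketch below (Python with NumPy; all names hypothetical) assumes a scalar Helmholtz problem with a penetrable sphere of radius $a$ and interior wavenumber $k_1$ inside a spherical $B$ of radius $R_0$, with $p$ and $\partial p/\partial R$ continuous at $R = a$ (as for acoustic media of equal density). The incidence is axisymmetric, $p^i = e^{-ikz} = \sum(2n+1)(-i)^nj_n(kR)P_n(\cos\theta)$, so Legendre polynomials stand in for the $Y_n^m$. The finite-element interior solve is replaced by the exact radial solution, so each $q_{nm}$ is a multiple of its own harmonic. Projecting the continuity condition onto each harmonic then gives $P_n\{q_n - kh_n^{(2)\prime}/h_n^{(2)}\} = kc_n\{j_n' - j_nh_n^{(2)\prime}/h_n^{(2)}\}$ at $kR_0$, with $c_n = (2n+1)(-i)^n$.

```python
import cmath
import numpy as np


def sph_h2(n, x):
    """h_n^(2)(x) for real x > 0 by upward recurrence (adequate for the small n used here)."""
    prev, cur = cmath.exp(-1j * x) / x, 1j * cmath.exp(-1j * x) / x
    for m in range(n):
        prev, cur = cur, (2 * m + 1) / x * cur - prev
    return cur


def sph_bessel(n, x):
    """Return (j_n, y_n, j_n', y_n', h_n^(2), h_n^(2)') at real x, using h^(2) = j - i y."""
    h = sph_h2(n, x)
    dh = n / x * h - sph_h2(n + 1, x)
    return h.real, -h.imag, dh.real, -dh.imag, h, dh


def interior_response(n, k, k1, a, R0):
    """q_n: dp/dR on B of the interior solution that equals 1 (times P_n(cos theta)) on B.
    Inside R < a: C j_n(k1 R); in the shell a < R < R0: A j_n(kR) + B y_n(kR)."""
    ja, ya, dja, dya, _, _ = sph_bessel(n, k * a)
    j1, _, dj1, _, _, _ = sph_bessel(n, k1 * a)
    jB, yB, djB, dyB, _, _ = sph_bessel(n, k * R0)
    M = np.array([[ja, ya, -j1],                     # p continuous at R = a
                  [k * dja, k * dya, -k1 * dj1],     # dp/dR continuous at R = a
                  [jB, yB, 0.0]])                    # p = 1 on B
    A, B, _ = np.linalg.solve(M, np.array([0.0, 0.0, 1.0]))
    return k * (A * djB + B * dyB)


def unimoment_coefficients(n, k, k1, a, R0):
    """Total and scattered coefficients (P_n, P^s_n) on B for the incident wave exp(-ikz)."""
    q = interior_response(n, k, k1, a, R0)
    jB, _, djB, _, hB, dhB = sph_bessel(n, k * R0)
    c = (2 * n + 1) * (-1j) ** n
    P = k * c * (djB - jB * dhB / hB) / (q - k * dhB / hB)
    return P, P - c * jB
```

With $k_1 = k$ the scattered coefficients vanish, and for $k_1 \ne k$ they agree with the classical separable solution for the sphere, which makes this a convenient check on any finite-element implementation of the interior step.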


One disadvantage of the unimoment method is the necessity to know $p^s$ exactly outside $B$. In effect, this limits it to those $B$ which fit separable solutions of the governing equations. The limitation can be surmounted by employing an integral representation for the field outside $B$, e.g. (6.75) with $S$ replaced by $B$. Allowing the point of observation to tend to $B$ provides the information to fix $P_{nm}$, though the theory of integral equations is then called on. Because the unimoment method handles individual basis functions, the potential exists for splitting off from the incident field some harmonics to be dealt with by the unimoment method while the scattering of the rest of the incident field is resolved by an absorbing boundary condition. Whether this is reasonable depends on the extent to which $q_{nm}$ contains harmonics other than $Y_n^m$. If $q_{nm}$ is relatively free of other harmonics a reasonable splitting should be attainable. For a comparison of the relative effectiveness of the methods for tackling general dielectrics see Peterson (1989).
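The first approach, in which finite elements fill the region inside $B$ and an absorbing condition is imposed on $B$, can be illustrated in one dimension, where the condition $dp/dx + ikp = 0$ is exact for the outgoing wave $e^{-ikx}$. The sketch below (Python with NumPy; names hypothetical) uses piecewise-linear hat functions as the $v_n$ and puts each $v_n$ in turn as the test function in the weak form $\int(p'v' - k^2pv)\,dx + ikp(L)v(L) = 0$. This is a one-dimensional stand-in assumed to play the role of (6.153), which is not reproduced here. It prescribes $p = 1$ on the "scatterer" at $x = 0$ and solves the resulting matrix equation directly rather than by conjugate gradients, since the matrix is complex symmetric but not Hermitian, whereas standard conjugate gradients assumes a Hermitian positive-definite matrix.

```python
import numpy as np


def fem_helmholtz_1d(k, length=1.0, n_elem=200):
    """Piecewise-linear Galerkin solution of p'' + k^2 p = 0 on 0 < x < length with
    p(0) = 1 and the absorbing condition p' + ik p = 0 at x = length."""
    h = length / n_elem
    n = n_elem + 1
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h      # element matrix of int v_i' v_j'
    me = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0   # element matrix of int v_i v_j
    A = np.zeros((n, n), dtype=complex)
    for e in range(n_elem):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += ke - k**2 * me
    A[-1, -1] += 1j * k                                 # boundary term ik p(L) v(L) from the ABC
    p = np.zeros(n, dtype=complex)
    p[0] = 1.0                                          # prescribed value at x = 0
    # One equation per test function v_1, ..., v_N; v_0 is not used as a test function.
    p[1:] = np.linalg.solve(A[1:, 1:], -A[1:, 0] * p[0])
    return np.linspace(0.0, length, n), p
```

Comparing with the exact solution $e^{-ikx}$, halving the element size should reduce the maximum nodal error by roughly a factor of four, the usual second-order behaviour of linear elements; in this one-dimensional case the absorbing condition itself introduces no modelling error.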

Exercise 68. The following is a selection of problems for trying out methods: (a) the sphere with invariable interior; (b) the sphere with material changing radially; (c) a sphere composed of segments of constant materials; (d) an annular circular cylinder; (e) an elliptic cylinder; (f) a square cylinder; (g) a biconical antenna; (h) a lens. The incident wave might be plane or come from a point dipole.

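6.29 Homogeneous isotropic dielectric

It has been observed that the volume integral equations are expensive in computer effort. Therefore, it is worth attempting to find surface integral equations when $\varepsilon$ and $\mu$ are both constant, in order to save an order of magnitude in computer expenditure. From this point onwards $\varepsilon$ and $\mu$ will be regarded as unvarying in $S_-$. From (6.75) and (6.76)

$$\mathbf{E}_s(P) = \int_S\big[\{\mathbf{n}_q\wedge\mathbf{E}_s(q)\}\wedge\operatorname{grad}_q\psi_0(P,q) + \{\mathbf{n}_q\cdot\mathbf{E}_s(q)\}\operatorname{grad}_q\psi_0(P,q) - i\omega\mu_0\{\mathbf{n}_q\wedge\mathbf{H}_s(q)\}\psi_0(P,q)\big]\,dS_q, \qquad (6.176)$$

$$\mathbf{H}_s(P) = \int_S\big[\{\mathbf{n}_q\wedge\mathbf{H}_s(q)\}\wedge\operatorname{grad}_q\psi_0(P,q) + \{\mathbf{n}_q\cdot\mathbf{H}_s(q)\}\operatorname{grad}_q\psi_0(P,q) + i\omega\varepsilon_0\{\mathbf{n}_q\wedge\mathbf{E}_s(q)\}\psi_0(P,q)\big]\,dS_q, \qquad (6.177)$$

where $P \in S_+$, and both right-hand sides vanish identically for $P$ in $S_-$.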

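Similarly

$$\mathbf{E}_t(P_i) = -\int_S\big[\{\mathbf{n}_q\wedge\mathbf{E}_t(q)\}\wedge\operatorname{grad}_q\psi(P_i,q) + \{\mathbf{n}_q\cdot\mathbf{E}_t(q)\}\operatorname{grad}_q\psi(P_i,q) - i\omega\mu\{\mathbf{n}_q\wedge\mathbf{H}_t(q)\}\psi(P_i,q)\big]\,dS_q, \qquad (6.178)$$

$$\mathbf{H}_t(P_i) = -\int_S\big[\{\mathbf{n}_q\wedge\mathbf{H}_t(q)\}\wedge\operatorname{grad}_q\psi(P_i,q) + \{\mathbf{n}_q\cdot\mathbf{H}_t(q)\}\operatorname{grad}_q\psi(P_i,q) + i\omega\varepsilon\{\mathbf{n}_q\wedge\mathbf{E}_t(q)\}\psi(P_i,q)\big]\,dS_q, \qquad (6.179)$$

for $P_i$ in $S_-$, and the pair of right-hand sides is identically zero when $P_i$ is in $S_+$. Let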

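$$\mathbf{n}\wedge\mathbf{E}_t = \mathbf{j}'_t, \qquad \mathbf{n}\wedge\mathbf{H}_t = -\mathbf{j}_t, \qquad i\omega\rho_t = -\operatorname{Div}\mathbf{j}_t, \qquad i\omega\rho'_t = -\operatorname{Div}\mathbf{j}'_t.$$

Then, from (6.81)-(6.84), $\mathbf{n}\cdot\mathbf{E}_t = -\rho_t/\varepsilon$ and $\mathbf{n}\cdot\mathbf{H}_t = -\rho'_t/\mu$. Also (6.172) implies

$$\mathbf{n}\wedge\mathbf{E}_s = \mathbf{j}'_t - \mathbf{n}\wedge\mathbf{E}_i, \qquad \mathbf{n}\wedge\mathbf{H}_s = -\mathbf{j}_t - \mathbf{n}\wedge\mathbf{H}_i,$$

$$\mathbf{n}\cdot\mathbf{E}_s = -\rho_t/\varepsilon_0 - \mathbf{n}\cdot\mathbf{E}_i, \qquad \mathbf{n}\cdot\mathbf{H}_s = -\mathbf{n}\cdot\mathbf{H}_i - \rho'_t/\mu_0$$

from (6.79) and (6.80). Now, if $\mathbf{E}_s$ and $\mathbf{H}_s$ are replaced by $\mathbf{E}_i$ and $\mathbf{H}_i$ in the right-hand sides of (6.176) and (6.177), the integrals give $-\mathbf{E}_i$ and $-\mathbf{H}_i$ for $P$ in $S_-$ because the sources of the incident field are outside $S$. Combining these integrals with those of (6.176) and (6.177) for a point inside $S_-$ we obtain

$$\mathbf{n}_p\wedge\lim_{P_i\to p}\int_S\Big\{\mathbf{j}'_t\wedge\operatorname{grad}_q\psi_0(P_i,q) - \frac{\rho_t}{\varepsilon_0}\operatorname{grad}_q\psi_0(P_i,q) + i\omega\mu_0\,\mathbf{j}_t\psi_0(P_i,q)\Big\}\,dS_q = -\mathbf{n}_p\wedge\mathbf{E}_i, \qquad (6.180)$$

$$\mathbf{n}_p\wedge\lim_{P_i\to p}\int_S\Big\{-\mathbf{j}_t\wedge\operatorname{grad}_q\psi_0(P_i,q) - \frac{\rho'_t}{\mu_0}\operatorname{grad}_q\psi_0(P_i,q) + i\omega\varepsilon_0\,\mathbf{j}'_t\psi_0(P_i,q)\Big\}\,dS_q = -\mathbf{n}_p\wedge\mathbf{H}_i. \qquad (6.181)$$

The next objective is to rid the integrands of $\rho_t$ and $\rho'_t$. With $P$ in $S_+$

$$\mathbf{n}_p\wedge\Big\{\lim_{P_i\to p}\int_S\rho_t\operatorname{grad}_q\psi_0(P_i,q)\,dS_q - \lim_{P\to p}\int_S\rho_t\operatorname{grad}_q\psi(P,q)\,dS_q\Big\} = \mathbf{n}_p\wedge\int_S\rho_t\operatorname{grad}_q\{\psi_0(p,q) - \psi(p,q)\}\,dS_q$$

because (6.90) and (6.91) hold for both $\psi$ and $\psi_0$. But

$$\int_S\rho_t\operatorname{grad}_q\{\psi_0(p,q) - \psi(p,q)\}\,dS_q = -\operatorname{grad}_p\int_S\rho_t\{\psi_0(p,q) - \psi(p,q)\}\,dS_q = \operatorname{grad}_p\int_S\{\psi_0(p,q) - \psi(p,q)\}\frac{\operatorname{Div}\mathbf{j}_t}{i\omega}\,dS_q = \int_S(\mathbf{j}_t\cdot\operatorname{grad}_q)\operatorname{grad}_q\{\psi_0(p,q) - \psi(p,q)\}\,\frac{dS_q}{i\omega}$$

from (6.78). The right-hand side of (6.178) is zero in $S_+$. Therefore, if we take the multiple $\varepsilon\,\mathbf{n}_p\wedge$ of it and add it to $\varepsilon_0$ times (6.180), we derive via (6.90) and (6.91)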

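$$-\tfrac{1}{2}(\varepsilon + \varepsilon_0)\, j'_t + n_p \wedge \int_S \left[ j'_t \wedge \mathrm{grad}_q \{\varepsilon_0\psi_0(p,q) - \varepsilon\psi(p,q)\} + i j_t \{k_0^2\psi_0(p,q) - k^2\psi(p,q)\}/\omega - (j_t \cdot \mathrm{grad}_q)\, \mathrm{grad}_q \{\psi_0(p,q) - \psi(p,q)\}/i\omega \right] dS_q = -\varepsilon_0\, n_p \wedge E_i. \qquad (6.182)$$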

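Similar operations with (6.181) and (6.179) lead to

$$\tfrac{1}{2}(\mu + \mu_0)\, j_t + n_p \wedge \int_S \left[ j_t \wedge \mathrm{grad}_q \{\mu\psi(p,q) - \mu_0\psi_0(p,q)\} + i j'_t \{k_0^2\psi_0(p,q) - k^2\psi(p,q)\}/\omega - (j'_t \cdot \mathrm{grad}_q)\, \mathrm{grad}_q \{\psi_0(p,q) - \psi(p,q)\}/i\omega \right] dS_q = -\mu_0\, n_p \wedge H_i. \qquad (6.183)$$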

The integral equations (6.182) and (6.183) are the duo sought. They constitute four scalar simultaneous linear integral equations and are appreciably easier to tackle than (6.173) and (6.174) but substantially harder than if S were a perfect conductor. It can be shown that the operators in (6.182) and (6.183) are compact, but the proof will not be given here. The matter of uniqueness will be settled in the next section.
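The kernels in (6.182) and (6.183) enter only through combinations such as ψ_0 − ψ and k_0²ψ_0 − k²ψ, and this is part of what makes the equations more benign than their individual terms. The quick numerical sketch below rests on an assumption, since the form of the kernels is not restated in this section: ψ and ψ_0 are taken to be the free-space kernels exp(−ik|x_P − x_q|)/(4π|x_P − x_q|) with wavenumbers k and k_0. Each kernel then grows like 1/(4πR) as the separation R → 0, whereas the difference ψ_0 − ψ tends to the finite value i(k − k_0)/4π. The wavenumbers are hypothetical test values. The sketch illustrates the cancellation only and proves nothing about compactness.

```python
import cmath
import math


def green(k, R):
    """Assumed free-space kernel exp(-i k R) / (4 pi R)."""
    return cmath.exp(-1j * k * R) / (4.0 * math.pi * R)


def kernel_difference(k0, k, R):
    """psi_0 - psi at separation R."""
    return green(k0, R) - green(k, R)


k0 = 2.0 * math.pi            # hypothetical free-space wavenumber (unit wavelength)
k = 2.0 * k0                  # hypothetical dielectric with relative permittivity 4
limit = 1j * (k - k0) / (4.0 * math.pi)   # analytic value of psi_0 - psi as R -> 0

for R in (1e-1, 1e-3, 1e-6):
    print(f"R = {R:g}: |psi_0| = {abs(green(k0, R)):.3e}, "
          f"psi_0 - psi = {kernel_difference(k0, k, R):.6f}")
print(f"limit = {limit:.6f}")
```

The leading correction to the limit is of order (k² − k_0²)R/8π, so the printed differences approach the limit linearly in R.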



6.30 Uniqueness for the homogeneous isotropic dielectric

If the right-hand sides of (6.182) and (6.183) are set equal to zero, the integral equations may possess a non-trivial solution j_0, j'_0. If so, (6.182) and (6.183) are not uniquely soluble. Therefore, to guarantee uniqueness, we want to show that j_0 and j'_0 must be identically zero. Define ρ_0 and ρ'_0 by iωρ_0 = −Div j_0, iωρ'_0 = −Div j'_0. Then construct the following fields:

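$$E_1 = \int_S \left\{ -i\omega\mu\, j_0\, \psi(P,q) - j'_0 \wedge \mathrm{grad}_q\, \psi(P,q) + \frac{\rho_0}{\varepsilon}\, \mathrm{grad}_q\, \psi(P,q) \right\} dS_q, \qquad (6.184)$$

$$H_1 = \int_S \left\{ -i\omega\varepsilon\, j'_0\, \psi(P,q) + j_0 \wedge \mathrm{grad}_q\, \psi(P,q) + \frac{\rho'_0}{\mu}\, \mathrm{grad}_q\, \psi(P,q) \right\} dS_q, \qquad (6.185)$$

$$E_2 = \int_S \left\{ -i\omega\mu_0\, j_0\, \psi_0(P,q) - j'_0 \wedge \mathrm{grad}_q\, \psi_0(P,q) + \frac{\rho_0}{\varepsilon_0}\, \mathrm{grad}_q\, \psi_0(P,q) \right\} dS_q, \qquad (6.186)$$

$$H_2 = \int_S \left\{ -i\omega\varepsilon_0\, j'_0\, \psi_0(P,q) + j_0 \wedge \mathrm{grad}_q\, \psi_0(P,q) + \frac{\rho'_0}{\mu_0}\, \mathrm{grad}_q\, \psi_0(P,q) \right\} dS_q. \qquad (6.187)$$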

The integral equations satisfied by j_0 and j'_0 have been built in such a way that

$$\varepsilon_0\, n \wedge (E_2)_- = \varepsilon\, n \wedge (E_1)_+, \qquad \mu_0\, n \wedge (H_2)_- = \mu\, n \wedge (H_1)_+. \qquad (6.188)$$

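Now the equations satisfied by (6.184) and (6.185) in S+ can be expressed as

$$\mathrm{curl}\, \varepsilon E_1 + i\omega\varepsilon(\mu H_1) = 0, \qquad \mathrm{curl}\, \mu H_1 - i\omega\mu(\varepsilon E_1) = 0,$$

whereas, in S_, (6.186) and (6.187) give

$$\mathrm{curl}\, \varepsilon_0 E_2 + i\omega\varepsilon_0(\mu_0 H_2) = 0, \qquad \mathrm{curl}\, \mu_0 H_2 - i\omega\mu_0(\varepsilon_0 E_2) = 0.$$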

Moreover, the radiation conditions at infinity may be set forth as (R̂ the unit radial vector)

$$R\{\mu H_1 + (\mu/\varepsilon)^{1/2}\, \varepsilon E_1 \wedge \hat{R}\} \to 0, \qquad R\{-\varepsilon E_1 - (\varepsilon/\mu)^{1/2}\, \hat{R} \wedge \mu H_1\} \to 0 \qquad (R \to \infty).$$

Therefore the field μH_1, −εE_1 in S+ and μ_0H_2, −ε_0E_2 in S_ is an electromagnetic field which satisfies the radiation conditions and, by (6.188), has continuous tangential components. Let us assume that such a field must be identically zero. Then (E_1)_+ = 0 and (H_1)_+ = 0.

Hence, from the jump in the representations across S,

Similarly

Consider the field E_2, H_2 in S+ and −E_1, −H_1 in S_. By what has just been said, its tangential components are continuous through S and it obeys the radiation conditions at infinity. The assumption at the beginning of the paragraph makes the field identically zero. It follows that j_0 ≡ 0, j'_0 ≡ 0, and the uniqueness of (6.182) and (6.183) is established. The assumption in the last paragraph can be justified as follows. If E, H is a field with the given properties,

$$\int_{S_R} (E \wedge H^* + E^* \wedge H) \cdot n\, dS = 0,$$

where S_R is a sphere of large radius R. Rewriting the integral as in §6.22, we deduce that ∫_{S_R} |E|² dS → 0 as R → ∞, and this is impossible unless E and H are identically zero. The uniqueness property of (6.182) and (6.183) makes them similar to the CFIE for a perfect conductor. Although such equations have been known for some years, they do not seem to have been deployed for numerical purposes until recently (Rao and Wilton 1990).

Exercises

69. Show that the uniqueness demonstrated above holds for complex permeability and permittivity provided that ℜ(iωε) ≥ 0, ℜ(iωμ) ≥ 0 and 0 ≥ ph ω²μ_0ε_0 > −π. Deduce that (6.182) and (6.183) have a unique solution for complex material constants.

70. Prove that the unique solution of (6.182) and (6.183) does furnish a solution of the dielectric scattering problem.

71. Reformulate the theory of this section for two-dimensional scattering.

APPENDIX: Geometry of surfaces

Assume that the surface S is specified by the two parameters α_1, α_2, so that a point on it can be designated by x(α_1, α_2). The curves α_1 = constant and α_2 = constant need not be orthogonal, though it is often convenient to select them as lines of curvature. The direction of the unit normal n to S will be chosen so that α_1, α_2, n form a right-handed system. In addition, the convention will be adopted that n is an outward normal when S is convex, i.e. n points away from the centres of curvature; this can always be arranged by relabelling α_1 and α_2 if necessary. With this convention the principal curvatures are positive for a synclastic surface.

Let

$$g_{jk} = \frac{\partial x}{\partial\alpha_j} \cdot \frac{\partial x}{\partial\alpha_k} = g_{kj} \qquad \text{and} \qquad g = g_{11}g_{22} - g_{12}^2.$$

For notational convenience it will be assumed that a repeated affix such as k, occurring twice in a character or product, means summation over k = 1 and k = 2. Thus a_k b_k stands for a_1b_1 + a_2b_2 and b^k_k for b^1_1 + b^2_2, but a_k + b_k and a_j b_k are unaffected by the convention. Then the element of length ds can be expressed as

$$ds^2 = dx \cdot dx = g_{jk}\, d\alpha_j\, d\alpha_k. \qquad (A.1)$$

The element of surface area dS is given by

$$dS = g^{1/2}\, d\alpha_1\, d\alpha_2. \qquad (A.2)$$

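Related to g_{jk} is g^{jk}, where g^{11} = g_{22}/g, g^{12} = g^{21} = −g_{12}/g and g^{22} = g_{11}/g. It can be deduced immediately that

$$g^{jk} g_{km} = \delta_{jm} \qquad (A.3)$$

where δ_{jm} = 1 if j = m and 0 otherwise. The vectors ∂x/∂α_1 and ∂x/∂α_2 are tangential to S and so

$$n = \frac{1}{\sqrt{g}}\, \frac{\partial x}{\partial\alpha_1} \wedge \frac{\partial x}{\partial\alpha_2}$$

in accordance with the convention on the direction of n. It follows that

$$n \wedge \frac{\partial x}{\partial\alpha_k} = \frac{1}{\sqrt{g}} \left( g_{1k}\, \frac{\partial x}{\partial\alpha_2} - g_{2k}\, \frac{\partial x}{\partial\alpha_1} \right), \qquad (A.4)$$

whence

$$n \wedge \frac{\partial x}{\partial\alpha_1} = \sqrt{g}\, g^{2k}\, \frac{\partial x}{\partial\alpha_k}, \qquad n \wedge \frac{\partial x}{\partial\alpha_2} = -\sqrt{g}\, g^{1k}\, \frac{\partial x}{\partial\alpha_k}. \qquad (A.5)$$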

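Although (A.5) is usually the preferred form, there are times when (A.4) is more helpful. Since n·n = 1, we have n·∂n/∂α_k = 0, which implies that ∂n/∂α_k is tangential to S. Consequently, it must be expressible in the form

$$\frac{\partial n}{\partial\alpha_k} = b^j_k\, \frac{\partial x}{\partial\alpha_j}. \qquad (A.6)$$

The quantities b^j_k are related to the second fundamental tensor b_{jk} of the surface, defined by

$$b_{jk} = -n \cdot \frac{\partial^2 x}{\partial\alpha_j\, \partial\alpha_k}, \qquad (A.7)$$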

because

$$b_{jk} = \frac{\partial x}{\partial\alpha_j} \cdot \frac{\partial n}{\partial\alpha_k} = g_{jm}\, b^m_k \qquad (A.8)$$

from (A.6). It can be shown that

$$K = b^1_1 b^2_2 - b^2_1 b^1_2, \qquad H = \tfrac{1}{2} b^k_k, \qquad (A.9)$$

where H is the mean curvature and K is the Gaussian curvature of the surface. When α_1 = constant and α_2 = constant are the lines of curvature, b^1_1 = κ_1, b^2_2 = κ_2 and b^2_1 = b^1_2 = 0, where κ_1 and κ_2 are the principal curvatures; then H = ½(κ_1 + κ_2), K = κ_1κ_2. Observe that, if T is a tangential vector with T = T^j ∂x/∂α_j and U is another tangential vector,

$$T \cdot U = g_{jk}\, T^j U^k. \qquad (A.10)$$
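As a check on (A.3) to (A.9), the sketch below computes g_jk, g^jk, n and b^j_k by central differences. It then forms H and K for a sphere of radius a parametrized by (α_1, α_2) = (θ, φ). For this parametrization ∂x/∂θ ∧ ∂x/∂φ points outwards, so the convention on the direction of n is met, and the expected values are H = 1/a and K = 1/a². It also confirms the first relation of (A.5) numerically. The radius, the test point and the step sizes are arbitrary choices made for the illustration.

```python
import numpy as np

A_RADIUS = 2.0  # hypothetical sphere radius a


def x_of(alpha):
    """Point x(alpha_1, alpha_2) of the sphere, with (alpha_1, alpha_2) = (theta, phi)."""
    th, ph = alpha
    return A_RADIUS * np.array([np.sin(th) * np.cos(ph),
                                np.sin(th) * np.sin(ph),
                                np.cos(th)])


def tangents(alpha, h=1e-5):
    """Rows are central-difference approximations to dx/dalpha_1 and dx/dalpha_2."""
    e = np.eye(2)
    return np.stack([(x_of(alpha + h * e[k]) - x_of(alpha - h * e[k])) / (2 * h)
                     for k in range(2)])


def normal(alpha):
    """n = (dx/dalpha_1 ^ dx/dalpha_2) / sqrt(g)."""
    X = tangents(alpha)
    c = np.cross(X[0], X[1])
    return c / np.linalg.norm(c)


def surface_quantities(alpha, h=1e-4):
    X = tangents(alpha)
    g_low = X @ X.T                      # g_jk
    g_up = np.linalg.inv(g_low)          # g^jk, so that g^jk g_km = delta_jm, (A.3)
    e = np.eye(2)
    dn = np.stack([(normal(alpha + h * e[k]) - normal(alpha - h * e[k])) / (2 * h)
                   for k in range(2)])   # rows are dn/dalpha_k
    # (A.6): dn/dalpha_k = b^j_k dx/dalpha_j, so b^j_k = g^jm (dx/dalpha_m . dn/dalpha_k)
    b_mixed = g_up @ (X @ dn.T)          # entry [j, k] is b^j_k
    H = 0.5 * np.trace(b_mixed)          # mean curvature, (A.9)
    K = np.linalg.det(b_mixed)           # b^1_1 b^2_2 - b^2_1 b^1_2, (A.9)
    return X, g_low, g_up, b_mixed, H, K


alpha = np.array([0.7, 0.3])
X, g_low, g_up, b_mixed, H, K = surface_quantities(alpha)
n = normal(alpha)
# First relation of (A.5): n ^ dx/dalpha_1 = sqrt(g) g^{2k} dx/dalpha_k
a5_residual = np.linalg.norm(np.cross(n, X[0])
                             - np.sqrt(np.linalg.det(g_low)) * (g_up[1] @ X))
print(H, 1 / A_RADIUS)       # mean curvature versus 1/a
print(K, 1 / A_RADIUS**2)    # Gaussian curvature versus 1/a^2
print(a5_residual)
```

Because every quantity is obtained from x(α_1, α_2) alone, replacing x_of by another parametrized surface gives the same checks there.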

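The surface gradient of a scalar u, called Grad u, is defined by

$$\mathrm{Grad}\, u = g^{jk}\, \frac{\partial u}{\partial\alpha_j}\, \frac{\partial x}{\partial\alpha_k}. \qquad (A.11)$$

On account of (A.10)

$$T \cdot \mathrm{Grad}\, u = T^j\, \frac{\partial u}{\partial\alpha_j} \qquad (A.12)$$

from (A.3). The surface divergence of the tangential vector T, denoted by Div T, is defined by

$$\mathrm{Div}\, T = \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial\alpha_j} (\sqrt{g}\, T^j). \qquad (A.13)$$

Therefore

$$\mathrm{Div}\, \mathrm{Grad}\, u = \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial\alpha_j} \left( \sqrt{g}\, g^{jk}\, \frac{\partial u}{\partial\alpha_k} \right) \qquad (A.14)$$

and

$$\mathrm{Div}(uT) = u\, \mathrm{Div}\, T + T^j\, \frac{\partial u}{\partial\alpha_j} = u\, \mathrm{Div}\, T + T \cdot \mathrm{Grad}\, u \qquad (A.15)$$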

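by virtue of (A.12). Let C be a curve on S and let s be arc length along C. Then a unit tangent t to C is given by

$$t = \frac{dx}{ds} = \frac{\partial x}{\partial\alpha_k}\, \frac{d\alpha_k}{ds}$$

with α_1 and α_2 expressed in terms of s on C. A unit normal ν to C in S such that ν, t, n form a right-handed system is

$$\nu = t \wedge n = \sqrt{g} \left( g^{1k}\, \frac{\partial x}{\partial\alpha_k}\, \frac{d\alpha_2}{ds} - g^{2k}\, \frac{\partial x}{\partial\alpha_k}\, \frac{d\alpha_1}{ds} \right)$$

from (A.5). Thus

$$T \cdot \nu = \sqrt{g} \left( T^1\, \frac{d\alpha_2}{ds} - T^2\, \frac{d\alpha_1}{ds} \right) \qquad (A.16)$$

on invoking (A.3). Now let C be a closed curve enclosing the portion Σ of S. Then

$$\int_\Sigma \mathrm{Div}\, T\, dS = \iint_\Sigma \frac{\partial}{\partial\alpha_j} (\sqrt{g}\, T^j)\, d\alpha_1\, d\alpha_2$$

from (A.2) and (A.13). Removing the partial derivatives by integration we have

$$\int_\Sigma \mathrm{Div}\, T\, dS = \oint_C \sqrt{g} \left( T^1\, \frac{d\alpha_2}{ds} - T^2\, \frac{d\alpha_1}{ds} \right) ds = \oint_C T \cdot \nu\, ds \qquad (A.17)$$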

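by (A.16); (A.17) is the surface analogue of the divergence theorem for volumes. If S is a closed surface, choose Σ to be the whole of S so that C disappears. We infer from (A.17) that

$$\int_S \mathrm{Div}\, T\, dS = 0 \qquad (A.18)$$

when S is closed. By applying (A.18) to (A.15) we deduce that

$$\int_S u\, \mathrm{Div}\, T\, dS = -\int_S T \cdot \mathrm{Grad}\, u\, dS \qquad (A.19)$$

when S is closed. Other analogues of formulae for integrals in three dimensions can be derived. First, remark that

$$n \wedge \mathrm{Grad}\, u = \frac{1}{\sqrt{g}} \left( \frac{\partial u}{\partial\alpha_1}\, \frac{\partial x}{\partial\alpha_2} - \frac{\partial u}{\partial\alpha_2}\, \frac{\partial x}{\partial\alpha_1} \right)$$

by virtue of (A.4) and (A.3). Evidently

$$\mathrm{Div}(n \wedge \mathrm{Grad}\, u) = 0. \qquad (A.20)$$

On the other hand, integration furnishes

$$\int_\Sigma n \wedge \mathrm{Grad}\, u\, dS = \oint_C u\, t\, ds.$$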
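(A.14) and (A.18) can be exercised on a sphere of radius a with (α_1, α_2) = (θ, φ), for which √g = a² sin θ, g^11 = 1/a², g^22 = 1/(a² sin²θ) and g^12 = 0. The sketch below evaluates Div Grad u from (A.14) by central differences and checks it against the known value −2 cos θ/a² for u = cos θ. It then integrates Div Grad u over the closed sphere with the element of area (A.2), where (A.18) predicts zero. The test function, the radius and the grid sizes are hypothetical choices. The small non-zero total that remains is quadrature and differencing error.

```python
import numpy as np

A = 1.5  # hypothetical sphere radius a


# Sphere with (alpha_1, alpha_2) = (theta, phi): g_11 = a^2, g_22 = a^2 sin^2(theta), g_12 = 0.
def sqrt_g(th):
    return A**2 * np.sin(th)


def g_up_11(th):
    return np.ones_like(th) / A**2


def g_up_22(th):
    return 1.0 / (A**2 * np.sin(th) ** 2)


def div_grad(u, th, ph, h=1e-4):
    """Div Grad u from (A.14), by central differences in theta and phi."""
    def du_dth(t, p):
        return (u(t + h, p) - u(t - h, p)) / (2 * h)

    def du_dph(t, p):
        return (u(t, p + h) - u(t, p - h)) / (2 * h)

    # The two components sqrt(g) g^{jk} du/dalpha_k (g^{12} = 0 here).
    def f1(t, p):
        return sqrt_g(t) * g_up_11(t) * du_dth(t, p)

    def f2(t, p):
        return sqrt_g(t) * g_up_22(t) * du_dph(t, p)

    num = (f1(th + h, ph) - f1(th - h, ph)) + (f2(th, ph + h) - f2(th, ph - h))
    return num / (2 * h) / sqrt_g(th)


def u_test(t, p):
    """Hypothetical smooth test function on the sphere."""
    return np.cos(t) ** 2 + np.sin(t) * np.cos(p)


# Pointwise check: for u = cos(theta), Div Grad u = -2 cos(theta) / a^2.
th0, ph0 = np.array(0.7), np.array(0.3)
pointwise_error = abs(div_grad(lambda t, p: np.cos(t), th0, ph0)
                      + 2 * np.cos(0.7) / A**2)

# (A.18): the integral of Div Grad u over the closed sphere vanishes.
n_th, n_ph = 400, 64
th = (np.arange(n_th) + 0.5) * np.pi / n_th        # midpoints keep clear of the poles
ph = (np.arange(n_ph) + 0.5) * 2 * np.pi / n_ph
TH, PH = np.meshgrid(th, ph, indexing="ij")
dS = sqrt_g(TH) * (np.pi / n_th) * (2 * np.pi / n_ph)   # element of area, (A.2)
total = float(np.sum(div_grad(u_test, TH, PH) * dS))
print(pointwise_error, total)
```

Substituting v for T = Grad v in (A.15), (A.18) also underlies (A.19), so the same routine could be reused to compare ∫u Div Grad v dS with −∫Grad u·Grad v dS.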

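Furthermore, from (A.6) and (A.9),

$$2Hn\sqrt{g} = \frac{\partial n}{\partial\alpha_1} \wedge \frac{\partial x}{\partial\alpha_2} - \frac{\partial n}{\partial\alpha_2} \wedge \frac{\partial x}{\partial\alpha_1},$$

so that, on integrating the derivatives of u n ∧ ∂x/∂α_1 and u n ∧ ∂x/∂α_2 over Σ, and noting that ν ds = t ∧ n ds = −n ∧ (∂x/∂α_k) dα_k on C, we find

$$\oint_C u\, \nu\, ds = \int_\Sigma (\mathrm{Grad}\, u - 2Hun)\, dS. \qquad (A.22)$$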
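The relation between ∮_C uν ds and ∫_Σ (Grad u − 2Hun) dS can be checked on a spherical cap. In the sketch below Σ is the upper half of a sphere of radius a, so that H = 1/a for the outward normal. C is the equator, on which ν = t ∧ n points out of Σ, i.e. along −z. For the hypothetical choice u = 1 + c·x (c a fixed vector), both sides should equal −2πa ẑ. This is the sign of the curvature term that follows from integrating the derivatives with ν ds = −n ∧ dx. The radius, the vector c and the grid sizes are illustrative assumptions.

```python
import numpy as np

A = 1.3                              # hypothetical sphere radius a
c_vec = np.array([1.0, 0.0, 0.5])    # hypothetical u(x) = 1 + c.x, restricted to the sphere
H = 1.0 / A                          # mean curvature for the outward normal


def u_of(x):
    return 1.0 + x @ c_vec


# Left side: C is the equator bounding the cap z > 0, and nu = t ^ n is -z there.
n_ph = 256
dph = 2 * np.pi / n_ph
ph = (np.arange(n_ph) + 0.5) * dph
x_eq = A * np.stack([np.cos(ph), np.sin(ph), np.zeros_like(ph)], axis=1)
nu = np.array([0.0, 0.0, -1.0])
lhs = nu * np.sum(u_of(x_eq)) * A * dph          # ds = a dphi on the equator

# Right side: integrate Grad u - 2 H u n over the cap, dS = a^2 sin(theta) dtheta dphi.
n_th = 400
dth = (np.pi / 2) / n_th
th = (np.arange(n_th) + 0.5) * dth
TH, PH = np.meshgrid(th, ph, indexing="ij")
n = np.stack([np.sin(TH) * np.cos(PH), np.sin(TH) * np.sin(PH), np.cos(TH)], axis=-1)
grad_u = c_vec - (n @ c_vec)[..., None] * n      # tangential part of the gradient of u
integrand = grad_u - 2 * H * u_of(A * n)[..., None] * n
dS = (A**2 * np.sin(TH) * dth * dph)[..., None]
rhs = np.sum(integrand * dS, axis=(0, 1))
print(lhs)   # (0, 0, -2 pi a)
print(rhs)
```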

For the surface curl, called Curl, it is advantageous not to confine the definition to surface vectors. Assume that S is sufficiently regular for points nearby to be identified by coordinates of the type x(α_1, α_2) + rn, where r is the (small) distance along the normal. Let W be a space vector which is defined in the neighbourhood where this representation is valid. Then Curl W is defined by

$$\mathrm{Curl}\, W = \left[ \mathrm{curl}\, W - n \wedge \frac{\partial W}{\partial r} \right]_{r=0}. \qquad (A.23)$$

When W = W_3 n + W^j ∂x/∂α_j this leads to

$$\mathrm{Curl}\, W = (\mathrm{Grad}\, W_3) \wedge n + W^j b^k_j\, n \wedge \frac{\partial x}{\partial\alpha_k} + n\, \mathrm{Div}(W \wedge n). \qquad (A.24)$$

The formula (A.24) may be confirmed easily when α_1 = constant and α_2 = constant are lines of curvature; it then holds generally because of its tensor nature. The middle term of (A.24) can be rewritten in various ways through

Note that Div(W ∧ n) = n·Curl W = n·curl W. It cannot be presumed that Curl Grad u = 0 holds in general, nor that Div[Curl W]_t = 0, t signifying a tangential element. Useful combinations are

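$$[\mathrm{Curl}(n \wedge \mathrm{Grad}\, u)]_t = \frac{1}{\sqrt{g}} \left( b^k_2\, \frac{\partial u}{\partial\alpha_1} - b^k_1\, \frac{\partial u}{\partial\alpha_2} \right) n \wedge \frac{\partial x}{\partial\alpha_k}$$

and

$$-\mathrm{Div}[\mathrm{Curl}(n \wedge \mathrm{Grad}\, u)]_t = \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial\alpha_1} \left\{ \frac{1}{\sqrt{g}} \left( b_{22}\, \frac{\partial u}{\partial\alpha_1} - b_{21}\, \frac{\partial u}{\partial\alpha_2} \right) \right\} + \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial\alpha_2} \left\{ \frac{1}{\sqrt{g}} \left( b_{11}\, \frac{\partial u}{\partial\alpha_2} - b_{12}\, \frac{\partial u}{\partial\alpha_1} \right) \right\}. \qquad (A.25)$$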

7 TRANSIENT PHENOMENA

Most electromagnetic transmitters operate long enough at a single frequency for the analysis by time-harmonic waves to be appropriate. However, it is possible to produce short pulses with a broad frequency spectrum, so that predictions of events for general time variations are desirable. The effects of the radiation from lightning discharges and of high-powered optical pulses then become amenable to investigation. This chapter is, therefore, devoted to the problem of transients.

7.1 Finite methods

The general problem is to solve equations of the form

$$\mathrm{curl}\, E + \mu\, \frac{\partial H}{\partial t} = -J', \qquad \mathrm{curl}\, H - \varepsilon\, \frac{\partial E}{\partial t} = J \qquad (7.1)$$

where J and J' are known electric and magnetic currents. The quantities μ and ε may depend on the space variables but do not vary with the time t. Causality insists that there shall be no disturbance until the source is switched on. Thereafter the field propagates behind wavefronts which travel with speeds and in directions characteristic of the medium. No energy flow can be detected by an observer before such a wavefront has passed over him. This property offers a way of avoiding the difficulty, met with harmonic waves, of satisfying the radiation conditions at infinity on a mesh of finite size, the difficulty which led to the introduction of absorbing boundary conditions. For electromagnetic pulses the role of the radiation conditions is taken over by causality. There is now a clear-cut wavefront, ahead of which there is no disturbance. Therefore the mesh does not have to go off to infinity; it merely has to extend as far as the most distant wavefront which has emanated from the obstacle. The progress of each wavefront in time and space can be traced from its initiation, and no artificial boundary to account for the behaviour at infinity is needed. Nevertheless, there may be circumstances in which an absorbing boundary condition has to be called into play. For example, in calculations over a long time the disturbance may have spread far enough for the number of mesh points to have become unmanageable. In such cases one may wish to deploy the analogue of (6.142) in a homogeneous medium. It is

$$\left( \frac{1}{v}\frac{\partial}{\partial t} + \frac{1}{R} \right) \left\{ \frac{\partial p}{\partial R} + \left( \frac{1}{v}\frac{\partial}{\partial t} + \frac{1}{R} \right) p \right\} = \frac{1}{2R^2} \left\{ \frac{1}{\sin\theta}\frac{\partial}{\partial\theta} \left( \sin\theta\, \frac{\partial p}{\partial\theta} \right) + \frac{1}{\sin^2\theta}\frac{\partial^2 p}{\partial\phi^2} \right\}$$

where v is the speed of propagation of the waves. This condition can be derived from

$$p = \sum_{n=0}^{\infty} \frac{p_n(vt - R, \theta, \phi)}{R^{n+1}}.$$

The problem is then very similar to that of §6.14 and the MFIE can be formulated as

−½ j(p) + n_p ∧ ∫_S j(q) ∧ … = ∫_S {(n_q·h_0) grad … 0).

In either case the standard exterior uniqueness theorem obliges the electromagnetic field to be identically zero in the exterior S+. Therefore n ∧ h_+ = 0 and hence j = 0. Hence it has been demonstrated that (7.25) is non-unique only when either the field produced by j in S_ is non-zero and ℜs = 0, or the field in S_ is identically zero and ℜs < 0. The former corresponds to interior modes of magnetic resonance, whereas the latter are exterior modes of electric resonance since n ∧ e_+ = 0. On account of ℜs < 0 the exterior modes grow at infinity rather than diminish. Both types of mode exist for the sphere and so must be allowed for in scattering by general obstacles. Consider now the electromagnetic field in which

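$$e(P) = \int_S j'(q) \wedge \mathrm{grad}_q\, \phi(P,q)\, dS_q, \qquad h(P) = \int_S \left\{ \varepsilon s\, j'(q)\, \phi(P,q) + \frac{1}{s\mu}\, \mathrm{Div}\, j'(q)\, \mathrm{grad}_q\, \phi(P,q) \right\} dS_q. \qquad (7.42)$$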

If j' is the tangential component of the electric field in an interior magnetic mode, e vanishes identically in S+ by the discussion after (6.106). Then n ∧ e_+ = 0 and

$$\tfrac{1}{2} j' + n_p \wedge \int_S j'(q) \wedge \mathrm{grad}_q\, \phi(p,q)\, dS_q = 0. \qquad (7.43)$$

If (7.42) is used as the representation of the exterior electric mode, then (7.43) also results on account of the boundary condition. Thus with each solution of the homogeneous form of (7.25) there can be associated a solution of (7.43), j' being the tangential component of the interior electric field of the magnetic mode (ℜs = 0) or of the field (7.42) (ℜs < 0). We wish to show that the converse is also true, so that (7.25) and (7.43) (and thereby (7.27) and (7.32)) stand or fall together in the matter of uniqueness.

The dimension of the space of solutions of (7.43) is the same as that of its adjoint which, as has been seen in §6.14, has elements n ∧ j. Therefore (7.25) and (7.43) have the same dimension. Hence the solutions of the homogeneous form of (7.25) fail to be in one-to-one correspondence with those of (7.43) only if there is one such that the electric field produced by it satisfies n ∧ e_− = 0 when ℜs = 0. However, n ∧ e_+ = 0 then also, with the consequence n ∧ h_+ = 0, from which follows j = 0. Hence the relationship must be one-to-one, and the case ℜs = 0 has been dealt with. When ℜs < 0, let the field be defined by (7.42) with j' satisfying (7.43). Then, from (6.86),

$$h(P_i) = -\int_S \left\{ (n \wedge h) \wedge \mathrm{grad}_q\, \phi(P_i,q) + s\varepsilon\, (n \wedge e_-)\, \phi(P_i,q) + \frac{1}{s\mu}\, \mathrm{Div}(n \wedge e_-)\, \mathrm{grad}_q\, \phi(P_i,q) \right\} dS_q \qquad (7.44)$$

for P_i in S_. However, by virtue of (7.43), −n ∧ e_− = j'. Comparison of (7.42) and (7.44) indicates that, when s = s_n,

$$\int_S (n \wedge h) \wedge \mathrm{grad}_q\, \phi(P_i,q)\, dS_q = 0. \qquad (7.45)$$

Allowing P_i to tend to a point of S we deduce that n ∧ h satisfies (7.25). Remark that if n ∧ h = 0 then n ∧ e_− = 0, so that j' = 0. Once again, the one-to-one correspondence has been established. The kernel of (7.25) is semi-simple according to (7.37) if, for each s_n, there is a solution of the homogeneous version of (7.25) and a solution of (7.43) such that

Suppose that ℜs_n = 0. Then, from the foregoing analysis, one j' can be associated with the electric intensity of the interior magnetic mode generated by j. Therefore (7.46) will be met for ℜs_n = 0 if it can be shown that

$$\int_S \frac{\partial h_-}{\partial s} \cdot n \wedge e_-\, dS \ne 0. \qquad (7.47)$$

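This will be proved by assuming the contrary, i.e. it will be demonstrated that, when the left-hand side of (7.47) is zero, the field must vanish identically. Since n ∧ h_+ = 0, the assumption gives

$$0 = \int_S n \cdot \left\{ \frac{\partial h_-}{\partial s} \wedge e_- - h_- \wedge \frac{\partial e_-}{\partial s} \right\} dS = \int_{S_-} \left\{ e \cdot \left( s\varepsilon\, \frac{\partial e}{\partial s} + \varepsilon e \right) + \frac{\partial h}{\partial s} \cdot s\mu h - \frac{\partial e}{\partial s} \cdot s\varepsilon e - h \cdot \left( s\mu\, \frac{\partial h}{\partial s} + \mu h \right) \right\} dx = \int_{S_-} (\varepsilon\, e \cdot e - \mu\, h \cdot h)\, dx \qquad (7.48)$$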

from the divergence theorem and (7.24), together with its derivative with respect to s. The theory of §3.5 informs us that in any interior mode of oscillation e may be taken as purely real and h as purely imaginary. Hence (7.48) implies that e and h are identically zero. Thus the failure of (7.47) entails the disappearance of the electromagnetic field. Accordingly, (7.47), and thereby (7.46), must hold for a magnetic mode. In other words, all the poles of the resolvent which have ℜs_n = 0 are simple. The exterior electric mode is relevant when ℜs_n < 0, and then the analogue of (7.47) is

$$\int_S n \cdot h_+ \wedge \frac{\partial e_+}{\partial s}\, dS \ne 0. \qquad (7.49)$$

No proof has so far been forthcoming as to whether or not (7.49) is true for general obstacles. It is certainly valid for the sphere, and one may conjecture that it has universal veracity. From now on it will be assumed that the scattering kernel is semi-simple. Then advantage may be taken of (7.41). Let j_r be an orthonormal set of eigenelements of (7.25) when s = s_n, and select associated solutions j'_r of (7.43) (remember that n ∧ j'_r satisfies the adjoint of (7.25)) such that

$$(\mu\varepsilon)^{1/2} \int_S j'_m(p) \cdot \int_S j_r(q) \wedge \mathrm{grad}_q \exp\{-s_n(\mu\varepsilon)^{1/2}|x_p - x_q|\}\, dS_q\, dS_p = 0$$

if m ≠ r, but is non-zero if m = r. More briefly this may be expressed in inner product notation as

$$\langle A j_r, j'^{*}_m \rangle = 0 \quad (r \ne m), \qquad \langle A j_r, j'^{*}_m \rangle \ne 0 \quad (r = m),$$

with an obvious significance for the operator A. If the right-hand side of (7.25) be denoted by j_i, the solution of (7.25) may be expressed as

$$j(p) = -2j_i(p) - \int_S \Gamma(p,q) \cdot j_i(q)\, dS_q \qquad (7.50)$$

where the resolvent Γ is a dyadic. The dyadic has only simple poles in the s plane, and near a pole s = s_n

$$\Gamma(p,q) \sim (s - s_n)^{-1} \sum_r \frac{j_r(p)\, \{n_q \wedge j'_r(q)\}}{\langle A j_r, j'^{*}_r \rangle} \qquad (7.51)$$

on account of (7.39) and (7.41). Since Γ is meromorphic, a representation valid in the whole s plane could be constructed for Γ by Mittag-Leffler's theorem (Goursat 1942), but that is unnecessary for our purposes. Note that the effect of (7.51) is to give a term in j_r via the coupling between the incident field and the interior tangential electric field accompanying the natural modes of oscillation. If e is the field of (7.42) and j_i = n ∧ h_i, the coefficient is proportional to ∫_S n ∧ e_−·h_i dS. Now, if h_i has no sources inside S, this is the same as ∫_S n ∧ e_−·h dS at s = s_n. This last integral vanishes if s_n is purely imaginary, because then n ∧ h = 0. Thus the residue of Γ disappears at any pole corresponding to an interior magnetic resonance when the incident field comes from outside S. In other words, there is no coupling between the exterior incident field and the magnetic field inside at frequencies of interior resonance. Before leaving this theoretical investigation we derive some properties which are valuable when making deformations in the complex s plane. For a surface vector g define f by

$$f(p) = -\tfrac{1}{2} g(p) + n_p \wedge \int_S g(q) \wedge \mathrm{grad}_q\, \phi(p,q)\, dS_q = -\tfrac{1}{2} g(p) + Tg$$

in the notation of §6.16. According to §6.16, T can be split as T_1 + T_2, where T_2 comes from a domain of S of diameter 2δ surrounding p and, for any fixed s, δ can be chosen small enough for ‖T_2 g‖ ≤ ε_1‖g‖ for any ε_1 > 0. The proof of this statement is unaffected if ℜs is replaced by any larger value. Therefore, as ℜs → ∞, we first choose δ so that ‖T_2 g‖ ≤ ε_1‖g‖ and then make ℜs so large that ‖T_1 g‖ ≤ ε_1‖g‖. Evidently, as ℜs → ∞,

$$-\tfrac{1}{2} I + T \to -\tfrac{1}{2} I. \qquad (7.52)$$

Further

$$\|(I - 2T)^{-1} g - g\| \le \sum_{r=1}^{\infty} (2\|T\|)^r \|g\| \le \sum_{r=1}^{\infty} (4\varepsilon_1)^r \|g\| = 4\varepsilon_1 \|g\| (1 - 4\varepsilon_1)^{-1}$$

so long as 4ε_1 < 1. Allowing ε_1 → 0 we see that

$$(-\tfrac{1}{2} I + T)^{-1} \to -2I \qquad (7.53)$$

as ℜs → ∞. The behaviour as |s| → ∞ in general needs more elaborate analysis. Let g be continuous on S and let the maximum value of |g| attained on S be denoted by max|g|. Then, for fixed δ (< 1), there is a finite K such that

$$|T_1 g| \le K \exp\{|s|\, d\, (\mu\varepsilon)^{1/2}\} \max|g|$$

where d is the maximum separation between any pair of points of S. On the other hand, (6.114) implies that

$$|T_2 g| \le K\delta \exp\{|s|\, d\, (\mu\varepsilon)^{1/2}\} \max|g|.$$

Thus, as |s| → ∞,

$$|f(p)| = O\left[ \exp\{|s|\, d\, (\mu\varepsilon)^{1/2}\} \right] \max|g|. \qquad (7.54)$$

Now, considering only the variations of T with s, we observe that, for given p, f is an entire function of s which, by virtue of (7.54), is of exponential type and of order 1. Hence, by the properties of the minimum modulus (Titchmarsh 1934), there are, for given p, arbitrarily large circles of radius |s| on which

$$|f(p)| > \exp\{-(d + \eta)|s|(\mu\varepsilon)^{1/2}\} \max|g|$$

for any η > 0. Writing g = (−½I + T)^{−1} f, this may be expressed as

$$|(-\tfrac{1}{2} I + T)^{-1} f| < \exp\{(d + \eta)|s|(\mu\varepsilon)^{1/2}\}\, |f(p)| \qquad (7.55)$$

on some circles of arbitrarily large radius. Broadly speaking, (7.55) may be interpreted as stating that Γ is meromorphic and of order 1, its growth at infinity being dictated by the largest distance between any pair of points of S. Meromorphic functions of order 1 can be factorized as the ratio of entire functions of order 1, but that fact will not be needed here. On the basis of the method of moments, Wilton (1981) has conjectured that, for convex bodies, there are contours progressing to infinity on which

$$\Gamma(p,q) \sim R_+ \exp\{-s|x_p - x_q|(\mu\varepsilon)^{1/2}\} \qquad (\Re s \to \infty), \qquad (7.56)$$

$$\Gamma(p,q) \sim R_- \exp\{s|x_p - x_q|(\mu\varepsilon)^{1/2}\} \qquad (\Re s \to -\infty), \qquad (7.57)$$

where R_+ and R_− have at most algebraic growth at infinity. Obviously, in ℜs < 0 the contours must dodge the poles of Γ. Bearing in mind (7.30), we can see that iteration of the kernel will verify (7.56). As regards ℜs < 0, consider what happens when the right-hand side of (7.57) is substituted for Γ. The integrand contains an exponential with exponent s(με)^{1/2}{|x_t − x_q| − |x_p − x_t|} but no other factor with exponential behaviour. When |s| is large the integral may be estimated by the method of steepest descent (Jones 1982, 1986). As x_t moves over the body the exponent is stationary at x_t = x_p and x_t = x_q, the convexity of the body being presumed. The exponential contribution from x_t = x_q has exponent −s|x_p − x_q|(με)^{1/2}, which is the same as that of the kernel; the remainder of the kernel can be recovered by adjusting R_− suitably. On the other hand, the contribution from x_t = x_p balances Γ. Accordingly, it has been demonstrated that (7.56) and (7.57) are valid for convex bodies. They may hold for other shapes, but the above argument suggests that this is not so in general, since one can envisage boundaries where other stationary points have to be included in the asymptotic estimation.
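The estimate that leads to (7.53) is the standard Neumann-series bound. The finite-dimensional sketch below makes an assumption purely for illustration: a random matrix, scaled to a chosen spectral norm ‖T‖, stands in for the integral operator T. With that stand-in, ‖(I − 2T)^{−1}g − g‖ stays below Σ_{r≥1}(2‖T‖)^r‖g‖ = 2‖T‖‖g‖/(1 − 2‖T‖). The sketch also shows (−½I + T)^{−1} approaching −2I as ‖T‖ shrinks, which is what the argument above establishes for T as ℜs → ∞.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
base = rng.standard_normal((N, N))
base /= np.linalg.norm(base, 2)      # scale to spectral norm 1
g = rng.standard_normal(N)
I = np.eye(N)


def neumann_check(norm_T):
    """Compare ||(I - 2T)^{-1} g - g|| with its Neumann-series bound for ||T|| = norm_T."""
    T = norm_T * base                                # hypothetical stand-in for the operator T
    deviation = np.linalg.norm(np.linalg.solve(I - 2 * T, g) - g)
    q = 2 * norm_T
    bound = q / (1 - q) * np.linalg.norm(g)          # sum over r >= 1 of (2||T||)^r ||g||
    # Distance of (-I/2 + T)^{-1} from -2I in the spectral norm
    to_minus_2I = np.linalg.norm(np.linalg.inv(-0.5 * I + T) + 2 * I, 2)
    return deviation, bound, to_minus_2I


results = {t: neumann_check(t) for t in (0.2, 0.05, 0.01, 0.001)}
for t, (dev, bound, dist) in results.items():
    print(f"||T|| = {t:<6} deviation = {dev:.3e}  bound = {bound:.3e}  "
          f"||(-I/2 + T)^-1 + 2I|| = {dist:.3e}")
```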

7.9 The impulse response

Suppose that the antenna is acting as a perfectly conducting receptor under the influence of the illumination e_0, h_0 from outside. Then the field outside may be expressed as

$$h(P) = h_0(P) - \int_S j(q) \wedge \mathrm{grad}_q\, \phi(P,q)\, dS_q, \qquad (7.58)$$

leading to the integral equation

$$-\tfrac{1}{2} j(p) + n_p \wedge \int_S j(q) \wedge \mathrm{grad}_q\, \phi(p,q)\, dS_q = n_p \wedge h_0(p). \qquad (7.59)$$

The same equation could be obtained from (7.25) by the device of §6.14, relabelling j + n ∧ h_0 as j and then changing the sign of h_0. From the theory of the preceding section the solution of (7.59) is, from (7.50),

$$j(p) = -2n_p \wedge h_0(p) - \int_S \Gamma(p,q) \cdot n_q \wedge h_0(q)\, dS_q, \qquad (7.60)$$

which enables the determination of the total field from (7.58). The integral in (7.58) indeed supplies an expansion in terms of the natural modes of the body, but only those representing the exterior electric modes arise, in view of the remarks of the preceding section after (7.51). Typical conduct emerges when the incident wave is plane and impulsive, its time dependence being decreed by a δ function. Naturally, the performance for other incident fields can then be deduced by convolution. To focus thoughts let

$$H_0 = I_0 \int_{-\infty}^{\infty} f(t_0)\, \delta\{t - t_0 - (\mu\varepsilon)^{1/2} x\}\, dt_0 = I_0\, f\{t - (\mu\varepsilon)^{1/2} x\}.$$

The assumed form of H_0 makes h_0 = I_0 exp{−s(με)^{1/2}X}, where X = x + t_0/(με)^{1/2}. With n_p ∧ I_0 = −j_0(p), (7.60) gives

$$j(p) = 2j_0(p) \exp\{-s(\mu\varepsilon)^{1/2} X_p\} + \int_S \Gamma(p,q) \cdot j_0(q) \exp\{-s(\mu\varepsilon)^{1/2} X_q\}\, dS_q, \qquad (7.61)$$

where X_p = x_p + t_0/(με)^{1/2} and X_q = x_q + t_0/(με)^{1/2}, in which x_p and x_q are the values of x at the point of observation p and at the point of integration q respectively. Returning to the time domain, we obtain for the current induced in the obstacle

$$J(p,t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{st}\, j(p)\, ds,$$

where c is sufficiently positive for all singularities of the integrand to lie to the left of the contour of integration. From (7.61) … where J_1(p, t) = … 0 will be imposed. When ℜs → ∞, (7.52) and (7.53) imply that (7.59) can be solved by iteration. Thus

$$j_1(p) = 4n_p \wedge \int_S j_0(q) \exp\{-s(\mu\varepsilon)^{1/2} X_q\} \wedge \mathrm{grad}_q\, \phi(p,q)\, dS_q + \text{iterates} \qquad (7.64)$$

as ℜs → ∞. It is evident that, for the first term on the right of (7.64), the contour in the integral for J_1 can be pushed to the right, with a consequent contribution of zero to J_1, when T/(με)^{1/2} < x_q + |x_p − x_q|. It is readily verified that higher iterates in (7.64) make no contribution to J_1 under the same condition. The inequality holds for all x_p and x_q when T < x_1(με)^{1/2}. Hence no current is induced in the obstacle until the incident pulse strikes it. Consequently, the dictates of causality are obeyed. Furthermore, if the point of observation p is such that T < x_p(με)^{1/2}, the inequality remains valid since x_p ≤ x_q + |x_p − x_q|. Thus there is no current at a point of the surface before the incident pulse reaches it. The fact that J_1 = 0 when the inequality is satisfied leads to another inference. For values of T such that x_1 < T/(με)^{1/2} < x_2, the integration over S in (7.63) can be limited to those points x_q of S for which x_q + |x_p − x_q| < T/(με)^{1/2}. Of course, once T is large enough the whole of S will be involved. If T/(με)^{1/2} > x_2 + d, the limitation on the growth of Γ demonstrated in the last section empowers deformation of the contour to the left in (7.60). Only the poles of Γ need to be taken into account and

$$J_1(p,t) = \sum_n \exp(s_n T) \sum_r \frac{j_{nr}(p)}{\langle A_n j_{nr}, j'^{*}_{nr} \rangle} \int_S n_q \wedge j'_{nr}(q) \cdot j_0(q) \exp\{-s_n(\mu\varepsilon)^{1/2} x_q\}\, dS_q \qquad (7.65)$$

when T/(με)^{1/2} > x_2 + d, i.e. when the pulse has travelled about twice the length of the body since first hitting it. The subscript n has been added to the quantities in (7.51) in order to identify a particular pole. The summation over r may depend on n, since the number of eigenelements of (7.25) may vary with the pole under consideration. Nominally, all the poles of Γ may be allowed for in (7.65). However, it has already been pointed out, after (7.51), that the interior magnetic resonances can be ignored when the sources of the incident field are outside S. Therefore the summation in (7.65) can be limited to those values of n for poles corresponding to the exterior electric modes, and so all terms are exponentially damped in time. At first sight (7.65) does not appear to be real, but it must be remembered that if s_n is a pole so is s_n^*, with associated eigenelement j_nr^*. Thus each term of (7.65) has a matching complex conjugate, so that J is real, as it needs to be. The weight to be attached to the expansion (7.65) is considerable. It demonstrates that after a certain interval of time the induced current at any point has a particularly simple time dependence, being a series of decaying exponentials whose coefficients do not vary with time. Moreover, the exponentials, though not the coefficients, are independent of the position of the point on the surface. Representing the current in terms of the poles by an expansion like (7.65) is known as the singularity expansion method, or SEM for short. The delay which has to be accepted before (7.65) is valid is quite short in practice. For example, with a sphere of radius m in air the delay is a little more than 3 ns. For an ellipsoid whose largest and smallest semi-axes are a and c respectively, the delay does not exceed roughly 12a ns and may be as little as 6(a + c) ns, depending upon the direction of incidence of the incoming pulse. When the obstacle is convex, (7.56) and (7.57) can be exploited.
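The remark that each term of (7.65) has a complex-conjugate partner can be illustrated with a toy late-time sum. The poles and coefficients below are hypothetical numbers and are not computed for any obstacle. Each coefficient stands for the product of j_nr(p) with the surface integral in (7.65) at a fixed point p. Adding the partner term for s_n^* makes the sum real to rounding error. The negative real parts make it decay, and the most lightly damped pole dominates at late times.

```python
import numpy as np

# Hypothetical poles s_n (upper half-plane members, negative real parts) and coefficients
# c_n standing for the products of j_nr(p) and the surface integrals in (7.65).
poles = np.array([-0.08 + 1.1j, -0.25 + 2.3j, -0.6 + 3.4j])
coeffs = np.array([1.0 - 0.4j, 0.5 + 0.2j, 0.3 - 0.1j])


def late_time_current(T):
    """Sum of c_n exp(s_n T) plus the partner term for each conjugate pole s_n^*."""
    T = np.asarray(T, dtype=float)[..., None]
    terms = coeffs * np.exp(poles * T) + np.conj(coeffs) * np.exp(np.conj(poles) * T)
    return terms.sum(axis=-1)


T = np.linspace(0.0, 40.0, 9)
J = late_time_current(T)
# Upper bound 2 sum |c_n| exp(Re(s_n) T) on the magnitude of the sum
envelope = 2 * np.sum(np.abs(coeffs)[None, :] * np.exp(poles.real[None, :] * T[:, None]), axis=1)
print(np.max(np.abs(J.imag)))   # of the order of rounding error
print(J.real)
```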
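To simplify the presentation of how (7.56) and (7.57) are exploited, without invalidating the principles, we will just write

$$\Gamma(p,q) = \sum_m \frac{P_m(x_p)\, Q_m(x_q)}{s - s_m}$$

with P_m and Q_m independent of s. By virtue of (7.56)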

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \Gamma(p,q) \exp\{sT - s(\mu\varepsilon)^{1/2} x_q\}\, ds = 0 \qquad (7.66)$$

when T/(με)^{1/2} < x_q + |x_p − x_q|, since the contour can be deformed to the right. This is consistent with what has been established already for the general obstacle. When T/(με)^{1/2} > x_q − |x_p − x_q|, the contour may be deformed to the left on account of (7.57), and

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \Gamma(p,q) \exp\{sT - s(\mu\varepsilon)^{1/2} x_q\}\, ds = \sum_m P_m(x_p)\, Q_m(x_q) \exp[s_m\{T - (\mu\varepsilon)^{1/2} x_q\}]. \qquad (7.67)$$

Therefore

$$J_1(p,t) = \sum_m e^{s_m T} P_m(x_p) \int_S Q_m(x_q) \cdot j_0(q)\, H\{T/(\mu\varepsilon)^{1/2} - x_q - |x_p - x_q|\} \exp\{-s_m(\mu\varepsilon)^{1/2} x_q\}\, dS_q, \qquad (7.68)$$

H(x) being the usual Heaviside step function.

For T/(με)^{1/2} > x_2 + d, the step function is unity for all x_q and (7.68) is equivalent to (7.65). At earlier times the integral in (7.68) is time dependent. Therefore an attempt to express the SEM response in pure exponentials for all time, as in (7.65), will be in error, certainly in the early stages; later on it should be perfectly satisfactory. That error does occur in actual calculations has been affirmed by numerical experiments (Michalski 1982; Baum and Pearson 1981). A source of potential numerical error can be seen from (7.66) and (7.67). From these it can be inferred that the series in (7.67) must be identically zero for

$$x_q - |x_p - x_q| < T/(\mu\varepsilon)^{1/2} < x_q + |x_p - x_q|.$$

In any numerical scheme only a finite number of the poles of Γ can be determined, and it is impossible to make a finite series identically zero over the prescribed interval. Nevertheless, if enough of the poles in a neighbourhood of the imaginary axis are found, the predictions of (7.68) can be expected to be reliable for all times. Eventually they will coincide with those of (7.65). There are two reasons for this. One is that the series in (7.67), although not exactly zero in the relevant range, will be approximately so. The other is that the missing poles have a negative real part of largish magnitude, and their influence, if present, would diminish rapidly with increase of T. The overlap between (7.66) and (7.67) means that the step function in (7.68) could be taken as H{T − (με)^{1/2}x_q}. While this is true in the exact formula, it will be in error in a numerical calculation; yet particular examples indicate that the error soon disappears (Michalski 1982). There is another point. Wavefronts in sharp pulses are dominated by high frequencies in the spectrum. But, at high frequencies, the back of a convex target is in shadow and the current correspondingly small (§8.19). Therefore the major contribution of the integral in (7.68) will have occurred by the time the pulse reaches the shadow boundary or a bit beyond. Hence (7.68) can be expected to agree with (7.65) much earlier than (x_2 + d)(με)^{1/2}. How much earlier depends on the shape of the body and the direction of illumination. For points in the illuminated region it should not be later than about (x_1 + ½d)(με)^{1/2} on average, i.e. when the pulse has travelled about half the length of the body after first striking it. For a point in the shadow the agreement should take place not too long after the arrival of the incident pulse there. The formulae (7.63), (7.65) and (7.68) will be valuable practically only if there are ways of calculating the positions of the poles and their associated residues. This problem will be examined in the next section.

7.10 Practical determination of the positions of the poles

To locate the poles it is necessary to solve the integral equation

$$-\tfrac{1}{2} j(p) + n_p \wedge \int_S j(q) \wedge \mathrm{grad}_q\, \phi(p,q)\, dS_q = 0. \qquad (7.69)$$

If this can be undertaken analytically then there is little more to be said. However, the analysis is likely to be intractable for all except the very simplest shapes and so recourse to other methods is inevitable. The direct approach to (7.69) is by numerical approximation, replacing (7.69) by a finite algebraic system based on any appropriate method from Chapter 6. A solution will exist only if the determinant of the coefficients vanishes, leading to an equation in s which will be satisfied by the s; or rather by approximations of the s,.. Once the s; are known approximation to the eigenelements of (7.69) can be constructed from the solutions of the algebraic system. The procedure may be clearer if (7.69) is replaced by its scalar equivalent as in §7.6, so consider

$$j'(x) - \int K(x, y)\, j'(y)\, dy = 0.$$

The algebraic system corresponding to this is

$$\sum_{n=1}^{N} Z_{mn}(s)\, j'_n = 0 \quad (m = 1, \dots, N) \qquad (7.70)$$

where the dependence of the coefficients on s, due to K involving s, has been explicitly displayed. The equations have a non-trivial solution only if

$$\det[Z_{mn}(s)] = 0. \qquad (7.71)$$
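A minimal sketch of this procedure in Python is given below. It is an illustration under stated assumptions rather than a scattering computation: the 2 × 2 matrix `z_matrix(s)`, with unit diagonal and off-diagonal entries $\tfrac12 e^{-s}$, is hypothetical and was chosen only because $\det Z(s) = 1 - \tfrac14 e^{-2s}$ has the known zeros $s = -\ln 2 + i\pi k$, a row of poles parallel to the imaginary axis. One zero of (7.71) is located by Muller's method, and the residue of $\mathbf{Z}^{-1}$ there is then estimated by evaluating $\eta\,\mathbf{Z}^{-1}(s_n + \eta)$ for decreasing $\eta$, the device described in what follows. All function names are illustrative.

```python
import cmath


def z_matrix(s):
    """Hypothetical 2x2 coefficient matrix Z(s) standing in for (7.70)."""
    off = 0.5 * cmath.exp(-s)
    return [[1.0, off], [off, 1.0]]


def det_z(s):
    """Left-hand side of (7.71) for the toy matrix: 1 - exp(-2s)/4."""
    (a, b), (c, d) = z_matrix(s)
    return a * d - b * c


def muller(f, x0, x1, x2, tol=1e-12, maxiter=100):
    """Muller's method: fit a parabola through three iterates and step to its nearer root."""
    for _ in range(maxiter):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        h1, h2 = x1 - x0, x2 - x1
        d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
        a = (d2 - d1) / (h2 + h1)
        b = a * h2 + d2
        disc = cmath.sqrt(b * b - 4 * a * f2)
        # choose the sign that makes the denominator largest in modulus
        den = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        dx = -2 * f2 / den
        x0, x1, x2 = x1, x2, x2 + dx
        if abs(dx) < tol * max(1.0, abs(x2)):
            return x2
    raise RuntimeError("Muller's method did not converge")


def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def residue_estimate(s_n, eta):
    """eta * Z^{-1}(s_n + eta), which tends to the residue of Z^{-1} at s_n as eta -> 0."""
    inv = inverse_2x2(z_matrix(s_n + eta))
    return [[eta * v for v in row] for row in inv]


s_n = muller(det_z, -0.5 + 2.5j, -0.6 + 3.0j, -0.7 + 3.3j)
print("root:", s_n)
for eta in (1e-3, 1e-5, 1e-7):
    print(eta, residue_estimate(s_n, eta))
```

For this toy matrix the exact residue at $s = -\ln 2 + i\pi$ has every entry equal to $\tfrac12$ (its columns are proportional to the natural mode $(1, 1)$), so the printed estimates should settle near 0.5 as $\eta$ decreases.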

The N roots of (7.71) are regarded as approximations to the $s_n$. Obviously, not all of the $s_n$ can be covered, but the dominant ones should be there if N is large enough, and the accuracy should improve as N increases. In any case, the $s_n$ which have a much more negative real part than the others are not generally of much significance. Practical methods for solving (7.71) are those of Newton (§1.8) and of Muller (§1.8). Having found a solution of (7.71), the corresponding $j'_n$ is determined from (7.70) and regarded as an approximation to $j'$. Alternatively, since we expect $\mathbf{Z}^{-1}$ ($\mathbf{Z}$ being the matrix with elements $Z_{mn}$) to have a pole at $s = s_n$, we can


try to discover the residue directly by evaluating

$$\eta\,\mathbf{Z}^{-1}(s_n + \eta)$$

for small $\eta$ until the result does not alter appreciably with $\eta$. Results of reasonable accuracy have been obtained in this way (Tesche 1973). Notwithstanding, the effort to be expended is not trifling. Not only have the algebraic equations (7.71) and (7.70) to be solved but also the integrals in (7.65) or (7.68) have to be evaluated, so the labour is at least of the same order of magnitude as a solution in the frequency domain and probably substantially more. Moreover, the method is not always as efficient in the early time stages as the updating process described in the initial sections of this chapter. Another strategy will, therefore, be set forth now.

7.11 Prony's method and modifications

The fact that the surface current, after a period of transient behaviour which lasts at most for the time to transmit twice the greatest diameter of the obstacle, settles down to the exponential evanescence of (7.65) suggests the following tactics. First, calculate the current at a point by working in the time domain with an updating integral equation until sufficient time has elapsed for a reasonable stretch where (7.65) is valid to be available. By sampling in this stretch, arrive at the values of the coefficients, and hence the current, for all subsequent times. The input pulse is taken as a Gaussian approximation to the δ function as in §7.5. The main problem is to discover the coefficients and exponents in the expansion of the current in exponentials, i.e. when the step function in (7.68) has become 1 permanently. One approach to elucidating them is by Prony's method (Whittaker and Robinson 1952). Suppose that the current is real and scalar (the adaptation to vectors is straightforward) and has the representation

$$J(t) = \sum_{n=1}^{N} a_n \exp(s_n t). \qquad (7.72)$$

The $a_n$ and $s_n$ are to be determined from observations of J at equal time intervals; if the only observations available are at unequal time intervals, interpolation will be necessary to supply values on a uniform time scale. For a general current there may be no knowledge of N a priori, and so it is desirable to have a method capable of fixing N when it cannot be specified by other considerations. Let the observations be made at $t = 0, \tau, 2\tau, \dots, (M-1)\tau$ where $M \ge 2N$. Denote $J(r\tau)$ by $J_r$. Let P satisfy $N \le P \le M - N$. Then it follows from (7.72) that

$$\sum_{p=1}^{P} b_p J_{r+P-p} = -J_{r+P} \qquad (7.73)$$


provided that $z_n = \exp(s_n\tau)$ satisfies

$$z^P + \sum_{p=1}^{P} b_p z^{P-p} = 0 \qquad (7.74)$$

for n = 1, ..., N. Clearly, if (7.74) is to furnish all the exponentials in (7.72), it is necessary to make $P \ge N$. By requiring (7.73) to hold for $r = 0, 1, \dots, M - P - 1$ we obtain the matrix equation

$$\mathbf{A}\mathbf{b} = \mathbf{c} \qquad (7.75)$$

where $\mathbf{b} = (b_1\, b_2 \cdots b_P)^T$, $\mathbf{c} = -(J_P\, J_{P+1} \cdots J_{M-1})^T$ and $\mathbf{A}$ is an $(M - P) \times P$ matrix with entries $A_{ij} = J_{i-1+P-j}$. The principle of the standard Prony method when N is known is to take P = N and then solve (7.75) for b; this is feasible since A and c are known quantities. Once b has been determined, the $s_n$ can be found from the roots of (7.74); in this connection Muller's method (§1.8) is often effective for eliciting the roots of a polynomial. The $s_n$ will be indeterminate to the extent of a multiple of $2\pi i/\tau$, but that is inevitable without a change of sampling interval. With the $s_n$ known, the $a_n$ are obtained by satisfying (7.72) at the sampling points. Usually, least squares or a similar approximation will be involved in this determination, as it will in the solution of (7.75). Even in the best circumstances the standard Prony method is suspect. It is very sensitive to small alterations of the data and may exhibit instability if the representation (7.72) is not strictly valid. Some consideration of how matters might be improved is therefore pertinent. Also, the question of finding N when it is not given in advance needs to be settled. In general, A is a rectangular matrix and (7.75) is solved by means of the generalized inverse $\mathbf{A}^{+}$ (§1.15) or, what comes to the same thing in practice, by least squares (Theorem 1.15). Thus

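$$\mathbf{b} = \mathbf{A}^{+}\mathbf{c}$$

where, according to Theorem 1.15c, when A is of rank k,

$$\mathbf{A}^{+} = \mathbf{U}\begin{pmatrix}\boldsymbol{\Lambda}^{-1} & 0 \\ 0 & 0\end{pmatrix}\mathbf{V}^{H}$$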

in which U, V are unitary matrices and $\boldsymbol{\Lambda}$ is a diagonal matrix of order k whose elements $\lambda_1, \dots, \lambda_k$ are the positive singular values of A. In (7.75) the rank of A cannot exceed N, however large P and M, because any N + 1 consecutive observations of J are linearly dependent. There are now two cases which can occur: (a) A has at least one singular value which is zero, and (b) no singular value of A is zero. In case (a) N can be identified with the rank of A or, equivalently, the number of non-zero eigenvalues of $\mathbf{A}^H\mathbf{A}$. In case (b) it is clear that P is not large enough, and so P is increased until A moves into case (a); this entails also increasing M (Kumaresan and Tufts 1982 recommend


choosing M = 3P). Having determined N, one can contemplate returning to the original Prony method, but the sensitivity referred to has not been obviated. In any event the determination of N is not quite as simple as has been stated. Unless the observations and numerical procedures are extremely accurate, it is highly unlikely that any singular value of A will be exactly zero. Nevertheless, those which should be exactly zero can be expected to be small. So arrange the calculated singular values in decreasing order, so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_P$, and form $(\lambda_m - \lambda_{m+1})/(\lambda_m + \lambda_{m+1})$ for $m = 1, \dots, P - 1$. The value k of m for which this ratio has its largest value (which would be 1 if $\lambda_{k+1} = 0$ exactly) is taken as the rank of A. Calling into play singular values to find N when applying the Prony method is often known as the singular value decomposition (SVD) Prony method. In practice there are two difficulties associated with the above method of fixing N. The first is that few observations of genuine signals will be free of noise and measurement errors. The second is that in a genuine scattering problem N can be theoretically infinite, although many of the exponentials will be heavily damped and contribute little to the long-term behaviour of the current. Since one is interested primarily in a few dominant terms, these extra exponentials effectively act as noise. The presence of noise renders Prony's method with P = N untrustworthy and, furthermore, the rank of A cannot be estimated reliably unless P is very much larger than N (regarded as the number of dominant exponentials of interest), perhaps P = 30N or greater. Therefore, it is common to use large values of P and M, large enough to ensure that the rank of A has settled down; often the values are doubled as a cross-check that the rank of A has become stable. All subsequent calculations are done with these large values of P and M.
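The standard Prony method and the singular-value rule for the rank of A can be sketched in a few lines of Python. The sketch assumes NumPy and uses `numpy.linalg.lstsq` for the least-squares solution of (7.75), `numpy.roots` for the roots of (7.74) and `numpy.linalg.svd` for the singular values; it does not attempt the rejection of spurious roots discussed next, so `prony` simply takes P = N. As test data it uses the current of Exercise 18 below, $J(t) = 2e^{-3t}\sin 2t + e^{-t}\cos 2t$, whose exponents are $-3 \pm 2i$ and $-1 \pm 2i$; the sampling interval $\tau = 0.25$, the sample counts and the noise level $10^{-10}$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def prony_system(J, P):
    """A and c of (7.75), 0-based: row r holds J[r+P-1], ..., J[r] and c[r] = -J[r+P]."""
    M = len(J)
    A = np.array([[J[r + P - 1 - j] for j in range(P)] for r in range(M - P)], dtype=complex)
    c = -np.asarray(J[P:M], dtype=complex)
    return A, c


def estimate_rank(A):
    """Rank of A: the m that maximises (lam_m - lam_{m+1}) / (lam_m + lam_{m+1})."""
    lam = np.linalg.svd(A, compute_uv=False)  # singular values in decreasing order
    ratios = (lam[:-1] - lam[1:]) / (lam[:-1] + lam[1:])
    return int(np.argmax(ratios)) + 1


def prony(J, tau, N):
    """Standard Prony method with P = N; returns the exponents s_n and coefficients a_n."""
    A, c = prony_system(J, N)
    b = np.linalg.lstsq(A, c, rcond=None)[0]         # least-squares solution of (7.75)
    z = np.roots(np.concatenate(([1.0], b)))          # roots of z^N + b_1 z^(N-1) + ... + b_N
    s = np.log(z) / tau                               # principal branch: s_n fixed only modulo 2*pi*i/tau
    V = np.vander(z, len(J), increasing=True).T       # V[r, n] = z_n**r
    a = np.linalg.lstsq(V, np.asarray(J, dtype=complex), rcond=None)[0]  # fit (7.72) at the samples
    return s, a


def evaluate(s, a, t):
    """Exponential series (7.72) at time t."""
    return complex(np.sum(a * np.exp(s * t)))


def exercise18(t):
    return 2 * np.exp(-3 * t) * np.sin(2 * t) + np.exp(-t) * np.cos(2 * t)


tau = 0.25
samples = exercise18(tau * np.arange(20))           # M = 20 noise-free samples
s, a = prony(samples, tau, N=4)
print(np.round(s, 6))
print(evaluate(s, a, 6.0).real, exercise18(6.0))   # extrapolation beyond the sampled window

# Rank detection with P = 8, M = 3P and a little added noise
rng = np.random.default_rng(0)
noisy = exercise18(tau * np.arange(24)) + 1e-10 * rng.standard_normal(24)
A, _ = prony_system(noisy, P=8)
print(estimate_rank(A))
```

With noise-free samples the recovered exponents should match $-3 \pm 2i$ and $-1 \pm 2i$ closely, and the fitted series extrapolates beyond the sampled window. With realistic noise, P = N is no longer trustworthy, which is why large P together with the rank rule is preferred above.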
Clearly, this will entail substantial computation (Ross and Dudley 1988), as does any robust method for estimating the exponential terms in J. Trouble arises from another source when P exceeds N, for then (7.74) has more than N roots. Some of these are spurious, being generated by the method, and are not part of J. Deciding which roots are true and which spurious is not easy. Assuming that P is large enough for the rank of A to be certain, we know how many of the roots are true; the rest can be said to be caused by the noise in the system. It is evident that any roots of (7.74) with $|z| \ge 1$ are spurious and must be rejected because they correspond to $s_n$ which do not have a negative real part. Next, increasing P by 1 or 2 should not affect the true roots much but is likely to modify the spurious ones. Therefore, the roots with $|z| < 1$ which move least as P varies are the potential true candidates. If there are more than k (the rank of A) of them, the only course open is to examine their contribution to J. Those with an $|a_n|$ less than one per cent of the largest $|a_n|$ occurring can be considered for rejection, selecting first those $s_n$ with the most negative real part. If all of these criteria fail to eliminate all spurious roots, the suspicion must be that P is not large enough to ascertain the rank of A correctly. To cut down the computational effort in the SVD-Prony method Younan


and Taylor (1991) suggest that the data should first be pre-processed to reduce the noise content by passing them through a low-pass filter. The basic idea is to form

$$J_1(q) = \sum_{m=0}^{M-1} J_m \exp(-2\pi i q m/M)$$

and then construct

$$J_2(n) = \Bigl[\tfrac{1}{2}\{J_1(0) + J_1(M)\} + \sum_{q=1}^{Q}\{J_1(q)\exp(2\pi i q n/M) + J_1(M-q)\exp(-2\pi i q n/M)\}\Bigr]\Big/M.$$
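The two formulas can be transcribed directly. The Python below is a minimal sketch, not Younan and Taylor's code: it uses a direct O(M²) sum instead of a fast Fourier transform, takes Q as an argument rather than applying the serial-correlation rule described next, and assumes Q < M/2 so that the retained bins q and M − q do not overlap. The function names and the test data (a slow cosine contaminated by a faster one) are invented for illustration.

```python
import cmath
import math


def dft(J):
    """J_1(q) = sum_m J_m exp(-2 pi i q m / M) for q = 0, ..., M - 1."""
    M = len(J)
    return [sum(J[m] * cmath.exp(-2j * math.pi * q * m / M) for m in range(M))
            for q in range(M)]


def lowpass(J, Q):
    """J_2(n): keep only the bins 0, 1..Q and M-Q..M-1 of the discrete Fourier transform."""
    M = len(J)
    if not 0 <= Q < M / 2:
        raise ValueError("need 0 <= Q < M/2")
    J1 = dft(J)
    J1.append(J1[0])  # J_1(M) = J_1(0) by periodicity
    out = []
    for n in range(M):
        acc = 0.5 * (J1[0] + J1[M])
        for q in range(1, Q + 1):
            acc += (J1[q] * cmath.exp(2j * math.pi * q * n / M)
                    + J1[M - q] * cmath.exp(-2j * math.pi * q * n / M))
        out.append((acc / M).real)  # real data give a real filtered signal
    return out


M = 32
clean = [math.cos(2 * math.pi * n / M) for n in range(M)]
noisy = [c + 0.1 * math.cos(2 * math.pi * 5 * n / M) for n, c in enumerate(clean)]
filtered = lowpass(noisy, Q=2)
print(max(abs(f - c) for f, c in zip(filtered, clean)))
```

Because the contaminating cosine sits in bins 5 and M − 5, choosing Q = 2 removes it entirely and the filtered samples coincide with the slow component to rounding error.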

To set a value on Q, let $e_n = J_n - J_2(n)$ and form the circular serial correlation $\sum_n e_n e_{n+Q}$, the indices being taken modulo M. The value of Q is steadily increased from 1; the value at which the serial correlation achieves its minimum is the chosen Q. Now that Q is known, data for the SVD-Prony method are generated from $J_2(n)$. According to Younan and Taylor, this process enables one to work with a much smaller value of M than is required without filtering. Moreover, the filtering does not cause any deterioration in the accuracy of the estimates for $s_n$ and $a_n$. Other robust methods for extracting $s_n$ and $a_n$ from current waveforms have been proposed by Goodman (1983), Rothwell (1987), Park and Cordaro (1988), and Hua and Sarkar (1989).

Exercises

16. A scalar current I(t) is observed to have the following values at the times shown:

t: 10, 20, 30, 40, 50, 60, 70, 80, 90
I(t): 6460, 6090, 5642, 5049, 4417, 3623, 2401, 983, 142

If I(t) is approximated by $A_1\exp(s_1 t) + A_2\exp(s_2 t) + A_3\exp(s_3 t)$, show that $\exp(s_1)$, $\exp(s_2)$ and $\exp(s_3)$ are the roots of

$$z^3 - 3.029z^2 + 3.788z - 1.687 = 0$$

in Prony's method.

17. A scalar current I(t) provides the data

t: 0, 8, 16, 24, 32, 40
I(t): 248, 345, 421, 481, 529, 569

If Prony's method with three terms is used, show that an approximation to I(t) is $629.4(1.001)^t - 381.4(0.9670)^t + 0.0005(1.236)^t$.

18. It is known that $J(t) = 2e^{-3t}\sin 2t + e^{-t}\cos 2t$. Construct data and assess the accuracy of Prony's method with P = N = M = 4.


[Figure: waveform to be sampled; horizontal axis Time (ns), 0 to 20.]

Fig. 7.6. Data to be sampled.

19. A scalar current is found to have the waveform displayed in Fig. 7.6. Take samples at intervals of ! ns and use Prony's method to locate 40 poles. Show that most of the poles are not very distant from the imaginary axis. Calculate the waveform of the exponential series over 0-100 ns and decide whether it is sufficiently accurate in the first 20 ns for one to have faith in the extrapolation.

20. Assume $J(t) = e^{st}$. Find (7.74) when P = 2 and M = 2. Is it clear which root is spurious? What is the effect of increasing P and M to 3?

21. Add some noise to the current in Exercise 18 and compare the predictions of the SVD-Prony method with and without filtering.

22. Calculate the impulse response of a perfectly conducting sphere by the methods of the last two sections and compare their computer requirements critically.

23. A centre-fed dipole of length 1 m and radius t m is excited by the Gaussian pulse $\exp\{-25\times10^{18}(t - 5.556\times10^{-10})^2\}$. Determine the induced current not far from the feed as in §7.3 during the first 30 ns. Extrapolate from there by Prony's method.


Find the input impedance in the frequency domain from $j = \sum_{n=1}^{N} a_n/(i\omega - s_n)$ and show that it gives acceptable values up to about 2 GHz.

24. A prolate spheroid with semi-axes a, b is illuminated along its axis of symmetry. Trace the movement of the poles as a/b varies.

25. A finite circular cylinder of length l and radius a is subject to a plane impulse from outside. The positions of the poles depend upon l, a and the angle of incidence. Plot their behaviour for various typical situations. By taking l very small, obtain the response of a circular disc.

26. Assuming that the theory applies to two coaxial circular cylinders, obtain the impulse response of a single vertical cylinder above a horizontal perfectly conducting ground plane.

8 GEOMETRIC THEORY OF DIFFRACTION

The advent of the computer has meant that the small number of analytical solutions to electromagnetic scattering problems can be supplemented by numerical investigation. Antecedent chapters have indicated ways in which answers can be fabricated numerically and have revealed that, despite the power of the computer, the accuracy falls off as the frequency increases. Another route must therefore be pioneered to handle the scattering from bodies which are large compared with the wavelength. The method will be one of approximate analytical technique, often conjoined with numerical methods as well. The classical high-frequency approach is that of geometrical optics, but it fails to account adequately for the behaviour in shadows and the influence of edges. To cope with these problems an extended version, known as the geometric theory of diffraction (GTD), has been devised (Keller 1957, 1962). It is the purpose of this chapter to describe the main features of the geometric theory of diffraction.

8.1 The high-frequency approximation

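It will be assumed that the fields vary harmonically in time according to the factor $\exp(i\omega t)$, where $\omega$ is real. Then Maxwell's equations take the form

$$\operatorname{curl}\mathbf{E} + i\omega\mu\mathbf{H} = 0, \qquad \operatorname{curl}\mathbf{H} - i\omega\varepsilon\mathbf{E} = 0 \qquad (8.1)$$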

where the real $\mu$ and $\varepsilon$ account for the permeability and permittivity of the medium. The refractive index N of the medium is defined by $N = (\mu\varepsilon/\mu_0\varepsilon_0)^{1/2}$, where $\mu_0$ and $\varepsilon_0$ refer to free space. The real quantity $\omega(\mu_0\varepsilon_0)^{1/2}$ will be denoted by $k_0$. If the medium is not isotropic, $\mu$ and $\varepsilon$ must be replaced in (8.1) by tensors. The resulting analysis then becomes much more complicated in detail without affecting the basic principles of the method. For simplicity, therefore, only the case of scalar $\mu$ and $\varepsilon$ will be treated. When the wavelength is so small that significant changes in the medium occur only over distances large compared with it, a reasonable assumption is that in local regions the field behaves as if it were in a homogeneous medium. Thus, locally, the wave may be expected to look like a plane wave. This suggests that $\mathbf{E}$ might have the form $\mathbf{E}_0\exp(-ik_0L)$. However, it is more likely that this offers only a first approximation. Therefore, we introduce the more recondite assumption or


Ansatz that

$$\mathbf{E} = \exp(-ik_0L)\left(\mathbf{E}_0 + \frac{\mathbf{E}_1}{k_0} + \frac{\mathbf{E}_2}{k_0^2} + \cdots\right), \qquad \mathbf{H} = \exp(-ik_0L)\left(\mathbf{H}_0 + \frac{\mathbf{H}_1}{k_0} + \frac{\mathbf{H}_2}{k_0^2} + \cdots\right) \qquad (8.2)$$

where $L, \mathbf{E}_0, \mathbf{E}_1, \dots, \mathbf{H}_0, \mathbf{H}_1, \dots$ may depend upon position but are independent of $k_0$. There is no a priori knowledge that the expansion (8.2) is valid, whether as a convergent or as an asymptotic series. Even in special circumstances it can be extremely difficult to prove the legitimacy of the expansion. Be that as it may, experience is overwhelmingly in favour of its introduction. It offers the possibility of progress when exact or numerical solutions are out of the question. Even when the expansion is confuted there may be others of a similar nature which will succeed, as will be seen. If (8.2) is substituted in (8.1) and it is agreed to take derivatives term by term, then

$$\sum_{m=0}^{\infty}\frac{\operatorname{curl}\mathbf{E}_m}{k_0^m} - ik_0\operatorname{grad}L\wedge\sum_{m=0}^{\infty}\frac{\mathbf{E}_m}{k_0^m} + i\omega\mu\sum_{m=0}^{\infty}\frac{\mathbf{H}_m}{k_0^m} = 0, \qquad (8.3)$$

$$\sum_{m=0}^{\infty}\frac{\operatorname{curl}\mathbf{H}_m}{k_0^m} - ik_0\operatorname{grad}L\wedge\sum_{m=0}^{\infty}\frac{\mathbf{H}_m}{k_0^m} - i\omega\varepsilon\sum_{m=0}^{\infty}\frac{\mathbf{E}_m}{k_0^m} = 0. \qquad (8.4)$$

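If $L$, $\mathbf{E}_m$, $\mathbf{H}_m$ and their derivatives are finite and vary significantly only over several wavelengths, the coefficients of separate powers of $k_0$ may be equated to zero in (8.3) and (8.4). There results

$$(\mu_0\varepsilon_0)^{1/2}\operatorname{grad}L\wedge\mathbf{E}_0 = \mu\mathbf{H}_0, \qquad (\mu_0\varepsilon_0)^{1/2}\operatorname{grad}L\wedge\mathbf{H}_0 = -\varepsilon\mathbf{E}_0, \qquad (8.5)$$

$$\operatorname{grad}L\wedge\mathbf{E}_m - \frac{\mu\mathbf{H}_m}{(\mu_0\varepsilon_0)^{1/2}} = -i\operatorname{curl}\mathbf{E}_{m-1}, \qquad (8.6)$$

$$\operatorname{grad}L\wedge\mathbf{H}_m + \frac{\varepsilon\mathbf{E}_m}{(\mu_0\varepsilon_0)^{1/2}} = -i\operatorname{curl}\mathbf{H}_{m-1} \qquad (8.7)$$

for m = 1, 2, .... From (8.5) it is evident that, so long as $\mu$ and $\varepsilon$ are non-zero,

$$\mathbf{E}_0\cdot\operatorname{grad}L = 0, \qquad \mathbf{H}_0\cdot\operatorname{grad}L = 0, \qquad \mathbf{H}_0\cdot\mathbf{E}_0 = 0. \qquad (8.8)$$

Furthermore, if $\mathbf{H}_0$ is eliminated from (8.5), $\{(\operatorname{grad}L)^2 - N^2\}\mathbf{E}_0 = 0$ when account is taken of (8.8). If $\mathbf{E}_0$ is not to be identically zero it is necessary that

$$(\operatorname{grad}L)^2 = N^2. \qquad (8.9)$$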
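As a quick numerical check of (8.9), the Python sketch below (illustrative only) takes the assumed eikonal $L = N|\mathbf{x} - \mathbf{x}_0|$ of a spherical wave radiating from $\mathbf{x}_0$ in a homogeneous medium and evaluates $(\operatorname{grad}L)^2$ by central differences; the residual $(\operatorname{grad}L)^2 - N^2$ should vanish apart from discretization and rounding error. The function $L = x^2 + y^2 + z^2$ fails the test, showing that (8.9) genuinely restricts which surfaces can serve as wavefronts. The value of N, the source point and the test point are arbitrary choices.

```python
import math


def grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the point x (a 3-sequence)."""
    g = []
    for i in range(3):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def eikonal_residual(L, N, x):
    """(grad L)^2 - N^2 at x; zero when L satisfies the eikonal equation (8.9)."""
    return sum(gi * gi for gi in grad(L, x)) - N ** 2


N = 1.5
x0 = (0.0, 0.0, -2.0)


def spherical(x):
    """Eikonal of spherical wavefronts centred on x0 (homogeneous medium assumed)."""
    return N * math.dist(x, x0)


def quadratic(x):
    """A function that is not an eikonal for this N."""
    return sum(xi * xi for xi in x)


point = (0.3, -0.4, 1.2)
print(eikonal_residual(spherical, N, point))
print(eikonal_residual(quadratic, N, point))
```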



Eqn (8.9) is a partial differential equation for L, the function which defines the surfaces of constant phase, i.e. the wavefronts. The function L is known as the eikonal and (8.9) is called the eikonal equation. From (8.8) it can be seen that $\mathbf{E}_0$ and $\mathbf{H}_0$ are transverse to grad L, i.e. transverse to the direction of propagation of the wavefront. Since $\mathbf{E}_0$ is also perpendicular to $\mathbf{H}_0$, the Poynting vector is parallel to grad L and so the direction of energy flow is normal to the wavefront. Consequently, the field due to $\mathbf{E}_0$, $\mathbf{H}_0$ has locally all the properties of a plane wave and may be expected to be a reasonable first approximation at high frequencies. It will, however, require close scrutiny at points where N becomes small or changes discontinuously and at points where $\mathbf{E}_0$ or $\mathbf{H}_0$ is predicted to be large. The curves which have at each point the direction of the energy flow in the field are known as rays. It has just been shown that the rays are normal to the wavefronts. In general media there are two characteristic velocities. One is the wave velocity, which is the rate of displacement of the wavefront in the direction normal to itself. The other is the energy or ray velocity. For the isotropic medium just considered the two velocities can be identified with one another. For anisotropic media the wave and ray velocities may not coincide; the distinction between the two is then vital and must not be forgotten. From (8.6)

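$$\operatorname{curl}\left(\frac{1}{\mu}\operatorname{grad}L\wedge\mathbf{E}_m\right) - \frac{\operatorname{curl}\mathbf{H}_m}{(\mu_0\varepsilon_0)^{1/2}} = -i\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl}\mathbf{E}_{m-1}\right)$$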

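while, from (8.7),

$$\operatorname{div}(\varepsilon\mathbf{E}_m) = -(\mu_0\varepsilon_0)^{1/2}\operatorname{div}(\operatorname{grad}L\wedge\mathbf{H}_m) = (\mu_0\varepsilon_0)^{1/2}\operatorname{grad}L\cdot\operatorname{curl}\mathbf{H}_m = i\varepsilon\operatorname{grad}L\cdot\mathbf{E}_{m+1}$$

when (8.7) with m replaced by m + 1 is noted. Hence

$$\mu\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl}\mathbf{E}_{m-1}\right) - \operatorname{grad}\left(\frac{1}{\varepsilon}\operatorname{div}\varepsilon\mathbf{E}_{m-1}\right) = i\operatorname{grad}L\operatorname{div}\mathbf{E}_m - \frac{i\mu\operatorname{curl}\mathbf{H}_m}{(\mu_0\varepsilon_0)^{1/2}} - i\operatorname{grad}L\wedge\operatorname{curl}\mathbf{E}_m - i\mu\mathbf{E}_m\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right) - 2i(\operatorname{grad}L\cdot\operatorname{grad})\mathbf{E}_m - \frac{i}{\mu}(\mathbf{E}_m\cdot\operatorname{grad}\mu)\operatorname{grad}L. \qquad (8.10)$$

Multiply (8.6) by $\operatorname{grad}L\,\wedge$ and add $\mu/(\mu_0\varepsilon_0)^{1/2}$ times (8.7) to obtain

$$\frac{1}{\varepsilon}(\operatorname{div}\varepsilon\mathbf{E}_{m-1})\operatorname{grad}L = \frac{\mu\operatorname{curl}\mathbf{H}_{m-1}}{(\mu_0\varepsilon_0)^{1/2}} + \operatorname{grad}L\wedge\operatorname{curl}\mathbf{E}_{m-1} \qquad (8.11)$$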



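by virtue of (8.9). Replace m − 1 by m in (8.11) and then substitute in (8.10). There results

$$2i(\operatorname{grad}L\cdot\operatorname{grad})\mathbf{E}_m + i\mu\mathbf{E}_m\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right) + \frac{i}{\mu\varepsilon}\{\mathbf{E}_m\cdot\operatorname{grad}(\mu\varepsilon)\}\operatorname{grad}L = \operatorname{grad}\left(\frac{1}{\varepsilon}\operatorname{div}\varepsilon\mathbf{E}_{m-1}\right) - \mu\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl}\mathbf{E}_{m-1}\right). \qquad (8.12)$$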

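Similarly, or by observing that the equations for $\mathbf{H}_m$ are the same as those for $\mathbf{E}_m$ with $\mu$ and $\varepsilon$ replaced by $-\varepsilon$ and $-\mu$ respectively, it may be shown that

$$2i(\operatorname{grad}L\cdot\operatorname{grad})\mathbf{H}_m + i\varepsilon\mathbf{H}_m\operatorname{div}\left(\frac{1}{\varepsilon}\operatorname{grad}L\right) + \frac{i}{\mu\varepsilon}\{\mathbf{H}_m\cdot\operatorname{grad}(\mu\varepsilon)\}\operatorname{grad}L = \operatorname{grad}\left(\frac{1}{\mu}\operatorname{div}\mu\mathbf{H}_{m-1}\right) - \varepsilon\operatorname{curl}\left(\frac{1}{\varepsilon}\operatorname{curl}\mathbf{H}_{m-1}\right). \qquad (8.13)$$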

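The only derivatives of $\mathbf{E}_m$ and $\mathbf{H}_m$ which occur in (8.12) and (8.13) are normal to the wavefront. Therefore (8.12) and (8.13) are ordinary differential equations for $\mathbf{E}_m$ or $\mathbf{H}_m$ along a ray. They are known as transport equations. When $\mathbf{E}_{m-1} = 0$ and $\mathbf{H}_{m-1} = 0$ the right-hand sides of (8.12) and (8.13) vanish. The scalar product of (8.12) with $\mathbf{E}_m$ then gives

$$\operatorname{grad}L\cdot\operatorname{grad}\mathbf{E}_m^2 + \mu\mathbf{E}_m^2\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right) = 0$$

since now $\mathbf{E}_m\cdot\operatorname{grad}L = 0$ from (8.7). The equation may be written as

$$\operatorname{div}\left(\frac{1}{\mu}\mathbf{E}_m^2\operatorname{grad}L\right) = 0. \qquad (8.14)$$

Similarly

$$\operatorname{div}\left(\frac{1}{\varepsilon}\mathbf{H}_m^2\operatorname{grad}L\right) = 0. \qquad (8.15)$$

In particular, it is always true that

$$\operatorname{div}\left(\frac{1}{\mu}\mathbf{E}_0^2\operatorname{grad}L\right) = 0, \qquad \operatorname{div}\left(\frac{1}{\varepsilon}\mathbf{H}_0^2\operatorname{grad}L\right) = 0. \qquad (8.16)$$

Exercises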

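1. In a homogeneous anisotropic medium $\mathbf{D} = \boldsymbol{\varepsilon}\cdot\mathbf{E}$, $\mathbf{B} = \boldsymbol{\mu}\cdot\mathbf{H}$, where $\boldsymbol{\varepsilon}$, $\boldsymbol{\mu}$ are real symmetric positive definite tensors. Assuming an approximation of the form $\mathbf{E} = \mathbf{E}_0\exp(-ik_0L)$, etc., show that $\mathbf{D}_0$ is perpendicular to $\mathbf{H}_0$ and grad L, while $\mathbf{B}_0$ is perpendicular to $\mathbf{E}_0$ and grad L. If $\mathbf{S} = \mathbf{E}_0\wedge\mathbf{H}_0$, define the energy velocity as $2\mathbf{S}/(\mathbf{E}_0\cdot\mathbf{D}_0 + \mathbf{H}_0\cdot\mathbf{B}_0)$. Show that the phase speed v of the wavefront is equal to the projection of the energy velocity on the wave normal.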



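If $\mu$ is a scalar and $\boldsymbol{\varepsilon}$ has elements $\varepsilon_1, \varepsilon_2, \varepsilon_3$ with respect to its principal axes, show that v satisfies Fresnel's equation

$$\frac{n_1^2}{v^2 - v_1^2} + \frac{n_2^2}{v^2 - v_2^2} + \frac{n_3^2}{v^2 - v_3^2} = 0$$

where $v_i^2 = 1/(\mu\varepsilon_i)$ and $n_1, n_2, n_3$ are the components of grad L along the principal axes. If $\mathbf{D}_0'$, $\mathbf{D}_0''$ correspond to two different solutions of Fresnel's equation, show that $\mathbf{D}_0'\cdot\mathbf{D}_0'' = 0$.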

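2. In Exercise 1 assume that $\mathbf{S}$ is parallel to neither $\boldsymbol{\varepsilon}\cdot\operatorname{grad}L$ nor $\boldsymbol{\mu}\cdot\operatorname{grad}L$. Prove that $\operatorname{div}(\mathbf{E}_0\wedge\mathbf{H}_0) = 0$.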

8.2 Geometrical optics

In an isotropic medium the approximation $\mathbf{E} = \mathbf{E}_0\exp(-ik_0L)$ is often known as geometrical optics, though the name is sometimes applied to the full expansion (8.2). Since the rays are normal to the wavefront, the tangent to a ray must be parallel to grad L. Hence a unit vector $\mathbf{s}$ along the tangent is

$$\mathbf{s} = \frac{1}{N}\operatorname{grad}L \qquad (8.17)$$

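when cognisance is taken of the eikonal equation (8.9). Let $\kappa$ be the curvature at the same point and $\boldsymbol{\nu}$ a unit vector in the direction of the radius of curvature. Then $\kappa\boldsymbol{\nu} = d\mathbf{s}/ds$, where s is the arc length measured along the ray. Now

$$\frac{d\mathbf{s}}{ds} = (\mathbf{s}\cdot\operatorname{grad})\mathbf{s} = -\mathbf{s}\wedge\operatorname{curl}\mathbf{s}$$

and so $\kappa = -\boldsymbol{\nu}\cdot\mathbf{s}\wedge\operatorname{curl}\mathbf{s}$. On using (8.17) and replacing $\operatorname{grad}(1/N)$ by $-(1/N)\operatorname{grad}\ln N$ we obtain

$$\kappa = \boldsymbol{\nu}\cdot\operatorname{grad}\ln N \qquad (8.18)$$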

since $\boldsymbol{\nu}$ is perpendicular to grad L. Equation (8.18) carries the information that the rate of change of N in the direction of the centre of curvature is positive. This means that a ray bends towards the region of higher refractive index N or, equivalently, towards a region with a lower speed of light. In a homogeneous medium N is independent of position and the right-hand side of (8.18) vanishes. Thus the curvature $\kappa$ is zero and the rays are straight lines. The behaviour of the magnitude of $\mathbf{E}_0$ can be found by considering what happens to a tube of rays. Let the tube of rays intersect the wavefronts $L_1$ and $L_2$ in the surface elements $dS_1$ and $dS_2$ respectively (Fig. 8.1). No power will flow across the sides of the tube because the energy moves along a ray; the



Fig. 8.1. Propagation along a ray tube.

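flow through any normal cross-section of the tube must therefore be constant. Hence

$$|\mathbf{E}_0\wedge\mathbf{H}_0|_1\,dS_1 = |\mathbf{E}_0\wedge\mathbf{H}_0|_2\,dS_2$$

or, from (8.5),

$$\left(\frac{\varepsilon}{\mu}\right)_1^{1/2}|\mathbf{E}_0|_1^2\,dS_1 = \left(\frac{\varepsilon}{\mu}\right)_2^{1/2}|\mathbf{E}_0|_2^2\,dS_2. \qquad (8.19)$$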

This is known as the intensity law of geometrical optics. It may also be obtained by integrating (8.16) over a ray tube and remembering the eikonal equation. Similarly, $(\mu/\varepsilon)^{1/2}\mathbf{H}_0$ satisfies the intensity law of geometrical optics. The intensity law permits the determination of the field amplitude at any

point on the ray when it is known at a single point. The form (8.19) is often valuable, though other representations are sometimes helpful (§8.3). If $\mu$ is independent of position, (8.19) can be expressed in terms of the refractive index N as

$$N_1|\mathbf{E}_0|_1^2\,dS_1 = N_2|\mathbf{E}_0|_2^2\,dS_2. \qquad (8.20)$$

When the medium is homogeneous an illuminating formula can be deduced from (8.19). Let the ray through the point A of the wavefront $L_1$ be the z axis. Take A as origin and choose the (x, z) and (y, z) planes to be the planes containing the principal radii of curvature $\rho_1$ and $\rho_2$ of the wavefront at A (Fig. 8.2). A ray through B, a point on the x axis adjacent to A, will intersect the z axis at $O_1$, and $O_1B = \rho_1$. Similarly, a ray through the adjacent point C on the y axis will intersect the z axis at $O_2$, where $O_2C = \rho_2$. Take a radius of curvature as positive when the centre of curvature is on the negative z axis, as in Fig. 8.2; otherwise it is negative. Continue the rays through A, B, and C until they meet the wavefront $L_2$ at A', B', and C' respectively. Often the ray AA', which passes through the middle of the small element of $L_1$, is known as an axial ray, whereas



Fig. 8.2. Propagation of energy in a homogeneous isotropic medium.

rays such as BB' through the periphery of the element are called paraxial rays. The paraxial rays form the boundary of a tube. The rays are normal to $L_2$ and so the normals at A' and B' intersect at $O_1$, i.e. the (x', z') plane contains a principal radius of curvature. This radius of curvature can be no other than $\rho_1 + s$, where s is the constant length of ray cut off by the wavefronts $L_1$ and $L_2$. Similarly, the (y', z') plane contains the other principal radius of curvature $\rho_2 + s$. Let $dS_1$ be the element of area surrounding A and let the paraxial rays produce the element of area $dS_2$ about A'. Corresponding points in the two areas are related by

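$$x' = \left|\frac{\rho_1 + s}{\rho_1}\right|x, \qquad y' = \left|\frac{\rho_2 + s}{\rho_2}\right|y.$$

Hence

$$dS_2 = \left|\frac{(\rho_1 + s)(\rho_2 + s)}{\rho_1\rho_2}\right|dS_1. \qquad (8.21)$$

Since $N_1 = N_2$, (8.20) gives

$$|\mathbf{E}_0|_2 = \left|\frac{\rho_1\rho_2}{(\rho_1 + s)(\rho_2 + s)}\right|^{1/2}|\mathbf{E}_0|_1 \qquad (8.22)$$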

in a homogeneous medium. When $\rho_1 = \rho_2$ the wavefront is spherical and we have the customary inverse square law of decrease of intensity with distance. It will be observed that (8.22) will give trouble when $\rho_1$ or $\rho_2$ is negative, i.e. geometrical optics becomes suspect when a principal centre of curvature of a wavefront is in the region into which energy is propagating. Then focusing effects can be anticipated (astigmatism will occur if $\rho_1 \ne \rho_2$) and some modification of the theory will be necessary. More generally, we can deduce from (8.21) the relation between $dS_1$ and $dS_2$ when the medium is inhomogeneous, for, when the two surfaces are close



together, we may expect (8.21) to continue to hold. In other words, on a small displacement ds the change in an element of area $dS_1$ is $\{1/\rho_1(s) + 1/\rho_2(s)\}\,dS_1\,ds$, where $\rho_1$ and $\rho_2$ are the principal radii of curvature at s. Hence, in general,

$$dS_2 = dS_1\exp\int_{s_1}^{s_2}\left\{\frac{1}{\rho_1(s)} + \frac{1}{\rho_2(s)}\right\}ds \qquad (8.23)$$

which expresses the expansion of the element of area as an integral of the mean curvature of the wavefront. Equation (8.23) can be inserted in (8.19) to provide the change in intensity from $s_1$ to $s_2$ as

$$|\mathbf{E}_0|_2^2 = \left(\frac{\mu_2\varepsilon_1}{\mu_1\varepsilon_2}\right)^{1/2}|\mathbf{E}_0|_1^2\exp\left[-\int_{s_1}^{s_2}\left\{\frac{1}{\rho_1(s)} + \frac{1}{\rho_2(s)}\right\}ds\right]. \qquad (8.24)$$
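In a homogeneous medium the principal radii along the ray are simply $\rho_1 + s$ and $\rho_2 + s$, so (8.24) must reproduce (8.22). The Python sketch below (illustrative only) compares the closed form with a Simpson's-rule evaluation of the exponent in (8.24) taking $\mu$ and $\varepsilon$ constant. It assumes both radii stay positive along the path, so that no caustic lies between $s_1$ and $s_2$; with a negative radius the integrand would pass through a singularity, which is the breakdown of geometrical optics noted after (8.22). The function names and numerical values are hypothetical.

```python
import math


def ratio_closed_form(rho1, rho2, s):
    """|E0|_2 / |E0|_1 from (8.22)."""
    return math.sqrt(abs(rho1 * rho2 / ((rho1 + s) * (rho2 + s))))


def ratio_from_curvature(rho1, rho2, s, n=200):
    """Same ratio from (8.24) with constant mu and eps; Simpson's rule with n (even) panels."""
    def curvature_sum(u):
        # 1/rho_1 + 1/rho_2 at distance u along the ray in a homogeneous medium
        return 1.0 / (rho1 + u) + 1.0 / (rho2 + u)

    h = s / n
    total = curvature_sum(0.0) + curvature_sum(s)
    total += sum((4 if k % 2 else 2) * curvature_sum(k * h) for k in range(1, n))
    integral = total * h / 3.0
    return math.exp(-0.5 * integral)  # square root of the exponential factor in (8.24)


print(ratio_closed_form(2.0, 5.0, 3.0), ratio_from_curvature(2.0, 5.0, 3.0))
print(ratio_closed_form(4.0, 4.0, 6.0))  # spherical wavefront: amplitude falls as 1/distance
```

For $\rho_1 = 2$, $\rho_2 = 5$, $s = 3$ both routes give $\sqrt{10/40} = 0.5$; for the spherical case the ratio $\rho/(\rho + s) = 0.4$ is the inverse-distance law for amplitude.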

8.3 The ray and transport equations

A standard method of generating solutions of a partial differential equation such as the eikonal equation (8.9) is to introduce the curves which satisfy the ordinary differential equations

$$\frac{dx}{d\sigma} = \frac{\partial L}{\partial x}, \qquad \frac{dy}{d\sigma} = \frac{\partial L}{\partial y}, \qquad \frac{dz}{d\sigma} = \frac{\partial L}{\partial z}. \qquad (8.25)$$

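The parameter $\sigma$ varies along a curve of solution and is connected to the arc length s by

$$\frac{ds}{d\sigma} = N \qquad (8.26)$$

on account of the eikonal equation. It is obvious from (8.25) that the curves are normal to the wavefronts and therefore we can identify them with rays. Yet the most convenient form is not furnished by (8.25) because the unknown L is involved. To eliminate L, observe that

$$\frac{d^2x}{d\sigma^2} = \frac{d}{d\sigma}\left(\frac{\partial L}{\partial x}\right) = \frac{\partial^2L}{\partial x^2}\frac{dx}{d\sigma} + \frac{\partial^2L}{\partial x\,\partial y}\frac{dy}{d\sigma} + \frac{\partial^2L}{\partial x\,\partial z}\frac{dz}{d\sigma} = \frac{1}{2}\frac{\partial}{\partial x}(\operatorname{grad}L)^2$$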

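from (8.25). We infer from the eikonal equation (8.9) that

$$\frac{d^2x}{d\sigma^2} = N\frac{\partial N}{\partial x}, \qquad \frac{d^2y}{d\sigma^2} = N\frac{\partial N}{\partial y}, \qquad \frac{d^2z}{d\sigma^2} = N\frac{\partial N}{\partial z}. \qquad (8.27)$$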

These are differential equations for the rays which can be tracked directly and, in particular, by numerical methods. They can be solved when suitable initial data are supplied for some value of $\sigma$ (see also §8.24). The differential equations for the rays can be expressed in terms of the arc



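length via (8.26). They become

$$\frac{d}{ds}\left(N\frac{dx}{ds}\right) = \frac{\partial N}{\partial x}, \qquad \frac{d}{ds}\left(N\frac{dy}{ds}\right) = \frac{\partial N}{\partial y}, \qquad \frac{d}{ds}\left(N\frac{dz}{ds}\right) = \frac{\partial N}{\partial z} \qquad (8.28)$$

subject to

$$\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2 = 1. \qquad (8.29)$$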

Once the rays have been found, the eikonal L can be determined by remarking that its change between two points on a ray is $\int N\,ds$. The magnitude of $\mathbf{E}_0$ can be found from the intensity law and so the geometrical optics field is known completely as soon as the direction of $\mathbf{E}_0$ is discovered. To unravel the change in polarization along a ray, a return is made to the transport equation (8.12), which now reduces to

$$2(\operatorname{grad}L\cdot\operatorname{grad})\mathbf{E}_0 + \mu\mathbf{E}_0\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right) + \frac{1}{\mu\varepsilon}\{\mathbf{E}_0\cdot\operatorname{grad}(\mu\varepsilon)\}\operatorname{grad}L = 0. \qquad (8.30)$$

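Introduce the vector $\mathbf{P} = \mathbf{E}_0/|\mathbf{E}_0|$; then

$$\frac{d\mathbf{P}}{d\sigma} = \frac{1}{|\mathbf{E}_0|}\frac{d\mathbf{E}_0}{d\sigma} - \frac{\mathbf{E}_0}{|\mathbf{E}_0|^3}\,\mathbf{E}_0\cdot\frac{d\mathbf{E}_0}{d\sigma}.$$

But, from (8.30), (8.25), and (8.8),

$$\mathbf{E}_0\cdot\frac{d\mathbf{E}_0}{d\sigma} = -\frac{1}{2}\mu|\mathbf{E}_0|^2\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right).$$

Hence (8.30) gives

$$\frac{d\mathbf{P}}{d\sigma} + \frac{1}{N}(\mathbf{P}\cdot\operatorname{grad}N)\operatorname{grad}L = 0. \qquad (8.31)$$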

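It may be verified readily that $\mathbf{H}_0/|\mathbf{H}_0|$ satisfies (8.31) also. One consequence of (8.31) is that $\mathbf{P}\cdot d\mathbf{P}/d\sigma = 0$, so that $|\mathbf{P}|^2$ is independent of $\sigma$, which is consistent with $\mathbf{P}$ being a unit vector. Let $\mathbf{b}$ be a unit vector in the direction of the binormal to the ray. Then $\mathbf{b} = \mathbf{s}\wedge\boldsymbol{\nu}$ and the Serret-Frenet formulae are

$$\frac{d\mathbf{s}}{ds} = \kappa\boldsymbol{\nu}, \qquad \frac{d\boldsymbol{\nu}}{ds} = -\kappa\mathbf{s} + \tau\mathbf{b}, \qquad \frac{d\mathbf{b}}{ds} = -\tau\boldsymbol{\nu} \qquad (8.32)$$

where $\tau$ is the torsion of the ray. Then, from (8.28),

$$\operatorname{grad}N = \frac{d(N\mathbf{s})}{ds} = N\kappa\boldsymbol{\nu} + \mathbf{s}\frac{dN}{ds}.$$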


Since $\mathbf{P}\cdot\mathbf{s} = 0$, it follows from (8.26) that (8.31) can be written as

$$\frac{d\mathbf{P}}{ds} + \kappa(\mathbf{P}\cdot\boldsymbol{\nu})\mathbf{s} = 0.$$

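Because $\mathbf{P}$ is perpendicular to $\mathbf{s}$, it is a linear combination of $\boldsymbol{\nu}$ and $\mathbf{b}$, say $\mathbf{P} = \alpha\boldsymbol{\nu} + \beta\mathbf{b}$. Further $\alpha^2 + \beta^2 = 1$ because $\mathbf{P}$ is a unit vector. Hence

$$\alpha\frac{d\boldsymbol{\nu}}{ds} + \boldsymbol{\nu}\frac{d\alpha}{ds} + \beta\frac{d\mathbf{b}}{ds} + \mathbf{b}\frac{d\beta}{ds} + \kappa\alpha\mathbf{s} = 0$$

or

$$\boldsymbol{\nu}\left(\frac{d\alpha}{ds} - \tau\beta\right) + \mathbf{b}\left(\frac{d\beta}{ds} + \tau\alpha\right) = 0$$

on account of (8.32). Since $\mathbf{b}$ and $\boldsymbol{\nu}$ are perpendicular, their coefficients must vanish. So, if $\alpha = \cos\theta$ and $\beta = \sin\theta$,

$$\frac{d}{ds}\exp(i\theta) + i\tau\exp(i\theta) = 0$$

whence

$$\exp(i\theta) = \exp(i\theta_1)\exp\left(-i\int_{s_1}^{s}\tau\,ds\right)$$

where $\theta_1$ is the value of $\theta$ at $s_1$. Hence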

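$$\mathbf{E}_0 = |\mathbf{E}_0|\left\{\boldsymbol{\nu}\cos\left(\theta_1 - \int_{s_1}^{s}\tau\,ds\right) + \mathbf{b}\sin\left(\theta_1 - \int_{s_1}^{s}\tau\,ds\right)\right\}. \qquad (8.33)$$

From (8.5)

$$\mathbf{H}_0 = |\mathbf{H}_0|\left\{\mathbf{b}\cos\left(\theta_1 - \int_{s_1}^{s}\tau\,ds\right) - \boldsymbol{\nu}\sin\left(\theta_1 - \int_{s_1}^{s}\tau\,ds\right)\right\}. \qquad (8.34)$$

Equations (8.33) and (8.34) show how the polarization varies as one moves along a ray. The vectors $\mathbf{E}_0$, $\mathbf{H}_0$ start in the plane of $\boldsymbol{\nu}$ and $\mathbf{b}$, with $\mathbf{E}_0$ making an angle $\theta_1$ with $\boldsymbol{\nu}$, and then rotate through the angle $-\int\tau\,ds$ with respect to $\boldsymbol{\nu}$ as $\mathbf{E}_0$, $\mathbf{H}_0$, $\boldsymbol{\nu}$ and $\mathbf{b}$ travel down the ray (Fig. 8.3). If the ray is a plane curve the torsion $\tau$ disappears. The vector $\mathbf{E}_0$ then stays in the plane of $\boldsymbol{\nu}$ and $\mathbf{b}$, keeping at a constant angle to $\boldsymbol{\nu}$. In particular, if the medium is homogeneous the vector $\mathbf{E}_0$ always lies in a fixed plane containing the ray. Once $\mathbf{E}_0$ is known we may contemplate the possibility of determining higher-order approximations from the transport equation (8.12). Write (8.12) as

$$2(\operatorname{grad}L\cdot\operatorname{grad})\mathbf{E}_m + \mu\mathbf{E}_m\operatorname{div}\left(\frac{1}{\mu}\operatorname{grad}L\right) + \frac{1}{\mu\varepsilon}\{\mathbf{E}_m\cdot\operatorname{grad}(\mu\varepsilon)\}\operatorname{grad}L = \mathbf{C}_m$$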



Fig. 8.3. Rotation of polarization along a ray.

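where

$$\mathbf{C}_m = i\mu\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl}\mathbf{E}_{m-1}\right) - i\operatorname{grad}\left(\frac{1}{\varepsilon}\operatorname{div}\varepsilon\mathbf{E}_{m-1}\right)$$

is supposed to be known. Let $\mathbf{P}_m = \mathbf{E}_m/|\mathbf{E}_0|$. Then the following equation is obtained by the same means as (8.31) was derived: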

$$\frac{d\mathbf{P}_m}{ds} + \frac{(\mathbf{P}_m\cdot\operatorname{grad}N)\mathbf{s}}{N} = \frac{\mathbf{C}_m}{2N|\mathbf{E}_0|}. \qquad (8.35)$$

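A scalar product with $\mathbf{s}$ yields

$$\frac{d(N\mathbf{s}\cdot\mathbf{P}_m)}{ds} = \frac{\mathbf{C}_m\cdot\mathbf{s}}{2|\mathbf{E}_0|}.$$

Hence ($m \ge 1$)

$$\bigl[N\mathbf{s}\cdot\mathbf{P}_m\bigr]_{s_1}^{s_2} = \int_{s_1}^{s_2}\frac{\mathbf{C}_m\cdot\mathbf{s}}{2|\mathbf{E}_0|}\,ds \qquad (8.36)$$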

which supplies the change in the component of $\mathbf{P}_m$ normal to the wavefront in going along a ray from $s_1$ to $s_2$, since $\mathbf{C}_m$ and $\mathbf{E}_0$ are supposed to be known. If (8.36) is supplemented by an equation for the components of $\mathbf{P}_m$ tangential to the wavefront then $\mathbf{P}_m$ has been found. Take a scalar product of (8.35) with $\mathbf{P}$. Then

$$\mathbf{P}\cdot\frac{d\mathbf{P}_m}{ds} = \frac{\mathbf{P}\cdot\mathbf{C}_m}{2N|\mathbf{E}_0|}.$$



On the other hand, a scalar product of (8.31) with P; leads to

By addition d(P .Pm) ds

= P .em

_

"P. vPm.s

2NIE oi

and so

$$\big[\mathbf{P}\cdot\mathbf{P}_m\big]_{S_1}^{S_2}=\int_{S_1}^{S_2}\left\{\frac{\mathbf{P}\cdot\mathbf{e}_m}{2N|\mathbf{E}_0|}-\frac{(\mathbf{P}\cdot\operatorname{grad}N)(\mathbf{P}_m\cdot\mathbf{s})}{N}\right\}ds.\tag{8.37}$$

The integrand is known because $\mathbf{P}$ has already been found, while $\mathbf{P}_m\cdot\mathbf{s}$ is given by (8.36). Accordingly, the tangential component of $\mathbf{P}_m$ parallel to $\mathbf{E}_0$ has been determined. In (8.37), $\mathbf{P}$ can be replaced by $\mathbf{H}_0/|\mathbf{H}_0|$ without altering its validity, and so all components of $\mathbf{P}_m$ are now available. A similar investigation of (8.13) provides the components of $\mathbf{H}_m$.

It should be remarked that the determination of $\mathbf{e}_m$ may not be trivial in practice. The procedure described supplies the field at points of rays, i.e. in ray coordinates. The vector operators in $\mathbf{e}_m$ therefore have to be expressed in terms of these coordinates, or the ray coordinates have to be inverted to Cartesian form. In either case the formulae may be complicated, although there is no difficulty of principle.

There is another way of writing the differential equations for a ray which can be revealing. In vector notation (8.25) is abbreviated to

$$\frac{d\mathbf{x}}{d\sigma}=\operatorname{grad}L.\tag{8.38}$$

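Also

$$\frac{dL}{d\sigma}=N^2,\tag{8.39}$$

$$\frac{d(\operatorname{grad}L)}{d\sigma}=N\operatorname{grad}N\tag{8.40}$$

as in the derivation of (8.27). Equations (8.38)-(8.40) constitute an autonomous system of first-order differential equations for $\mathbf{x}$, $\operatorname{grad}L$, and $L$. For some purposes the fact that they are first order may make them preferable to the second-order system of (8.27). Let $\mathbf{k}_0=k_0\operatorname{grad}L$. Then (8.38) and (8.40) can be expressed as

$$\frac{d\mathbf{x}}{d\sigma}=\frac{\mathbf{k}_0}{k_0},\qquad\frac{d\mathbf{k}_0}{d\sigma}=k_0N\operatorname{grad}N.$$

Introduce the speed $c=1/(\mu\varepsilon)^{1/2}$ and put $ds/dt=c$, the symbol $t$ indicating that it has the dimensions of time. Since $N$ is inversely proportional to $c$,

$$\frac{d\mathbf{x}}{dt}=\frac{c\,\mathbf{k}_0}{k_0N},\qquad\frac{d\mathbf{k}_0}{dt}=-k_0N\operatorname{grad}c.\tag{8.41}$$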
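Because (8.38)-(8.40) are first order, they lend themselves directly to numerical ray tracing. The Python sketch below is a minimal illustration, not the book's own procedure. It assumes a hypothetical stratified profile $N(z)=1+0.1z$ and works in the $(x,z)$ plane. It advances $(x,z,\partial L/\partial x,\partial L/\partial z,L)$ in $\sigma$ with a classical fourth-order Runge-Kutta step. All helper names (`refractive_index`, `trace_ray`, and so on) are invented for the example.

```python
import math

# Assumed stratified profile, for illustration only: N(z) = 1 + A z.
A = 0.1


def refractive_index(z):
    return 1.0 + A * z


def grad_refractive_index(x, z):
    # N depends on z alone, so dN/dx = 0 and dN/dz = A.
    return 0.0, A


def rhs(state):
    """Right-hand side of (8.38)-(8.40) restricted to the (x, z) plane.

    state = (x, z, px, pz, L) with (px, pz) = grad L.
    """
    x, z, px, pz, _ = state
    n = refractive_index(z)
    gx, gz = grad_refractive_index(x, z)
    return (px, pz,          # dx/dsigma = grad L            (8.38)
            n * gx, n * gz,  # d(grad L)/dsigma = N grad N   (8.40)
            n * n)           # dL/dsigma = N**2              (8.39)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of length h in sigma."""
    def shifted(s, k, factor):
        return tuple(si + factor * ki for si, ki in zip(s, k))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * h))
    k3 = rhs(shifted(state, k2, 0.5 * h))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trace_ray(theta0, sigma_end, steps=2000):
    """Launch a ray from the origin at angle theta0 to the z axis."""
    n0 = refractive_index(0.0)
    state = (0.0, 0.0, n0 * math.sin(theta0), n0 * math.cos(theta0), 0.0)
    h = sigma_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


x_end, z_end, px_end, pz_end, L_end = trace_ray(0.5, 5.0)
print(px_end, px_end**2 + pz_end**2 - refractive_index(z_end)**2)
```

Because $N$ depends on $z$ only, $\partial L/\partial x=N\sin\theta$ is carried along unchanged. The residual $|\operatorname{grad}L|^2-N^2$ gives a convenient accuracy check on the integration.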

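Before leaving the subject we derive an alternative, but equivalent, form for (8.24). Suppose that $\mathbf{x}$, $L$, and $\operatorname{grad}L$ are given on some surface $S$. On the surface they will be functions of two parameters, say $\sigma_2$ and $\sigma_3$. Off $S$ their values are determined by the integration of (8.38)-(8.40). They will be functions of $\sigma$, $\sigma_2$, and $\sigma_3$, with $\sigma_2$ and $\sigma_3$ constant on any particular ray. Thus $\mathbf{x}$ can be regarded as a function of $\sigma$, $\sigma_2$, and $\sigma_3$. To assist temporarily with the notation, replace $\sigma$ by $\sigma_1$ and let $\boldsymbol{\sigma}$ be the vector with components $\sigma_1$, $\sigma_2$, and $\sigma_3$. Form the Jacobian

$$J=\frac{\partial(\mathbf{x})}{\partial(\boldsymbol{\sigma})}=\frac{\partial(x_1,x_2,x_3)}{\partial(\sigma_1,\sigma_2,\sigma_3)}.$$

Then, using (8.38),

$$\frac{\partial J}{\partial\sigma_1}=\sum_{j=1}^3\sum_{i=1}^3\operatorname{cof}\left(\frac{\partial x_i}{\partial\sigma_j}\right)\frac{\partial}{\partial\sigma_j}\left(\frac{\partial L}{\partial x_i}\right)$$

where $\operatorname{cof}a_{ij}$ stands for the co-factor of $a_{ij}$ in $J$. Now $\operatorname{cof}(\partial x_i/\partial\sigma_j)=J\,\partial\sigma_j/\partial x_i$, and so

$$\frac{\partial J}{\partial\sigma_1}=J\sum_{i=1}^3\sum_{j=1}^3\frac{\partial\sigma_j}{\partial x_i}\frac{\partial}{\partial\sigma_j}\left(\frac{\partial L}{\partial x_i}\right)=J\sum_{i=1}^3\frac{\partial^2L}{\partial x_i^2}=J\nabla^2L.\tag{8.42}$$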
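The identity (8.42) can be checked numerically in the simplest case: a point source at the origin of a homogeneous medium with $N=1$. There $L=r$, $\nabla^2L=2/r$, and (8.38) makes the rays $\mathbf{x}=\sigma_1(\sin\sigma_2\cos\sigma_3,\sin\sigma_2\sin\sigma_3,\cos\sigma_2)$ when $\sigma_2,\sigma_3$ are taken as the polar angles of the launch direction. The example and this parameterization are assumptions chosen for illustration. The sketch forms $J$ by central differences and compares a finite-difference $\partial J/\partial\sigma_1$ with $J\nabla^2L$.

```python
import math


def ray_point(s1, s2, s3):
    """Point on a ray for an assumed point source at the origin of a
    homogeneous medium with N = 1: x = s1 * (unit vector with polar angles
    s2, s3), so that dx/ds1 = grad L with L = r, as (8.38) requires."""
    return (s1 * math.sin(s2) * math.cos(s3),
            s1 * math.sin(s2) * math.sin(s3),
            s1 * math.cos(s2))


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def jacobian(s, h=1e-5):
    """J = d(x1, x2, x3)/d(s1, s2, s3) by central differences."""
    cols = []
    for j in range(3):
        sp, sm = list(s), list(s)
        sp[j] += h
        sm[j] -= h
        xp, xm = ray_point(*sp), ray_point(*sm)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(xp, xm)])
    m = [[cols[j][i] for j in range(3)] for i in range(3)]  # m[i][j] = dx_i/ds_j
    return det3(m)


def check_842(s, h=1e-3):
    """Return (dJ/ds1 by finite differences, J * laplacian of L) at s."""
    s1, s2, s3 = s
    dj = (jacobian((s1 + h, s2, s3)) - jacobian((s1 - h, s2, s3))) / (2.0 * h)
    laplacian_L = 2.0 / s1  # laplacian of L = r is 2/r, and r = s1 on the ray
    return dj, jacobian(s) * laplacian_L


lhs, rhs = check_842((2.0, 0.7, 0.3))
print(lhs, rhs)  # agree to finite-difference accuracy
```

Here $J=\sigma_1^2\sin\sigma_2$, so the check also exercises the co-factor argument leading to (8.42) on a case where every quantity is known in closed form.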

Reverting to our original notation and substituting for $\nabla^2L$ in (8.16), we obtain

$$\frac{d}{d\sigma}\left(\frac{J\mathbf{E}_0^2}{\mu}\right)=0,\tag{8.43}$$

i.e. $J\mathbf{E}_0^2/\mu$ is independent of $\sigma$ on a ray. Changes of $J$ account for the variation of the cross-section of a tube of rays during motion along the tube. Instead of calculating the integral of mean curvature as in (8.24), we now have to evaluate the Jacobian determinant of the relations between ray and Cartesian coordinates (see also §8.24 and R. M. Jones 1968).

8.4 The stratified medium

A medium in which $\mu$ and $\varepsilon$ are constant on any one of a family of parallel planes is said to be stratified. The speed of propagation and refractive index will then vary only in a direction perpendicular to these planes.

Let the family of planes be parallel to $z=0$. There is no loss of generality in taking the $z$ axis vertically upwards. Then $N$ is a function of $z$ only. Only rays in the $(x,z)$ plane need to be attended to, because those in other planes through the $z$ axis can be obtained by rotation about the $z$ axis. The ray equations (8.28) reduce to

$$\frac{d}{ds}\left(N\frac{dx}{ds}\right)=0,\qquad\frac{d}{ds}\left(N\frac{dz}{ds}\right)=\frac{dN}{dz}.$$

Hence

$$\frac{dx}{ds}=\frac{C}{N}\tag{8.44}$$

where $C$ is a constant. From (8.29)

$$\frac{dz}{ds}=\pm\left(1-\frac{C^2}{N^2}\right)^{1/2}.\tag{8.45}$$

Assume that the ray starts from the origin and that $N(0)$ is positive. Measure $s$ positively along the ray from the origin. Then $C$ will be positive on rays which are propagating to the right. The upper sign in (8.45) gives a ray travelling upwards, whereas the lower sign furnishes a ray going downwards. To fix ideas, let us concentrate on the upward ray. Then, after division of (8.45) by (8.44),

$$\frac{dz}{dx}=\left(\frac{N^2}{C^2}-1\right)^{1/2}.\tag{8.46}$$

Consequently, the choice of $C$ decides the slope of the ray as it leaves the origin. Clearly, the ray will be real only if $N^2>C^2$; it will be assumed that this condition holds. Integration of (8.46) discloses the equation of a ray through the origin as

$$x=\int_0^z\left\{\frac{N^2(w)}{C^2}-1\right\}^{-1/2}dw.\tag{8.47}$$

Explicit evaluation of the integral is rarely possible, and numerical integration will have to be resorted to. Fortunately, some properties can be inferred without knowledge of the explicit value of the integral. If the tangent to the ray makes an angle $\theta$ with the $z$ axis, $dx/ds=\sin\theta$. Hence (8.44) implies that

$$N(z)\sin\theta=C=N(0)\sin\theta_0\tag{8.48}$$

where $\theta_0$ is the value of $\theta$ at the origin. In a region where $N$ increases with $z$, $\theta$ decreases and the ray bends towards the vertical. In contrast, if $N$ decreases with $z$, i.e. the phase speed increases with height, the ray bends away from the vertical. There then arises the possibility that the ray is bent so far round that it starts to come down. In essence, the ray is being reflected and returned to its original level. Such reflection can take place only if at some height $h$ the ray is horizontal, i.e. $\theta=\tfrac12\pi$. Thus the phenomenon of reflection requires the existence of $N(h)$ such that

$$N(h)=N(0)\sin\theta_0.\tag{8.49}$$

No real angle of launching is allowed unless $N(h)<N(0)$, i.e. the refractive index at altitude $h$ is less than that at launching. Suppose that, in fact, the refractive index decreases to a minimum at $z=m$ and then increases. Let $\theta_m$ be the angle such that

$$\sin\theta_m=\frac{N(m)}{N(0)}.$$

If $\theta_0<\theta_m$, the condition (8.49) can never be met for any value of $h$. A ray launched at such an angle will always travel upwards and steadily depart from the level $z=0$ (Fig. 8.4).

Fig. 8.4. Rays when N has a minimum at z = m.

If $\theta_0>\theta_m$ the ray will be reflected and turn back at the height $h$ given by (8.49). The horizontal distance $X_h$ to the point of turning is, from (8.47) and (8.48), supplied by

$$X_h=N(0)\sin\theta_0\int_0^h\{N^2(w)-N^2(0)\sin^2\theta_0\}^{-1/2}\,dw.$$
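The integral for $X_h$ has an inverse-square-root singularity at the turning height $w=h$ whenever $dN/dw\neq0$ there, so a direct quadrature converges slowly. The Python sketch below assumes that $N$ decreases monotonically on the interval searched. It locates $h$ from (8.49) by bisection and removes the singularity with the substitution $w=h-u^2$. The test profile $N^2(w)=1-\alpha w$ with $N(0)=1$ is chosen only because (8.47)-(8.49) then give $X_h=\sin2\theta_0/\alpha$ in closed form. The function names are hypothetical.

```python
import math


def turning_height(n_func, c, w_max, iterations=200):
    """Bisection for the height h with N(h) = C, cf. (8.49).

    Assumes N decreases monotonically on [0, w_max] with N(0) > C >= N(w_max).
    A fixed iteration count avoids stalling once the bracket is one ulp wide."""
    lo, hi = 0.0, w_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if n_func(mid) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def turning_distance(n_func, theta0, w_max, panels=2000):
    """X_h = C * integral_0^h {N^2(w) - C^2}^(-1/2) dw with C = N(0) sin(theta0).

    The substitution w = h - u**2 (dw = -2u du) makes the integrand finite at
    the turning point; the composite midpoint rule never samples u = 0."""
    c = n_func(0.0) * math.sin(theta0)
    h = turning_height(n_func, c, w_max)
    du = math.sqrt(h) / panels
    total = 0.0
    for k in range(panels):
        u = (k + 0.5) * du
        total += 2.0 * u / math.sqrt(n_func(h - u * u) ** 2 - c * c)
    return c * total * du, h


# Assumed test profile: N^2(w) = 1 - alpha * w, so that X_h = sin(2 theta0)/alpha.
alpha = 0.01


def n_profile(w):
    return math.sqrt(1.0 - alpha * w)


x_h, h = turning_distance(n_profile, 0.6, w_max=1.0 / alpha)
print(x_h, math.sin(1.2) / alpha)
```

For a profile with a minimum at $w=m$ the same routine diverges as $\theta_0\to\theta_m$, because the integrand then behaves like $(w-m)^{-1}$ rather than $(h-w)^{-1/2}$.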

In the critical case $\theta_0=\theta_m$ we have $h=m$. Now, for $w$ near $m$, $N^2(w)-N^2(m)=O\{(w-m)^2\}$ because of the minimum of $N(w)$ at $w=m$. The integrand in $X_h$ thus becomes non-integrable, and $X_h\to\infty$ as $\theta_0\to\theta_m$. Therefore the limiting ray $\theta_0=\theta_m$ approaches the altitude $z=m$ asymptotically. As $\theta_0$ increases from zero there is no reflection until $\theta_0=\theta_m$, at which value $X_h$ is infinite. Further increase of $\theta_0$ must force $X_h$ to decrease. Yet eventually $X_h$ must go back to infinity, because that is what happens when $\theta_0=\tfrac12\pi$. Hence $X_h$ has a minimum $X_m$ for some value of $\theta_0$. Thus, for $X_m<X_h<\infty$, there are at least two rays with different angles of launching which turn down at this value of $X_h$. The equation for the reflected ray is not delivered by (8.47) when $x>X_h$, since in this range the lower sign must be adopted in (8.45) to make the ray migrate downwards. The appropriate formula is

$$x=X_h-N(h)\int_h^z\{N^2(w)-N^2(h)\}^{-1/2}\,dw\tag{8.50}$$

when $x\ge X_h$. The eikonal or phase variation on a ray is estimated by

$$L=\int N\,ds,$$

the integration being along a ray. For an upgoing ray

$$L=\int_0^z\frac{N^2(w)}{\{N^2(w)-C^2\}^{1/2}}\,dw=Cx+\int_0^z\{N^2(w)-C^2\}^{1/2}\,dw\tag{8.51}$$

when $L=0$ at the origin. For $\theta_0>\theta_m$, (8.51) is still correct while the ray is journeying upwards but, after reflection, must be replaced by

$$L=N(h)x+\int_0^h\{N^2(w)-N^2(h)\}^{1/2}\,dw-\int_h^z\{N^2(w)-N^2(h)\}^{1/2}\,dw\tag{8.52}$$

where now $x$ is given by (8.50). The rays under consideration are plane curves and so, according to the theory of the preceding section, the polarization of the geometrical optics field is invariant. It therefore remains to evaluate the amplitude of the field. Formulae (8.19), (8.24), and (8.43) are available. For the sake of illustration we shall employ (8.43), but we must take care to acknowledge its three-dimensional character. Let $r,\psi,z$ be cylindrical polar coordinates. The cylindrical symmetry of the propagation guarantees that the equation of a ray in

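$z_2$,

$$\operatorname{Im}\left(w\frac{dw^*}{dz}\right)\sim\operatorname{Im}\left(q^{-1/4}T\,q^{-1/4}T^*\,ik_0q^{1/2}\right)=TT^*k_0$$

as $z\to\infty$. Similarly, from $z<z_1$, we find $\operatorname{Im}(w\,dw^*/dz)\to(1-RR^*)k_0$ as $z\to-\infty$. Since $\operatorname{Im}(w\,dw^*/dz)$ is independent of $z$, these must be the same, and we have conservation of energy expressed by

$$RR^*+TT^*=1.\tag{8.91}$$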

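It must be emphasized that (8.91) is relevant to the exact solution. Suppose we assumed the approximations $w=q^{-1/4}(e^{-i\xi}+\tilde Re^{i\xi})$ for $z<z_1$ and $w=q^{-1/4}\tilde Te^{-i\xi}$ for $z>z_2$, with $\xi$ denoting the phase integral of $k_0q^{1/2}$. Substituting these in $\operatorname{Im}(w\,dw^*/dz)=\text{constant}$, we would obtain

$$\tilde R\tilde R^*+\tilde T\tilde T^*=1.\tag{8.92}$$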

However, if $\tilde T$ is small it may be swamped by the errors in $\tilde R$, so that all that could properly be deduced from (8.92) is that $|\tilde R|$ is approximately 1. To obtain a better result we must determine the exact $R$ or $T$. The difficulty of relating $R$ and $T$ originates from their being connected by what supervenes between $z_1$ and $z_2$. In this interval a solution which is negligible at $z_1$ may be dominant at $z_2$, so that straightforward matching without appreciable error is hard to achieve. The way around the snag is to split the real axis into two parts, each of which contains only a solitary zero, and to use solutions which can pass through one zero (Jones 1964; Olver 1974). Let $z_0$ be a fixed point between $z_1$ and $z_2$. Then two uniformly valid asymptotic solutions of (8.88) on $[z_0,\infty)$ are

$$w_1=\pi^{1/2}k_0^{1/6}\left(\frac{\zeta}{q}\right)^{1/4}\operatorname{Bi}(-k_0^{2/3}\zeta),\qquad w_2=\pi^{1/2}k_0^{1/6}\left(\frac{\zeta}{q}\right)^{1/4}\operatorname{Ai}(-k_0^{2/3}\zeta)$$

where Ai and Bi are the standard Airy functions and

$$\zeta=\pm\left|\tfrac32\int_{z_2}^{z}q^{1/2}(t)\,dt\right|^{2/3},$$

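the upper or lower sign being taken as $z>z_2$ or $z<z_2$. Let $z\to\infty$, so that $\zeta\to\infty$. The Airy functions may be replaced by their asymptotic expansions for large negative argument, with the result

$$w_1\sim-q^{-1/4}\sin\left\{k_0\int_{z_2}^{z}q^{1/2}(t)\,dt-\tfrac14\pi\right\},\qquad w_2\sim q^{-1/4}\cos\left\{k_0\int_{z_2}^{z}q^{1/2}(t)\,dt-\tfrac14\pi\right\}.$$

Comparison with the behaviour of the transmitted wave as $z\to\infty$ reveals that

$$q^{-1/4}\exp\left\{-ik_0\int_{z_2}^{z}q^{1/2}(t)\,dt\right\}\sim\exp(-\tfrac14\pi i)(w_2+iw_1).$$

In $(-\infty,z_0]$ we employ

$$w_3=\pi^{1/2}k_0^{1/6}\left(\frac{\zeta_1}{q}\right)^{1/4}\operatorname{Bi}(-k_0^{2/3}\zeta_1),\qquad w_4=\pi^{1/2}k_0^{1/6}\left(\frac{\zeta_1}{q}\right)^{1/4}\operatorname{Ai}(-k_0^{2/3}\zeta_1)$$

where

$$\zeta_1=\mp\left|\tfrac32\int_{z_1}^{z}q^{1/2}(t)\,dt\right|^{2/3},$$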

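the upper or lower sign being adopted according as $z>z_1$ or $z<z_1$. Allowing $z\to-\infty$, especially through a sequence of values in which the cosine in the Airy function asymptotic expansion takes alternately the values zero and unity, permits the identification of the incident and reflected waves:

$$q^{-1/4}\exp\left\{-ik_0\int_{z_1}^{z}q^{1/2}(t)\,dt\right\}\sim e^{-\pi i/4}(w_3+iw_4),\qquad q^{-1/4}\exp\left\{ik_0\int_{z_1}^{z}q^{1/2}(t)\,dt\right\}\sim e^{-\pi i/4}(w_4+iw_3).$$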

It follows that $w_3+iw_4+R(w_4+iw_3)$ and $T(w_2+iw_1)$ represent the same solution of (8.88). They and their derivatives must therefore be continuous at $z=z_0$. Hence

$$R=\frac{\mathcal{W}(w_4,w_1)-\mathcal{W}(w_3,w_2)-i\{\mathcal{W}(w_4,w_2)+\mathcal{W}(w_3,w_1)\}}{\mathcal{W}(w_4,w_2)-\mathcal{W}(w_3,w_1)+i\{\mathcal{W}(w_3,w_2)+\mathcal{W}(w_4,w_1)\}},$$

$$T=-\frac{2\mathcal{W}(w_3,w_4)}{\mathcal{W}(w_4,w_2)-\mathcal{W}(w_3,w_1)+i\{\mathcal{W}(w_3,w_2)+\mathcal{W}(w_4,w_1)\}}$$

CANONICAL PROBLEMS

where

$$\mathcal{W}(w_i,w_j)=w_iw_j'-w_i'w_j$$

is the Wronskian evaluated at $z=z_0$. Since the Wronskian is a constant independent of $z$, we could calculate it at any convenient point. However, the advantage of $z=z_0$ is that the argument of the Airy functions is large and positive, so that asymptotic expansions are available. In fact

$$w_1\sim Y^{-1}e^{k_0f},\quad w_1'\sim-k_0Ye^{k_0f},\quad w_2\sim\tfrac12Y^{-1}e^{-k_0f},\quad w_2'\sim\tfrac12k_0Ye^{-k_0f},$$

$$w_3\sim Y^{-1}e^{k_0g},\quad w_3'\sim k_0Ye^{k_0g},\quad w_4\sim\tfrac12Y^{-1}e^{-k_0g},\quad w_4'\sim-\tfrac12k_0Ye^{-k_0g},$$

where

$$Y=|q(z_0)|^{1/4},\qquad f=\int_{z_0}^{z_2}|q(t)|^{1/2}\,dt,\qquad g=\int_{z_1}^{z_0}|q(t)|^{1/2}\,dt.$$

Therefore $\mathcal{W}(w_4,w_1)\sim0$, $\mathcal{W}(w_3,w_2)\sim0$ and

$$\mathcal{W}(w_4,w_2)\sim\tfrac12k_0\exp(-k_0I),\qquad\mathcal{W}(w_3,w_1)\sim-2k_0\exp(k_0I),\qquad\mathcal{W}(w_3,w_4)\sim-k_0$$

where

$$I=f+g=\int_{z_1}^{z_2}|q(t)|^{1/2}\,dt.$$

Consequently

$$R=i,\qquad T=\exp(-k_0I).\tag{8.93}$$

There is no point in retaining the exponentially small terms in $R$, because only the first term in the asymptotic expansions has been kept, so that the most we can assert is that $R=i+O(1/k_0)$. However, the exact relation (8.91) coupled with (8.93) for $T$ does allow the statement

$$|R|^2=1-\exp(-2k_0I).\tag{8.94}$$
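As a rough numerical illustration of (8.93) and (8.94), the sketch below evaluates the barrier integral $I$ for an assumed profile $q(t)=t^2-a^2$. This profile is negative between its zeros $z_1=-a$ and $z_2=a$ and gives $I=\tfrac12\pi a^2$ exactly. The code then forms $T$ and $|R|^2$. The profile, the choice $k_0=5$, and the function names are illustrative assumptions, and only the leading terms of (8.93) and (8.94) are used.

```python
import math


def barrier_integral(q, z1, z2, panels=20000):
    """I = integral_{z1}^{z2} |q(t)|^(1/2) dt by the composite midpoint rule.

    q is assumed negative on (z1, z2) with simple zeros at z1 and z2; the
    midpoint rule never samples the end points, where |q|^(1/2) has infinite slope."""
    dt = (z2 - z1) / panels
    return dt * sum(math.sqrt(abs(q(z1 + (k + 0.5) * dt))) for k in range(panels))


def barrier_coefficients(q, z1, z2, k0):
    """Leading-order I, T from (8.93), and |R|^2 from (8.94)."""
    i_val = barrier_integral(q, z1, z2)
    t = math.exp(-k0 * i_val)
    r_sq = 1.0 - math.exp(-2.0 * k0 * i_val)
    return i_val, t, r_sq


a = 1.0
i_val, t, r_sq = barrier_coefficients(lambda s: s * s - a * a, -a, a, k0=5.0)
print(i_val, math.pi * a * a / 2, t, r_sq + t * t)
```

Even for this modest $k_0$ the transmitted amplitude is already below $10^{-3}$, which is the sense in which the barrier passes "all but a minute amount of energy".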

The interpretation of these formulae is that when the refractive index possesses two zeros there is a barrier which prevents the transmission of all but a minute amount of energy. The amplitude of the reflected wave is the same as that of the incident wave to within an exponentially small factor. The phase, however, is advanced by $\tfrac12\pi$. Consequently, when a ray is reflected by a minimum in refractive index, the reflected field may be evaluated by geometrical optics provided that a phase advance of $\tfrac12\pi$ is inserted. Of course, the result will not be legitimate very near the reflecting level but only some distance below it.

One application of this theory is to propagation over a plane earth. Suppose that $N$ is constant and equal to 1 up to a height $h_1$, passes through a positive minimum $N(m)$ between $h_1$ and $h_2$, and becomes constant again at altitudes greater than $h_2$ (Fig. 8.9). If a ray is launched at an angle $\theta_0$ to the vertical, the theory of §8.4 informs us that if $\sin\theta_0<N(m)$ the ray will not be reflected but will continue steadily away from the earth after refraction through the layer.

Fig. 8.9. Ray when the refractive index has a minimum above a plane earth.

If $\sin\theta_0>N(m)$ the ray will be reflected at a height $h$ where $N(h)=\sin\theta_0$. The abscissa of the point of reversing direction vertically is

$$X_h=\sin\theta_0\int_{h_0}^{h}\{N^2(w)-\sin^2\theta_0\}^{-1/2}\,dw$$

where $h_0$ is the height of the source above the earth. (It is assumed that the source is below the layer and in the region in which $N=1$.) As $\theta_0$ increases, a reflected ray first appears when $\sin\theta_0$ is just greater than $N(m)$, and then $X_h$ is virtually infinite. Increasing $\theta_0$ further decreases $X_h$, but $X_h$ must subsequently increase because obviously $X_h\to\infty$ as $\theta_0\to\tfrac12\pi$. Hence $X_h$ has a minimum for some value of $\theta_0$. This minimum can be zero only if the ray with $\theta_0=0$ is reflected, i.e. if $N(m)=0$. In general, there are at least two rays which turn around at any value of $X_h$ greater than the minimum.

A reflected ray will eventually strike the earth. If $N(m)>0$, the point of impact must be beyond a certain distance, because $X_h$ cannot be below its minimum. The minimum horizontal range which must be traversed before a ray hits the earth is known as the skip distance. The field due to the reflected rays is often known as the sky wave. The sky wave is responsible for the ability to transmit signals great distances, because it can penetrate to places where the direct wave from the source has been reduced to negligible proportions by horizon effects. When the reflected ray strikes the earth it will be reflected by the earth at the same angle to the vertical. It will therefore return to the upper layer and be reflected back to the earth again. The phenomenon is therefore one of multiple reflection. If the source is a vertical transmitter which would give $\exp(-ik_0R)/R$ in free space, the sky wave after $m$ reflections at the earth and $l$

reflections at the variable layer is

$$\exp(\tfrac12l\pi i)\,D\{R_v(\theta_0)\}^m\frac{\exp(-ik_0L)}{d}$$

where $R_v(\theta_0)$ is the Fresnel coefficient for the reflection at the earth at an angle of incidence $\theta_0$, $L$ is the optical path length, $d$ is the geometrical length of the ray, and $D$ is a divergence factor representing the effect of spreading of the rays. The only influence of the layer reflection, other than in $D$, is the phase advance of $\tfrac12\pi$ on each occasion. The total field at a point is obtained by summing over all the reflected rays which pass through it and adding that which arrives directly without reflection. In the situation of Fig. 8.9 there may be rays emitted downwards which strike the earth before going to the transition layer; their contribution must be included. If both the transmitting and receiving points are on the surface of the earth, the formula for the sky wave is

$$\sum D\{R_v(\theta_0)\}^m\{1+R_v(\theta_0)\}^2\frac{\exp\{-ik_0L+\tfrac12(m+1)\pi i\}}{d}$$

where $m$ is the number of earth reflections between the transmitter and receiver. The factor $1+R_v$ is included once because the receiver is at a point where both a downcoming and an upgoing ray occur, and a second time because the transmitter is on the earth. The summation must be carried out over those values of $\theta_0$ which supply a ray through the point of observation, i.e. which satisfy $x=2(m+1)X_h$, where $x$ is the distance between transmitter and receiver. Also

$$L=\int N\,ds=2(m+1)\int_0^hN^2(w)\{N^2(w)-\sin^2\theta_0\}^{-1/2}\,dw.$$

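With regard to $D$, the rays in a small cone occupy an area $d^2\sin\theta_0\,d\theta_0\,d\psi$ in free space. Here they fill an area $x\cos\theta_0\,dx\,d\psi$. Hence, by conservation of energy in the tube of rays,

$$D=d\left(\frac{\tan\theta_0}{x}\left|\frac{d\theta_0}{dx}\right|\right)^{1/2}$$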

and the sky wave can now be computed. In practice, the determination of rays when the positions of the transmitter and receiver are specified has to be carried out by trial and error. One way is to trace a ray from the transmitter. If it misses the receiver by being at a range $x_i$ instead of $x_R$ at the height $h_R$ of the receiver, a new angle of launching based on the iteration cosec $\theta_{i+1}$ = cosec $\theta_i$ - $N(h_T)(x_R-x_i)/N^2(h_R)\,\partial x_i/\partial N(h_R)$ is tried, with $h_T$ the altitude of the transmitter. Alternatively, and probably more effectively, rays are traced for a selected set of launching angles and then, if two consecutive ones bracket the receiver, a better approximation is derived by interpolation.
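The bracketing strategy just described can be sketched in a few lines of Python. Here `range_of` stands for whatever ray tracer supplies the ground range for a given launching angle. Purely for illustration it is replaced by the closed-form one-hop range $x=2X_h=2\sin2\theta_0/\alpha$ for an assumed profile $N^2(w)=1-\alpha w$, with transmitter and receiver on the ground. All names are hypothetical.

```python
import math


def bracket_launch_angles(range_of, target, angles):
    """Return launching angles whose ray lands at range `target`.

    Rays are traced (here: range_of is evaluated) for an increasing list of
    angles; wherever two consecutive ranges bracket the target, the angle is
    estimated by linear interpolation between them."""
    ranges = [range_of(th) for th in angles]
    hits = []
    for k in range(len(angles) - 1):
        a0, a1 = angles[k], angles[k + 1]
        x0, x1 = ranges[k], ranges[k + 1]
        if x0 <= target < x1 or x1 < target <= x0:
            hits.append(a0 + (target - x0) * (a1 - a0) / (x1 - x0))
    return hits


# Hypothetical stand-in for a ray tracer: one-hop range for N^2 = 1 - alpha w.
alpha = 0.01


def one_hop_range(theta0):
    return 2.0 * math.sin(2.0 * theta0) / alpha


grid = [0.01 * k for k in range(1, 157)]  # launching angles 0.01 to 1.56 rad
estimates = bracket_launch_angles(one_hop_range, 150.0, grid)
exact = [0.5 * math.asin(0.75), 0.5 * math.pi - 0.5 * math.asin(0.75)]
print(estimates, exact)
```

The grid search returns both a low-angle and a high-angle ray to the same receiver. Repeating the interpolation on a finer grid around each estimate (or switching to regula falsi) sharpens either angle.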

Exercises

25. Show that the change in range at constant height due to a change in launching angle satisfies

-OX = -No + cot 80 080

N~

I(

NN") 1- dx ,2 N

0

Deduce the intensity of a ray.

26. Assuming that $N$ is given only at a sequence of discrete data levels, interpolate (i) $1/N$ linearly between levels, (ii) $N^2$ by quadratics with continuous first derivatives, and (iii) $N^2$ by cubic B-splines. Set up a program for tracing rays and compare the three possibilities when $N(z)=1+z/a$ and when $1/N(z)=1+z/a$. Note particularly the comparative positions of any caustics. If $N$ has a parabolic variation, first increasing from the surface of the earth to a maximum at height $h_1$, then decreasing to a height $h_2$, and thereafter decreasing linearly to a height $h_3$ where there is a perfectly reflecting boundary, draw a ray diagram.

8.13 Edges

An essential feature of the procedure in applying geometrical optics to scattering by an obstacle is the substitution of the tangent plane at a point for the local surface. Such a replacement is plausible if the principal radii of curvature at the point are large compared with the wavelength. If the wavelength is too large another canonical problem may be necessary, but the more awkward situation is that in which the radii of curvature are large in some places and small in others. If one radius of curvature is large and the other small, another canonical problem is involved. When one radius of curvature is actually zero, as at an edge, the geometrical theory of diffraction selects as its canonical problem the scattering by a semi-infinite plane. This section is concerned with elucidating the properties of this canonical problem.

Let the perfectly conducting semi-infinite plane occupy that part of $y=0$ on which $x$ is negative (Fig. 8.10) and lie in free space. Represent $x,y$ by the cylindrical polar coordinates $r,\phi$. Then we consider an incident electrically polarized plane wave in which the $z$ component of the electric intensity is given by

$$E^i=\exp\{-ik_0r\cos(\phi-\phi_0)\}$$

where $\phi_0$ is a real angle satisfying $0<\phi_0<\pi$. In parallel there is a magnetically polarized incident wave in which

$$H^i=\exp\{-ik_0r\cos(\phi-\phi_0)\}.$$

Fig. 8.10. Plane wave incident on a semi-infinite plane.

The boundary condition on the semi-infinite plane is that the total $E$ vanishes there or, correspondingly, $\partial H/\partial y=0$. A detailed derivation of the solution will not be given; that the formulae displayed below do comply with all conditions of the problem may be readily confirmed by the reader. The solution is expressed in terms of the function $F(w)$ defined by

$$F(w)=\exp(iw^2)\int_w^\infty\exp(-i\mu^2)\,d\mu\tag{8.95}$$

which has been tabulated (Clemmow and Mumford 1952; see also Banos and Johnston 1970) for complex $w$; it may also be written in terms of Fresnel's integrals. One useful attribute is that

$$F(-w)=\pi^{1/2}\exp(iw^2-\tfrac14\pi i)-F(w).\tag{8.96}$$

Integration by parts in (8.95) easily furnishes

$$F(w)=-\frac{i}{2w}+\frac{1}{4w^3}+O\left(\frac{1}{|w|^5}\right)\tag{8.97}$$

for $\tfrac12\pi>\operatorname{ph}w>-\pi$. Formula (8.96) caters for the asymptotic conduct in other ranges of $\operatorname{ph}w$ (see also §8.24). By expanding (8.95) about $w=0$ we see that

$$F(w)=\tfrac12\pi^{1/2}\exp(-\tfrac14\pi i)-w+O(|w|^2)\tag{8.98}$$

for $|w|\ll1$.
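The properties (8.96)-(8.98) are easy to check numerically for real arguments. The sketch below takes (8.95) in the form $F(w)=\exp(iw^2)\int_w^\infty\exp(-i\mu^2)\,d\mu$, which is the form consistent with (8.96)-(8.98); treat this reading as an assumption. For real $w$ it uses $\int_0^\infty\exp(-i\mu^2)\,d\mu=\tfrac12\pi^{1/2}\exp(-\tfrac14\pi i)$ and evaluates the remaining finite integral by Simpson's rule. This is adequate only for moderate real $w$; complex arguments need a different method.

```python
import cmath
import math


def fresnel_F(w, panels=20000):
    """F(w) = exp(i w^2) * integral_w^inf exp(-i mu^2) d mu for real w.

    Uses integral_0^inf exp(-i mu^2) d mu = (1/2) sqrt(pi) exp(-i pi/4) and
    evaluates integral_0^w exp(-i mu^2) d mu by composite Simpson's rule."""
    n = panels + (panels % 2)  # Simpson's rule needs an even panel count

    def f(mu):
        return cmath.exp(-1j * mu * mu)

    h = w / n
    s = f(0.0) + f(w)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    finite = s * h / 3.0
    half_line = 0.5 * math.sqrt(math.pi) * cmath.exp(-0.25j * math.pi)
    return cmath.exp(1j * w * w) * (half_line - finite)


def F_large(w):
    """Two-term large-|w| form (8.97)."""
    return -0.5j / w + 0.25 / w ** 3


def F_small(w):
    """Small-|w| form (8.98) without its O(|w|^2) remainder."""
    return 0.5 * math.sqrt(math.pi) * cmath.exp(-0.25j * math.pi) - w


print(abs(fresnel_F(8.0) - F_large(8.0)))   # small: set by the first omitted term of (8.97)
print(abs(fresnel_F(1e-3) - F_small(1e-3)))
```

Because the integrand is even, the Simpson sums for $w$ and $-w$ coincide term by term, so (8.96) is reproduced essentially to rounding error.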



The electrically polarized solution is

E = n- 1 / 2 exp( -ikor + !ni)[F{(2k or)1/2 sin t0) ( ~ z),

The first discloses the presence of a plane wave reflected from the face of the diffracting sheet as if it were infinite in extent; the second contains the incident wave alone, while the third indicates a zero field behind the screen. These formulae bring to light geometrical optics. There is a shadow zone behind the plane where there is no field (Fig. 8.11), an illuminated region where the incident plane wave exists, and a reflection sector where a reflected wave is manifest. The field of geometrical optics is discontinuous across the lines $\phi=\pm\phi_0$, which is why the more recondite expression (8.99) is wanted to secure a smooth transition. Extract the geometrical optics terms from (8.99), leaving what is known as the diffracted field. When $k_0r\gg1$ and
- 4>0)

(8.138)


with E~,,(O) signifying the value at s = O. It follows that the next order term in the non-uniform expansion is given by i

+ E~

J1/2(F _ iE 1

2t/1

1

4t/13

)ei

2

= fin J1/2(F _ 1

iE~ + E~ )ei 2t/1

4t/13

2

.

From (8.136), the right-hand side is 3/2

2

i 1·E 1" (0) • 1

SIn

2(cP - cPo)

. It 3

-

1/2

e[J

curl Fo]o cosec

eo + fin (J

i 1 2e / 12 e E 0 ) 3

4t/1

Now

iE~) + -ecurl it 3 i it 3 . Eo - -2egrad t/1 /\ Eo

t 3ecurl Fo = t 3ecurl ( Fo - -

2t/1

2t/1

2tfJ

and so an alternative expression is

{iE i1,,(O)

+ [t 3 ecurl

23/ 2 sin t(cP

E~]o cosec

- cPo)

+ fin J 1/2

eo}

t3cosec 9 {4~3 grad L E~ i

0,

/\

i

curl (Fo _

i~;)}.

But,

where

Xl = sin 80 cos

cP,

X2

= sin eo sin cP, X3

= cos

80

and the $x_j$ are the Cartesian coordinates corresponding to the base vectors $\mathbf{t}_j$. Hence, for small $s$,

t/1 = (2S)I/2 sin eo

sin

t0)' (8.149)


There are similar results for Fm , involving a linear combination of the first m derivatives of the incident field at the edge. For example, fin rl / 2F1

=-

i2-3/2E~(0, 0) cosec

!(cP - cPo)

+ ~~ 2- 3/2 E ~(O, 0) cosec" i - 4>0) + 2 - 7/2 oE~(O, 0) cosec 3 21(,k 0/ -

or

,k)

(8.150)

0/0

where

b=

a2Li

-2

ox

o2Li a2Li cos? cP + 2 - - cos cP sin cP + - 2 sin? cP Ox oy

oy

evaluated at the edge. From (8.148) and (8.149)

Fo = -

and, since

v 2 Fo = -

i2 - 3/2 r- 1/2 E h(o, 0) cosec

!( cP - cPo)

i2 - 5/2 r- 5/2 E~(O, 0) cosec'

(8.151)

!2)

(8.157)

If the incident wave is the plane wave

t/I

uo = exp{ -iko(X cos 0) and the total field given by (8.156) is u 1(x, y) ~ n- 1/2 exp( -ikor + !ni)F{(2k or)1 /2 sin t u(x, y) '" u'(x, y) - n- 1/2 exp( -ik X

{F(kA/2tfr) + tik o / 1

(8.158)

-n is

~

otii

~)

+ 0/0

2

+ !ni)ui(x, -

y)

2 "to (t)"CO~2)"} tfr- l

+ n- 1 / 2 exp{ -iko(L~ + r) + !ni}ko1/ 2

L {Fm(r, cP) m=O 00

x

Fm(r, 27t - cP)}/k~

(8.159)

where ifJ is t/J with 2n - 4> for l/J. Since tii = f//(2n - to - 0 so that the upper diffracting edge lies in the shadow region cP 1 > cPo. According to (8.158) the incident field V1 = U1 - Uo can be approximated by substituting (8.162)



l/Jo·

when

Diffraction of -

Uo

produces the total field

-uo(x, y) - exp{ -ik or (1 ) cos(1) o1 x [F{ - k~/2(r(1) + r2 - r1)1/2} - !iko1/2(r(1) + r2 - r1 ) -1/2]

_ i exp{ - iko(r(1) + '2)} (~ ~(1» (~(1) ~ ) (1 ) )1/2 g 'Po, 'P g 'P ''P2 8nor k « r2

+ V1(X, y) -

exp{ -ik or(1) cos(1)' o1

exp( - iko

By the same process as has just been adopted v1(x, y) +

V2(X,

exp{ - iko(r (1 ) + r2)}

2n:(2k r ) 1/2 g(4)o, 4>1) o1 x [F{k~/2(r(1) + r2 - r1)1/2} + !ik 1/2(r(1)

y) '" -

o

+ r2 -

_ i exp{ -ik o(r(1) + r 2 ) } (~ ~(1») (~(1) ~ ) (1 ) )1/2 g 'Po, 'P g 'P ,'P2 8nor k « r2 Again this formula is uniform for 0 ~

~

1t,

+ a)}

X [g«(JN' 4J -1) {F(_ k I / 2 ,11) 0 'I' r_1/21

-

lik -1 / 2 ,112

0

'I'

1} -

i

4(k) oar 1/2

g(£J

.\-n)g(l1t ,I,.)J

N' 2

2 ' 0/

where $\psi=(r+a-r_{-1})^{1/2}$ with the same convention on sign as has been employed hitherto. The wave $v_2$ strikes the upper half-plane and initiates $v_3$. On account of the rapid variation of $F$ near to $\phi=\tfrac12\pi$, a procedure similar to that of the preceding section must be followed. Since this has to be done several times, it pays to adopt a general approach. Indeed, it is convenient to start from the representation

where $\psi_1=(r_1+a-r)^{1/2}$ with the same convention on sign as has been employed to date. While more complicated than (8.172), because

and all other $a_{1q}$ and $b_{1q}$ vanish, it has the advantage of being in a form which is suitable for generalization. By replacing $r_1$ and $\phi_1$ by $r_{-1}$ and $\phi_{-1}$ respectively, we deduce that, to find $v_2$, we need the field scattered by the lower half-plane from the incident wave $\exp(-ik_0r_{-1})\psi_{-1}^q$ where $\psi_{-1}=(r_{-1}+a-r_{-2})^{1/2}$. According to (8.155), the scattered wave in $0\le\phi\le\pi$ is - 1t -

1/2

,11 exp (Ok 1 0'1'

2

1t1 - 1 or- 1 ),IIQ + 4look 'I' - 1

_ exp{ -iko(a + r) - !1ti}(h ,I,.)~ 2(21tk r)1/Z g Z ,'I' Oq o with an error which is O(k0 (1 / 2 )Q-

l ).

Here $[\tfrac12q]$ is the largest integer which does not exceed $\tfrac12q$. Consequently, in $0\le\phi\le\pi$, v_2(x,

y)

= - (nk o)- 1/2 exp{ - iko(a +

{F( - k~/2",) -

+ ialO(a, in)

2(2kor)1/2 9

+

ini}

+ kC; 1/2blir-1' cP -1)}

x [qto {ali r-1' cP -1)

x

r)

X

(k~/2", -l)q

tikC; 1/2", - 1 (~~:q (t)nCo~2)

n}

~)J

(In

2 ,0/

o

correct to O(k 1). Now, on account of (8.166),

1 i (l/2)q 1 (i)n F(z)+-2h)n 2"

L

Zn=O

Z

= 1

+ - -n

!n 1/ 2 i exp( -iqni) 1.)' q+l

(1.

- 2 q - 2 .Z

1 12

2 zq

00

{(3 1).}" exp "4q - 4 nl i..J

m=O

()m (1 i) m exp 4mnl Z 1 1 . ("4m - "4q)!

(8.174)

Hence V 2(X,

y)

= ko1/2 exp{ - iko(a + r)} x [-

x

f

q=O

q Ha 1k-l' cP-l) + kC;1/2blq(r_l, cP-l)}( - )ql1 exp(iqni)

~ exp(imni) (k 1/2 ,J,)m

i..J.lm

m=O(2

1), - Iq ·

0."

~ ii1q(r- h cP-l) ()q

x q~O (-tq _

t)!

+ exp(hri)ii1O(a, in) 2(2nkor)1/2

where

-

9

~ exp(ini)

+ 2 k1/2,J, 0 v

(1 i) q exp -a:qm '1

(tn

A..)J

2 ,0/

(8.175)

$\eta=\psi_{-1}/\psi$. If (8.175) is written as

V2(X,

y)

= ko1/2 exp{ -iko(a + r)}

L {a2q(r,4» 00

q=O

+ ko1/2b2q(r, 4»}(k~/2t/J)q (8.176)

it follows that

(8.177)


(8.178) By commencing with (8.176) we can deduce v_3(x, y)

= ko1/2 exp{-ik o(2a + r1)}

L {ii 3q(r1, 4>1) + ko1/2b3q(r1, 4>1)}(k5/ 2t/J )q 00

q=O

where $\bar a_{3q}(r_1,\phi_1)$ is related to $a_{2m}(r,\phi)$ by a formula of the same type as (8.177), except that $\eta$ is exchanged for $\eta_1=\psi/\psi_1$. A similar remark is true for $b_{3q}$. Sufficient quantities have now been determined for a calculation of the field. Let $v_{2n}(x,y)=\exp(-ik_0r)w_{2n}(r,\phi)$. Then, according to (8.155), (8.151), and (8.152), a non-uniform expansion of $v_{2n+1}$ for $n>1$ is exp{- iko(r + a) - !ni} or1 2 1 1 3i 1 {cos !(!n - 4> 1) X [ -W2n(a, 2n)g(21t, 4>1) - - - w 2n(a, In) -.-3 - 1 - 1 - - 8koa sin 2(21t - 4> 1)

_

v2n+ 1 - - 2(2nk - -1 -)1/2 ---

cos! !(!n + 4>1)} __i_ 3 1 1. sin 2(2 n + 4>1) 4k o

+ . X

{w

2n

(a, in) r1

+

[~

ar

1 1

W

(,I,,)J

2n r, 'P

} rt

=0

{cosec"! 1)+ cosec! t(tn + 4> I)}

It is non-uniform because it fails in the vicinity of $\phi_1=\pm\tfrac12\pi$. Now

o

w2n(a, in) = k 1/2 exp{-ik o(2n - 1)a}{a2n,0(a, in)

aw~(r, 4»J [ urI 2n

rt

=0

=

1

-2: a

-

+ ko1/2b 2n,0(a, in)},

1/2.1. . a2n,l(a'2n)cos4>lexP{-lko(2n-1)a}.

Hence

(8.179)



By means of some rather elaborate analysis it may be shown that

(8.180) The same expression holds for a

(a

2n.l

,

1 )

21t

=(-

Q2 n + 1

) N + 1 + 2n

.

8Ina

with n +

(8

t for n. Also,

1 ) 2n - 1

~ -3/2(2 _ )-3/2 L. m n m

g N,!n

1/2

(8.181 )

m= 1

and

where $h(\theta)=2^{1/2}(2+\cos\theta)\sin\tfrac12\theta/\cos^2\theta$. The total diffracted field from the upper edge due to the plane wave striking it can now be calculated from (8.172), (8.179), (8.180), (8.181), and (8.182). It is

+

~ _ L. V 2n + 1

n=1

(- )N exp( - iko r 1 - !ni)

= ---2(-2-k-)-1/-2-n

o~

With regard to the plane wave striking the lower edge we have

The fields $v_2$, $v_4$, ... successively scattered from the upper edge can be calculated by the same method as has just been described. Thus the field diffracted by the upper edge due to the incident mode is

vd(r1' 4J1) =

LV= 00

n=1

_

n

( _

)N exp( - ikor1 - !ni) 2(2 k )1/2 !(4J1) n

o~

(8.183)


where f(1) 2(2rek oa)

2 1 1 1 + -is- [G1(ON' Ire)H 1(re - 4>1) - G1(Ire, 4>1){G 1(ON' Ire) - H1(ON)}],

16rek oa

$$G_1(\theta,\phi)=\operatorname{cosec}\tfrac12(\theta-\phi)-\operatorname{cosec}\tfrac12(\theta+\phi),\qquad H_1(\phi)=2^{1/2}\cos\tfrac12\phi\,(2-\cos\phi)\sec^2\phi.$$

Deduce that the reflection coefficient for the nth mode is -i{1 + (- )N+II}f(re - ( 11) / = 0 when the coefficient is halved. 32. Check the statements made about (8.186) at cut-off frequencies. 33. The plane wave exp(ikoY) is incident on the parallel-plate waveguide of Fig. 8.21. Show that the distant total field produced in the direction 4J = - ire is 2aK n unless n

(a) 1/2{ 1 - .~1

1 1 exp(-ikor) [ 2- 21t;

. } x exp( - 21nk oa)

00

=+=

(

1

(2n)I/2 - (2n

1)

+

1)1/2

exp( -!rei)] 1/2 2(2rek or)

the upper and lower sign being taken according as the incident wave is electrically or magnetically polarized. 34. Try the same type of problem as Exercise 33 but with the plates staggered. 35. Repeat Exercise 33 but with a line source on x = 0, y > a. 36. Examine the possibility of extending the theory of this section to a cylindrical waveguide.


37. A slit in an infinite plane occupies $|x|\le a$, $y=0$ and is irradiated from $y<0$ by the plane wave $\exp(-ik_0y)$. Find a uniform asymptotic expansion for the field, using polar coordinates $(r_1,\phi_1)$ at the left-hand edge and $(r_2,\pi-\phi_2)$ at the right-hand edge (so that the slit is in $\phi_1=0$ and $\phi_2=0$). Show that, in the electrically polarized case, the diffracted field is exp(- iko'l - ini) (I 2(2nk

-

' d 1/2 o

g

2

n

exp(- iko'2- !ni) (I n

A" ) _

,'1-'1

2(2nk o'2)1/2

g

At. )

2 , '1-'2

iexp(-2ika) {eXP(-ikor1) 1At. exp(-ik o' 2) lA,.} O(k-3/2) 4 k 1/2 1/2 sec 2'1-'1 + 1/2 sec 2'1-'2 + 0 • n Oa'l

'2

For magnetic polarization replace g(1n, 4» by G1 (! n, 4J) (defined in Exercise 31) and drop the third term. Compare your formulae with the analytical results of Luneburg and Westpfahl (1975) and Sologub (1972). 38. Repeat Exercise 37 with the aperture and screen interchanged. Confirm that your results are in accord in Babinet's principle. 39. In Exercise 37 the right-hand screen is such that the field vanishes whereas the left-hand screen makes the normal derivative zero. Show that the transmission coefficient is 1 cos(2koa - in)

1- -2n1/2

(k oa)3/2

-

9

32n 1/2(k oa)S/2

. 2k 1 sinf a - -n) 0

4

(for non-perfectly conducting screens see Senior 1975).

40. A splash plate for the waveguide of Fig. 8.21 is constructed by making $y=d$ ($d>0$) perfectly conducting. If $k_0d\gg1$, calculate its effect by imaging the field found without a splash plate. If the splash plate is of finite length, determine the additional field due to the edges to a first approximation.

8.18 The wedge

A canonical problem for what happens when the edge is on a surface which is not infinitesimally thin is provided by the wedge. The angle of the wedge will be taken as $\delta$, with free space occupying $-\pi<\phi<\pi-\delta$ (Fig. 8.22). When $\delta=0$ the wedge degenerates into the semi-infinite plane already considered. The illumination will be assumed to be due to a line source at $L$ producing an incident field of $-\tfrac14iH_0^{(2)}(k_0|\mathbf{x}-\mathbf{x}_1|)$, where $\mathbf{x}_1$ is the position of $L$. It will be supposed that the wedge is perfectly conducting, though solutions are available for impedance boundary conditions with different impedances on the two faces (Van Dantzig 1958; Lauwerier 1959, 1960, 1961; Williams 1959, 1961; Malyuzinec 1958). For electric polarization

Ez

= w(1 = -t~. Stationary phase provides the diffracted wave

±

(8.190) so long as $\phi$ is not near a shadow boundary or the boundary of a reflected wave.

Fig. 8.23. The contour of integration for the wedge.

Consequently E_zd

HZd

'" -

'" -

pi exp{ - iko(r + r l ) } . . 1/2 sm p1t SID p(cP 21tk o(rrt ) I

.

+ 1t) SID Jl(cPl + 1t),

Jliexp{ -iko(r + r)} . 1/2 1 sm Jlx{cos Jl1t - cos Jl(4J 21tk o(rrt ) I

+

x) cos Jl(cPl

(8.191)

+ x)} (8.192)

where

I = cos 2 Jl1t - 2 cos Jl1t cos Jl( + 2:n)} + n~/2 exp(-ikor + !ni)F{-(2kor)1/2 cos ~ (c/> + 2:n)} +{

u sin Jlx 1 1 (A,. 2N x)} exp( - ikor - !ni) +"2 sec - 'P+-cos Jlx - cos Jl4J 2 Jl (2xk or ) 1/2

(8.198) where the prime on $\Sigma$ indicates that $n=N$ is to be omitted. The formula (8.198) provides a smooth transition as $\phi+2N\pi/\mu$ passes through $\pi$, in contrast to the singular behaviour of (8.197). The transition as $\phi+2N\pi/\mu$ passes through $-\pi$ is also given by (8.198). Oblique incidence may be covered by the device of §8.13. An approximate treatment of the impedance wedge has been given by Syed and Volakis (1992).

Exercise 41. A wedge of height $h$ is placed symmetrically on the side of the splash plate of Exercise 40 facing the waveguide. Examine how this affects the radiation pattern for various values of $h$ and a selection of wedge angles.

When the interior of the wedge plays a part as, for example, when it is dielectric, little progress has been made with solving the canonical problem except in the special cases where $\beta = \pi$ (plane interface) or where the dielectric constants differ only slightly from the environment (Rawlins 1972).

8.19 The effect of curvature

The next problem to be examined is that of a curved obstacle (Fig. 8.24). The first matter to be disposed of is the rays which can be anticipated on the basis of Fermat's principle.

Fig. 8.24. Rays for scattering by an obstacle.

Fig. 8.25. Examples of creeping rays.

If the medium is homogeneous, the absolute minimum of the optical path length is provided by the straight line $P_0P$. It corresponds to the direct ray from $P_0$ to $P$ and carries the primary radiation as if there were no obstacle present. There is also a reflected ray from $P_0$ to $P$, satisfying Snell's laws as in §8.5. In addition, there is the stationary path $P_0QRSP$ in which $P_0Q$ and $PS$ are tangential to the boundary of the obstacle whereas $QRS$ is on the surface of the target. A ray of this type is called a creeping ray because of the way it sticks to the boundary of the obstacle. Other creeping rays exist. For example, in Fig. 8.25(a) $P_0S'RQ'P$ is similar in type to that of Fig. 8.24 but goes around the body in the opposite sense. In Fig. 8.25(b) $P_0QRSOQRSP$ does more than a complete circuit of the obstacle. Clearly, an infinite number of creeping rays can be constructed. The total field at $P$ is the sum of all contributions from the direct ray, reflected rays, and all creeping rays. If all these contributions had to be computed it would be a tremendous task but, fortunately, it turns out that most of the creeping rays can be neglected because they are heavily damped. The above picture has omitted any rays in the interior of the body; should there be interior rays, there may be contributions at $P$ from rays which have emerged from the body after refraction and internal reflection.

The canonical problem for creeping rays is the circular cylinder of radius $a$ illuminated by a line source at $(r_1, \ldots$ … valid for $|\nu| \gg |z|$, $|\mathrm{ph}\,\nu| \le \tfrac12\pi - \delta$ $(\delta > 0)$, and $H_{-\nu}^{(1)}(z) = \exp(\nu\pi i)H_\nu^{(1)}(z)$, it may be verified that deformation in the lower half-plane is permissible when $\phi - \phi_1 < 2n\pi$ and in the upper half-plane when $\phi - \phi_1 > 2n\pi$. Consequently,

for 0 … $k_0 a$ the appropriate integral is

…

The equation for the stationary point is now

$$\cos^{-1}\Big(-\frac{\nu}{k_0 r_1}\Big) + \cos^{-1}\Big(\frac{\nu}{k_0 r}\Big) = -\alpha. \qquad (8.212)$$

For its existence it is necessary that $-\pi < \alpha < 0$ and $r_1\cos\alpha < r$. This time the incident wave is provided in that part in $\pi <$ …
… 1) of (8.201) but $N_s$ still being given by (8.225). $P_s$ will be large and so will its derivatives. If (8.237) is entered in the governing partial differential equation and the derivatives of $\nu_s$ and $N_s$ with respect to $\phi$ are neglected by virtue of their smallness, the disappearance of the largest terms demands $P_s' = \pm\nu_s$. The upper sign is the proper choice for the penumbral point under consideration and so

$$P_s(\phi) = \int_{\phi_1}^{\phi}\nu_s(t)\,dt,$$

which reduces to the right value when $Z$ is a constant. When $r, \phi$ are very near to $r_1, \phi_1$ respectively, $\nu_s$ and $N_s$ are effectively constant, so that (8.237) displays the correct singularity for a source. Consequently, (8.237) can be regarded as a satisfactory solution when $Z$ is variable.

Next, suppose that the boundary is not circular but is convex. To adapt (8.237) to this situation, take $a$ in (8.226) as the radius of curvature of the boundary at the point under consideration. The polar coordinates $r, \phi$ are no longer adequate. Instead use the arc length $\sigma$ of the boundary (increasing in the same sense as $\phi$ does) and a transverse coordinate $R$. On the boundary $R = a$ and $\partial R/\partial n = 1$, so that $R$ does not change significantly with a wavelength alteration in $\sigma$. The choice of these coordinates is reasonable if we do not stray too far from the boundary. Now $\nu_s$ is a function of $\sigma$, as determined by (8.226), and

$$E_z = -\tfrac12 i\sum_{s=1}^{\infty}\frac{H_{\nu_s}^{(2)}(k_0 R)\,H_{\nu_s}^{(2)}(k_0 R_1)\exp\{-iP_s(\sigma)\}}{N_s} \qquad (8.238)$$

with $N_s$ defined by (8.225). In view of the choice of $R$ the boundary condition (8.224) is satisfied. Therefore (8.238) is appropriate to an upper penumbral point if

$$\frac{dP_s}{d\sigma} = \frac{\nu_s}{a}. \qquad (8.239)$$

The expression (8.238) can be expected to be valid only when both the source and point of observation are near the boundary, because $R$ and $\sigma$ are coordinates local to the boundary.

For a distant source, the plane-wave approximation can be employed. In this case $r_1$ is taken to infinity before the generalizations of (8.237) and (8.238) are attempted. It is found that for an incident plane wave of unit amplitude

$$E_z = 2\sum_{s=1}^{\infty}\frac{\exp\{-iP_s(\sigma)\}\,H_{\nu_s}^{(2)}(k_0 R)}{N_s} \qquad (8.240)$$

where now $P_s$ is zero at the penumbral point where the incident ray grazes the body. Although (8.240) is valid only near the body, the Hankel function can be approximated as soon as $R$ is somewhat different from $a$, and (8.240) has a cylindrical wave structure similar to (8.208). By matching to a suitable cylindrical wave, $E_z$ can be continued to points well away from the body. When $Z$ is infinite and the Hankel functions are replaced by their asymptotic formulae, the basic difference between (8.208) and (8.239) is that $\nu_s($ …
… $pa/N_1$ along conventional rays tangential to the caustic $r = pa/N_1$. In other words, after a period of exponential decay the complex ray is transformed into a real ray whose diminution of amplitude is due solely to spreading by radiation. Rays which reappear as conventional real ones after traversing a region of evanescence are sometimes known as tunnelling rays. Although the rays in the state of propagation lose amplitude only by radiation, they will be exponentially smaller than rays which have not tunnelled because of the dissipation of energy in reaching the caustic. Nevertheless there may be an appreciable leakage of energy from the caustic. It is important to observe that tunnelling rays do not occur for a plane interface, since the caustic moves off to infinity as $a \to \infty$. Hence tunnelling rays represent a surface wave which is peculiar to a curved dielectric.

The amplitude changes which take place at a dielectric interface may be calculated from Fresnel's laws for a plane boundary to a first approximation (§8.5). To obtain the appropriate coefficients in three dimensions, recall that if the plane wave

$$\mathbf{E}^i = (l\mathbf{i} + m\mathbf{j} + n\mathbf{k})\exp\{-ikN(x\sin\theta_0\cos\phi_0 + y\sin\theta_0\sin\phi_0 + z\cos\theta_0)\}$$

with

$$l\sin\theta_0\cos\phi_0 + m\sin\theta_0\sin\phi_0 + n\cos\theta_0 = 0$$

is incident on $y = 0$, the reflected wave has components (the exponential factor being omitted)

$$E_x^r = R_E l + \frac{(R_E + R_M)m\cos\theta_i\sin\theta_0\cos\phi_0}{\sin^2\theta_i}, \qquad (8.281)$$

$$E_y^r = R_M m, \qquad (8.282)$$

$$E_z^r = R_E n + \frac{(R_E + R_M)m\cos\theta_i\cos\theta_0}{\sin^2\theta_i}, \qquad (8.283)$$

with $R_E$ and $R_M$ given by (8.55) and (8.57) respectively, $\theta$ being replaced by the angle of incidence $\theta_i$, where $\cos\theta_i = \sin\theta_0\sin\phi_0$. Similarly, for the transmitted field

… (8.284)–(8.286)

where $T_E$ and $T_M$ are given by (8.56) and (8.58) respectively ($\theta_i$ being substituted for $\theta$). Noteworthy is the fact that components of the reflected and transmitted waves tangential to the interface can be created by the normal components of the incident field. Also, when dealing with rays in general, the twisting of the plane of polarization as described in §8.3 must be borne in mind.

The formulae for the reflection and transmission coefficients are defined for all angles of incidence $\theta_i$, but there are important differences of interpretation if $\sin\theta_i > N_1/N$; such an eventuality can arise only when $N_1/N < 1$. If $N_1/N < 1$, define the critical angle of incidence $\theta_c$ by $\sin\theta_c = N_1/N$. For $\theta_i > \theta_c$, $(N_1^2 - N^2\sin^2\theta_i)^{1/2}$ is taken to be negative imaginary and the transmitted wave with coefficients (8.284)–(8.286) will be exponentially attenuated as it moves along. Thus, unless the coefficients are modified, the phenomenon of tunnelling which occurs for curved interfaces will be concealed. By consideration of the canonical problem it is elicited (Snyder and Love 1975; Jones 1978) that the appropriate reflection coefficients are those of (8.55) and (8.57) with $\{(N_1/N)^2 - \sin^2\theta_i\}^{1/2}$ replaced by

$$-\frac{\mathrm{Ai}'(u_0 e^{\pi i/3})}{\{\tfrac12 kN\rho_0\exp(\tfrac12\pi i)\}^{1/3}\,\mathrm{Ai}(u_0 e^{\pi i/3})}$$

where Ai is the standard Airy function. The quantity $\rho_0$ is the radius of curvature of the section of the interface by the plane of incidence, i.e. the plane containing the incident ray and surface normal, whereas

$$u_0 = \frac{(\tfrac12 kN\rho_0)^{2/3}(\sin^2\theta_c - \sin^2\theta_i)}{\{1 + \rho_0(\sin^2\theta_c - \sin^2\theta_i)/b\}^{2/3}}, \qquad (8.287)$$

$b$ being the larger of the principal radii of curvature. It is understood that $\sin^2\theta_c = N_1^2/N^2$ even when $N_1 > N$. Corresponding results for transmission are more recondite because of the presence of a caustic when $\theta_i > \theta_c$. These formulae are relevant when the incident ray is on the concave side of the interface; no change to $R_E$ and $R_M$ is necessary for incidence from the convex side because there is then no tunnelling. Denote the reflection coefficients just delineated by $R_E$ and $R_M$. Unless $\theta_i \approx \theta_c$, $|u_0|$ will be large, since ray theory demands radii of curvature which are not small. Hence the asymptotic formulae

$$\mathrm{Ai}(w) \sim \frac{\exp(-\tfrac23 w^{3/2})}{2\pi^{1/2}w^{1/4}}, \qquad \mathrm{Ai}'(w) \sim -\frac{w^{1/4}}{2\pi^{1/2}}\exp(-\tfrac23 w^{3/2}) \qquad (|\mathrm{ph}\,w| < \pi),$$

$$\mathrm{Ai}(-w) \sim \frac{\cos(\tfrac23 w^{3/2} - \tfrac14\pi)}{\pi^{1/2}w^{1/4}}$$

… $\theta_c$, and $N_1\cos\theta_r = N(\sin^2\theta_c - \sin^2\theta_i)^{1/2}$, $\cos\theta_r$ being negative imaginary when $\theta_i > \theta_c$.

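$$\frac{1}{H(u_0)} = 4\pi|u_0|^{1/2}\,|\mathrm{Ai}(u_0 e^{\pi i/3})|^2, \qquad (8.294)$$

$$\frac{1}{J(u_0)} = \frac{4\pi\,|\mathrm{Ai}'(u_0 e^{\pi i/3})|^2}{|u_0|^{1/2}} \qquad (8.295)$$

with $u_0 = |u_0|\exp(-\pi i)$ when negative. When $|u_0| \gg 1$ it may be inferred from (8.279) and (8.289) that

$$H(u_0) \sim \begin{cases} 1 & (u_0 > 0),\\ \exp(-\tfrac43|u_0|^{3/2}) & (u_0 < 0),\end{cases} \qquad (8.296)$$

$$J(u_0) \sim \begin{cases} 1 & (u_0 > 0),\\ \exp(-\tfrac43|u_0|^{3/2}) & (u_0 < 0).\end{cases} \qquad (8.297)$$

Hence, for $|u_0| \gg 1$,

$$1 - |R_E|^2 \sim \frac{4\mu\mu_1 NN_1\cos\theta_i\cos\theta_r}{(N\mu_1\cos\theta_i + N_1\mu\cos\theta_r)^2} \qquad (u_0 > 0), \qquad (8.298)$$

$$1 - |R_E|^2 \sim \frac{4\mu\mu_1 NN_1\cos\theta_i\,|\cos\theta_r|\exp(-\tfrac43|u_0|^{3/2})}{N^2\mu_1^2\cos^2\theta_i + N_1^2\mu^2|\cos\theta_r|^2} \qquad (u_0 < 0). \qquad (8.299)$$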
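The limits (8.296) and (8.297) can be checked numerically from (8.294) and (8.295). The Python sketch below uses SciPy's complex Airy function `scipy.special.airy`. It assumes that the Airy argument is $u_0 e^{\pi i/3}$ (the phase is hard to read in the printed formula; this is the choice for which $H \to 1$ when $u_0 > 0$), and the helper names `H` and `J` are illustrative.

```python
import numpy as np
from scipy.special import airy

ROT = np.exp(1j * np.pi / 3)   # assumed phase of the Airy argument u0 * exp(pi i / 3)

def H(u0):
    """H(u0) from (8.294): 1/H = 4 pi |u0|^(1/2) |Ai(u0 e^{pi i/3})|^2."""
    ai, aip, bi, bip = airy(u0 * ROT)
    return 1.0 / (4 * np.pi * np.sqrt(abs(u0)) * abs(ai)**2)

def J(u0):
    """J(u0) from (8.295): 1/J = 4 pi |Ai'(u0 e^{pi i/3})|^2 / |u0|^(1/2)."""
    ai, aip, bi, bip = airy(u0 * ROT)
    return np.sqrt(abs(u0)) / (4 * np.pi * abs(aip)**2)

for u0 in (9.0, -9.0):
    # Leading behaviour predicted by (8.296)-(8.297).
    predicted = 1.0 if u0 > 0 else np.exp(-4.0 / 3.0 * abs(u0)**1.5)
    print(u0, H(u0), J(u0), predicted)
```

For $u_0 = 9$ both functions are close to 1, while for $u_0 = -9$ both are close to $\exp(-36)$, so that tunnelling through a strongly evanescent region is exponentially weak, as (8.299) asserts.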

When $|u_0|$ is not large, $\theta_i \approx \theta_c$ because of the huge factor $kN\rho_0$ in (8.287). Furthermore, the second and third terms of the denominator of (8.292) can be viewed as products of $(\tfrac12 kN\rho_0)^{-1/3}$ and a bounded quantity. They are therefore negligible provided that

$$(\tfrac12 kN\rho_0)^{1/3}\mu_1\cos\theta_c \gg \mu. \qquad (8.300)$$

Accordingly, for moderate values of $u_0$,

$$1 - |R_E|^2 \approx \frac{4N_1\mu\mu_1\cos\theta_i\,|\cos\theta_r|\,H(u_0)}{N\mu_1^2\cos^2\theta_c}. \qquad (8.301)$$

Since $\theta_i \approx \theta_c$, the denominator does not differ by much from $|N\mu_1\cos\theta_i + N_1\mu\cos\theta_r|^2/N$. On this basis (8.298), (8.299), and (8.301) may be combined into the single formula

$$1 - |R_E|^2 = \frac{4NN_1\mu\mu_1\cos\theta_i\,|\cos\theta_r|\,H(u_0)}{|N\mu_1\cos\theta_i + N_1\mu\cos\theta_r|^2} \qquad (8.302)$$

with tolerable accuracy over the whole range of $u_0$. There is a corresponding formula for $R_M$, deduced from (8.293) by exchanging $\mu, \mu_1$ and $\varepsilon, \varepsilon_1$ in (8.302). From (8.287), there is the simplification

$$u_0 \approx (\tfrac12 kN\rho_0)^{2/3}(\sin^2\theta_c - \sin^2\theta_i) \qquad (8.303)$$

when $\theta_i \approx \theta_c$. However, (8.303) must be dropped in favour of (8.287) when $\theta_i - \theta_c$ becomes increasingly positive; otherwise (8.302) will overestimate the amount of transmitted energy. While (8.302) is satisfactory for an incident wave that is $E$ polarized, some care is necessary when both polarizations exist, because then cross-multiplication may produce algebraic decay rather than exponential decay for $u_0 \ll -1$, and additional terms in (8.296) and (8.297) may be necessary.

For the special case of a circular optical fibre, one principal radius of curvature $b$ is infinite and the other is the radius $a$ of the cylinder. Let the $z$ axis be the axis of the cylinder and suppose that a typical ray in the cylinder travelling in the direction of $z$ increasing makes an angle $\theta_z$ with the $z$ axis. At its point of impact with the cylindrical boundary let the projection of the ray on the cross-section of the cylinder make an angle $\theta_s$ with the radius vector from the centre of the cross-section to the point of impact (Fig. 8.34). Then

$$\cos\theta_i = \sin\theta_z\cos\theta_s. \qquad (8.304)$$

Meridional rays pass through the axis of the cylinder and are such that $\theta_s = 0$, $\cos\theta_i = \sin\theta_z$. Rays in which $\theta_s \ne 0$ are called skew rays. To illustrate the procedure we shall concentrate solely on rays which are polarized so that (8.302) is applicable. For meridional rays $\rho_0/b$ is unity and $\rho_0$ is infinite. Thus $u_0$ will be positive or negative infinity according as $\theta_i$ is less or greater than $\theta_c$. Consequently, a meridional ray will either radiate energy by refraction as if at a plane interface ($\theta_i < \theta_c$) or radiate no energy at all ($\theta_i > \theta_c$), there being a virtually instantaneous transition when $\theta_i$ passes through $\theta_c$.

Fig. 8.34. Angles for propagation along a fibre.

If the initiating energy in a hollow cone between the angles $\theta_z$ and $\theta_z + d\theta_z$ is $2\pi G(\theta_z)\sin\theta_z\,d\theta_z$, the energy which leaves the fibre by refraction in a distance $z$ is

…

where $n$ is the number of reflections in a distance $z$, given by

$$n = \frac{z}{2a}\tan\theta_z.$$

If $N_1 > N$ the lower limit of integration is 0. Sources off the axis will be responsible for skew rays and maybe meridional rays. If a ray starts from $z = 0$ at the point whose polar coordinates are $(c, \chi)$ and goes in a direction making an angle $\theta_z$ with the cylindrical axis, with its projection on $z = 0$ having the inclination of the polar angle $\chi_0$, then

$$a\sin\theta_s = c\sin(\chi - \chi_0) \qquad (8.305)$$

at the point of reflection. Suppose that $G(\theta_z, c, \chi, \chi_0)$ is a measure of the incident energy along the ray, so that

$$G(\theta_z, c, \chi, \chi_0)\,c\,dc\,d\chi\,\sin\theta_z\,d\theta_z\,d\chi_0$$


is the energy launched from a small area about $(c, \chi)$ into an elemental solid angle about the specified direction of propagation. Then the energy which radiates from a length $z$ of the cylinder due to the reflection of this elementary beam is

…

where $\nu$ is the number of reflections, given by

$$\nu = \frac{z\tan\theta_z}{2a\cos\theta_s},$$

and $\theta_i$ is defined by (8.304). Moreover, from (8.287),

$$u_0 = (\tfrac12 kNa\,\mathrm{cosec}^2\theta_s\,\mathrm{cosec}^2\theta_z)^{2/3}\Big\{\Big(\frac{N_1}{N}\Big)^2 - 1 + \sin^2\theta_z\cos^2\theta_s\Big\}. \qquad (8.306)$$

The total radiation due to all rays is obtained by integration, in which it is advantageous to convert $\chi_0$ to $\theta_s$ via (8.305). Thus, if $\Gamma$ is the total radiation,

$$\Gamma = 4\int_0^{a}\int_0^{2\pi}\int_0^{\pi/2}\int_0^{\sin^{-1}(c/a)} G(\theta_z, c, \chi, \chi_0)\{1 - |R_E|^{2\nu}\}\frac{a\cos\theta_s}{(c^2 - a^2\sin^2\theta_s)^{1/2}}\,d\theta_s\,\sin\theta_z\,d\theta_z\,d\chi\,c\,dc. \qquad (8.307)$$

If, in particular, $G$ is independent of $c$ and $\chi_0$, put

$$\int_0^{2\pi}G(\theta_z, c, \chi, \chi_0)\,d\chi = 2G_0(\theta_z)$$

and then, because of the structure of $|R_E|$ and (8.306),

$$\Gamma = 8a^2\int_0^{\pi/2}\int_0^{\pi/2}G_0(\theta_z)\{1 - |R_E|^{2\nu}\}\cos^2\theta_s\,\sin\theta_z\,d\theta_z\,d\theta_s. \qquad (8.308)$$

The value $\theta_s = 0$ is admitted in (8.307) and (8.308), so that both include the radiation due to meridional rays. Also tunnelling is allowed for. The separate contributions from tunnelling and refraction can be assessed by limiting $\theta_s$ to a value less than that for which $N\cos\theta_s = (N^2 - N_1^2)^{1/2}\,\mathrm{cosec}\,\theta_z$ (this limit being $\tfrac12\pi$ if $N_1 > N$) in refraction, and letting $\theta_s$ range above this limit for tunnelling. Obviously, as soon as $\theta_s$ exceeds the limiting value by a significant amount, the contribution will be exponentially small on account of (8.299). Whether the rays near critical incidence will supply an appreciable effect depends on the relative sizes of various parameters, but they should not be ignored without calculation. The integrand of (8.308) decreases as $\theta_s$ increases, so that the rays which are not far off meridional tend to radiate more than the rays which are more skew. On the other hand, the integrand (apart from $G_0$) is an increasing function of $\theta_z$, with the consequence that isotropic sources will produce more radiated energy from their rays with larger values of $\theta_z$.
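Two geometric steps behind (8.305) and (8.308) can be checked directly. The Python sketch below traces the projection of a ray in the cross-section to its point of impact on $r = a$, and confirms that $a\sin\theta_s = c\,|\sin(\chi - \chi_0)|$, which is (8.305) up to the sign convention for $\theta_s$. It also evaluates the inner integral over $c$ that arises when the order of integration in (8.307) is exchanged, $\int_{a\sin\theta_s}^{a}c\,dc/(c^2 - a^2\sin^2\theta_s)^{1/2} = a\cos\theta_s$. Multiplied by the factor $a\cos\theta_s$ already present in (8.307), this produces the $a^2\cos^2\theta_s$ weighting for a source distribution independent of $c$ and $\chi_0$. The function names are illustrative, and SciPy's `quad` with its algebraic weight is used to handle the inverse-square-root endpoint singularity.

```python
import numpy as np
from scipy.integrate import quad

def impact_angle(a, c, chi, chi0):
    """Angle theta_s between a projected ray and the radius vector at impact.

    The projected ray starts at the polar point (c, chi), c < a, and travels in
    the direction of polar angle chi0 until it meets the circle r = a.
    """
    p = c * np.array([np.cos(chi), np.sin(chi)])    # starting point
    d = np.array([np.cos(chi0), np.sin(chi0)])      # unit direction
    b = p @ d
    t = -b + np.sqrt(b**2 - (p @ p - a**2))         # forward root of |p + t d| = a
    q = p + t * d                                   # point of impact
    return np.arccos(np.clip(q @ d / a, -1.0, 1.0))

def inner_c_integral(a, theta_s):
    """int_{a sin theta_s}^{a} c dc / (c^2 - a^2 sin^2 theta_s)^(1/2), 0 < theta_s < pi/2.

    The factor (c - s)^(-1/2) is supplied as QUADPACK's algebraic weight.
    """
    s = a * np.sin(theta_s)
    val, _ = quad(lambda c: c / np.sqrt(c + s), s, a, weight='alg', wvar=(-0.5, 0.0))
    return val

a, c, chi, chi0 = 1.0, 0.6, 0.4, 2.1
theta_s = impact_angle(a, c, chi, chi0)
print(a * np.sin(theta_s), abs(c * np.sin(chi - chi0)))   # equal, cf. (8.305)
print(inner_c_integral(a, 0.7), a * np.cos(0.7))          # equal
```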

Exercise 60. By assuming that the $z$ dependence of all fields is $\exp(-ikNz\cos\theta_z)$ in the fibre problem, deduce that when two-dimensional rays are refracted by a curve of radius of curvature $a$

…

with

$$u_0 = (\tfrac12 kN_2 a)^{2/3}\{(N_t/N_2)^2 - \sin^2\theta_s\}, \qquad N_2 = N\sin\theta_z,$$

and $\theta_s$ the (two-dimensional) angle of incidence. What is the analogous result for the other polarization?

9 SOURCE DETECTION

9.1 General considerations

The location of the origins of an electromagnetic disturbance can be of considerable importance, either because one wishes to know where a source of scattering or radiation is situated or because one wishes to eliminate its influence on other signals. A related problem of practical significance is the determination of the properties of a medium from its effect on a wave; there are obvious applications in the detection of minerals, in the investigation of materials, and in assessing the state of the atmosphere for propagation purposes. It is vital to get straight right away that problems of this type do not usually have a unique solution and that assertions about the location of sources are highly dependent upon the assumptions made in analysing the signals. A common supposition is that there is a closed surface S containing all the sources and that a homogeneous isotropic medium lies between S and the observer 0 (Fig. 9.1). The general representations (6.75) and (6.76), which are exact, demonstrate that the observed field is equivalent to that generated by a suitable distribution of sources on S. One can infer that these are apparent (i.e. not real) sources if one has a priori knowledge that the sources are not actually on S. However, a similar representation in terms of apparent sources exists for any closed surface surrounding S but excluding O. Therefore, even the determination of apparent sources is subject to a fair degree of ambiguity. The most that can be asserted is that, given S, the field can be accounted for by a certain distribution of sources on S.

Fig. 9.1. Surface containing sources.

One may also have to be equivocal about the source distribution because there are sources which do not radiate energy at infinity. To see this (Friedlander 1973), consider Maxwell's equations in a homogeneous isotropic dielectric

$$\operatorname{curl}\mathbf{E} + i\omega\mu\mathbf{H} = 0, \qquad \operatorname{curl}\mathbf{H} - i\omega\varepsilon\mathbf{E} = \mathbf{j} \qquad (9.1)$$

where the current density $\mathbf{j}$ is reasonably smooth and vanishes outside a finite region. Apply the operator $(\operatorname{curl}\operatorname{curl} - k^2)$, where $k^2 = \omega^2\mu\varepsilon$, to the current density $\mathbf{j}$ of (9.1). Then a current density is produced which supplies an electromagnetic field in which the electric intensity is $-i\omega\mu\mathbf{j}$ and the magnetic intensity is $\operatorname{curl}\mathbf{j}$. Since $\mathbf{j}$ vanishes outside a finite region, so does this electromagnetic field, and so a source distribution which is responsible for a non-radiating field has been created. It has been proved (Bleistein and Cohen 1977) that, if $\mathbf{j}$ is zero outside a finite domain, eqns (9.1) possess a non-radiating solution if and only if

(9.2)

where $c = 1/(\mu\varepsilon)^{1/2}$,

$$\hat{\mathbf{j}}(\boldsymbol{\kappa}, \omega) = \int\mathbf{j}(\mathbf{x}, \omega)\exp(i\boldsymbol{\kappa}\cdot\mathbf{x})\,d\mathbf{x} \qquad (9.3)$$

and the integration in (9.3) is over the whole space. Convergence of (9.3) is immediate, since $\mathbf{j}$ is non-zero only in a finite domain. Current densities for which (9.2) is satisfied need to be treated with care.

It must therefore be recognized that the formulation for source detection may often not yield a unique solution and that, even when additional information is available, the difficulty may not be resolved. For instance, suppose it is alleged that the sources are to be regarded as radiating in free space and their positions are to be determined from distant observations by tracing rays backwards. Then consider the situation in which there is a point source $M$ in a dielectric of higher refractive index than where the observer is located, the interface between the two being a plane (Fig. 9.2). The observer will place the apparent source at the point where nearby rays focus, i.e. on the caustic of the refracted rays. An observer at $O$ therefore puts the source at $M'$, where $OM'$ is tangent to the caustic. Thus the apparent location of the source depends upon the position of the observer, and its apparent direction from the observer will not be that of the true source except at normal incidence. Only detailed information about the medium containing the source would allow any likelihood of detecting the real source from the caustic traced by the apparent sources.

Additional knowledge may be forthcoming from other approaches. For example, by probing at other frequencies, say optical or infrared, it may be possible to deduce properties of the medium; or again, a source might give out a variety of disturbances, some of which are more easily analysed than others. However, one does need to be convinced that it is the same source which emits the different kinds of radiation before this latter avenue is open. The lack of uniqueness may appear, at first sight, to contradict everyday experience, because telescopes and eyes seem to supply unambiguous images (except maybe in optical illusions). However, objects which convey sharp images usually do so by radiation which is incoherent in space. Highly coherent radiation from lasers is often afflicted by speckle, and good visual holograms usually demand that the illumination be sufficiently diffuse. Similarly, polished objects viewed by specularly reflected light are often difficult to recognize. The knowledge that the radiation from a source is spatially incoherent can be a valuable piece of information. Even if the theory predicted a unique solution, enough data, as uncontaminated by noise as practicable, have to be collected for the application of the theory in a numerically sound fashion. It is clear therefore that the detection of sources can be fraught with anxiety for the experimenter, and a considerable body of experience and skill may have to be built up before interpretations of physical significance can be accepted with confidence.

Fig. 9.2. An observer at $O$ places $M$ at $M'$.
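The construction of a non-radiating source given with (9.1) can be verified symbolically. The Python sketch below (using SymPy) takes an arbitrary smooth current density $\mathbf{j}$, forms the new source $(\operatorname{curl}\operatorname{curl} - k^2)\mathbf{j}$, and checks that $\mathbf{E} = -i\omega\mu\mathbf{j}$, $\mathbf{H} = \operatorname{curl}\mathbf{j}$ satisfy both equations of (9.1) with that source. The particular $\mathbf{j}$ is an illustrative assumption: it decays rapidly rather than vanishing outside a finite region, which does not affect the algebraic identity being checked.

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
omega, mu, eps = sp.symbols('omega mu epsilon', positive=True)
k2 = omega**2 * mu * eps                        # k^2 = omega^2 mu epsilon

def curl(F):
    """Cartesian curl of a 3-component SymPy column Matrix."""
    Fx, Fy, Fz = F
    return sp.Matrix([sp.diff(Fz, y) - sp.diff(Fy, z),
                      sp.diff(Fx, z) - sp.diff(Fz, x),
                      sp.diff(Fy, x) - sp.diff(Fx, y)])

# Illustrative smooth current density (Gaussian decay stands in for compact support).
g = sp.exp(-(x**2 + y**2 + z**2))
j = sp.Matrix([y * g, x * z * g, (1 + x) * g])

j_new = curl(curl(j)) - k2 * j                  # (curl curl - k^2) applied to j
E = -sp.I * omega * mu * j                      # claimed electric intensity
H = curl(j)                                     # claimed magnetic intensity

faraday = (curl(E) + sp.I * omega * mu * H).applyfunc(sp.simplify)
ampere = (curl(H) - sp.I * omega * eps * E - j_new).applyfunc(sp.simplify)
print(faraday.T, ampere.T)                      # both rows of zeros
```

Because $\mathbf{E}$ and $\mathbf{H}$ are built from $\mathbf{j}$ itself, they vanish wherever $\mathbf{j}$ does, which is exactly why no energy reaches infinity.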

INVERSE SCATTERING

During the past decade numerous theorems and methods for the problem of inverse scattering have been published, e.g. Roger (1981), Sleeman (1982), Colton and Kress (1983), Colton and Sleeman (1983), Jones (1985), Colton and Monk (1985, 1986, 1987, 1990, 1992), Kirsch and Kress (1987a, 1987b), Kirsch et al. (1988), Jones and Mao (1989), Zion (1989), Ari and Firth (1990), Sylvester and Uhlmann (1990), Bates et al. (1991), Colton and Paivarinta (1991), Wombell and Murch (1992). It is impossible to do justice to all these developments here, and so attention will be confined to a couple of simple approaches which, none the less, are of value in practical circumstances.

9.2 Low frequencies

The first problem to be investigated is that in which it is known that a far field has been scattered by a perfectly conducting obstacle in an otherwise

homogeneous isotropic dielectric, but the shape and location of the scatterer are to be found. It will be sufficient for illustrative purposes to consider the two-dimensional case; the discussion can easily be extended in principle to three-dimensional problems (Imbriale and Mittra 1970). Let the scatterer have generators parallel to the $z$ axis and be irradiated by a plane wave with its electric intensity perpendicular to the $(x, y)$ plane. Denoting the electric intensity by $E$ and choosing the $x$ axis in the direction of propagation of the plane wave, we have $E^i = \exp(-ikx)$ in the incident wave, $k$ being the wavenumber of the medium. A scattered field $E^s$ will be produced such that $E^i + E^s$ is zero on the obstacle and

$$E^s \sim \Big(\frac{2}{\pi kr}\Big)^{1/2}A(\phi)\exp\{-i(kr - \tfrac14\pi)\} \qquad (9.4)$$

as the distance $r$ from the origin increases, $\phi$ being the polar angle which is zero on the $x$ axis. The pattern function $A(\phi)$ is supposed to be known, and from it the position of the scatterer is to be determined. Write $A(\phi) = \sum_n a_n\exp(in\phi)$. The coefficients $a_n$ can be calculated by standard Fourier analysis (cf. §§1.5, 2.14) and, in practice, the expansion is limited to a finite number of terms. Outside the minimum radius $r = a$ of the circle enclosing the obstacle (Fig. 9.3) it is known that (9.4) is the asymptotic expansion of

$$E^s = \sum_n a_n\exp(-\tfrac12 n\pi i)H_n^{(2)}(kr)\exp(in\phi) \qquad (9.5)$$

where $H_n^{(2)}$ is the Hankel function of the second kind. Actually, $a$ is not given because the location of the scatterer is unknown. However, if $E^i + E^s$ is calculated from (9.5) for decreasing $r$, a value of $r$ will be reached at which $E^i + E^s$ is zero at a point on the circumference of the circle. This value of $r$ will be adjudged to be $a$, and the point to be the point of contact between the minimum circle and the perimeter of the obstacle. In practice, a zero will rarely be achieved precisely, and it will be necessary to be satisfied when $|E^s + E^i|$ is less than some pre-assigned tolerance (which must not be tighter than the data can attain). To discover another point on the perimeter of the body, select a new origin $(r_0, \phi_0)$ and repeat the process. The new pattern function $A'(\phi)$ is given by

$$A'(\phi) = A(\phi)\exp\{-ikr_0\cos(\phi - \phi_0)\} = \sum_n a_n'\exp(in\phi)$$

where

$$a_n' = \sum_m a_m J_{m-n}(kr_0)\exp\{i(n - m)(\tfrac12\pi - \phi_0)\}.$$

Then the minimum radius $r = b$ (Fig. 9.3) is obtained via (9.5) with $a_n'$ for $a_n$ and $r, \phi$ measured from the new origin. The above procedure is generally satisfactory for single convex scatterers. For more complicated obstacles it may be supplemented by drawing a circle of radius $d$ and centre $(r_0, \phi_0)$ whose interior lies entirely outside $r = a$. Inside this new circle

$$E^s = \sum_n c_n J_n(kr')\exp(in\phi')$$

where $r', \phi'$ are measured from $(r_0, \phi_0)$ and

$$c_n = \sum_m a_m H_{n-m}^{(2)}(kr_0)\exp\{i(n - m)(\pi - \phi_0) - \tfrac12 m\pi i\}.$$

Now increase $r'$ until a zero of $E^i + E^s$ is discovered; this corresponds to another point on the perimeter. If $A(\phi)$ is not given for the whole range $|\phi| \le \pi$ but only for $\phi_1 \le \phi \le \phi_2$, say, then an approximation $A_n$ to $a_n$ is found by minimizing (§1.5)

$$\int_{\phi_1}^{\phi_2}\Big|A(\phi) - \sum_n A_n\exp(in\phi)\Big|^2\,d\phi.$$

Thereafter, the procedure is followed as before with $A_n$ substituting for $a_n$. It must be realized that it is rarely possible to make observations as complete as required by theory. Measurements can usually be made at only a discrete number of angles and have to be extended by interpolation or other devices so as to cover a continuous range. Variations in time and wavenumber will be similarly incomplete. Therefore estimates deduced from real data are bound to be in error, as well as suffering from the uncertainties (which cannot be avoided) due to the noise inherent in any measurement. So predictions in inverse scattering theory are always highly idealized. The procedure seems to work tolerably well when a typical dimension of the scatterer is somewhat less than a wavelength, but at higher frequencies the computation time becomes excessive and the method of the next section should be tried.

Fig. 9.3. Geometry for inverse scattering.
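The shrinking-circle procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the scattered field is taken as $E^s = \sum_n a_n\exp(-\tfrac12 n\pi i)H_n^{(2)}(kr)\exp(in\phi)$, the Hankel expansion that reproduces (9.4); the angular grid, radial step and tolerance are illustrative choices; and the function names (`total_field`, `first_contact`, `shifted_coefficients`) are hypothetical. The coefficients for a shifted origin are built from the Jacobi–Anger expansion of $\exp\{-ikr_0\cos(\phi - \phi_0)\}$. The test data are the exact pattern coefficients of a perfectly conducting circular cylinder of radius $1/k$ (Exercise 1 below), for which $a_n = -J_n(ka)/H_n^{(2)}(ka)$.

```python
import numpy as np
from scipy.special import jv, hankel2

def total_field(a_n, orders, k, r, phi):
    """E^i + E^s on the circle of radius r, taking
    E^s = sum_n a_n exp(-i n pi/2) H_n^(2)(kr) exp(i n phi)."""
    e_inc = np.exp(-1j * k * r * np.cos(phi))             # exp(-ikx), x = r cos(phi)
    weights = a_n * np.exp(-0.5j * np.pi * orders) * hankel2(orders, k * r)
    return e_inc + weights @ np.exp(1j * np.outer(orders, phi))

def first_contact(a_n, orders, k, radii, tol, n_phi=360):
    """Shrink the circle (radii must decrease) until |E^i + E^s| < tol at some
    angle; return (r, phi) of that point, or None if the tolerance is never met."""
    phi = np.linspace(-np.pi, np.pi, n_phi, endpoint=False)
    for r in radii:
        mag = np.abs(total_field(a_n, orders, k, r, phi))
        i = int(np.argmin(mag))
        if mag[i] < tol:
            return r, phi[i]
    return None

def shifted_coefficients(a_n, orders, k, r0, phi0, n_max):
    """Coefficients a'_n (|n| <= n_max) of A(phi) exp{-i k r0 cos(phi - phi0)},
    obtained from the Jacobi-Anger expansion of the exponential."""
    n = np.arange(-n_max, n_max + 1)
    diff = n[:, None] - orders[None, :]                    # n - m
    kernel = jv(-diff, k * r0) * np.exp(1j * diff * (0.5 * np.pi - phi0))
    return n, kernel @ a_n

# Exact pattern coefficients of a perfectly conducting circular cylinder of
# radius R = 1/k centred at the origin (data for Exercise 1).
k, R = 1.0, 1.0
orders = np.arange(-20, 21)
a_n = -jv(orders, k * R) / hankel2(orders, k * R)
r_hit, phi_hit = first_contact(a_n, orders, k, np.arange(1.5, 0.8, -0.001), tol=1e-3)
print(r_hit, phi_hit)          # r_hit slightly larger than R
```

Because the tolerance is met slightly before the true contact, the detected radius overestimates $a$ by an amount that grows as the tolerance is relaxed; this is the practical counterpart of the remark that the tolerance must not be tighter than the data can attain.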

Exercises 1. From the known analytical solution for a circular cylinder of radius $1/k$, use the above procedure to locate 8–12 points on the boundary. Repeat the calculation when the far-field data are restricted to $|\phi| \le 120°$. 2. By means of one of the methods described in earlier chapters, calculate the far field scattered by an elliptic cylinder with semi-axes $1/k$ and $1/(2k)$. Use this as data for the inverse problem and examine how much inaccuracy is introduced if the data are restricted to $|\phi| \le \tfrac12\pi$. Does the orientation of the cylinder have much effect? 3. Repeat Exercise 2 for two circular cylinders, each of radius $1/k$, whose centres are separated by $6/k$, the line of centres being perpendicular to the direction of propagation of the incident plane wave. 4. Formulate the procedure for three dimensions and test it on a sphere and a spheroid.

9.3 High frequencies

At high frequencies the location of a perfectly conducting convex obstacle or target is assisted by the approximation of physical optics. The representation of the field is

$$\mathbf{E}^s(P) = \frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\int_S\mathbf{n}_q\wedge\mathbf{H}\,\psi(P, q)\,dS_q,$$

where

$$\psi(P, q) = \frac{\exp(-ik|\mathbf{x}_p - \mathbf{x}_q|)}{4\pi|\mathbf{x}_p - \mathbf{x}_q|}$$

and $\mathbf{n}_q$ is the unit normal from the interior of $S$ to the exterior. $\mathbf{H}$ is not known exactly on $S$ but, at high frequencies, the reasonable approximation can be made that each small portion will be subject to the incident wave and a secondary field generated by the rest of the surface. To a first approximation the secondary field may be neglected. Thus we may suppose that an incident plane wave illuminates a section of $S$ called the lit side and denoted by $L$ in Fig. 9.4. The remainder of $S$ is dark and denoted by $D$. On $L$, $\mathbf{n}\cdot\mathbf{n}_q < 0$, where $\mathbf{n}$ is the unit vector in the direction of propagation of the plane wave. In the shadow region $D$ the current distribution is assumed to be zero, and on the lit side $L$ the approximation

$$\mathbf{n}_q\wedge\mathbf{H} = 2\mathbf{n}_q\wedge\mathbf{H}^i$$

is introduced. Thus

$$\mathbf{E}^s(P) = \frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\int_L 2\mathbf{n}_q\wedge\mathbf{H}^i\,\psi(P, q)\,dS_q,$$

Fig. 9.4. The lit and dark portions of the target.

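with the result that at distant points

$$\mathbf{E}^s(\mathbf{x}) \sim \frac{k^2\exp(-ik|\mathbf{x}|)}{2\pi\varepsilon|\mathbf{x}|}\int_L\{\mathbf{P} - (\mathbf{P}\cdot\hat{\mathbf{x}})\hat{\mathbf{x}}\}\exp(ik\hat{\mathbf{x}}\cdot\mathbf{x}_q)\,dS_q$$

where $\hat{\mathbf{x}}$ is a unit vector in the direction of $\mathbf{x}$ and $i\omega\mathbf{P} = \mathbf{n}_q\wedge\mathbf{H}^i$. Write

$$\mathbf{E}^s(\mathbf{x}) \sim \mathbf{A}(\mathbf{n}, \hat{\mathbf{x}})\frac{\exp(-ik|\mathbf{x}|)}{|\mathbf{x}|}$$

so that

$$\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) = \frac{\omega k}{2\pi}\Big(\frac{\mu}{\varepsilon}\Big)^{1/2}\int_L\{\mathbf{P} - (\mathbf{P}\cdot\hat{\mathbf{x}})\hat{\mathbf{x}}\}\exp(ik\hat{\mathbf{x}}\cdot\mathbf{x}_q)\,dS_q.$$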

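Let the incident plane wave be

$$\mathbf{E}^i = \mathbf{e}_0\exp(-ik\mathbf{n}\cdot\mathbf{x}), \qquad \mathbf{H}^i = \Big(\frac{\varepsilon}{\mu}\Big)^{1/2}\mathbf{n}\wedge\mathbf{e}_0\exp(-ik\mathbf{n}\cdot\mathbf{x})$$

where $\mathbf{e}_0$ is a fixed unit vector such that $\mathbf{n}\cdot\mathbf{e}_0 = 0$. Then

$$\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) = \frac{k}{2\pi i}\int_L[(\mathbf{n}_q\cdot\mathbf{e}_0)\mathbf{n} - (\mathbf{n}_q\cdot\mathbf{n})\mathbf{e}_0 - \{(\mathbf{n}_q\cdot\mathbf{e}_0)(\mathbf{n}\cdot\hat{\mathbf{x}}) - (\mathbf{n}_q\cdot\mathbf{n})(\mathbf{e}_0\cdot\hat{\mathbf{x}})\}\hat{\mathbf{x}}]\exp\{ik\mathbf{x}_q\cdot(\hat{\mathbf{x}} - \mathbf{n})\}\,dS_q.$$

Now irradiate the target with the plane wave $\mathbf{e}_0\exp(ik\mathbf{n}\cdot\mathbf{x})$, travelling in the opposite direction to the previous one. Then $L$ and $D$ interchange roles, so that

$$\mathbf{A}(-\mathbf{n}, \hat{\mathbf{x}}) = -\frac{k}{2\pi i}\int_D[(\mathbf{n}_q\cdot\mathbf{e}_0)\mathbf{n} - (\mathbf{n}_q\cdot\mathbf{n})\mathbf{e}_0 - \{(\mathbf{n}_q\cdot\mathbf{e}_0)(\mathbf{n}\cdot\hat{\mathbf{x}}) - (\mathbf{n}_q\cdot\mathbf{n})(\mathbf{e}_0\cdot\hat{\mathbf{x}})\}\hat{\mathbf{x}}]\exp\{ik\mathbf{x}_q\cdot(\hat{\mathbf{x}} + \mathbf{n})\}\,dS_q.$$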

Hence

$$\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) + \{\mathbf{A}(-\mathbf{n}, -\hat{\mathbf{x}})\}^* = \frac{k}{2\pi i}\int_S[(\mathbf{n}_q\cdot\mathbf{e}_0)\mathbf{n} - (\mathbf{n}_q\cdot\mathbf{n})\mathbf{e}_0 - \{(\mathbf{n}_q\cdot\mathbf{e}_0)(\mathbf{n}\cdot\hat{\mathbf{x}}) - (\mathbf{n}_q\cdot\mathbf{n})(\mathbf{e}_0\cdot\hat{\mathbf{x}})\}\hat{\mathbf{x}}]\exp\{ik\mathbf{x}_q\cdot(\hat{\mathbf{x}} - \mathbf{n})\}\,dS_q \qquad (9.6)$$

where the asterisk indicates a complex conjugate. Therefore

$$\mathbf{n}\cdot[\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) + \{\mathbf{A}(-\mathbf{n}, -\hat{\mathbf{x}})\}^*] = \frac{k}{2\pi i}\int_S\mathbf{n}_q\cdot[\{1 - (\mathbf{n}\cdot\hat{\mathbf{x}})^2\}\mathbf{e}_0 + (\mathbf{e}_0\cdot\hat{\mathbf{x}})(\mathbf{n}\cdot\hat{\mathbf{x}})\mathbf{n}]\exp\{ik\mathbf{x}_q\cdot(\hat{\mathbf{x}} - \mathbf{n})\}\,dS_q.$$

By the divergence theorem

$$\mathbf{n}\cdot[\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) + \{\mathbf{A}(-\mathbf{n}, -\hat{\mathbf{x}})\}^*] = \frac{k^2}{2\pi}(\mathbf{e}_0\cdot\hat{\mathbf{x}})(1 - \mathbf{n}\cdot\hat{\mathbf{x}})\int_V\exp\{ik\mathbf{x}_q\cdot(\hat{\mathbf{x}} - \mathbf{n})\}\,dV \qquad (9.7)$$

where $V$ is the volume enclosed by $S$. Eqn (9.7) constitutes a basic identity (Bojarski 1967; Lewis 1969; Perry 1974; Bleistein and Bojarski 1975) from which the shape of the obstacle is obtained. Let $H_V(\mathbf{x})$ be unity when $\mathbf{x}$ is in $V$ and zero elsewhere. Then (9.7) can be written as

$$B = \int H_V(\mathbf{x})\exp(-i\boldsymbol{\xi}\cdot\mathbf{x})\,d\mathbf{x} \qquad (9.8)$$

where

$$B = \frac{2\pi\,\mathbf{n}\cdot[\mathbf{A}(\mathbf{n}, \hat{\mathbf{x}}) + \{\mathbf{A}(-\mathbf{n}, -\hat{\mathbf{x}})\}^*]}{k^2(\mathbf{e}_0\cdot\hat{\mathbf{x}})(1 - \mathbf{n}\cdot\hat{\mathbf{x}})}, \qquad \boldsymbol{\xi} = k(\mathbf{n} - \hat{\mathbf{x}}),$$

and the integration in (9.8) is over all space. Regarding $B$ as known from far-field measurements, we can view (9.8) as giving the Fourier transform of $H_V$, i.e. $H_V$ can be found from

$$H_V(\mathbf{x}) = \frac{1}{8\pi^3}\int_{-\infty}^{\infty}B\exp(i\boldsymbol{\xi}\cdot\mathbf{x})\,d\boldsymbol{\xi}. \qquad (9.9)$$

Consequently $H_V$ can be determined, and the location of the target thereby discovered, provided that $B$ can be calculated for all $\boldsymbol{\xi}$. All values of $\boldsymbol{\xi}$ can be covered by picking $\hat{\mathbf{x}}$ to differ from $\mathbf{n}$ by a small vector in a chosen direction and then allowing $k$ to range over large values. The advantage of this choice is that physical optics can be expected to be a good approximation at high frequencies for observations within a small cone about the direction of propagation of the incident plane wave. Alternatively, we can set $\hat{\mathbf{x}} = -\mathbf{n}$, so that only observations of backscattering are involved, and again physical optics may be anticipated to furnish good results. Unfortunately, (9.7) is then nugatory, since both sides vanish identically. However, by scalar multiplication of (9.6) by $\mathbf{e}_0$ we can still retain (9.8) by putting

$$B = \frac{2\pi\,\mathbf{e}_0\cdot[\mathbf{A}(\mathbf{n}, -\mathbf{n}) + \{\mathbf{A}(-\mathbf{n}, \mathbf{n})\}^*]}{k^2}. \qquad (9.10)$$

Nevertheless, a large number of measurements of the amplitude and phase of the scattered wave will still be required. A check on accuracy is provided by the fact that strictly $H_V$ should take only the values 0 or 1. If, therefore, (9.9) supplies an $H_V$ which differs significantly from either of these values, it is clear that errors have been committed. In particular, this will be true if the evaluation of the integral in (9.8) leads to an appreciable imaginary part. Actually, it is easier to compute $\mathbf{n}\cdot\operatorname{grad}H_V$, in which the integrand of (9.8) is multiplied by $i\boldsymbol{\xi}\cdot\mathbf{n}$, eliminating a small denominator in $B$ when $\mathbf{n}$ is near $\hat{\mathbf{x}}$. Since $\operatorname{grad}H_V$ vanishes except on $S$, a trace of the boundary is obtained. Usually, real data will be band limited, so that the resolution of the boundary will be no better than a half wavelength.

An obvious disadvantage of the above technique is the requirement to make observations for all values of $\boldsymbol{\xi}$. The procedure can be adapted to obviate this difficulty at the cost of solving an integral equation. Suppose that measurements are possible only for … $(t < 0)$. (9.19)

Substitution in (9.18) supplies

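$$\mathbf{e}_0\cdot\mathbf{e}(\hat{\mathbf{n}}, \hat{\mathbf{x}}, t) = -\frac{1}{2\pi c^2}\int_0^\infty\{f(t - \tau)\,b(\hat{\mathbf{n}}, \tau) + f(t + \tau)\,b(-\hat{\mathbf{n}}, \tau)\}\,d\tau. \qquad (9.20)$$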

The left-hand side is known by measurement and $f$ is prescribed by the incident wave, so that (9.20) constitutes an integral equation to determine $b$. Once it is solved, the position of the obstacle is provided by (9.15). Should the incident source be impulsive, (9.20) does not have to be solved, because $\mathbf{e}$ can be measured directly and $b$ found from (9.19).

Exercises

10. If $f(t) = \delta'(t)$, find $b$ in terms of $\mathbf{e}$ and see what formula results for $\operatorname{grad}\gamma_T(\mathbf{x})$. 11. Locate a sphere by examining the scattering in the time domain of (a) an impulsive plane wave, (b) a Gaussian plane wave pulse.

9.5 Moving targets

The exact formula (9.16) refers to a target which is stationary with respect to the source of the irradiating field. If the source or target is moving, it needs modification. The case of an obstacle travelling with constant velocity while the source is fixed will be considered here. Formulae when the roles of the obstacle and source are interchanged can be dealt with by a simple transformation. If, however, different parts of the system possess different velocities, a more elaborate attack has to be mounted (Jones 1964, 1977, Ffowcs Williams and Hawkings 1969). Let $\mathbf{x}, t$ be the coordinate system in which the sources are at rest and $\mathbf{x}', t'$ a system in which the target is stationary. The two coordinate systems are related by a Lorentz transformation. It will be assumed that the spatial axes of the two systems are aligned and that the target moves along the positive $x_1$ axis with speed $v$. By writing formulae in terms of the velocity $\mathbf{v}$, however, the special choice of the axes can be eliminated.


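The Lorentz transformation can be expressed as

$$x_1' = \beta(x_1 - vt), \qquad t' = \beta\left(t - \frac{v x_1}{c^2}\right), \qquad (9.21)$$

$$x_1 = \beta(x_1' + vt'), \qquad t = \beta\left(t' + \frac{v x_1'}{c^2}\right), \qquad (9.22)$$

with $x_2 = x_2'$, $x_3 = x_3'$ and $\beta = (1 - v^2/c^2)^{-1/2}$.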

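In the $\mathbf{x}, t$ system the incident wave is

$$\mathbf{e}^i = \mathbf{e}_0 f\left(t - \frac{\hat{\mathbf{n}}\cdot\mathbf{x}}{c}\right), \qquad \mathbf{h}^i = \left(\frac{\varepsilon}{\mu}\right)^{1/2}\hat{\mathbf{n}}\wedge\mathbf{e}^i.$$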

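On substitution from (9.22)

$$f\left(t - \frac{\hat{\mathbf{n}}\cdot\mathbf{x}}{c}\right) = f\left\{\gamma\left(t' - \frac{\hat{\mathbf{n}}'\cdot\mathbf{x}'}{c}\right)\right\} = g\left(t' - \frac{\hat{\mathbf{n}}'\cdot\mathbf{x}'}{c}\right),$$

where

$$\gamma = \beta\left(1 - \frac{\hat{\mathbf{n}}\cdot\mathbf{v}}{c}\right), \qquad n_1' = \frac{\beta(n_1 - v/c)}{\gamma}, \qquad n_2' = \frac{n_2}{\gamma}, \qquad n_3' = \frac{n_3}{\gamma}. \qquad (9.23)$$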

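In the $\mathbf{x}', t'$ system the incident wave is

$$\mathbf{e}^{i\prime} = \beta\mathbf{e}^i + (1 - \beta)\frac{(\mathbf{e}^i\cdot\mathbf{v})\mathbf{v}}{v^2} + \beta\mu\,\mathbf{v}\wedge\mathbf{h}^i = \mathbf{e}_0'\,g\left(t' - \frac{\hat{\mathbf{n}}'\cdot\mathbf{x}'}{c}\right),$$

where

$$\mathbf{e}_0' = \gamma\mathbf{e}_0 + (\mathbf{e}_0\cdot\mathbf{v})\left\{(1 - \beta)\frac{\mathbf{v}}{v^2} + \frac{\beta}{c}\hat{\mathbf{n}}\right\}. \qquad (9.24)$$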
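As a numerical check on (9.22)-(9.24), the following sketch maps an incident plane wave into the rest frame of the target. It is an illustration under stated assumptions, not part of the original analysis: units are chosen so that $c = 1$, the target moves along $+x_1$ with speed $v \neq 0$, and the function names are hypothetical. It confirms that $\hat{\mathbf{n}}'$ is a unit vector, that $\mathbf{e}_0'$ stays perpendicular to it, that the amplitude is scaled by the Doppler factor $\gamma$, and that the phase satisfies $t - \hat{\mathbf{n}}\cdot\mathbf{x}/c = \gamma(t' - \hat{\mathbf{n}}'\cdot\mathbf{x}'/c)$.

```python
import numpy as np

C = 1.0  # assumption: units chosen so that the wave speed c = 1

def lorentz_factor(v):
    return 1.0 / np.sqrt(1.0 - (v / C) ** 2)

def incident_in_target_frame(n_hat, e0, v):
    """Apply (9.23) and (9.24) for a target moving with speed v along +x1.
    Returns (gamma, n_prime, e0_prime); requires v != 0."""
    n_hat = np.asarray(n_hat, dtype=float)
    e0 = np.asarray(e0, dtype=float)
    v_vec = np.array([v, 0.0, 0.0])
    beta = lorentz_factor(v)
    gamma = beta * (1.0 - n_hat @ v_vec / C)                       # Doppler factor
    n_prime = np.array([beta * (n_hat[0] - v / C), n_hat[1], n_hat[2]]) / gamma
    e0_prime = gamma * e0 + (e0 @ v_vec) * ((1.0 - beta) * v_vec / v**2 + beta * n_hat / C)
    return gamma, n_prime, e0_prime

def source_frame_event(t_p, x_p, v):
    """Map an event (t', x') in the target frame to (t, x) using (9.22)."""
    beta = lorentz_factor(v)
    x = np.array([beta * (x_p[0] + v * t_p), x_p[1], x_p[2]])
    t = beta * (t_p + v * x_p[0] / C**2)
    return t, x

# Oblique incidence, polarisation in the plane containing v and n_hat.
alpha, v = 0.7, 0.6
n_hat = np.array([np.cos(alpha), np.sin(alpha), 0.0])
e0 = np.array([-np.sin(alpha), np.cos(alpha), 0.0])
gamma, n_prime, e0_prime = incident_in_target_frame(n_hat, e0, v)

# Phase of the incident wave evaluated in both frames at the same event.
t_p, x_p = 1.3, np.array([0.4, -2.0, 0.5])
t, x = source_frame_event(t_p, x_p, v)
phase_source = t - n_hat @ x / C
phase_target = gamma * (t_p - n_prime @ x_p / C)
print(gamma, n_prime, e0_prime, phase_source - phase_target)
```

For a polarisation perpendicular to $\mathbf{v}$ (for example $\mathbf{e}_0$ along $x_3$), the same function returns $\mathbf{e}_0' = \gamma\mathbf{e}_0$, in line with the remark that the polarization direction is altered only when $\mathbf{e}_0\cdot\mathbf{v} \neq 0$.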

Thus, if the incident plane wave has angular frequency $\omega$ in the $\mathbf{x}, t$ space, it appears to have angular frequency $\gamma\omega$ in the system in which the target is stationary. This change in frequency represents the Doppler shift due to motion. In addition, the apparent direction of propagation is $\hat{\mathbf{n}}'$ instead of $\hat{\mathbf{n}}$. As is clear from (9.24), the polarization of the wave is also altered (unless $\mathbf{e}_0$ is perpendicular to $\mathbf{v}$); this is to accommodate not only the change in direction of propagation but also the interdependence of electric and magnetic fields. Indicate explicitly the dependence of $\mathbf{A}$ on the polarization by writing $\mathbf{A}(\mathbf{e}_0, \hat{\mathbf{n}}, \hat{\mathbf{x}}, \omega)$. In the $\mathbf{x}', t'$ system the obstacle is motionless, so that (9.16) can be applied. Hence

$$\mathbf{e}^{s\prime}(\mathbf{x}', t') = \frac{\mathbf{e}'(\hat{\mathbf{n}}', \hat{\mathbf{x}}', t' - |\mathbf{x}'|/c)}{|\mathbf{x}'|},$$

where

$$\mathbf{e}'(\hat{\mathbf{n}}', \hat{\mathbf{x}}', t') = \frac{1}{2\pi}\int_{-\infty}^{\infty}G(\omega)\,\mathbf{A}(\mathbf{e}_0', \hat{\mathbf{n}}', \hat{\mathbf{x}}', \omega)\exp(i\omega t')\,d\omega.$$


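Now $\gamma G(\omega) = F(\omega/\gamma)$ and, from (9.21),

$$t' - \frac{|\mathbf{x}'|}{c} = \frac{1}{\gamma_1}\left(t - \frac{|\mathbf{x}|}{c}\right),$$

where

$$\gamma_1 = \beta\left(1 - \frac{\hat{\mathbf{x}}\cdot\mathbf{v}}{c}\right), \qquad \hat{x}_1' = \frac{\beta(\hat{x}_1 - v/c)}{\gamma_1}, \qquad \hat{x}_2' = \frac{\hat{x}_2}{\gamma_1}, \qquad \hat{x}_3' = \frac{\hat{x}_3}{\gamma_1}, \qquad (9.25)$$

in analogy with (9.23). Hence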

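$$\mathbf{e}'\left(\hat{\mathbf{n}}', \hat{\mathbf{x}}', t' - \frac{|\mathbf{x}'|}{c}\right) = \mathbf{e}_1\left(\hat{\mathbf{n}}', \hat{\mathbf{x}}', t - \frac{|\mathbf{x}|}{c}\right)$$

with

$$\mathbf{e}_1(\hat{\mathbf{n}}', \hat{\mathbf{x}}', t) = \frac{\gamma_1}{2\pi\gamma}\int_{-\infty}^{\infty}F\left(\frac{\gamma_1\omega}{\gamma}\right)\mathbf{A}(\mathbf{e}_0', \hat{\mathbf{n}}', \hat{\mathbf{x}}', \gamma_1\omega)\exp(i\omega t)\,d\omega. \qquad (9.26)$$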

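The distant field observed in the $\mathbf{x}, t$ system is now obtained from

$$\mathbf{e}^s(\mathbf{x}, t) = \beta\mathbf{e}^{s\prime} + (1 - \beta)\frac{(\mathbf{e}^{s\prime}\cdot\mathbf{v})\mathbf{v}}{v^2} - \frac{\beta}{c}\,\mathbf{v}\wedge(\hat{\mathbf{x}}'\wedge\mathbf{e}^{s\prime}). \qquad (9.27)$$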

Comparison of (9.26) with (9.16) reveals that (9.26) reduces to (9.16) when the target is immobile, so that $v = 0$, $\beta = \gamma = \gamma_1 = 1$, $\hat{\mathbf{x}} = \hat{\mathbf{x}}'$ and $\hat{\mathbf{n}} = \hat{\mathbf{n}}'$. Also (9.26) displays the effect of motion in four places. There is a frequency shift in the incident signal waveform due to the factor $\gamma_1/\gamma$. The scattering amplitude $\mathbf{A}$ has a frequency shift caused by the presence of $\gamma_1\omega$, an angular distortion due to $\hat{\mathbf{n}}', \hat{\mathbf{x}}'$ replacing $\hat{\mathbf{n}}, \hat{\mathbf{x}}$, and a polarization warping because $\mathbf{e}_0'$ substitutes for $\mathbf{e}_0$. An additional polarization twist also occurs in the conversion back to source coordinates through (9.27). The whole of (9.27) can be expressed in terms of $\mathbf{x}, t$ by means of (9.23)-(9.26). The integral in (9.26) can be written in forms analogous to (9.17) and (9.18) if desired. For further details and experimental measurements see Vogel (1991).
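The frequency shift carried by $F(\gamma_1\omega/\gamma)$ in (9.26) can be made concrete: a component of the incident waveform at $\omega_0$ reappears in the observed field at $\omega = \gamma\omega_0/\gamma_1$. The sketch below evaluates that ratio from (9.23) and (9.25), written in terms of $\mathbf{v}$ so that any direction of motion may be used. The function name and the radar-like numbers are illustrative assumptions only. For backscattering from a receding target it reproduces the familiar factor $(1 - v/c)/(1 + v/c)$, and it gives no shift in the forward direction $\hat{\mathbf{x}} = \hat{\mathbf{n}}$.

```python
import numpy as np

def frequency_ratio(n_hat, x_hat, v_vec, c=1.0):
    """omega_observed / omega_incident implied by F(gamma_1 omega / gamma) in (9.26),
    with gamma from (9.23) and gamma_1 from (9.25)."""
    n_hat, x_hat, v_vec = (np.asarray(a, dtype=float) for a in (n_hat, x_hat, v_vec))
    beta = 1.0 / np.sqrt(1.0 - (np.linalg.norm(v_vec) / c) ** 2)
    gamma = beta * (1.0 - n_hat @ v_vec / c)
    gamma_1 = beta * (1.0 - x_hat @ v_vec / c)
    return gamma / gamma_1

ex = np.array([1.0, 0.0, 0.0])
receding = frequency_ratio(ex, -ex, 0.1 * ex)      # backscatter, target moving away
approaching = frequency_ratio(-ex, ex, 0.1 * ex)   # backscatter, target moving towards source
forward = frequency_ratio(ex, ex, 0.1 * ex)        # observation along n_hat

# Illustrative radar-like numbers: 10 GHz carrier, target receding at 300 m/s.
c_si, f0 = 299_792_458.0, 10e9
shift_hz = f0 * (frequency_ratio(ex, -ex, 300.0 * ex, c=c_si) - 1.0)
print(receding, approaching, forward, shift_hz)
```

The last line gives a shift of roughly $-2vf_0/c \approx -20$ kHz, the usual two-way Doppler estimate for $v \ll c$.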

THE INVERSE SOURCE PROBLEM

9.6 Harmonic sources

The assumption made in this section is that the observed radiation is to be accounted for by a distribution of sources placed in a homogeneous isotropic medium. Then a solution of (9.1) is sought such that $\mathbf{j}$ is confined to some part of the interior $T$ of a closed surface $S$. The tangential components of both $\mathbf{E}$ and $\mathbf{H}$ are to be measured on $S$, and the aim is to determine $\mathbf{j}$ from these data (Fig. 9.5).

Fig. 9.5. The inverse-source problem (sources within the region $T$ bounded by the surface $S$).

Now

$$\mathbf{E}(\mathbf{x}) = \frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\int_T \mathbf{j}(\mathbf{x}_q)\,\psi(P, q)\,d\mathbf{x}_q, \qquad (9.28)$$

$$\mathbf{H}(\mathbf{x}) = \operatorname{curl}\int_T \mathbf{j}(\mathbf{x}_q)\,\psi(P, q)\,d\mathbf{x}_q, \qquad (9.29)$$

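where $\mathbf{x}$ or $P$ denotes the point of observation and $\psi(P, q) = \exp(-ikr)/4\pi r$, with $r$ the distance between $P$ and $q$. Since $\mathbf{j}$ is non-zero only in a portion of $T$, there is no loss in making the volume of integration the whole of $T$. The fields inside $S$ can be expressed by means of volume and surface integrals. Although it is customary to base such a representation on retarded potentials, it will prove advantageous here to employ advanced potentials. The appropriate relations are

$$\mathbf{E}(\mathbf{x}) = \frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\left\{\int_T \mathbf{j}(\mathbf{x}_q)\,\psi^*(P, q)\,d\mathbf{x}_q - \int_S \hat{\mathbf{n}}_q\wedge\mathbf{H}\,\psi^*(P, q)\,dS_q\right\} - \operatorname{curl}\int_S \hat{\mathbf{n}}_q\wedge\mathbf{E}\,\psi^*(P, q)\,dS_q, \qquad (9.30)$$

$$\mathbf{H}(\mathbf{x}) = \operatorname{curl}\left\{\int_T \mathbf{j}(\mathbf{x}_q)\,\psi^*(P, q)\,d\mathbf{x}_q - \int_S \hat{\mathbf{n}}_q\wedge\mathbf{H}\,\psi^*(P, q)\,dS_q\right\} + \frac{1}{i\omega\mu}(\operatorname{grad}\operatorname{div} + k^2)\int_S \hat{\mathbf{n}}_q\wedge\mathbf{E}\,\psi^*(P, q)\,dS_q, \qquad (9.31)$$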



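where $\hat{\mathbf{n}}_q$ is the unit normal out of $S$. The surface integrals in (9.30) and (9.31) cannot be reckoned to vanish, because advanced potentials have been utilized. They are, however, known from the measurements of the tangential components made on $S$. Hence

$$\mathbf{E}(\mathbf{x}) = \frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\int_T \mathbf{j}(\mathbf{x}_q)\,\psi^*(P, q)\,d\mathbf{x}_q + \mathbf{g}(\mathbf{x}), \qquad (9.32)$$

$$\mathbf{H}(\mathbf{x}) = \operatorname{curl}\int_T \mathbf{j}(\mathbf{x}_q)\,\psi^*(P, q)\,d\mathbf{x}_q + \mathbf{h}(\mathbf{x}), \qquad (9.33)$$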

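where $\mathbf{g}$ and $\mathbf{h}$ are known. Combining (9.32) with (9.28) and (9.33) with (9.29), we obtain

$$\frac{1}{i\omega\varepsilon}(\operatorname{grad}\operatorname{div} + k^2)\int_T \mathbf{j}(\mathbf{x}_q)\{\psi^*(P, q) - \psi(P, q)\}\,d\mathbf{x}_q + \mathbf{g}(\mathbf{x}) = 0, \qquad (9.34)$$

$$\operatorname{curl}\int_T \mathbf{j}(\mathbf{x}_q)\{\psi^*(P, q) - \psi(P, q)\}\,d\mathbf{x}_q + \mathbf{h}(\mathbf{x}) = 0. \qquad (9.35)$$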

These two integral equations determine $\mathbf{j}$ from the given data. Notice that they are exact and do not require measurements to be made in the far field or the approximation of physical optics to be adopted. The integral equations are not independent, because the curl of (9.34) is a multiple of (9.35), since $\operatorname{curl}\mathbf{g} + i\omega\mu\mathbf{h} = 0$. Nevertheless, they serve as useful checks on one another in numerical work, where their relationship will not be exact. The kernel of (9.34) (or (9.35)) is non-singular; indeed

$$\psi^*(P, q) - \psi(P, q) = \frac{ik}{2\pi}\,j_0(kr),$$

where $j_0$ is the spherical Bessel function of order zero. Nevertheless, (9.34) is an ill-posed and ill-conditioned integral equation. Despite this fact, experiments with synthetic data (i.e. generated analytically) suggest that a tolerable, if not very precise, indication of the source distribution can be achieved (Bojarski 1974).
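Both claims about the kernel can be checked numerically. The sketch below first verifies that $\psi^* - \psi = (ik/2\pi)j_0(kr)$. It then discretizes the kernel $j_0(k|\mathbf{x}_p - \mathbf{x}_q|)$ by a crude midpoint rule on a cubic grid inside a sphere, which is an illustrative assumption and not the book's method, and inspects the singular values. The resulting matrix, although built from a perfectly smooth kernel, has a condition number at the level of rounding error, and only a small fraction of its singular values are significant. This is the discrete counterpart of the ill-conditioning of (9.34). The helper names are hypothetical.

```python
import numpy as np

def kernel_difference(k, r):
    """psi*(P, q) - psi(P, q) for psi = exp(-ikr)/(4 pi r)."""
    return (np.exp(1j * k * r) - np.exp(-1j * k * r)) / (4 * np.pi * r)

def j0(x):
    """Spherical Bessel function of order zero; np.sinc(t) = sin(pi t)/(pi t)."""
    return np.sinc(np.asarray(x) / np.pi)

def discretised_kernel(k, a, h):
    """Matrix j0(k |x_p - x_q|) h^3 for grid points x_p (spacing h) inside a ball
    of radius a: a midpoint-rule discretisation of the kernel of (9.34)."""
    g = np.arange(-a, a + h / 2, h)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)
    pts = pts[np.linalg.norm(pts, axis=1) <= a + 1e-12]
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return j0(k * r) * h**3

k = 2.0
r = np.linspace(0.05, 5.0, 100)
identity_error = np.max(np.abs(kernel_difference(k, r) - 1j * k / (2 * np.pi) * j0(k * r)))

K = discretised_kernel(k=k, a=1.0, h=0.2)
s = np.linalg.svd(K, compute_uv=False)       # singular values, in descending order
print(K.shape, identity_error, s[-1] / s[0], np.sum(s > 1e-8 * s[0]))
```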

Exercises 12. Place an electric dipole at the origin and then calculate $\mathbf{g}$ and $\mathbf{h}$ if $S$ is a sphere of radius $a$ with centre at the origin. Then solve (9.34) and (9.35) numerically to see how well they predict the source. 13. Repeat Exercise 12 for a magnetic dipole. 14. Repeat Exercise 12 with two parallel electric dipoles on the $z$ axis and equidistant from the origin. 15. If $\mathbf{h} = \mathbf{0}$ and $T$ is a sphere of radius $a$, find the solutions of (9.35) by using the addition theorem to expand $j_0$ in a series of spherical harmonics. (This provides confirmation of the non-uniqueness referred to already.)


16. If $T$ is a sphere of radius $a$, find the values of $\lambda$ for which

$$\int_T \mathbf{j}(\mathbf{x}_q)\,j_0(kr)\,d\mathbf{x}_q = \lambda\,\mathbf{j}(\mathbf{x})$$

inside $T$, and show that they can be expressed as

$$2\pi a^3\left[\{j_m(ka)\}^2 - j_{m-1}(ka)\,j_{m+1}(ka)\right]$$

for $m = 0, 1, 2, \ldots$. As $m \to \infty$, show that these eigenvalues tend to zero rapidly (like $m^{-2m-3}$), thereby verifying that (9.34) and (9.35) are ill conditioned and ill posed.
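The eigenvalues of Exercise 16 are easily tabulated. The sketch below assumes the eigenfunctions $j_m(kr)Y_m^n(\theta, \phi)$ suggested by the addition theorem, so that $\lambda_m = 4\pi\int_0^a j_m(kr)^2 r^2\,dr$, and compares that integral (by Simpson's rule) with the closed form quoted above. Spherical Bessel functions are summed from their power series, with $j_{-1}(x) = \cos x/x$, to keep the example dependency-free; that choice is adequate only for moderate arguments. The function names are hypothetical.

```python
import math

def sph_jn(n, x, terms=60):
    """Spherical Bessel function j_n(x) for n >= -1. Uses the power series
    j_n(x) = x^n sum_s (-x^2/2)^s / (s! (2n+2s+1)!!), adequate for moderate x,
    and j_{-1}(x) = cos(x)/x."""
    if n == -1:
        return math.cos(x) / x
    term = x**n / math.prod(range(1, 2 * n + 2, 2))     # x^n / (2n+1)!!
    total = 0.0
    for s in range(terms):
        total += term
        term *= -(x * x / 2.0) / ((s + 1) * (2 * n + 2 * s + 3))
    return total

def eigenvalue_closed_form(m, k, a):
    x = k * a
    return 2.0 * math.pi * a**3 * (sph_jn(m, x) ** 2 - sph_jn(m - 1, x) * sph_jn(m + 1, x))

def eigenvalue_quadrature(m, k, a, n_intervals=2000):
    """4 pi * integral_0^a j_m(k r)^2 r^2 dr by composite Simpson's rule."""
    h = a / n_intervals
    f = lambda r: sph_jn(m, k * r) ** 2 * r * r
    total = f(0.0) + f(a)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return 4.0 * math.pi * total * h / 3.0

k, a = 2.0, 1.0
lam = [eigenvalue_closed_form(m, k, a) for m in range(11)]
for m, value in enumerate(lam):
    print(m, value)
```

With $ka = 2$ the eigenvalues fall from about $1.9$ at $m = 0$ to around $10^{-15}$ at $m = 10$, so any component of $\mathbf{j}$ beyond the first few harmonics is effectively invisible in the data. As $ka \to 0$ the $m = 0$ value tends to the volume $4\pi a^3/3$, as it should.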

If the tangential components of $\mathbf{E}$ and $\mathbf{H}$ can be observed only over part of $S$, say $S_1$, and not over the remainder $S_2$, put

$$\mathbf{g} = \mathbf{g}_1 + \mathbf{g}_2 = \int_{S_1} + \int_{S_2}.$$

Now $\mathbf{g}_2$ is unknown. However, it can be expressed in terms of the unknown $\mathbf{j}$ by finding $\mathbf{E}$ and $\mathbf{H}$ on $S_2$ from (9.28) and (9.29). Then (9.34) becomes an integral equation of the first kind for $\mathbf{j}$ in which the kernel also contains an integral over $S_2$. While the problem can be tackled in principle, the integral equation is likely to become more ill behaved as $S_1$ shrinks in size.

9.7 Inhomogeneities

The method of the preceding section is also adaptable to the investigation of finite regions of inhomogeneity in an otherwise homogeneous medium. Suppose, in particular, that the permittivity is $\varepsilon + \varepsilon'$, where $\varepsilon'$ is non-zero only over a finite region. Excite the medium with a known wave $\mathbf{E}^i$, e.g. a plane wave. Then the scattered wave satisfies (9.1) with $\mathbf{j} = i\omega\varepsilon'(\mathbf{E}^i + \mathbf{E}^s)$. The current density $\mathbf{j}$ is determined from (9.34) or (9.35), and then $\mathbf{E}^s$ can be calculated from (9.28). Since $\mathbf{E}^i$ is known, it follows that $\varepsilon'$ can be computed. Variations in $\mu$ can also be coped with by using representations of the electromagnetic field in terms of magnetic current (cf. §7.2). The method does not seem to be very satisfactory from a numerical point of view, and other ways of tackling the problem have been put forward (Cohen and Bleistein 1977). For scalar problems one suggestion (Bates, Boerner, and Dunlop 1976) has been the introduction of the Rytov approximation, in which $E = E^iE'$ and the equation for $E'$ is handled. For stratified media, where the variations are one dimensional, more intricate analysis is feasible (Meyer 1975, 1975a, 1976; Bates and Wall 1976). Another technique, which is applicable to the general inverse source problem, is to expand $\mathbf{j}$ in a finite series of known functions, such as polynomials, but with coefficients to be discovered. The resulting field on $S$ can then be calculated from (9.28) and (9.29). The coefficients are now found by minimizing the sum of the squares of the differences between the predicted and measured tangential



components of $\mathbf{E}$ and $\mathbf{H}$ on $S$. In general, this will involve an optimization technique such as conjugate gradients, as described in Chapter 4. It is likely that this method will be numerically superior to the integral equation approach if the expansion functions are well chosen. Simultaneous probing at several frequencies is also possible with this approach.
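The expansion-and-minimization technique can be sketched on a scalar toy problem. This is a deliberate simplification of (9.28)-(9.29), not the vector formulation of the text. The source is assumed to lie on a segment of the $y$ axis, $q(y) = \sum_n c_n P_n(y/a)$ with Legendre polynomials $P_n$ as the expansion functions, and its field is "measured" at points on a circle using $\psi = \exp(-ikr)/4\pi r$. The coefficients are then recovered by conjugate gradients applied to the normal equations, which minimizes the sum of squared differences between predicted and measured values. Because the synthetic data come from the same discretization (an "inverse crime"), the recovery is optimistic; real data would call for noise and regularization. All names and parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def forward_matrix(k, a, obs, n_basis, n_quad=32):
    """Toy scalar forward model: field at the points obs (shape (M, 3)) radiated by a
    line source on the y axis, q(y) = sum_n c_n P_n(y / a) for |y| <= a, with
    psi = exp(-ikr)/(4 pi r). Column n holds the field of the n-th basis function."""
    t, w = legendre.leggauss(n_quad)                      # Gauss-Legendre rule on [-1, 1]
    src = np.stack([np.zeros_like(t), a * t, np.zeros_like(t)], axis=1)
    r = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=-1)
    psi = np.exp(-1j * k * r) / (4 * np.pi * r)
    basis = legendre.legvander(t, n_basis - 1)            # basis[q, n] = P_n(t_q)
    return psi @ (basis * (a * w)[:, None])

def cgnr(A, b, iters=200, tol=1e-13):
    """Conjugate gradients applied to A^H A c = A^H b, i.e. minimisation of the
    sum of squared differences |A c - b|^2 between predicted and measured data."""
    c = np.zeros(A.shape[1], dtype=complex)
    r = A.conj().T @ b                                    # residual of the normal equations
    p = r.copy()
    rs = np.vdot(r, r).real
    stop = tol * np.sqrt(rs)
    for _ in range(iters):
        if np.sqrt(rs) <= stop:
            break
        Ap = A @ p
        alpha = rs / np.vdot(Ap, Ap).real
        c = c + alpha * p
        r = r - alpha * (A.conj().T @ Ap)
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return c

k, a, R = 2 * np.pi, 0.5, 3.0               # wavelength 1; source one wavelength long
phi = np.linspace(0.0, 2 * np.pi, 72, endpoint=False)
obs = np.stack([R * np.cos(phi), R * np.sin(phi), np.zeros_like(phi)], axis=1)
A = forward_matrix(k, a, obs, n_basis=4)
c_true = np.array([1.0, -0.5, 0.25, 0.1], dtype=complex)
data = A @ c_true                           # synthetic "measurements"
c_est = cgnr(A, data)
print(np.round(c_est, 6))
```

Several frequencies can be handled by stacking the corresponding matrices and data vectors before calling `cgnr`, since the unknown coefficients are then shared.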

Exercises 17. Repeat Exercises 12-14 using the optimization technique, with the expansion functions chosen as polynomials. Do you think some other choice would give better results? 18. Choose a simple $\varepsilon'$ and compute the field scattered from an incident plane wave. Use the values as data to determine $\varepsilon'$ by (a) integral equations, (b) the optimization technique. Which do you think is the better method? What happens if you vary the frequency? 19. In fields which depend only on $z$, $\mu$ and $\varepsilon$ have the constant values $\mu_0$ and $\varepsilon_0$ except for $0 \le z \le 1$, where $\varepsilon = \varepsilon_0 e^z$. Use the known form of $\varepsilon$ to generate data for the inverse problem and see how accurate your solution to the inverse problem is. Consider a number of frequencies. Repeat the investigation for (i) $\varepsilon = \varepsilon_0(1 + z)$, (ii) $\varepsilon = \varepsilon_0(1 + \sin\pi z)$, (iii) $\varepsilon = \varepsilon_0(1 + \sinh 2z)$.

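9.8 Statistical considerations

Many natural sources change their positions and frequencies with time. The assumption of monochromatic radiation is then inappropriate. While it would be possible to contemplate formulating the integral equation (9.34) in the time domain, it is doubtful whether sufficient measurements could be undertaken to make the calculation of the corresponding $\mathbf{g}$ a practical proposition. For the inverse source problem an outgoing solution of

$$\operatorname{curl}\mathbf{E} + \mu\frac{\partial\mathbf{H}}{\partial t} = 0, \qquad \operatorname{curl}\mathbf{H} - \varepsilon\frac{\partial\mathbf{E}}{\partial t} = \mathbf{J}(\mathbf{x}, t)$$

is required. Assuming that $\mathbf{J}$ is zero outside some finite region, we have from eqn (2) of Chapter 7

$$\mathbf{E}(\mathbf{x}, t) = -\frac{\mu}{4\pi}\frac{\partial}{\partial t}\int_{-\infty}^{\infty}\frac{\mathbf{J}(\mathbf{x}', t - |\mathbf{x} - \mathbf{x}'|/c)}{|\mathbf{x} - \mathbf{x}'|}\,d\mathbf{x}' - \frac{1}{4\pi\varepsilon}\operatorname{grad}\int_{-\infty}^{\infty}\frac{\rho(\mathbf{x}', t - |\mathbf{x} - \mathbf{x}'|/c)}{|\mathbf{x} - \mathbf{x}'|}\,d\mathbf{x}', \qquad (9.36)$$

where the integration is over all space and $\operatorname{div}\mathbf{J} + \partial\rho/\partial t = 0$.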


If random processes are operating, it will usually be necessary to consider averages for comparison with experimental results. It is customary to regard $\mathbf{E}$ as a typical member of an ensemble of functions which characterize the statistical properties of the process. Common practice is to suppose that the ensembles encountered in electromagnetism are stationary and ergodic. Roughly speaking, an ensemble is stationary if all ensemble averages are independent of the time origin. Likewise, an ergodic process is one in which each ensemble average agrees with the corresponding time average for a typical member of the ensemble. Thus, in stationary ergodic random processes attention can be restricted to time averages, and the absolute position of the time origin is irrelevant. Any practical disturbance will exist only during some finite time interval, say $-T \le t \le T$. It is mathematically desirable to allow $T \to \infty$ so that the assumption of stationarity can be invoked. For this reason the time average is defined by

$$\langle h\rangle = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}h(t)\,dt.$$

It is clear at once that, if this limit is non-zero, $\int_{-\infty}^{\infty}h(t)\,dt$ must be divergent. Since we expect to deal with non-zero averages, this difficulty is bound to arise, and there will be technical problems with Fourier analysis. To surmount these problems we work with truncated functions such as $h_T$, where

$$h_T(t) = \begin{cases} h(t) & (|t| \le T),\\ 0 & (|t| > T),\end{cases}$$

carry out the desired procedures with $h_T$, and then proceed to the limit as $T \to \infty$. Provided that suitable smoothing is accomplished by ensemble averaging, the limit customarily exists and is the same as would be achieved by direct formal Fourier analysis. Accordingly, the technicalities will be ignored in the following, and Fourier transforms will be manipulated as if there were no difficulties. On this understanding the average

$$\langle\mathbf{E}(\mathbf{x}_1, t_1 + \tau)\cdot\mathbf{E}(\mathbf{x}_2, t_2 + \tau)\rangle$$

can be introduced. In view of the assumption that the random process is stationary, only the time difference $t_2 - t_1$ will occur. It is therefore sufficient to concentrate on

$$\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle,$$

which is known as the cross-correlation function at $\mathbf{x}_1$ and $\mathbf{x}_2$. When $\mathbf{x}_1 = \mathbf{x}_2$ this becomes

$$\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_1, t + \tau)\rangle,$$

which is often called the autocorrelation function.



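An associated normalized quantity is

$$\gamma_{12}(\tau) = \frac{\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle}{\{\langle E^2(\mathbf{x}_1, t)\rangle\langle E^2(\mathbf{x}_2, t)\rangle\}^{1/2}},$$

which may be deemed a measure of the degree of coherence. As a consequence of the Schwarz inequality (§1.5)

$$0 \le |\gamma_{12}(\tau)| \le 1.$$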
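In practice $\gamma_{12}(\tau)$ is estimated from finite records. The sketch below assumes a single field component sampled at equal time steps at the two points. It replaces the time average by an average over the record and treats the record as periodic (via `np.roll`), which keeps the Schwarz bound exact for the estimate. The signals are synthetic: a common disturbance reaches $\mathbf{x}_2$ three samples after $\mathbf{x}_1$, and independent noise is added at each point. The estimate then peaks at that lag with $|\gamma_{12}| \approx 0.8$, a case of partial coherence.

```python
import numpy as np

def degree_of_coherence(e1, e2, lag):
    """Estimate gamma_12(tau) for one field component recorded at x1 and x2.
    Time averages become averages over a finite record treated as periodic,
    so e2 is read at t + tau with tau = lag samples."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    num = np.mean(e1 * np.roll(e2, -lag))
    den = np.sqrt(np.mean(e1**2) * np.mean(e2**2))
    return num / den

rng = np.random.default_rng(0)
n = 20_000
common = rng.standard_normal(n)                          # disturbance reaching both points
e1 = common + 0.5 * rng.standard_normal(n)               # record at x1
e2 = np.roll(common, 3) + 0.5 * rng.standard_normal(n)   # record at x2, delayed 3 samples
gammas = np.array([degree_of_coherence(e1, e2, lag) for lag in range(11)])
print(np.round(gammas, 3))
```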

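If $|\gamma_{12}(\tau)| = 1$ the disturbance is often said to be coherent, whereas if $|\gamma_{12}(\tau)| = 0$ it is called incoherent; for other values of $|\gamma_{12}(\tau)|$ there is partial coherence. Write

$$\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle = \int_{-\infty}^{\infty}G_{12}(\omega)\exp(-i\omega\tau)\,d\omega, \qquad (9.37)$$

$$\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_1, t + \tau)\rangle = \int_{-\infty}^{\infty}G_1(\omega)\exp(-i\omega\tau)\,d\omega. \qquad (9.38)$$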

Then the cross-correlation and autocorrelation functions are effectively being analysed into their Fourier components. That is why $G_1$ is termed the power spectral density and $G_{12}$ the cross-power spectral density. Since most random processes are not periodic with respect to $\tau$, $G_{12}$ and $G_1$ are liable to vary continuously with $\omega$ rather than contain a number of discrete frequencies. They are obviously functions of $\mathbf{x}_1$ and $\mathbf{x}_2$ as well as $\omega$ in general. By Fourier inversion

$$G_{12}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle\exp(i\omega\tau)\,d\tau, \qquad (9.39)$$

$$G_1(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_1, t + \tau)\rangle\exp(i\omega\tau)\,d\tau. \qquad (9.40)$$

Since the correlation functions are real, the spectral densities must satisfy

$$\{G_{12}(\omega)\}^* = G_{12}(-\omega), \qquad \{G_1(\omega)\}^* = G_1(-\omega), \qquad (9.41)$$

as may be seen directly from either (9.37) or (9.39) (cf. (9.12)). On account of (9.41) the spectral density is known at negative frequencies as soon as its values at positive frequencies are available. Therefore the mean frequency $\bar\omega$ is defined by

$$\bar\omega = \frac{\int_0^\infty\omega|G_1(\omega)|^2\,d\omega}{\int_0^\infty|G_1(\omega)|^2\,d\omega}$$

and the effective spectral width $\Delta\omega$ by

$$(\Delta\omega)^2 = \frac{\int_0^\infty(\omega - \bar\omega)^2|G_1(\omega)|^2\,d\omega}{\int_0^\infty|G_1(\omega)|^2\,d\omega}. \qquad (9.42)$$


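The coherence time $\Delta\tau$ is specified by

$$(\Delta\tau)^2 = \frac{\int_{-\infty}^{\infty}\tau^2|g_1(\tau)|^2\,d\tau}{\int_{-\infty}^{\infty}|g_1(\tau)|^2\,d\tau}, \qquad (9.43)$$

where

$$g_1(\tau) = \langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_1, t + \tau)\rangle. \qquad (9.44)$$

The mean value of $\tau$ is zero because (9.41) implies, via (9.38) and (9.44), that $|g_1(\tau)|^2$ is an even function of $\tau$. Now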

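$$\int_{-\infty}^{\infty}|g_1(\tau)|^2\,d\tau = 2\pi\int_{-\infty}^{\infty}|G_1(\omega)|^2\,d\omega = 4\pi\int_0^\infty|G_1(\omega)|^2\,d\omega$$

because the integrand is even by virtue of (9.41). Similarly

$$\int_{-\infty}^{\infty}\tau^2|g_1(\tau)|^2\,d\tau = 4\pi\int_0^\infty\left|\frac{\partial G_1(\omega)}{\partial\omega}\right|^2 d\omega.$$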

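Hence, from (9.43),

$$(\Delta\tau)^2 = \frac{\int_0^\infty|\partial G_1(\omega)/\partial\omega|^2\,d\omega}{\int_0^\infty|G_1(\omega)|^2\,d\omega}. \qquad (9.45)$$

If $\alpha$ is real,

$$\int_0^\infty|h_1 + \alpha h_2^*|^2\,d\omega \ge 0.$$

Choosing $\alpha = -\int_0^\infty(h_1h_2 + h_1^*h_2^*)\,d\omega\big/2\int_0^\infty|h_2|^2\,d\omega$, we derive

$$\left\{\int_0^\infty(h_1h_2 + h_1^*h_2^*)\,d\omega\right\}^2 \le 4\int_0^\infty|h_1|^2\,d\omega\int_0^\infty|h_2|^2\,d\omega. \qquad (9.46)$$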

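Make the selection $h_1 = (\omega - \bar\omega)G_1(\omega)$, $h_2 = \partial G_1^*/\partial\omega$. Then

$$\int_0^\infty(h_1h_2 + h_1^*h_2^*)\,d\omega = \left[(\omega - \bar\omega)|G_1(\omega)|^2\right]_0^\infty - \int_0^\infty|G_1(\omega)|^2\,d\omega$$

after an integration by parts. The contribution from the upper limit must vanish if (9.42) is to have a meaning. The lower limit does not contribute if $G_1(0) = 0$, which will now be assumed. Hence, from (9.46),

$$\left(\int_0^\infty|G_1(\omega)|^2\,d\omega\right)^2 \le 4\int_0^\infty(\omega - \bar\omega)^2|G_1(\omega)|^2\,d\omega\int_0^\infty\left|\frac{\partial G_1}{\partial\omega}\right|^2 d\omega.$$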



It follows from (9.42) and (9.45) that

$$\Delta\omega\,\Delta\tau \ge \tfrac{1}{2} \qquad (9.47)$$

when $G_1(0) = 0$. Thus the effective frequency range is related to the reciprocal of the time over which the signal remains coherent: the shorter the coherence time, the greater the bandwidth involved. The definitions of spectral width and coherence time are satisfactory in the sense that they are based on the measurable correlation $g_1$ (or its Fourier transform) rather than on the electric intensity itself, which may be rapidly varying. There is another feature to be noted. In any region which does not contain sources

$$\nabla_2^2\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle = \frac{1}{c^2}\frac{\partial^2}{\partial\tau^2}\langle\mathbf{E}(\mathbf{x}_1, t)\cdot\mathbf{E}(\mathbf{x}_2, t + \tau)\rangle,$$

where $\nabla_2^2$ refers to the point $\mathbf{x}_2$, because each component of $\mathbf{E}$ satisfies such an equation. This permits the cross-correlation function to be represented by Kirchhoff's integral, and the way in which it propagates can be determined. In particular, if it and its normal derivative are known on a closed surface, it can be found elsewhere provided that no source intervenes. Correspondingly

$$(\nabla_2^2 + k^2)\,G_{12}(\omega) = 0, \qquad k = \omega/c,$$

in any region where there are no sources. If $G_{12}$ and its normal derivative were measured on a closed surface, an integral equation analogous to (9.34) and (9.35) could be set up for the source term appropriate to the partial differential equation for the cross-power spectral density. Alternatively, the technique described at the end of the preceding section could be adopted. In either case we arrive at some indication of the average source distribution. However, this direct attack does not seem to have been explored to any extent yet. Although the correlations have been defined in terms of $\mathbf{E}\cdot\mathbf{E}$, it is straightforward to handle other quantities. For instance, the cross-correlation of a single component of the field

$$\langle E_x(\mathbf{x}_1, t)\,E_x(\mathbf{x}_2, t + \tau)\rangle$$

or that of two components

$$\langle E_x(\mathbf{x}_1, t)\,E_y(\mathbf{x}_2, t + \tau)\rangle$$

(which might be relevant to cross-polarization effects), or the correlation between the electric and magnetic intensities, might be considered. Only minor modifications to the foregoing analysis would be needed.
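The definitions (9.42) and (9.45) and the bound (9.47) can be exercised numerically. The sketch below assumes two model spectra chosen purely for illustration. The first is a Gaussian $G_1$ centred well away from $\omega = 0$, so that $G_1(0)$ is negligible, for which the bound is attained, $\Delta\omega\,\Delta\tau = \tfrac12$. The second is the one-sided spectrum $G_1 = \omega^2 e^{-\omega}$, which vanishes at $\omega = 0$ and for which the integrals give $(\Delta\omega)^2 = 5/4$ and $(\Delta\tau)^2 = 1/3$. The derivative $\partial G_1/\partial\omega$ is supplied analytically. The helper names are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (kept local to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def width_and_coherence_time(G, dG, omega):
    """Delta omega from (9.42) and Delta tau from (9.45), given samples of G_1
    and dG_1/domega on a grid covering [0, omega_max]."""
    p = np.abs(G) ** 2
    norm = trapezoid(p, omega)
    mean = trapezoid(omega * p, omega) / norm
    d_omega = np.sqrt(trapezoid((omega - mean) ** 2 * p, omega) / norm)
    d_tau = np.sqrt(trapezoid(np.abs(dG) ** 2, omega) / norm)
    return d_omega, d_tau

# Gaussian spectrum centred at 50 with width parameter 5 (G_1(0) is about 1e-22).
w = np.linspace(0.0, 100.0, 400_001)
G = np.exp(-((w - 50.0) ** 2) / (2 * 5.0**2))
dG = -(w - 50.0) / 5.0**2 * G
dw_gauss, dt_gauss = width_and_coherence_time(G, dG, w)

# One-sided spectrum G_1 = w^2 exp(-w), which vanishes at w = 0.
w2 = np.linspace(0.0, 40.0, 400_001)
G2 = w2**2 * np.exp(-w2)
dG2 = (2 * w2 - w2**2) * np.exp(-w2)
dw_gamma, dt_gamma = width_and_coherence_time(G2, dG2, w2)
print(dw_gauss * dt_gauss, dw_gamma * dt_gamma)
```

The printed products are close to $0.5$ and $\sqrt{5/12} \approx 0.645$ respectively. Narrowing the Gaussian (a smaller width parameter) lengthens $\Delta\tau$ in exact proportion, as (9.47) implies.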


9.9 Correlation techniques

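The first scheme to be discussed is based on measurements of a single component of the electric field, such as $E_x$. This satisfies

$$\nabla^2E_x - \frac{1}{c^2}\frac{\partial^2E_x}{\partial t^2} = \mu\frac{\partial J_x}{\partial t} + \frac{1}{\varepsilon}\frac{\partial\rho}{\partial x} = -q(\mathbf{x}, t),$$

and $q$ furnishes a guide to the sources. At large distances from the source distribution

$$E_x(\mathbf{x}, t) \sim \frac{1}{4\pi|\mathbf{x}|}\int q\left(\mathbf{x}', t - \frac{|\mathbf{x} - \mathbf{x}'|}{c}\right)d\mathbf{x}'. \qquad (9.48)$$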

If now correlations are defined in terms of this sole component, as explained at the end of §9.8, then at large distances

$$g_1(0) = \langle\{E_x(\mathbf{x}, t)\}^2\rangle = \frac{1}{16\pi^2c^4|\mathbf{x}|^2}\int S(\mathbf{y})\,d\mathbf{y}, \qquad (9.49)$$

$$g_1(0) = \frac{1}{16\pi^2c^4|\mathbf{x}|^2}\int R(\mathbf{y})\,d\mathbf{y}, \qquad (9.50)$$

where

$$S(\mathbf{y}) = \int c^4\left\langle q\left(\mathbf{x}', t - \frac{|\mathbf{x} - \mathbf{x}'|}{c}\right)q\left(\mathbf{y}, t - \frac{|\mathbf{x} - \mathbf{y}|}{c}\right)\right\rangle d\mathbf{x}', \qquad R(\mathbf{y}) = 4\pi c^4|\mathbf{x}|\left\langle E_x(\mathbf{x}, t)\,q\left(\mathbf{y}, t - \frac{|\mathbf{x} - \mathbf{y}|}{c}\right)\right\rangle.$$

In view of the similarity of (9.49) and (9.50) to the corresponding result for a simple point source, both $R$ and $S$ can be thought of as the source strength per unit volume. Since they involve averages, measurement of them may be feasible. However, the insertion of probes close to the sources can scarcely be avoided, and the noise generated thereby may pollute the readings. Even if this were not so, the correlations to be observed may well be small, with consequent doubt as to their accuracy and reliability. For example, should the right-hand side of (9.49) or (9.50) turn out to be negative, the observations would have to be rejected. In any case there is no certainty that $R$ and $S$ are being correctly interpreted. Any function whose integral over space is zero could be added to $R$ or $S$ without invalidating (9.49) or (9.50). So there is extensive scope for alternative interpretation, and there seems to be no convincing argument why the additional function should be zero rather than some other value. Correlation can also arise in connection with harmonic problems. An illustration is the finding of a local deviation in a medium which is not varying with time. Suppose that $\varepsilon$ has the constant value $\varepsilon_0$ except in some finite region.



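Then a plane wave, whose wavelength is much larger than the inhomogeneity, can be fired at the medium. The Rayleigh-Gans or Born approximation is then reasonable, and the second part of (9.1) can be written as

$$\operatorname{curl}\mathbf{H} - i\omega\varepsilon_0\mathbf{E} = i\omega(\varepsilon - \varepsilon_0)\,\mathbf{e}_0\exp(-ik\mathbf{x}\cdot\hat{\mathbf{i}})$$

if the $x$ axis is picked to be along the direction of propagation of the incident wave. The distant scattered field is therefore

$$\mathbf{E}^s(\mathbf{x}) \sim \frac{k^2\exp(-ik|\mathbf{x}|)}{4\pi\varepsilon_0|\mathbf{x}|}\{\mathbf{e}_0 - (\mathbf{e}_0\cdot\hat{\mathbf{x}})\hat{\mathbf{x}}\}\int_{-\infty}^{\infty}n(\mathbf{y})\exp\{ik\mathbf{y}\cdot(\hat{\mathbf{x}} - \hat{\mathbf{i}})\}\,d\mathbf{y},$$

where $n = \varepsilon - \varepsilon_0$ and $k^2$ is now $\omega^2\mu\varepsilon_0$. Clearly $n$ is zero outside the blob of deviating matter. Hence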

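$$|\mathbf{E}^s(\mathbf{x})|^2 \sim \frac{k^4|\mathbf{e}_0 - (\mathbf{e}_0\cdot\hat{\mathbf{x}})\hat{\mathbf{x}}|^2}{16\pi^2\varepsilon_0^2|\mathbf{x}|^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}n(\mathbf{y})\,n(\mathbf{z})\exp\{ik(\mathbf{y} - \mathbf{z})\cdot(\hat{\mathbf{x}} - \hat{\mathbf{i}})\}\,d\mathbf{y}\,d\mathbf{z}$$

$$\sim \frac{k^4|\mathbf{e}_0 - (\mathbf{e}_0\cdot\hat{\mathbf{x}})\hat{\mathbf{x}}|^2}{16\pi^2\varepsilon_0^2|\mathbf{x}|^2}\int_{-\infty}^{\infty}\exp\{ik\mathbf{w}\cdot(\hat{\mathbf{x}} - \hat{\mathbf{i}})\}\int_{-\infty}^{\infty}n(\mathbf{z})\,n(\mathbf{w} + \mathbf{z})\,d\mathbf{z}\,d\mathbf{w}. \qquad (9.51)$$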
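To see how (9.51) can be used once a form is assumed for the spatial autocorrelation $C(\mathbf{w}) = \int n(\mathbf{z})\,n(\mathbf{w} + \mathbf{z})\,d\mathbf{z}$, take the Gaussian model $C(\mathbf{w}) = C_0\exp(-|\mathbf{w}|^2/\ell^2)$ with adjustable $C_0$ and correlation length $\ell$; this model is an assumption made for illustration. Its Fourier transform gives $\int C(\mathbf{w})\exp\{ik\mathbf{w}\cdot(\hat{\mathbf{x}} - \hat{\mathbf{i}})\}\,d\mathbf{w} = C_0\pi^{3/2}\ell^3\exp(-k^2\ell^2|\hat{\mathbf{x}} - \hat{\mathbf{i}}|^2/4)$, with $|\hat{\mathbf{x}} - \hat{\mathbf{i}}| = 2\sin(\theta/2)$ for scattering angle $\theta$. Observing in the plane perpendicular to $\mathbf{e}_0$ makes the polarization factor unity, so $\log|\mathbf{E}^s|^2$ is linear in $q^2 = k^2|\hat{\mathbf{x}} - \hat{\mathbf{i}}|^2$ and a straight-line least-squares fit yields $\ell$. The sketch applies this to synthetic data. The function names and parameter values are hypothetical.

```python
import numpy as np

def model_intensity(theta, k, ell, amplitude=1.0):
    """Angular factor of (9.51) for an assumed Gaussian autocorrelation
    C(w) = C0 exp(-|w|^2 / ell^2), observed in the plane perpendicular to e0
    (polarisation factor 1); constant prefactors are absorbed in `amplitude`."""
    q2 = (2.0 * k * np.sin(theta / 2.0)) ** 2      # k^2 |x_hat - i_hat|^2
    return amplitude * np.exp(-q2 * ell**2 / 4.0)

def fit_gaussian_correlation(theta, intensity, k):
    """Least-squares fit of log I = log A - (ell^2 / 4) q^2; returns (ell, A)."""
    q2 = (2.0 * k * np.sin(theta / 2.0)) ** 2
    slope, intercept = np.polyfit(q2, np.log(intensity), 1)
    return np.sqrt(-4.0 * slope), np.exp(intercept)

k = 2.0 * np.pi                        # wavelength 1 (arbitrary units)
theta = np.linspace(0.1, np.pi, 50)    # scattering angles measured from i_hat
exact = model_intensity(theta, k, ell=0.08, amplitude=3.0)
ell_fit, amp_fit = fit_gaussian_correlation(theta, exact, k)

rng = np.random.default_rng(1)
noisy = exact * np.exp(0.01 * rng.standard_normal(theta.size))   # 1 % multiplicative noise
ell_noisy, _ = fit_gaussian_correlation(theta, noisy, k)
print(ell_fit, amp_fit, ell_noisy)
```

If the Gaussian model does not fit the measured angular dependence, another parametrized form can be substituted and fitted by a general optimizer, as the following discussion suggests.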

The inner integral in (9.51) can be perceived as the autocorrelation function of $n$ (though now in space rather than time), so that measurements of the far field yield some information about this function. It is perhaps most profitable to make some simple assumption about the form of the autocorrelation function, such as that it is Gaussian with adjustable parameters, with the intent of evaluating the right-hand side of (9.51). Comparison with the measured left-hand side should then permit the determination of the parameters, optimization techniques being utilized if convenient. If tolerable agreement can be secured by adjustment of the parameters, some idea can be obtained of the rate at which the autocorrelation function falls to zero. That gives some notion of the correlation between deviations from the average and some indication of the extent of the inhomogeneity. This procedure is likely to be especially valuable in eliciting the properties of biological specimens, which are usually subject to statistical variability.

9.10 Far-field cross-correlation technique

There is another way in which correlation measurements can be related to source distributions. Let the observations be made at points sufficiently distant from the sources for (9.48) to hold. Then, if $|\mathbf{x}_1| = |\mathbf{x}_2|$, the cross-correlation


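function $g_{12}(\tau)$ is given by

$$g_{12}(\tau) = \langle E_1(\mathbf{x}_1, t)\,E_1(\mathbf{x}_2, t + \tau)\rangle = \frac{1}{16\pi^2|\mathbf{x}_1|^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left\langle q\left(\mathbf{y}, t - \frac{|\mathbf{x}_1 - \mathbf{y}|}{c}\right)q\left(\mathbf{z}, t + \tau - \frac{|\mathbf{x}_2 - \mathbf{z}|}{c}\right)\right\rangle d\mathbf{y}\,d\mathbf{z}$$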

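with the slight generalization that $E_1$ is an arbitrary, but fixed, component of the field and $q$ its associated effective source. The assumption of statistical stationarity enables this to be written as

$$g_{12}(\tau) = \frac{1}{16\pi^2|\mathbf{x}_1|^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left\langle q(\mathbf{y}, t)\,q\left(\mathbf{z}, t + \tau + \frac{|\mathbf{x}_1 - \mathbf{y}| - |\mathbf{x}_2 - \mathbf{z}|}{c}\right)\right\rangle d\mathbf{y}\,d\mathbf{z}.$$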

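The cross-power spectral density is, from (9.39),

$$G_{12}(\omega) = \frac{1}{32\pi^3|\mathbf{x}_1|^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\{-ik(|\mathbf{x}_1 - \mathbf{y}| - |\mathbf{x}_2 - \mathbf{z}|)\}\,\langle q(\mathbf{y}, t)\,q(\mathbf{z}, t + \tau)\rangle\exp(i\omega\tau)\,d\tau\,d\mathbf{y}\,d\mathbf{z}.$$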

The quantity

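$$\cdots\int_{-\infty}^{\infty}q(\mathbf{y}, \mathbf{z}, \omega)\exp\{-ik(|\mathbf{x}_1 - \mathbf{y}| - |\mathbf{x}_1 - \mathbf{z}|)\}\,d\mathbf{y}. \qquad (9.54)$$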


Fig. 9.6. Determination of cross-power spectral density (fixed sensor at $\mathbf{x}_1$, movable sensor at $\mathbf{x}_2$).

The resemblance between (9.55) and (9.49) suggests that $Q_1(\mathbf{z}, \omega)$ might be called the apparent source strength per unit volume at the point $\mathbf{z}$, radiating at the angular frequency $\omega$. From (9.54), $Q_1$ relies on observations at $\mathbf{x}_1$, and so it is the apparent source strength as viewed from that point. In general, changes in the reference point $\mathbf{x}_1$ will entail alterations in the apparent source distribution. Determining $Q_1$ from (9.55) suffers from many of the objections levelled against (9.49) and (9.50). However, there is more opportunity for manoeuvre with (9.56) because of the variation with $\mathbf{x}_2$. Suppose that there is a fixed sensor at $\mathbf{x}_1$ and a traversing sensor at $\mathbf{x}_2$ which is always maintained at the same distance from the origin as the fixed sensor (Fig. 9.6). The record of simultaneous signals at the two sensors is Fourier analysed at a given angular separation $\theta$ by passing it through narrow-band filters which transmit only a small range of frequencies centred on the $\omega/2\pi$ at which information about $Q_1$ is desired. The cross-correlation function of the filtered signal then leads to $G_{12}(\omega)$ as a function of $\theta$. In ideal circumstances values for all $\theta$ are feasible, but in practical situations the magnitude of $G_{12}$ frequently diminishes with increasing angular separation, so that the correlation may not be measurable when $\theta$ exceeds, say, 60°. Also the observations will probably be made at discrete rather than continuous values of $\theta$. Notwithstanding, we shall assume that $G_{12}$ is known as a continuous function of $\theta$ on a circle with centre at the origin whose circumference passes through $\mathbf{x}_1$. To find $Q_1$ from $G_{12}$ the relationship (9.56) has to be inverted. Unfortunately, $G_{12}$ is known only on a circle, so that there is no hope of recovering full knowledge of $Q_1$. It therefore pays to make simplifying assumptions about the source distribution.
To put it another way, the measured cross-power spectral density is accounted for by a simple, but artificial, distribution of sources. Take the plane of the circle as the $(x, y)$ plane, with $\mathbf{x}_1$ on the $x$ axis. It will now be assumed that the sources are not very far from the $y$ axis. Then (9.56) simplifies to

$$G_{12}(\omega) = \frac{1}{16\pi^2|\mathbf{x}_1|^2}\int_{-\infty}^{\infty}q_1(y, \omega)\exp\{ik(|\mathbf{x}_2 - y\mathbf{j}| - |\mathbf{x}_1 - y\mathbf{j}|)\}\,dy$$


where

$$q_1(z_2, \omega) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}Q_1(\mathbf{z}, \omega)\,dz_1\,dz_3$$

and $\mathbf{i}$, $\mathbf{j}$ are unit vectors parallel to the $x$ and $y$ axes respectively. It will now be postulated that the points of observation are many wavelengths from the places where $q_1$ is non-zero. This is a non-trivial assumption, because the positions of the apparent sources are unknown, and forcing them to lie on the $y$ axis may oblige them to be some distance from the genuine sources, about whose rough location a fair degree of confidence may have been felt. Actually, the postulate can only be truly checked a posteriori, by examining whether (a) the resulting apparent sources are, in fact, nowhere near the circle of measurement and (b) increasing $|\mathbf{x}_1|$ leaves the apparent sources substantially unchanged. At any rate, squares and higher powers of $y$ will now be neglected, so that $|\mathbf{x}_1 - y\mathbf{j}| \approx |\mathbf{x}_1|$ and $|\mathbf{x}_2 - y\mathbf{j}| \approx |\mathbf{x}_1| - y\sin\theta$. Then

$$G_{12}(\omega) = \frac{1}{16\pi^2|\mathbf{x}_1|^2}\int_{-\infty}^{\infty}q_1(y, \omega)\exp(-iky\sin\theta)\,dy. \qquad (9.57)$$
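Since (9.57) is a Fourier transform in the variable $u = k\sin\theta$, the diffraction-limited image of the apparent sources can be sketched by inverting it over the measured band. The code below makes these assumptions: the constant $1/(16\pi^2|\mathbf{x}_1|^2)$ is dropped, the apparent sources are point sources $q_1 = \sum_j Q_j\delta(y - y_j)$ so that the data are $G(u) = \sum_j Q_j\exp(-iuy_j)$, and the inverse is normalized as $q_1'(y) = (1/2\pi)\int G(u)\exp(iuy)\,du$ over $|u| \le k\sin\theta_{\max}$. It is not the book's formula (9.58), which is not reproduced here. A point source at $y_0$ is then imaged as $\sin\{u_{\max}(y - y_0)\}/\{\pi(y - y_0)\}$, with peak height $u_{\max}/\pi$ and a main-lobe half-width of $\pi/u_{\max}$, independent of $|\mathbf{x}_1|$.

```python
import numpy as np

def diffraction_limited_image(y_src, strengths, k, theta_max=np.pi / 2, n_u=4001):
    """Return y -> q1'(y), the inverse Fourier transform of model data
    G(u) = sum_j Q_j exp(-i u y_j), u = k sin(theta), kept only for
    |theta| <= theta_max; normalisation q1'(y) = (1/2pi) int G(u) exp(iuy) du."""
    u_max = k * np.sin(theta_max)
    u = np.linspace(-u_max, u_max, n_u)
    G = np.exp(-1j * np.outer(u, y_src)) @ np.asarray(strengths, dtype=complex)
    du = u[1] - u[0]

    def image(y):
        f = G[None, :] * np.exp(1j * np.outer(np.atleast_1d(y), u))
        trap = (f[:, 1:] + f[:, :-1]).sum(axis=1) * du / 2.0   # trapezoidal rule in u
        return trap / (2.0 * np.pi)

    return image

k = 2.0 * np.pi                                  # wavelength 1
img = diffraction_limited_image([0.3], [1.0], k)                          # theta up to 90 degrees
img60 = diffraction_limited_image([0.3], [1.0], k, theta_max=np.pi / 3)   # theta up to 60 degrees
y = np.linspace(-1.0, 1.5, 501)
print(y[np.argmax(img(y).real)], img(0.3)[0].real, img60(0.3)[0].real)
```

Restricting the measurements to $\theta \le 60^\circ$ lowers the peak from $k/\pi$ to $k\sin 60^\circ/\pi$ and broadens the image by the same factor, which is the loss of resolution to be expected from the limited angular range discussed above.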

Eqn (9.57) can be regarded as a Fourier transform in which $k\sin\theta$ is the transform variable. But $k\sin\theta$ ranges only from $-k$ to $k$, so that $G_{12}$ is constrained to that interval. By defining $G_{12}$ to be zero outside the interval (which is plausible when $|G_{12}|$ drops sufficiently rapidly with increasing $\theta$), we can take a Fourier inverse which will give a $q_1'$ satisfying (9.57) on the interval of measurement. The pertinent formula is

(9.58)

In essence $q_1'$ is a diffraction-limited image of the apparent source distribution $q_1$. Assuming that the far-field approximation is legitimate, we can be sure that the resolving power is independent of $|\mathbf{x}_1|$, because it is necessary only to maintain the same range of $\theta$ for differing $\mathbf{x}_1$. On the other hand, a disadvantage is that the method prejudges the issue by insisting that the apparent sources be located on the $y$ axis. It is advisable to see how well (9.58) performs when the real sources are actually on the $y$ axis. To this end let $q(\mathbf{x}, t)$

= Q(y) o. Circumstances under which this is possible (Leith and Upatnieks 1963) will now be investigated. Let $u^r$ and $v^c$ be plane waves so that

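$$u^r = B\exp(-ik\hat{\mathbf{n}}^r\cdot\mathbf{x}), \qquad v^c = C\exp(-ik\hat{\mathbf{n}}^c\cdot\mathbf{x}).$$

Then (9.64) gives

$$v_1(0) = a_1CB^*u_1\exp\{ik(\hat{\mathbf{n}}^r - \hat{\mathbf{n}}^c)\cdot\mathbf{x}\}, \qquad (9.65)$$

$$v_2(0) = a_1CBu_1^*\exp\{-ik(\hat{\mathbf{n}}^r + \hat{\mathbf{n}}^c)\cdot\mathbf{x}\}, \qquad (9.66)$$

$$v_3(0) = C\{a_0 + a_1(|u_1|^2 + BB^*)\}\exp(-ik\hat{\mathbf{n}}^c\cdot\mathbf{x}), \qquad (9.67)$$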

with $\mathbf{x} \equiv (x, y, 0)$. Suppose now that $\hat{\mathbf{n}}^r = \hat{\mathbf{n}}^c$, so that the reference and reconstruction waves travel in the same direction. Then, from (9.65), $v_1(0) = a_1CB^*u_1$, and so $v_1$ is proportional to the field created at $z = 0$ by the light coming from the original obstacle in the absence of any reference wave. Thus, in $z > 0$, $v_1$ will be a constant multiple of $u_1$ and will appear to be propagating from a virtual image of the original object, this virtual image being in precisely the same position relative to the hologram as was the original object relative to the photographic plate.


HOLOGRAPHIC TECHNIQUES

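Define $U(l, m)$ by

$$U(l, m) = \frac{k^2}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}u_1(x, y, 0)\exp\{ik(lx + my)\}\,dx\,dy; \qquad (9.68)$$

then a representation for $u_1$ valid for all $z$ is

$$u_1(\mathbf{x}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}U(l, m)\exp\{-ik(lx + my + nz)\}\,dl\,dm, \qquad (9.69)$$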

where $n^2 = 1 - l^2 - m^2$, $n$ being positive when $l^2 + m^2 < 1$ and negative imaginary when $l^2 + m^2 > 1$. In fact, we shall suppose that $U(l, m)$ is effectively zero when $l^2 + m^2 > 1$, so that only real $n$ need be considered in (9.69). Eqn (9.69) is a representation in plane waves, homogeneous if $l^2 + m^2 < 1$ and inhomogeneous otherwise. Since evanescent waves do not radiate, the waves with $l^2 + m^2 < 1$ are often described as the visible part of the spectrum, while the inhomogeneous waves are the invisible part. Let $V_2$ be the field which coincides with $u_1^*$ on $z = 0$. Then

$$V_2(\mathbf{x}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\{U(-l, -m)\}^*\exp\{-ik(lx + my + nz)\}\,dl\,dm = \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}U(l, m)\exp\{-ik(lx + my - nz)\}\,dl\,dm\right]^* \qquad (9.70)$$

on account of our assumption about $U$. It follows from (9.69) and (9.70) that

$$V_2(x, y, z) = \{u_1(x, y, -z)\}^*, \qquad (9.71)$$

which is a sort of reciprocity theorem. That (9.69) is valid for $z > 0$ and $z < 0$ is a consequence of $u_1$ travelling to the right near $z = 0$. Having established (9.71) in general, we now turn to (9.66) when $\hat{\mathbf{n}}^r = \hat{\mathbf{n}}^c$. If the components of $\hat{\mathbf{n}}^r$ are $(n_1^r, n_2^r, n_3^r)$,

$$\frac{k^2}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}u_1^*(x, y, 0)\exp\{-2ik\hat{\mathbf{n}}^r\cdot\mathbf{x} + ik(lx + my)\}\,dx\,dy = \{U(2n_1^r - l, 2n_2^r - m)\}^*$$

and hence

$$v_2(\mathbf{x}) = a_1CB\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}U(l + 2n_1^r, m + 2n_2^r)\exp\{-ik(lx + my - nz)\}\,dl\,dm\right]^*. \qquad (9.72)$$

We shall now make the additional assumption that $U(l, m)$ is negligible unless


$l^2 + m^2 \ll 1$. Then squares of $l$ and $m$ can be ignored and (9.69) becomes

$$u_1(x, y, z) = \exp(-ikz)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}U(l, m)\exp\{-ik(lx + my)\}\,dl\,dm. \qquad (9.73)$$
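The plane-wave representation (9.68)-(9.69) is also the basis of what is now commonly called the angular-spectrum method of numerical propagation. The sketch below replaces the Fourier integrals by FFTs on a periodic grid, which is an approximation and not part of the text. It multiplies each component $\exp\{-ik(lx + my)\}$ by $\exp(-iknz)$, keeping the full $n = (1 - l^2 - m^2)^{1/2}$ rather than the small-angle form (9.73). The branch $n = -i(l^2 + m^2 - 1)^{1/2}$ is used for the invisible part, so those components decay for $z > 0$. The function name and grid parameters are illustrative assumptions.

```python
import numpy as np

def angular_spectrum_propagate(u0, dx, k, z):
    """Propagate samples of u1(x, y, 0) to the plane z using the plane-wave
    representation (9.68)-(9.69), approximated by FFTs on a periodic grid.
    Each component exp{-ik(lx + my)} is multiplied by exp(-ik n z), with n >= 0
    for visible waves and n = -i|1 - l^2 - m^2|^(1/2) for evanescent ones."""
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, d=dx)                 # cycles per unit length
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)                  # both shaped (ny, nx)
    l = -2.0 * np.pi * FX / k                     # exp{-ik l x} = exp{2 pi i fx x}
    m = -2.0 * np.pi * FY / k
    s = 1.0 - l**2 - m**2
    n = np.where(s >= 0, np.sqrt(np.abs(s)), -1j * np.sqrt(np.abs(s)))
    return np.fft.ifft2(np.fft.fft2(u0) * np.exp(-1j * k * n * z))

N, L = 64, 10.0
dx = L / N
x = np.arange(N) * dx
X, Y = np.meshgrid(x, x)
k = 2.0 * np.pi                                   # wavelength 1

visible = np.exp(2j * np.pi * 0.3 * X)            # l = -0.3, m = 0
evanescent = np.exp(2j * np.pi * 2.0 * X)         # l = -2, so l^2 + m^2 > 1
u_vis = angular_spectrum_propagate(visible, dx, k, z=2.0)
u_eva = angular_spectrum_propagate(evanescent, dx, k, z=0.1)
print(np.abs(u_eva).max())
```

A single visible component only acquires the phase $\exp(-iknz)$, while the evanescent one is attenuated by $\exp\{-k(l^2 + m^2 - 1)^{1/2}z\}$, which is why the invisible part of the spectrum carries no information to distant observation planes.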

If, in (9.72), I + 2n~ and m + 2n2 are taken as new variables n

= {I

- (1- 2n~)2 - (m ~ (v 2 + 4(ln~ + mn2)}1/2

on rejecting squares of I, m and putting v2 is not near zero, the approximation 2(ln~

2n~)2}1/2

=1-

4(n~)2 - 4(n 2)2. Whenever

v

+ mn2)

n~v+-----

v

r

is legitimate. Consequently, in these circumstances

$$V_2(z) = a_1 CB\,\left\{u_1\!\left(x - \frac{2n_1'z}{\nu},\; y - \frac{2n_2'z}{\nu},\; -\frac{z}{\nu}\right)\right\}^*\exp\left\{-2ik(n_1'x + n_2'y) + \frac{ikz(1-\nu^2)}{\nu}\right\}. \tag{9.74}$$
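The first-order step used to reach (9.74), replacing $n$ by $\nu + 2(ln_1' + mn_2')/\nu$, can be tested numerically. The sketch below assumes, for illustration only, a reference wave in the $xz$ plane at angle $\theta$ to the $z$ axis, so that $n_1' = \sin\theta$ and $n_2' = 0$; the helper names are hypothetical. It compares the exact and approximate $n$ and shows how $\nu$ collapses as $\theta$ approaches 30°.

```python
import math


def nu(n1p, n2p):
    """nu = (1 - 4 n1'^2 - 4 n2'^2)^(1/2); clamped at 0 to absorb rounding."""
    return math.sqrt(max(0.0, 1.0 - 4.0 * n1p**2 - 4.0 * n2p**2))


def nu_from_angle(theta_deg):
    """nu for a reference wave in the xz plane at theta_deg to the z axis.

    Assumes n1' = sin(theta) and n2' = 0 (illustrative geometry).
    """
    return nu(math.sin(math.radians(theta_deg)), 0.0)


def n_exact(l, m, n1p, n2p):
    """n after the change of variables in (9.72)."""
    return math.sqrt(1.0 - (l - 2.0 * n1p) ** 2 - (m - 2.0 * n2p) ** 2)


def n_approx(l, m, n1p, n2p):
    """First-order approximation n ~ nu + 2(l n1' + m n2') / nu."""
    v = nu(n1p, n2p)
    if v < 1e-6:
        # The expansion is only legitimate when nu is not near zero.
        raise ValueError("approximation needs nu away from zero")
    return v + 2.0 * (l * n1p + m * n2p) / v


if __name__ == "__main__":
    n1p = math.sin(math.radians(15.0))
    print("nu at 15 deg:", nu_from_angle(15.0))
    print("nu at 30 deg:", nu_from_angle(30.0))
    print("exact n: ", n_exact(0.01, 0.005, n1p, 0.0))
    print("approx n:", n_approx(0.01, 0.005, n1p, 0.0))
```

With a 15° reference wave, $\nu \approx 0.856$, and for $l = 0.01$, $m = 0.005$ the exact and approximate values of $n$ differ by roughly $10^{-4}$; at 30° the computed $\nu$ is zero up to rounding, so `n_approx` refuses to proceed.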

Thus $V_2$ carries much the same information for positive $z$ that $u_1$ does for negative $z$. Since $u_1$ must account for the presence of the obstacle in negative $z$, $V_2$ must produce a real image of the object to the right of the hologram. If most of the object lies pretty well on $z = z_1$, the image will occur on $z = -\nu z_1$. Apart from a factor of proportionality and a phase factor, the complex amplitudes of corresponding points of the object and image are complex conjugates of one another. However, whereas the object is disposed about the $z$ axis, the image is displaced from it. In fact, if the object point $(x_o, y_o, z_1 + \xi)$, $\xi$ being small, corresponds to the image point $(x_i, y_i, z_i)$, then

$$x_i = x_o - 2n_1'z_1, \qquad y_i = y_o - 2n_2'z_1, \qquad z_i = -\nu(z_1 + \xi). \tag{9.75}$$
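The mapping (9.75) from object points to real-image points is simple enough to evaluate directly. The sketch below assumes, purely for illustration, a reference wave in the $xz$ plane at 15° to the $z$ axis and an object centred on $z_1 = -10$ (in units of wavelength); the function names are hypothetical. It exhibits three consequences of (9.75): a lateral offset of $(1-\nu^2)^{1/2}|z_1|$, exact reproduction of transverse shifts, and longitudinal compression by the factor $\nu$.

```python
import math


def nu_from_reference(n1p, n2p):
    """nu = (1 - 4 n1'^2 - 4 n2'^2)^(1/2), the longitudinal magnification."""
    return math.sqrt(1.0 - 4.0 * n1p**2 - 4.0 * n2p**2)


def image_point(xo, yo, z1, xi, n1p, n2p):
    """Real-image point of the object point (xo, yo, z1 + xi), following (9.75)."""
    v = nu_from_reference(n1p, n2p)
    return (xo - 2.0 * n1p * z1, yo - 2.0 * n2p * z1, -v * (z1 + xi))


if __name__ == "__main__":
    # Illustrative set-up: reference wave at 15 degrees in the xz plane,
    # object centred on z1 = -10 (the object lies in negative z).
    n1p, n2p, z1 = math.sin(math.radians(15.0)), 0.0, -10.0
    v = nu_from_reference(n1p, n2p)
    x_img, y_img, z_img = image_point(0.0, 0.0, z1, 0.0, n1p, n2p)
    print("image centre:", (x_img, y_img, z_img))
    print("offset from axis:", math.hypot(x_img, y_img),
          "vs (1 - nu^2)^(1/2)|z1| =", math.sqrt(1.0 - v * v) * abs(z1))
    # An object spacing of 0.1 along z becomes nu * 0.1 in the image.
    _, _, z_b = image_point(0.0, 0.0, z1, 0.1, n1p, n2p)
    print("longitudinal spacing:", abs(z_b - z_img), "vs nu * 0.1 =", v * 0.1)
```

The offset agrees with $(1-\nu^2)^{1/2}|z_1|$ because $1 - \nu^2 = 4\{(n_1')^2 + (n_2')^2\}$, and the image centre lands at positive $z$ because $z_1 < 0$.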

These relations show, firstly, that the image is displaced by a distance $(1 - \nu^2)^{1/2}|z_1|$ from the $z$ axis, since $2\{(n_1')^2 + (n_2')^2\}^{1/2} = (1 - \nu^2)^{1/2}$. Secondly, transverse changes in which only $x_o$ and $y_o$ vary are reproduced exactly, but longitudinal displacements in which $\xi$ alters are subject to a magnification $\nu$. Since $\nu < 1$, the image is a distorted replica of the object, being squashed lengthwise (Fig. 9.9). The analysis requires that $\nu$ not be small. For a reference wave inclined at angle $\theta$ to the $z$ axis, $(n_1')^2 + (n_2')^2 = \sin^2\theta$, so $\nu^2 = 1 - 4\sin^2\theta$, which vanishes when $\theta = 30°$. We therefore want the direction of propagation of the reference wave to be inclined at somewhat less than 30° to the $z$ axis. However, for a good image the angle must not be too small. The reason is that $V_3$ essentially reproduces the reconstruction wave and a radiating wave. The image will therefore be easily distinguished from $V_1$ (another radiating wave)


[Figure: virtual image; real image]

0. However, $\mu$ must not be too great; otherwise there will be too much of a discrepancy between the $f$ found from (9.101) and that sought. Keep $\mu$ fixed. The minimizing $f$ will depend upon $\mu$, a dependence indicated by writing $f_\mu$. Variations of $f^H$ alone in $J$ indicate that the minimizing solution must satisfy

$$(B^HDB + \mu C)f_\mu = B^HDg. \tag{9.102}$$

Since the matrix operating on $f_\mu$ is positive definite, it possesses an inverse and

$$f_\mu = (B^HDB + \mu C)^{-1}B^HDg. \tag{9.103}$$
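In practice (9.103) is usually evaluated by solving the linear system (9.102) rather than by forming the inverse. The sketch below does this in plain Python with small dense complex matrices stored as lists of rows. The matrices $B$, $D$, $C$, the data $g$ and the values of $\mu$ are made up for illustration, and the helper names are hypothetical. Gaussian elimination with partial pivoting is used for simplicity; because the system matrix is positive definite (and Hermitian when $D$ and $C$ are), a Cholesky factorization would be a natural alternative.

```python
def conj_transpose(a):
    """Hermitian transpose a^H of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    """Dense matrix product a b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    """Dense matrix-vector product a x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(complex, row)) + [complex(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        # Absolute threshold: adequate only for this small illustrative sketch.
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def regularized_solution(B, D, C, g, mu):
    """f_mu of (9.103), obtained by solving (B^H D B + mu C) f = B^H D g."""
    BHD = matmul(conj_transpose(B), D)
    A = matmul(BHD, B)
    n = len(A)
    A = [[A[i][j] + mu * C[i][j] for j in range(n)] for i in range(n)]
    return solve(A, matvec(BHD, g))


if __name__ == "__main__":
    # Made-up example: 3 measurements, 2 unknowns, D = I, C = I.
    B = [[1, 0], [0, 1], [1, 1]]
    D = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    C = [[1, 0], [0, 1]]
    g = [1 + 1j, -2, -1 + 1j]  # exactly B applied to [1 + 1j, -2]
    for mu in (1e-8, 0.1, 10.0):
        print(mu, regularized_solution(B, D, C, g, mu))
```

For very small $\mu$ the sketch recovers the exact solution $(1 + i, -2)$; as $\mu$ grows (with $C = I$) the solution is pulled towards zero, which illustrates why $\mu$ must not be too great.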

Useful theoretical results can be adduced by introducing the eigenvectors $\phi_i$ and eigenvalues $\mu_i$ satisfying

$$B^HDB\,\phi_i = \mu_i C\phi_i. \tag{9.104}$$

Since $C$ is positive definite and $B^HDB$ positive semi-definite, $\mu_i \ge 0$ and the $\phi_i$ can be normalized so that (§1.10)

$$\phi_i^H C\phi_j = \delta_{ij}, \qquad \phi_i^H B^HDB\,\phi_j = \mu_i\delta_{ij}.$$

Also any vector can be expressed as a linear combination of the $\phi_i$.

(9.105)
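As an illustration of the kind of result these eigenvectors make available (a short derivation sketched here, not quoted from the text), suppose the $\phi_i$ satisfy the generalized eigenproblem $B^HDB\,\phi_i = \mu_i C\phi_i$ with the normalization $\phi_i^H C\phi_j = \delta_{ij}$. Writing $f_\mu = \sum_i \alpha_i\phi_i$ and substituting in (9.102) gives

$$\sum_i \alpha_i(\mu_i + \mu)\,C\phi_i = B^HDg,$$

and premultiplying by $\phi_j^H$ yields $\alpha_j(\mu_j + \mu) = \phi_j^H B^HDg$, so that

$$f_\mu = \sum_i \frac{\phi_i^H B^HDg}{\mu_i + \mu}\,\phi_i.$$

Compared with the unregularized coefficients $\phi_i^H B^HDg/\mu_i$, each component is multiplied by $\mu_i/(\mu_i + \mu)$: components with $\mu_i \gg \mu$ are almost unaffected, whereas those with $\mu_i \ll \mu$ are strongly suppressed.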
