Essentials of Mathematical Methods in Science and Engineering
Ş. Selçuk Bayin, Middle East Technical University, Ankara, Turkey
WILEY A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
Library of Congress Cataloging-in-Publication Data:
Bayin, Ş. Selçuk, 1951-
Essentials of mathematical methods in science and engineering / Ş. Selçuk Bayin.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-34379-1 (cloth)
1. Science-Mathematics. 2. Science-Methodology. 3. Engineering mathematics. I. Title.
Q158.5.B39 2008
501'.51-dc22 2008004313
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To my father, Omer Bayin
Contents in Brief
1 FUNCTIONAL ANALYSIS
2 VECTOR ANALYSIS
3 GENERALIZED COORDINATES and TENSORS
4 DETERMINANTS and MATRICES
5 LINEAR ALGEBRA
6 SEQUENCES and SERIES
7 COMPLEX NUMBERS and FUNCTIONS
8 COMPLEX ANALYSIS
9 ORDINARY DIFFERENTIAL EQUATIONS
10 SECOND-ORDER DIFFERENTIAL EQUATIONS and SPECIAL FUNCTIONS
11 BESSEL’S EQUATION and BESSEL FUNCTIONS
12 PARTIAL DIFFERENTIAL EQUATIONS and SEPARATION of VARIABLES
13 FOURIER SERIES
14 FOURIER and LAPLACE TRANSFORMS
15 CALCULUS of VARIATIONS
16 PROBABILITY THEORY and DISTRIBUTIONS
17 INFORMATION THEORY
CONTENTS
Preface
Acknowledgments
1 FUNCTIONAL ANALYSIS
1.1 Concept of Function
1.2 Continuity and Limits
1.3 Partial Differentiation
1.4 Total Differential
1.5 Taylor Series
1.6 Maxima and Minima of Functions
1.7 Extrema of Functions with Conditions
1.8 Derivatives and Differentials of Composite Functions
1.9 Implicit Function Theorem
1.10 Inverse Functions
1.11 Integral Calculus and the Definite Integral
1.12 Riemann Integral
1.13 Improper Integrals
1.14 Cauchy Principal Value Integrals
1.15 Integrals Involving a Parameter
1.16 Limits of Integration Depending on a Parameter
1.17 Double Integrals
1.18 Properties of Double Integrals
1.19 Triple and Multiple Integrals
Problems
2 VECTOR ANALYSIS
2.1 Vector Algebra: Geometric Method
2.1.1 Multiplication of Vectors
2.2 Vector Algebra: Coordinate Representation
2.3 Lines and Planes
2.4 Vector Differential Calculus
2.4.1 Scalar Fields and Vector Fields
2.4.2 Vector Differentiation
2.5 Gradient Operator
2.5.1 Meaning of the Gradient
2.5.2 Directional Derivative
2.6 Divergence and Curl Operators
2.6.1 Meaning of Divergence and the Divergence Theorem
2.7 Vector Integral Calculus in Two Dimensions
2.7.1 Arc Length and Line Integrals
2.7.2 Surface Area and Surface Integrals
2.7.3 An Alternate Way to Write Line Integrals
2.7.4 Green’s Theorem
2.7.5 Interpretations of Green’s Theorem
2.7.6 Extension to Multiply Connected Domains
2.8 Curl Operator and Stokes’s Theorem
2.8.1 On the Plane
2.8.2 In Space
2.8.3 Geometric Interpretation of Curl
2.9 Mixed Operations with the Del Operator
2.10 Potential Theory
2.10.1 Gravitational Field of a Spherically Symmetric Star
2.10.2 Work Done by Gravitational Force
2.10.3 Path Independence and Exact Differentials
2.10.4 Gravity and Conservative Forces
2.10.5 Gravitational Potential
2.10.6 Gravitational Potential Energy of a System
2.10.7 Helmholtz Theorem
2.10.8 Applications of the Helmholtz Theorem
2.10.9 Examples from Physics
Problems
3 GENERALIZED COORDINATES and TENSORS
3.1 Transformations Between Cartesian Coordinates
3.1.1 Basis Vectors and Direction Cosines
3.1.2 Transformation Matrix and the Orthogonality Relation
3.1.3 Inverse Transformation Matrix
3.2 Cartesian Tensors
3.2.1 Algebraic Properties of Tensors
3.2.2 Kronecker Delta and the Permutation Symbol
3.3 Generalized Coordinates
3.3.1 Coordinate Curves and Surfaces
3.3.2 Why Upper and Lower Indices
3.4 General Tensors
3.4.1 Einstein Summation Convention
3.4.2 Line Element
3.4.3 Metric Tensor
3.4.4 How to Raise and Lower Indices
3.4.5 Metric Tensor and the Basis Vectors
3.4.6 Displacement Vector
3.4.7 Transformation of Scalar Functions and Line Integrals
3.4.8 Area Element in Generalized Coordinates
3.4.9 Area of a Surface
3.4.10 Volume Element in Generalized Coordinates
3.4.11 Invariance and Covariance
3.5 Differential Operators in Generalized Coordinates
3.5.1 Gradient
3.5.2 Divergence
3.5.3 Curl
3.5.4 Laplacian
3.6 Orthogonal Generalized Coordinates
3.6.1 Cylindrical Coordinates
3.6.2 Spherical Coordinates
Problems
4 DETERMINANTS and MATRICES
4.1 Basic Definitions
4.2 Operations with Matrices
4.3 Submatrix and Partitioned Matrices
4.4 Systems of Linear Equations
4.5 Gauss’s Method of Elimination
4.6 Determinants
4.7 Properties of Determinants
4.8 Cramer’s Rule
4.9 Inverse of a Matrix
4.10 Homogeneous Linear Equations
Problems
5 LINEAR ALGEBRA
5.1 Fields and Vector Spaces
5.2 Linear Combinations, Generators, and Bases
5.3 Components
5.4 Linear Transformations
5.5 Matrix Representation of Transformations
5.6 Algebra of Transformations
5.7 Change of Basis
5.8 Invariants Under Similarity Transformations
5.9 Eigenvalues and Eigenvectors
5.10 Moment of Inertia Tensor
5.11 Inner Product Spaces
5.12 The Inner Product
5.13 Orthogonality and Completeness
5.14 Gram-Schmidt Orthogonalization
5.15 Eigenvalue Problem for Real Symmetric Matrices
5.16 Presence of Degenerate Eigenvalues
5.17 Quadratic Forms
5.18 Hermitian Matrices
5.19 Matrix Representation of Linear Transformations
5.20 Functions of Matrices
5.21 Function Space and Hilbert Space
5.22 Dirac’s Bra and Ket Vectors
Problems
6 SEQUENCES and SERIES
6.1 Sequences
6.2 Infinite Series
6.3 Absolute and Conditional Convergence
6.3.1 Comparison Test
6.3.2 Limit Comparison Test
6.3.3 Integral Test
6.3.4 Ratio Test
6.3.5 Root Test
6.4 Operations with Series
6.5 Sequences and Series of Functions
6.6 M-Test for Uniform Convergence
6.7 Properties of Uniformly Convergent Series
6.8 Power Series
6.9 Taylor Series and Maclaurin Series
6.10 Indeterminate Forms and Series
Problems
7 COMPLEX NUMBERS and FUNCTIONS
7.1 The Algebra of Complex Numbers
7.2 Roots of a Complex Number
7.3 Infinity and the Extended Complex Plane
7.4 Complex Functions
7.5 Limits and Continuity
7.6 Differentiation in the Complex Plane
7.7 Analytic Functions
7.8 Harmonic Functions
7.9 Basic Differentiation Formulas
7.10 Elementary Functions
7.10.1 Polynomials
7.10.2 Exponential Function
7.10.3 Trigonometric Functions
7.10.4 Hyperbolic Functions
7.10.5 Logarithmic Function
7.10.6 Powers of Complex Numbers
7.10.7 Inverse Trigonometric Functions
Problems
8 COMPLEX ANALYSIS
8.1 Contour Integrals
8.2 Types of Contours
8.3 The Cauchy-Goursat Theorem
8.4 Indefinite Integrals
8.5 Simply and Multiply Connected Domains
8.6 The Cauchy Integral Formula
8.7 Derivatives of Analytic Functions
8.8 Complex Power Series
8.8.1 Taylor Series with the Remainder
8.8.2 Laurent Series with the Remainder
8.9 Convergence of Power Series
8.10 Classification of Singular Points
8.11 Residue Theorem
Problems
9 ORDINARY DIFFERENTIAL EQUATIONS
9.1 Basic Definitions for Ordinary Differential Equations
9.2 First-Order Differential Equations
9.3 First-Order Differential Equations: Methods of Solution
9.3.1 Dependent Variable Is Missing
9.3.2 Independent Variable Is Missing
9.3.3 The Case of Separable $f(x, y)$
9.3.4 Homogeneous $f(x, y)$ of Zeroth Degree
9.3.5 Solution When $f(x, y)$ Is a Rational Function
9.3.6 Linear Equations of First-Order
9.3.7 Exact Equations
9.3.8 Integrating Factors
9.3.9 Bernoulli Equation
9.3.10 Riccati Equation
9.3.11 Equations That Cannot Be Solved for $y'$
9.4 Second-Order Differential Equations
9.5 Second-Order Differential Equations: Methods of Solution
9.5.1 Linear Homogeneous Equations with Constant Coefficients
9.5.2 Operator Approach
9.5.3 Linear Homogeneous Equations with Variable Coefficients
9.5.4 Cauchy-Euler Equation
9.5.5 Exact Equations and Integrating Factors
9.5.6 Linear Nonhomogeneous Equations
9.5.7 Variation of Parameters
9.5.8 Method of Undetermined Coefficients
9.6 Linear Differential Equations of Higher Order
9.6.1 With Constant Coefficients
9.6.2 With Variable Coefficients
9.6.3 Nonhomogeneous Equations
9.7 Initial Value Problem and Uniqueness of the Solution
9.8 Series Solutions: Frobenius Method
9.8.1 Frobenius Method and First-Order Equations
Problems
10 SECOND-ORDER DIFFERENTIAL EQUATIONS and SPECIAL FUNCTIONS
10.1 Legendre Equation
10.1.1 Series Solution
10.1.2 Effect of Boundary Conditions
10.1.3 Legendre Polynomials
10.1.4 Rodriguez Formula
10.1.5 Generating Function
10.1.6 Special Values
10.1.7 Recursion Relations
10.1.8 Orthogonality
10.1.9 Legendre Series
10.2 Hermite Equation
10.2.1 Series Solution
10.2.2 Hermite Polynomials
10.2.3 Contour Integral Representation
10.2.4 Rodriguez Formula
10.2.5 Generating Function
10.2.6 Special Values
10.2.7 Recursion Relations
10.2.8 Orthogonality
10.2.9 Series Expansions in Hermite Polynomials
10.3 Laguerre Equation
10.3.1 Series Solution
10.3.2 Laguerre Polynomials
10.3.3 Contour Integral Representation
10.3.4 Rodriguez Formula
10.3.5 Generating Function
10.3.6 Special Values and Recursion Relations
10.3.7 Orthogonality
10.3.8 Series Expansions in Laguerre Polynomials
Problems
11 BESSEL’S EQUATION and BESSEL FUNCTIONS
11.1 Bessel’s Equation and Its Series Solution
11.1.1 Bessel Functions $J_{\pm m}(x)$, $N_m(x)$, and $H_m^{(1,2)}(x)$
11.1.2 Recursion Relations
11.1.3 Generating Function
11.1.4 Integral Definitions
11.1.5 Linear Independence of Bessel Functions
11.1.6 Modified Bessel Functions $I_m(x)$ and $K_m(x)$
11.1.7 Spherical Bessel Functions $j_l(x)$, $n_l(x)$, and $h_l^{(1,2)}(x)$
11.2 Orthogonality and the Roots of Bessel Functions
11.2.1 Expansion Theorem
11.2.2 Boundary Conditions for the Bessel Functions
Problems
12 PARTIAL DIFFERENTIAL EQUATIONS and SEPARATION of VARIABLES
12.1 Separation of Variables in Cartesian Coordinates
12.1.1 Wave Equation
12.1.2 Laplace Equation
12.1.3 Diffusion and Heat Flow Equations
12.2 Separation of Variables in Spherical Coordinates
12.2.1 Laplace Equation
12.2.2 Boundary Conditions for a Spherical Boundary
12.2.3 Helmholtz Equation
12.2.4 Wave Equation
12.2.5 Diffusion and Heat Flow Equations
12.2.6 Time-Independent Schrödinger Equation
12.2.7 Time-Dependent Schrödinger Equation
12.3 Separation of Variables in Cylindrical Coordinates
12.3.1 Laplace Equation
12.3.2 Helmholtz Equation
12.3.3 Wave Equation
12.3.4 Diffusion and Heat Flow Equations
Problems
13 FOURIER SERIES
13.1 Orthogonal Systems of Functions
13.2 Fourier Series
13.3 Exponential Form of the Fourier Series
13.4 Convergence of Fourier Series
13.5 Sufficient Conditions for Convergence
13.6 The Fundamental Theorem
13.7 Uniqueness of Fourier Series
13.8 Examples of Fourier Series
13.8.1 Square Wave
13.8.2 Triangular Wave
13.8.3 Periodic Extension
13.9 Fourier Sine and Cosine Series
13.10 Change of Interval
13.11 Integration and Differentiation of Fourier Series
Problems
14 FOURIER and LAPLACE TRANSFORMS
14.1 Types of Signals
14.2 Spectral Analysis and Fourier Transforms
14.3 Correlation with Cosines and Sines
14.4 Correlation Functions and Fourier Transforms
14.5 Inverse Fourier Transform
14.6 Frequency Spectrums
14.7 Dirac-Delta Function
14.8 A Case with Two Cosines
14.9 General Fourier Transforms and Their Properties
14.10 Basic Definition of Laplace Transform
14.11 Differential Equations and Laplace Transforms
14.12 Transfer Functions and Signal Processors
14.13 Connection of Signal Processors
Problems
15 CALCULUS of VARIATIONS
15.1 A Simple Case
15.2 Variational Analysis
15.3 Alternate Form of Euler Equation
15.4 Variational Notation
15.5 A More General Case
15.6 Hamilton’s Principle
15.7 Lagrange’s Equations of Motion
15.8 Definition of Lagrangian
15.9 Presence of Constraints in Dynamical Systems
15.10 Conservation Laws
Problems
16 PROBABILITY THEORY and DISTRIBUTIONS
16.1 Introduction to Probability Theory
16.1.1 Fundamental Concepts
16.1.2 Basic Axioms of Probability
16.1.3 Basic Theorems of Probability
16.1.4 Statistical Definition of Probability
16.1.5 Conditional Probability and Multiplication Theorem
16.1.6 Bayes’ Theorem
16.1.7 Geometric Probability and Buffon’s Needle Problem
16.2 Permutations and Combinations
16.2.1 The Case of Distinguishable Balls with Replacement
16.2.2 The Case of Distinguishable Balls Without Replacement
16.2.3 The Case of Indistinguishable Balls
16.2.4 Binomial and Multinomial Coefficients
16.3 Applications to Statistical Mechanics
16.3.1 Boltzmann Distribution for Solids
16.3.2 Boltzmann Distribution for Gases
16.3.3 Bose-Einstein Distribution for Perfect Gases
16.3.4 Fermi-Dirac Distribution
16.4 Statistical Mechanics and Thermodynamics
16.4.1 Probability and Entropy
16.4.2 Derivation of $\beta$
16.5 Random Variables and Distributions
16.6 Distribution Functions and Probability
16.7 Examples of Continuous Distributions
16.7.1 Uniform Distribution
16.7.2 Gaussian or Normal Distribution
16.7.3 Gamma Distribution
16.8 Discrete Probability Distributions
16.8.1 Uniform Distribution
16.8.2 Binomial Distribution
16.8.3 Poisson Distribution
16.9 Fundamental Theorem of Averages
16.10 Moments of Distribution Functions
16.10.1 Moments of the Gaussian Distribution
16.10.2 Moments of the Binomial Distribution
16.10.3 Moments of the Poisson Distribution
16.11 Chebyshev’s Theorem
16.12 Law of Large Numbers
Problems
17 INFORMATION THEORY
17.1 Elements of Information Processing Mechanisms
17.2 Classical Information Theory
17.2.1 Prior Uncertainty and Entropy of Information
17.2.2 Joint and Conditional Entropies of Information
17.2.3 Decision Theory
17.2.4 Decision Theory and Game Theory
17.2.5 Traveler’s Dilemma and Nash Equilibrium
17.2.6 Classical Bit or Cbit
17.2.7 Operations on Cbits
17.3 Quantum Information Theory
17.3.1 Basic Quantum Theory
17.3.2 Single-Particle Systems and Quantum Information
17.3.3 Mach-Zehnder Interferometer
17.3.4 Mathematics of the Mach-Zehnder Interferometer
17.3.5 Quantum Bit or Qbit
17.3.6 The No-Cloning Theorem
17.3.7 Entanglement and Bell States
17.3.8 Quantum Dense Coding
17.3.9 Quantum Teleportation
Problems
Index
Preface
After a year of freshman calculus, the basic mathematics training in science and engineering is accomplished during the second and third years of college education. Students are usually required to take a sequence of three courses on the subjects of advanced calculus, differential equations, complex calculus, and introductory mathematical physics. The majority of science and engineering departments today are finding it convenient to use a single book that assures uniform formalism and a topical coverage in tune with their average needs. The objective of Essentials of Mathematical Methods in Science and Engineering is to equip students with the basic mathematical skills that are required by the majority of science and engineering undergraduate programs. Some of the basic courses taught in these programs are on the subjects of classical electrodynamics, classical mechanics, statistical mechanics, thermodynamics, modern physics, quantum mechanics, and relativity. The entire book contains a sufficient amount of material for a three-semester course meeting three or four hours a week. All this being said, respecting the disparity of the mathematics courses taught throughout the world, the topical coverage and the modular structure of the book make it versatile enough to be adopted for a number of mathematics courses and allow instructors the flexibility to individualize their own teaching while maintaining the integrity of the discussions in the book for their students.
About the Book
We give a coherent treatment of the selected topics with a style that makes the essential mathematical skills easily accessible to a multidisciplinary audience. Since the book is written in modular format, each chapter covers its subject thoroughly and thus can be read independently. This makes the book very useful as a reference or refresher for scientists. It is assumed that the reader has been exposed to two semesters of freshman calculus, which is usually taught at the level of Thomas’ Calculus by Thomas, Jr. and Finney, or has acquired an equivalent level of mathematical maturity. The derivations and discussions are usually presented in sufficient detail so that the reader can follow the mathematics without much pause. Occasionally, when the proofs get too technical for our purposes, we quote the results without proof but refer to an appropriate book. All the references are collected at the back in alphabetical order with their full titles. Whenever there is credit due or some special reference worth pointing out, it is cited within the text. However, most of the references in our list are included as extra resources for the interested reader who wants to dwell on these topics further. Along with these references, students and researchers can use the websites http://en.wikipedia.org and http://scienceworld.wolfram.com/ for further resources. Of course, the website http://lanl.arxiv.org/ is an indispensable tool for researchers on any subject.
This book concentrates on the analytic techniques. Computer programs like Mathematica® and Maple™ are capable of performing symbolic as well as numerical calculations. Even though they are extremely useful to scientists, one still needs a full grasp of the basic mathematical techniques to produce the desired result and to interpret it correctly. There are books specifically written for mathematical methods with these programs. The books by Kelly on Mathematica and by Wang on Maple are included in our list of references at the back.
Summary of the Book
Chapter 1. Functional Analysis: This chapter aims to fill the gap between the introductory calculus and advanced mathematical analysis courses. It introduces the basic techniques that are used throughout mathematics. Limits, derivatives, integrals, extremum of functions, implicit function theorem, inverse functions, and improper integrals are among the topics discussed.
Chapter 2. Vector Analysis: Since most of the classical theories can be introduced in terms of vectors, we present a rather detailed treatment of vectors and their techniques. Vector algebra, vector differentiation, gradient, divergence and curl operators, vector integration, Green’s theorem, integral theorems, and the essential elements of the potential theory are among the topics discussed.
Chapter 3. Generalized Coordinates and Tensors: Starting with the Cartesian coordinates, we discuss generalized coordinate systems and their transformations. Basis vectors, transformation matrix, line element, reciprocal basis vectors, covariant and contravariant components, differential operators in generalized coordinates, and introduction to Cartesian and general tensors are among the other essential topics of mathematical methods.
Chapter 4. Determinants and Matrices: A systematic treatment of the basic properties and methods of determinants and matrices that are much needed in science and engineering applications is presented here with examples.
Chapter 5. Linear Algebra: We start with a discussion of abstract linear spaces, also called vector spaces, and then continue with systems of linear equations, inner product spaces, eigenvalue problems, quadratic forms, Hermitian matrices, and Dirac’s bra and ket vectors.
Chapter 6. Sequences and Series: This chapter starts with sequences and series of numbers and then introduces absolute convergence and tests for convergence. We then extend our discussion to series of functions and introduce the concept of uniform convergence. Power series and Taylor series are discussed in detail with applications.
Chapter 7. Complex Numbers and Functions: After the complex number system is introduced and its algebra is discussed, complex functions, complex differentiation, Cauchy-Riemann conditions, and analytic functions are the main topics of this chapter.
Chapter 8. Complex Analysis: We introduce the complex integral theorems and discuss residues, Taylor series, and Laurent series along with their convergence properties.
Chapter 9. Ordinary Differential Equations: We start with the general properties of differential equations, their solutions, and their boundary conditions. Most commonly encountered differential equations in applications are either first- or second-order. Hence, we discuss these two cases separately in detail and introduce methods of finding their analytic solutions. We also study linear equations of higher order. We finally conclude with the Frobenius method applied to first- and second-order differential equations with interesting and carefully selected examples.
Chapter 10. Second-Order Differential Equations and Special Functions: In this chapter, we discuss three of the most frequently encountered second-order differential equations of physics and engineering, that is, Legendre, Hermite, and Laguerre equations. We study these equations in detail from the viewpoint of the Frobenius method. By using the boundary conditions, we then show how the corresponding orthogonal polynomial sets are constructed. We also discuss how and under what conditions these polynomial sets can be used to represent a general solution.
Chapter 11. Bessel’s Equation and Bessel Functions: Bessel functions are among the most frequently used special functions of mathematical physics. Since their orthogonality is with respect to their roots and not with respect to a parameter in the differential equation, they are discussed here separately in great detail.
Chapter 12. Partial Differential Equations and Separation of Variables: Most of the second-order ordinary differential equations of physics and engineering are obtained from partial differential equations via the method of separation of variables. We introduce the most commonly encountered partial differential equations of physics and engineering and show how the method of separation of variables is used in Cartesian, spherical, and cylindrical coordinates. Interesting examples help the reader connect with the knowledge gained in the previous three chapters.
Chapter 13. Fourier Series: We first introduce orthogonal systems of functions and then concentrate on trigonometric Fourier series. We discuss their convergence and uniqueness properties along with specific examples.
Chapter 14. Fourier and Laplace Transforms: After a basic introduction to signal analysis and correlation functions, we introduce the Fourier transforms and their inverses. We also introduce Laplace transforms and their applications to differential equations. We discuss methods of finding inverse Laplace transforms and their applications to transfer functions and signal processors.
Chapter 15. Calculus of Variations: We introduce basic variational analysis for different types of boundary conditions. Applications to Hamilton’s principle and to Lagrangian mechanics are investigated in detail. The presence of constraints in dynamical systems along with the inverse problem are discussed with examples.
Chapter 16. Probability Theory and Distributions: Some of the interesting topics covered in this chapter include the basic theory of probability, permutations and combinations, applications to statistical mechanics, and the connection with thermodynamics. We also discuss Bayes’ theorem, random variables, distributions, distribution functions and probability, fundamental theorem of averages, moments, Chebyshev’s theorem, and the law of large numbers.
Chapter 17. Information Theory: The first part of this chapter is devoted to classical information theory, where we discuss topics from Shannon’s theory, decision theory, game theory, Nash equilibrium, and traveler’s dilemma. The definition of Cbits and operations with them are also introduced. The second part of this chapter is on quantum information theory. After a general survey of quantum mechanics, we discuss the Mach-Zehnder interferometer, Qbits, entanglement, and Bell states. Along with the no-cloning theorem, quantum cryptology, quantum dense coding, and quantum teleportation are among the other interesting topics discussed in this chapter. This chapter is written with a style that makes these interesting topics accessible to a wide range of audiences with minimum prior exposure to quantum mechanics.
Course Suggestions
Chapters 1-15 consist of the contents of the three, usually sequentially taught, core mathematical methods courses meeting 3-4 hours a week that most science and engineering departments require. These chapters consist of the basic mathematical skills needed for the majority of undergraduate science and engineering courses. Chapters 1-8 can be taught during the second year as a two-semester course. During the first or the second semester of the third year, a course composed of Chapters 9-15 can complete the sequence. Chapters 9 through 12 can also be used in a separate one-semester course on differential equations and special functions.
The two extensive chapters on probability theory and information theory (Chapters 16 and 17) are among the special chapters of the book. Even though most of the mathematical methods textbooks have chapters on probability, we have treated the subject with a style and level that prepares the reader for the following chapter on information theory. We have also included sections on applications to statistical mechanics and thermodynamics. The chapter on information theory is unusual for mathematical methods textbooks at both the graduate and the undergraduate levels. By selecting certain sections, Chapters 16 and 17 can be incorporated into the advanced undergraduate curriculum. In their entirety, they are more suitable to be used in a graduate course. Since we review the basic quantum mechanics needed, we require no prior exposure to quantum mechanics. In this regard, Chapter 17 is also designed to be useful to beginning researchers from a wide range of disciplines in science and engineering. Even though it is not meant to be complete, we have a rich list of references at the back on probability theory, decision theory, game theory, and classical and quantum information theories. Others can be traced from these.
Examples and exercises are always an integral part of any learning process; hence the topics are introduced with an ample number of examples. To maintain continuity of the discussions, we have collected exercises at the end of each chapter, where they are predominantly listed in the same order that they are discussed within the text. Occasionally, when proofs or extensions of certain results are too technical to be discussed within the text, they are assigned as exercises. Hence, it is recommended that the entire problem sections be read quickly before their solutions are attempted.
Parts of this book are based on my lectures delivered at Canisius College, Buffalo, NY, during the years 1984-1986 and the Middle East Technical University, Ankara, Turkey, on various occasions. With their exclusive chapters, uniform level of formalism, and coordinated and complementary coverage of topics, Essentials of Mathematical Methods in Science and Engineering connects with my graduate textbook, Mathematical Methods in Science and Engineering, thus forming a complete set spanning a wide range of basic mathematical techniques for students, instructors, and researchers. For communications about the book and for some relevant sites to our readers, we will use the website http://www.physics.metu.edu.tr/~bayin.
Ş. Selçuk Bayın
ODTÜ
Ankara, Turkey
April 2008
Acknowledgments
I would like to thank Prof. J.P. Krisch of the University of Michigan for always being there whenever I needed advice and for sharing my excitement at all phases of the project. My special thanks go to Prof. J.C. Lauffenburger and Assoc. Prof. K.D. Scherkoske at Canisius College. I am grateful to Prof. R.P. Langlands of the Institute for Advanced Study at Princeton for his support and for his cordial and enduring contributions to METU culture. I am indebted to Prof. P.G.L. Leach for his insightful comments and for meticulously reading two of the chapters. I am grateful to Wiley for a grant to prepare the camera-ready copy, and I would like to thank my editor Susanne Steitz-Filler for sharing my excitement. My work on the two books Mathematical Methods in Science and Engineering and Essentials of Mathematical Methods in Science and Engineering has spanned an uninterrupted period of 6 years. With the time spent on my two books in Turkish published in the years 2000 and 2004, which were basically the forerunners of my first book, this project has dominated my life for almost a decade. In this regard, I cannot express enough gratitude to my darling young scientist daughter Sumru and beloved wife Adalet, for always being there for me during this long and strenuous journey, which also involved many sacrifices for them.
Ş.S.B.
CHAPTER 1
FUNCTIONAL ANALYSIS
Essentials of Mathematical Methods in Science and Engineering. By Ş. Selçuk Bayın. Copyright © 2008 John Wiley & Sons, Inc.

A function is basically a rule that relates the members of one set of objects to the members of another set. In this regard, it has a very wide range of applications in both science and mathematics. Functional analysis is basically the branch of mathematics that deals with the functions of numbers. In this chapter, we confine ourselves to the real domain and introduce some of the most commonly used techniques in functional analysis.

1.1 CONCEPT OF FUNCTION

We start with a quick review of the basic concepts of set theory. Let $S$ be a set of objects of any kind: points, numbers, functions, vectors, etc. When $s$ is an element of the set $S$, we show it as

$$s \in S. \tag{1.1}$$

For finite sets we may define $S$ by listing its elements as

$$S = \{s_1, s_2, \ldots, s_n\}. \tag{1.2}$$
For infinite sets, S is usually defined by a phrase describing the condition to be a member of the set, for example,
S = {All points on the sphere of radius R } .
(1.3)
When there is no room for confusion, we may also write an infinite set as

$$S = \{1, 3, 5, \ldots\}. \tag{1.4}$$
When each member of a set $A$ is also a member of set $B$, we say that $A$ is a subset of $B$ and write

$$A \subset B. \tag{1.5}$$

The phrase $B$ covers or contains $A$ is also used. The union of two sets,

$$A \cup B, \tag{1.6}$$

consists of the elements of both $A$ and $B$. The intersection of two sets, $A$ and $B$, is defined as

$$A \cap B = \{\text{All elements common to } A \text{ and } B\}. \tag{1.7}$$
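The set relations in Equations (1.5)-(1.7) can be tried out on small finite sets. The following Python sketch is only an illustration; the particular sets `A`, `B`, and `C` are arbitrary choices and do not come from the text.

```python
# Finite sets written by listing their elements, as in Eq. (1.2).
A = {1, 3, 5}
B = {1, 2, 3, 4, 5, 6}
C = {2, 4, 6}

is_subset = A <= B        # subset test: every member of A is also in B
union = A | C             # elements belonging to A or to C
intersection = A & C      # elements common to A and C

print(is_subset)                 # True
print(sorted(union))             # [1, 2, 3, 4, 5, 6]
print(intersection == set())     # True: A and C have no common element
```

Here the intersection of `A` and `C` is the empty set, since the two sets share no element.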
When two sets have no common element, their intersection is called the null set or the empty set, which is usually shown by $\phi$. The neighborhood of a point, $(x_1, y_1)$, in the $xy$-plane is the set of all points, $(x, y)$, inside a circle centered at $(x_1, y_1)$ and with the radius $\delta$:

$$(x - x_1)^2 + (y - y_1)^2 < \delta^2. \tag{1.8}$$
An open set is defined as the set of points with neighborhoods entirely within the set. The interior of a circle defined by

$$x^2 + y^2 < 1 \tag{1.9}$$

is an open set. A boundary point is a point whose every neighborhood contains at least one point in the set and at least one point that does not belong to the set. The boundary of the set in Equation (1.9) is the set of points on the circumference, that is,

$$x^2 + y^2 = 1. \tag{1.10}$$
An open set plus its boundary is a closed set. A function, f , is in general a rule, a relation that uniquely associates members of one set, A , with the members of another set, B. The concept of function is essentially the same as that of mapping, which in general is so broad that it allows mathematicians to work with them without any resemblance to the simple class of functions with numerical values. The set
$A$ that $f$ acts upon is called the domain, and the set $B$ composed of the elements that $f$ can produce is called the range. For single-valued functions the common notation used is

$$f: x \to f(x). \tag{1.11}$$
Here $f$ stands for the function or mapping that acts upon a single number $x$, which is an element of the domain, and produces $f(x)$, which is an element of the range. In general, $f$ refers to the function itself and $f(x)$ refers to the value it returns. However, in practice, $f(x)$ is also used to refer to the function itself. In this chapter we basically concern ourselves with functions that take numerical values as $f(x)$, where the argument, $x$, is called the independent variable. We usually define a new variable, $y$, as

$$y = f(x), \tag{1.12}$$

which is called the dependent variable. Functions with multiple variables, that is, multivariate functions, can also be defined. For example, for each point $(x, y)$ in some region of the $xy$-plane we may assign a unique real number, $f(x, y)$, according to the rule
We now say that $f(x, y)$ is a function with two independent variables, $x$ and $y$. In applications, $f(x, y)$ may represent physical properties like the temperature or the density distribution of a flat disc with negligible thickness. Definition of function can be extended to cases with several independent variables as

where $n$ stands for the number of independent variables. The term function is also used for the objects that associate more than one element in the domain to a single element in the range. Such objects are called multiple-to-one relations. For example,

$f(x, y) = 2xy + x^2$: single-valued or one-to-one,
$f(x) = \sin x$: many-to-one,
$f(x, y) = x + x^2$: single-valued,
$f(x) = x^2,\ x \neq 0$: two-to-one,
$f(x, y) = \sin xy$: many-to-one.
Sometimes the term “function” is also used for relations that map a single point in its domain to multiple points in its range. As we shall discuss in Chapters 7 and 8, such functions are called multivalued functions, which are predominantly encountered in complex analysis.
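The distinction between one-to-one and many-to-one relations can be checked on a finite sample of the domain. The sketch below is a hypothetical helper, `is_one_to_one`, written for illustration only; the sample points and the linear function $2x + 1$ are assumptions, while $x^2$ with $x \neq 0$ is the two-to-one case listed above.

```python
def is_one_to_one(f, domain, digits=12):
    """Return True if no two sample points of `domain` share a value of f."""
    values = [round(f(x), digits) for x in domain]
    return len(set(values)) == len(values)

samples = [-2.0, -1.0, 1.0, 2.0]     # hypothetical sample domain, x = 0 excluded

print(is_one_to_one(lambda x: 2 * x + 1, samples))   # True
print(is_one_to_one(lambda x: x ** 2, samples))      # False
```

A test on a finite sample can only disprove the one-to-one property; it cannot establish it for the whole domain.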
1.2 CONTINUITY AND LIMITS
Similar to its usage in everyday language, the word continuity in mathematics also implies the absence of abrupt changes. In astrophysics, pressure and density distributions inside a solid neutron star are represented by continuous functions of the radial position: $P(r)$ and $\rho(r)$, respectively. This means that small changes in the radial position inside the star also result in small changes in the pressure and density. At the surface, $r = R$, where the star meets the outside vacuum, pressure has to be continuous. Otherwise, there will be a net force on the surface layer, which will violate the static equilibrium condition. In this regard, in static neutron star models pressure has to be a monotonic decreasing function of $r$, which smoothly drops to zero at the surface:
P ( R ) = 0.
(1.15)
On the other hand, the density at the surface can change abruptly from a finite value to zero. This is also in line with our everyday experiences, where solid objects have sharp contours marked by density discontinuities. For gaseous stars, both pressure and density have to vanish continuously at the surface. In constructing physical models, deciding on which parameters are going to be taken as continuous at the boundaries requires physical reasoning and some insight. Usually, a collection of rules that have to be obeyed at the boundaries are called the junction conditions or the boundary conditions. We are now ready to give a formal definition of continuity as follows:

Continuity: A numerically valued function $f(x)$ defined in some domain $D$ is said to be continuous at the point $x_0 \in D$ if, for any positive number $\varepsilon > 0$, there is a neighborhood $N$ about $x_0$ such that $|f(x) - f(x_0)| < \varepsilon$ for every point common to both $N$ and $D$, that is, $N \cap D$. If the function $f(x)$ is continuous at every point of $D$, we say it is continuous in $D$.

We finally quote two theorems, proofs of which can be found in books on advanced calculus:

Theorem 1.1. Let $f(x)$ be a continuous function at $x$ and let $\{x_n\}$ be a sequence of points in the domain of $f(x)$ with the limit

$$\lim_{n \to \infty} x_n \to x; \tag{1.16}$$

then the following is true:

$$\lim_{n \to \infty} f(x_n) = f(x). \tag{1.17}$$

Theorem 1.2. For a function $f(x)$ defined in $D$, if the limit

$$\lim_{n \to \infty} f(x_n) = f(x) \tag{1.18}$$

exists whenever $x_n \in D$ and

$$\lim_{n \to \infty} x_n \to x \in D, \tag{1.19}$$

then the function $f(x)$ is continuous at $x$. For the limit in Equation (1.18) to exist, it is sufficient to show that the right and the left limits agree, that is,

$$\lim_{\varepsilon \to 0} f(x - \varepsilon) = \lim_{\varepsilon \to 0} f(x + \varepsilon), \tag{1.20}$$

$$f(x^-) = f(x^+) = f(x). \tag{1.21}$$
In practice, the second theorem is more useful in showing that a given function is continuous. If a function is discontinuous at a finite number of points in its interval of definition, $[x_a, x_b]$, it is called piecewise continuous. Generalization of these theorems to multivariate functions is easily accomplished by taking $x$ to represent a point in a space with $n$ independent variables as

$$x = (x_1, x_2, \ldots, x_n). \tag{1.22}$$
However, with more than one independent variable one has to be careful. Consider the simple function

$$f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}, \tag{1.23}$$

which is finite at the origin. Depending on the direction of approach to the origin, $f(x, y)$ takes different values:

$\lim_{(x,y) \to (0,0)} f(x, y) \to 0$ if we approach along the $y = x$ line,
$\lim_{(x,y) \to (0,0)} f(x, y) \to 1$ if we approach along the $x$ axis,
$\lim_{(x,y) \to (0,0)} f(x, y) \to -1$ if we approach along the $y$ axis.
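Hence the limit $\lim_{(x,y) \to (0,0)} f(x, y)$ does not exist and the function $f(x, y)$ is not continuous at the origin.

Limits: Basic properties of limits, which we give for functions with two variables, also hold for a general multivariate function: Let $u = f(x, y)$ and $v = g(x, y)$ be two functions defined in the domain $D$ of the $xy$-plane. If the limits

$$\lim_{(x,y) \to (x_0,y_0)} f(x, y) = f_0 \quad \text{and} \quad \lim_{(x,y) \to (x_0,y_0)} g(x, y) = g_0 \tag{1.24}$$

exist, then we can write

$$\lim_{(x,y) \to (x_0,y_0)} \left[ f(x, y) + g(x, y) \right] = f_0 + g_0, \tag{1.25}$$

$$\lim_{(x,y) \to (x_0,y_0)} \left[ f(x, y) \cdot g(x, y) \right] = f_0 \cdot g_0, \tag{1.26}$$

$$\lim_{(x,y) \to (x_0,y_0)} \frac{f(x, y)}{g(x, y)} = \frac{f_0}{g_0}, \quad g_0 \neq 0. \tag{1.27}$$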
If the functions $f(x, y)$ and $g(x, y)$ are continuous at $(x_0, y_0)$, then the functions

$$f(x, y) + g(x, y), \quad f(x, y)\, g(x, y), \quad \frac{f(x, y)}{g(x, y)} \tag{1.28}$$

are also continuous at $(x_0, y_0)$, provided that in the last case $g(x, y)$ is different from zero at $(x_0, y_0)$. Let $F(u, v)$ be a continuous function defined in some domain $D_0$ of the $uv$-plane and let $F(f(x, y), g(x, y))$ be defined for $(x, y)$ in $D$. Then, if $(f_0, g_0)$ is in $D_0$, we can write

$$\lim_{(x,y) \to (x_0,y_0)} F(f(x, y), g(x, y)) = F(f_0, g_0). \tag{1.29}$$

If $f(x, y)$ and $g(x, y)$ are continuous at $(x_0, y_0)$, then so is $F(f(x, y), g(x, y))$. In evaluating limits of functions that can be expressed as ratios, L'Hôpital's rule is very useful.

L'Hôpital's rule: Let $f$ and $g$ be differentiable functions on the interval $a \leq x < b$ with $g'(x) \neq 0$ there, where the upper limit $b$ could be finite or infinite. If $f$ and $g$ have the limits

$$\lim_{x \to b} f(x) = 0 \quad \text{and} \quad \lim_{x \to b} g(x) = 0 \tag{1.30}$$

or

$$\lim_{x \to b} f(x) = \infty \quad \text{and} \quad \lim_{x \to b} g(x) = \infty, \tag{1.31}$$

and if the limit

$$\lim_{x \to b} \frac{f'(x)}{g'(x)} = L \tag{1.32}$$

exists, where $L$ could be zero or infinity, then

$$\lim_{x \to b} \frac{f(x)}{g(x)} = L. \tag{1.33}$$

1.3 PARTIAL DIFFERENTIATION
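A necessary and sufficient condition for the derivative of $f(x)$ to exist at $x_0$ is that the left, $f'_-(x_0)$, and the right, $f'_+(x_0)$, derivatives exist and be equal (Fig. 1.1), that is,

$$f'_-(x_0) = f'_+(x_0), \tag{1.34}$$

where

$$f'_-(x_0) = \lim_{\Delta x \to 0^-} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}, \tag{1.35}$$

$$f'_+(x_0) = \lim_{\Delta x \to 0^+} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}. \tag{1.36}$$

[Figure 1.1]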
When the derivative exists, we always mean a finite derivative. If $f(x)$ has a derivative at $x_0$, it means that it is continuous at that point. When the derivative of $f(x)$ exists at every point in the interval $(a, b)$, we say that $f(x)$ is differentiable in $(a, b)$ and write its derivative as (1.37) Geometrically, the derivative at a point is the slope of the tangent line at that point:
(1.38)

When a function depends upon two variables,

$$z = f(x, y), \tag{1.39}$$

the partial derivative with respect to $x$ at $(x_0, y_0)$ is defined as the limit

$$\lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x} \tag{1.40}$$

and we show it as in one of the following forms:

Similarly, the partial derivative with respect to $y$ at $(x_0, y_0)$ is defined as

$$\lim_{\Delta y \to 0} \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0)}{\Delta y}. \tag{1.42}$$
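The limit definitions of the partial derivatives suggest a direct numerical estimate: hold one variable fixed and form a difference quotient in the other. The sketch below uses a central difference, which is a swapped-in variant of the one-sided quotient in the definitions and usually converges faster; the function `f` and the step sizes are illustrative assumptions, not taken from the text.

```python
def partial_x(f, x0, y0, dx=1e-6):
    """Central-difference estimate of the partial derivative with respect to x."""
    return (f(x0 + dx, y0) - f(x0 - dx, y0)) / (2 * dx)

def partial_y(f, x0, y0, dy=1e-6):
    """Central-difference estimate of the partial derivative with respect to y."""
    return (f(x0, y0 + dy) - f(x0, y0 - dy)) / (2 * dy)

def f(x, y):
    return x * x * y + y ** 3      # illustrative function, not from the text

# Exact values at (1, 2): f_x = 2xy = 4 and f_y = x^2 + 3y^2 = 13.
print(round(partial_x(f, 1.0, 2.0), 6))   # 4.0
print(round(partial_y(f, 1.0, 2.0), 6))   # 13.0
```

In each call the other variable is simply held at its value $y_0$ or $x_0$, which is exactly what the partial derivative prescribes.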
A geometric interpretation of the partial derivative is that the section of the surface $z = f(x, y)$ with the plane $y = y_0$ is the curve $z = f(x, y_0)$; hence the partial derivative $f_x(x_0, y_0)$ is the slope of the tangent line (Fig. 1.2) to $z = f(x, y_0)$ at $(x_0, y_0)$. Similarly, the partial derivative $f_y(x_0, y_0)$ is the slope of the tangent line to the curve $z = f(x_0, y)$ at $(x_0, y_0)$. For a multivariate function the partial derivative with respect to the $i$th independent variable is defined as

$$\frac{\partial f(x_1, \ldots, x_i, \ldots, x_n)}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \ldots, x_i + \Delta x_i, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{\Delta x_i}. \tag{1.43}$$
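For a given $f(x, y)$ the partial derivatives $f_x$ and $f_y$ are functions of $x$ and $y$ and they also have partial derivatives which are written as

$$f_{xx} = \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left[\frac{\partial f}{\partial x}\right], \quad f_{yy} = \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left[\frac{\partial f}{\partial y}\right], \tag{1.44}$$

$$f_{xy} = \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left[\frac{\partial f}{\partial y}\right], \quad f_{yx} = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left[\frac{\partial f}{\partial x}\right]. \tag{1.45}$$

When $f_{xy}$ and $f_{yx}$ are continuous at $(x_0, y_0)$, then the relation

$$f_{xy} = f_{yx} \tag{1.46}$$

holds at $(x_0, y_0)$. Under similar conditions this result can be extended to cases with more than two independent variables and to higher-order mixed partial derivatives.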
Figure 1.2 Partial derivative, $f_x$, is the slope of the tangent line to $z = f(x, y_0)$.

1.4 TOTAL DIFFERENTIAL

When a function depends on two or more variables, we have seen that the limit at a point may depend on the direction of approach. Hence, it is important that we introduce a nondirectional derivative for functions with several variables. Given the function

$$f(x, y, z) = xz - y^2, \tag{1.47}$$

for a displacement of $\Delta \mathbf{r} = (\Delta x, \Delta y, \Delta z)$ we can write its new value as

$$f(\mathbf{r} + \Delta \mathbf{r}) = (x + \Delta x)(z + \Delta z) - (y + \Delta y)^2 \tag{1.48}$$

$$= xz + x \Delta z + z \Delta x + \Delta x \Delta z - y^2 - 2y \Delta y - (\Delta y)^2 \tag{1.49}$$

$$= (xz - y^2) + (z \Delta x - 2y \Delta y + x \Delta z) + \Delta x \Delta z - (\Delta y)^2, \tag{1.50}$$

where $\mathbf{r}$ stands for the point $(x, y, z)$ and $\Delta \mathbf{r}$ is the displacement $(\Delta x, \Delta y, \Delta z)$. For small $\Delta \mathbf{r}$ the change in $f(x, y, z)$ to first order can be written as
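$$\Delta f = f(\mathbf{r} + \Delta \mathbf{r}) - f(\mathbf{r}) = (xz - y^2) + (z \Delta x - 2y \Delta y + x \Delta z) - (xz - y^2), \tag{1.51}$$

$$\Delta f = z \Delta x - 2y \Delta y + x \Delta z. \tag{1.52}$$

Considering that the first-order partial derivatives of $f$ are given as

$$\frac{\partial f}{\partial x} = z, \quad \frac{\partial f}{\partial y} = -2y, \quad \frac{\partial f}{\partial z} = x, \tag{1.53}$$

Equation (1.52) is nothing but

$$\Delta f = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y + \frac{\partial f}{\partial z} \Delta z. \tag{1.54}$$

In general, if a function $f(x, y, z)$ is differentiable at $(x, y, z)$ in some domain $D$ with the partial derivatives

$$\frac{\partial f}{\partial x}, \quad \frac{\partial f}{\partial y}, \quad \frac{\partial f}{\partial z}, \tag{1.55}$$

then the change in $f(x, y, z)$ in $D$ to first order in $(\Delta x, \Delta y, \Delta z)$ can be written as

$$\Delta f \simeq \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y + \frac{\partial f}{\partial z} \Delta z. \tag{1.56}$$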
In the limit as $\Delta \mathbf{r} \to 0$ we can write Equation (1.56) as

$$df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz, \tag{1.57}$$

which is called the total differential of $f(x, y, z)$. In the case of a function with one variable, $f(x)$, the differential reduces to

$$\Delta f \simeq \frac{df}{dx} \Delta x, \tag{1.58}$$

which gives the local approximation to the change in the function at the point $x$ via the value of the tangent line (Fig. 1.3) at that point. The smaller the value of $\Delta x$, the better the approximation. In cases with several independent variables, $\Delta f$ is naturally approximated by using the tangent plane at that point.

Figure 1.3 Total differential gives a local approximation to the change in a function.
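How quickly the first-order change approaches the exact change can be seen numerically for the function $f = xz - y^2$ used in this section. In the sketch below the base point and the displacement direction are arbitrary illustrative choices; for this displacement the neglected second-order remainder, $\Delta x \Delta z - (\Delta y)^2$, equals $-5h^2$.

```python
def f(x, y, z):
    return x * z - y * y

def first_order_change(x, y, z, dx, dy, dz):
    # Partial derivatives of f = xz - y^2: f_x = z, f_y = -2y, f_z = x.
    return z * dx - 2 * y * dy + x * dz

x, y, z = 1.0, 2.0, 3.0            # illustrative base point
errors = []
for h in (1e-1, 1e-2, 1e-3):
    dx, dy, dz = h, 2 * h, -h      # illustrative displacement of size ~h
    exact = f(x + dx, y + dy, z + dz) - f(x, y, z)
    approx = first_order_change(x, y, z, dx, dy, dz)
    errors.append(abs(exact - approx))
    print(h, exact, approx, errors[-1])
```

Each tenfold reduction of `h` reduces the error by roughly a factor of one hundred, as expected for a first-order approximation.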
1.5 TAYLOR SERIES
The Taylor series of a function about $x_0$, when it exists, is given as

$$f(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} (x - x_0)^n \tag{1.59}$$

$$= a_0 + a_1 (x - x_0) + \frac{a_2}{2!} (x - x_0)^2 + \cdots. \tag{1.60}$$

To evaluate the coefficients, we differentiate repeatedly and set $x = x_0$ to find

$$a_n = f^{(n)}(x_0), \tag{1.61}$$

where

$$f^{(n)}(x_0) = \left. \frac{d^n f}{dx^n} \right|_{x = x_0} \tag{1.62}$$

and the zeroth derivative is defined as the function itself, that is,

$$f^{(0)}(x) = f(x). \tag{1.63}$$

Hence the Taylor series of a function with a single variable is written as

$$f(x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0) (x - x_0)^n. \tag{1.64}$$

This formula assumes that $f(x)$ is infinitely differentiable in an open domain including $x_0$. Functions that are equal to their Taylor series in the neighborhood of any point $x_0$ in their domain are called analytic functions. Taylor series about $x_0 = 0$ are called Maclaurin series. Using the Taylor series, we can approximate a given differentiable function in the neighborhood of $x_0$ to orders beyond the linear term in Equation (1.58). For example, to second order we obtain
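$$f(x) \simeq f(x_0) + \left(\frac{df}{dx}\right)_{x_0} (x - x_0) + \frac{1}{2} \left(\frac{d^2 f}{dx^2}\right)_{x_0} (x - x_0)^2, \tag{1.65}$$

$$\Delta^{(2)} f(x_0) = f(x) - f(x_0) \tag{1.66}$$

$$= \left(\frac{df}{dx}\right)_{x_0} \Delta x + \frac{1}{2} \left(\frac{d^2 f}{dx^2}\right)_{x_0} (\Delta x)^2, \quad \Delta x = x - x_0. \tag{1.67}$$

Since $x_0$ is any point in the open domain in which the Taylor series exists, we can drop the subscript in $x_0$ and write

$$\Delta^{(2)} f(x) = \frac{df}{dx} \Delta x + \frac{1}{2} \frac{d^2 f}{dx^2} (\Delta x)^2, \tag{1.68}$$

where $\Delta^{(2)} f$ denotes the differential of $f$ to the second order. Higher-order differentials are obtained similarly.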
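The gain from keeping the second-order term can be checked numerically. The sketch below compares the linear approximation with the second-order differential of Equation (1.68) for $f(x) = \sin x$; the choice of function, the point $x = 0.5$, and the step sizes are illustrative assumptions.

```python
import math

def second_order_change(df, d2f, dx):
    """Change in f to second order: f' dx + (1/2) f'' dx^2."""
    return df * dx + 0.5 * d2f * dx ** 2

x = 0.5                                   # illustrative expansion point
results = []
for dx in (0.1, 0.01):
    exact = math.sin(x + dx) - math.sin(x)
    linear = math.cos(x) * dx             # first-order differential
    quadratic = second_order_change(math.cos(x), -math.sin(x), dx)
    results.append((abs(exact - linear), abs(exact - quadratic)))
    print(dx, results[-1])
```

For both step sizes the second-order error is smaller than the first-order error, and it shrinks roughly as $(\Delta x)^3$.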
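The Taylor series of a function depending on several independent variables is also possible under similar conditions, and in the case of two independent variables it is given as

$$f(x, y) = \sum_{n=0}^{\infty} \frac{1}{n!} \left[ (x - x_0) \frac{\partial}{\partial x} + (y - y_0) \frac{\partial}{\partial y} \right]^n f(x, y), \tag{1.69}$$

where the derivatives are to be evaluated at $(x_0, y_0)$. For functions with two independent variables and to second order in the neighborhood of $(x, y)$, Equation (1.69) gives

$$f(x + \Delta x, y + \Delta y) = f(x, y) + \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial x^2} (\Delta x)^2 + 2 \frac{\partial^2 f}{\partial x \partial y} \Delta x \Delta y + \frac{\partial^2 f}{\partial y^2} (\Delta y)^2 \right], \tag{1.70}$$

which yields the differential,

$$\Delta^{(2)} f(x, y) = f(x + \Delta x, y + \Delta y) - f(x, y), \tag{1.71}$$

as

$$\Delta^{(2)} f(x, y) = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial x^2} (\Delta x)^2 + 2 \frac{\partial^2 f}{\partial x \partial y} \Delta x \Delta y + \frac{\partial^2 f}{\partial y^2} (\Delta y)^2 \right]. \tag{1.72}$$

For the higher-order terms note how the powers in Equation (1.69) are expanded. Generalization to $n$ independent variables is obvious.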
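Example 1.1. Partial derivatives: Consider the function

$$z(x, y) = xy^2 + e^x. \tag{1.73}$$

Partial derivatives are written as

$$\frac{\partial z}{\partial x} = y^2 + e^x, \tag{1.74}$$

$$\frac{\partial z}{\partial y} = 2xy, \tag{1.75}$$

$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^2 z}{\partial x^2} = e^x, \tag{1.76}$$

$$\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^2 z}{\partial y^2} = 2x, \tag{1.77}$$

$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^2 z}{\partial x \partial y} = 2y, \tag{1.78}$$

$$\frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^2 z}{\partial y \partial x} = 2y. \tag{1.79}$$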
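Example 1.2. Taylor series: Using the partial derivatives obtained in the previous example, we can write the Taylor series [Eq. (1.69)] of $z = xy^2 + e^x$ about the point $(0, 1)$ up to second order. First, the required derivatives at $(0, 1)$ are evaluated as

$$z(0, 1) = 1, \tag{1.80}$$

$$\left(\frac{\partial z}{\partial x}\right)_0 = 2, \quad \left(\frac{\partial z}{\partial y}\right)_0 = 0, \quad \left(\frac{\partial^2 z}{\partial x^2}\right)_0 = 1, \quad \left(\frac{\partial^2 z}{\partial y^2}\right)_0 = 0, \quad \left(\frac{\partial^2 z}{\partial x \partial y}\right)_0 = 2. \tag{1.81-1.85}$$

Using these derivatives we can write the Taylor series about the point $(0, 1)$ up to second order as

$$z(x, y) = z(0, 1) + \left(\frac{\partial z}{\partial x}\right)_0 x + \left(\frac{\partial z}{\partial y}\right)_0 (y - 1) + \frac{1}{2} \left(\frac{\partial^2 z}{\partial x^2}\right)_0 x^2 + \left(\frac{\partial^2 z}{\partial x \partial y}\right)_0 x (y - 1) + \frac{1}{2} \left(\frac{\partial^2 z}{\partial y^2}\right)_0 (y - 1)^2 + \cdots \tag{1.86}$$

$$= 1 + 2x + 0 (y - 1) + \frac{1}{2} x^2 + 2x (y - 1) + \frac{1}{2}\, 0\, (y - 1)^2 + \cdots \tag{1.87}$$

$$= 1 + 2x + \frac{1}{2} x^2 + 2x (y - 1) + \cdots, \tag{1.88}$$

where the subscript 0 indicates that the derivatives are to be evaluated at the point $(0, 1)$. To find $\Delta^{(2)} z(0, 1)$, which is good to the second order, we first write

$$\Delta^{(1)} z(0, 1) = \left(\frac{\partial z}{\partial x}\right)_0 \Delta x + \left(\frac{\partial z}{\partial y}\right)_0 \Delta y \tag{1.89}$$

$$= 2 \Delta x \tag{1.90}$$

and then obtain

$$\Delta^{(2)} z(0, 1) = \left(\frac{\partial z}{\partial x}\right)_0 \Delta x + \left(\frac{\partial z}{\partial y}\right)_0 \Delta y + \frac{1}{2} \left[ \left(\frac{\partial^2 z}{\partial x^2}\right)_0 (\Delta x)^2 + 2 \left(\frac{\partial^2 z}{\partial x \partial y}\right)_0 \Delta x \Delta y + \left(\frac{\partial^2 z}{\partial y^2}\right)_0 (\Delta y)^2 \right] \tag{1.91}$$

$$= 2 \Delta x + \frac{1}{2} (\Delta x)^2 + 2 \Delta x \Delta y. \tag{1.92}$$

Figure 1.4 Maximum and minimum points of a function.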
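The second-order expansion of Example 1.2 can be compared with the exact function near the point $(0, 1)$. The sketch below is a numerical check only; the helper names and the displacement along $\Delta x = \Delta y = h$ are illustrative choices.

```python
import math

def z(x, y):
    return x * y ** 2 + math.exp(x)

def z_second_order(x, y):
    # Expansion about (0, 1): 1 + 2x + x^2/2 + 2x(y - 1)
    return 1 + 2 * x + 0.5 * x ** 2 + 2 * x * (y - 1)

errors = {}
for h in (0.1, 0.01):
    x, y = h, 1 + h                      # displace both variables by h
    errors[h] = abs(z(x, y) - z_second_order(x, y))
    print(h, z(x, y), z_second_order(x, y), errors[h])
```

The error falls by about three orders of magnitude when `h` is reduced tenfold, which is the behavior expected when the first neglected terms are of third order.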
1.6 MAXIMA AND MINIMA OF FUNCTIONS

We are frequently interested in the maximum or the minimum values that a function, $f(x)$, attains in a closed domain $[a, b]$. The absolute maximum, $M_1$, is the value of the function at some point, $x_0$, if the inequality

$$M_1 = f(x_0) \geq f(x) \tag{1.93}$$

holds for all $x$ in $[a, b]$. An absolute minimum is defined similarly. In general we can quote the following theorem (Fig. 1.4):

Theorem 1.3. If a function, $f(x)$, is continuous in the closed interval $[a, b]$, then it possesses an absolute maximum, $M_1$, and an absolute minimum, $M_2$, in that interval.

Proof of this theorem requires a rather detailed analysis of the real number system, which can be found in books on advanced calculus. On the other hand, we are usually interested in the extremum values, that is, the local maximum or the minimum values of a function. Operationally, we can determine whether a given point, $x_0$, corresponds to an extremum or not by looking at the change or the variation in the function in the neighborhood of $x_0$. The total differential introduced in the previous sections is just the tool needed for this. We have seen that in one dimension we can write the first, $\Delta^{(1)} f$, the second, $\Delta^{(2)} f$, and the third, $\Delta^{(3)} f$, differentials of a function with a single independent variable as

$$\Delta^{(1)} f(x_0) = \left(\frac{df}{dx}\right)_{x_0} \Delta x, \tag{1.94}$$

$$\Delta^{(2)} f(x_0) = \left(\frac{df}{dx}\right)_{x_0} \Delta x + \frac{1}{2} \left(\frac{d^2 f}{dx^2}\right)_{x_0} (\Delta x)^2, \tag{1.95}$$

$$\Delta^{(3)} f(x_0) = \left(\frac{df}{dx}\right)_{x_0} \Delta x + \frac{1}{2} \left(\frac{d^2 f}{dx^2}\right)_{x_0} (\Delta x)^2 + \frac{1}{3!} \left(\frac{d^3 f}{dx^3}\right)_{x_0} (\Delta x)^3. \tag{1.96}$$

Figure 1.5 Analysis of critical points.
Extremum points are defined as the points where the first differential vanishes, which means

$$\left(\frac{df}{dx}\right)_{x_0} = 0. \tag{1.97}$$

In other words, the tangent line at an extremum point is horizontal (Fig. 1.5a,b). In order to decide whether an extremum point corresponds to a local maximum or minimum we look at the second differential:

$$\Delta^{(2)} f(x_0) = \frac{1}{2} \left(\frac{d^2 f}{dx^2}\right)_{x_0} (\Delta x)^2. \tag{1.98}$$

For a local maximum the function decreases for small displacements about the extremum point (Fig. 1.5a), which implies $\Delta^{(2)} f(x_0) < 0$. For a local minimum a similar argument yields $\Delta^{(2)} f(x_0) > 0$. Thus we obtain the following criteria:

$$\left(\frac{df}{dx}\right)_{x_0} = 0 \quad \text{and} \quad \left(\frac{d^2 f}{dx^2}\right)_{x_0} < 0 \quad \text{for a local maximum} \tag{1.99}$$

and

$$\left(\frac{df}{dx}\right)_{x_0} = 0 \quad \text{and} \quad \left(\frac{d^2 f}{dx^2}\right)_{x_0} > 0 \quad \text{for a local minimum.} \tag{1.100}$$

Figure 1.6 Plot of $y(x) = x^3$.
In cases where the second derivative also vanishes, we look at the third differential, $\Delta^{(3)} f(x_0)$. We now say that we have an inflection point; and depending on the sign of the third differential, we have either the third or the fourth shape in Figure 1.5. Consider the function

$$f(x) = x^3, \tag{1.101}$$

where the first derivative, $f'(x) = 3x^2$, vanishes at $x_0 = 0$. However, the second derivative, $f''(x) = 6x$, also vanishes there, thus making $x_0 = 0$ a point of inflection. From the third differential,

$$\Delta^{(3)} f(x_0) = \frac{1}{3!} \left(\frac{d^3 f}{dx^3}\right)_{x_0} (\Delta x)^3 \tag{1.102}$$

$$= \frac{1}{3!}\, 6\, (\Delta x)^3, \tag{1.103}$$

we see that $\Delta^{(3)} f(x_0) > 0$ for $\Delta x > 0$ and $\Delta^{(3)} f(x_0) < 0$ for $\Delta x < 0$. Thus we choose the third shape in Figure 1.5 and plot $f(x) = x^3$ as in Figure 1.6. Points where the first derivative of a function vanishes are called the critical points. Usually the potential in one-dimensional conservative systems can be represented by a (scalar) function, $V(x)$. The negative of the derivative of the potential gives the $x$ component of the force on the system:

$$F_x(x) = -\frac{dV}{dx}. \tag{1.104}$$
Thus the critical points of a potential function, $V(x)$, correspond to the points where the net force on the system is zero. In other words, the critical points are the points where the system is in equilibrium. Whether an equilibrium is stable or unstable depends on whether the critical point is a minimum or maximum, respectively. Analysis of the extrema of functions depending on more than one variable follows the same line of reasoning. However, since we can now approach the critical point from infinitely many different directions, one has to be careful. Consider a continuous function

$$z = f(x, y), \tag{1.105}$$

defined in some domain $D$. We say this function has a local maximum at $(x_0, y_0)$ if the inequality

$$f(x, y) \leq f(x_0, y_0) \tag{1.106}$$

is satisfied for all points in some neighborhood of $(x_0, y_0)$ and to have a local minimum if the inequality

$$f(x, y) \geq f(x_0, y_0) \tag{1.107}$$

is satisfied. In the following argument we assume that all the necessary partial derivatives exist. Critical points are now defined as the points where the first differential, $\Delta^{(1)} f(x, y)$, vanishes:
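$$\Delta^{(1)} f(x, y) = \left[ \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y \right] = 0. \tag{1.108}$$

Since the displacements $\Delta x$ and $\Delta y$ are arbitrary, the only way to satisfy this equation is to have both partial derivatives, $f_x$ and $f_y$, vanish. Hence at the critical point $(x_0, y_0)$, shown with the subscript 0, one has

$$\left(\frac{\partial f}{\partial x}\right)_0 = 0, \tag{1.109}$$

$$\left(\frac{\partial f}{\partial y}\right)_0 = 0. \tag{1.110}$$

To study the nature of these critical points, we again look at the second differential, $\Delta^{(2)} f(x_0, y_0)$, which is now given as

$$\Delta^{(2)} f(x_0, y_0) = \frac{1}{2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)_0 (\Delta x)^2 + 2 \left(\frac{\partial^2 f}{\partial x \partial y}\right)_0 \Delta x \Delta y + \left(\frac{\partial^2 f}{\partial y^2}\right)_0 (\Delta y)^2 \right]. \tag{1.111}$$

For a local maximum the second differential has to be negative, $\Delta^{(2)} f(x_0, y_0) < 0$, and for a local minimum positive, $\Delta^{(2)} f(x_0, y_0) > 0$. Since we can approach the point $(x_0, y_0)$ from different directions, we substitute (Fig. 1.7)

$$\Delta x = \Delta s \cos \theta \quad \text{and} \quad \Delta y = \Delta s \sin \theta \tag{1.112}$$

to write Equation (1.111) as

$$\Delta^{(2)} f(x_0, y_0) = \frac{1}{2} \left[ A \cos^2 \theta + 2B \cos \theta \sin \theta + C \sin^2 \theta \right] (\Delta s)^2, \tag{1.113}$$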
where we have defined

$$A = \left(\frac{\partial^2 f}{\partial x^2}\right)_0, \quad B = \left(\frac{\partial^2 f}{\partial x \partial y}\right)_0, \quad C = \left(\frac{\partial^2 f}{\partial y^2}\right)_0. \tag{1.114}$$

Figure 1.7 Definition of $\Delta s$.

Now the analysis of the nature of the critical points reduces to investigating the sign of $\Delta^{(2)} f(x_0, y_0)$ [Eq. (1.113)]. We present the final result as a theorem (Kaplan).

Theorem 1.4. Let $z = f(x, y)$ and its first and second partial derivatives be continuous in a domain $D$ and let $(x_0, y_0)$ be a point in $D$, where the partial derivatives $(\partial z / \partial x)_0$ and $(\partial z / \partial y)_0$ vanish. Then, we have the following cases:

I. For $B^2 - AC < 0$ and $A + C < 0$ we have a local maximum at $(x_0, y_0)$.
II. For $B^2 - AC < 0$ and $A + C > 0$ we have a local minimum at $(x_0, y_0)$.
III. For $B^2 - AC > 0$, we have a saddle point at $(x_0, y_0)$.
IV. For $B^2 - AC = 0$, the nature of the critical point is undetermined.

When $B^2 - AC > 0$ at $(x_0, y_0)$ we have what is called a saddle point. In this case $\Delta^{(2)} f(x_0, y_0)$ is positive for some directions and negative for the others. When $B^2 - AC = 0$, for some directions $\Delta^{(2)} f(x_0, y_0)$ will be zero; hence one must look at higher-order derivatives to study the nature of the critical point. When $A$, $B$, and $C$ are all zero, then $\Delta^{(2)} f(x_0, y_0)$ also vanishes. Hence we need to investigate the sign of $\Delta^{(3)} f(x_0, y_0)$.
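The four cases of Theorem 1.4 translate directly into a small decision routine. The sketch below is a minimal illustration; the function name `classify_critical_point` and the four test functions in the comments are assumptions chosen for demonstration, each having a critical point at the origin.

```python
def classify_critical_point(A, B, C):
    """Classify a critical point from A = f_xx, B = f_xy, C = f_yy (Theorem 1.4)."""
    discriminant = B * B - A * C
    if discriminant < 0:
        # A and C have the same sign here, so the sign of A + C decides the case.
        return "local maximum" if A + C < 0 else "local minimum"
    if discriminant > 0:
        return "saddle point"
    return "undetermined"

print(classify_critical_point(2, 0, 2))     # f = x^2 + y^2   -> local minimum
print(classify_critical_point(-2, 0, -2))   # f = -x^2 - y^2  -> local maximum
print(classify_critical_point(2, 0, -2))    # f = x^2 - y^2   -> saddle point
print(classify_critical_point(2, 0, 0))     # f = x^2         -> undetermined
```

The last case returns "undetermined" because the second differential vanishes along the $y$ direction, so higher-order terms would have to be examined.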
1.7 EXTREMA OF FUNCTIONS WITH CONDITIONS
A problem of significance is finding the critical points of functions while satisfying one or more conditions. Consider finding the extrema of

$$w = f(x, y, z) \tag{1.115}$$

while satisfying the conditions

$$g_1(x, y, z) = 0, \tag{1.116}$$

$$g_2(x, y, z) = 0. \tag{1.117}$$

In principle the two conditions define two surfaces, the intersection of which can be expressed as

$$x = x, \tag{1.118}$$

$$y = y(x), \tag{1.119}$$

$$z = z(x), \tag{1.120}$$

where we have used the variable $x$ as a parameter. We can now substitute this parametric equation into $w = f(x, y, z)$ and write it entirely in terms of $x$ as

$$w(x) = f(x, y(x), z(x)), \tag{1.121}$$

extremum points of which can now be found by the technique discussed in the previous section. Geometrically, this problem corresponds to finding the extremum points of $w = f(x, y, z)$ on the curve defined by the intersection of $g_1(x, y, z) = 0$ and $g_2(x, y, z) = 0$. Unfortunately, this method rarely works to yield a solution analytically. Instead, we introduce the following method: At a critical point we have seen that the change in $w$ to first order in the differentials $\Delta x$, $\Delta y$, and $\Delta z$ is zero:
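$$\Delta w = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y + \frac{\partial f}{\partial z} \Delta z = 0. \tag{1.122}$$

We also write the differentials of $g_1(x, y, z)$ and $g_2(x, y, z)$ as

$$\frac{\partial g_1}{\partial x} \Delta x + \frac{\partial g_1}{\partial y} \Delta y + \frac{\partial g_1}{\partial z} \Delta z = 0 \tag{1.123}$$

and

$$\frac{\partial g_2}{\partial x} \Delta x + \frac{\partial g_2}{\partial y} \Delta y + \frac{\partial g_2}{\partial z} \Delta z = 0. \tag{1.124}$$

We now multiply Equation (1.123) with $\lambda_1$ and Equation (1.124) with $\lambda_2$ and add to Equation (1.122) to write

$$\left( \frac{\partial f}{\partial x} + \lambda_1 \frac{\partial g_1}{\partial x} + \lambda_2 \frac{\partial g_2}{\partial x} \right) \Delta x + \left( \frac{\partial f}{\partial y} + \lambda_1 \frac{\partial g_1}{\partial y} + \lambda_2 \frac{\partial g_2}{\partial y} \right) \Delta y + \left( \frac{\partial f}{\partial z} + \lambda_1 \frac{\partial g_1}{\partial z} + \lambda_2 \frac{\partial g_2}{\partial z} \right) \Delta z = 0. \tag{1.125}$$

Because of the given conditions in Equations (1.116) and (1.117), $\Delta x$, $\Delta y$, and $\Delta z$ are not independent. Hence their coefficients in Equation (1.122)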
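cannot be set to zero directly. However, the values of $\lambda_1$ and $\lambda_2$, which are called the Lagrange undetermined multipliers, can be chosen so that the coefficients of $\Delta x$, $\Delta y$, and $\Delta z$ are all zero in Equation (1.125):

$$\frac{\partial f}{\partial x} + \lambda_1 \frac{\partial g_1}{\partial x} + \lambda_2 \frac{\partial g_2}{\partial x} = 0, \tag{1.126}$$

$$\frac{\partial f}{\partial y} + \lambda_1 \frac{\partial g_1}{\partial y} + \lambda_2 \frac{\partial g_2}{\partial y} = 0, \tag{1.127}$$

$$\frac{\partial f}{\partial z} + \lambda_1 \frac{\partial g_1}{\partial z} + \lambda_2 \frac{\partial g_2}{\partial z} = 0. \tag{1.128}$$

Along with the two conditions, $g_1(x, y, z) = 0$ and $g_2(x, y, z) = 0$, these three equations are to be solved for the five unknowns:

$$x, \ y, \ z, \ \lambda_1, \ \lambda_2. \tag{1.129}$$

The values that $\lambda_1$ and $\lambda_2$ assume are used to obtain the $x$, $y$, and $z$ values needed, which correspond to the locations of the critical points. Analysis of the critical points now proceeds as before. Note that this method is quite general and, as long as the required derivatives exist and the conditions are compatible, it can be used with any number of conditions.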
Example 1.3. Extremum problems: We now find the dimensions of a rectangular swimming pool with fixed volume $V_0$ and minimal area of its base and sides. If we denote the dimensions of its base with $x$ and $y$ and its height with $z$, the fixed volume is

$$V_0 = xyz \tag{1.130}$$

and the total area of the base and the sides is

$$a = xy + 2xz + 2yz. \tag{1.131}$$

Using the condition of fixed volume we write $a$ as a function of $x$ and $y$ as

$$a = xy + \frac{2V_0}{y} + \frac{2V_0}{x}. \tag{1.132}$$

Now the critical points of $a$ are determined from the equations

$$\frac{\partial a}{\partial x} = 0, \quad \frac{\partial a}{\partial y} = 0, \tag{1.133}$$

which give the following two equations:

$$y - \frac{2V_0}{x^2} = 0, \tag{1.134}$$

$$x - \frac{2V_0}{y^2} = 0, \tag{1.135}$$
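or

$$yx^2 - 2V_0 = 0, \tag{1.136}$$

$$xy^2 - 2V_0 = 0. \tag{1.137}$$

If we subtract Equation (1.137) from Equation (1.136), we obtain

$$y = x, \tag{1.138}$$

which when substituted back into Equation (1.136) gives the critical dimensions

$$x = (2V_0)^{1/3}, \tag{1.139}$$

$$y = (2V_0)^{1/3}, \tag{1.140}$$

$$z = \left(\frac{V_0}{4}\right)^{1/3}, \tag{1.141}$$

where the final dimension is obtained from $V_0 = xyz$. To assure ourselves that this corresponds to a minimum, we evaluate the second-order derivatives at the critical point,

$$A = \left(\frac{\partial^2 a}{\partial x^2}\right)_0 = \frac{4V_0}{x^3} = 2, \tag{1.142}$$

$$B = \left(\frac{\partial^2 a}{\partial x \partial y}\right)_0 = 1, \tag{1.143}$$

$$C = \left(\frac{\partial^2 a}{\partial y^2}\right)_0 = \frac{4V_0}{y^3} = 2, \tag{1.144}$$

and find

$$B^2 - AC = 1 - 4 = -3 < 0 \quad \text{and} \quad A + C = 2 + 2 = 4 > 0. \tag{1.145}$$

Thus the critical dimensions we have obtained [Eqs. (1.139)-(1.141)] are indeed for a minimum by Theorem 1.4.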
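The result of Example 1.3 can be checked numerically for a specific volume. In the sketch below the value `V0 = 32.0` is a hypothetical choice (it makes the critical dimensions close to 4, 4, and 2), and the comparison points around the critical point are arbitrary.

```python
V0 = 32.0                                    # hypothetical pool volume

x = y = (2 * V0) ** (1 / 3)                  # critical base dimensions
z = V0 / (x * y)                             # height from V0 = xyz

def area(x, y):
    """Area of the base and sides with z eliminated through V0 = xyz."""
    return x * y + 2 * V0 / y + 2 * V0 / x

A = 4 * V0 / x ** 3                          # second derivative with respect to x
B = 1.0                                      # mixed second derivative
C = 4 * V0 / y ** 3                          # second derivative with respect to y

print(x, y, z)
print(B * B - A * C, A + C)                  # negative, positive: a minimum
print(area(x, y) < area(x + 0.1, y), area(x, y) < area(x, y - 0.1))
```

Displacing either base dimension away from the critical value increases the area, in agreement with the conclusion drawn from Theorem 1.4.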
Example 1.4. Lagrange undetermined multipliers: We now solve the above problem by using the method of Lagrange undetermined multipliers. The equation to be minimized is now

$$f(x, y, z) = xy + 2xz + 2yz \tag{1.146}$$

with the condition

$$g(x, y, z) = V_0 - xyz = 0. \tag{1.147}$$

The equations to be solved are obtained from Equations (1.126)-(1.128) as

$$y + 2z - \lambda yz = 0, \tag{1.148}$$

$$x + 2z - \lambda xz = 0, \tag{1.149}$$

$$2x + 2y - \lambda xy = 0. \tag{1.150}$$

Along with $V_0 = xyz$, these give 4 equations to be solved for the critical dimensions $x$, $y$, $z$, and $\lambda$. Multiplying the first equation by $x$ and the second one by $y$ and then subtracting gives

$$x = y. \tag{1.151}$$

Substituting this into the third equation [Eq. (1.150)] gives the value of the Lagrange undetermined multiplier as $\lambda = 4/x$, which when substituted into Equations (1.148)-(1.150) gives

$$xy + 2xz - 4yz = 0, \tag{1.152}$$

$$x + 2z - 4z = 0, \tag{1.153}$$

$$2x + 2y - 4y = 0. \tag{1.154}$$

Using the condition $V_0 = xyz$ and Equation (1.151), these three equations [Eqs. (1.152)-(1.154)] can be solved easily to yield the critical dimensions in terms of $V_0$ as

$$x = (2V_0)^{1/3}, \tag{1.155}$$

$$y = (2V_0)^{1/3}, \tag{1.156}$$

$$z = \left(\frac{V_0}{4}\right)^{1/3}. \tag{1.157}$$

Analysis of the critical point is done as in the previous example by using Theorem 1.4.
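A quick consistency check of Example 1.4 is to substitute the critical dimensions and the multiplier $\lambda = 4/x$ back into the three Lagrange equations and the volume condition. The sketch below does this for a hypothetical volume `V0 = 32.0`; all four residuals should vanish to within rounding error.

```python
V0 = 32.0                            # hypothetical pool volume

x = y = (2 * V0) ** (1 / 3)          # critical base dimensions
z = (V0 / 4) ** (1 / 3)              # critical height
lam = 4 / x                          # Lagrange undetermined multiplier

residuals = [
    y + 2 * z - lam * y * z,         # derivative condition in x
    x + 2 * z - lam * x * z,         # derivative condition in y
    2 * x + 2 * y - lam * x * y,     # derivative condition in z
    V0 - x * y * z,                  # volume condition g = 0
]
print(residuals)
```

The same check can be repeated for any positive `V0`, since the solution scales as the cube root of the volume.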
1.8 DERIVATIVES AND DIFFERENTIALS OF COMPOSITE FUNCTIONS
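In what follows we assume that the functions are defined in their appropriate domains and have continuous first partial derivatives.

Chain rule: If $z = f(x, y)$ and $x = x(t)$, $y = y(t)$, then

$$\frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}. \tag{1.158}$$

Similarly, if $z = f(x, y)$ and $x = g(u, v)$ and $y = h(u, v)$, then

$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}, \tag{1.159}$$

$$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v}. \tag{1.160}$$

A better notation to use is

$$\left(\frac{\partial z}{\partial u}\right)_v = \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial u}\right)_v + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial u}\right)_v, \tag{1.161}$$

$$\left(\frac{\partial z}{\partial v}\right)_u = \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial v}\right)_u + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial v}\right)_u. \tag{1.162}$$

This notation is particularly useful in thermodynamics, where $z$ may also be expressed with another choice of variables, such as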
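(1.163) (1.164) (1.165)

Hence, when we write the derivative

$$\frac{\partial z}{\partial x}, \tag{1.166}$$

we have to clarify whether we are in the $(x, y)$ or the $(x, w)$ space by writing

$$\left(\frac{\partial z}{\partial x}\right)_y \quad \text{or} \quad \left(\frac{\partial z}{\partial x}\right)_w. \tag{1.167}$$

These formulas can be extended to any number of variables. Using Equation (1.158) we can write the differential $dz$ as

$$dz = \left( \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} \right) dt \tag{1.168}$$

$$= \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy. \tag{1.169}$$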
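The chain rule of Equation (1.158) can be verified numerically by comparing it with a direct difference quotient of the composite function. In the sketch below the functions `z`, `x_of`, and `y_of`, the point `t`, and the step `h` are all illustrative assumptions, not taken from the text.

```python
import math

def z(x, y):
    return x * x * y                 # illustrative function, not from the text

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

t, h = 0.7, 1e-6
x, y = x_of(t), y_of(t)

# Chain rule: dz/dt = z_x dx/dt + z_y dy/dt, with z_x = 2xy and z_y = x^2.
chain = (2 * x * y) * (-math.sin(t)) + (x * x) * math.cos(t)

# Direct central-difference derivative of the composite function z(x(t), y(t)).
direct = (z(x_of(t + h), y_of(t + h)) - z(x_of(t - h), y_of(t - h))) / (2 * h)

print(chain, direct)
```

The two numbers agree to within the accuracy of the difference quotient, which is what Equation (1.158) asserts.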
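We now treat $x$, $y$, and $z$ as functions of $(u, v)$ and write the differential $dz$ as

$$dz = \frac{\partial z}{\partial u} du + \frac{\partial z}{\partial v} dv \tag{1.170}$$

$$= \left( \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u} \right) du + \left( \frac{\partial z}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial v} \right) dv \tag{1.171}$$

$$= \frac{\partial z}{\partial x} \left( \frac{\partial x}{\partial u} du + \frac{\partial x}{\partial v} dv \right) + \frac{\partial z}{\partial y} \left( \frac{\partial y}{\partial u} du + \frac{\partial y}{\partial v} dv \right). \tag{1.172}$$

Since $x$ and $y$ are also functions of $u$ and $v$, we have the differentials

$$dx = \frac{\partial x}{\partial u} du + \frac{\partial x}{\partial v} dv \tag{1.173}$$

and

$$dy = \frac{\partial y}{\partial u} du + \frac{\partial y}{\partial v} dv, \tag{1.174}$$

which allow us to write Equation (1.172) as

$$dz = \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy. \tag{1.175}$$

This result can be extended to any number of variables. In other words, any equation in differentials that is true in one set of independent variables is also true for another choice of variables. Formal proofs of these results can be found in books on advanced calculus (Apostol, Kaplan).

1.9 IMPLICIT FUNCTION THEOREM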
A function given as

$$F(x, y, z) = 0 \tag{1.176}$$

can be used to describe several functions of the form

$$z = f(x, y), \tag{1.177}$$

$$y = g(x, z), \ \text{etc.} \tag{1.178}$$

For example,

$$x^2 + y^2 + z^2 - 9 = 0 \tag{1.179}$$

can be used to define the function

$$z = \sqrt{9 - x^2 - y^2} \tag{1.180}$$

or

$$z = -\sqrt{9 - x^2 - y^2}, \tag{1.181}$$

both of which are defined in the domain $x^2 + y^2 \leq 9$. We say these functions are implicitly defined by Equation (1.179). In order to be able to define a differentiable function,

$$z = f(x, y), \tag{1.182}$$
by the implicit function $F(x, y, z) = 0$, the partial derivatives

$$\frac{\partial f}{\partial x} \quad \text{and} \quad \frac{\partial f}{\partial y} \tag{1.183}$$

should exist in some domain so that we can write the differential

$$dz = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy. \tag{1.184}$$

Using the implicit function $F(x, y, z) = 0$, we write

$$F_x\, dx + F_y\, dy + F_z\, dz = 0 \tag{1.185}$$

and

$$dz = -\frac{F_x}{F_z} dx - \frac{F_y}{F_z} dy, \tag{1.186}$$

where

$$F_x = \frac{\partial F}{\partial x}, \quad F_y = \frac{\partial F}{\partial y}, \quad \text{and} \quad F_z = \frac{\partial F}{\partial z}. \tag{1.187}$$

Comparing the two differentials [Eqs. (1.184) and (1.186)], we obtain the partial derivatives

$$\frac{\partial f}{\partial x} = -\frac{F_x}{F_z}, \quad \frac{\partial f}{\partial y} = -\frac{F_y}{F_z}. \tag{1.188}$$

Hence, granted that $F_z \neq 0$, we can use the implicit function $F(x, y, z) = 0$ to define a function of the form $z = f(x, y)$. We now consider a more complicated case, in which we have two implicit functions:

$$F(x, y, z, w) = 0, \tag{1.189}$$

$$G(x, y, z, w) = 0. \tag{1.190}$$

Using these two equations in terms of four variables, we can solve, in principle, for two of the variables in terms of the remaining two as

$$z = f(x, y), \tag{1.191}$$

$$w = g(x, y). \tag{1.192}$$

For $f(x, y)$ and $g(x, y)$ to be differentiable, certain conditions must be met by $F(x, y, z, w)$ and $G(x, y, z, w)$. First we write the differentials
+
+
+
F, dx Fy dy F, dz F, dw = 0 , G, dx+Gy dy+G, dz+G, dw=O
(1.193) (1.194)
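and rearrange them as
$$F_z\,dz + F_w\,dw = -F_x\,dx - F_y\,dy, \tag{1.195}$$
$$G_z\,dz + G_w\,dw = -G_x\,dx - G_y\,dy. \tag{1.196}$$
We now have a system of two linear equations to be solved simultaneously for the differentials $dz$ and $dw$. We can either solve by elimination or use determinants and Cramer's rule to write
$$dz = \frac{\begin{vmatrix} -F_x\,dx - F_y\,dy & F_w \\ -G_x\,dx - G_y\,dy & G_w \end{vmatrix}}{\begin{vmatrix} F_z & F_w \\ G_z & G_w \end{vmatrix}} \tag{1.197}$$
and
$$dw = \frac{\begin{vmatrix} F_z & -F_x\,dx - F_y\,dy \\ G_z & -G_x\,dx - G_y\,dy \end{vmatrix}}{\begin{vmatrix} F_z & F_w \\ G_z & G_w \end{vmatrix}}. \tag{1.198}$$
Using the properties of determinants, we can write these as
$$dz = -\frac{\partial(F,G)/\partial(x,w)}{\partial(F,G)/\partial(z,w)}\,dx - \frac{\partial(F,G)/\partial(y,w)}{\partial(F,G)/\partial(z,w)}\,dy \tag{1.199}$$
and
$$dw = -\frac{\partial(F,G)/\partial(z,x)}{\partial(F,G)/\partial(z,w)}\,dx - \frac{\partial(F,G)/\partial(z,y)}{\partial(F,G)/\partial(z,w)}\,dy. \tag{1.200}$$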
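For differentiable functions, $z = f(x, y)$ and $w = g(x, y)$, with existing first-order partial derivatives we can write
$$dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy, \tag{1.201}$$
$$dw = \frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial y}\,dy. \tag{1.202}$$
Thus by comparison with Equations (1.199) and (1.200), we obtain the partial derivatives
$$\frac{\partial f}{\partial x} = -\frac{\partial(F,G)/\partial(x,w)}{\partial(F,G)/\partial(z,w)}, \quad \frac{\partial f}{\partial y} = -\frac{\partial(F,G)/\partial(y,w)}{\partial(F,G)/\partial(z,w)} \tag{1.203}$$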
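and
$$\frac{\partial g}{\partial x} = -\frac{\partial(F,G)/\partial(z,x)}{\partial(F,G)/\partial(z,w)}, \quad \frac{\partial g}{\partial y} = -\frac{\partial(F,G)/\partial(z,y)}{\partial(F,G)/\partial(z,w)}, \tag{1.204}$$
where determinants of the form
$$\frac{\partial(F,G)}{\partial(z,w)} = \begin{vmatrix} F_z & F_w \\ G_z & G_w \end{vmatrix} \tag{1.205}$$
are called the Jacobi determinants. In summary, given two implicit equations
$$F(x, y, z, w) = 0 \quad \text{and} \quad G(x, y, z, w) = 0, \tag{1.206}$$
we can define two differentiable functions
$$z = f(x, y) \quad \text{and} \quad w = g(x, y) \tag{1.207}$$
with the partial derivatives given as in Equations (1.203)-(1.204), provided that the Jacobian
$$\frac{\partial(F,G)}{\partial(z,w)} \tag{1.208}$$
is different from zero in the domain of definition. This useful technique can be generalized to a set of $m$ equations in $n + m$ unknowns:
$$F_i(y_1, \ldots, y_m, x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, m. \tag{1.209}$$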
(1.210)
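and obtain a set of $m$ linear equations to be solved for the $m$ differentials, $dy_i$, $i = 1, \ldots, m$, of the dependent variables. Using Cramer's rule, we can solve for $dy_i$ if and only if the determinant of the coefficients is different from zero, that is,
$$\frac{\partial(F_1, \ldots, F_m)}{\partial(y_1, \ldots, y_m)} \neq 0. \tag{1.212}$$
To obtain closed expressions for the partial derivatives,
$$\frac{\partial y_i}{\partial x_j}, \tag{1.213}$$
we take partial derivatives of Equations (1.209) to write
$$\sum_{k=1}^{m} \frac{\partial F_i}{\partial y_k}\frac{\partial y_k}{\partial x_j} + \frac{\partial F_i}{\partial x_j} = 0, \quad i = 1, \ldots, m, \tag{1.214}$$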
which gives the solution for $\partial y_i/\partial x_j$ as
and similar expressions for the other partial derivatives can be obtained. In general, granted that the Jacobi determinant does not vanish, namely
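$$\frac{\partial(F_1, \ldots, F_m)}{\partial(y_1, \ldots, y_m)} \neq 0, \tag{1.216}$$
we can obtain the partial derivatives $\partial y_i/\partial x_j$ as
$$\frac{\partial y_i}{\partial x_j} = -\frac{\partial(F_1, \ldots, F_m)/\partial(y_1, \ldots, y_{i-1}, x_j, y_{i+1}, \ldots, y_m)}{\partial(F_1, \ldots, F_m)/\partial(y_1, \ldots, y_m)}, \tag{1.217}$$
where $i = 1, \ldots, m$ and $j = 1, \ldots, n$. We conclude this section by stating the implicit function theorem, a proof of which can be found in Kaplan:

Implicit function theorem: Let the functions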
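$$F_i(y_1, \ldots, y_m, x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, m, \tag{1.218}$$
be defined in the neighborhood of the point
$$P_0 = (y_{01}, \ldots, y_{0m}, x_{01}, \ldots, x_{0n}) \tag{1.219}$$
with continuous first-order partial derivatives existing in this neighborhood. If
$$\left.\frac{\partial(F_1, \ldots, F_m)}{\partial(y_1, \ldots, y_m)}\right|_{P_0} \neq 0, \tag{1.220}$$
then in an appropriate neighborhood of $P_0$, there is a unique set of continuous functions
$$y_i = f_i(x_1, \ldots, x_n), \quad i = 1, \ldots, m, \tag{1.221}$$
with continuous partial derivatives,
$$\frac{\partial y_i}{\partial x_j} = -\frac{\partial(F_1, \ldots, F_m)/\partial(y_1, \ldots, y_{i-1}, x_j, y_{i+1}, \ldots, y_m)}{\partial(F_1, \ldots, F_m)/\partial(y_1, \ldots, y_m)}, \tag{1.222}$$
where $i = 1, \ldots, m$ and $j = 1, \ldots, n$, such that
$$y_{0i} = f_i(x_{01}, \ldots, x_{0n}), \quad i = 1, \ldots, m, \tag{1.223}$$
and
$$F_i(f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n), x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, m, \tag{1.224}$$
in the neighborhood of $P_0$. Note that if the Jacobi determinant [Eq. (1.220)] is zero at the point of interest, then we search for a different set of dependent variables to avoid the difficulty.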
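As a quick numerical illustration of the single-equation case, the following sketch compares the implicit derivative $\partial z/\partial x = -F_x/F_z$ of Equation (1.188) with a direct finite-difference derivative of the explicit branch $z = \sqrt{9 - x^2 - y^2}$. The helper names (`F`, `partial`, `implicit_dz_dx`, `explicit_z`), the step size, and the test point are choices made here for illustration and are not part of the text.

```python
import math

def F(x, y, z):
    # Implicit relation of Eq. (1.179): x^2 + y^2 + z^2 - 9 = 0
    return x * x + y * y + z * z - 9.0

def partial(func, point, index, h=1e-6):
    # Central-difference estimate of the partial derivative of func
    # with respect to the variable at position `index`.
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)

def implicit_dz_dx(x, y, z):
    # Eq. (1.188): dz/dx = -F_x / F_z, meaningful only where F_z != 0
    return -partial(F, (x, y, z), 0) / partial(F, (x, y, z), 2)

def explicit_z(x, y):
    # Upper branch of the sphere, Eq. (1.180)
    return math.sqrt(9.0 - x * x - y * y)

x0, y0 = 1.0, 2.0
z0 = explicit_z(x0, y0)
from_implicit = implicit_dz_dx(x0, y0, z0)
h = 1e-6
from_explicit = (explicit_z(x0 + h, y0) - explicit_z(x0 - h, y0)) / (2.0 * h)
print(from_implicit, from_explicit)  # both should be close to -x0 / z0
```

On the equator of the sphere, where $z = 0$ and hence $F_z = 0$, the same routine would divide by zero, which mirrors the condition $F_z \neq 0$ in the text.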
1.10 INVERSE FUNCTIONS
A pair of functions,
$$u = u(x, y), \tag{1.225}$$
$$v = v(x, y), \tag{1.226}$$
can be considered as a mapping from the $xy$ space to the $uv$ space. Under certain conditions, this maps a certain domain $D_{xy}$ in the $xy$ space to a certain domain $D_{uv}$ in the $uv$ space on a one-to-one basis. Under such conditions, an inverse mapping should also exist. However, analytically it may not always be possible to find the inverse mapping or the functions:
$$x = x(u, v), \tag{1.227}$$
$$y = y(u, v). \tag{1.228}$$
In such cases, we may consider Equations (1.225) and (1.226) as implicit functions and write them as
$$F_1(x, y, u, v) = u(x, y) - u = 0, \tag{1.229}$$
$$F_2(x, y, u, v) = v(x, y) - v = 0. \tag{1.230}$$
We can now use Equation (1.215) with $y_1 = u$, $y_2 = v$ and $x_1 = x$, $x_2 = y$ to write the partial derivatives of the inverse functions as

(1.231) (1.232) (1.233)

Similarly, the other partial derivatives can be obtained. As seen, the inverse function or the inverse mapping is well-defined only when the Jacobi determinant $J$ is different from zero, that is,
$$J \neq 0, \tag{1.234}$$
where $J$ is also called the Jacobian of the mapping. We will return to this point when we discuss coordinate transformations in Chapter 3. Note that the Jacobian of the inverse mapping is $1/J$. In other words,
$$\frac{\partial(x, y)}{\partial(u, v)} = \frac{1}{\partial(u, v)/\partial(x, y)}. \tag{1.235}$$
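The reciprocal relation between the Jacobian of a mapping and that of its inverse can be checked numerically. The sketch below uses the polar-coordinate mapping $x = r\cos\phi$, $y = r\sin\phi$ as a convenient test case; the function names and the sample point are assumptions made for this illustration.

```python
import math

def jacobian_forward(r, phi):
    # d(x, y)/d(r, phi) for x = r cos(phi), y = r sin(phi)
    x_r, x_phi = math.cos(phi), -r * math.sin(phi)
    y_r, y_phi = math.sin(phi), r * math.cos(phi)
    return x_r * y_phi - x_phi * y_r

def jacobian_inverse(x, y):
    # d(r, phi)/d(x, y) for r = sqrt(x^2 + y^2), phi = atan2(y, x)
    r2 = x * x + y * y
    r = math.sqrt(r2)
    r_x, r_y = x / r, y / r
    phi_x, phi_y = -y / r2, x / r2
    return r_x * phi_y - r_y * phi_x

r0, phi0 = 2.0, 0.7
x0, y0 = r0 * math.cos(phi0), r0 * math.sin(phi0)
product = jacobian_forward(r0, phi0) * jacobian_inverse(x0, y0)
print(jacobian_forward(r0, phi0), jacobian_inverse(x0, y0), product)  # product should be 1
```

At $r = 0$ the forward Jacobian vanishes and the inverse routine divides by zero, which is the situation excluded by the nonvanishing-Jacobian condition.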
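Example 1.5. Change of independent variable: We now transform the Laplace equation,
$$\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = 0, \tag{1.236}$$
into polar coordinates, that is, to a new set of independent variables defined by the equations
$$x = r\cos\phi, \tag{1.237}$$
$$y = r\sin\phi, \tag{1.238}$$
where $r \in (0, \infty)$ and $\phi \in [0, 2\pi]$. We first write the partial derivatives of $z = z(x, y)$:
$$\frac{\partial z}{\partial r} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial r}, \tag{1.239}$$
$$\frac{\partial z}{\partial \phi} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial \phi} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial \phi}, \tag{1.240}$$
which lead to
$$\frac{\partial z}{\partial r} = \frac{\partial z}{\partial x}\cos\phi + \frac{\partial z}{\partial y}\sin\phi, \tag{1.241}$$
$$\frac{\partial z}{\partial \phi} = \frac{\partial z}{\partial x}(-r\sin\phi) + \frac{\partial z}{\partial y}(r\cos\phi). \tag{1.242}$$
Solving for $\partial z/\partial x$ and $\partial z/\partial y$, we obtain
$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial r}\cos\phi - \frac{\partial z}{\partial \phi}\frac{1}{r}\sin\phi, \tag{1.243}$$
$$\frac{\partial z}{\partial y} = \frac{\partial z}{\partial r}\sin\phi + \frac{\partial z}{\partial \phi}\frac{1}{r}\cos\phi. \tag{1.244}$$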
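We now repeat this process with $\partial z/\partial x$ to obtain the second derivative $\partial^2 z/\partial x^2$ as
$$\frac{\partial^2 z}{\partial x^2} = \left[\cos\phi\,\frac{\partial}{\partial r} - \frac{\sin\phi}{r}\frac{\partial}{\partial \phi}\right]\left[\frac{\partial z}{\partial r}\cos\phi - \frac{\partial z}{\partial \phi}\frac{\sin\phi}{r}\right]$$
$$= \frac{\partial^2 z}{\partial r^2}\cos^2\phi - \frac{2}{r}\frac{\partial^2 z}{\partial r\,\partial\phi}\cos\phi\sin\phi + \frac{1}{r^2}\frac{\partial^2 z}{\partial \phi^2}\sin^2\phi + \frac{1}{r}\frac{\partial z}{\partial r}\sin^2\phi + \frac{2}{r^2}\frac{\partial z}{\partial \phi}\sin\phi\cos\phi. \tag{1.245}$$
A similar procedure for $\partial z/\partial y$ yields $\partial^2 z/\partial y^2$:
$$\frac{\partial^2 z}{\partial y^2} = \frac{\partial^2 z}{\partial r^2}\sin^2\phi + \frac{2}{r}\frac{\partial^2 z}{\partial r\,\partial\phi}\sin\phi\cos\phi + \frac{1}{r^2}\frac{\partial^2 z}{\partial \phi^2}\cos^2\phi + \frac{1}{r}\frac{\partial z}{\partial r}\cos^2\phi - \frac{2}{r^2}\frac{\partial z}{\partial \phi}\sin\phi\cos\phi. \tag{1.246}$$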
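Adding Equations (1.245) and (1.246), we obtain the transformed equation as
$$\frac{\partial^2 z(r, \phi)}{\partial r^2} + \frac{1}{r}\frac{\partial z(r, \phi)}{\partial r} + \frac{1}{r^2}\frac{\partial^2 z(r, \phi)}{\partial \phi^2} = 0. \tag{1.247}$$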
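The left-hand side of Equation (1.247) is the polar form of $\partial^2 z/\partial x^2 + \partial^2 z/\partial y^2$ for any twice-differentiable $z$, not only for solutions of the Laplace equation. The sketch below checks this by finite differences for a test function chosen here for illustration, $z = x^2 y + y^3$, whose Cartesian Laplacian is $8y$. The function names and step size are assumptions of this sketch.

```python
import math

def z_cartesian(x, y):
    # Illustrative test function; its Cartesian Laplacian is 2y + 6y = 8y.
    return x * x * y + y ** 3

def z_polar(r, phi):
    return z_cartesian(r * math.cos(phi), r * math.sin(phi))

def laplacian_polar(r, phi, h=1e-4):
    # Left-hand side of Eq. (1.247), built from central differences
    z0 = z_polar(r, phi)
    z_rr = (z_polar(r + h, phi) - 2.0 * z0 + z_polar(r - h, phi)) / h ** 2
    z_r = (z_polar(r + h, phi) - z_polar(r - h, phi)) / (2.0 * h)
    z_pp = (z_polar(r, phi + h) - 2.0 * z0 + z_polar(r, phi - h)) / h ** 2
    return z_rr + z_r / r + z_pp / r ** 2

r0, phi0 = 1.5, 0.8
exact = 8.0 * r0 * math.sin(phi0)  # 8y evaluated at the same point
print(laplacian_polar(r0, phi0), exact)
```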
Since the Jacobian of the mapping is different from zero, that is,
$$J = \frac{\partial(x, y)}{\partial(r, \phi)} = \begin{vmatrix} \cos\phi & -r\sin\phi \\ \sin\phi & r\cos\phi \end{vmatrix} = r, \quad r \neq 0, \tag{1.248}$$
the inverse mapping exists and it is given as
$$r = \sqrt{x^2 + y^2}, \tag{1.249}$$
$$\phi = \tan^{-1}\frac{y}{x}. \tag{1.250}$$

1.11 INTEGRAL CALCULUS AND THE DEFINITE INTEGRAL
Let $f(x)$ be a continuous function in the interval $[x_a, x_b]$. By choosing $(n - 1)$ points in this interval, $x_1, x_2, \ldots, x_{n-1}$, we can subdivide it into $n$ subintervals, $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$, which are not necessarily all equal in length. From Theorem 1.3 we know that $f(x)$ assumes a maximum, $M$, and a minimum, $m$, in $[x_a, x_b]$. Let $M_i$ represent the maximum and $m_i$ the minimum values that $f(x)$ assumes in $\Delta x_i$. We now denote a particular subdivision by $d$ and write the sum of the rectangles shown in Figure 1.8 (left) as
$$S(d) = \sum_{i=1}^{n} M_i\,\Delta x_i \tag{1.251}$$
and in Figure 1.8 (right) as
$$s(d) = \sum_{i=1}^{n} m_i\,\Delta x_i. \tag{1.252}$$

Figure 1.8 Upper (left) and lower (right) Darboux sums.
The sums $S(d)$ and $s(d)$ are called the upper and the lower Darboux sums, respectively. Naturally, their values depend on the subdivision $d$. We pick the smallest of all $S(d)$ and call it the upper integral of $f(x)$ in $[x_a, x_b]$:
$$\overline{\int_{x_a}^{x_b}} f(x)\,dx = \inf_d S(d). \tag{1.253}$$
Similarly, the largest of all $s(d)$ is called the lower integral of $f(x)$ in $[x_a, x_b]$:
$$\underline{\int_{x_a}^{x_b}} f(x)\,dx = \sup_d s(d). \tag{1.254}$$
When these two integrals are equal, we say the definite integral of $f(x)$ in the interval $[x_a, x_b]$ exists and we write
$$\overline{\int_{x_a}^{x_b}} f(x)\,dx = \underline{\int_{x_a}^{x_b}} f(x)\,dx = \int_{x_a}^{x_b} f(x)\,dx. \tag{1.255}$$
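The way the upper and lower Darboux sums close in on a common value can be seen in a small numerical sketch. It assumes a nondecreasing integrand, so that the minimum and maximum on each subinterval sit at its left and right ends, and it uses equal subintervals for simplicity. The function names and the choice $f(x) = x^2$ on $[0, 1]$ are illustrative assumptions.

```python
def darboux_sums(f, a, b, n):
    # Upper and lower Darboux sums, Eqs. (1.251)-(1.252), on n equal subintervals.
    # Assumes f is nondecreasing on [a, b], so that on each subinterval the
    # minimum is at the left end and the maximum at the right end.
    dx = (b - a) / n
    lower = sum(f(a + i * dx) for i in range(n)) * dx
    upper = sum(f(a + (i + 1) * dx) for i in range(n)) * dx
    return upper, lower

def square(x):
    return x * x

for n in (10, 100, 1000):
    upper, lower = darboux_sums(square, 0.0, 1.0, n)
    print(n, lower, upper)  # the two sums bracket 1/3 and approach each other
```

For a monotonic integrand on equal subintervals the gap between the two sums is $[f(b) - f(a)](b - a)/n$, so it shrinks like $1/n$.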
Figure 1.9 Riemann integral.

This definition of integral is also called the Riemann integral, and the function $f(x)$ is called the integrand. Darboux sums are not very practical to work with. Instead, for a particular subdivision we write the sum
$$\sigma(d) = \sum_{k=1}^{n} f(x_k)\,\Delta x_k, \tag{1.256}$$
where $x_k$ is an arbitrary point in $\Delta x_k$ (Fig. 1.9). It is clear that the inequality
$$s(d) \le \sigma(d) \le S(d) \tag{1.257}$$
is satisfied. For a given subdivision the largest value of $\Delta x_i$ is called the norm of $d$, which we will denote as $n(d)$.

1.12 RIEMANN INTEGRAL
We now give the basic definition of the Riemann integral as follows:

Definition 1.1. Given a sequence of subdivisions $d_1, d_2, \ldots$ of the interval $[x_a, x_b]$, such that the sequence of norms $n(d_1), n(d_2), \ldots$ has the limit
$$\lim_{k \to \infty} n(d_k) \to 0 \tag{1.258}$$
and if $f(x)$ is integrable in $[x_a, x_b]$, then the Riemann integral is defined as
$$\int_{x_a}^{x_b} f(x)\,dx = \lim_{k \to \infty} \sigma(d_k), \tag{1.259}$$
where
$$\lim_{k \to \infty} S(d_k) = \lim_{k \to \infty} s(d_k) = \lim_{k \to \infty} \sigma(d_k). \tag{1.260}$$
Theorem 1.5. For the existence of the Riemann integral
$$\int_{x_a}^{x_b} f(x)\,dx,$$
where $x_a$ and $x_b$ are finite numbers, it is sufficient to satisfy one of the following conditions:

i) $f(x)$ is continuous in $[x_a, x_b]$.

ii) $f(x)$ is bounded and piecewise continuous in $[x_a, x_b]$.

From these definitions we can deduce the following properties of Riemann integrals. Their formal proofs can be found in books on mathematical analysis such as Apostol:

I. If $f_1(x)$ and $f_2(x)$ are integrable in $[x_a, x_b]$, then their sum is also integrable and we can write
$$\int_{x_a}^{x_b} [f_1(x) + f_2(x)]\,dx = \int_{x_a}^{x_b} f_1(x)\,dx + \int_{x_a}^{x_b} f_2(x)\,dx. \tag{1.261}$$
II. If $f(x)$ is integrable in $[x_a, x_b]$, then the following are true:
$$\int_{x_a}^{x_b} \alpha f(x)\,dx = \alpha \int_{x_a}^{x_b} f(x)\,dx, \quad \alpha \text{ is a constant},$$
(1.263)
(1.264)
(1.265)
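III. If $f(x)$ is continuous and $f(x) \ge 0$ in $[x_a, x_b]$, then
$$\int_{x_a}^{x_b} f(x)\,dx = 0 \ \text{ means } \ f(x) \equiv 0. \tag{1.266}$$

IV. The average or the mean, $\langle f \rangle$, of $f(x)$ in the interval $[x_a, x_b]$ is defined as
$$\langle f \rangle = \frac{1}{x_b - x_a}\int_{x_a}^{x_b} f(x)\,dx. \tag{1.267}$$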
If $f(x)$ is continuous, then there exists at least one point $x^* \in [x_a, x_b]$ such that
$$\int_{x_a}^{x_b} f(x)\,dx = f(x^*)(x_b - x_a). \tag{1.268}$$
This is also called the mean value theorem or Rolle's theorem.

V. If $f(x)$ is integrable in $[x_a, x_b]$ and if $x_a < x_c < x_b$, then
$$\int_{x_a}^{x_b} f(x)\,dx = \int_{x_a}^{x_c} f(x)\,dx + \int_{x_c}^{x_b} f(x)\,dx. \tag{1.269}$$
VI. If $f(x) \ge g(x)$ in $[x_a, x_b]$, then
$$\int_{x_a}^{x_b} f(x)\,dx \ge \int_{x_a}^{x_b} g(x)\,dx. \tag{1.270}$$

VII. Fundamental theorem of calculus: If $f(x)$ is continuous in $[x_a, x_b]$, then the function
$$F(x) = \int_{x_a}^{x} f(x')\,dx' \tag{1.271}$$
is also a continuous function of $x$ in $[x_a, x_b]$. The function $F(x)$ is differentiable for every point in $[x_a, x_b]$ and its derivative at $x$ is $f(x)$:
$$\frac{dF}{dx} = f(x). \tag{1.272}$$
$F(x)$ is called the primitive or the antiderivative of $f(x)$. Given a primitive, $F(x)$, then
$$F(x) + \text{constant} \tag{1.273}$$
is also a primitive. If a primitive is known for $[x_a, x_b]$, then we can write
$$\int_{x_a}^{x_b} f(x)\,dx = \int_{x_a}^{x_b} \frac{dF}{dx}\,dx \tag{1.274}$$
$$= F(x)\Big|_{x_a}^{x_b} \tag{1.275}$$
$$= F(x_b) - F(x_a). \tag{1.276}$$
When the region of integration is not specified, we write the indefinite integral
$$\int f(x)\,dx = F(x) + C,$$
where $C$ is an arbitrary constant and $F(x)$ is any function the derivative of which is $f(x)$.
VIII. If $f(x)$ is continuous and $f(x) \ge 0$ in $[x_a, x_b]$, then geometrically the integral
$$\int_{x_a}^{x_b} f(x)\,dx \tag{1.277}$$
is the area under $f(x)$ between $x_a$ and $x_b$.

IX. A very useful inequality in deciding whether a given integral is convergent or not is the Schwarz inequality:
$$\left|\int_{x_a}^{x_b} f(x)g(x)\,dx\right|^2 \le \int_{x_a}^{x_b} |f(x)|^2\,dx \int_{x_a}^{x_b} |g(x)|^2\,dx. \tag{1.278}$$

X. One of the most commonly used techniques in integral calculus is the integration by parts:
$$\int_{x_a}^{x_b} u v'\,dx = [uv]_{x_a}^{x_b} - \int_{x_a}^{x_b} v u'\,dx \tag{1.279}$$
or
$$\int_{x_a}^{x_b} u\,dv = [uv]_{x_a}^{x_b} - \int_{x_a}^{x_b} v\,du, \tag{1.280}$$
where $u$ and $v$ and their derivatives $u'$ and $v'$ are continuous in $[x_a, x_b]$.

XI. In general the following inequality holds:
$$\left|\int_{x_a}^{x_b} f(x)\,dx\right| \le \int_{x_a}^{x_b} |f(x)|\,dx,$$
that is, if the integral $\int_{x_a}^{x_b} |f(x)|\,dx$ converges, then the integral $\int_{x_a}^{x_b} f(x)\,dx$ also converges. A convergent integral, $\int_{x_a}^{x_b} f(x)\,dx$, is said to be absolutely convergent, if $\int_{x_a}^{x_b} |f(x)|\,dx$ also converges. Integrals that converge but do not converge absolutely are called conditionally convergent.

1.13 IMPROPER INTEGRALS
We introduced Riemann integrals for bounded functions with finite intervals. Improper integrals are basically their extension to cases with infinite range and to functions that are not necessarily bounded.

Definition 1.2. Consider the integral
$$\int_a^c f(x)\,dx, \tag{1.281}$$
which exists in the Riemann sense in the interval $[a, c]$, where $a < c < b$. If the limit
$$\lim_{c \to b^-} \int_a^c f(x)\,dx \to A \tag{1.282}$$
exists, where the function $f(x)$ could be unbounded in the left neighborhood of $b$, then we say the integral $\int_a^b f(x)\,dx$ exists, or converges, and write
$$\int_a^b f(x)\,dx = A. \tag{1.283}$$
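Example 1.6. Improper integrals: Consider the improper integral
$$I_1 = \int_0^1 \frac{x\,dx}{(1 - x)^{1/2}}, \tag{1.284}$$
where the integrand, $x/(1 - x)^{1/2}$, is unbounded at the end point $x = 1$. We write $I_1$ as the limit
$$I_1 = \lim_{c \to 1^-} \int_0^c \frac{x\,dx}{(1 - x)^{1/2}} \tag{1.285}$$
$$= \lim_{c \to 1^-} \left[\frac{2(1 - x)^{3/2}}{3} - 2(1 - x)^{1/2}\right]_0^c \tag{1.286}$$
$$= \lim_{c \to 1^-} \left[\frac{2(1 - c)^{3/2}}{3} - 2(1 - c)^{1/2} + \frac{4}{3}\right], \tag{1.287}$$
thereby obtaining the value of $I_1$ as $4/3$. We now consider the integral
$$I_2 = \int_0^1 \frac{dx}{1 - x}, \tag{1.288}$$
which does not exist since
$$I_2 = \lim_{c \to 1^-} \int_0^c \frac{dx}{1 - x} \tag{1.289}$$
$$= \lim_{c \to 1^-} [-\ln(1 - x)]_0^c \tag{1.290}$$
$$= \lim_{c \to 1^-} [-\ln(1 - c)] \to \infty. \tag{1.291}$$
In this case we say the integral does not exist or is divergent, and for its value we give $\infty$.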
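The limiting process in this example can be followed numerically. The sketch below evaluates the integrals from $0$ to $c$ for values of $c$ approaching $1$ from the left, using the antiderivatives $\tfrac{2}{3}(1 - x)^{3/2} - 2(1 - x)^{1/2}$ and $-\ln(1 - x)$, and cross-checks one value with a plain midpoint rule. All function names, the sample values of $c$, and the number of subintervals are choices made here for illustration.

```python
import math

def integrand(x):
    # Integrand of I_1; unbounded as x -> 1 from the left
    return x / math.sqrt(1.0 - x)

def midpoint_rule(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def i1_partial(c):
    # Integral of x / sqrt(1 - x) from 0 to c, from the antiderivative
    # (2/3)(1 - x)^(3/2) - 2(1 - x)^(1/2)
    return 4.0 / 3.0 + (2.0 / 3.0) * (1.0 - c) ** 1.5 - 2.0 * math.sqrt(1.0 - c)

def i2_partial(c):
    # Integral of 1 / (1 - x) from 0 to c
    return -math.log(1.0 - c)

# Independent check of the closed form on [0, 0.99]
print(midpoint_rule(integrand, 0.0, 0.99, 20000), i1_partial(0.99))

for k in (2, 4, 8, 12):
    c = 1.0 - 10.0 ** (-k)
    # first value tends to 4/3, second keeps growing without bound
    print(c, i1_partial(c), i2_partial(c))
```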
A parallel argument is given if the integral
$$\int_c^b f(x)\,dx \tag{1.292}$$
exists in the interval $[c, b]$, where $a < c < b$. We now write the limit
$$I = \lim_{c \to a^+} \int_c^b f(x)\,dx, \tag{1.293}$$
where $f(x)$ could be unbounded in the right neighborhood of $a$. If the limit
$$\lim_{c \to a^+} \int_c^b f(x)\,dx \to B \tag{1.294}$$
exists, we write
$$\int_a^b f(x)\,dx = B. \tag{1.295}$$
We now present another useful result from integral calculus:

Theorem 1.6. Let $c$ be a point in the interval $(a, b)$ and let $f(x)$ be integrable in the intervals $[a, a']$ and $[b', b]$, where $a < a' < c < b' < b$. Furthermore, $f(x)$ could be unbounded in the neighborhood of $c$. Then the integral
$$I = \int_a^b f(x)\,dx \tag{1.296}$$
exists if the integrals
$$I_1 = \lim_{a' \to c^-} \int_a^{a'} f(x)\,dx \tag{1.297}$$
and
$$I_2 = \lim_{b' \to c^+} \int_{b'}^{b} f(x)\,dx \tag{1.298}$$
both exist and when they exist, their sum is equal to $I$:
$$I = I_1 + I_2. \tag{1.299}$$
If either $I_1$ or $I_2$ diverges, then $I$ also diverges.
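Example 1.7. Improper integrals: Consider the integral
$$I = \int_{-1}^{3} \frac{dx}{x} \tag{1.300}$$
$$= \int_{-1}^{0} \frac{dx}{x} + \int_{0}^{3} \frac{dx}{x}, \tag{1.301}$$
which converges provided that the integrals in Equation (1.301) converge. However, they both diverge:
$$\int_{-1}^{0} \frac{dx}{x} = \lim_{c \to 0^-} \int_{-1}^{c} \frac{dx}{x} = \lim_{c \to 0^-} [\ln|x|]_{-1}^{c} = \lim_{c \to 0^-} \ln|c| \to -\infty \tag{1.302}$$
and similarly,
$$\int_{0}^{3} \frac{dx}{x} = \lim_{c \to 0^+} \int_{c}^{3} \frac{dx}{x} = \lim_{c \to 0^+} [\ln|x|]_{c}^{3} = \ln 3 - \lim_{c \to 0^+} \ln|c| \to +\infty, \tag{1.303}$$
hence the integral, $\int_{-1}^{3} \frac{dx}{x}$, also diverges.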
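When the range of the integral is infinite, we use the following results: If $f(x)$ is integrable in $[a, b]$ and the limit
$$\lim_{b \to \infty} \int_a^b f(x)\,dx \to A \tag{1.304}$$
exists, we can write
$$\int_a^{\infty} f(x)\,dx = A. \tag{1.305}$$
Similarly, we define the integral
$$\int_{-\infty}^{b} f(x)\,dx = B. \tag{1.306}$$
If the integrals
$$\int_{-\infty}^{a} f(x)\,dx \tag{1.307}$$
and
$$\int_{a}^{\infty} f(x)\,dx \tag{1.308}$$
both exist, then we can write
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_{a}^{\infty} f(x)\,dx. \tag{1.309}$$

1.14 CAUCHY PRINCIPAL VALUE INTEGRALS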
In Example 1.7, since the integrals $I_1 = \int_{-1}^{0} \frac{dx}{x}$ and $I_2 = \int_{0}^{3} \frac{dx}{x}$ both diverge, we used Theorem 1.6 to conclude that the integral $I = \int_{-1}^{3} \frac{dx}{x}$ is divergent. However, notice that $I_1$ diverges as $\lim_{c \to 0^-} \ln|c| \to -\infty$, while $I_2$ diverges as $\lim_{c \to 0^+} (-\ln|c|) \to +\infty$. In other words, if we consider the two integrals together, the two divergences offset each other, thus yielding a finite result for the value of the integral as
$$\int_{-1}^{3} \frac{dx}{x} = \lim_{c \to 0^-} \int_{-1}^{c} \frac{dx}{x} + \lim_{c \to 0^+} \int_{c}^{3} \frac{dx}{x} \tag{1.310}$$
$$= \lim_{c \to 0^-} \ln|c| - \ln 1 + \ln 3 - \lim_{c \to 0^+} \ln|c| \tag{1.311}$$
$$= \ln 3. \tag{1.312}$$
The problem with $\int_{-1}^{3} \frac{dx}{x}$ is that the integrand, $1/x$, diverges at the origin. However, at all the other points in the range $[-1, 3]$ it is finite. In Riemann integrals (Theorem 1.6), divergence of either $I_1$ or $I_2$ is sufficient to conclude that the integral $I$ does not exist. However, as in the above case, sometimes by considering the two integrals, $I_1$ and $I_2$, together, one may obtain a finite result. This is called taking the Cauchy principal value of the integral. Since it corresponds to a modification of the Riemann definition of integral, it has to be mentioned explicitly that we are taking the Cauchy principal value as
$$PV \int_{-1}^{3} \frac{dx}{x} = \ln 3. \tag{1.313}$$
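A short numerical sketch makes the cancellation concrete. It removes a small interval around the origin and integrates $1/x$ over what is left, using the antiderivative $\ln|x|$. With equal gaps on both sides the result is $\ln 3$ for every gap size. The lopsided case is added here only for contrast: it shows that the finite value depends on letting the two sides approach the origin at the same rate. Function names and gap sizes are assumptions of this sketch.

```python
import math

def integral_one_over_x(a, b):
    # Integral of 1/x from a to b, with a and b on the same side of the origin
    return math.log(abs(b)) - math.log(abs(a))

def excised_integral(gap_left, gap_right):
    # Integral over [-1, 3] with the interval (-gap_left, gap_right) removed
    return integral_one_over_x(-1.0, -gap_left) + integral_one_over_x(gap_right, 3.0)

for c in (1e-2, 1e-4, 1e-8):
    symmetric = excised_integral(c, c)        # principal-value prescription
    lopsided = excised_integral(c, 2.0 * c)   # unequal gaps, shown for contrast
    print(c, symmetric, lopsided)

print(math.log(3.0), math.log(3.0) - math.log(2.0))
```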
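Another example is the integral
$$I = \int_{-\infty}^{\infty} x^3\,dx, \tag{1.314}$$
which is divergent in the ordinary sense, since
$$\int_{0}^{\infty} x^3\,dx = \lim_{a \to \infty} \int_{0}^{a} x^3\,dx = \lim_{a \to \infty} \frac{a^4}{4} \to \infty. \tag{1.315}$$
However, if we take its Cauchy principal value, we obtain
$$PV \int_{-\infty}^{\infty} x^3\,dx = \lim_{a \to \infty} \left[\int_{-a}^{0} x^3\,dx + \int_{0}^{a} x^3\,dx\right] \tag{1.316}$$
$$= 0. \tag{1.317}$$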
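Example 1.8. Cauchy principal value: Considering the integral
$$I = \int_{-\infty}^{\infty} \frac{(1 + x)\,dx}{1 + x^2}, \tag{1.318}$$
we write
$$I = \lim_{c \to \infty} \int_{-c}^{0} \frac{(1 + x)\,dx}{1 + x^2} + \lim_{c \to \infty} \int_{0}^{c} \frac{(1 + x)\,dx}{1 + x^2}. \tag{1.319}$$
For a finite $c$ we obtain the integral
$$\int_{0}^{c} \frac{(1 + x)\,dx}{1 + x^2} = \left[\tan^{-1} x + \frac{1}{2}\log(1 + x^2)\right]_0^c \tag{1.320}$$
$$= \tan^{-1} c + \frac{1}{2}\log(1 + c^2), \tag{1.321}$$
which in the limit as $c \to \infty$ diverges as $[\tan^{-1} c + \frac{1}{2}\log(1 + c^2)] \to \infty$. Hence the integral $I$ also diverges in the Riemann sense by Theorem 1.6. However, since the other integral also diverges, but this time as
$$\lim_{c \to \infty} \left[-\tan^{-1}(-c) - \frac{1}{2}\log(1 + c^2)\right] \to -\infty, \tag{1.322}$$
we consider the two integrals in Equation (1.319) together to obtain the Cauchy principal value of $I$ as
$$PV \int_{-\infty}^{\infty} \frac{(1 + x)\,dx}{1 + x^2} = \pi. \tag{1.323}$$

1.15 INTEGRALS INVOLVING A PARAMETER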
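Integrals given in terms of a parameter play an important role in applications. In particular, integrals involving a parameter and with infinite range are of considerable significance. In this regard, we quote three useful theorems:

Theorem 1.7. If there exists a positive function $Q(x)$ satisfying the inequality $|f(\alpha, x)| \le Q(x)$ for all $\alpha \in [\alpha_1, \alpha_2]$, and if $\int_a^{\infty} Q(x)\,dx$ is convergent, then the integral
$$g(\alpha) = \int_a^{\infty} f(\alpha, x)\,dx \tag{1.324}$$
is uniformly convergent for all $\alpha \in [\alpha_1, \alpha_2]$; that is, for every $\varepsilon > 0$, there exists a number $c_0$ depending on $\varepsilon$ but independent of $\alpha$ such that $\left|\int_c^{\infty} f(\alpha, x)\,dx\right| < \varepsilon$ for all $c > c_0 > a$.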
Example 1.9. Uniform convergence: Consider the integral
$$I = \int_0^{\infty} e^{-\alpha x} \sin x\,dx, \tag{1.325}$$
which is uniformly convergent for $\alpha \in [\varepsilon, \infty)$ for every $\varepsilon > 0$. To show this we choose $Q(x)$ as $e^{-\varepsilon x}$ so that
$$|e^{-\alpha x} \sin x| \le e^{-\varepsilon x} \tag{1.326}$$
is true for all $\alpha \ge \varepsilon$. Uniform convergence of $I$ follows, since the integral
$$\int_0^{\infty} e^{-\varepsilon x}\,dx \tag{1.327}$$
is convergent. Note that by using integration by parts twice we can evaluate the integral $I$ as
$$\int_0^{\infty} e^{-\alpha x} \sin x\,dx = \frac{1}{1 + \alpha^2}. \tag{1.328}$$
The case where $\alpha = 0$ may be excluded, since the integral does not converge at all.
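Theorem 1.8. Let $f(\alpha, x)$ and $\partial f(\alpha, x)/\partial \alpha$ be continuous for all $\alpha \in [\alpha_1, \alpha_2]$ and $x \in [a, \infty)$. If the integral
$$g(\alpha) = \int_a^{\infty} f(\alpha, x)\,dx \tag{1.329}$$
exists for all $\alpha \in [\alpha_1, \alpha_2]$ and if the integral
$$\int_a^{\infty} \frac{\partial f(\alpha, x)}{\partial \alpha}\,dx \tag{1.330}$$
is uniformly convergent for all $\alpha \in [\alpha_1, \alpha_2]$, then $g(\alpha)$ is differentiable in $[\alpha_1, \alpha_2]$ (at $\alpha_1$ from the right and at $\alpha_2$ from the left) with the derivative
$$\frac{dg}{d\alpha} = \int_a^{\infty} \frac{\partial f(\alpha, x)}{\partial \alpha}\,dx. \tag{1.331}$$
In other words, we can interchange the order of differentiation with respect to $\alpha$ and integration with respect to $x$ as
$$\frac{d}{d\alpha} \int_a^{\infty} f(\alpha, x)\,dx = \int_a^{\infty} \frac{\partial f(\alpha, x)}{\partial \alpha}\,dx. \tag{1.332}$$
This is also called Leibnitz's rule (Kaplan).

Theorem 1.9. Let $f(\alpha, x)$ be continuous for all $\alpha \in [\alpha_1, \alpha_2]$ and $x \in [a, \infty)$. Also let the integral
$$g(\alpha) = \int_a^{\infty} f(\alpha, x)\,dx \tag{1.333}$$
be uniformly convergent for all $\alpha \in [\alpha_1, \alpha_2]$. Then,

(a) $g(\alpha)$ is continuous in $[\alpha_1, \alpha_2]$ (at $\alpha_1$ from the right and at $\alpha_2$ from the left).

(b) The relation
$$\int_{\alpha_1}^{\alpha} g(\alpha')\,d\alpha' = \int_a^{\infty} \left[\int_{\alpha_1}^{\alpha} f(\alpha', x)\,d\alpha'\right] dx, \tag{1.334}$$
that is,
$$\int_{\alpha_1}^{\alpha} \left[\int_a^{\infty} f(\alpha', x)\,dx\right] d\alpha' = \int_a^{\infty} \left[\int_{\alpha_1}^{\alpha} f(\alpha', x)\,d\alpha'\right] dx, \tag{1.335}$$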
is true for all $\alpha \in [\alpha_1, \alpha_2]$. In other words, the order of the integrals with respect to $x$ and $\alpha'$ can be interchanged. Note that in case (a) the interval for $\alpha$ does not have to be finite.

Remark: In the above theorems, if the limits of integration are finite but the function $f(\alpha, x)$ or its partial derivative $\partial f(\alpha, x)/\partial \alpha$ is not bounded in the neighborhood of the segment defined by $x = b$ and $\alpha \in [\alpha_1, \alpha_2]$, we say that the integral
$$g(\alpha) = \int_a^b f(\alpha, x)\,dx \tag{1.336}$$
is uniformly convergent for all $\alpha \in [\alpha_1, \alpha_2]$, if for every $\varepsilon > 0$ we can find a $\delta_0 > 0$ independent of $\alpha$ such that the inequality
$$\left|\int_{b - \delta}^{b} f(\alpha, x)\,dx\right| < \varepsilon \tag{1.337}$$
is true for all $\delta \in [0, \delta_0]$. We can now apply the above theorems with the upper limit $\infty$ in the integrals replaced by $b$ and the domain $x \in [a, \infty)$ by $x \in [a, b]$.

Example 1.10. Integrals depending on a parameter: Given the integral
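$$g(\alpha) = \int_0^{\infty} \frac{\sin \alpha x}{x}\,dx = \frac{\pi}{2}, \quad \alpha > 0, \tag{1.338}$$
we differentiate with respect to $\alpha$ to write
$$\int_0^{\infty} \cos \alpha x\,dx = 0. \tag{1.339}$$
However, this is not correct. The integral on the right-hand side of
$$\int_0^{\infty} \frac{\partial}{\partial \alpha}\left[\frac{\sin \alpha x}{x}\right] dx = \int_0^{\infty} \cos \alpha x\,dx \tag{1.340}$$
does not exist, since the limit
$$\lim_{b \to \infty} \int_0^{b} \cos \alpha x\,dx = \lim_{b \to \infty} \frac{\sin \alpha b}{\alpha} \tag{1.341}$$
does not exist. Hence the differentiation $dg/d\alpha$ is not justified (Theorem 1.8). On the other hand, given the integral
$$\int_0^{\pi/2} \frac{dx}{\alpha^2 \cos^2 x + \sin^2 x} = \frac{\pi}{2\alpha}, \quad \alpha > 0, \tag{1.342}$$
we can write
$$\int_0^{\pi/2} \frac{\partial}{\partial \alpha}\left[\frac{1}{\alpha^2 \cos^2 x + \sin^2 x}\right] dx = \frac{d}{d\alpha}\left[\frac{\pi}{2\alpha}\right] \tag{1.343}$$
to obtain the integral
$$\int_0^{\pi/2} \frac{2\alpha \cos^2 x\,dx}{(\alpha^2 \cos^2 x + \sin^2 x)^2} = \frac{\pi}{2\alpha^2}. \tag{1.344}$$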
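Example 1.11. Integrals depending on a parameter: Consider
$$f(\alpha, x) = \begin{cases} e^{-\alpha x}\,\dfrac{\sin x}{x}, & x \neq 0, \\ 1, & x = 0, \end{cases} \tag{1.345}$$
which is continuous for all $x$ and $\alpha$. Since
$$\frac{\partial f}{\partial \alpha} = -e^{-\alpha x} \sin x, \tag{1.346}$$
which is also continuous for all $x$ and $\alpha$, and the integral
$$\int_0^{\infty} e^{-\alpha x} \sin x\,dx \tag{1.347}$$
converges uniformly for all $\alpha > 0$ (Example 1.9), using Theorem 1.8 we conclude that
$$g(\alpha) = \int_0^{\infty} e^{-\alpha x}\,\frac{\sin x}{x}\,dx \tag{1.348}$$
exists and can be differentiated to write
$$\frac{dg}{d\alpha} = \int_0^{\infty} \frac{\partial}{\partial \alpha}\left[e^{-\alpha x}\,\frac{\sin x}{x}\right] dx \tag{1.349}$$
$$= -\int_0^{\infty} e^{-\alpha x} \sin x\,dx = -\frac{1}{1 + \alpha^2}, \tag{1.350}$$
where we have used the result in Equation (1.328). We now use Theorem 1.9 to integrate $g'(\alpha)$ [Eq. (1.349)], which is continuous for all $\alpha > 0$, to obtain
$$\int_{\alpha}^{\infty} g'(\alpha')\,d\alpha' = -\int_{\alpha}^{\infty} \frac{d\alpha'}{1 + \alpha'^2} = -\frac{\pi}{2} + \tan^{-1} \alpha. \tag{1.351}$$
However, we can also write
$$\int_{\alpha}^{\infty} g'(\alpha')\,d\alpha' = -\int_{\alpha}^{\infty} \left[\int_0^{\infty} e^{-\alpha' x} \sin x\,dx\right] d\alpha' = -\int_0^{\infty} \left[\int_{\alpha}^{\infty} e^{-\alpha' x} \sin x\,d\alpha'\right] dx = -\int_0^{\infty} e^{-\alpha x}\,\frac{\sin x}{x}\,dx, \quad \alpha > 0, \tag{1.352}$$
which along with Equation (1.351) yields, in the limit $\alpha \to 0$, the definite integral
$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \tag{1.353}$$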
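Comparing the two expressions for the integral of $g'$ gives $g(\alpha) = \pi/2 - \tan^{-1}\alpha$ for $\alpha > 0$, which can be tested numerically. The sketch below truncates the infinite range at a finite cutoff and applies a composite Simpson rule. The cutoff, the number of subintervals, and the function names are assumptions of this sketch, and the truncation is only reasonable when $e^{-\alpha \cdot \text{cutoff}}$ is negligible.

```python
import math

def simpson(f, a, b, n):
    # Composite Simpson rule; n must be even
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def integrand(alpha, x):
    # exp(-alpha x) sin(x) / x, with the limiting value 1 at x = 0
    if x == 0.0:
        return 1.0
    return math.exp(-alpha * x) * math.sin(x) / x

def g(alpha, cutoff=60.0, n=60000):
    # Infinite range truncated at `cutoff`; valid when exp(-alpha * cutoff) is tiny
    return simpson(lambda x: integrand(alpha, x), 0.0, cutoff, n)

for alpha in (0.5, 1.0, 2.0):
    print(alpha, g(alpha), math.pi / 2.0 - math.atan(alpha))
```

For small $\alpha$ the exponential damping weakens and the cutoff would have to grow accordingly, which reflects the fact that the convergence is uniform only for $\alpha \ge \varepsilon > 0$.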
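1.16 LIMITS OF INTEGRATION DEPENDING ON A PARAMETER

Let $A(x)$ and $B(x)$ be two continuous functions with continuous derivatives in $[x_1, x_2]$, with $B(x) > A(x)$. Also let $f(t, x)$ and $\partial f(t, x)/\partial x$ be continuous in the region defined by $x \in [x_1, x_2]$ and $t \in [A(x), B(x)]$. We can now write the integral
$$g(x) = \int_{A(x)}^{B(x)} f(t, x)\,dt \tag{1.354}$$
and its partial derivative with respect to $x$ as
$$\frac{\partial g}{\partial x} = \int_{A}^{B} \frac{\partial f(t, x)}{\partial x}\,dt. \tag{1.355}$$
Using the relations [Eq. (1.272)]
$$\frac{\partial g}{\partial B} = f(B, x), \tag{1.356}$$
$$\frac{\partial g}{\partial A} = -f(A, x), \tag{1.357}$$
we can write
$$dg = \frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial A}\,dA + \frac{\partial g}{\partial B}\,dB \tag{1.358}$$
$$= \left[\int_{A}^{B} \frac{\partial f(t, x)}{\partial x}\,dt\right] dx - f(A, x)\,dA + f(B, x)\,dB. \tag{1.359}$$
We can also write
$$\frac{dg}{dx} = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial A}\frac{dA}{dx} + \frac{\partial g}{\partial B}\frac{dB}{dx}, \tag{1.360}$$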
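Figure 1.10 The double integral.

Thus obtaining the useful formula
$$\frac{d}{dx} \int_{A(x)}^{B(x)} f(t, x)\,dt = \int_{A(x)}^{B(x)} \frac{\partial f(t, x)}{\partial x}\,dt + f(B(x), x)\,\frac{dB}{dx} - f(A(x), x)\,\frac{dA}{dx}. \tag{1.361}$$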
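The rule for differentiating an integral whose limits and integrand both depend on $x$ states that the derivative equals the integral of $\partial f/\partial x$ plus $f(B, x)\,dB/dx$ minus $f(A, x)\,dA/dx$. The sketch below checks this numerically against a finite-difference derivative. The integrand $f(t, x) = \sin(xt)$, the limits $A(x) = x$ and $B(x) = x^2$, and all function names are illustrative choices made here, not taken from the text.

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson rule; n must be even
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

# Illustrative choices: f(t, x) = sin(x t), A(x) = x, B(x) = x^2
def f(t, x):
    return math.sin(x * t)

def df_dx(t, x):
    return t * math.cos(x * t)

def A(x):
    return x

def B(x):
    return x * x

def dA_dx(x):
    return 1.0

def dB_dx(x):
    return 2.0 * x

def g(x):
    # g(x) = integral of f(t, x) dt from A(x) to B(x)
    return simpson(lambda t: f(t, x), A(x), B(x))

def leibniz_rhs(x):
    # Integral of df/dx plus the two boundary terms
    inside = simpson(lambda t: df_dx(t, x), A(x), B(x))
    return inside + f(B(x), x) * dB_dx(x) - f(A(x), x) * dA_dx(x)

x0, h = 1.5, 1e-5  # B(x0) > A(x0) holds here
numerical = (g(x0 + h) - g(x0 - h)) / (2.0 * h)
print(numerical, leibniz_rhs(x0))
```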
1.17 DOUBLE INTEGRALS

Consider a continuous and bounded function, $f(x, y)$, defined in a closed region $R$ of the $xy$-plane. It is important that $R$ be bounded, that is, we can enclose it with a circle of sufficiently large radius. We subdivide $R$ into rectangles by drawing parallels to the $x$ and the $y$ axes (Fig. 1.10). We choose only the rectangles in $R$ and number them from $1$ to $n$. The area of the $i$th rectangle is shown as $\Delta A_i$ and the largest of the diagonals, $h$, is called the norm of the mesh. We now form the sum
$$\sum_{i=1}^{n} f(x_i^*, y_i^*)\,\Delta A_i, \tag{1.362}$$
where, as in the one-dimensional integrals, $(x_i^*, y_i^*)$ is a point arbitrarily chosen in the $i$th rectangle. If the sum converges to a limit as $h \to 0$, we define the double integral as the limit
$$\lim_{h \to 0} \sum_{i=1}^{n} f(x_i^*, y_i^*)\,\Delta A_i \to \iint_R f(x, y)\,dx\,dy. \tag{1.363}$$
Figure 1.11 Ranges in the iterated integrals.

When the region $R$ can be described by the inequalities
$$y_1(x) \le y \le y_2(x), \quad x_1 \le x \le x_2 \tag{1.364}$$
or
$$x_1(y) \le x \le x_2(y), \quad y_1 \le y \le y_2, \tag{1.365}$$
where $y_1(x)$, $y_2(x)$ and $x_1(y)$, $x_2(y)$ are continuous functions (Fig. 1.11), we can write the double integral for the first case as the iterated integral
$$I = \iint_R f(x, y)\,dx\,dy = \int_{x_1}^{x_2} \left[\int_{y_1(x)}^{y_2(x)} f(x, y)\,dy\right] dx. \tag{1.366}$$
The definite integral inside the square brackets will yield a function $F(x)$, which reduces $I$ to a one-dimensional definite integral:
$$I = \int_{x_1}^{x_2} F(x)\,dx. \tag{1.367}$$
A similar argument can be given for the second case [Eq. (1.365)]. We now present these results in terms of a theorem:

Theorem 1.10. If $f(x, y)$ is continuous and bounded in a closed region described by
$$y_1(x) \le y \le y_2(x), \quad x_1 \le x \le x_2, \tag{1.368}$$
then
$$F(x) = \int_{y_1(x)}^{y_2(x)} f(x, y)\,dy \tag{1.369}$$
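is a continuous function of $x$ and
$$\iint_R f(x, y)\,dx\,dy = \int_{x_1}^{x_2} \left[\int_{y_1(x)}^{y_2(x)} f(x, y)\,dy\right] dx. \tag{1.370}$$
Similarly, if $R$ is described by
$$x_1(y) \le x \le x_2(y), \quad y_1 \le y \le y_2, \tag{1.371}$$
then we can write
$$\iint_R f(x, y)\,dx\,dy = \int_{y_1}^{y_2} \left[\int_{x_1(y)}^{x_2(y)} f(x, y)\,dx\right] dy. \tag{1.372}$$
A formal proof of this theorem can be found in books on advanced calculus.

1.18 PROPERTIES OF DOUBLE INTEGRALS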
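We can summarize the basic properties of double integrals, which are essentially the same as those of the definite integrals of functions of a single variable, as follows:

I.
$$\iint_R [f(x, y) + g(x, y)]\,dx\,dy = \iint_R f(x, y)\,dx\,dy + \iint_R g(x, y)\,dx\,dy, \tag{1.373}$$
$$\iint_R c f(x, y)\,dx\,dy = c \iint_R f(x, y)\,dx\,dy, \quad c \text{ is a constant}, \tag{1.374}$$
$$\iint_R f(x, y)\,dx\,dy = \iint_{R_1} f(x, y)\,dx\,dy + \iint_{R_2} f(x, y)\,dx\,dy, \tag{1.375}$$
where $R$ is composed of $R_1$ and $R_2$, which overlap only at the boundary.

II. There exists a point $(x_1, y_1)$ in $R$ such that
$$\iint_R f(x, y)\,dx\,dy = f(x_1, y_1)\,A, \tag{1.376}$$
where $A$ is the area of $R$. The value $f(x_1, y_1)$ is also the mean value, $\langle f \rangle$, of the function in the region $R$:
$$\langle f \rangle = \frac{1}{A} \iint_R f(x, y)\,dx\,dy. \tag{1.377}$$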
III.
$$\left|\iint_R f(x, y)\,dx\,dy\right| \le M A, \tag{1.378}$$
where $M$ is the absolute maximum, that is,
$$|f(x, y)| \le M \tag{1.379}$$
in $R$ and $A$ is the area of $R$.

IV. Uses of double integrals: If we set $f(x, y) = 1$ in $\iint_R f(x, y)\,dx\,dy$, the double integral corresponds to the area of the region $R$:
$$\iint_R dx\,dy = \text{area of } R. \tag{1.380}$$
For $f(x, y) \ge 0$ we can interpret the double integral as the volume between the surface $z = f(x, y)$ and the region $R$ in the $xy$-plane. If we interpret $f(x, y)$ as the mass density of a flat object lying on the $xy$-plane covering the region $R$, the double integral
$$M = \iint_R f(x, y)\,dx\,dy \tag{1.381}$$
gives its total mass $M$.

1.19 TRIPLE AND MULTIPLE INTEGRALS
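Methods and results developed for double integrals can easily be extended to triple and multiple integrals:
$$\iiint_R f(x, y, z)\,dx\,dy\,dz, \quad \iiiint_R f(x, y, z, w)\,dx\,dy\,dz\,dw, \ \ldots \tag{1.382}$$
Following the arguments given for the single and the double integrals, for a continuous and bounded function $f(x, y, z)$ in a bounded region $R$ defined by
$$x_1 \le x \le x_2, \quad y_1(x) \le y \le y_2(x), \quad z_1(x, y) \le z \le z_2(x, y), \tag{1.383}$$
we can define the triple integral
$$\iiint_R f(x, y, z)\,dx\,dy\,dz = \int_{x_1}^{x_2} \int_{y_1(x)}^{y_2(x)} \int_{z_1(x, y)}^{z_2(x, y)} f(x, y, z)\,dz\,dy\,dx. \tag{1.384}$$
An obvious application of the triple integral is when $f(x, y, z) = 1$, which gives the volume of the region $R$:
$$\iiint_R dx\,dy\,dz = \text{volume of } R. \tag{1.385}$$
In physical applications, total amount of mass, charge, etc., with the density $\rho(x, y, z)$ are given as the triple integral
$$\iiint_R \rho(x, y, z)\,dx\,dy\,dz. \tag{1.386}$$
The average value of a function $f(x, y, z)$ in the region $R$ with the volume $V$ is defined as
$$\langle f \rangle = \frac{1}{V} \iiint_R f(x, y, z)\,dx\,dy\,dz. \tag{1.387}$$

Example 1.12. Volume between two surfaces: To find the volume between the cone $z = \sqrt{x^2 + y^2}$ and the paraboloid $z = x^2 + y^2$, we first write the triple integral
$$V = \iint \left[\int_{x^2 + y^2}^{\sqrt{x^2 + y^2}} dz\right] dx\,dy \tag{1.388}$$
$$= \iint \left[\sqrt{x^2 + y^2} - x^2 - y^2\right] dx\,dy. \tag{1.389}$$
We now use plane polar coordinates to write this as
$$V = \int_0^{2\pi} \int_0^{1} [r - r^2]\,r\,dr\,d\phi \tag{1.390}$$
$$= 2\pi \int_0^{1} [r^2 - r^3]\,dr \tag{1.391}$$
$$= 2\pi \left[\frac{1}{3} - \frac{1}{4}\right] \tag{1.392}$$
$$= \frac{\pi}{6}. \tag{1.393}$$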
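Carrying out the polar integration gives $V = 2\pi\int_0^1 (r^2 - r^3)\,dr = \pi/6$, since the two surfaces meet on the circle $x^2 + y^2 = 1$. As an independent check, the sketch below evaluates the double integral of Equation (1.389) directly in Cartesian coordinates with a midpoint rule on a square grid, counting only grid cells whose centers lie inside the unit disk. The grid size and function name are assumptions made for this sketch, so the agreement is approximate.

```python
import math

def volume_cartesian(n=400):
    # Midpoint rule on an n-by-n grid covering [-1, 1] x [-1, 1];
    # only cells whose centre lies inside the unit disk contribute.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            rho2 = x * x + y * y
            if rho2 <= 1.0:
                # height of the cone minus height of the paraboloid
                total += math.sqrt(rho2) - rho2
    return total * h * h

print(volume_cartesian(), math.pi / 6.0)
```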
PROBLEMS

1. Determine the critical points as well as the absolute maximum and minimum of the functions

(i) $y = \ln x$, $0 < x \le 2$,

(iii) $y = x^3 + 2x^2 + 1$, $-2 < x < 1$.
2. Determine the critical points of the functions

(i) $z = x^3 - 6xy^2 + y^3$,

(ii) $z = 1 + x^2 + y^2$,

(iii) $z = x^2 - 4xy - y^2$,

and test for maximum or minimum.
3. Find the maximum and minimum points of $z = x^2 + 24xy + 8y^2$ subject to the condition $x^2 + y^2 = 25$.
4. Find the critical points of $w = x + y$ subject to $x^2 + y^2 + z^2 = 1$ and identify whether they are maximum or minimum.
5. Express the partial differential equation
in spherical coordinates $(r, \theta, \phi)$ defined by the equations
$$x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta,$$
where $r \in [0, \infty)$, $\theta \in [0, \pi]$, $\phi \in [0, 2\pi]$. Next, first show that the inverse transformation exists and then find it.
6. Given the mapping
$$x = u^2 - v^2, \quad y = 2uv,$$
(i) Write the Jacobian. (ii) Evaluate the derivatives
(e)z
and
($)z.
7. Find
(g)y
and
( h ) for dY
eu + x u - yv - 1 = 0, e” - xu yu - 2 = 0.
+
8. Given the transformation functions
show that the inverse transformations = u(x,Y),
4 2 ,Y)
=
satisfy
du
1 dy d u - _--1 ax _ dv 1 dy dv 1 dx - --- _ - -J d v ’ dy J d v ’ dx J d u ’ dy Jdu’
-- --
dx
where J = a(x ’). Apply your result t o Problem 1.6.
9. Given the transformation functions
x
= z ( u ,V ’w), = y ( u , ‘u, w),
z = z ( u ,v,w) with the Jacobian
J = a(x’”
show that the inverse transformation
v,w)’
functions have the derivatives
d u - 1 d(z,x) d u - -~ 1 d(x,y) J d ( v , w ) ’ dy Jd(v,w)’ dz Jd(v,w)’ 1 d ( y , z ) dv - 1 d(z,x) dv 1 d(x,y) - -~ J d(w, u)’ dy J d(w, u)’ d z J d(w, u)’
du 1 d(y,z) -
dx dv dx dw _ dz
-
d ( y_ , z )__ dw -- -1 _
J d ( u , v ) ’ dy
-
1 d(z,x) dw
Jd(u,v)’ dz
-
1d(x,y) -
Jd(u,v)’
Verify your result in Problem 1.5.

10. In one-dimensional conservative systems the potential can be represented by a (scalar) function, $V(x)$, where the negative of the derivative of the potential gives the $x$ component of the force on the system:
$$F_x(x) = -\frac{dV}{dx}.$$
With the aid of a sketch, analyze the forces on a system when it is displaced from its equilibrium position by a small amount and show that a minimum corresponds to stable equilibrium and a maximum corresponds to unstable equilibrium.

11. In one-dimensional potential problems show that near equilibrium the potential can be approximated by the harmonic oscillator potential
$$V(x) = \frac{1}{2}k(x - x_0)^2,$$
where $k$ is a constant and $x_0$ is the equilibrium point. What is $k$?
12. Expand $z(x, y) = x^3 \sin y + y^2 \cos x$ in Taylor series up to third order about the origin.

13. If $x = x(u, v)$ and $y = y(u, v)$, then show the following:
14. Show the integrals
Hint: Use
som9 dx = 4.
15. Evaluate the improper integrals: dx
.I
'I2
fi(1
dx -
22)
16. First show the following: - coverges if and only if p
(ii)
(iii)
11$,
coverges if and only if p
.IC&?
> 1,
< 1,
coverges if and only if p
0,b > 0 ,
where a and b are two parameters, show the integral
.i
dx
T/2
20. Determine the convergent:
(u2
a!
cos2 x
2=&(+2+$).
+ b2 sin2 x>
values for which the following integrals are uniformly
21. Can the order of integration be interchanged in the following integral (explain):
1
x-a!
dx da.
22. Use the result
.I
03
g(a)
=
sin xa! dx x(x2 1)
+
=
iT
-(1 2
-
e-a),
a!
> 0,
to deduce the integrals sin x a
dx
=
7r
-(122
e P Q ) ,c
>0
and
23. Evaluate the double integral
I = //2y
dxdy
over the triangle with vertices (-1,0), (0, l ) ,and (2,O).
24. Evaluate I = J’Szy
dzdy
over the triangle with vertices (0,0), (1,l),and ( 1 , 3 ) .
25. Evaluate the integral
26. First evaluate the integral
and then repeat the integration over the same region but with the x integral taken first.
27. Test the following integral for convergence:
CHAPTER 2
VECTOR ANALYSIS
Certain properties in nature like mass, charge, temperature, etc., are scalars. They can be defined at a point by just giving a single number, that is, their magnitude. On the other hand, properties like velocity and acceleration are vector quantities, which have both direction and magnitude. Most of Newtonian mechanics and Maxwell's electrodynamics are formulated in the language of vector analysis. In this chapter, we introduce the basic properties of scalars, vectors, and their fields.

2.1 VECTOR ALGEBRA: GEOMETRIC METHOD

Abstract vectors on a plane are defined by directed line segments. The length of the line segment describes the magnitude of the physical property and the arrow indicates its direction. As long as we preserve their magnitude and direction, we can move vectors freely in space. In this regard, $\vec{A}$ and $\vec{A}'$ in Figure 2.1 are equivalent vectors:
$$\vec{A}' = \vec{A}.$$
Figure 2.1 Abstract vectors.

We use Latin letters with an arrow to show vector quantities:
$$\vec{A}, \vec{B}, \vec{a}, \vec{b}, \ldots$$
It is also customary to use boldface letters, A, B, a, b, ..., for vector quantities, as we use in the figures. Magnitude or norm of a vector is a positive number and is shown as $|\vec{A}|, |\vec{B}|, |\vec{a}|, \ldots$ or simply as $A, B, a, \ldots$ Multiplication of a vector with a positive number, $\alpha > 0$, multiplies the magnitude by the same number while leaving the direction untouched:
$$|\alpha \vec{A}| = \alpha |\vec{A}|.$$
Multiplication of a vector with a negative number, $\beta < 0$, reverses the direction while changing the magnitude as
$$|\beta \vec{A}| = |\beta|\,|\vec{A}|.$$
Two vectors can be added by using the parallelogram method (Fig. 2.2). A convenient way to add vectors is to draw them head to tail as in Figure 2.3. This allows us to define a null vector $\vec{0}$ as
$$\vec{A} + (-1)\vec{A} = \vec{0}. \tag{2.8}$$

Figure 2.2 Addition of vectors, r = A + B.

Figure 2.3 Addition of vectors by drawing them head to tail.

Using the cosine and the sine theorems, we can find the magnitude, $r$, and the angle, $\phi$, of the resultant,
$$\vec{r} = \vec{A} + \vec{B}, \tag{2.9}$$
as
$$r = (A^2 + B^2 + 2AB\cos\theta)^{1/2}, \tag{2.10}$$
$$\phi = \arcsin\left(\frac{B}{r}\sin\theta\right). \tag{2.11}$$
x+3=z+x and associate:
+z 3 + ( B + + Z ) .
( X + 3) A set of vectors,
=
(2.12)
(2.13)
{ x,z,5 , .. . } , can be added by drawing them head t o
tail. In Figure 2.3 (left), the resultant is ?;’=
x + 3+z+a’.
(2.14)
60
VECTOR ANALYSIS
Figure 2.4
Force problems.
If the resultant is a null vector, then the head of the last vector added and the tail of the first vector added meet (Fig. 2.3, right): d = 3 + 2 + 3 +
i?+Z+T.
Example 2.1. Point of application: In physics the point of application of a vector is important. Hence we have to be careful when we move them around to find their resultant. In some equilibrium problems, where the forces act at the center of mass, the net force is zero (Fig. 2.4, left). In Figure 2.4 (right), where there is a net force, the resultant also acts at the point O.

2.1.1 Multiplication of Vectors
For the product of two vectors there are two types of multiplication: The scalar product, which is also known as the dot or the inner product, is
defined as +
A
. Z= A B C O ~ O (2.15)
where 0 is the angle between the two vectors. The dot product is also shown as (2,S‘). If we write -ri’ . as
Tt
(2.16) + Ag = A .EB = ACOSO,
2,
(2.17)
where 2~ is a unit vector along the direction of the dot product becomes a convenient way to find the projection of a vector along another vector, that is, the component, AB, of 2 along EB.
Figure 2.5 Dot or scalar product.
In physics, work is a scalar quantity defined as the force times the displacement along the direction of force. In other words, it is the dot product of the force with the displacement. For a particle in motion, the infinitesimal work is written as the dot product of the force, $\vec{F}$, with the infinitesimal displacement vector along the trajectory of the particle, $d\vec{s}$, as (Fig. 2.5)
$$\delta W = \vec{F}\cdot d\vec{s}. \tag{2.18}$$
We have chosen to write $\delta W$ instead of $dW$ to emphasize the fact that, in general, work is path-dependent. To find the total work done between two points A and B, we have to integrate over a specific path $C$ connecting the two points:
$$W = \int_{A,\;C}^{B} \vec{F}\cdot d\vec{s}. \tag{2.19}$$
A different path connecting A to B in general yields a different value for the work done. Another type of vector multiplication is called the vector product or the cross product, which is defined as the binary operation
$$\vec{C} = \vec{A}\times\vec{B}. \tag{2.20}$$
The result is a new vector, $\vec{C}$, which is perpendicular to the plane defined by $\vec{A}$ and $\vec{B}$ with the magnitude defined as (Fig. 2.6)
$$C = AB\sin\theta. \tag{2.21}$$
The direction is found by the right-hand rule, that is, when we curl the fingers of our right hand from the first vector to the second, the direction of our thumb gives the direction of $\vec{C}$. Note that when the order of the vectors multiplied is reversed, the direction of $\vec{C}$ also reverses. Angular momentum
$$\vec{L} = \vec{r}\times\vec{p} \tag{2.22}$$
and torque (Fig. 2.6)
$$\vec{\tau} = \vec{r}\times\vec{F} \tag{2.23}$$
are two important physical properties that are defined in terms of the vector product. In the above expressions, $\vec{r}$ is the position vector defined with respect to an origin O, and $\vec{p}$ and $\vec{F}$ are the momentum and the force vectors, respectively. In celestial mechanics we usually choose the origin as the center of attraction, $M$. The gravitational force is central and directed toward the center of attraction; hence the torque is zero. Since the rate of change of the angular momentum is equal to the torque, in central force problems angular momentum is conserved, that is, its magnitude and direction remain fixed. This means that the orbit of a planet always remains in the plane defined by the two vectors $\vec{r}$ and $\vec{p}$. This allows us to use plane polar coordinates in orbit calculations, thereby simplifying the algebra significantly.
Figure 2.6 Cross or vector product.
2.2 VECTOR ALGEBRA: COORDINATE REPRESENTATION
A convenient way to approach vector algebra came with Descartes through the introduction of Cartesian coordinates. We define a Cartesian coordinate system by choosing three mutually orthogonal straight lines, which we identify as the $x_1$-, $x_2$-, and $x_3$-axes, respectively. We also draw three unit basis vectors, $\hat{e}_1, \hat{e}_2, \hat{e}_3$, along these axes (Fig. 2.7). A point P in space can now be represented by the position vector $\vec{r}$, which can be written as the sum of three vectors, $x_1\hat{e}_1$, $x_2\hat{e}_2$, and $x_3\hat{e}_3$, along their respective axes as
$$\vec{r} = x_1\hat{e}_1 + x_2\hat{e}_2 + x_3\hat{e}_3, \tag{2.24}$$
Figure 2.7 Cartesian coordinates.
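where $x_1, x_2, x_3$ are called the coordinates of the point P or the components of $\vec{r}$. We also use $\vec{x}$ for the position vector. In general any vector, $\vec{A}$, can be written as the sum of three vectors:
$$\vec{A} = A_1\hat{e}_1 + A_2\hat{e}_2 + A_3\hat{e}_3, \tag{2.25}$$
where $A_1, A_2, A_3$ are called the components of $\vec{A}$. We can also write a vector as the set of ordered numbers
$$\vec{A} = (A_1, A_2, A_3). \tag{2.26}$$
Since the unit basis vectors are mutually orthogonal, they satisfy the relations
$$\hat{e}_1\cdot\hat{e}_1 = 1, \quad \hat{e}_1\cdot\hat{e}_2 = 0, \quad \hat{e}_1\cdot\hat{e}_3 = 0, \tag{2.27}$$
$$\hat{e}_2\cdot\hat{e}_1 = 0, \quad \hat{e}_2\cdot\hat{e}_2 = 1, \quad \hat{e}_2\cdot\hat{e}_3 = 0, \tag{2.28}$$
$$\hat{e}_3\cdot\hat{e}_1 = 0, \quad \hat{e}_3\cdot\hat{e}_2 = 0, \quad \hat{e}_3\cdot\hat{e}_3 = 1, \tag{2.29}$$
which can be summarized as
$$\hat{e}_i\cdot\hat{e}_j = \delta_{ij}, \quad i, j = 1, 2, 3. \tag{2.30}$$
The right-hand side, $\delta_{ij}$, is called the Kronecker delta, which is equal to 1 when the two indices are equal and 0 when the two indices are different:
$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{2.31}$$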
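Using Equation (2.30), we can write the square of the magnitude of a vector, $\vec{A}$, in the following equivalent ways:
$$A^2 = \vec{A}\cdot\vec{A} = \sum_{i=1}^{3}\sum_{j=1}^{3} A_i A_j\,\hat{e}_i\cdot\hat{e}_j \tag{2.32}$$
$$= \sum_{i=1}^{3} A_i^2 = A_1^2 + A_2^2 + A_3^2. \tag{2.33}$$
The components, $A_i$, are obtained from the scalar products of $\vec{A}$ with the unit basis vectors:
$$A_i = \vec{A}\cdot\hat{e}_i, \quad i = 1, 2, 3. \tag{2.34}$$
In component notation, two vectors are added by adding their respective components:
$$\vec{A} + \vec{B} = (A_1 + B_1)\hat{e}_1 + (A_2 + B_2)\hat{e}_2 + (A_3 + B_3)\hat{e}_3 \tag{2.35}$$
$$= (A_1 + B_1,\; A_2 + B_2,\; A_3 + B_3). \tag{2.36}$$
Multiplication of a vector with a scalar $\alpha$ is accomplished by multiplying each component with that scalar:
$$\alpha\vec{A} = \alpha A_1\hat{e}_1 + \alpha A_2\hat{e}_2 + \alpha A_3\hat{e}_3 \tag{2.37}$$
$$= (\alpha A_1, \alpha A_2, \alpha A_3). \tag{2.38}$$
The dot product of two vectors is written as
$$\vec{A}\cdot\vec{B} = (\vec{A},\vec{B}) \tag{2.39}$$
$$= A_1B_1 + A_2B_2 + A_3B_3 \tag{2.40}$$
$$= \sum_{i=1}^{3} A_iB_i. \tag{2.41}$$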
Using component notation one can prove the following properties of the dot product:
Properties of the dot product
(2.42) (2.43) (2.44) (2.45) (2.46) (2.47) (2.48)
Equation (2.47), $|\vec{A}\cdot\vec{B}| \leq AB$, is known as the Schwarz inequality. Equation (2.48), $|\vec{A} + \vec{B}| \leq |\vec{A}| + |\vec{B}|$, is the triangle inequality, which says that the sum of the lengths of the two sides of a triangle is always greater than or equal to the length of the third side. Before we write the cross product of two vectors in component notation, we write the following relations for the basis vectors:
$$\begin{array}{lll}
\hat{e}_1\times\hat{e}_1 = 0, & \hat{e}_1\times\hat{e}_2 = \hat{e}_3, & \hat{e}_1\times\hat{e}_3 = -\hat{e}_2, \\
\hat{e}_2\times\hat{e}_1 = -\hat{e}_3, & \hat{e}_2\times\hat{e}_2 = 0, & \hat{e}_2\times\hat{e}_3 = \hat{e}_1, \\
\hat{e}_3\times\hat{e}_1 = \hat{e}_2, & \hat{e}_3\times\hat{e}_2 = -\hat{e}_1, & \hat{e}_3\times\hat{e}_3 = 0.
\end{array} \tag{2.49}$$
The cross product of two vectors can now be written as
$$\vec{A}\times\vec{B} = (A_1\hat{e}_1 + A_2\hat{e}_2 + A_3\hat{e}_3)\times(B_1\hat{e}_1 + B_2\hat{e}_2 + B_3\hat{e}_3)$$
$$= (A_2B_3 - A_3B_2)\hat{e}_1 + (A_3B_1 - A_1B_3)\hat{e}_2 + (A_1B_2 - A_2B_1)\hat{e}_3. \tag{2.50}$$
We now introduce the permutation symbol $\epsilon_{ijk}$, which is defined as
$$\epsilon_{ijk} = \begin{cases}
0 & \text{when any two indices are equal,} \\
1 & \text{for even (cyclic) permutations: } 123, 231, 312, \\
-1 & \text{for odd (anticyclic) permutations: } 213, 321, 132.
\end{cases} \tag{2.51}$$
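An important identity that the permutation symbol satisfies is
$$\sum_{i=1}^{3}\epsilon_{ijk}\epsilon_{ilm} = \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}. \tag{2.52}$$
Using the permutation symbol, we can write the $i$th component of a cross product as
$$(\vec{A}\times\vec{B})_i = \sum_{j=1}^{3}\sum_{k=1}^{3}\epsilon_{ijk}A_jB_k. \tag{2.53}$$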
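The permutation symbol and the identity (2.52) lend themselves to a brute-force check. The following is a minimal sketch, not the book's method: the closed-form expression used for `eps` is a known shortcut valid only for indices 1, 2, 3, and all function names are hypothetical.

```python
from itertools import product

def eps(i, j, k):
    """Permutation symbol for indices taken from {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def cross(a, b):
    """Cross product built from Eq. (2.53); a and b are 3-tuples."""
    idx = (1, 2, 3)
    return tuple(
        sum(eps(i, j, k) * a[j - 1] * b[k - 1] for j in idx for k in idx)
        for i in idx
    )

def identity_holds():
    """Check Eq. (2.52) for every choice of the free indices j, k, l, m."""
    idx = (1, 2, 3)
    return all(
        sum(eps(i, j, k) * eps(i, l, m) for i in idx)
        == delta(j, l) * delta(k, m) - delta(j, m) * delta(k, l)
        for j, k, l, m in product(idx, repeat=4)
    )
```

For instance, `cross((1, 2, 3), (4, 5, 6))` reproduces the component formula (2.50).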
Using determinants we can also write a cross product as
$$\vec{A}\times\vec{B} = \det\begin{pmatrix} \hat{e}_1 & \hat{e}_2 & \hat{e}_3 \\ A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}. \tag{2.54}$$
Note that we prefer to use the index notation $(x_1, x_2, x_3)$ over labeling of the axes as $(x, y, z)$ and show the unit basis vectors as $(\hat{e}_1, \hat{e}_2, \hat{e}_3)$ instead of $(\hat{i}, \hat{j}, \hat{k})$. The advantages will become clear when we introduce generalized coordinates and tensors in $n$ dimensions.
Example 2.2. Triple product: In applications we frequently encounter the scalar triple product
$$\vec{A}\cdot(\vec{B}\times\vec{C}) = A_1(B_2C_3 - B_3C_2) + A_2(B_3C_1 - B_1C_3) + A_3(B_1C_2 - B_2C_1), \tag{2.56}$$
which is geometrically equal to the volume of a parallelepiped defined by the vectors $\vec{A}$, $\vec{B}$, and $\vec{C}$ (Fig. 2.8). Note that
$$a_{BC} = |\vec{B}\times\vec{C}| = Bh_1 = B(C\sin\phi) \tag{2.57}$$
is the area of the base and $h_2 = A\cos\theta$ is the perpendicular height to the base, thereby giving the volume as
$$V = h_2\, a_{BC} \tag{2.58}$$
$$= (A\cos\theta)\,BC\sin\phi \tag{2.59}$$
$$= \vec{A}\cdot(\vec{B}\times\vec{C}). \tag{2.60}$$
Using index notation, one can easily show that
$$V = \vec{A}\cdot(\vec{B}\times\vec{C}) = \vec{B}\cdot(\vec{C}\times\vec{A}) = \vec{C}\cdot(\vec{A}\times\vec{B}). \tag{2.61}$$
The triple product can also be expressed as the determinant
$$\vec{A}\cdot(\vec{B}\times\vec{C}) = \det\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{pmatrix}. \tag{2.62}$$
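The cyclic symmetry (2.61) and the determinant form (2.62) can be confirmed on a concrete case. The sketch below uses arbitrarily chosen integer vectors; the helper names are hypothetical.

```python
def cross(a, b):
    """Cross product in components, Eq. (2.50), with 0-based tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))

def det3(a, b, c):
    """Determinant of the matrix with rows a, b, c (expansion along row a)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

A = (1, 2, 3)
B = (0, 1, 4)
C = (5, 6, 0)
volume = triple(A, B, C)
```

With these sample vectors all three cyclic orderings and the determinant give the same value.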
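Properties of the cross product can be summarized as follows:
Figure 2.8 Triple product.
Properties of the cross product
$$\vec{A}\times\vec{B} = -\vec{B}\times\vec{A}, \tag{2.63}$$
$$\vec{A}\times(\vec{B} + \vec{C}) = \vec{A}\times\vec{B} + \vec{A}\times\vec{C}, \tag{2.64}$$
$$(\alpha\vec{A})\times\vec{B} = \alpha(\vec{A}\times\vec{B}), \quad \alpha \text{ is a scalar}, \tag{2.65}$$
$$\vec{A}\times(\vec{B}\times\vec{C}) = \vec{B}(\vec{A}\cdot\vec{C}) - \vec{C}(\vec{A}\cdot\vec{B}), \tag{2.66}$$
$$(\vec{A}\times\vec{B})\cdot(\vec{A}\times\vec{B}) = A^2B^2 - (\vec{A}\cdot\vec{B})^2, \tag{2.67}$$
$$\vec{A}\times(\vec{B}\times\vec{C}) + \vec{B}\times(\vec{C}\times\vec{A}) + \vec{C}\times(\vec{A}\times\vec{B}) = 0. \tag{2.68}$$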
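Using the index notation, we can prove Equation (2.66) as
$$[\vec{A}\times(\vec{B}\times\vec{C})]_i = \sum_{j=1}^{3}\sum_{k=1}^{3}\epsilon_{ijk}A_j(\vec{B}\times\vec{C})_k = \sum_{j=1}^{3}\sum_{k=1}^{3}\epsilon_{ijk}A_j\sum_{l=1}^{3}\sum_{m=1}^{3}\epsilon_{klm}B_lC_m$$
$$= \sum_{j=1}^{3}\sum_{l=1}^{3}\sum_{m=1}^{3}\left[\sum_{k=1}^{3}\epsilon_{kij}\epsilon_{klm}\right]A_jB_lC_m = \sum_{j=1}^{3}\sum_{l=1}^{3}\sum_{m=1}^{3}(\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl})A_jB_lC_m$$
$$= B_i(\vec{A}\cdot\vec{C}) - C_i(\vec{A}\cdot\vec{B}),$$
where we have used $\epsilon_{ijk} = \epsilon_{kij}$ and the identity (2.52).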
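A numerical spot check of Equations (2.66) to (2.68) complements the index-notation proof. This is only a sketch with hypothetical helper names and arbitrarily chosen integer vectors; passing on one example does not replace the proof.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def add(*vectors):
    """Componentwise sum of any number of vectors."""
    return tuple(sum(components) for components in zip(*vectors))

def bac_cab(a, b, c):
    """Right-hand side of Eq. (2.66): B (A.C) - C (A.B)."""
    return tuple(bi * dot(a, c) - ci * dot(a, b) for bi, ci in zip(b, c))

A, B, C = (1, 2, 3), (0, 1, 4), (5, 6, 0)

double_cross = cross(A, cross(B, C))                      # Eq. (2.66), left side
lagrange_lhs = dot(cross(A, B), cross(A, B))              # Eq. (2.67), left side
lagrange_rhs = dot(A, A) * dot(B, B) - dot(A, B) ** 2     # Eq. (2.67), right side
jacobi = add(cross(A, cross(B, C)),
             cross(B, cross(C, A)),
             cross(C, cross(A, B)))                       # Eq. (2.68), left side
```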
2.3 LINES AND PLANES
Figure 2.9 Equation of a line.
We define the parametric equation of a line passing through a point $\vec{P} = (x_{01}, x_{02}, x_{03})$ and in the direction of the vector $\vec{A}$ as
$$\vec{x} = \vec{P} + t\vec{A}, \tag{2.71}$$
where $\vec{x} = (x_1, x_2, x_3)$ is a point on the line and $t$ is a parameter (Fig. 2.9). If the components of $\vec{A}$ are $(a_1, a_2, a_3)$, we obtain the parametric equation of a line in space as
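$$x_1(t) = x_{01} + ta_1, \tag{2.72}$$
$$x_2(t) = x_{02} + ta_2, \tag{2.73}$$
$$x_3(t) = x_{03} + ta_3. \tag{2.74}$$
In two dimensions, say on the $x_1x_2$-plane, the third equation above is absent. Thus by eliminating $t$ among the remaining two equations, we can express the equation of a line in one of the following forms:
$$\frac{x_1 - x_{01}}{a_1} = \frac{x_2 - x_{02}}{a_2}, \tag{2.75}$$
$$x_2 = x_{02} + \frac{a_2}{a_1}(x_1 - x_{01}), \tag{2.76}$$
$$a_1x_2 - a_2x_1 = (x_{02}a_1 - x_{01}a_2). \tag{2.77}$$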
Consider a plane that contains the point P with the coordinates $(x_{01}, x_{02}, x_{03})$. Let $\vec{N}$ be any nonzero vector normal to the plane at P and let $\vec{x}$ be any point on the plane (Fig. 2.10). Since $(\vec{x} - \vec{P})$ is a vector on the plane whose dot product with $\vec{N}$ is zero, we can write
$$(\vec{x} - \vec{P})\cdot\vec{N} = 0. \tag{2.78}$$
Since any $\vec{N}$ perpendicular to the plane satisfies this equation, we can also write this equation as
$$(\vec{x} - \vec{P})\cdot t\hat{n} = 0, \tag{2.79}$$
where $t$ is a parameter and $\hat{n}$ is the unit normal in the direction of $\vec{N}$. If we write $\hat{n}$ as
$$\hat{n} = (n_1, n_2, n_3), \quad n_1^2 + n_2^2 + n_3^2 = 1, \tag{2.80}$$
we can write the equation of a plane that includes the point $(x_{01}, x_{02}, x_{03})$ and has its normal pointing in the direction $\hat{n}$ as
$$n_1x_1 + n_2x_2 + n_3x_3 = [x_{01}n_1 + x_{02}n_2 + x_{03}n_3]. \tag{2.81}$$
Figure 2.10 Equation of a plane.
Example 2.3. Lines and planes: The parametric equation of the line passing through the point $\vec{P} = (3, 1, 1)$ and in the direction of $\vec{A} = (1, 5, 2)$ is
$$x_1(t) = 3 + t, \tag{2.82}$$
$$x_2(t) = 1 + 5t, \tag{2.83}$$
$$x_3(t) = 1 + 2t. \tag{2.84}$$
For a line in the $x_1x_2$-plane passing through $\vec{P} = (2, 1, 0)$ and in the direction of $\vec{A} = (1, 5, 0)$ we write the parametric equation as
$$x_1(t) = 2 + t, \tag{2.85}$$
$$x_2(t) = 1 + 5t. \tag{2.86}$$
We can now eliminate $t$ to write the equation of the line as
$$x_2 = 5x_1 - 9. \tag{2.87}$$
For a plane including the point $\vec{P} = (2, 1, -2)$ and with the normal $\vec{N} = (-1, 1, 1)$ the equation is written as [Eq. (2.81)]
$$-x_1 + x_2 + x_3 = -3. \tag{2.88}$$
In general, a line in the $x_1x_2$-plane is given as
$$ax_1 + bx_2 = c. \tag{2.89}$$
Comparing with Equation (2.77), we can now interpret the vector $(a, b) = (-a_2, a_1)$ as a vector orthogonal to the line, that is,
$$(a, b)\cdot(a_1, a_2) = (-a_2, a_1)\cdot(a_1, a_2) = 0. \tag{2.90}$$
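To find the angle between two planes,
$$2x_1 + x_2 + x_3 = 2, \tag{2.91}$$
$$-x_1 + x_2 + 2x_3 = 1, \tag{2.92}$$
we find the angle between their normals,
$$\vec{N}_1 = (2, 1, 1), \tag{2.93}$$
$$\vec{N}_2 = (-1, 1, 2), \tag{2.94}$$
as
$$\theta = \cos^{-1}\left[\frac{\vec{N}_1\cdot\vec{N}_2}{N_1N_2}\right] \tag{2.95}$$
$$= \cos^{-1}\left[\frac{-2 + 1 + 2}{\sqrt{6}\sqrt{6}}\right] \tag{2.96}$$
$$= \cos^{-1}\frac{1}{6}. \tag{2.97}$$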
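The numbers in Example 2.3 can be reproduced with a few lines of code. The sketch below uses hypothetical helper names; it evaluates points on the line, the right-hand side of the plane equation (2.81), and the angle between the two normals.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def line_point(P, A, t):
    """Point x = P + t A on the line of Eq. (2.71)."""
    return tuple(p + t * a for p, a in zip(P, A))

def plane_rhs(normal, point):
    """Right-hand side of the plane equation: normal . point."""
    return dot(normal, point)

def angle_between(n1, n2):
    """Angle between two normals, as in Eq. (2.95)."""
    return math.acos(dot(n1, n2) / (norm(n1) * norm(n2)))

point_on_line = line_point((3, 1, 1), (1, 5, 2), 2)     # t = 2 is an arbitrary choice
x1, x2, _ = line_point((2, 1, 0), (1, 5, 0), 0.5)       # a point on the planar line
rhs = plane_rhs((-1, 1, 1), (2, 1, -2))
theta = angle_between((2, 1, 1), (-1, 1, 2))
```

Here `theta` is in radians; `math.degrees(theta)` converts it to degrees if needed.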
2.4 VECTOR DIFFERENTIAL CALCULUS
2.4.1 Scalar Fields and Vector Fields
We have mentioned that temperature is a scalar quantity, hence a single number is sufficient to define it at a given point. In general, the temperature inside a system varies with position. Hence in order to define temperature in a system completely, we have to give the temperature at each point of the system. This is equivalent to giving temperature as a function of position:
$$T = T(x_1, x_2, x_3). \tag{2.98}$$
This is an example of what we call a scalar field. In general, a scalar field is a single-valued differentiable function,
$$f(x_1, x_2, x_3), \tag{2.99}$$
representing a physical property defined in some domain of space. In short, for $f(x_1, x_2, x_3)$ we also write $f(\vec{r})$ or $f(\vec{x})$. In thermodynamics temperature is a well-defined property only for systems in thermal equilibrium, that is, when the entire system has reached the same temperature. However, granted that the temperature is changing sufficiently slowly within a system, we can treat a small part of the system as in thermal equilibrium with the rest and define a meaningful temperature distribution as a differentiable scalar field. This is called the local thermodynamic equilibrium assumption and it is one of the main assumptions of the theory of stellar structure. Another example for a scalar field is the gravitational potential in Newton's theory, $\Phi(\vec{r})$. For a point mass $M$ located at the origin, the gravitational potential is written as
$$\Phi(\vec{r}) = -G\frac{M}{r}, \tag{2.100}$$
where $G$ is the gravitational constant. For a massive scalar field, the potential is given as
$$\Phi(\vec{r}) = k\frac{e^{-\mu r}}{r}, \tag{2.101}$$
where $\mu$ is proportional to the mass of the field quanta, so that $\mu^{-1}$ sets the range of the potential, and $k$ is a coupling constant. We now consider compressible flow in some domain of space. Assume that the flow is smooth so that the fluid elements, which are small compared to the body of the fluid but large enough to contain many molecules, are following well-defined paths called the streamlines. Such flows are called irrotational or streamline flows. At each point of the streamline we can associate a vector tangent to the streamline corresponding to the velocity of the fluid element at that point. In order to define the velocity of the fluid, we have to give the velocity vector of the fluid elements at each point of the fluid as
$$\vec{v} = \vec{v}(\vec{r}). \tag{2.102}$$
This is an example of a vector field. In general, we can define a vector field by assigning a vector to every point of a domain in space (Fig. 2.11).
Figure 2.11 Flow problems.
2.4.2 Vector Differentiation
The trajectory of a particle can be defined in terms of the position vector $\vec{r}(t)$, where $t$ is a parameter, which is usually taken as the time. The velocity $\vec{v}(t)$ and the acceleration $\vec{a}(t)$ are now defined as the derivatives (Fig. 2.12)
$$\vec{v}(t) = \lim_{\Delta t\to 0}\frac{\vec{r}(t + \Delta t) - \vec{r}(t)}{\Delta t} = \frac{d\vec{r}(t)}{dt} \tag{2.103}$$
and
$$\vec{a}(t) = \lim_{\Delta t\to 0}\frac{\vec{v}(t + \Delta t) - \vec{v}(t)}{\Delta t} = \frac{d\vec{v}(t)}{dt} = \frac{d^2\vec{r}(t)}{dt^2}. \tag{2.104}$$
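In general, for a differentiable vector field given in terms of a single parameter $t$,
$$\vec{A}(t) = A_1(t)\hat{e}_1 + A_2(t)\hat{e}_2 + A_3(t)\hat{e}_3, \tag{2.105}$$
we can differentiate componentwise as
$$\frac{d\vec{A}(t)}{dt} = \frac{dA_1(t)}{dt}\hat{e}_1 + \frac{dA_2(t)}{dt}\hat{e}_2 + \frac{dA_3(t)}{dt}\hat{e}_3. \tag{2.106}$$
Higher-order derivatives are found similarly according to the rules of calculus.
Basic properties of vector differentiation
$$\frac{d}{dt}(\vec{A} + \vec{B}) = \frac{d\vec{A}}{dt} + \frac{d\vec{B}}{dt}, \tag{2.107}$$
$$\frac{d}{dt}(f\vec{A}) = \frac{df}{dt}\vec{A} + f\frac{d\vec{A}}{dt}, \quad f = f(t) \text{ a scalar function}, \tag{2.108}$$
$$\frac{d}{dt}(\vec{A}\cdot\vec{B}) = \frac{d\vec{A}}{dt}\cdot\vec{B} + \vec{A}\cdot\frac{d\vec{B}}{dt}, \tag{2.109}$$
$$\frac{d}{dt}(\vec{A}\times\vec{B}) = \vec{A}\times\frac{d\vec{B}}{dt} + \frac{d\vec{A}}{dt}\times\vec{B}. \tag{2.110}$$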
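Figure 2.12 Vector differentiation.
Vector fields depending on more than one parameter can be differentiated partially. Given the vector field
$$\vec{A}(\vec{r}) = A_1(\vec{r})\hat{e}_1 + A_2(\vec{r})\hat{e}_2 + A_3(\vec{r})\hat{e}_3, \tag{2.111}$$
since each component is a differentiable function of the coordinates $x_1, x_2, x_3$, we can differentiate it as
$$\frac{\partial\vec{A}(\vec{r})}{\partial x_i} = \frac{\partial A_1}{\partial x_i}\hat{e}_1 + \frac{\partial A_2}{\partial x_i}\hat{e}_2 + \frac{\partial A_3}{\partial x_i}\hat{e}_3, \quad i = 1, 2, 3.$$
(2.112) (2.113)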
2.5
GRADIENT OPERATOR
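Given a scalar field $\Phi(\vec{r})$ defined in some domain of space described by the Cartesian coordinates $(x_1, x_2, x_3)$, we can write the change in $\Phi(\vec{r})$ for an infinitesimal change in the position vector as
$$\Phi(\vec{r} + d\vec{r}) - \Phi(\vec{r}) = d\Phi(\vec{r}) \tag{2.114}$$
$$= \frac{\partial\Phi}{\partial x_1}dx_1 + \frac{\partial\Phi}{\partial x_2}dx_2 + \frac{\partial\Phi}{\partial x_3}dx_3. \tag{2.115}$$
If we define two vectors:
$$\vec{\nabla}\Phi = \left(\frac{\partial\Phi}{\partial x_1}, \frac{\partial\Phi}{\partial x_2}, \frac{\partial\Phi}{\partial x_3}\right) \tag{2.116}$$
and
$$d\vec{r} = (dx_1, dx_2, dx_3), \tag{2.117}$$
we can write $d\Phi$ as
$$d\Phi(\vec{r}) = \vec{\nabla}\Phi\cdot d\vec{r}. \tag{2.118}$$
Note that even though $\Phi$ is a scalar field, $\vec{\nabla}\Phi$ is a vector field. We now introduce the differential operator
$$\vec{\nabla} = \hat{e}_1\frac{\partial}{\partial x_1} + \hat{e}_2\frac{\partial}{\partial x_2} + \hat{e}_3\frac{\partial}{\partial x_3}, \tag{2.119}$$
which is called the gradient or the del operator. On its own the del operator is meaningless. However, as we shall see shortly, it is a very useful operator.
Figure 2.13 Equipotential surfaces.
2.5.1 Meaning of the Gradient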
In applications we associate a scalar field, $\Phi(\vec{r})$, with a physical property like the temperature or the gravitational or electrostatic potential. Usually we are interested in surfaces on which a scalar quantity takes a single value. In thermodynamics, surfaces on which temperature takes a single value are called the isotherms. In potential theory equipotentials are surfaces on which potential is a constant, that is,
$$\Phi(x_1, x_2, x_3) = C. \tag{2.120}$$
If we treat $C$ as a parameter, we obtain a family of surfaces as shown in Figure 2.13. Since $\Phi(\vec{r})$ is a single-valued function, none of these surfaces intersect each other. For two infinitesimally close points, $\vec{r}_1$ and $\vec{r}_2$, on one of the surfaces, $\Phi(x_1, x_2, x_3) = C$, the difference (Fig. 2.14),
$$d\vec{r} = \vec{r}_2 - \vec{r}_1, \tag{2.121}$$
is a vector on the surface. Thus the equation
$$\vec{\nabla}\Phi\cdot d\vec{r} = 0 \tag{2.122}$$
indicates that $\vec{\nabla}\Phi$ is a vector perpendicular to the surface $\Phi = C$ (Fig. 2.14). This is evident in the special case of a family of planes:
$$\Phi(\vec{r}) = n_1x_1 + n_2x_2 + n_3x_3 = C, \tag{2.123}$$
where the gradient:
$$\vec{\nabla}\Phi = (n_1, n_2, n_3), \tag{2.124}$$
is clearly normal to the plane. For a general family of surfaces, naturally the normal vectors depend on the position in a given surface.
Example 2.4. Equation of the tangent plane to a surface: Since the normal to a surface, $F(x_1, x_2, x_3) = C$, and the normal to the tangent plane at a given point, $\vec{P} = (x_{01}, x_{02}, x_{03})$, coincide, we can write the equation of the tangent plane at P as
$$(\vec{x} - \vec{P})\cdot\vec{\nabla}F = 0, \tag{2.125}$$
where $\vec{x}$ is a point on the tangent plane and $\vec{\nabla}F$ is evaluated at P. In the limit as $\vec{x}\to\vec{P}$ we can write $(\vec{x} - \vec{P}) = d\vec{x}$. Hence the above equation becomes
$$\vec{\nabla}F\cdot d\vec{x} = 0.$$
In other words, in the neighborhood of a point $\vec{P}$, the tangent plane approximately coincides with the surface. To be precise, this approximation is good to first order in $(\vec{x} - \vec{P})$.
2.5.2 Directional Derivative
We now consider a case where $\vec{r}_1$ is on the surface $\Phi(\vec{r}) = C_1$ and $\vec{r}_2$ is on the neighboring surface $\Phi(\vec{r}) = C_2$. In this case the scalar product $\vec{\nabla}\Phi\cdot d\vec{r}$ is different from zero (Fig. 2.15). Defining a unit vector in the direction of $d\vec{r}$ as $\hat{u} = d\vec{r}/|d\vec{r}|$, we write
$$\frac{d\Phi}{dr} = \vec{\nabla}\Phi\cdot\hat{u}, \quad dr = |d\vec{r}|, \tag{2.126}$$
which is called the directional derivative of $\Phi$ in the direction of $\hat{u}$. If we move along a path, A, that intersects the family of surfaces $\Phi = C_i$, it is apparent from Figure 2.15 that the directional derivative,
$$\frac{d\Phi}{dr} = |\vec{\nabla}\Phi|\cos\theta, \tag{2.127}$$
where $\theta$ is the angle between $\vec{\nabla}\Phi$ and $\hat{u}$, is zero when $\theta = \pi/2$, that is, when we stay on the same surface. It is a maximum when we are moving through the surfaces in the direction of the gradient. In other words, the gradient indicates the direction of maximum change in $\Phi$ as we move through the surfaces (Fig. 2.15). The gradient of a scalar field is very important in applications and usually defines the direction of certain processes. In thermodynamics heat flows from regions of high temperatures to low temperatures. Hence, the heat current density, $\vec{J}$, is defined as proportional to the temperature gradient as
$$\vec{J}(\vec{r}) = -k\vec{\nabla}T(\vec{r}), \tag{2.128}$$
where $k$ is the thermal conductivity. In transport problems mass flows from regions of high concentration to low. Hence, the current density of the flowing material is taken as proportional to the gradient of concentration, $\vec{\nabla}C(\vec{r})$, as
$$\vec{J}(\vec{r}) = -\kappa\vec{\nabla}C(\vec{r}), \tag{2.129}$$
where $\kappa$ is the diffusion constant.
Figure 2.14 Gradient.
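The statements about the directional derivative can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the gradient is approximated by central differences (a numerical stand-in for the partial derivatives), and the scalar field `phi` and the function names are hypothetical choices for this example.

```python
import math

def gradient(f, x, h=1e-5):
    """Central-difference approximation to the gradient of f at the point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

def directional_derivative(f, x, u, h=1e-5):
    """Gradient dotted with the unit vector along u, as in Eq. (2.126)."""
    length = math.sqrt(sum(c * c for c in u))
    g = gradient(f, x, h)
    return sum(gi * ui / length for gi, ui in zip(g, u))

def phi(x):
    """Sample scalar field; its analytic gradient is (2 x1, 2 x3, 2 x2)."""
    return x[0] ** 2 + 2.0 * x[1] * x[2]

point = (1.0, 2.0, 3.0)
g = gradient(phi, point)                                        # close to (2, 6, 4)
along_gradient = directional_derivative(phi, point, g)          # maximum rate
along_surface = directional_derivative(phi, point, (3.0, -1.0, 0.0))  # tangent direction
```

The direction `(3, -1, 0)` is perpendicular to the gradient at the chosen point, so the derivative along it vanishes, while the derivative along the gradient equals its magnitude.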
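Figure 2.15 Directional derivative.
2.6 DIVERGENCE AND CURL OPERATORS
The del operator,
$$\vec{\nabla} = \hat{e}_1\frac{\partial}{\partial x_1} + \hat{e}_2\frac{\partial}{\partial x_2} + \hat{e}_3\frac{\partial}{\partial x_3}, \tag{2.130}$$
can also be used to operate on a given vector field $\vec{A}$ either as
$$\vec{\nabla}\cdot\vec{A} \tag{2.131}$$
or as
$$\vec{\nabla}\times\vec{A}. \tag{2.132}$$
The first operation results in a scalar field:
$$\vec{\nabla}\cdot\vec{A} = \frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3}, \tag{2.133}$$
which is called the divergence of the vector field $\vec{A}$, and the operator $\vec{\nabla}\cdot$ is called the div operator. The second operation gives another vector field, called the curl of $\vec{A}$, components of which are given as
$$(\vec{\nabla}\times\vec{A})_i = \sum_{j=1}^{3}\sum_{k=1}^{3}\epsilon_{ijk}\,\partial_jA_k, \tag{2.134}$$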
VECTOR ANALYSIS
or as
dA3
dA2
dAz
where
Vx
dA1
dA3
dA1
(2.135)
is called the curl operator and di stands for d / d x i .
Basic properties of the gradient, divergence, and the curl operators
a'(d$)
=
$Vd + dV$,
(2.136) (2.137) (2.138)
V x ( X + 3)= V x x + V x 3,
(2.139) (2.140)
2.6.1 Meaning of Divergence and the Divergence Theorem
For a physical understanding of the divergence operator we consider a tangible case like the flow of a fluid. The density of the fluid, $\rho(\vec{r}, t)$, is a scalar field and gives the amount of fluid per unit volume as a function of position and time. The current density, $\vec{J}(\vec{r}, t)$, is a vector field that gives the amount of fluid flowing per unit area per unit time. Another critical parameter related to the current density is the flux of the flowing material through an area element $\Delta\vec{\sigma}$. Naturally, flux depends on the relative orientation of $\vec{J}$ and $\Delta\vec{\sigma}$. For an infinitesimal area, $d\vec{\sigma}$, flux is defined as
$$d\phi = \vec{J}\cdot d\vec{\sigma} \tag{2.141}$$
$$= \vec{J}\cdot\hat{n}\,d\sigma \tag{2.142}$$
$$= J\cos\theta\,d\sigma, \tag{2.143}$$
which gives the amount of matter that flows through the infinitesimal area element $d\sigma$ per unit time in the direction of the unit normal $\hat{n}$ to the surface (Fig. 2.16). Notice that when the area element vector $d\vec{\sigma}$ is perpendicular to the flow, that is, $\theta = \pi/2$, the flux is zero. We now consider a compressible flow such as a gas flowing in some domain of space, which is described by the current density $\vec{J} = (J_1, J_2, J_3)$ and the matter density $\rho$. Take a small rectangular volume element
$$\Delta\tau = \Delta x_1\Delta x_2\Delta x_3 \tag{2.144}$$
located as shown in Figure 2.17. The net amount of matter flowing per unit time in the $x_2$ direction out of this volume element, that is, the net flux $\Delta\phi_2$ in the $x_2$ direction, is equal to the sum of the fluxes from the surfaces 1 and 2:
$$\Delta\phi_2 = \left[\vec{J}(x_1, 0 + \Delta x_2, x_3, t) + \vec{J}(x_1, 0, x_3, t)\right]\cdot\hat{n}\,\Delta x_1\Delta x_3 \tag{2.145}$$
$$= \left[J_2(x_1, 0, x_3, t) + \frac{\partial J_2}{\partial x_2}\Delta x_2 - J_2(x_1, 0, x_3, t)\right]\Delta x_1\Delta x_3 \tag{2.146}$$
$$= \frac{\partial J_2}{\partial x_2}\Delta x_1\Delta x_2\Delta x_3, \tag{2.147}$$
where $\hat{n}$ is the outward normal of the respective surface, and for the flux through the second surface we have used the Maclaurin series expansion of $\vec{J}(\vec{r}, t)$ for $x_2$ and kept only the first-order terms for a sufficiently small volume element. Note that the flux through the first surface is negative, since $J_2$ and the normal $\hat{n}$ to the surface are opposite in direction. Similar terms are obtained for the other two pairs of surfaces. Thus their sum gives us the net amount of material flowing out of the volume element:
$$\Delta\phi = \Delta\phi_1 + \Delta\phi_2 + \Delta\phi_3 \tag{2.148}$$
$$= \left(\frac{\partial J_1}{\partial x_1} + \frac{\partial J_2}{\partial x_2} + \frac{\partial J_3}{\partial x_3}\right)\Delta\tau. \tag{2.149}$$
Since the choice for the location of our rectangular volume element is arbitrary, for an arbitrary point in our domain we can write
$$\Delta\phi = \left(\frac{\partial J_1(\vec{r}, t)}{\partial x_1} + \frac{\partial J_2(\vec{r}, t)}{\partial x_2} + \frac{\partial J_3(\vec{r}, t)}{\partial x_3}\right)\Delta\tau, \tag{2.150}$$
which is nothing but
$$\Delta\phi = \vec{\nabla}\cdot\vec{J}(\vec{r}, t)\,\Delta\tau. \tag{2.151}$$
Figure 2.16 Flux through a surface.
Figure 2.17 Flux through a cube.
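The relation $\Delta\phi = \vec{\nabla}\cdot\vec{J}\,\Delta\tau$ can be tested by summing the outward flux through the six faces of a small cube. The sketch below makes several assumptions for illustration: the sample field `J` is hypothetical, the cube is centered at the evaluation point, and the flux through each face is approximated by the field value at the face center.

```python
def J(x1, x2, x3):
    """Sample current density; its analytic divergence is 3*x1 + 1."""
    return (x1 * x1, x1 * x2, x3)

def flux_per_volume(field, r, h=1e-3):
    """Net outward flux through a cube of side h centered at r, divided by h**3."""
    total = 0.0
    for i in range(3):
        plus, minus = list(r), list(r)
        plus[i] += h / 2.0    # center of the face whose outward normal is +e_i
        minus[i] -= h / 2.0   # center of the face whose outward normal is -e_i
        total += (field(*plus)[i] - field(*minus)[i]) * h * h
    return total / h ** 3

point = (1.0, 2.0, 3.0)
numerical_div = flux_per_volume(J, point)
analytic_div = 3.0 * point[0] + 1.0
```

For this field the face-center approximation is exact up to rounding; for general fields the agreement improves as the cube shrinks.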
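Notice that when the net flux $\Delta\phi$ is positive, it corresponds to a net loss of matter within the volume element $\Delta\tau$. Hence we equate it to $-\frac{d\rho}{dt}\Delta\tau$. Since the position of the volume element is fixed, that is, $\frac{d\vec{r}}{dt} = 0$, we can write
$$-\frac{d\rho}{dt}\Delta\tau = -\left[\frac{\partial\rho}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial\rho}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial\rho}{\partial x_3}\frac{dx_3}{dt} + \frac{\partial\rho}{\partial t}\right]\Delta\tau = -\frac{\partial\rho}{\partial t}\Delta\tau \tag{2.152}$$
to obtain
$$\vec{\nabla}\cdot\vec{J}(\vec{r}, t)\,\Delta\tau = -\frac{\partial\rho(\vec{r}, t)}{\partial t}\Delta\tau. \tag{2.153}$$
Since the volume element $\Delta\tau$ is in general different from zero, we can also write
$$\vec{\nabla}\cdot\vec{J}(\vec{r}, t) + \frac{\partial\rho(\vec{r}, t)}{\partial t} = 0. \tag{2.154}$$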
For a compressible fluid flow, current density can be related to the velocity field of the fluid as
$$\vec{J}(\vec{r}, t) = \rho(\vec{r}, t)\,\vec{v}(\vec{r}, t), \tag{2.155}$$
where $\vec{v}(\vec{r}, t)$ is the velocity of the fluid element at $\vec{r}$ and $t$. Equation (2.154) is called the equation of continuity and it is one of the most frequently encountered equations of science and engineering. It is a general expression for conserved quantities. In the fluid flow case it represents conservation of mass. In the electromagnetic theory, $\rho$ stands for the electric charge density and $\vec{J}$ is the electric current density. Now the continuity equation becomes an expression of the conservation of charge. In quantum mechanics, the continuity equation is an expression for the conservation of probability, where $\rho = \Psi\Psi^*$ is the probability density, while $\vec{J}$ is the probability current density. For a finite rectangular region R with the surface area S we can use a network of $n$ small rectangular volume elements, each of which satisfies
$$\vec{\nabla}\cdot\vec{J}(\vec{r}_i, t)\,\Delta\tau_i = -\frac{\partial\rho(\vec{r}_i, t)}{\partial t}\Delta\tau_i, \tag{2.156}$$
where the subscript $i$ denotes the $i$th volume element at $\vec{r}_i$. When we take the sum over all such cells and consider the limit as $n\to\infty$, fluxes through the adjacent sides will cancel each other, thus giving the integral version of the continuity equation as
$$\int_V\vec{\nabla}\cdot\vec{J}(\vec{r}, t)\,d\tau = -\int_V\frac{\partial\rho(\vec{r}, t)}{\partial t}\,d\tau. \tag{2.157}$$
Since the integral on the right-hand side is convergent, we can interchange the order of the derivative and the integral to write
$$\int_V\vec{\nabla}\cdot\vec{J}(\vec{r}, t)\,d\tau = -\frac{d}{dt}\int_V\rho(\vec{r}, t)\,d\tau = -\frac{dm}{dt}, \tag{2.158}$$
where in the last step we have used total derivative since $\int_V\rho(\vec{r}, t)\,d\tau$ is only a function of time. The right-hand side is the rate of change of the total amount of matter, $m$, within the volume V. When $m$ is conserved, $\frac{dm}{dt} = 0$. In other words, unless there is a net gain or loss of matter from the region, the divergence is zero. If there is net gain or loss of matter in a region, it implies the presence of sources or sinks within that region. That is, a nonzero divergence is an indication of the presence of sources or sinks in that region.
It is important to note that if the divergence of a field is zero in a region, it does not necessarily mean that the field there is also zero; it just means that the sources are elsewhere.
Divergence theorem: Another way to write the left-hand side of Equation (2.158) is by using the definition of the total flux, $\oint_S\vec{J}\cdot\hat{n}\,d\sigma$, that is, the net amount of material flowing in or out of the surface, S, per unit time, where S encloses the region R with the volume V. Equating the left-hand side of Equation (2.158) with the total flux gives
$$\int_V\vec{\nabla}\cdot\vec{J}(\vec{r}, t)\,d\tau = \oint_S\vec{J}\cdot\hat{n}\,d\sigma, \tag{2.159}$$
where $\hat{n}$ is the outward unit normal to the surface S bounding the volume V and $d\sigma$ is the area element of the surface (Fig. 2.18). Equation (2.159), which is valid for any piecewise smooth surface S with the volume V and the outward normal $\hat{n}$, is called Gauss's theorem or the divergence theorem, which can be used for any differentiable and integrable vector field $\vec{J}$. Gauss's theorem should not be confused with Gauss's law in electrodynamics, which is a physical law. A formal proof of the divergence theorem for any piecewise smooth surface that forms a closed boundary with an outward unit normal $\hat{n}$ can be found in Kaplan. Using the divergence theorem for an infinitesimal region, we can write an integral or an operational definition for the divergence of a vector field $\vec{J}$ as
$$\vec{\nabla}\cdot\vec{J} = \lim_{V\to 0}\frac{1}{V}\oint_S\vec{J}\cdot\hat{n}\,d\sigma, \tag{2.160}$$
where S is a closed surface enclosing the volume V. In summary, the divergence is a measure of the net in or out flux of a vector field over the closed surface S enclosing the volume V. It is for this reason that a vector field with zero divergence is called solenoidal in that region. Derivation of the divergence theorem has been motivated on the physical model of a fluid flow. However, the result is a mathematical identity valid for a general differentiable vector field. Even though $\vec{J}\cdot\hat{n}\,d\sigma$ represents the flux of $\vec{J}$ through $d\vec{\sigma}$, $\vec{J}$ may not represent any physical flow. As a mathematical identity, the divergence theorem allows us to convert a volume integral to an integral over a closed surface, so that we can evaluate whichever is easier.
Figure 2.18 Area element on S.
2.7 VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
2.7.1 Arc Length and Line Integrals
A familiar line integral is the integral that gives the length of a curve as
$$l = \int_C ds = \int_C\sqrt{dx_1^2 + dx_2^2}, \tag{2.161}$$
where C denotes the curve the length of which is to be measured and $s$ is the arc length. If the curve is parameterized as
$$x_1 = x_1(t), \tag{2.162}$$
$$x_2 = x_2(t), \tag{2.163}$$
we can write $l$ as
$$l = \int_C\sqrt{\left(\frac{dx_1}{dt}\right)^2 + \left(\frac{dx_2}{dt}\right)^2}\,dt. \tag{2.164}$$
We can also use either $x_1$ or $x_2$ as a parameter and write
$$l = \int_C\sqrt{1 + \left(\frac{dx_2}{dx_1}\right)^2}\,dx_1 \tag{2.165}$$
or
$$l = \int_C\sqrt{1 + \left(\frac{dx_1}{dx_2}\right)^2}\,dx_2. \tag{2.166}$$
VECTOR ANALYSIS
Line integrals are frequently encountered in applications with linear densities. For example, for a wire with linear mass density ~ ( s )we, can write the total mass as the line integral
(2.167) or in parametric form as
Extension of these formulas to n dimensions is obvious. In particular, for a curve parameterized as ( z l ( t )z,2 ( t ) x, 3 ( t ) )in three dimensions, the arc length can be written as
(2.169) If the coordinate
z1
is used as the parameter, then the arc length becomes
(2.170) Example 2.5. W o r k done o n a particle: An important application of the line integral is the expression for the work done on a particle moving along a trajectory under the influence of a force 3 as
W=
ds,
(2.171)
where FT is the tangential component of the force, that is, the component along the displacement ds. We can also write W as
(2.172)
(2.173)
(2.174)
(2.175)
Figure 2.19 Normal and tangential components of force.
Using the relations (Fig. 2.19)
$$F_1 = F_T\cos\alpha + F_N\cos\left(\frac{\pi}{2} - \alpha\right) = F_T\cos\alpha + F_N\sin\alpha, \tag{2.176}$$
$$F_2 = F_T\sin\alpha - F_N\sin\left(\frac{\pi}{2} - \alpha\right) = F_T\sin\alpha - F_N\cos\alpha, \tag{2.177}$$
and
$$dx_1 = ds\cos\alpha, \quad dx_2 = ds\sin\alpha, \tag{2.178}$$
we can easily show the following equivalences:
$$W = \int_C[F_T\cos\alpha + F_N\sin\alpha]\cos\alpha\,ds + \int_C[F_T\sin\alpha - F_N\cos\alpha]\sin\alpha\,ds \tag{2.179}$$
$$= \int_C F_T\,ds. \tag{2.180}$$
In most applications, line integrals appear in combinations as
$$\int_C P(x_1, x_2)\,dx_1 + \int_C Q(x_1, x_2)\,dx_2, \tag{2.181}$$
which we also write as
$$\int_C P(x_1, x_2)\,dx_1 + Q(x_1, x_2)\,dx_2. \tag{2.182}$$
We can consider $P$ and $Q$ as the components of a vector field $\vec{w}$ as
$$\vec{w} = P(x_1, x_2)\hat{e}_1 + Q(x_1, x_2)\hat{e}_2. \tag{2.183}$$
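Figure 2.20 Unit tangent vector.
Now the line integral [Eq. (2.181)] can be written as
$$\int_C P(x_1, x_2)\,dx_1 + Q(x_1, x_2)\,dx_2 = \int_C w_T\,ds, \tag{2.184}$$
where $w_T$ denotes the tangential component of $\vec{w}$ in the direction of the unit tangent vector $\hat{t}$ (Fig. 2.20):
$$\hat{t} = \frac{dx_1}{ds}\hat{e}_1 + \frac{dx_2}{ds}\hat{e}_2 \tag{2.185}$$
$$= (\cos\alpha)\hat{e}_1 + (\sin\alpha)\hat{e}_2. \tag{2.186}$$
Using Equation (2.186) we write
$$w_T = \vec{w}\cdot\hat{t} = P\cos\alpha + Q\sin\alpha, \tag{2.187}$$
hence proving
$$\int_C w_T\,ds = \int_C(P\cos\alpha + Q\sin\alpha)\,ds \tag{2.188}$$
$$= \int_C P\,dx_1 + Q\,dx_2. \tag{2.189}$$
If we represent the path in terms of a parameter $t$, we can write
$$\int_C\vec{w}\cdot d\vec{r} = \int_C P\,dx_1 + Q\,dx_2 \tag{2.190}$$
$$= \int_C\left[P\frac{dx_1}{dt} + Q\frac{dx_2}{dt}\right]dt \tag{2.191}$$
$$= \int_C\vec{w}\cdot\frac{d\vec{r}}{dt}\,dt. \tag{2.192}$$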
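Example 2.6. Change in kinetic energy: If we take $\vec{r}$ as the position of a particle of mass $m$ moving under the influence of a force $\vec{F}$, the work done on the particle is written as
$$W = \int_C\vec{F}\cdot d\vec{r} \tag{2.193}$$
$$= \int_C\vec{F}\cdot\frac{d\vec{r}}{dt}\,dt = \int_C\vec{F}\cdot\vec{v}\,dt. \tag{2.194}$$
Substituting the second law of Newton, $\vec{F} = m\frac{d\vec{v}}{dt}$, where $\frac{d\vec{v}}{dt}$ is the acceleration of the particle, we can write $W$ as
$$W = \int_C m\frac{d\vec{v}}{dt}\cdot\vec{v}\,dt = \frac{1}{2}m\int_C\frac{d(\vec{v}\cdot\vec{v})}{dt}\,dt = \frac{1}{2}m\int_C d(v^2) = \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2 = T_2 - T_1,$$
(2.195) (2.196) (2.197) (2.198) (2.199)
where the subscripts 1 and 2 refer to the initial and final points of the path. The quantity we have defined as
$$T = \frac{1}{2}mv^2 \tag{2.200}$$
is nothing but the kinetic energy of the particle. Hence the work done on the particle is equal to the change in kinetic energy.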
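2.7.2 Surface Area and Surface Integrals
We have given the expressions for the arc length of a curve in space. Our main aim is now to find the corresponding expressions for the area of a surface in space, which could either be given as $x_3 = x_3(x_1, x_2)$ or in parametric form as
$$x_1 = x_1(u, v), \tag{2.201}$$
$$x_2 = x_2(u, v), \tag{2.202}$$
$$x_3 = x_3(u, v). \tag{2.203}$$
Generalizations of the formulas [Eqs. (2.170) and (2.169)] to the area of a given surface are now written as
$$S = \iint_{R_{x_1x_2}}\sqrt{1 + \left(\frac{\partial x_3}{\partial x_1}\right)^2 + \left(\frac{\partial x_3}{\partial x_2}\right)^2}\,dx_1dx_2 \tag{2.204}$$
or as
$$S = \iint_{R_{uv}}\sqrt{EG - F^2}\,du\,dv, \tag{2.205}$$
where
$$E = \left(\frac{\partial x_1}{\partial u}\right)^2 + \left(\frac{\partial x_2}{\partial u}\right)^2 + \left(\frac{\partial x_3}{\partial u}\right)^2, \tag{2.206}$$
$$F = \frac{\partial x_1}{\partial u}\frac{\partial x_1}{\partial v} + \frac{\partial x_2}{\partial u}\frac{\partial x_2}{\partial v} + \frac{\partial x_3}{\partial u}\frac{\partial x_3}{\partial v}, \tag{2.207}$$
$$G = \left(\frac{\partial x_1}{\partial v}\right)^2 + \left(\frac{\partial x_2}{\partial v}\right)^2 + \left(\frac{\partial x_3}{\partial v}\right)^2. \tag{2.208}$$
A proper treatment of the derivation of this result is far too technical for our purposes. However, it can be found in pages 371-378 of Treatise on Advanced Calculus by Franklin. An intuitive geometric derivation can be found in Advanced Calculus by Kaplan. We give a rigorous derivation when we introduce the generalized coordinates and tensors in the following chapter. For the surface analog of the line integral
$$\int_C\vec{V}\cdot d\vec{r} \tag{2.209}$$
we write
$$I = \iint_S\vec{V}\cdot d\vec{\sigma}. \tag{2.210}$$
Consider a smooth surface S with the outer unit normal defined as
$$\hat{n} = \cos\alpha\,\hat{e}_1 + \cos\beta\,\hat{e}_2 + \cos\gamma\,\hat{e}_3 \tag{2.211}$$
and take $\vec{V} = (V_1, V_2, V_3)$ to be a continuous vector field defined on S. We can now write the surface integral $I$ as
$$\iint_S(\vec{V}\cdot\hat{n})\,d\sigma = \iint_S(V_1\cos\alpha + V_2\cos\beta + V_3\cos\gamma)\,d\sigma = \iint_S V_1\,dx_2dx_3 + V_2\,dx_3dx_1 + V_3\,dx_1dx_2, \tag{2.212}$$
where we used the fact that projections of the surface area element $d\vec{\sigma} = \hat{n}\,d\sigma$ onto the coordinate planes are given as $\cos\alpha\,d\sigma = dx_2dx_3$, $\cos\beta\,d\sigma = dx_3dx_1$, and $\cos\gamma\,d\sigma = dx_1dx_2$. Similar to line integrals, a practical application of surface integrals is with surface densities. For example, the mass of a sheet described by the equation $x_3 = x_3(x_1, x_2)$ with the surface density $\sigma(x_1, x_2, x_3)$ is found by the surface integral
$$M = \iint_{R_{x_1x_2}}\sigma(x_1, x_2, x_3)\sqrt{1 + \left(\frac{\partial x_3}{\partial x_1}\right)^2 + \left(\frac{\partial x_3}{\partial x_2}\right)^2}\,dx_1dx_2 \tag{2.213}$$
or in terms of the parameters $u$ and $v$ as
$$M = \iint_{R_{uv}}\sigma\sqrt{EG - F^2}\,du\,dv. \tag{2.214}$$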
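2.7.3 An Alternate Way to Write Line Integrals
We can also write the line integral
$$\int_C P\,dx_1 + Q\,dx_2 \tag{2.215}$$
as
$$\int_C v_n\,ds, \tag{2.216}$$
where the vector field $\vec{v}$ is defined as
$$\vec{v} = Q\hat{e}_1 - P\hat{e}_2 \tag{2.217}$$
and $\hat{n}$ is the unit normal to the curve, that is, the perpendicular to $\hat{t}$ (Fig. 2.21):
$$\hat{n} = \hat{t}\times\hat{e}_3 = \frac{dx_2}{ds}\hat{e}_1 - \frac{dx_1}{ds}\hat{e}_2. \tag{2.218}$$
The normal component of $\vec{v}$ now becomes
$$v_n = \vec{v}\cdot\hat{n} \tag{2.219}$$
$$= (Q\hat{e}_1 - P\hat{e}_2)\cdot\left(\frac{dx_2}{ds}\hat{e}_1 - \frac{dx_1}{ds}\hat{e}_2\right) \tag{2.220}$$
$$= Q\frac{dx_2}{ds} + P\frac{dx_1}{ds}, \tag{2.221}$$
which gives
$$\int_C v_n\,ds = \int_C P\,dx_1 + Q\,dx_2. \tag{2.222}$$
Figure 2.21 Normal and the tangential components.
If we take $\vec{v}$ as
$$\vec{v} = P\hat{e}_1 + Q\hat{e}_2, \tag{2.223}$$
we get
$$\int_C v_n\,ds = \int_C\left(P\frac{dx_2}{ds} - Q\frac{dx_1}{ds}\right)ds \tag{2.224}$$
$$= \int_C -Q\,dx_1 + P\,dx_2. \tag{2.225}$$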
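Example 2.7. Line integrals: Let C be the arc $y = x^3$ from $(0, 0)$ to $(-1, -1)$. The line integral
$$I = \int_C xy^2\,dx + x^3y\,dy \tag{2.226}$$
can be evaluated as
$$I = \int_0^{-1}(x^7 + 3x^8)\,dx \tag{2.227}$$
$$= \left[\frac{x^8}{8} + \frac{x^9}{3}\right]_0^{-1} \tag{2.228}$$
$$= \frac{1}{8} - \frac{1}{3} \tag{2.229}$$
$$= -\frac{5}{24}. \tag{2.230}$$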
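The value found in Example 2.7 can be confirmed with the parametric form (2.191). The sketch below assumes the parameterization $x = -t$, $y = -t^3$ with $0 \leq t \leq 1$ (one of many possible choices) and a midpoint rule; the function name is hypothetical.

```python
def line_integral(P, Q, x, y, dx_dt, dy_dt, t0, t1, n=4000):
    """Midpoint-rule estimate of the integral of P dx + Q dy along (x(t), y(t))."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        xt, yt = x(t), y(t)
        total += (P(xt, yt) * dx_dt(t) + Q(xt, yt) * dy_dt(t)) * h
    return total

I = line_integral(
    P=lambda x, y: x * y ** 2,
    Q=lambda x, y: x ** 3 * y,
    x=lambda t: -t,              # runs from 0 to -1
    y=lambda t: -t ** 3,         # stays on the curve y = x**3
    dx_dt=lambda t: -1.0,
    dy_dt=lambda t: -3.0 * t ** 2,
    t0=0.0,
    t1=1.0,
)
```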
Example 2.8. Closed paths: Consider the line integral
$$I = \oint y^3\,dx + x^2\,dy \tag{2.231}$$
over the closed path in Figure 2.22. The first integral from $(0, 0)$ to $(1, 0)$ is zero, since $y = 0$ along this path. In the second part of the path from $(1, 0)$ to $(1, 1)$ we use $y$ as our parameter and find
$$\int_0^1 1\,dy = 1. \tag{2.232}$$
We use $x$ as a parameter to obtain the integral over $y = x$ as
$$\int_1^0 x^3\,dx + x^2\,dx \tag{2.233}$$
$$= -\left(\frac{1}{4} + \frac{1}{3}\right) \tag{2.234}$$
$$= -\frac{7}{12}. \tag{2.235}$$
Finally, adding all these we obtain $I = \frac{5}{12}$.
Figure 2.22 Closed path in Example 2.8.
2.7.4 Green's Theorem
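Theorem 2.1. Let D be a simply connected domain of the $x_1x_2$-plane and let C be a simple (does not intersect itself) smooth closed curve in D with its interior also in D. If $P(x_1, x_2)$ and $Q(x_1, x_2)$ are continuous functions with continuous first partial derivatives in D, then
$$\oint_C P\,dx_1 + Q\,dx_2 = \iint_R \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right)dx_1dx_2,$$ (2.236)
where R is the closed region enclosed by C.
Proof: We first represent R by two curves,
$$a \le x_1 \le b, \quad f_1(x_1) \le x_2 \le f_2(x_1),$$ (2.237)
as shown in Figure 2.23 and write the second double integral in Equation (2.236) as
$$-\iint_R \frac{\partial P}{\partial x_2}\,dx_1dx_2 = -\int_a^b dx_1 \int_{f_1(x_1)}^{f_2(x_1)} \frac{\partial P}{\partial x_2}\,dx_2.$$
Figure 2.23 Green's theorem.
The integral over $x_2$ can be taken immediately to yield
$$-\iint_R \frac{\partial P}{\partial x_2}\,dx_1dx_2 = \int_a^b P(x_1, f_1(x_1))\,dx_1 + \int_b^a P(x_1, f_2(x_1))\,dx_1$$ (2.240)
$$= \oint_C P(x_1, x_2)\,dx_1.$$ (2.241)
Similarly, we can write the other double integral in Equation (2.236) as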
$$\iint_R \frac{\partial Q}{\partial x_1}\,dx_1dx_2 = \oint_C Q(x_1, x_2)\,dx_2,$$ (2.242)
thus proving Green's theorem.
Example 2.9. Green's Theorem: Using Green's theorem [Eq. (2.236)], we can evaluate
$$I = \oint_C 16xy^3\,dx + 24x^2y^2\,dy,$$ (2.243)
where C is the unit circle, $x^2 + y^2 = 1$, and $P = 16xy^3$ and $Q = 24x^2y^2$, as
$$I = \iint_R (48xy^2 - 48xy^2)\,dxdy = 0.$$ (2.244)
Example 2.10. Green's Theorem: For the integral
$$I = \oint_C \frac{2y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy,$$ (2.245)
where C is the circle $x^2 + y^2 = 2$, we cannot apply Green's theorem since P and Q are not continuous at the origin.
Example 2.11. Green's Theorem: For the integral
$$I = \oint_C (3x - y)\,dx + (x + 5y)\,dy,$$ (2.246)
Green's theorem gives
$$I = \iint_R (1 + 1)\,dxdy$$ (2.247)
$$= 2A,$$ (2.248)
where A is the area enclosed by the closed path C.
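Examples 2.9 and 2.11 can be checked by evaluating the line integrals directly. The sketch below is an illustration added here, not the author's method: the function name `circulation_on_circle` is hypothetical, and for Example 2.11 the path C is assumed to be the unit circle, so that $2A = 2\pi$.

```python
import math


def circulation_on_circle(P, Q, radius=1.0, n=20000):
    """Counterclockwise integral of P dx + Q dy over x^2 + y^2 = radius^2 (midpoint rule in theta)."""
    total = 0.0
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        dx = -radius * math.sin(theta) * dtheta
        dy = radius * math.cos(theta) * dtheta
        total += P(x, y) * dx + Q(x, y) * dy
    return total


# Example 2.9: the integrand of the double integral vanishes, so I should be 0.
I_29 = circulation_on_circle(lambda x, y: 16 * x * y**3,
                             lambda x, y: 24 * x**2 * y**2)
# Example 2.11 on the unit circle (assumed path): I should equal 2A = 2*pi.
I_211 = circulation_on_circle(lambda x, y: 3 * x - y,
                              lambda x, y: x + 5 * y)
print(I_29, I_211, 2 * math.pi)
```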
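2.7.5 Interpretations of Green's Theorem
I. If we take
$$\vec{w} = P\hat{e}_1 + Q\hat{e}_2$$ (2.249)
in [Eq. (2.184)]
$$\oint_C \vec{w}\cdot d\vec{r} = \oint_C w_T\,ds,$$ (2.250)
where $w_T$ is the tangential component of $\vec{w}$, that is, $w_T = \vec{w}\cdot\hat{t}$, and notice that the right-hand side of Green's theorem [Eq. (2.236)] is the $x_3$ component of the curl of $\vec{w}$, that is,
$$(\vec{\nabla}\times\vec{w})_3 = \frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2},$$ (2.251)
we can write Green's theorem as
$$\oint_C w_T\,ds = \iint_R (\vec{\nabla}\times\vec{w})\cdot\hat{e}_3\,dx_1dx_2.$$ (2.252)
This is a special case of Stokes's theorem that is discussed in Section 2.8.
II. We have seen that if we take $\vec{v}$ as
$$\vec{v} = Q\hat{e}_1 - P\hat{e}_2,$$ (2.253)
we can write the integral
$$I = \oint_C P\,dx_1 + Q\,dx_2$$ (2.254)
as [Eq. (2.216)]
$$I = \oint_C \vec{v}\cdot\hat{n}\,ds.$$ (2.255)
Now, Green's theorem for $\vec{v}$ can be written as
$$\oint_C \vec{v}\cdot\hat{n}\,ds = \iint_R \left(\frac{\partial v_1}{\partial x_1} + \frac{\partial v_2}{\partial x_2}\right)dx_1dx_2$$ (2.256)
$$= \iint_R \vec{\nabla}\cdot\vec{v}\,dx_1dx_2.$$ (2.257)
This is the two-dimensional version of the divergence theorem [Eq. (2.159)].
III. Area inside a closed curve: If we take $P = x_2$ in Equation (2.241) or $Q = x_1$ in Equation (2.242), we obtain the area enclosed by a closed curve as
$$\iint_R dx_1dx_2 = -\oint_C x_2\,dx_1$$ (2.258)
$$= \oint_C x_1\,dx_2.$$ (2.259)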
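For a polygon, the two line-integral expressions for the area reduce to finite sums over the edges. The following sketch is an illustration under that assumption (a polygonal boundary listed counterclockwise); the function name `area_from_boundary` is hypothetical.

```python
def area_from_boundary(vertices):
    """Area of a polygon from the two closed line integrals -x2 dx1 and x1 dx2.

    vertices: list of (x1, x2) pairs in counterclockwise order.
    Returns both estimates so that they can be compared.
    """
    minus_x2_dx1 = 0.0
    x1_dx2 = 0.0
    n = len(vertices)
    for k in range(n):
        (xa, ya), (xb, yb) = vertices[k], vertices[(k + 1) % n]
        # On a straight edge the integrand is linear, so the midpoint value is exact.
        minus_x2_dx1 -= 0.5 * (ya + yb) * (xb - xa)
        x1_dx2 += 0.5 * (xa + xb) * (yb - ya)
    return minus_x2_dx1, x1_dx2


print(area_from_boundary([(0, 0), (1, 0), (1, 1)]))          # triangle
print(area_from_boundary([(0, 0), (2, 0), (2, 3), (0, 3)]))  # rectangle
```

Averaging the two returned values edge by edge gives the familiar "shoelace" sum for polygon areas.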
Taking the arithmetic mean of these two equal expressions for the area of a region R enclosed by the closed curve C, we obtain another expression for the area A as
$$A = \frac{1}{2}\oint_C x_1\,dx_2 - x_2\,dx_1,$$ (2.260)
which the reader can check with Green's theorem [Eq. (2.236)].
2.7.6 Extension to Multiply Connected Domains
When the closed path C in Green's theorem encloses points at which one or both of the derivatives $\partial P/\partial x_2$ and $\partial Q/\partial x_1$ do not exist, Green's theorem is not applicable. However, by a simple modification of the path, we can still use Green's theorem to evaluate the integral
$$I = \oint_C P\,dx_1 + Q\,dx_2.$$ (2.261)
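Consider the doubly connected domain D shown in Figure 2.24 (left) defined by the boundaries a and b, where the closed path $C_1$ encloses the hole in the domain. As it is, Green's theorem is not applicable. However, if we modify our path as shown in Figure 2.24 (right), so that the closed path is inside the region, where the functions P, Q and their first derivatives exist, we can apply Green's theorem to write
$$\oint_C [P\,dx_1 + Q\,dx_2] = \left[\int_{C_1} + \int_{L_1} + \int_{L_2} + \int_{C_2}\right][P\,dx_1 + Q\,dx_2]$$ (2.262)
$$= \iint_R \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right)dx_1dx_2,$$ (2.263)
where R is now the simply connected region bounded by the closed curve $C = C_1 + L_1 + L_2 + C_2$. We can choose the two paths $L_1$ and $L_2$ as close as possible. Since they are traversed in opposite directions, their contributions cancel each other, thereby yielding
$$\left[\int_{C_1} + \int_{C_2}\right][P\,dx_1 + Q\,dx_2] = \iint_R \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right)dx_1dx_2,$$ (2.264)
where $C_1$ and $C_2$ are traversed in the senses they have as parts of C. In particular, when
$$\frac{\partial Q}{\partial x_1} = \frac{\partial P}{\partial x_2},$$ (2.265)
we obtain
$$\oint_{C_1} [P\,dx_1 + Q\,dx_2] = \oint_{C_2} [P\,dx_1 + Q\,dx_2],$$
where both paths are now traversed in the same sense.
Figure 2.24 Doubly connected domain and the modified path.
Figure 2.25 Paths in Example 2.12.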
The advantage of this result is that by choosing a suitable path $C_2$, such as a circle, we can evaluate the desired integral I, where the first path, $C_1$, may be awkward in shape (Fig. 2.24, left).
Example 2.12. Multiply-connected domains: Consider the line integral
where $C_1$ and $C_2$ are two closed paths inside the domain D defined by two concentric circles $x_1^2 + x_2^2 = a^2$ and $x_1^2 + x_2^2 = b^2$ as shown in Figure 2.25 (left). For the path $C_1$, P, Q, and their first derivatives exist inside $C_1$. Furthermore, since $\partial P/\partial x_2$ and $\partial Q/\partial x_1$ are equal: (2.269)
I is zero. For the second path $C_2$, which encloses the hole at the center, we modify it as shown in Figure 2.25 (right), where $C_3$ is chosen as a circle with radius r (r > a), so that the integral on the right-hand side of (2.270) can be evaluated analytically. The value of the integral over $C_3$, $x_1^2 + x_2^2 = r^2$, can be found easily by introducing a new variable $\theta$:
$$x_1 = r\cos\theta,$$ (2.271)
$$x_2 = r\sin\theta, \quad \theta \in [0, 2\pi],$$ (2.272)
$$I = \oint_{C_3} [P\,dx_1 + Q\,dx_2] =$$ (2.273)
Figure 2.26 Curl of a vector field.
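2.8 CURL OPERATOR AND STOKES'S THEOREM
2.8.1 On the Plane
Consider a vector field $\vec{V}$ on the plane and its line integral over the closed rectangular path shown in Figure 2.26 as
$$\oint_C \vec{V}\cdot d\vec{r} = \int_{C_1}\vec{V}\cdot d\vec{r} + \int_{C_2}\vec{V}\cdot d\vec{r} + \int_{C_3}\vec{V}\cdot d\vec{r} + \int_{C_4}\vec{V}\cdot d\vec{r}.$$ (2.274)
We first consider the integral over $C_1$, where $x_2 = x_{02}$:
$$\int_{C_1}\vec{V}\cdot d\vec{r} = \int_{x_{01}}^{x_{01}+\Delta x_1} dx_1\,V_1(x_1, x_{02}).$$ (2.275)
Expanding $\vec{V}(x_1, x_2)$ in Taylor series about $(x_{01}, x_{02})$ and keeping only the linear terms, we write
$$V_1(x_1, x_2) \simeq V_1(x_{01}, x_{02}) + \left.\frac{\partial V_1}{\partial x_1}\right|_{(x_{01},x_{02})}(x_1 - x_{01}) + \left.\frac{\partial V_1}{\partial x_2}\right|_{(x_{01},x_{02})}(x_2 - x_{02}),$$ (2.276)
which, when substituted into Equation (2.275), gives
$$\int_{C_1}\vec{V}\cdot d\vec{r} = V_1(x_{01}, x_{02})\,\Delta x_1 + \frac{1}{2}\left.\frac{\partial V_1}{\partial x_1}\right|_{(x_{01},x_{02})}(\Delta x_1)^2.$$ (2.277)
Figure 2.27 Infinitesimal rectangular path.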
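We now take the integral over $C_3$, where $x_2 = x_{02} + \Delta x_2$ and $x_1$ varies from $x_{01} + \Delta x_1$ to $x_{01}$:
$$\int_{C_3}\vec{V}\cdot d\vec{r} = \int_{x_{01}+\Delta x_1}^{x_{01}} dx_1\,V_1(x_1, x_{02} + \Delta x_2).$$ (2.278)
Substituting the Taylor series expansion [Eq. (2.276)] of $V_1(x_1, x_2)$ about $(x_{01}, x_{02})$ evaluated at $x_2 = x_{02} + \Delta x_2$:
$$V_1(x_1, x_{02} + \Delta x_2) \simeq V_1(x_{01}, x_{02}) + \left.\frac{\partial V_1}{\partial x_1}\right|_{(x_{01},x_{02})}(x_1 - x_{01}) + \left.\frac{\partial V_1}{\partial x_2}\right|_{(x_{01},x_{02})}\Delta x_2$$ (2.279)
into Equation (2.278) and integrating gives
$$\int_{C_3}\vec{V}\cdot d\vec{r} = -V_1(x_{01}, x_{02})\,\Delta x_1 - \frac{1}{2}\left.\frac{\partial V_1}{\partial x_1}\right|_{(x_{01},x_{02})}(\Delta x_1)^2 - \left.\frac{\partial V_1}{\partial x_2}\right|_{(x_{01},x_{02})}\Delta x_2\,\Delta x_1.$$ (2.280)
Note the minus sign coming from the dot product in the integral. Combining these results [Eqs. (2.277) and (2.280)] we obtain
$$\int_{C_1}\vec{V}(x_1, x_2)\cdot d\vec{r} + \int_{C_3}\vec{V}(x_1, x_2)\cdot d\vec{r} = -\left.\frac{\partial V_1(x_1, x_2)}{\partial x_2}\right|_{(x_{01},x_{02})}\Delta x_1\,\Delta x_2.$$ (2.281)
A similar procedure for the paths $C_2$ and $C_4$ yields
$$\int_{C_2}\vec{V}\cdot d\vec{r} + \int_{C_4}\vec{V}\cdot d\vec{r} = \left.\frac{\partial V_2(x_1, x_2)}{\partial x_1}\right|_{(x_{01},x_{02})}\Delta x_1\,\Delta x_2,$$ (2.282)
which, after combining with Equation (2.281), gives
$$\oint_C \vec{V}\cdot d\vec{r} = \left[\frac{\partial V_2}{\partial x_1} - \frac{\partial V_1}{\partial x_2}\right]_{(x_{01},x_{02})}\Delta x_1\,\Delta x_2.$$ (2.283)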
Since the location of $\vec{r}_0$ is arbitrary on the $x_1x_2$-plane, we can drop the subscript 0. If we also notice that the quantity inside the square brackets is the $x_3$ component of $\vec{\nabla}\times\vec{V}$, we can write
$$\oint_C \vec{V}\cdot d\vec{r} = (\vec{\nabla}\times\vec{V})\cdot\hat{e}_3\,\Delta x_1\,\Delta x_2.$$ (2.284)
The approximation we have made by ignoring the higher-order terms in the Taylor series expansion is justified in the limits as $\Delta x_1 \to 0$ and $\Delta x_2 \to 0$. For an infinitesimal rectangle we can replace $\Delta x_1$ with $dx_1$ and $\Delta x_2$ with $dx_2$. Similarly, $\Delta\sigma = \Delta x_1\Delta x_2$ can be replaced with the infinitesimal area element $d\sigma_{12}$. For a finite rectangular path C, we can sum over the infinitesimal paths as shown in Figure 2.27. Integrals over the adjacent sides cancel, thereby leaving only the integral over the boundary C as
$$\oint_C \vec{V}\cdot d\vec{r} = \iint_R (\vec{\nabla}\times\vec{V})\cdot\hat{n}\,d\sigma_{12},$$ (2.285)
where C is now a finite rectangular path. The right-hand side is a surface integral to be taken over the surface bounded by C, which in this case is the region on the $x_1x_2$-plane bounded by the rectangle C, with its outward normal $\hat{n}$ defined by the right-hand rule as the $\hat{e}_3$ direction. Using Green's theorem [Eq. (2.252)] in Equation (2.284), we see that this result is also valid for an arbitrary closed simple path C on the $x_1x_2$-plane as
$$\oint_C \vec{V}\cdot d\vec{r} = \oint_C V_T\,ds = \iint_R (\vec{\nabla}\times\vec{V})\cdot\hat{n}\,d\sigma_{12},$$ (2.286)
where $V_T$ is the tangential component of $\vec{V}$ along C and R is the region bounded by C. The integral on the right-hand side is basically a surface integral over a surface bounded by the curve C, which we have taken as $S_1$ lying on the $x_1x_2$-plane, with its normal $\hat{n}$ as defined by the right-hand rule (Fig. 2.28, left). We now ask the question, What if we use a surface $S_2$ in three dimensions (Fig. 2.28, right), which also has the same boundary as the planar surface in Figure 2.28 (left)? Does the value of the surface integral on the right-hand side in Equation (2.286) change? Since the surface integral in Equation (2.286) is equal to the line integral
$$\oint_C \vec{V}\cdot d\vec{r},$$ (2.287)
which depends only on $\vec{V}$ and C, its value should not depend on which surface we use. In fact, it does not, provided that the surface is oriented. An oriented surface has two sides: an inside and an outside. The outside is defined by the right-hand rule. As in the first two surfaces in Figure 2.29, in an oriented surface one cannot go from one side to the other without crossing over the boundary. In the last surface in Figure 2.29 we have a Möbius strip, which has only one side. Following a closed path, one can go from one "side" with the normal $\hat{n}_i$ to the other side with the normal $\hat{n}_o$, which points exactly in the opposite direction, without ever crossing a boundary.
Figure 2.28 Different surfaces with the same boundary C.
Figure 2.29 Orientable surfaces versus the Möbius strip.
Consider two orientable surfaces $S_1$ and $S_2$ with the same boundary C and cover them both with a network of simple closed paths $C_i$ with small areas, $\Delta\sigma_i$, each (Fig. 2.30). In the limit as $\Delta\sigma_i \to 0$, each area element naturally coincides with the tangent plane to the surface to which it belongs at that point. Depending on their location, normals point in different directions. For each surface element we can write
$$\oint_{C_i} \vec{V}\cdot d\vec{r} = (\vec{\nabla}\times\vec{V})\cdot\hat{n}_i\,\Delta\sigma_i,$$ (2.288)
where i denotes the ith surface element on either $S_1$ or $S_2$, with the boundary $C_i$. For the entire surface we have to sum these as
$$\sum_{i=1}^{l}\oint_{C_i}\vec{V}\cdot d\vec{r} = \sum_{i=1}^{l}(\vec{\nabla}\times\vec{V})\cdot\hat{n}_i\,\Delta\sigma_i$$ (2.289)
for $S_1$ and
$$\sum_{j=1}^{m}\oint_{C_j}\vec{V}\cdot d\vec{r} = \sum_{j=1}^{m}(\vec{\nabla}\times\vec{V})\cdot\hat{n}_j\,\Delta\sigma_j$$ (2.290)
for $S_2$. Since the surfaces have different surface areas, l and m are different in general. In the limit as l and m go to infinity, contributions coming from adjacent sides will cancel. Thus the sums on the left-hand sides of Equations (2.289) and (2.290) reduce to the same line integral over their common boundary C:
$$\oint_C \vec{V}\cdot d\vec{r}.$$ (2.291)
On the other hand, the sums on the right-hand sides become surface integrals over their respective surfaces, $S_1$ and $S_2$. Hence, in general we can write
$$\oint_C \vec{V}\cdot d\vec{r} = \iint_S (\vec{\nabla}\times\vec{V})\cdot\hat{n}\,d\sigma,$$ (2.292)
where S is any oriented surface with the boundary C.
Figure 2.30 Two orientable surfaces.
2.8.2 In Space
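In Equation (2.292) even though we took S as a surface in three-space, its boundary C is still on the $x_1x_2$-plane. We now generalize this result by taking the closed simple path C also in space.
Stokes's Theorem: Consider a smooth oriented surface S in space with a smooth simple curve C as its boundary. Then for a given continuous and differentiable vector field
$$\vec{V}(x_1, x_2, x_3) = V_1\hat{e}_1 + V_2\hat{e}_2 + V_3\hat{e}_3$$ (2.293)
in some domain D of space, which includes S, we can write
$$\oint_C \vec{V}\cdot d\vec{r} = \iint_S (\vec{\nabla}\times\vec{V})\cdot\hat{n}\,d\sigma,$$ (2.294)
where $\hat{n}$ is the outward normal to S.
Figure 2.31 Stokes's theorem in space.
Proof: We first write Equation (2.294) as
$$\oint_C \vec{V}\cdot d\vec{r} = \oint_C V_1\,dx_1 + V_2\,dx_2 + V_3\,dx_3,$$ (2.295)
which can be proven by proving three separate equations:
$$\oint_C V_1\,dx_1 = \iint_S [\vec{\nabla}\times(V_1\hat{e}_1)]\cdot\hat{n}\,d\sigma,$$ (2.296)
$$\oint_C V_2\,dx_2 = \iint_S [\vec{\nabla}\times(V_2\hat{e}_2)]\cdot\hat{n}\,d\sigma,$$ (2.297)
$$\oint_C V_3\,dx_3 = \iint_S [\vec{\nabla}\times(V_3\hat{e}_3)]\cdot\hat{n}\,d\sigma.$$ (2.298)
We also assume that the surface S can be written in the form
$$x_3 = f(x_1, x_2)$$ (2.299)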
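and as shown in Figure 2.31, $C_{12}$ is the projection of C onto the $x_1x_2$-plane. Hence, when $(x_1, x_2, x_3)$ goes around C a full loop, the corresponding point $(x_1, x_2, 0)$ also completes a full loop in $C_{12}$ in the same direction. We choose the direction of $\hat{n}$ with the right-hand rule as the outward direction. Using Green's theorem [Eq. (2.236)] with Q = 0, we can write
$$\oint_C V_1(x_1, x_2, x_3)\,dx_1 = \oint_{C_{12}} V_1(x_1, x_2, f(x_1, x_2))\,dx_1 = -\iint_{R_{12}}\left[\frac{\partial V_1}{\partial x_2} + \frac{\partial V_1}{\partial x_3}\frac{\partial f}{\partial x_2}\right]dx_1dx_2.$$ (2.300)
We now use Equation (2.212) with $\vec{V}$ taken as
$$\vec{\nabla}\times(V_1\hat{e}_1) = \frac{\partial V_1}{\partial x_3}\hat{e}_2 - \frac{\partial V_1}{\partial x_2}\hat{e}_3.$$ (2.301)
Note that $\vec{V}$ in Equation (2.212) is an arbitrary vector field. Since the normal $\vec{n}$ to $d\sigma$ is just the gradient to the surface
$$x_3 - f(x_1, x_2) = 0,$$ (2.302)
that is,
$$\vec{n} = \left(-\frac{\partial f}{\partial x_1}, -\frac{\partial f}{\partial x_2}, 1\right),$$ (2.303)
we write Equation (2.212) as
$$\iint_S \left[-\frac{\partial V_1}{\partial x_3}\frac{\partial f}{\partial x_2} - \frac{\partial V_1}{\partial x_2}\right]\frac{d\sigma}{|\vec{n}|} = \iint_S \frac{\partial V_1}{\partial x_3}\,dx_3dx_1 - \frac{\partial V_1}{\partial x_2}\,dx_1dx_2.$$ (2.304)
Since
$$\hat{e}_3\cdot\hat{n} = \frac{1}{|\vec{n}|} = \cos\gamma,$$ (2.305)
where $\gamma$ is the angle between $\hat{e}_3$ and $\hat{n}$, $d\sigma\cos\gamma$ is the projection of the area element, $d\sigma$, onto the $x_1x_2$-plane, that is,
$$d\sigma\cos\gamma = dx_1dx_2.$$ (2.306)
We can now rewrite the left-hand side of Equation (2.304) as an integral over $R_{12}$ to obtain the relation
$$-\iint_{R_{12}}\left[\frac{\partial V_1}{\partial x_3}\frac{\partial f}{\partial x_2} + \frac{\partial V_1}{\partial x_2}\right]dx_1dx_2 = \iint_S \frac{\partial V_1}{\partial x_3}\,dx_3dx_1 - \frac{\partial V_1}{\partial x_2}\,dx_1dx_2.$$ (2.307)
Substituting Equation (2.307) into (2.300), we obtain
$$\oint_C V_1\,dx_1 = \iint_S \frac{\partial V_1}{\partial x_3}\,dx_3dx_1 - \frac{\partial V_1}{\partial x_2}\,dx_1dx_2 = \iint_S [\vec{\nabla}\times(V_1\hat{e}_1)]\cdot\hat{n}\,d\sigma,$$ (2.308)
which is Equation (2.296). In a similar fashion we also show Equations (2.297) and (2.298). Finally, adding Equations (2.296)-(2.298) we establish Stokes's theorem.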
Figure 2.32 Unit normal to circular path.
2.8.3 Geometric Interpretation of Curl
We have seen that the divergence of a vector field $\vec{V}$ is equal to the ratio of the flux through a closed surface S to the volume enclosed by S in the limit as the surface area of S goes to zero [Eq. (2.160)], that is,
$$\vec{\nabla}\cdot\vec{V} = \lim_{S\to 0}\frac{\oint_S \vec{V}\cdot\hat{n}\,d\sigma}{\int_V d\tau}.$$ (2.309)
Similarly, we can give an integral definition for the value of the curl of a vector field $\vec{V}$ in the direction $\hat{n}$ as
$$(\vec{\nabla}\times\vec{V})\cdot\hat{n} = \lim_{r\to 0}\frac{\oint_{C_r}\vec{V}\cdot d\vec{r}}{A_r},$$ (2.310)
where $C_r$ is a circular path with radius r and area $A_r$, and $\hat{n}$ is the unit normal to $A_r$ determined by the right-hand rule (Fig. 2.32). In the limit as the size of the path shrinks to zero, the surface enclosed by the circular path can be replaced by a more general surface with the normal $\hat{n}$. Note that this is also an operational definition that can be used to construct a "curl-meter," that is, an instrument that can be used to measure the value of the curl of a vector field in the direction of the axis of the instrument, $\hat{n}$.
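The "curl-meter" idea can be imitated numerically for a plane field: compute the circulation around a small circle and divide by its area. The sketch below is an illustration only; the function name `curl_meter`, the radius, and the two test fields are assumptions made here, with the axis of the instrument along $\hat{e}_3$.

```python
import math


def curl_meter(V, x0, y0, r=1e-3, n=720):
    """Circulation of V = (V1, V2) around a circle of radius r centred at (x0, y0),
    divided by the enclosed area pi r^2 (a finite-r estimate of the x3 component of the curl)."""
    circulation = 0.0
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x = x0 + r * math.cos(theta)
        y = y0 + r * math.sin(theta)
        v1, v2 = V(x, y)
        # Tangential component times the arc length element r dtheta.
        circulation += (-v1 * math.sin(theta) + v2 * math.cos(theta)) * r * dtheta
    return circulation / (math.pi * r * r)


# Rigid rotation V = (-y, x): the x3 component of the curl is 2 everywhere.
print(curl_meter(lambda x, y: (-y, x), 0.3, -0.2))
# V = (x^2 y, x y): dV2/dx - dV1/dy = y - x^2, which is 1 at (1, 2).
print(curl_meter(lambda x, y: (x * x * y, x * y), 1.0, 2.0))
```

Because r is finite, the second value differs from the exact curl by a term of order $r^2$, which is the content of the limit in the definition.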
2.9 MIXED OPERATIONS WITH THE DEL OPERATOR
By paying attention to the vector nature of the $\vec{\nabla}$ operator and also by keeping in mind that it is meaningless on its own, we can construct several other useful operators and identities. For a scalar field, $\Phi(\vec{r})$, a very useful operator can be constructed by taking the divergence of a gradient as
$$\vec{\nabla}\cdot\vec{\nabla}\Phi(\vec{r}) = \vec{\nabla}^2\Phi(\vec{r}),$$ (2.311)
where the operator $\vec{\nabla}^2$ is called the Laplacian or the Laplace operator, which is one of the most commonly encountered operators in science. Two very important vector identities used in potential theory are
$$\vec{\nabla}\cdot(\vec{\nabla}\times\vec{A}) = 0$$ (2.312)
and
$$\vec{\nabla}\times\vec{\nabla}\Phi = 0,$$ (2.313)
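where $\vec{A}$ and $\Phi$ are differentiable vector and scalar fields, respectively. In other words, the divergence of a curl and the curl of a gradient are zero. Using the definition of the $\vec{\nabla}$ operator, proofs can be written immediately. For Equation (2.312) we write
$$\vec{\nabla}\cdot(\vec{\nabla}\times\vec{A}) = \det\begin{pmatrix} \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3} \\ \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3} \\ A_1 & A_2 & A_3 \end{pmatrix},$$ (2.314)
and obtain
$$\vec{\nabla}\cdot(\vec{\nabla}\times\vec{A}) = \frac{\partial}{\partial x_1}\left(\frac{\partial A_3}{\partial x_2} - \frac{\partial A_2}{\partial x_3}\right) - \frac{\partial}{\partial x_2}\left(\frac{\partial A_3}{\partial x_1} - \frac{\partial A_1}{\partial x_3}\right) + \frac{\partial}{\partial x_3}\left(\frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2}\right)$$ (2.315)
$$= \frac{\partial^2 A_3}{\partial x_1\partial x_2} - \frac{\partial^2 A_2}{\partial x_1\partial x_3} - \frac{\partial^2 A_3}{\partial x_2\partial x_1} + \frac{\partial^2 A_1}{\partial x_2\partial x_3} + \frac{\partial^2 A_2}{\partial x_3\partial x_1} - \frac{\partial^2 A_1}{\partial x_3\partial x_2}$$ (2.316)
$$= \left(\frac{\partial^2 A_3}{\partial x_1\partial x_2} - \frac{\partial^2 A_3}{\partial x_2\partial x_1}\right) + \left(\frac{\partial^2 A_1}{\partial x_2\partial x_3} - \frac{\partial^2 A_1}{\partial x_3\partial x_2}\right) + \left(\frac{\partial^2 A_2}{\partial x_3\partial x_1} - \frac{\partial^2 A_2}{\partial x_1\partial x_3}\right).$$ (2.317)
Since the vector field is differentiable, with continuous second partial derivatives, the order of differentiation in Equation (2.317) is unimportant and so the divergence of a curl is zero. For the second identity [Eq. (2.313)], we write
$$\vec{\nabla}\times\vec{\nabla}\Phi = \det\begin{pmatrix} \hat{e}_1 & \hat{e}_2 & \hat{e}_3 \\ \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3} \\ \dfrac{\partial\Phi}{\partial x_1} & \dfrac{\partial\Phi}{\partial x_2} & \dfrac{\partial\Phi}{\partial x_3} \end{pmatrix}$$ (2.318)
$$= \hat{e}_1\left(\frac{\partial^2\Phi}{\partial x_2\partial x_3} - \frac{\partial^2\Phi}{\partial x_3\partial x_2}\right) - \hat{e}_2\left(\frac{\partial^2\Phi}{\partial x_1\partial x_3} - \frac{\partial^2\Phi}{\partial x_3\partial x_1}\right) + \hat{e}_3\left(\frac{\partial^2\Phi}{\partial x_1\partial x_2} - \frac{\partial^2\Phi}{\partial x_2\partial x_1}\right)$$ (2.319)
$$= 0.$$ (2.320)
Since for a differentiable $\Phi$ the mixed derivatives are equal, we obtain zero in the last step, thereby proving the identity.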
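Another useful identity is
$$\vec{\nabla}\cdot(u\vec{\nabla}v) = \vec{\nabla}u\cdot\vec{\nabla}v + u\vec{\nabla}^2v,$$ (2.321)
where u and v are two differentiable scalar fields. We leave the proof of this identity as an exercise, but by using it we prove two very useful relations. We first switch u and v in Equation (2.321):
$$\vec{\nabla}\cdot(v\vec{\nabla}u) = \vec{\nabla}v\cdot\vec{\nabla}u + v\vec{\nabla}^2u;$$ (2.322)
then we subtract this from the original equation [Eq. (2.321)] to write
$$\vec{\nabla}\cdot(u\vec{\nabla}v) - \vec{\nabla}\cdot(v\vec{\nabla}u) = u\vec{\nabla}^2v - v\vec{\nabla}^2u.$$ (2.323)
Integrating both sides over a volume V bounded by the surface S and using the divergence theorem
$$\oint_S d\vec{\sigma}\cdot\vec{A} = \int_V d\tau\,\vec{\nabla}\cdot\vec{A},$$ (2.324)
we obtain Green's first identity:
$$\oint_S d\vec{\sigma}\cdot[u\vec{\nabla}v - v\vec{\nabla}u] = \int_V d\tau\,[u\vec{\nabla}^2v - v\vec{\nabla}^2u].$$ (2.325)
Applying the similar process directly to Equation (2.321), we obtain Green's second identity:
$$\oint_S d\vec{\sigma}\cdot u\vec{\nabla}v = \int_V d\tau\,[\vec{\nabla}u\cdot\vec{\nabla}v + u\vec{\nabla}^2v].$$ (2.326)
Useful vector identities [Eqs. (2.327)-(2.333)]:
$$\vec{\nabla}\times\vec{\nabla}f = 0,$$
$$\vec{\nabla}\cdot(\vec{\nabla}\times\vec{A}) = 0,$$
$$\vec{\nabla}\times\vec{\nabla}\times\vec{A} = \vec{\nabla}(\vec{\nabla}\cdot\vec{A}) - \vec{\nabla}^2\vec{A},$$
$$\vec{\nabla}\cdot(\vec{A}\times\vec{B}) = \vec{B}\cdot(\vec{\nabla}\times\vec{A}) - \vec{A}\cdot(\vec{\nabla}\times\vec{B}),$$
$$\vec{\nabla}(\vec{A}\cdot\vec{B}) = \vec{A}\times(\vec{\nabla}\times\vec{B}) + \vec{B}\times(\vec{\nabla}\times\vec{A}) + (\vec{A}\cdot\vec{\nabla})\vec{B} + (\vec{B}\cdot\vec{\nabla})\vec{A},$$
$$\vec{\nabla}\times(\vec{A}\times\vec{B}) = \vec{A}(\vec{\nabla}\cdot\vec{B}) - \vec{B}(\vec{\nabla}\cdot\vec{A}) + (\vec{B}\cdot\vec{\nabla})\vec{A} - (\vec{A}\cdot\vec{\nabla})\vec{B}.$$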
Figure 2.33 Gravitational force and gravitational field.
2.10 POTENTIAL THEORY
The gravitational force that a point mass M located at the origin exerts on another point mass m at $\vec{r}$ is given by Newton's law (Fig. 2.33) as
$$\vec{F} = -G\frac{Mm}{r^2}\hat{e}_r,$$ (2.334)
where G is the gravitational constant and $\hat{e}_r$ is a unit vector along the radial direction. Since mass is always positive, the minus sign in Equation (2.334) indicates that the gravitational force is attractive. Newton's law also indicates that the gravitational force is central, that is, the force is directed along the line joining the two masses. We now introduce the gravitational field $\vec{g}$ due to the mass M as
$$\vec{g} = -G\frac{M}{r^2}\hat{e}_r,$$ (2.335)
which assigns a vector to each point in space, with a magnitude that decreases with the inverse square of the distance and always points toward the central mass M (Fig. 2.33). The gravitational force that M exerts on another mass m can now be written as
$$\vec{F} = m\vec{g}.$$ (2.336)
In other words, M attracts m through its gravitational field, which eliminates the need for action at a distance. The field concept is a very significant step in understanding interactions in nature. Its advantages become even more clear with the introduction of the Lagrangian formulation of continuum mechanics and then the relativistic theories, where the speed of light is the maximum speed with which any effect in nature can propagate. Of course, in Newton's theory the speed of propagation is infinite and the changes in a gravitational field at a given point are felt everywhere in the universe instantaneously. Today the field concept is an indispensable part of physics, at both the classical and the quantum level.
We now write the flux $\phi$ of the gravitational field of a point mass over a closed surface S enclosing the mass M (Fig. 2.34, left) as
$$\phi = \oint_S \vec{g}\cdot\hat{n}\,d\sigma.$$ (2.337)
Since the solid angle, $d\Omega$, subtended by the area element $d\sigma$ is
$$d\Omega = \frac{\hat{e}_r\cdot\hat{n}\,d\sigma}{r^2} = \frac{dA}{r^2},$$ (2.338)-(2.340)
we can write the flux as
$$\phi = \oint_S \left(-G\frac{M}{r^2}\right)\hat{e}_r\cdot\hat{n}\,d\sigma$$ (2.341)
$$= -GM\oint_S d\Omega,$$ (2.342)
where dA is the area element in the direction of $\hat{e}_r$. Integration over the entire surface gives
$$\phi = -4\pi GM,$$ (2.343)
where the solid angle subtended by the entire surface is $4\pi$. We now use the divergence theorem [Eq. (2.159)] to write the flux of the gravitational field as
$$\phi = \oint_S \vec{g}\cdot\hat{n}\,d\sigma = \int_V \vec{\nabla}\cdot\vec{g}\,d\tau,$$ (2.344)
which gives
$$\int_V \vec{\nabla}\cdot\vec{g}\,d\tau = -4\pi GM,$$ (2.345)
where V is the volume enclosed by the closed surface S. An important property of classical gravity is linearity; that is, when more than one particle interacts with m, the net force that m feels is the sum of the forces that each particle exerts on m as if it were alone. Naturally, for a continuous distribution of matter with density $\rho(\vec{r})$ interacting with a point mass m, the mass M in Equation (2.345) is replaced by an integral:
$$M \to \int_V \rho(\vec{r})\,d\tau.$$ (2.346)
We now write Equation (2.345) as
$$\int_V \vec{\nabla}\cdot\vec{g}\,d\tau = -4\pi G\int_V \rho(\vec{r})\,d\tau$$ (2.347)
or as
$$\int_V \left(\vec{\nabla}\cdot\vec{g} + 4\pi G\rho(\vec{r})\right)d\tau = 0.$$ (2.348)
For an arbitrary but finite volume element, the only way to satisfy this equality is to have the integrand vanish, that is,
$$\vec{\nabla}\cdot\vec{g} + 4\pi G\rho(\vec{r}) = 0,$$ (2.349)
which is usually written as
$$\vec{\nabla}\cdot\vec{g} = -4\pi G\rho(\vec{r}).$$ (2.350)
Figure 2.34 Flux of the gravitational field.
This is the classical gravitational field equation to which Einstein's theory of gravitation reduces in the limit of weak fields and small velocities. Given the mass distribution $\rho(\vec{r})$, it gives a partial differential equation to be solved for the gravitational field $\vec{g}$. If we choose a closed surface that does not include the mass M, then the net flux over the entire surface is zero. If we concentrate on a pair of area elements, $dA_1$ and $dA_2$, in the figure on the right (Fig. 2.34), we write the total flux as
$$d\phi_{12} = d\phi_1 + d\phi_2$$ (2.351)
$$= -GM\,d\Omega_1 + GM\,d\Omega_2.$$ (2.352)
Since the solid angles, $d\Omega_1$ and $d\Omega_2$, subtended at the center by $dA_1$ and $dA_2$, respectively, are equal, $d\phi_1$ and $d\phi_2$ cancel each other. Since the total flux is the sum of such pairs, the total flux is also zero. The gravitational field equation to be solved for a region that does not include any mass is given as
$$\vec{\nabla}\cdot\vec{g} = 0.$$ (2.353)
As we have mentioned before, this does not mean the gravitational field is zero in that region, but it means that the sources are outside the region of interest.
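The two flux results, $-4\pi GM$ for a surface enclosing the mass and zero otherwise, do not depend on the shape of the surface. The sketch below checks this for a cube; it is an illustration only, with units chosen so that G = M = 1, and the names `g_field` and `flux_through_cube` are hypothetical.

```python
import math

G, M = 1.0, 1.0  # assumed units with G = M = 1, for illustration only


def g_field(x, y, z):
    """Field of a point mass at the origin: g = -G M e_r / r^2."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-G * M * x / r3, -G * M * y / r3, -G * M * z / r3)


def flux_through_cube(center, half, n=100):
    """Midpoint-rule flux of g through the six faces of an axis-aligned cube (outward normals)."""
    cx, cy, cz = center
    h = 2.0 * half / n
    total = 0.0
    for i in range(n):
        u = -half + (i + 0.5) * h
        for j in range(n):
            v = -half + (j + 0.5) * h
            for s in (-1.0, 1.0):  # the two opposite faces along each axis
                total += s * g_field(cx + s * half, cy + u, cz + v)[0] * h * h
                total += s * g_field(cx + u, cy + s * half, cz + v)[1] * h * h
                total += s * g_field(cx + u, cy + v, cz + s * half)[2] * h * h
    return total


flux_enclosing = flux_through_cube((0.0, 0.0, 0.0), 1.0)  # cube contains the mass
flux_outside = flux_through_cube((3.0, 0.0, 0.0), 1.0)    # cube does not contain the mass
print(flux_enclosing, -4 * math.pi * G * M, flux_outside)
```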
2.10.1 Gravitational Field of a Spherically Symmetric Star
For a spherically symmetric star with density $\rho(r)$, the gravitational field depends only on the radial distance from the origin. Hence we can write
$$\vec{g}(\vec{r}) = g(r)\hat{e}_r,$$ (2.354)
where $\hat{e}_r$ is a unit vector pointing radially outwards. To find g(r), we choose a spherical Gaussian surface, S(r), with radius r. Since the outward normal to a sphere is also in the $\hat{e}_r$ direction, we utilize the divergence theorem to convert the volume integral in Equation (2.347) to a surface integral,
$$\int_{V(r)} \vec{\nabla}\cdot\vec{g}\,d\tau = \oint_{S(r)} \vec{g}\cdot d\vec{\sigma},$$ (2.355)
and write
$$\oint_{S(r)} g(r)(\hat{e}_r\cdot\hat{n})\,d\sigma = -4\pi G\int_{V(r)} \rho(r)\,d\tau,$$ (2.356)
$$\oint g(r)r^2\,d\Omega = -4\pi G\iint \rho(r')r'^2\,dr'\,d\Omega,$$ (2.357)
$$g(r)r^2\oint d\Omega = -4\pi G\int_0^r \rho(r')r'^2\,dr'\oint d\Omega,$$ (2.358)
where $d\sigma = r^2d\Omega = r^2\sin\theta\,d\theta\,d\phi$ is the infinitesimal surface area element of the sphere. Since $\oint d\Omega = 4\pi$, we obtain the magnitude of the gravitational field as
$$g(r) = -\frac{G}{r^2}\int_0^r 4\pi\rho(r')r'^2\,dr'$$ (2.359)
$$= -\frac{G}{r^2}\int_0^r dm_{r'}$$ (2.360)
$$= -G\frac{m(r)}{r^2}.$$ (2.361)
An important feature of this result is that part of the mass lying outside the Gaussian surface, which is a sphere of radius r, does not contribute to the field at r and the mass inside the Gaussian surface acts as if it is concentrated at the center. Note that $dm_r$ is the mass of an infinitesimal shell at r with thickness dr. Similarly, if we find the gravitational field of a spherical shell of radius R, we find that for points outside, $r \ge R$, the shell behaves as if its entire mass is concentrated at the center. For points inside the shell, the gravitational field is zero. These interesting features of Newton's theory of gravity also remain intact in Einstein's theory, where they are summarized in terms of Birkhoff's theorem.
Figure 2.35 Work done by the gravitational field.
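The radial integral for g(r) is easy to evaluate numerically for any density profile. The sketch below is an illustration only: the function name `g_of_r` and the uniform-density star (G = 1, R = 1, $\rho_0$ = 1) are assumptions made here. Inside a uniform star the result should grow linearly with r, and outside it should match a point mass M at the center.

```python
import math


def g_of_r(r, rho, G=1.0, n=2000):
    """g(r) = -(4 pi G / r^2) * integral_0^r rho(s) s^2 ds, using the midpoint rule."""
    h = r / n
    integral = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        integral += rho(s) * s * s * h
    return -4.0 * math.pi * G * integral / (r * r)


R, rho0 = 1.0, 1.0  # hypothetical uniform star


def uniform(s):
    """Uniform density inside radius R, vacuum outside."""
    return rho0 if s <= R else 0.0


M = 4.0 * math.pi * rho0 * R**3 / 3.0
print(g_of_r(0.5, uniform), -M * 0.5 / R**3)  # inside: -G M r / R^3
print(g_of_r(2.0, uniform), -M / 2.0**2)      # outside: -G M / r^2
```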
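2.10.2 Work Done by Gravitational Force
We now approach the problem from a different direction. Consider a test particle of mass m moving along a closed path C in the gravitational field of another point particle of mass M (Fig. 2.35). The work done by the gravitational field on the test particle is
$$W = \oint_C \vec{F}\cdot d\vec{r}$$ (2.362)
$$= m\oint_C g_T\,ds,$$ (2.363)
where $g_T$ is the tangential component of the gravitational field of M along the path. Using Stokes's theorem [Eq. (2.294)], we can also write this as
$$W = m\oint_C \vec{g}\cdot d\vec{r}$$ (2.364)
$$= m\iint_S (\vec{\nabla}\times\vec{g})\cdot\hat{n}\,d\sigma.$$ (2.365)
If we calculate $\vec{\nabla}\times\vec{g}$ for the gravitational field of the point mass M located at the origin, we find
$$\vec{\nabla}\times\vec{g} = \vec{\nabla}\times\left(-G\frac{M}{r^2}\hat{e}_r\right)$$ (2.366)
$$= -GM\,\vec{\nabla}\times\left[\frac{x_1\hat{e}_1 + x_2\hat{e}_2 + x_3\hat{e}_3}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}\right]$$ (2.367)
$$= 0.$$ (2.368)
Substituting $\vec{\nabla}\times\vec{g} = 0$ into Equation (2.365), we obtain the work done by the gravitational field on a point particle m moving on a closed path as zero. If we split a closed path into two parts as $C_1$ and $C_2$, as shown in Figure 2.35, we can write
$$m\oint_C \vec{g}\cdot d\vec{r} = m\int_{\vec{r}_1,\,C_1}^{\vec{r}_2} \vec{g}\cdot d\vec{r} + m\int_{\vec{r}_2,\,C_2}^{\vec{r}_1} \vec{g}\cdot d\vec{r} = 0.$$ (2.369)
Interchanging the limits of integration in the second integral, we obtain
$$m\int_{\vec{r}_1,\,C_1}^{\vec{r}_2} \vec{g}\cdot d\vec{r} = m\int_{\vec{r}_1,\,C_2}^{\vec{r}_2} \vec{g}\cdot d\vec{r}.$$ (2.370)
Since $C_1$ and $C_2$ are two arbitrary paths connecting points 1 and 2, this means that the work done by the gravitational field is path-independent. As the test particle moves under the influence of the gravitational field, it also satisfies the second law of Newton, that is,
$$m\frac{d\vec{v}}{dt} = m\vec{g},$$ (2.371)
$$\frac{d\vec{v}}{dt} = \vec{g}.$$ (2.372)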
Using this in Equation (2.370), we can write the work done by gravity as
$$W_{1\to 2} = m\int_1^2 \vec{g}\cdot d\vec{r}$$ (2.373)
$$= m\int_1^2 \frac{d\vec{v}}{dt}\cdot\vec{v}\,dt = \frac{m}{2}\int_1^2 \frac{d(\vec{v}\cdot\vec{v})}{dt}\,dt = \frac{m}{2}\int_1^2 d(v^2)$$
$$= \frac{1}{2}mv_2^2 - \frac{1}{2}mv_1^2.$$ (2.379)
In other words, the work done by gravity is equal to the change in the kinetic energy,
$$T = \frac{1}{2}mv^2,$$ (2.380)
of the particle as it moves from point 1 to 2. The result $\vec{\nabla}\times\vec{g} = 0$, obtained for the gravitational field of a point mass M, has several important consequences. First of all, since the gravitational interaction is linear, the gravitational field of an arbitrary mass distribution can be constructed from the gravitational fields of point masses by linear superposition. Hence
$$\vec{\nabla}\times\vec{g} = 0$$ (2.381)
is a general property of Newtonian gravity, independent of the source and the coordinates used.
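Path independence can be seen numerically by integrating $\vec{g}\cdot d\vec{r}$ along two different curves joining the same end points. The sketch below is an illustration only: units with G = M = 1, the end points, and the shape of the detour are all assumptions made here. For a point mass the common value should be $GM(1/r_2 - 1/r_1)$.

```python
import math

G, M = 1.0, 1.0  # assumed units, for illustration only


def g_field(p):
    """Field of a point mass at the origin, evaluated at the point p = (x, y, z)."""
    x, y, z = p
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-G * M * x / r3, -G * M * y / r3, -G * M * z / r3)


def work_per_unit_mass(path, n=20000):
    """Midpoint estimate of the integral of g . dr along path(t), 0 <= t <= 1."""
    total = 0.0
    for k in range(n):
        a, b = path(k / n), path((k + 1) / n)
        g = g_field(path((k + 0.5) / n))
        total += sum(gi * (bi - ai) for gi, ai, bi in zip(g, a, b))
    return total


p1, p2 = (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)


def straight(t):
    """Straight line from p1 to p2."""
    return tuple(a + t * (b - a) for a, b in zip(p1, p2))


def detour(t):
    """Hypothetical second path: the straight line plus a bump that vanishes at both ends."""
    bump = math.sin(math.pi * t)
    return tuple(s + c * bump for s, c in zip(straight(t), (0.5, 0.3, -0.7)))


expected = G * M * (1.0 / math.sqrt(5.0) - 1.0)
print(work_per_unit_mass(straight), work_per_unit_mass(detour), expected)
```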
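2.10.3 Path Independence and Exact Differentials
We have seen that for an arbitrary vector field $\vec{v}$, if the curl is identically zero, then we can write $\vec{v}$ as the gradient of a scalar field, that is, if
$$\vec{\nabla}\times\vec{v} = 0,$$ (2.382)
we can always find a scalar field, $\Phi(\vec{r})$, such that
$$\vec{v} = \vec{\nabla}\Phi.$$ (2.383)
The existence of a differentiable $\Phi$ is guaranteed by the vanishing of the curl of $\vec{v}$, that is, by the conditions
$$\frac{\partial v_1}{\partial x_2} - \frac{\partial v_2}{\partial x_1} = 0,$$ (2.384)
$$\frac{\partial v_2}{\partial x_3} - \frac{\partial v_3}{\partial x_2} = 0,$$ (2.385)
$$\frac{\partial v_3}{\partial x_1} - \frac{\partial v_1}{\partial x_3} = 0.$$ (2.386)
We consider the line integral
$$\int_1^2 \vec{v}\cdot d\vec{r} = \int_1^2 v_1\,dx_1 + v_2\,dx_2 + v_3\,dx_3.$$ (2.387)
If we can find a scalar function, $\Phi(x_1, x_2, x_3)$, such that its partial derivatives are
$$\frac{\partial\Phi}{\partial x_1} = v_1, \quad \frac{\partial\Phi}{\partial x_2} = v_2, \quad \frac{\partial\Phi}{\partial x_3} = v_3,$$ (2.388)
then the line integral [Eq. (2.387)] can be evaluated as
$$\int_1^2 \vec{v}\cdot d\vec{r} = \int_1^2 \left[\frac{\partial\Phi}{\partial x_1}dx_1 + \frac{\partial\Phi}{\partial x_2}dx_2 + \frac{\partial\Phi}{\partial x_3}dx_3\right]$$ (2.389)
$$= \int_1^2 \vec{\nabla}\Phi\cdot d\vec{r}$$ (2.390)
$$= \int_1^2 d\Phi$$ (2.391)
$$= \Phi(2) - \Phi(1).$$ (2.392)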
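As a worked illustration of these steps, consider the following field. It is a hypothetical example chosen here, not one taken from the text: the curl conditions are checked first, a potential is then found by inspection, and the line integral between two arbitrarily chosen points is evaluated both from the end-point values and along a straight path.

```latex
% Hypothetical example field, chosen only to illustrate Eqs. (2.384)-(2.392).
\begin{align*}
\vec{v} &= (2x_1x_2,\; x_1^2 + x_3,\; x_2), \\
\frac{\partial v_1}{\partial x_2} - \frac{\partial v_2}{\partial x_1} &= 2x_1 - 2x_1 = 0, \quad
\frac{\partial v_2}{\partial x_3} - \frac{\partial v_3}{\partial x_2} = 1 - 1 = 0, \quad
\frac{\partial v_3}{\partial x_1} - \frac{\partial v_1}{\partial x_3} = 0 - 0 = 0, \\
\Phi &= x_1^2x_2 + x_2x_3, \qquad
\vec{\nabla}\Phi = (2x_1x_2,\; x_1^2 + x_3,\; x_2) = \vec{v}, \\
\int_{(0,0,0)}^{(1,2,3)} \vec{v}\cdot d\vec{r} &= \Phi(1,2,3) - \Phi(0,0,0) = 8, \\
\text{along } \vec{r} = t\,(1,2,3):\quad
\int_0^1 \left(4t^2 + 2(t^2 + 3t) + 3(2t)\right)dt &= \int_0^1 (6t^2 + 12t)\,dt = 2 + 6 = 8.
\end{align*}
```

Both evaluations agree, as they must for any other path joining the same two points.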
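In other words, when such a $\Phi$ can be found, the value of the line integral, $\int_1^2 \vec{v}\cdot d\vec{r}$, depends only on the values that $\Phi$ takes at the end points, that is, it is path-independent. When such a function exists, $v_1dx_1 + v_2dx_2 + v_3dx_3$ is called an exact differential and can be written as
$$v_1\,dx_1 + v_2\,dx_2 + v_3\,dx_3 = d\Phi.$$ (2.393)
The existence of $\Phi$ is guaranteed by the following sufficient and necessary differentiability conditions:
$$\frac{\partial v_1}{\partial x_2} = \frac{\partial^2\Phi}{\partial x_2\partial x_1} = \frac{\partial^2\Phi}{\partial x_1\partial x_2} = \frac{\partial v_2}{\partial x_1},$$
$$\frac{\partial v_2}{\partial x_3} = \frac{\partial^2\Phi}{\partial x_3\partial x_2} = \frac{\partial^2\Phi}{\partial x_2\partial x_3} = \frac{\partial v_3}{\partial x_2},$$
$$\frac{\partial v_3}{\partial x_1} = \frac{\partial^2\Phi}{\partial x_1\partial x_3} = \frac{\partial^2\Phi}{\partial x_3\partial x_1} = \frac{\partial v_1}{\partial x_3},$$ (2.394)-(2.399)
which are nothing but the conditions [Eqs. (2.384)-(2.386)] for $\vec{\nabla}\times\vec{v} = 0$.
2.10.4 Gravity and Conservative Forces
We are now ready to apply all this to gravitation. Since $\vec{\nabla}\times\vec{g} = 0$, we introduce a scalar function $\Phi$ such that
$$\vec{g} = -\vec{\nabla}\Phi,$$ (2.400)
where $\Phi(\vec{r})$ is called the gravitational potential:
$$\Phi(\vec{r}) = -G\frac{M}{r}.$$ (2.401)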
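The minus sign is introduced to assure that the force is attractive, that is, it is always toward the central mass M. We can now write Equation (2.370) as
$$m\int_1^2 \vec{g}\cdot d\vec{r} = -m\int_1^2 \vec{\nabla}\Phi\cdot d\vec{r}$$ (2.402)
$$= -m[\Phi(2) - \Phi(1)].$$ (2.403)
Using this with Equation (2.379), we can write
$$-m[\Phi(2) - \Phi(1)] = \left[\frac{1}{2}mv^2\right]_2 - \left[\frac{1}{2}mv^2\right]_1.$$ (2.404)
If we rewrite this as
$$\frac{1}{2}mv_1^2 + m\Phi(1) = \frac{1}{2}mv_2^2 + m\Phi(2),$$ (2.405)
we see that the quantity
$$\frac{1}{2}mv^2 + m\Phi(\vec{r}) = E$$ (2.406)
is a constant throughout the motion of the particle. This constant, E, is nothing but the conserved total energy of the particle. The first term, $\frac{1}{2}mv^2$, is the familiar kinetic energy. Hence we interpret $m\Phi(\vec{r})$ as the gravitational potential energy, $\Omega$, of the particle m,
$$\Omega(\vec{r}) = m\Phi(\vec{r}),$$ (2.407)
and write
$$\frac{1}{2}mv^2 + \Omega = E.$$ (2.408)
To justify our interpretation of $\Omega$, consider m at a height of h from the surface of the Earth (Fig. 2.36) and write
$$\Omega = m\Phi(\vec{r}) = -m\frac{GM}{(R + h)}$$ (2.409)
$$= -m\frac{GM}{R}\left(1 + \frac{h}{R}\right)^{-1} \simeq -m\frac{GM}{R} + m\frac{GM}{R^2}h.$$ (2.410)-(2.411)
Since $R \gg h$, dropping the constant term and writing $g = GM/R^2$, we can take the gravitational potential energy as
$$\Omega = mgh.$$ (2.412)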
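The quality of the mgh approximation can be checked with rounded values for the Earth. The sketch below is an illustration only: the numerical constants are approximate textbook values supplied here, not data from this text, and the function names are hypothetical. The exact change in $\Omega$ between the surface and height h is compared with mgh, where $g = GM/R^2$.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (approximate)
M_EARTH = 5.972e24   # mass of the Earth, kg (approximate)
R_EARTH = 6.371e6    # mean radius of the Earth, m (approximate)


def exact_change(m, h):
    """Omega(R + h) - Omega(R) for the potential energy Omega = -G M m / r."""
    return -G * M_EARTH * m / (R_EARTH + h) + G * M_EARTH * m / R_EARTH


def mgh(m, h):
    """Near-surface approximation with g = G M / R^2."""
    g = G * M_EARTH / R_EARTH**2
    return m * g * h


for h in (10.0, 1.0e3, 1.0e5):
    # The ratio equals R / (R + h), so it approaches 1 as h / R goes to 0.
    print(h, exact_change(1.0, h) / mgh(1.0, h))
```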
Figure 2.36 Gravitational potential energy.
From the definition of the gravitational field of a point particle,
$$\vec{g} = -G\frac{M}{r^2}\hat{e}_r,$$ (2.413)
it is seen that operationally the gravitational field at a point is basically the force on a unit test mass. Mathematically, the gravitational field of a mass distribution given by the density $\rho(\vec{r})$ is determined by the field equation
$$\vec{\nabla}\cdot\vec{g} = -4\pi G\rho(\vec{r}),$$ (2.414)
which is also known as Gauss's law for gravitation. Forces with a vanishing curl are called conservative forces. Frictional forces and in general velocity-dependent forces are nonconservative, since the work done by them depends upon the path that the particles follow.
2.10.5 Gravitational Potential
We consider Equation (2.402) again and cancel m on both sides to write
$$\Phi(2) - \Phi(1) = -\int_{1,\,C}^{2} \vec{g}\cdot d\vec{r}$$ (2.415)
or
$$\Phi(2) = \Phi(1) - \int_{1,\,C}^{2} \vec{g}\cdot d\vec{r}.$$ (2.416)
If we choose the initial point 1 at infinity and define the potential there as zero and the final point 2 as the point where we want to find the potential, we obtain the gravitational potential as
$$\Phi(\vec{r}) = -\int_{\infty}^{\vec{r}} \vec{g}\cdot d\vec{r}\,'.$$ (2.417)
From Figure 2.37 it is seen that the integral $-\int_{\infty}^{\vec{r}} \vec{g}\cdot d\vec{r}\,'$ is equal to the work that one has to do to bring a unit test mass infinitesimally slowly from infinity to $\vec{r}$:
$$W = \int_{\infty}^{\vec{r}} \vec{F}_{\text{applied}}\cdot d\vec{r}\,' = -\int_{\infty}^{\vec{r}} \vec{g}\cdot d\vec{r}\,' = \Phi(\vec{r}).$$ (2.418)-(2.420)
Note that for the test mass to move infinitesimally slowly, we have to apply a force by the amount
$$\vec{F}_{\text{applied}} = -\vec{g},$$ (2.421)
so that the test particle does not accelerate towards the source of the gravitational potential. For a point mass M this gives the gravitational potential as
$$\Phi(\vec{r}) = -G\frac{M}{r}.$$ (2.422)
What makes this definition meaningful is that gravity is a conservative field. Hence $\Phi$ is independent of the path we use (Fig. 2.37). We can now use
$$\vec{g} = -\vec{\nabla}\Phi$$ (2.423)
to write the gravitational field equation [Eq. (2.414)] as
$$\vec{\nabla}\cdot\vec{\nabla}\Phi = 4\pi G\rho,$$ (2.424)
or as
$$\vec{\nabla}^2\Phi = 4\pi G\rho,$$ (2.425)
which is Poisson's equation. In a region where there is no mass, the equation to be solved is
$$\vec{\nabla}^2\Phi = 0,$$ (2.426)
which is the Laplace equation, and the operator $\vec{\nabla}^2$ is called the Laplacian. The advantage of working with the gravitational potential is that it is a scalar and hence has only magnitude, which makes it easier to work with.
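That the point-mass potential satisfies the Laplace equation away from the mass can be checked with finite differences. The sketch below is an illustration only: units with G = M = 1, the step size, and the names `phi` and `laplacian` are assumptions made here.

```python
def phi(x, y, z, G=1.0, M=1.0):
    """Potential of a point mass at the origin: Phi = -G M / r."""
    return -G * M / (x * x + y * y + z * z) ** 0.5


def laplacian(f, x, y, z, h=1e-3):
    """Second-order central-difference estimate of the Laplacian of f at (x, y, z)."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / (h * h)


# Away from the origin there is no mass, so the Laplacian of Phi should vanish.
print(laplacian(phi, 1.0, 2.0, 2.0))
# Sanity check on a function with a known Laplacian: x^2 + y^3 gives 2 + 6 y.
print(laplacian(lambda x, y, z: x * x + y**3, 1.0, 2.0, 3.0))
```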
Figure 2.37 Gravitational potential.
Since gravity is a linear interaction, we can write the potential of N particles by linear superposition of the potentials of the individual particles that make up the system as
$$\Phi(\vec{r}) = -G\sum_{i=1}^{N}\frac{m_i}{|\vec{r} - \vec{r}_i|},$$ (2.427)
where $\vec{r}_i$ is the position of the ith particle and $\vec{r}$ is called the field point. In the case of a continuous mass distribution, we write the potential as an integral:
$$\Phi(\vec{r}) = -G\int_V \frac{\rho(\vec{r}\,')\,d^3\vec{r}\,'}{|\vec{r} - \vec{r}\,'|},$$ (2.428)
where the volume integral is over the source points $\vec{r}\,'$. After $\Phi(\vec{r})$ is found, one can construct the gravitational field easily by taking its gradient, which involves only differentiation.
2.10.6 Gravitational Potential Energy of a System
For a pair of particles, gravitational potential energy is written as [Eqs. (2.401) and (2.407)]
$$\Omega = -G\frac{Mm}{r},$$ (2.429)
where r is the separation between the particles. For a system of N discrete particles we can consider the system as made up of pairs and write the gravitational potential energy in the following equivalent ways:
$$\Omega = -G\sum_{\text{All pairs},\ i\ne j}\frac{m_im_j}{r_{ij}}, \quad r_{ij} = |\vec{r}_i - \vec{r}_j|,$$ (2.430)
$$= -G\sum_{i>j}\frac{m_im_j}{r_{ij}}$$ (2.431)
$$= -\frac{G}{2}\sum_{i=1}^{N}\sum_{j\ne i}\frac{m_im_j}{r_{ij}}.$$ (2.432)
We have written $\Omega$ in three different ways. First of all, we do not include the cases with i = j, which are not even pairs. These terms basically correspond to the self energies of the particles that make up the system. We leave them out since they contribute as a constant that does not change with the changing configuration of the system. The factor of 1/2 is inserted in the last expression to avoid double counting of the pairs. Note that $\Omega$ can also be written as
$$\Omega = \frac{1}{2}(m_1\Phi_1 + m_2\Phi_2 + \cdots + m_N\Phi_N)$$ (2.433)
$$= \frac{1}{2}\sum_{i=1}^{N}m_i\Phi_i = -\frac{G}{2}\sum_{i=1}^{N}\sum_{j\ne i}\frac{m_im_j}{r_{ij}},$$ (2.434)
where $\Phi_i$ is the gravitational potential at the location of the particle $m_i$ due to all other particles. If the particles form a continuum with the density $\rho$, we then write
$$\Omega = \frac{1}{2}\int_M \Phi\,dm$$ (2.435)
$$= \frac{1}{2}\int_V \Phi(\vec{r}\,')\rho(\vec{r}\,')\,d^3\vec{r}\,',$$ (2.436)
where $\Phi$ is the potential of the part of the system with the mass $M - dm$ acting on $dm = \rho\,d^3\vec{r}\,'$.
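The equivalence of the pair sum and the half sum over $m_i\Phi_i$ can be checked directly for a small system. The sketch below is an illustration only: the function names, the choice G = 1, and the three-particle configuration are assumptions made here.

```python
import math


def potential_energy_pairs(masses, positions, G=1.0):
    """Omega = -G * (sum over distinct pairs of m_i m_j / r_ij)."""
    total = 0.0
    n = len(masses)
    for i in range(n):
        for j in range(i + 1, n):
            total -= G * masses[i] * masses[j] / math.dist(positions[i], positions[j])
    return total


def potential_energy_half_sum(masses, positions, G=1.0):
    """Omega = (1/2) * sum_i m_i Phi_i, where Phi_i is due to all the other particles."""
    total = 0.0
    n = len(masses)
    for i in range(n):
        phi_i = sum(-G * masses[j] / math.dist(positions[i], positions[j])
                    for j in range(n) if j != i)
        total += 0.5 * masses[i] * phi_i
    return total


masses = [1.0, 2.0, 3.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(potential_energy_pairs(masses, positions),
      potential_energy_half_sum(masses, positions))
```

The pair form does roughly half the work of the half-sum form, which visits every pair twice; that double visit is exactly what the factor of 1/2 compensates.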
Example 2.12. Gravitational potential energy of a uniform sphere: For a spherically symmetric mass distribution with density $\rho(r)$ and radius R we can write the gravitational potential energy as
$$\Omega(R) = \frac{1}{2}\int_M \Phi\,dm$$ (2.437)
$$= -G\int_0^R \frac{m(r)\,dm}{r},$$ (2.438)
where m(r) is the mass inside the radius r, and dm is the mass of the shell with radius r and thickness dr:
$$dm = 4\pi r^2\rho(r)\,dr.$$ (2.439)
For uniform density $\rho_0$ we write $\Omega$ as
$$\Omega(R) = -G\int_0^R \frac{(4\pi\rho_0r^3/3)\,4\pi r^2\rho_0\,dr}{r},$$ (2.440)
which gives
$$\Omega(R) = -\frac{3GM^2}{5R}.$$ (2.441)
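The shell-by-shell integral for the uniform sphere can be evaluated numerically and compared with the closed form. The sketch below is an illustration only: the function name `binding_energy_uniform` and the sample values of R and $\rho_0$ (with G = 1) are assumptions made here.

```python
import math


def binding_energy_uniform(R, rho0, G=1.0, n=20000):
    """Omega(R) = -G * integral_0^R m(r) dm / r for uniform density rho0 (midpoint rule)."""
    h = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        m_r = 4.0 * math.pi * rho0 * r**3 / 3.0  # mass inside radius r
        dm = 4.0 * math.pi * r * r * rho0 * h    # mass of the shell of thickness h
        total -= G * m_r * dm / r
    return total


R, rho0 = 2.0, 0.5  # hypothetical sphere
M = 4.0 * math.pi * rho0 * R**3 / 3.0
print(binding_energy_uniform(R, rho0), -3.0 * M**2 / (5.0 * R))
```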
Because of the minus sign, the amount of work that one has to do to disassemble this object by taking its particles to infinity is $|\Omega(R)| = 3GM^2/5R$.
2.10.7 Helmholtz Theorem
We now introduce an important theorem due to Helmholtz, which is a central part of potential theory.
Theorem 2.2. A vector field, if it exists, is uniquely determined in a region R surrounded by the closed surface S by giving its divergence and curl in R and its normal component on S.
Proof: Assume that there are two fields, $\vec{v}_1$ and $\vec{v}_2$, that satisfy the required conditions, that is, they have the same divergence, curl, and normal component. We now need to show that if this is the case, then these two fields must be identical. Since the divergence, the curl, and the dot product are all linear operators, we define a new field $\vec{w}$ as
$$\vec{w} = \vec{v}_1 - \vec{v}_2,$$ (2.442)
which satisfies
$$\vec{\nabla}\times\vec{w} = 0 \text{ in } R,$$ (2.443)
$$\vec{\nabla}\cdot\vec{w} = 0 \text{ in } R,$$ (2.444)
$$\hat{n}\cdot\vec{w} = 0 \text{ on } S.$$ (2.445)
Since $\vec{\nabla}\times\vec{w} = 0$, we can introduce a scalar potential $\Phi$ as
$$\vec{w} = -\vec{\nabla}\Phi.$$ (2.446)
Using Green's second identity [Eq. (2.326)]:
$$\oint_S d\vec{\sigma}\cdot u\vec{\nabla}v = \int_V d\tau\,[\vec{\nabla}u\cdot\vec{\nabla}v + u\vec{\nabla}^2v],$$ (2.447)
with the substitution $u = v = \Phi$, we write
$$\oint_S d\vec{\sigma}\cdot\Phi\vec{\nabla}\Phi = \int_V d\tau\,[\vec{\nabla}\Phi\cdot\vec{\nabla}\Phi + \Phi\vec{\nabla}^2\Phi].$$ (2.448)
When Equation (2.446) is substituted, this becomes
$$\oint_S d\vec{\sigma}\cdot(\Phi\vec{w}) = \int_V d\tau\,[-\vec{w}\cdot\vec{w} + \Phi\vec{\nabla}\cdot\vec{w}].$$ (2.449)
Since the first integral,
$$\oint_S d\vec{\sigma}\cdot\Phi\vec{w} = \oint_S d\sigma\,\Phi(\hat{n}\cdot\vec{w}),$$ (2.450)
is zero because of Equation (2.445) and the integral
$$\int_V d\tau\,\Phi\vec{\nabla}\cdot\vec{w}$$ (2.451)
is zero because of Equation (2.444), Equation (2.449) reduces to
$$\int_V d\tau\,|\vec{w}|^2 = 0.$$ (2.452)
Since $|\vec{w}|^2$ is never negative, the only way to satisfy this equation for a finite volume is to have
$$\vec{w} = 0,$$ (2.453)
that is,
$$\vec{v}_1 = \vec{v}_2,$$ (2.454)
thereby proving the theorem.
2.10.8 Applications of the Helmholtz Theorem

The Helmholtz theorem says that a vector field is completely and uniquely specified by giving its divergence, curl, and normal component on the bounding surface. When we are interested in the entire space, the bounding surface is usually taken as a sphere in the limit as its radius goes to infinity. Given a vector field $\vec v$, we write its divergence and curl as
$$\vec\nabla \cdot \vec v = k_1 \rho(\vec r), \tag{2.455}$$
$$\vec\nabla \times \vec v = k_2 \vec J(\vec r), \tag{2.456}$$
where $k_1$ and $k_2$ are constants. The terms on the right-hand side, $\rho(\vec r)$ and $\vec J(\vec r)$, are known functions of position and in general represent sources and current densities, respectively. There are three cases that we analyze separately:

(I) In cases for which there are no currents, the field satisfies
$$\vec\nabla \cdot \vec v = k_1 \rho(\vec r), \tag{2.457}$$
$$\vec\nabla \times \vec v = 0. \tag{2.458}$$
We have already shown that when the curl of a vector field is zero, we can always find a scalar potential, $\Phi(\vec r)$, such that
$$\vec v = -\vec\nabla\Phi. \tag{2.459}$$
Now the second equation [Eq. (2.458)] is satisfied automatically and the first equation can be written as Poisson's equation
$$\nabla^2\Phi = -k_1\rho, \tag{2.460}$$
the solution of which can be written as
$$\Phi(\vec r) = \frac{k_1}{4\pi}\int_V \frac{\rho(\vec r\,')}{|\vec r - \vec r\,'|}\, d^3\vec r\,', \tag{2.461}$$
where the volume integral is over the source variable $\vec r\,'$ and $\vec r$ is the field point. Notice that the definition of scalar potential [Eq. (2.459)] is arbitrary up to an additive constant, which means we are free to choose the zero level of the potential.

(II) In cases where $\rho(\vec r) = 0$, the field equations become
$$\vec\nabla \cdot \vec v = 0, \tag{2.462}$$
$$\vec\nabla \times \vec v = k_2 \vec J(\vec r). \tag{2.463}$$
We now use the fact that the divergence of a curl is zero and introduce a vector potential $\vec A(\vec r)$ such that
$$\vec v = \vec\nabla \times \vec A. \tag{2.464}$$
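We have already proven that the divergence of a curl vanishes identically. We now prove the converse, that is: if the divergence of a vector field $\vec v$ vanishes identically, then we can always find a vector potential $\vec A$ such that its curl gives $\vec v$. Since we want $\vec A$ to satisfy Equation (2.464), we can write
$$\frac{\partial A_3}{\partial x_2} - \frac{\partial A_2}{\partial x_3} = v_1, \tag{2.465}$$
$$\frac{\partial A_1}{\partial x_3} - \frac{\partial A_3}{\partial x_1} = v_2, \tag{2.466}$$
$$\frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2} = v_3. \tag{2.467}$$
Remembering that the curl of a gradient is zero [Eq. (2.320)], we can always add or subtract the gradient of a scalar function to the vector potential $\vec A$,
$$\vec A \rightarrow \vec A + \vec\nabla\lambda, \tag{2.468}$$
without affecting the field $\vec v$. This gives us the freedom to set one of the components of $\vec A$ to zero. Hence we set $A_3 = 0$, which simplifies Equations (2.465)-(2.467) to
$$-\frac{\partial A_2}{\partial x_3} = v_1, \tag{2.469}$$
$$\frac{\partial A_1}{\partial x_3} = v_2, \tag{2.470}$$
$$\frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2} = v_3. \tag{2.471}$$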
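The first two equations can be integrated immediately to yield
$$A_1 = \int_{x_{03}}^{x_3} v_2\, dx_3, \tag{2.472}$$
$$A_2 = -\int_{x_{03}}^{x_3} v_1\, dx_3 + f_2(x_1, x_2), \tag{2.473}$$
where $f_2(x_1, x_2)$ is arbitrary at this point. Substituting these into the third equation [Eq. (2.471)], we obtain
$$-\int_{x_{03}}^{x_3} \left[\frac{\partial v_1}{\partial x_1} + \frac{\partial v_2}{\partial x_2}\right] dx_3 + \frac{\partial f_2}{\partial x_1} = v_3. \tag{2.474}$$
Using the fact that the divergence of $\vec v$ is zero, that is,
$$\frac{\partial v_1}{\partial x_1} + \frac{\partial v_2}{\partial x_2} + \frac{\partial v_3}{\partial x_3} = 0, \tag{2.475}$$
we can write Equation (2.474) as
$$\int_{x_{03}}^{x_3} \frac{\partial v_3}{\partial x_3}\, dx_3 + \frac{\partial f_2}{\partial x_1} = v_3. \tag{2.476}$$
The integral in Equation (2.476) can be evaluated immediately to give
$$v_3(x_1, x_2, x_3) - v_3(x_1, x_2, x_{03}) + \frac{\partial f_2(x_1, x_2)}{\partial x_1} = v_3(x_1, x_2, x_3), \tag{2.477}$$
which yields $f_2(x_1, x_2)$ as the quadrature
$$f_2(x_1, x_2) = \int_{x_{01}}^{x_1} v_3(x_1, x_2, x_{03})\, dx_1.$$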
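Substituting $f_2$ into Equation (2.473) we obtain the vector potential as
$$A_1 = \int_{x_{03}}^{x_3} v_2(x_1, x_2, x_3)\, dx_3, \tag{2.480}$$
$$A_2 = \int_{x_{01}}^{x_1} v_3(x_1, x_2, x_{03})\, dx_1 - \int_{x_{03}}^{x_3} v_1(x_1, x_2, x_3)\, dx_3, \tag{2.481}$$
$$A_3 = 0. \tag{2.482}$$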
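The construction above can be tried numerically. The following is a rough sketch, assuming a hypothetical divergence-free test field $\vec v = (x_2, x_3, x_1)$ and the reference point $(x_{01}, x_{02}, x_{03}) = (0, 0, 0)$, neither of which comes from the text. It builds $A_1$ and $A_2$ by quadrature with $A_3 = 0$ and checks, with central differences, that the curl of $\vec A$ reproduces $\vec v$. The helper names `simpson`, `vector_potential`, and `curl` are illustrative.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def vector_potential(v, x0=(0.0, 0.0, 0.0)):
    """Return A(x1, x2, x3) with A3 = 0, built by integrating v2, v3, and v1."""
    x01, _, x03 = x0
    def A(x1, x2, x3):
        A1 = simpson(lambda s: v(x1, x2, s)[1], x03, x3)
        A2 = (simpson(lambda s: v(s, x2, x03)[2], x01, x1)
              - simpson(lambda s: v(x1, x2, s)[0], x03, x3))
        return (A1, A2, 0.0)
    return A

def curl(A, x, h=1e-3):
    """Central-difference curl of the vector function A at the point x."""
    def d(i, j):  # approximates dA_i/dx_j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (A(*xp)[i] - A(*xm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

# Hypothetical divergence-free field (an assumption for illustration).
v = lambda x1, x2, x3: (x2, x3, x1)
A = vector_potential(v)
point = (0.7, -0.4, 1.2)
curl_A = curl(A, point)
print(curl_A, v(*point))
```

For this field the quadratures give $A_1 = x_3^2/2$ and $A_2 = x_1^2/2 - x_2 x_3$, so the numerical curl should match $\vec v$ up to rounding.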
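In conclusion, given a vector field $\vec v$ satisfying Equations (2.462) and (2.463), we can always find a vector potential $\vec A$ such that
$$\vec v = \vec\nabla \times \vec A, \tag{2.483}$$
where $\vec A$ is arbitrary up to the gradient of a scalar function.

Using a vector potential $\vec A$ [Eq. (2.464)], we can now write Equation (2.463) as
$$\vec\nabla \times \vec v = k_2 \vec J(\vec r), \tag{2.484}$$
$$\vec\nabla \times \vec\nabla \times \vec A = k_2 \vec J(\vec r), \tag{2.485}$$
$$\vec\nabla(\vec\nabla \cdot \vec A) - \nabla^2 \vec A = k_2 \vec J(\vec r). \tag{2.486}$$
Using the freedom in the choice of $\vec A$ we can set
$$\vec\nabla \cdot \vec A = 0, \tag{2.487}$$
which is called the Coulomb gauge in electrodynamics. The equation to be solved for the vector potential is now obtained as
$$\nabla^2 \vec A = -k_2 \vec J(\vec r). \tag{2.488}$$
Since the Laplace operator is linear, each Cartesian component of the vector potential satisfies Poisson's equation,
$$\nabla^2 A_i = -k_2 J_i(\vec r), \quad i = 1, 2, 3; \tag{2.489}$$
hence we can write its solution as
$$\vec A(\vec r) = \frac{k_2}{4\pi}\int_V \frac{\vec J(\vec r\,')}{|\vec r - \vec r\,'|}\, d^3\vec r\,'. \tag{2.490}$$

(III) In the general case, where the field equations are given as
$$\vec\nabla \cdot \vec v = k_1 \rho(\vec r), \tag{2.491}$$
$$\vec\nabla \times \vec v = k_2 \vec J(\vec r), \tag{2.492}$$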
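we can write the field in terms of the potentials $\Phi$ and $\vec A$ as
$$\vec v = -\vec\nabla\Phi + \vec\nabla \times \vec A. \tag{2.493}$$
Substituting this into the first equation [Eq. (2.491)] and using the fact that the divergence of a curl is zero, we obtain
$$-\vec\nabla \cdot \vec\nabla\Phi + \vec\nabla \cdot (\vec\nabla \times \vec A) = k_1\rho, \tag{2.494}$$
$$\nabla^2\Phi = -k_1\rho. \tag{2.495}$$
Similarly, substituting Equation (2.493) into the second equation [Eq. (2.492)], we get
$$\vec\nabla \times (-\vec\nabla\Phi + \vec\nabla \times \vec A) = k_2 \vec J(\vec r), \tag{2.496}$$
$$-\vec\nabla \times \vec\nabla\Phi + \vec\nabla \times \vec\nabla \times \vec A = k_2 \vec J(\vec r), \tag{2.497}$$
$$\vec\nabla(\vec\nabla \cdot \vec A) - \nabla^2 \vec A = k_2 \vec J(\vec r), \tag{2.498}$$
where we used the fact that the curl of a gradient is zero. Using the Coulomb gauge ($\vec\nabla \cdot \vec A = 0$) and Equation (2.495), we obtain the two equations to be solved for the potentials as
$$\nabla^2\Phi = -k_1\rho, \tag{2.499}$$
$$\nabla^2 \vec A = -k_2 \vec J. \tag{2.500}$$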
2.10.9 Examples from Physics

Gravitation: We have already discussed this case in detail. The field equations are given as
$$\vec\nabla \cdot \vec g = -4\pi G\rho(\vec r), \tag{2.501}$$
$$\vec\nabla \times \vec g = 0, \tag{2.502}$$
where $\rho(\vec r)$ is the source of the gravitational field, that is, the mass density. Instead of these two equations, we can solve Poisson's equation,
$$\nabla^2\Phi = 4\pi G\rho(\vec r), \tag{2.503}$$
for the scalar potential $\Phi$, which then can be used to find the gravitational field by
$$\vec g = -\vec\nabla\Phi. \tag{2.504}$$

Electrostatics: In electrostatics the field equations for the electric field are given as
$$\vec\nabla \cdot \vec E = 4\pi\rho(\vec r), \tag{2.505}$$
$$\vec\nabla \times \vec E = 0. \tag{2.506}$$
Now, $\rho(\vec r)$ stands for the charge density and the plus sign in Equation (2.505) means like charges repel and opposite charges attract. Poisson's equation for the electrostatic potential is
$$\nabla^2\Phi = -4\pi\rho(\vec r), \tag{2.507}$$
where
$$\vec E = -\vec\nabla\Phi. \tag{2.508}$$
Magnetostatics: Now the field equations for the magnetic field are given as
$$\vec\nabla \cdot \vec B = 0, \tag{2.509}$$
$$\vec\nabla \times \vec B = \frac{4\pi}{c}\vec J, \tag{2.510}$$
where $c$ is the speed of light and $\vec J$ is the current density. The fact that the divergence of $\vec B$ is zero is a direct consequence of the absence of magnetic monopoles in nature. Introducing a vector potential $\vec A$ and with the Coulomb gauge, $\vec\nabla \cdot \vec A = 0$, we can solve
$$\nabla^2 \vec A = -\frac{4\pi}{c}\vec J \tag{2.511}$$
and obtain the magnetic field via
$$\vec B = \vec\nabla \times \vec A. \tag{2.512}$$
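Maxwell's equations: The time-dependent Maxwell's equations are given as
$$\vec\nabla \cdot \vec E = 4\pi\rho, \tag{2.513}$$
$$\vec\nabla \times \vec E + \frac{1}{c}\frac{\partial \vec B}{\partial t} = 0, \tag{2.514}$$
$$\vec\nabla \cdot \vec B = 0, \tag{2.515}$$
$$\vec\nabla \times \vec B - \frac{1}{c}\frac{\partial \vec E}{\partial t} = \frac{4\pi}{c}\vec J. \tag{2.516}$$
These equations are coupled and have to be considered simultaneously. We now introduce the potentials $\Phi$ and $\vec A$ such that
$$\vec E = -\vec\nabla\Phi - \frac{1}{c}\frac{\partial \vec A}{\partial t}, \tag{2.517}$$
$$\vec B = \vec\nabla \times \vec A, \tag{2.518}$$
and use the Lorenz gauge:
$$\frac{1}{c}\frac{\partial \Phi}{\partial t} + \vec\nabla \cdot \vec A = 0. \tag{2.519}$$
Hence Maxwell's equations reduce to
$$\nabla^2\Phi - \frac{1}{c^2}\frac{\partial^2 \Phi}{\partial t^2} = -4\pi\rho, \tag{2.520}$$
$$\nabla^2 \vec A - \frac{1}{c^2}\frac{\partial^2 \vec A}{\partial t^2} = -\frac{4\pi}{c}\vec J. \tag{2.521}$$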
Applications of potential theory to electromagnetic theory can be found in Griffiths and Inan & Inan (a,b).

Irrotational flow of incompressible fluids: For flow problems the continuity equation is given as
$$\vec\nabla \cdot \vec J + \frac{\partial \rho}{\partial t} = 0, \tag{2.522}$$
where $\vec J$ is the current density and $\rho$ is the density of the flowing material. In general, the current density can be written as
$$\vec J = \rho\vec v, \tag{2.523}$$
where $\vec v$ is the velocity field of the fluid. Hence the continuity equation becomes
$$\vec\nabla \cdot (\rho\vec v) + \frac{\partial \rho}{\partial t} = 0. \tag{2.524}$$
For stationary flows, $\partial\rho/\partial t = 0$. If we also assume incompressible fluids, that is, $\rho$ = constant, the continuity equation reduces to
$$\vec\nabla \cdot \vec v = 0. \tag{2.525}$$
However, from the Helmholtz theorem we know that this is not sufficient to determine the velocity field $\vec v$. If we also assume irrotational flow, which means
$$\vec\nabla \times \vec v = 0, \tag{2.526}$$
we can introduce the velocity potential $\Phi$:
$$\vec v = \vec\nabla\Phi. \tag{2.527}$$
Substituting this into Equation (2.525), we obtain the Laplace equation
$$\nabla^2\Phi = 0. \tag{2.528}$$
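As a small numerical illustration of the velocity potential, the sketch below assumes the hypothetical potential $\Phi = x^2 - y^2$ (chosen only because it is harmonic; it does not appear in the text). It computes $\vec v = \vec\nabla\Phi$ by central differences and checks that the Laplacian of $\Phi$ vanishes. The helper names `gradient` and `laplacian` are illustrative.

```python
def gradient(phi, p, h=1e-4):
    """Central-difference gradient of the scalar function phi at point p."""
    g = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((phi(*plus) - phi(*minus)) / (2 * h))
    return g

def laplacian(phi, p, h=1e-3):
    """Second-difference Laplacian of phi at point p."""
    total = 0.0
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        total += (phi(*plus) - 2.0 * phi(*p) + phi(*minus)) / h**2
    return total

# Hypothetical velocity potential (an assumption, not from the text).
phi = lambda x, y, z: x**2 - y**2
point = (0.5, -1.0, 2.0)
velocity = gradient(phi, point)    # analytically (2x, -2y, 0)
residual = laplacian(phi, point)   # analytically 0
print(velocity, residual)
```

Because $\vec v$ is a gradient, its curl vanishes automatically; the vanishing Laplacian is the incompressibility condition of Equation (2.525).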
PROBLEMS
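1. Using coordinate representation, show that
$$(\vec A \times \vec B) \cdot (\vec C \times \vec D) = (\vec A \cdot \vec C)(\vec B \cdot \vec D) - (\vec A \cdot \vec D)(\vec B \cdot \vec C).$$

2. Using the permutation symbol, show that the $i$th component of the cross product of two vectors, $\vec A$ and $\vec B$, can be written as
$$(\vec A \times \vec B)_i = \sum_{j=1}^{3}\sum_{k=1}^{3} \epsilon_{ijk} A_j B_k.$$

3. Prove the triangle inequality
$$|\vec A + \vec B| \le |\vec A| + |\vec B|.$$

4. Prove the following vector identity, which is also known as the Jacobi identity:
$$\vec A \times (\vec B \times \vec C) + \vec B \times (\vec C \times \vec A) + \vec C \times (\vec A \times \vec B) = 0.$$

5. Show that
$$(\vec A \times \vec B) \times (\vec C \times \vec D) = (\vec A \cdot \vec B \times \vec D)\,\vec C - (\vec A \cdot \vec B \times \vec C)\,\vec D.$$

6. Show that for three vectors, $\vec A$, $\vec B$, and $\vec C$, to be noncoplanar the necessary and sufficient condition is
$$\vec A \cdot (\vec B \times \vec C) \ne 0.$$

7. Find a parametric equation for the line passing through the points
(a) (2, 2, -2) and (-3, 1, 4),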
(b) (-1, 4,3) and (4, -3,l). 8. Find the equation of the line orthogonal to i
(a) A = (1, -11,
(b)
3 = (-5,
21,
2 = ( 2 , -l), 7= (4,2).
9. Show that the lines 221 - 3 2 2 =
and
1
2 and passing through 7:
are not orthogonal. What is the angle between them?
10. Find the equation of the plane including the point normal 3:
3 and
with the
3 = (2,1, -11, 3 = (1, I, 21, (b) ? = (2,3,5), 3 = (-1,1,2).
(a)
11. Find the equation of the plane passing through the following three points: (a) (2,1,1), (4,1, -1) and ( L 2 , 21, (b) (-5, -1,2),(2,1, -1) and ( 3 , -1,2).
12. (a) Find a vector parallel t o the line of intersection of 4x1 - 2x2
+ 2x3 = 2
and
6x1
+ 2x2 + 2x3 = 4.
(b) Find a parametric equation for the line of intersection of the above planes.
13. Find the angle between the planes
14. Find the distance between the point
3x1
2 = (1,1,2) and the plane
+ 2 2 - 3x3 = 2.
15. Let P and Q be two points in n-space. Find the general expression for the midpoint of the line segment joining the two points. 16. If T ( t )and ?(t) are two differentiable vectors, then show that
(a) d?(t)
d
dz+(t)
x ?(t)] = ?(t) x -+ -x ?(t), d t [T(t) dt dt
d
+
- [ T ( t )x x ( t ) ]= 2 ( t ) x dt
Z(t).
17. Given the parametric equation of a space curve, namely, = cost,
21
22
= sint,
23 =
2sin2t,
(a) sketch the curve,
(b) find the equation of the tangent line at the point P with t = 71.13, (c) find the equation of a plane orthogonal to the curve at P, (d) show that the curve lies in the surface
2: - 2;
+ 2 3 = 1.
18. For the following surfaces find the tangent planes and the normal lines at the points indicated: (a)
2::
(b)
z:
+ zi + zz = 6 at (1,1,2), + 2+ 2x: = 2 at (1, I, I ) , 5122
z22: -
19. Find the directional derivative of F ( 2 1 , 2 2 , 2 3 )=
22,2 +z,2
-
2 23
in the direction of the line from (1,2,3) to (3,5,1) at the point ( 1 , 2 , 3 ) .
20. For a general point, evaluate d F l d n for F = zyz, where n is the outer normal to the surface
x; + 22;
+ 42; = 4.
21. Determine the points and the directions for which the change in
f = 2xq
+ 2; + 5 3
is greatest if the point is restricted to lie on x: 22. Prove the following:
+ x ; = 2.
23. Prove the following properties of the divergence and the curl operators: (a)
d.(X+3)=V.X+V.Z,
(b)
$ . ( + x )= $ $ . x + $ $ . x ,
(c)
dx(X++VxX+$xZ,
(d)
dx
(42)
=4
7 x X + V +x
2.
24. Show that the following vector fields have zero curl and find a scalar function @ such that 3 = q@: (a)
(b)
+ 2yzz&, + y2zZz, 3 = ( 3 2 ’ ~+ z2y)Ez + (z3+ z2z)Zy + 2zxyZz
d
= y2zZz
25. Using the vector field + v = x 2 yze,- - 2x3y3Zv show that
+ xy2zZz,
9 .9 x 3 = 0.
26. If 7;’ is the position vector, show the following:
=.
7..=3,
VtX=O,
(3.7)3. 27. Using the following scalar functions, show that (a) (b)
CP
T X?@
= 0:
= exy cos z ,
1
= (z2
+ +z y2
y 2 .
28. An important property of the permutation symbol is given as 3
EijkEilm
= SjlSkrn
-
Sjrnbkl.
i=l
A general proof is difficult, but check the identity for the following specific values:
j=k=1, j=l=l,
k=m=2.
29. Prove the following vector identities:
30. Write the gradient and the Laplacian for the following scalar fields:
+ = ln(x2 + y2 + z 2 ) ,
(a) (b)
=
(c)
@=
1
(x2
+ y2 + z2)1/2 ’
J2qT
31. Evaluate the following line integrals, where the paths are straight lines connecting the end points:
(b)
y dx
+ x dy.
32. Evaluate the line integral
I = L y 2 d x + x 2 dy over the semicircle centered a t the origin and with the unit radius in the upper half-plane.
33. Evaluate
I over the parabola y
=
=
x2.
34. Evaluate
over a circle of radius 2 .
i;;;)+ y dx
x2 dy
35. Evaluate the line integral
over the curve y = ex - ex5
36. Evaluate J J ,
+ 22.
2 .2do, where
and S is the portion of the plane 2x + 2y octant and 2 is the unit normal to S.
+ z = 6 included in the first
37. Evaluate
over y
=
x2 + 2x
-
2.
38. Evaluate
where C is the square with the vertices (1,l),(-1, l),(-1, -l), (1, -1).
39. Evaluate the line integral
I
=
y2dx
+x2dy
over the full circle x2 + y2 = 1.
40. Evaluate over the indicated paths by using Green’s theorem:
41. Evaluate
(a)
I = f c y 2 d z + x y dy, x 2 + y 2
(b)
I = jC(2z3 - y3) dx
(c)
1 = fc f(z) dx
= 1,
+ (x3+ 2y3) d y ,
+ g(y) d y ,
x2
+ y2 = 1,
any closed path.
where 4
v =
( 2+ y2)Zz + 2xyZy
over y = x3 from (O,O) to ( I , 1).
42. Use Green’s theorem to evaluate
I = jhcvnds, where
2 = ( 2+ $)ZZ and C is the circle x2
+ 2zyZy
+ y2 = 2. +
43. Given the vector field 2 = -3yZZ 2zZy line integral by using Stokes’s theorem:
+ Z2,evaluate the following
where C is the circle x2 + y2 = 1, z = 1.
44. Using Stokes’s theorem, evaluate
+
h [ y 2 d z z2dy
+ x’dz],
where C is the triangle with vertices at (O,O, 0), (0, a,0) and (O,O, a) 45. Usc Stokes’s theorem to evaluate
I = {8xy2z dx
+ 8x2yz dy + (4x2y2
around the path x = cost, y = sint, z
= sint,
-
22) d z
where t E [ 0 , 2 ~ ] .
46. Evaluate the integral id*.?? for the surface of a sphere with radius R and centered at the origin in two different ways. Take 7? as (a)
+ v = zZZ +yey + z Z z ,
47. Given the temperature distribution 2 T(z1,52,53)= z 1
+ 22122 +
2;23,
(a) determine the direction of heat flow at ( 1 , 2 , l),
(b) find the rate of change of temperature at (1,2,2) in the direction of h
e2
+ Z3.
48. Evaluate f(22
-
y
+ 4) dx + (5y + 32
-
6) dy
around a triangle in the zy-plane with the vertices at (O,O), (3,0), ( 3 , 2 ) traversed in the counterclockwise direction. 49. Use Stokes’s theorem to evaluate
over x3 = 9
-
2 21 -
x; 2 0.
50. Obtain Green’s second identity:
51. Evaluate the following integrals, where S is the surface of the sphere z2 y2 z 2 = a2 :
+ +
f S
[z3 cos
oz,n + y3 c o ey,n ~ + 2 3 cos e,,,]
do
and
For the vector field in the second part plot on the zy-plane and interpret your result.
52. Verify the divergence theorem for +
+
A = ( 2 2 ~ z)Ez
+ g2Ev
-
(X
+ 3y)E2
taken over the region bounded by the planes 2 ~ + 2 y + z = 6 , z=O, y=O, z = O . 53. Prove that
is a conservative field and find the work done between (3, -2,2) and -1).
54. Without using the divergence theorem, show that the gravitational force on a test particle inside a spherical shell of radius R is zero. Discuss your answer using the divergence theorem. 55. Without using the divergence theorem, find the gravitational field outside a uniform spherical mass of radius R. Repeat the same calculation with the gravitational potential and verify your answer obtained in the first part. Interpret your results using the divergence theorem. 56. Without using the divergence theorem, find the gravitational field for an internal point of a uniform spherical mass of radius R. Repeat the same calculation for the gravitational potential and verify your answer obtained in the first part. Discuss your results in terms of the divergence theorem.
57. Assume that gravitation is still represented by Gauss’s law in a universe with four spatial dimensions. What would be Newton’s law in this universe? Would circular orbits be stable? Note: You may ignore this problem. It is an advanced but fun problem that does not require a lot of calculation. However, if you want to attempt it, you may want to read Goldstein, Poole, and Safko on central forces first.
Hint: The surface area of a sphere in four dimensions is $2\pi^2 R^3$. In three dimensions it is $4\pi R^2$.
CHAPTER 3
GENERALIZED COORDINATES AND TENSORS
Scalar quantities are defined at a point by just giving a single number. Hence they have only magnitude. Vector quantities are geometrically defined as directed line segments, which have both magnitude and direction. By assigning a vector to each point in space we obtain a vector field. Similarly, a scalar field is defined. The field concept is one of the most fundamental concepts of theoretical physics. In working with scalars or vectors, it is important that we first choose a suitable coordinate system. A proper choice of coordinates, one that reflects the symmetries of the physical system, simplifies the algebra and the interpretation of the solution significantly. In this chapter, we start with Cartesian coordinates and their transformation properties. We then show how a generalized coordinate system can be constructed from the basic principles and discuss general coordinate transformations. The definition of vectors with respect to their transformation properties brings new depths into their discussion and takes us beyond their geometric interpretation as directed line segments. This allows us to introduce more sophisticated objects called tensors, where vectors and scalars appear only as special cases. We finally conclude with a detailed discussion of cylindrical and spherical coordinate systems, which are among the most frequently encountered coordinate systems in applications.

3.1 TRANSFORMATIONS BETWEEN CARTESIAN COORDINATES

Transformations between Cartesian coordinates that exclude scale changes,
$$x' = kx, \quad k = \text{constant}, \tag{3.1}$$
are called orthogonal transformations. They preserve distances and magnitudes of vectors. There are basically three classes of orthogonal transformations. The first class involves translations, the second class is rotations, and the third class consists of reflections (Fig. 3.1). Translations and rotations can be generated continuously from an initial frame; hence they are called proper transformations. Since reflections cannot be accomplished continuously, they are called improper transformations. A general transformation is usually a combination of all three types.

Figure 3.1 Orthogonal transformations.

3.1.1 Basis Vectors and Direction Cosines
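We now consider orthogonal transformations with a common origin (Fig. 3.2). To find the transformation equations, we write the position vector in two frames in terms of their respective unit basis vectors as
$$\vec r = x_1 \hat e_1 + x_2 \hat e_2 + x_3 \hat e_3 \tag{3.2}$$
and
$$\vec r = x_1' \hat e_1' + x_2' \hat e_2' + x_3' \hat e_3'. \tag{3.3}$$

Figure 3.2 Orthogonal transformations with one point fixed.

Note that the point $P$ that the position vector, $\vec r$, represents exists independent of the definition of our coordinate system. Hence in Equation (3.3) we have written $\vec r$ instead of $\vec r\,'$. However, when we need to emphasize the coordinate system used explicitly, we also write $\vec r\,'$. In other words, Equations (3.2) and (3.3) are just two different representations of the same vector. Obviously, there are infinitely many choices for the orientation of the Cartesian axes that one can use. To find a mathematical dictionary between them, that is, the transformation equations, we write the components of $\vec r$ in terms of $\hat e_i'$ as
$$x_1' = \hat e_1' \cdot \vec r, \tag{3.4}$$
$$x_2' = \hat e_2' \cdot \vec r, \tag{3.5}$$
$$x_3' = \hat e_3' \cdot \vec r, \tag{3.6}$$
which, after using Equation (3.2), gives
$$x_1' = (\hat e_1' \cdot \hat e_1) x_1 + (\hat e_1' \cdot \hat e_2) x_2 + (\hat e_1' \cdot \hat e_3) x_3, \tag{3.7}$$
$$x_2' = (\hat e_2' \cdot \hat e_1) x_1 + (\hat e_2' \cdot \hat e_2) x_2 + (\hat e_2' \cdot \hat e_3) x_3, \tag{3.8}$$
$$x_3' = (\hat e_3' \cdot \hat e_1) x_1 + (\hat e_3' \cdot \hat e_2) x_2 + (\hat e_3' \cdot \hat e_3) x_3. \tag{3.9}$$
These are the transformation equations that allow us to obtain the coordinates in the primed system given the coordinates in the unprimed system. These equations can be conveniently written as
$$x_i' = \sum_{j=1}^{3} (\hat e_i' \cdot \hat e_j)\, x_j, \quad i = 1, 2, 3. \tag{3.10}$$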
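Figure 3.3 Direction cosines for rotations about the $x_3$-axis.

The coefficients $(\hat e_i' \cdot \hat e_j)$ are called the direction cosines and can be written as
$$a_{ij} = (\hat e_i' \cdot \hat e_j) = \cos\theta_{ij}, \quad i, j = 1, 2, 3, \tag{3.11}$$
where $\theta_{ij}$ is the angle between the $i$th basis vector of the primed system and the $j$th basis vector of the unprimed system. For rotations about the $x_3$-axis (Fig. 3.3), we can write $a_{ij}$, $i, j = 1, 2, 3$, as the array
$$S_3 = a_{ij}, \quad i, j = 1, 2, 3, \tag{3.12}$$
$$= \begin{pmatrix} (\hat e_1' \cdot \hat e_1) & (\hat e_1' \cdot \hat e_2) & 0 \\ (\hat e_2' \cdot \hat e_1) & (\hat e_2' \cdot \hat e_2) & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{3.13}$$
$$= \begin{pmatrix} \cos\theta & \cos(\pi/2 - \theta) & 0 \\ \cos(\pi/2 + \theta) & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{3.14}$$
$$= \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.15}$$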
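3.1.2 Transformation Matrix and the Orthogonality Relation

General linear transformations between Cartesian coordinates can be written as
$$x_i' = \sum_{j=1}^{3} a_{ij} x_j, \quad i = 1, 2, 3, \tag{3.16}$$
where the square array
$$S = a_{ij}, \quad i, j = 1, 2, 3, \tag{3.17}$$
$$= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \tag{3.18}$$
is called the transformation matrix. Let us now write the magnitude of $\vec r$ in the primed system as
$$\vec r \cdot \vec r = \sum_{i=1}^{3} x_i' x_i'. \tag{3.19}$$
Using the transformation equations, we can write $|\vec r|^2$ in the unprimed system as
$$\vec r \cdot \vec r = \sum_{i=1}^{3} \left[\sum_{j=1}^{3} a_{ij} x_j\right]\left[\sum_{j'=1}^{3} a_{ij'} x_{j'}\right] \tag{3.20}$$
$$= \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{j'=1}^{3} a_{ij} a_{ij'} x_j x_{j'}. \tag{3.21}$$
Rearranging the triple sum, we write
$$\vec r \cdot \vec r = \sum_{j=1}^{3}\sum_{j'=1}^{3} \left[\sum_{i=1}^{3} a_{ij} a_{ij'}\right] x_j x_{j'}. \tag{3.22}$$
Since the orthogonal transformations preserve magnitudes of vectors, the transformation matrix has to satisfy
$$\sum_{i=1}^{3} a_{ij} a_{ij'} = \delta_{jj'}, \quad j, j' = 1, 2, 3, \tag{3.23}$$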
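which is called the orthogonality condition. Equation (3.22) now gives
$$\vec r \cdot \vec r = \sum_{i=1}^{3} x_i' x_i' = x_1'^2 + x_2'^2 + x_3'^2 \tag{3.24}$$
$$= \sum_{j=1}^{3}\sum_{j'=1}^{3} \delta_{jj'} x_j x_{j'} \tag{3.25}$$
$$= \sum_{j=1}^{3} x_j x_j = x_1^2 + x_2^2 + x_3^2. \tag{3.26}$$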
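The orthogonality condition and the preservation of length can be checked directly for a concrete case. The sketch below assumes the $x_3$-rotation matrix with rows $(\cos\theta, \sin\theta, 0)$, $(-\sin\theta, \cos\theta, 0)$, $(0, 0, 1)$, and an arbitrary angle and vector chosen for illustration. The function names are hypothetical.

```python
import math

def rotation_x3(theta):
    """Direction-cosine matrix a_ij for a rotation by theta about the x3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(a, x):
    """x'_i = sum_j a_ij x_j."""
    return [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]

def orthogonality_defect(a):
    """Largest |sum_i a_ij a_ij' - delta_jj'| over all j, j'."""
    return max(
        abs(sum(a[i][j] * a[i][jp] for i in range(3)) - (1.0 if j == jp else 0.0))
        for j in range(3) for jp in range(3)
    )

a = rotation_x3(0.3)           # illustrative angle
x = [1.0, -2.0, 0.5]           # illustrative vector
x_prime = transform(a, x)
defect = orthogonality_defect(a)
norm_sq = sum(c * c for c in x)
norm_sq_prime = sum(c * c for c in x_prime)
print(defect, norm_sq, norm_sq_prime)
```

The defect should be at rounding level, and the two squared magnitudes should agree.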
3.1.3 Inverse Transformation Matrix

For the inverse transformation we need to express $x_i$ in terms of $x_i'$. This can only be done when the determinant of the transformation matrix does not vanish:
$$\det S = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \ne 0. \tag{3.27}$$
Writing the orthogonality relation explicitly as
$$\begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{3.28}$$
we can evaluate the determinant of $S$. The determinant of the left-hand side is the product of the determinants of the individual matrices. Using the fact that interchanging rows and columns of a matrix does not change the value of its determinant, we can write
$$(\det S)^2 = 1, \tag{3.29}$$
which yields
$$\det S = \pm 1. \tag{3.30}$$
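The negative sign corresponds to improper transformations; hence the determinant of the transformation matrix provides a convenient tool to test whether a given transformation involves reflections or not. For a formal inversion we multiply the transformation equation,
$$x_i' = \sum_{j=1}^{3} a_{ij} x_j, \quad i = 1, 2, 3, \tag{3.31}$$
with $a_{ij'}$ and sum over $i$ to write
$$\sum_{i=1}^{3} a_{ij'} x_i' = \sum_{i=1}^{3}\sum_{j=1}^{3} a_{ij'} a_{ij} x_j \tag{3.32}$$
$$= \sum_{j=1}^{3}\left[\sum_{i=1}^{3} a_{ij'} a_{ij}\right] x_j. \tag{3.33}$$
Substituting the orthogonality relation for the sum inside the square brackets we get
$$\sum_{i=1}^{3} a_{ij'} x_i' = \sum_{j=1}^{3} \delta_{j'j} x_j \tag{3.34}$$
$$= x_{j'}, \quad j' = 1, 2, 3, \tag{3.35}$$
which, when written explicitly, gives
$$x_1 = a_{11} x_1' + a_{21} x_2' + a_{31} x_3', \tag{3.36}$$
$$x_2 = a_{12} x_1' + a_{22} x_2' + a_{32} x_3', \tag{3.37}$$
$$x_3 = a_{13} x_1' + a_{23} x_2' + a_{33} x_3'. \tag{3.38}$$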
We now write the inverse transformation matrix as
$$S^{-1} = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}. \tag{3.39}$$
Comparing with $S$ in Equation (3.18), it is seen that
$$S^{-1} = \tilde S, \tag{3.40}$$
where $\tilde S$ is called the transpose of $S$, which is obtained by interchanging the rows and the columns in $S$. In summary, the inverse transformation matrix for orthogonal transformations is just the transpose of the transformation matrix. For rotations about the $x_3$-axis [Eq. (3.15)], the inverse transformation matrix is written as
$$S_3^{-1} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.41}$$
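A brief numerical check of these statements may help. The sketch below assumes the same $x_3$-rotation matrix as Equation (3.15), an arbitrary angle, and a reflection of the $x_3$-axis as an example of an improper transformation (this reflection matrix is an illustrative choice, not taken from the text). It verifies that the transpose acts as the inverse and that the determinant is $+1$ for the rotation and $-1$ for the reflection.

```python
import math

def rotation_x3(theta):
    """Rotation about the x3-axis in the form of Eq. (3.15)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

theta = 0.8                                    # illustrative angle
S = rotation_x3(theta)
S_inv = transpose(S)                           # candidate inverse
product = matmul(S_inv, S)                     # should be the identity
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
print(det3(S), det3(reflection))
```

Comparing `transpose(S)` with `rotation_x3(-theta)` entry by entry shows that the inverse is the same rotation taken in the opposite sense.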
Note that $S_3^{-1}$ corresponds to a rotation in the opposite direction by the same amount, that is,
$$S_3^{-1}(\theta) = S_3(-\theta). \tag{3.42}$$

3.2 CARTESIAN TENSORS
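So far we have discussed the transformation properties of the position vector $\vec r$. We now extend this to an arbitrary vector, $\vec v$, as
$$\vec v\,' = S\vec v, \tag{3.43}$$
where $S = a_{ij}$, $i, j = 1, 2, 3$, is the orthogonal transformation matrix [Eq. (3.18)]. In other words, a given triplet of functions,
$$(v_1(\vec r), v_2(\vec r), v_3(\vec r)),$$
cannot be used to define a vector unless they transform as
$$v_i' = \sum_{j=1}^{3} a_{ij} v_j, \quad i = 1, 2, 3. \tag{3.46}$$
Under the orthogonal transformations a scalar function, $\Phi(x_1, x_2, x_3)$, transforms as
$$\Phi(x_1, x_2, x_3) = \Phi(x_1(x_1', x_2', x_3'),\, x_2(x_1', x_2', x_3'),\, x_3(x_1', x_2', x_3')) \tag{3.47}$$
$$= \Phi(x_1', x_2', x_3'). \tag{3.48}$$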
In the new coordinate system, $\Phi$ will naturally have a different functional dependence. However, the values that $\Phi$ assumes at each point of space remain the same. It is for this reason that in Equation (3.48) we have written $\Phi$ instead of $\Phi'$. In order to indicate the coordinate system used, we may also write $\Phi'$. Since temperature is a scalar quantity, its value at a given point does not depend on the coordinate system used. A different choice of coordinate system assigns different coordinates (codes) to each point:
$$(x_1, x_2, x_3) \rightarrow (x_1', x_2', x_3'); \tag{3.49}$$
however, the numerical values of the temperature at each point remain the same. We now write the scalar product of two vectors, $\vec x$ and $\vec y$, in the primed coordinates by using the scalar product written in the unprimed coordinates, that is,
$$\vec x \cdot \vec y = \sum_{i=1}^{3} x_i y_i \tag{3.50}$$
$$= x_1 y_1 + x_2 y_2 + x_3 y_3. \tag{3.51}$$
Using the orthogonal transformations,
$$x_i = \sum_{i'=1}^{3} a_{i'i} x_{i'}', \quad y_i = \sum_{j'=1}^{3} a_{j'i} y_{j'}', \tag{3.52}$$
we write $\vec x \cdot \vec y$ as
$$\vec x \cdot \vec y = \sum_{i=1}^{3}\left[\sum_{i'=1}^{3} a_{i'i} x_{i'}'\right]\left[\sum_{j'=1}^{3} a_{j'i} y_{j'}'\right] \tag{3.53}$$
$$= \sum_{i'=1}^{3}\sum_{j'=1}^{3}\left[\sum_{i=1}^{3} a_{i'i} a_{j'i}\right] x_{i'}' y_{j'}'. \tag{3.54}$$
Using the orthogonality relation [Eq. (3.23)], $S\tilde S = \tilde S S = I$, this becomes
$$\vec x \cdot \vec y = \sum_{i'=1}^{3}\sum_{j'=1}^{3} \delta_{i'j'} x_{i'}' y_{j'}' \tag{3.55}$$
$$= \sum_{i'=1}^{3} x_{i'}' y_{i'}' \tag{3.56}$$
$$= x_1' y_1' + x_2' y_2' + x_3' y_3' \tag{3.57}$$
$$= \vec x\,' \cdot \vec y\,'. \tag{3.58}$$
In other words, the orthogonal transformations do not change the value of a scalar product. Properties of physical systems that preserve their value under coordinate transformations are called invariants. Identification of invariants in the study of nature is very important and plays a central role in both special and general theories of relativity.

In the previous chapter we have defined vectors with respect to their geometric and algebraic properties. Definition of vectors with respect to their transformation properties under orthogonal transformations brings new levels into the subject and allows us to free the vector concept from being just a directed line segment drawn in space. Using the transformation properties, we can now define more complex objects called tensors. Tensors of second rank, $T_{ij}$, are among the most commonly encountered tensors in applications and have two indices. Vectors, $v_i$, have only one index and they are tensors of first rank. Scalars, $\Phi$, which have no indices, are tensors of zeroth rank. In general, tensors of higher ranks are written with the appropriate number of indices as
$$T = T_{ijkl\ldots}, \quad i, j, k, \ldots = 1, 2, 3. \tag{3.59}$$
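Each index of a tensor transforms like a vector:
$$\Phi' = \Phi, \tag{3.60}$$
$$v_i' = \sum_{i'=1}^{3} a_{ii'} v_{i'}, \tag{3.61}$$
$$T_{ij}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} T_{i'j'}, \tag{3.62}$$
$$T_{ijk}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3}\sum_{k'=1}^{3} a_{ii'} a_{jj'} a_{kk'} T_{i'j'k'}, \tag{3.63}$$
etc.

Tensors of second rank can be conveniently represented as $3 \times 3$ square matrices:
$$T = \begin{pmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{pmatrix}. \tag{3.64}$$
Definition of tensors can be easily extended to $n$ dimensions by taking the range of the indices from 1 to $n$. As we shall see shortly, tensors can also be defined in general coordinates. For the time being, we confine our discussion to Cartesian tensors, which are defined with respect to their transformation properties under orthogonal transformations.

3.2.1 Algebraic Properties of Tensors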
Tensors of equal rank can be added or subtracted term by term and the result does not depend on the order of the tensors. For example, if $A$ and $B$ are two second-rank tensors, then their sum is
$$A + B = B + A = C, \tag{3.65}$$
$$C_{ij} = A_{ij} + B_{ij}, \quad i, j = 1, 2, 3. \tag{3.66}$$
Multiplication of a tensor with a scalar, $\alpha$, is accomplished by multiplying all the components of that tensor with the same scalar. For a third-rank tensor, $A$, we can write
$$\alpha A = \alpha A_{ijk}, \quad i, j, k = 1, 2, 3. \tag{3.67}$$
From the basic properties of matrices, second-rank tensors do not commute under multiplication. That is,
$$AB \ne BA; \tag{3.68}$$
however, they associate:
$$A(BC) = (AB)C, \tag{3.69}$$
where $A$, $B$, $C$ are second-rank tensors. Antisymmetric tensors satisfy
$$A_{ij} = -A_{ji}, \quad i, j = 1, 2, 3, \tag{3.70}$$
or
$$\tilde A = -A, \tag{3.71}$$
where $\tilde A$ is called the transpose of $A$, which is obtained by interchanging the rows and columns. Note that the diagonal terms, $A_{11}$, $A_{22}$, $A_{33}$, of an antisymmetric tensor are necessarily zero. If we set $i = j$ in Equation (3.70) we obtain
$$A_{ii} = -A_{ii}, \tag{3.72}$$
$$2A_{ii} = 0, \tag{3.73}$$
$$A_{ii} = 0. \tag{3.74}$$
Symmetry and antisymmetry are invariant properties. If a second-rank tensor, $A$, is symmetric in one coordinate system,
$$A_{ij} = A_{ji}, \quad i, j = 1, 2, 3, \tag{3.75}$$
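then $A'$ is also symmetric. Using the symmetry of $A_{i'j'}$, we first write
$$A_{ij}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} A_{i'j'} = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} A_{j'i'}. \tag{3.76}$$
Since the components, $a_{ij}$, are constants or in general scalar functions, the order in which they are written in equations does not matter. Hence we can write Equation (3.76) as
$$A_{ij}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{jj'} a_{ii'} A_{j'i'}, \quad i, j = 1, 2, 3. \tag{3.77}$$
Using the transformation property of second-rank tensors [Eq. (3.62)], for a symmetric second-rank tensor, $A_{ij}$, this implies
$$A_{ij}' = A_{ji}', \quad i, j = 1, 2, 3. \tag{3.78}$$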
A similar proof can be given for the antisymmetric tensor. Any second-rank tensor can be written as the sum of a symmetric and an antisymmetric tensor:
$$A = \frac{1}{2}(A + \tilde A) + \frac{1}{2}(A - \tilde A). \tag{3.79}$$
Using the components of two vectors, $\vec a$ and $\vec b$, we can construct a second-rank tensor $A$ as
$$A = \begin{pmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{pmatrix}, \tag{3.80}$$
which is called the outer product or the tensor product of $\vec a$ and $\vec b$, and it is shown as
$$A = \vec a\, \vec b \tag{3.81}$$
or as
$$A = \vec a \otimes \vec b. \tag{3.82}$$
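To justify that $A$ is a second-rank tensor, we show that it obeys the correct transformation property; that is, it transforms like a second-rank tensor:
$$A_{ij}' = a_i' b_j' \tag{3.83}$$
$$= \left[\sum_{i'=1}^{3} a_{ii'} a_{i'}\right]\left[\sum_{j'=1}^{3} a_{jj'} b_{j'}\right] \tag{3.84}$$
$$= \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} (a_{i'} b_{j'}) \tag{3.85}$$
$$= \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} A_{i'j'}. \tag{3.86}$$
Here the doubly indexed $a_{ii'}$ are elements of the transformation matrix and the singly indexed $a_{i'}$ are components of $\vec a$. One can easily check that the outer product defined as $\vec b \otimes \vec a$ is the transpose of $A$. We remind the reader that even though we can construct a second-rank tensor from two vectors, the converse is not true. A second-rank tensor cannot always be written as the outer product of two vectors. Using the outer product, we can construct tensors of higher rank from tensors of lower rank: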
(3.87) (3.88) (3.89)
where the indices take the values 1, 2, 3. For a given vector, there is only one invariant, namely, its magnitude. All the other invariants are functions of the magnitude. For a second-rank tensor there are three invariants, one of which is the spur or the trace, which is defined as the sum of the diagonal elements:
$$\operatorname{tr} A = A_{11} + A_{22} + A_{33}. \tag{3.90}$$
We leave the proof as an exercise but note that when $A$ can be decomposed as the outer product of two vectors, the trace is the inner product of these vectors. We can obtain a lower-rank tensor by summing over pairs of indices. This operation is called contraction. The trace is obtained by contracting the two indices of a second-rank tensor as
$$\operatorname{tr} A = \sum_{i=1}^{3} A_{ii}. \tag{3.91}$$
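The statements about the outer product and the trace can be illustrated numerically. The sketch below assumes an $x_3$-rotation as the orthogonal transformation and two arbitrary vectors (all values are illustrative assumptions). It checks that transforming $a_i b_j$ index by index gives the same result as taking the outer product of the transformed vectors, and that the trace is unchanged and equals $\vec a \cdot \vec b$. Function names are hypothetical.

```python
import math

def rotation_x3(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def outer(a, b):
    """A_ij = a_i b_j."""
    return [[ai * bj for bj in b] for ai in a]

def transform_vector(S, v):
    """v'_i = sum_k S_ik v_k."""
    return [sum(S[i][k] * v[k] for k in range(3)) for i in range(3)]

def transform_rank2(S, T):
    """T'_ij = sum_k sum_l S_ik S_jl T_kl."""
    return [[sum(S[i][k] * S[j][l] * T[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]

def trace(T):
    return sum(T[i][i] for i in range(3))

S = rotation_x3(1.1)                       # illustrative orthogonal matrix
a, b = [1.0, 2.0, -1.0], [0.5, -3.0, 2.0]  # illustrative vectors
A = outer(a, b)
A_prime = transform_rank2(S, A)
A_from_vectors = outer(transform_vector(S, a), transform_vector(S, b))
dot_ab = sum(x * y for x, y in zip(a, b))
print(trace(A), trace(A_prime), dot_ab)
```

All three printed numbers should coincide up to rounding.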
Other examples of contraction are 3
(3.92) i= 1
(3.93)

etc. We can generalize the idea of inner product by contracting the indices of a tensor with the indices of another tensor:
$$b_i = \sum_{j=1}^{3} T_{ij} a_j, \quad i = 1, 2, 3, \tag{3.94}$$
$$a_i = \sum_{j=1}^{3}\sum_{k=1}^{3} T_{ijk} A_{jk}, \quad i = 1, 2, 3, \tag{3.95}$$
(3.96) 3
a
3
=
(3.97)
~ i j ~ i j ,
etc. The rank of the resulting tensor is equal to the number of the free indices, that is, the indices that are not summed over. Free indices take the values 1, 2, or 3. In this regard, we also write a tensor, say $T_{ij}$, $i, j = 1, 2, 3$, as simply $T_{ij}$. The indices that are summed over are called the dummy indices. Since dummy indices disappear in the final expression, we can always rename them.
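The distinction between free and dummy indices maps directly onto loops in code. The following is a small sketch with made-up component values (assumptions for illustration only): the free index becomes the index of the result, while each dummy index becomes a summation variable whose name is irrelevant.

```python
# Illustrative components (assumptions, not from the text).
T = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
a = [1, -1, 2]

# b_i = sum_j T_ij a_j : i is free (rank-1 result), j is a dummy index.
b = [sum(T[i][j] * a[j] for j in range(3)) for i in range(3)]

# Renaming the dummy index j -> k changes nothing.
b_renamed = [sum(T[i][k] * a[k] for k in range(3)) for i in range(3)]

# Contracting both indices leaves no free index, so the result is a scalar.
s = sum(T[i][j] * T[i][j] for i in range(3) for j in range(3))

print(b, b_renamed, s)
```

The number of list levels in each result equals the number of free indices: one for `b`, none for `s`.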
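3.2.2 Kronecker Delta and the Permutation Symbol

To check the tensor property of the Kronecker delta, we use the transformation equation for the second-rank tensors,
$$T_{ij}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} T_{i'j'}, \tag{3.98}$$
with $T_{i'j'} = \delta_{i'j'}$ and use the orthogonality relation [Eq. (3.23)] to write
$$\delta_{ij}' = \sum_{i'=1}^{3}\sum_{j'=1}^{3} a_{ii'} a_{jj'} \delta_{i'j'} = \sum_{i'=1}^{3} a_{ii'} a_{ji'} \tag{3.99}$$
$$= \delta_{ij}. \tag{3.100}$$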
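In other words, the Kronecker delta is a symmetric second-rank tensor that transforms into itself under orthogonal transformations. It is also called the identity tensor, which is shown as $I$. Up to a constant factor, the Kronecker delta is the only second-rank tensor with this property. The permutation symbol, also called the Levi-Civita symbol, is defined as
$$\epsilon_{ijk} = \begin{cases} 0 & \text{when any two indices are equal,} \\ 1 & \text{for even (cyclic) permutations: 123, 231, 312,} \\ -1 & \text{for odd (anticyclic) permutations: 213, 321, 132.} \end{cases} \tag{3.101}$$
Using the permutation symbol we can write a determinant as
$$\det A = \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix} \tag{3.102}$$
$$= \sum_{i,j,k} A_{1i} A_{2j} A_{3k}\, \epsilon_{ijk} \tag{3.103}$$
$$= \sum_{i,j,k} A_{i1} A_{j2} A_{k3}\, \epsilon_{ijk}. \tag{3.104}$$
If we interchange any two of the indices of the permutation symbol in Equation (3.103), the determinant changes sign. This operation is equivalent to interchanging the corresponding rows or columns of a determinant. We now write the determinant of the transformation matrix, $a_{ij}$, as
$$\det a_{ij} = \sum_{i,j,k} a_{1i} a_{2j} a_{3k}\, \epsilon_{ijk}. \tag{3.105}$$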
Renaming the dummy indices:
$$i \rightarrow j, \tag{3.106}$$
$$j \rightarrow i, \tag{3.107}$$
Equation (3.105) becomes
$$\det a_{ij} = -\sum_{i,j,k} a_{2i} a_{1j} a_{3k}\, \epsilon_{ijk}. \tag{3.108}$$
From Equation (3.30) we know that the determinant of the orthogonal transformation matrix is $\det a_{ij} = \pm 1$; hence, since $\epsilon_{213} = -1$, the component $\epsilon_{213}$ transforms as
$$(\pm 1)\,\epsilon_{213} = \sum_{i,j,k} a_{2i} a_{1j} a_{3k}\, \epsilon_{ijk}. \tag{3.109}$$
Similar arguments for the other components yield the transformation equation of $\epsilon_{lmn}$ as
$$(\pm 1)\,\epsilon_{lmn} = \sum_{i,j,k} a_{li} a_{mj} a_{nk}\, \epsilon_{ijk}. \tag{3.110}$$
The minus sign is for the improper transformations. In summary, $\epsilon_{ijk}$ transforms like a third-rank tensor for proper transformations, and a minus sign has to be inserted for improper transformations. Tensors that transform like this are called tensor densities or pseudotensors. Note that aside from the $\pm 1$ factor, $\epsilon_{ijk}$ has the same constant components in all Cartesian coordinate systems. The permutation symbol is the only third-rank tensor with this property. An important identity of $\epsilon_{ijk}$ is
$$\sum_{i=1}^{3} \epsilon_{ijk}\,\epsilon_{ilm} = \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}. \tag{3.111}$$
Permutation symbol also satisfies
for the cyclic permutations of the indices. For the anticyclic permutations we write
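The properties of the permutation symbol discussed above can be checked by direct enumeration. The following Python sketch is only an illustration: the function names and the sample matrix are our own choices, and the contraction identity tested is the standard one, $\sum_i \varepsilon_{ijk}\varepsilon_{ilm} = \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}$.

```python
from itertools import product

def levi_civita(i, j, k):
    """Permutation symbol for indices taken from {1, 2, 3}."""
    if (i, j, k) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)}:
        return 1   # even (cyclic) permutations
    if (i, j, k) in {(2, 1, 3), (3, 2, 1), (1, 3, 2)}:
        return -1  # odd (anticyclic) permutations
    return 0       # at least two indices are equal

def kronecker(i, j):
    return 1 if i == j else 0

def det3(a):
    """Determinant of a 3x3 nested list as sum of eps_ijk a_1i a_2j a_3k."""
    idx = (1, 2, 3)
    return sum(levi_civita(i, j, k) * a[0][i - 1] * a[1][j - 1] * a[2][k - 1]
               for i, j, k in product(idx, repeat=3))

def identity_holds():
    """Check the epsilon-delta contraction identity for all free indices."""
    idx = (1, 2, 3)
    for j, k, l, m in product(idx, repeat=4):
        lhs = sum(levi_civita(i, j, k) * levi_civita(i, l, m) for i in idx)
        rhs = (kronecker(j, l) * kronecker(k, m)
               - kronecker(j, m) * kronecker(k, l))
        if lhs != rhs:
            return False
    return True

sample = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]   # arbitrary illustrative matrix
swapped = [sample[1], sample[0], sample[2]]   # interchange the first two rows
print(det3(sample), det3(swapped), identity_holds())
```

Interchanging two rows flips the sign of the determinant, in line with the antisymmetry of $\varepsilon_{ijk}$.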
Example 3.1. Physical tensors: Solid objects deform under stress to a certain extent. In general, forces acting on a solid can be described by a second-rank tensor called the stress tensor:
$$t_{ij} = \begin{pmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{pmatrix}. \tag{3.114}$$
Components of the stress tensor represent the forces acting on a unit test area when the normal is pointed in various directions. For example, $t_{ij}$ is the $i$th component of the force when the normal is pointing in the $j$th direction. Since the stress tensor is a second-rank tensor, it transforms as
$$t'_{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3} a_{ik}a_{jl}t_{kl}. \tag{3.115}$$
The amount of deformation is also described by a second-rank tensor, $\sigma_{ij}$, called the strain tensor. The stress and the strain tensors are related by the equation
$$t_{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3} C_{ijkl}\sigma_{kl}, \tag{3.116}$$
where the fourth-rank tensor $C_{ijkl}$ represents the elastic constants of the solid. This is the most general expression that relates the deformation of a three-dimensional solid to the forces acting on it. For a long and thin solid sample, with cross section $\Delta A$ and with longitudinal loading $F$, Equation (3.116) reduces to Hooke's law:
$$\frac{F}{\Delta A} = Y\frac{\Delta l}{l}, \tag{3.117}$$
$$t = Y\sigma, \tag{3.118}$$
where $t$ is the force per unit area, $\sigma = \Delta l/l$ is the fractional change in length, and $Y$ is Young's modulus.
Many of the scalar quantities in physics can be generalized as tensors of higher rank. In Newton’s theory, mass of an object is defined as the proportionality constant, m, between the force acting on the object and the acceleration as
$$F_i = m a_i. \tag{3.119}$$
Mass is basically the ability of an object to resist acceleration, that is, its inertia. It is an experimental fact that mass does not depend on the direction in which we want to accelerate an object. Hence it is defined as a scalar quantity. In some effective field theories, it may be advantageous to treat particles with a mass that depends on direction. In such cases we can introduce effective mass as a second-rank tensor, $m_{ij}$, and write Newton's second law as
$$F_i = \sum_{j=1}^{3} m_{ij}a_j. \tag{3.120}$$
When the mass is isotropic, $m_{ij}$ becomes
$$m_{ij} = m\,\delta_{ij}, \tag{3.121}$$
thus Newton's second law reduces to its usual form.
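A small numerical sketch may help to see the difference between the tensor and the scalar forms of Newton's second law. The mass values below are purely illustrative assumptions; only the contraction $F_i = \sum_j m_{ij}a_j$ comes from the text.

```python
def force(mass_tensor, acceleration):
    """F_i = sum_j m_ij a_j for a 3x3 nested list and a length-3 list."""
    return [sum(mass_tensor[i][j] * acceleration[j] for j in range(3))
            for i in range(3)]

a = [1.0, 2.0, 3.0]

# Hypothetical direction-dependent effective mass (diagonal for simplicity).
anisotropic = [[2.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 0.5]]

# Isotropic case: m_ij = m * delta_ij, which reduces to F_i = m a_i.
m = 2.0
isotropic = [[m if i == j else 0.0 for j in range(3)] for i in range(3)]

print(force(anisotropic, a))  # force is no longer parallel to a
print(force(isotropic, a))    # force is m times a
```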
3.3 GENERALIZED COORDINATES
3.3.1 Coordinate Curves and Surfaces
Before we introduce the generalized coordinates, which are also called the curvilinear coordinates, let us investigate some of the basic properties of the Cartesian coordinate system from a different perspective. In a Cartesian coordinate system at each point there are three planes defined by the equations
$$x^1 = c^1, \quad x^2 = c^2, \quad x^3 = c^3. \tag{3.122}$$
Figure 3.4 Coordinate surfaces and coordinate curves in Cartesian coordinates.
These planes intersect at the point $(c^1, c^2, c^3)$, which defines the coordinates of that point. In this section we start by writing the coordinates with an upper index as $x^i$. There is no need for alarm: as far as the Cartesian coordinates are concerned there is no difference, that is, $x^i = x_i$. However, as we shall see shortly, this added richness in our notation is absolutely essential when we introduce the generalized coordinates. Treating $c^1, c^2, c^3$ as parameters, the above equations define three mutually orthogonal families of surfaces, each of which is composed of infinitely many nonintersecting parallel planes. These surfaces are called the coordinate surfaces, on which the corresponding coordinate has a fixed value (Fig. 3.4). The coordinate surfaces intersect along the coordinate curves. For the Cartesian coordinate system these curves are mutually orthogonal straight lines called the coordinate axes (Fig. 3.4). Cartesian basis vectors, $\hat{e}_1, \hat{e}_2, \hat{e}_3$, are defined as the unit vectors along the coordinate axes. A unique property of the Cartesian coordinate system is that the basis vectors point in the same direction at every point in space (Fig. 3.5).
Figure 3.5 Basis vectors in Cartesian coordinates always point in the same direction.
We now introduce the generalized coordinates, where the coordinate surfaces are defined in terms of the Cartesian coordinates $(x^1, x^2, x^3)$ as three single-valued continuous functions with continuous partial derivatives:
$$\bar{x}^1(x^1, x^2, x^3) = \bar{c}^1, \tag{3.123}$$
$$\bar{x}^2(x^1, x^2, x^3) = \bar{c}^2, \tag{3.124}$$
$$\bar{x}^3(x^1, x^2, x^3) = \bar{c}^3. \tag{3.125}$$
Treating $\bar{c}^1, \bar{c}^2, \bar{c}^3$ as continuous variables, these give us three families of surfaces, where each family is composed of infinitely many nonintersecting surfaces (Fig. 3.6). Using the fixed values that these functions, $\bar{x}^i(x^1, x^2, x^3)$, $i = 1, 2, 3$, take on these surfaces, we define the generalized coordinates $(\bar{x}^1, \bar{x}^2, \bar{x}^3)$ as
$$\bar{x}^1 = \bar{x}^1(x^1, x^2, x^3), \tag{3.126}$$
$$\bar{x}^2 = \bar{x}^2(x^1, x^2, x^3), \tag{3.127}$$
$$\bar{x}^3 = \bar{x}^3(x^1, x^2, x^3). \tag{3.128}$$
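Note that these equations are also the transformation equations between the Cartesian coordinates $(x^1, x^2, x^3)$ and the generalized coordinates $(\bar{x}^1, \bar{x}^2, \bar{x}^3)$. For the new coordinates to be meaningful, the inverse transformations, $x^i = x^i(\bar{x}^j)$:
$$x^1 = x^1(\bar{x}^1, \bar{x}^2, \bar{x}^3), \tag{3.129}$$
$$x^2 = x^2(\bar{x}^1, \bar{x}^2, \bar{x}^3), \tag{3.130}$$
$$x^3 = x^3(\bar{x}^1, \bar{x}^2, \bar{x}^3), \tag{3.131}$$
should exist. In Chapter 1, we have seen that the necessary and sufficient condition for the inverse transformation to exist is that the Jacobian of the transformation be different from zero. In other words, for a one-to-one correspondence between $(x^1, x^2, x^3)$ and $(\bar{x}^1, \bar{x}^2, \bar{x}^3)$ we need to have
$$J = \frac{\partial(x^1, x^2, x^3)}{\partial(\bar{x}^1, \bar{x}^2, \bar{x}^3)} \neq 0, \tag{3.132}$$
or since
$$JK = 1, \tag{3.133}$$
$$K = \frac{\partial(\bar{x}^1, \bar{x}^2, \bar{x}^3)}{\partial(x^1, x^2, x^3)} \neq 0. \tag{3.134}$$
Figure 3.6 Coordinate surfaces in generalized coordinates for $\bar{x}^1$.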
For the coordinate surfaces given as
$$\bar{x}^1 = \bar{x}^1(x^1, x^2, x^3) = \bar{c}^1, \tag{3.135}$$
$$\bar{x}^2 = \bar{x}^2(x^1, x^2, x^3) = \bar{c}^2, \tag{3.136}$$
$$\bar{x}^3 = \bar{x}^3(x^1, x^2, x^3) = \bar{c}^3, \tag{3.137}$$
the intersection of the first two, $\bar{x}^1(x^1, x^2, x^3) = \bar{c}^1$ and $\bar{x}^2(x^1, x^2, x^3) = \bar{c}^2$, defines the coordinate curve along which $\bar{x}^3$ varies (Fig. 3.7). We refer to this as the $\bar{x}^3$ curve, which can be parameterized in terms of $\bar{x}^3$ as $(x^1(\bar{x}^3), x^2(\bar{x}^3), x^3(\bar{x}^3))$. Similarly, two other curves exist for the $\bar{x}^1$ and the $\bar{x}^2$ coordinates. These curves are now the counterparts of the coordinate axes in Cartesian coordinates.
Figure 3.7 Generalized coordinates.
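We now define the coordinate basis vectors, $\vec{e}_1, \vec{e}_2, \vec{e}_3$, in terms of the Cartesian unit basis vectors $(\hat{e}_1, \hat{e}_2, \hat{e}_3)$ as the tangent vectors:
$$\vec{e}_1 = \frac{\partial x^1}{\partial\bar{x}^1}\hat{e}_1 + \frac{\partial x^2}{\partial\bar{x}^1}\hat{e}_2 + \frac{\partial x^3}{\partial\bar{x}^1}\hat{e}_3, \tag{3.138}$$
$$\vec{e}_2 = \frac{\partial x^1}{\partial\bar{x}^2}\hat{e}_1 + \frac{\partial x^2}{\partial\bar{x}^2}\hat{e}_2 + \frac{\partial x^3}{\partial\bar{x}^2}\hat{e}_3, \tag{3.139}$$
$$\vec{e}_3 = \frac{\partial x^1}{\partial\bar{x}^3}\hat{e}_1 + \frac{\partial x^2}{\partial\bar{x}^3}\hat{e}_2 + \frac{\partial x^3}{\partial\bar{x}^3}\hat{e}_3. \tag{3.140}$$
Note that $\vec{e}_i$ are in general neither orthogonal nor unit vectors. In fact, their magnitudes,
$$|\vec{e}_i| = \sqrt{\left(\frac{\partial x^1}{\partial\bar{x}^i}\right)^2 + \left(\frac{\partial x^2}{\partial\bar{x}^i}\right)^2 + \left(\frac{\partial x^3}{\partial\bar{x}^i}\right)^2}, \tag{3.141}$$
as well as their directions depend on their position. We define unit basis vectors in the direction of $\vec{e}_i$ as
$$\hat{e}_i = \frac{\vec{e}_i}{|\vec{e}_i|}. \tag{3.142}$$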
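Coordinate basis vectors, $\vec{e}_i$, point in the direction of the change in the position vector when we move an infinitesimal amount along the $\bar{x}^i$ curve. In other words, $\vec{e}_i$ is the tangent vector to the $\bar{x}^i$ curve at a given point. We can now interpret the condition $J \neq 0$ for a legitimate definition of generalized coordinates. We first write the Jacobian, $J$, as
$$J = \det\begin{pmatrix} \dfrac{\partial x^1}{\partial\bar{x}^1} & \dfrac{\partial x^2}{\partial\bar{x}^1} & \dfrac{\partial x^3}{\partial\bar{x}^1} \\ \dfrac{\partial x^1}{\partial\bar{x}^2} & \dfrac{\partial x^2}{\partial\bar{x}^2} & \dfrac{\partial x^3}{\partial\bar{x}^2} \\ \dfrac{\partial x^1}{\partial\bar{x}^3} & \dfrac{\partial x^2}{\partial\bar{x}^3} & \dfrac{\partial x^3}{\partial\bar{x}^3} \end{pmatrix} = \vec{e}_1\cdot(\vec{e}_2\times\vec{e}_3). \tag{3.143}$$
Remembering that the triple product $\vec{e}_1\cdot(\vec{e}_2\times\vec{e}_3)$ is the volume of the parallelepiped with the sides $\vec{e}_1$, $\vec{e}_2$, and $\vec{e}_3$, the condition $J \neq 0$ for a legitimate definition of generalized coordinates means that the basis vectors have to be noncoplanar.
Figure 3.8 Covariant and contravariant components.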
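To make the tangent-vector definition of the basis vectors and the triple-product form of the Jacobian concrete, here is a minimal numerical sketch. It assumes cylindrical coordinates $(\rho, \phi, z)$ as the generalized coordinates and approximates the partial derivatives by central differences; the helper names and the sample point are our own.

```python
import math

def cartesian(q):
    """Cylindrical (rho, phi, z) -> Cartesian (x1, x2, x3)."""
    rho, phi, z = q
    return [rho * math.cos(phi), rho * math.sin(phi), z]

def basis_vectors(q, h=1e-6):
    """Tangent vectors e_i = d(x1, x2, x3)/d(q_i), by central differences."""
    vectors = []
    for i in range(3):
        plus, minus = list(q), list(q)
        plus[i] += h
        minus[i] -= h
        xp, xm = cartesian(plus), cartesian(minus)
        vectors.append([(xp[k] - xm[k]) / (2 * h) for k in range(3)])
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

q = [2.0, 0.7, 1.0]                       # sample point (rho, phi, z)
e1, e2, e3 = basis_vectors(q)
jacobian = dot(e1, cross(e2, e3))         # J = e1 . (e2 x e3)
lengths = [math.sqrt(dot(e, e)) for e in (e1, e2, e3)]
print(jacobian, lengths)
```

For cylindrical coordinates one expects $J = \rho$ and magnitudes $(1, \rho, 1)$, so the basis vectors are orthogonal here but $\vec{e}_2$ is not a unit vector.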
3.3.2 Why Upper and Lower Indices
Consider a particular generalized coordinate system with oblique axes on the plane (Fig. 3.8). We now face a situation that we did not have with the Cartesian coordinates. We can define coordinates of a vector in two different ways, one of which is by drawing parallels to the coordinate axes and the other is by dropping perpendiculars to the axes (Fig. 3.8). In general, these two methods give different values for the coordinates. Coordinates found by drawing parallels are called the contravariant components, and we write them with an upper index as $a^i$. Now the vector $\vec{a}$ is expressed as
$$\vec{a} = a^1\hat{e}_1 + a^2\hat{e}_2, \tag{3.144}$$
where $\hat{e}_1$ and $\hat{e}_2$ are the unit basis vectors. Coordinates found by dropping perpendiculars to the coordinate axes are called the covariant components. They are written with a lower index as $a_i$, and their values are obtained as
$$a_1 = \vec{a}\cdot\hat{e}_1, \quad a_2 = \vec{a}\cdot\hat{e}_2. \tag{3.145}$$
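The two ways of assigning components can be compared numerically. The sketch below assumes, purely for illustration, unit basis vectors in the plane with $\hat{e}_1$ along the Cartesian $x$ axis and $\hat{e}_2$ making an angle $\alpha$ with it; the function name and the sample vector are hypothetical.

```python
import math

def oblique_components(vec, alpha):
    """Return (contravariant, covariant) components of a plane vector.

    vec   : Cartesian components of the vector
    alpha : angle between the unit basis vectors e1 and e2 (assumed setup)
    """
    e1 = (1.0, 0.0)
    e2 = (math.cos(alpha), math.sin(alpha))
    # Contravariant components: solve vec = a^1 e1 + a^2 e2 (drawing parallels).
    a2_up = vec[1] / e2[1]
    a1_up = vec[0] - a2_up * e2[0]
    # Covariant components: a_i = vec . e_i (dropping perpendiculars).
    a1_dn = vec[0] * e1[0] + vec[1] * e1[1]
    a2_dn = vec[0] * e2[0] + vec[1] * e2[1]
    return (a1_up, a2_up), (a1_dn, a2_dn)

up, down = oblique_components((3.0, 2.0), math.pi / 3)      # oblique axes
up90, down90 = oblique_components((3.0, 2.0), math.pi / 2)  # orthogonal axes
print(up, down)
print(up90, down90)
```

For oblique axes the two sets of components differ, while for orthogonal unit axes they coincide, which is why the distinction never appears in Cartesian coordinates.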
3.4 GENERAL TENSORS
Geometric interpretation of the covariant and the contravariant components demonstrates that the difference between the two types of coordinates is, in general, real. As in the case of Cartesian tensors, we can further enrich the concept of scalars and vectors by defining them with respect to their transformation properties under general coordinate transformations. We write the transformation equations between the Cartesian coordinates, $x^i = (x^1, x^2, x^3)$, and the generalized coordinates, $\bar{x}^i = (\bar{x}^1, \bar{x}^2, \bar{x}^3)$, as
$$\bar{x}^i = \bar{x}^i(x^j). \tag{3.146}$$
Similarly, we write the inverse transformations as
$$x^i = x^i(\bar{x}^j). \tag{3.147}$$
Note that each one of the above equations [Eqs. (3.146) and (3.147)] corresponds to three equations for $i = 1, 2, 3$. Even though we write our equations in three dimensions, they can be generalized to $n$ dimensions by simply extending the range of the indices to $n$. Using Equation (3.146), we can write the transformation equation of the coordinate differentials as
$$d\bar{x}^i = \sum_{j=1}^{3}\frac{\partial\bar{x}^i}{\partial x^j}\,dx^j. \tag{3.148}$$
For a scalar function, $\Phi(x^i)$, we can write the transformation equation of its gradient as
$$\frac{\partial\Phi}{\partial\bar{x}^i} = \sum_{j=1}^{3}\frac{\partial x^j}{\partial\bar{x}^i}\frac{\partial\Phi}{\partial x^j}. \tag{3.149}$$
We now generalize these to all vectors and define a contravariant vector as a vector that transforms like $dx^j$ as
$$\bar{v}^i = \sum_{j=1}^{3}\frac{\partial\bar{x}^i}{\partial x^j}\,v^j \tag{3.150}$$
and define a covariant vector as a vector that transforms like the gradient of a scalar function:
$$\bar{v}_i = \sum_{j=1}^{3}\frac{\partial x^j}{\partial\bar{x}^i}\,v_j. \tag{3.151}$$
Analogous to Cartesian tensors, a second-rank covariant tensor, $T_{ij}$, is defined as
$$\bar{T}_{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3}\frac{\partial x^k}{\partial\bar{x}^i}\frac{\partial x^l}{\partial\bar{x}^j}\,T_{kl}. \tag{3.152}$$
Tensors with contravariant and mixed indices are also defined with respect to their transformation properties as
$$\bar{T}^{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial\bar{x}^j}{\partial x^l}\,T^{kl}, \tag{3.153}$$
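$$\bar{T}^{i}{}_{j} = \sum_{k=1}^{3}\sum_{l=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial x^l}{\partial\bar{x}^j}\,T^{k}{}_{l}. \tag{3.154}$$
Note that the transformation equations between the coordinate differentials [Eq. (3.148)] are linear, that is,
$$d\bar{x}^1 = \frac{\partial\bar{x}^1}{\partial x^1}dx^1 + \frac{\partial\bar{x}^1}{\partial x^2}dx^2 + \frac{\partial\bar{x}^1}{\partial x^3}dx^3, \tag{3.155}$$
$$d\bar{x}^2 = \frac{\partial\bar{x}^2}{\partial x^1}dx^1 + \frac{\partial\bar{x}^2}{\partial x^2}dx^2 + \frac{\partial\bar{x}^2}{\partial x^3}dx^3, \tag{3.156}$$
$$d\bar{x}^3 = \frac{\partial\bar{x}^3}{\partial x^1}dx^1 + \frac{\partial\bar{x}^3}{\partial x^2}dx^2 + \frac{\partial\bar{x}^3}{\partial x^3}dx^3, \tag{3.157}$$
hence the elements of the transformation matrix, $A$, in $\bar{v} = Av$ [Eq. (3.150)] are given as
$$A = A^i{}_j = \frac{\partial\bar{x}^i}{\partial x^j} = \begin{pmatrix} \dfrac{\partial\bar{x}^1}{\partial x^1} & \dfrac{\partial\bar{x}^1}{\partial x^2} & \dfrac{\partial\bar{x}^1}{\partial x^3} \\ \dfrac{\partial\bar{x}^2}{\partial x^1} & \dfrac{\partial\bar{x}^2}{\partial x^2} & \dfrac{\partial\bar{x}^2}{\partial x^3} \\ \dfrac{\partial\bar{x}^3}{\partial x^1} & \dfrac{\partial\bar{x}^3}{\partial x^2} & \dfrac{\partial\bar{x}^3}{\partial x^3} \end{pmatrix}. \tag{3.158}$$
If we apply this to orthogonal transformations between Cartesian coordinates defined in Equation (3.10), we obtain the components of the transformation matrix as
$$A^i{}_j = A_{ij} = a_{ij} = \cos\theta_{ij}, \tag{3.159}$$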
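where $a_{ij}$ are the direction cosines and we have used the fact that for Cartesian coordinates the covariant and the contravariant components are equal. Using the inverse transformation
$$dx^k = \sum_{j=1}^{3}\frac{\partial x^k}{\partial\bar{x}^j}\,d\bar{x}^j \tag{3.160}$$
in Equation (3.148), we write
$$d\bar{x}^i = \sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\left[\sum_{j=1}^{3}\frac{\partial x^k}{\partial\bar{x}^j}\,d\bar{x}^j\right] = \sum_{j=1}^{3}\left[\sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial x^k}{\partial\bar{x}^j}\right]d\bar{x}^j \tag{3.161}$$
to obtain the relation
$$\sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial x^k}{\partial\bar{x}^j} = \delta^i{}_j. \tag{3.162}$$
In general, we write the transformation matrix, $A$, and the inverse transformation matrix, $\bar{A}$, as
$$A = A^i{}_j = \frac{\partial\bar{x}^i}{\partial x^j}, \quad \bar{A} = \bar{A}^i{}_j = \frac{\partial x^i}{\partial\bar{x}^j}, \tag{3.163}$$
respectively, which satisfy the relation
$$\sum_{j=1}^{3} A^i{}_j\,\bar{A}^j{}_k = \delta^i{}_k. \tag{3.164}$$
One should keep in mind that even though for ease in comparison we have identified the variables $x^i$ as the Cartesian coordinates and we will continue to do so, the transformation equations represented by the transformation matrix in Equation (3.158) could represent any transformation from one generalized coordinate system into another. We can also write the last equation [Eq. (3.164)] as
$$A\bar{A} = I, \tag{3.165}$$
thus showing that $\bar{A}$ is the inverse of $A = A^i{}_j$. If we apply Equation (3.163) to the orthogonal transformations between Cartesian coordinates [Eq. (3.31)] and their inverse [Eq. (3.35)], we see that
$$\bar{A} = \tilde{A}, \tag{3.166}$$
that is, the inverse is the transpose of $A$.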
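The relation between the transformation matrix and its inverse can be verified for a concrete case. The sketch below assumes plane polar coordinates $(\rho, \phi)$ as the barred coordinates, so the indices run over 1 and 2 only, and uses the analytic partial derivatives; the function names and the sample point are illustrative.

```python
import math

def forward_matrix(x, y):
    """A^i_j = d(xbar^i)/d(x^j) for xbar = (rho, phi), x = (x, y)."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    return [[x / rho, y / rho],
            [-y / rho2, x / rho2]]

def inverse_matrix(rho, phi):
    """Abar^i_j = d(x^i)/d(xbar^j) for x = rho cos(phi), y = rho sin(phi)."""
    return [[math.cos(phi), -rho * math.sin(phi)],
            [math.sin(phi), rho * math.cos(phi)]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x, y = 1.2, 0.9
rho, phi = math.hypot(x, y), math.atan2(y, x)
product_matrix = matmul(forward_matrix(x, y), inverse_matrix(rho, phi))
print(product_matrix)   # approximately the 2x2 identity matrix
```

Note that, unlike the orthogonal case, the inverse here is not the transpose of the forward matrix, since the second row carries a factor of $1/\rho$.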
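We can now summarize the general transformation equations as
$$\bar{v}^i = \sum_{j=1}^{3} A^i{}_j\,v^j, \tag{3.167}$$
$$\bar{v}_i = \sum_{j=1}^{3} \bar{A}^j{}_i\,v_j, \tag{3.168}$$
$$\bar{T}^{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3} A^i{}_k A^j{}_l\,T^{kl}, \tag{3.169}$$
$$\bar{T}_{ij} = \sum_{k=1}^{3}\sum_{l=1}^{3} \bar{A}^k{}_i \bar{A}^l{}_j\,T_{kl}, \tag{3.170}$$
$$\bar{T}^{i}{}_{j} = \sum_{k=1}^{3}\sum_{l=1}^{3} A^i{}_k \bar{A}^l{}_j\,T^{k}{}_{l}. \tag{3.171}$$
3.4.1 Einstein Summation Convention
From the above equations, we observe that whenever an index is repeated with one up and the other one down, it is summed over. We still have not shown how to raise or lower indices, but from now on whenever there is a summation over two indices, we agree to write it with one up and the other down and omit the summation sign. It does not matter which index is written up or down. This is called the Einstein summation convention. Now the above transformation equations and their inverses can be written as
$$\begin{aligned} \bar{v}^i &= A^i{}_j v^j, & v^i &= \bar{A}^i{}_j \bar{v}^j, \\ \bar{v}_i &= \bar{A}^j{}_i v_j, & v_i &= A^j{}_i \bar{v}_j, \\ \bar{T}^{ij} &= A^i{}_k A^j{}_l T^{kl}, & T^{ij} &= \bar{A}^i{}_k \bar{A}^j{}_l \bar{T}^{kl}, \\ \bar{T}_{ij} &= \bar{A}^k{}_i \bar{A}^l{}_j T_{kl}, & T_{ij} &= A^k{}_i A^l{}_j \bar{T}_{kl}, \\ \bar{T}^{i}{}_{j} &= A^i{}_k \bar{A}^l{}_j T^{k}{}_{l}, & T^{i}{}_{j} &= \bar{A}^i{}_k A^l{}_j \bar{T}^{k}{}_{l}. \end{aligned} \tag{3.172}$$
A general tensor with mixed indices is defined with respect to the transformation rule
$$\bar{T}^{i_1 i_2\ldots}{}_{j_1 j_2\ldots} = A^{i_1}{}_{k_1}A^{i_2}{}_{k_2}\cdots\bar{A}^{l_1}{}_{j_1}\bar{A}^{l_2}{}_{j_2}\cdots T^{k_1 k_2\ldots}{}_{l_1 l_2\ldots}. \tag{3.173}$$
To prove the tensor property of the Kronecker delta under general coordinate transformations, we use Equation (3.164) to write
$$\bar{\delta}^i{}_j = A^i{}_k\bar{A}^l{}_j\,\delta^k{}_l \tag{3.174}$$
$$= A^i{}_k\bar{A}^k{}_j \tag{3.175}$$
$$= \delta^i{}_j. \tag{3.176}$$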
Hence $\delta^i{}_j$ is a second-rank tensor and has the same components in generalized coordinates. It is the only second-rank tensor with this property. Algebraic properties described for the Cartesian tensors are also valid for general tensors.
3.4.2 Line Element
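We now write the line element in generalized coordinates, which gives the distance between two infinitesimally close points. We start with the line element in Cartesian coordinates, which is nothing but Pythagoras' theorem, which can be written in the following equivalent forms:
$$ds^2 = d\vec{r}\cdot d\vec{r} \tag{3.177}$$
$$= (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \tag{3.178}$$
$$= \sum_{k=1}^{3} dx^k dx^k. \tag{3.179}$$
Using the inverse transformation
$$dx^k = \frac{\partial x^k}{\partial\bar{x}^i}\,d\bar{x}^i \tag{3.180}$$
and the fact that $ds$ is a scalar, we write the line element in generalized coordinates as
$$ds^2 = \sum_{k=1}^{3} dx^k dx^k = \sum_{k=1}^{3}\left(\frac{\partial x^k}{\partial\bar{x}^i}\,d\bar{x}^i\right)\left(\frac{\partial x^k}{\partial\bar{x}^j}\,d\bar{x}^j\right) \tag{3.181}$$
$$= \left[\sum_{k=1}^{3}\frac{\partial x^k}{\partial\bar{x}^i}\frac{\partial x^k}{\partial\bar{x}^j}\right]d\bar{x}^i d\bar{x}^j. \tag{3.182}$$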
3.4.3 Metric Tensor
We now introduce a very important tensor, the metric tensor $g_{ij}$, which is defined as
$$g_{ij} = \sum_{k=1}^{3}\frac{\partial x^k}{\partial\bar{x}^i}\frac{\partial x^k}{\partial\bar{x}^j}. \tag{3.183}$$
Note that the sum over $k$ is written with both indices up. Hence, even though we still adhere to the Einstein summation convention, for these indices we keep the summation sign. The metric tensor is the single most important second-rank tensor in tensor calculus and the general theory of relativity. Now the line element in generalized coordinates becomes
$$ds^2 = g_{ij}\,d\bar{x}^i d\bar{x}^j. \tag{3.184}$$
Needless to say, components of the metric tensor in Equation (3.184) are all expressed in terms of the barred coordinates. Note that in Cartesian coordinates the metric tensor is the identity tensor,
$$g_{ij} = \delta_{ij}; \tag{3.185}$$
thus the line element in Cartesian coordinates becomes
$$ds^2 = \delta_{ij}\,dx^i dx^j \tag{3.186}$$
$$= (dx^1)^2 + (dx^2)^2 + (dx^3)^2. \tag{3.187}$$
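As an illustration of the definition $g_{ij} = \sum_k (\partial x^k/\partial\bar{x}^i)(\partial x^k/\partial\bar{x}^j)$, the following sketch evaluates the metric tensor numerically. It assumes spherical coordinates $(r, \theta, \phi)$ as the barred coordinates and central differences for the partial derivatives; names and the sample point are our own.

```python
import math

def cartesian(q):
    """Spherical (r, theta, phi) -> Cartesian (x1, x2, x3)."""
    r, theta, phi = q
    return [r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta)]

def metric(q, h=1e-6):
    """g_ij = sum_k (dx^k/dq^i)(dx^k/dq^j), derivatives by central differences."""
    partials = []                      # partials[i][k] = dx^k / dq^i
    for i in range(3):
        plus, minus = list(q), list(q)
        plus[i] += h
        minus[i] -= h
        xp, xm = cartesian(plus), cartesian(minus)
        partials.append([(xp[k] - xm[k]) / (2 * h) for k in range(3)])
    return [[sum(partials[i][k] * partials[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]

q = [2.0, 0.6, 1.1]
g = metric(q)
for row in g:
    print([round(value, 6) for value in row])
```

The result should be close to the diagonal form $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$, which gives the familiar line element $ds^2 = dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2$.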
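3.4.4 How to Raise and Lower Indices
Given an arbitrary contravariant vector $v^j$, let us find how
$$\left[g_{ij}v^j\right] \tag{3.188}$$
transforms. Using Equation (3.172), we first write
$$g_{ij} = A^{i'}{}_{i}A^{j'}{}_{j}\,\bar{g}_{i'j'}, \tag{3.189}$$
$$v^j = \bar{A}^{j}{}_{k}\,\bar{v}^k, \tag{3.190}$$
and then substitute them into Equation (3.188) to get
$$\left[g_{ij}v^j\right] = A^{i'}{}_{i}A^{j'}{}_{j}\bar{A}^{j}{}_{k}\left[\bar{g}_{i'j'}\bar{v}^k\right] \tag{3.191}$$
$$= A^{i'}{}_{i}\left[A^{j'}{}_{j}\bar{A}^{j}{}_{k}\right]\left[\bar{g}_{i'j'}\bar{v}^k\right] \tag{3.192}$$
$$= A^{i'}{}_{i}\left[\delta^{j'}{}_{k}\right]\left[\bar{g}_{i'j'}\bar{v}^k\right] \tag{3.193}$$
$$= A^{i'}{}_{i}\left[\bar{g}_{i'k}\bar{v}^k\right]. \tag{3.194}$$
Renaming the dummy variable $k$ on the right-hand side as
$$k \rightarrow j', \tag{3.195}$$
we finally obtain
$$\left[g_{ij}v^j\right] = A^{i'}{}_{i}\left[\bar{g}_{i'j'}\bar{v}^{j'}\right]. \tag{3.196}$$
Comparing with the corresponding equation in Equation (3.172), it is seen that $g_{ij}v^j$ transforms like a covariant vector. We now define the covariant component of $v^j$ as
$$v_i = g_{ij}v^j. \tag{3.197}$$
We can also define the metric tensor with the contravariant components as
$$g^{kl} = \sum_{i=1}^{3}\frac{\partial\bar{x}^k}{\partial x^i}\frac{\partial\bar{x}^l}{\partial x^i}, \tag{3.198}$$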
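where
$$g^{kl}g_{lk'} = \left[\sum_{i=1}^{3}\frac{\partial\bar{x}^k}{\partial x^i}\frac{\partial\bar{x}^l}{\partial x^i}\right]\left[\sum_{j=1}^{3}\frac{\partial x^j}{\partial\bar{x}^l}\frac{\partial x^j}{\partial\bar{x}^{k'}}\right] = \delta^{k}{}_{k'}. \tag{3.199}$$
Note that in the above equations, in addition to the summation signs that come from the definition of the metric tensor, the Einstein summation convention is still in effect. Using the symmetry of the metric tensor, we can also write
$$g_{lk}\,g^{kl'} = g_{l}{}^{l'} = \delta_{l}{}^{l'}. \tag{3.200}$$
We now have a tool that can be used to raise and lower indices at will:
$$T_{ij} = g_{ii'}\,T^{i'}{}_{j}, \tag{3.201}$$
$$A_{ij}{}^{k} = g^{kk'}A_{ijk'}, \tag{3.202}$$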
c,, 3
-
g33 k z l g k k ) C ; : k t ,
(3.203)
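etc. The metric tensor in Cartesian coordinates is $\delta^{ij}$. Using Equations (3.158) and (3.172), we can show that under the general coordinate transformations it transforms into the metric tensor:
$$A^i{}_k A^j{}_l\,\delta^{kl} \tag{3.204}$$
$$= \sum_{k=1}^{3} A^i{}_k A^j{}_k \tag{3.205}$$
$$= \sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial\bar{x}^j}{\partial x^k} \tag{3.206}$$
$$= g^{ij}. \tag{3.207}$$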
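The raising and lowering of indices with the metric tensor can be tried out on a small example. The sketch below assumes a two-dimensional oblique system with unit basis vectors at 60 degrees, for which $g_{11} = g_{22} = 1$ and $g_{12} = g_{21} = \cos 60^\circ$; the metric, the vector, and the helper names are illustrative choices.

```python
def inverse_2x2(g):
    """Inverse of a 2x2 matrix; gives g^{ij} from g_{ij}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def contract(matrix, vector):
    """Sum over the repeated index: result_i = matrix_ij vector_j."""
    return [sum(matrix[i][j] * vector[j] for j in range(len(vector)))
            for i in range(len(matrix))]

g_lower = [[1.0, 0.5], [0.5, 1.0]]   # g_ij for unit oblique axes at 60 degrees
g_upper = inverse_2x2(g_lower)       # g^{ij}, the inverse of g_ij

v_up = [2.0, -1.0]                   # contravariant components v^j
v_down = contract(g_lower, v_up)     # v_i = g_ij v^j
v_back = contract(g_upper, v_down)   # v^i = g^{ij} v_j recovers v_up

print(v_down, v_back)
```

Lowering and then raising an index returns the original components, which is the content of $g^{ik}g_{kj} = \delta^i{}_j$.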
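3.4.5 Metric Tensor and the Basis Vectors
If we remember the definition of the basis vectors [Eqs. (3.138)-(3.140)],
$$\vec{e}_i = \frac{\partial x^k}{\partial\bar{x}^i}\,\hat{e}_k, \quad i = 1, 2, 3, \tag{3.208}$$
$$= \frac{\partial x^1}{\partial\bar{x}^i}\hat{e}_1 + \frac{\partial x^2}{\partial\bar{x}^i}\hat{e}_2 + \frac{\partial x^3}{\partial\bar{x}^i}\hat{e}_3, \tag{3.209}$$
which are tangents to the coordinate curves (Fig. 3.7), we can write the metric tensor as the scalar product of the basis vectors:
$$g_{ij} = \vec{e}_i\cdot\vec{e}_j \tag{3.210}$$
$$= \sum_{k=1}^{3}\frac{\partial x^k}{\partial\bar{x}^i}\frac{\partial x^k}{\partial\bar{x}^j}. \tag{3.211}$$
Note that the basis vectors $\vec{e}_i$ are given in terms of the unit basis vectors of the Cartesian coordinate system, $\hat{e}_i$. Similarly, using the definition of the metric tensor with the contravariant components,
$$g^{ij} = \sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\frac{\partial\bar{x}^j}{\partial x^k}, \tag{3.212}$$
we can define the new basis vectors $\vec{e}^{\,i}$ as
$$\vec{e}^{\,i} = \sum_{k=1}^{3}\frac{\partial\bar{x}^i}{\partial x^k}\,\hat{e}_k, \tag{3.213}$$
which allows us to write the contravariant metric tensor as the scalar product
$$g^{ij} = \vec{e}^{\,i}\cdot\vec{e}^{\,j}. \tag{3.214}$$
The new basis vectors, $\vec{e}^{\,i}$, are called the inverse basis vectors. Note that neither of the basis vectors, $\vec{e}_i$ or $\vec{e}^{\,i}$, are unit vectors and the indices do not refer to their components. Inverse basis vectors are actually the gradients:
$$\vec{e}^{\,i} = \vec{\nabla}\bar{x}^i. \tag{3.215}$$
Hence they are perpendicular to the coordinate surfaces, while $\vec{e}_i$ are tangents to the coordinate curves. Usage of the upper or the lower indices for the basis vectors is justified by the fact that these indices can be lowered or raised by the metric tensor as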
(3.216) (3.217)
(3.218)
(3.219) (3.220)
Similarly,
(3.221) (3.222)
(3.223) (3.224) (3.225)
3.4.6 Displacement Vector
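In generalized coordinates the displacement vector between two infinitesimally close points is written as
$$d\vec{r} = \frac{\partial\vec{r}}{\partial\bar{x}^i}\,d\bar{x}^i \tag{3.226}$$
$$= \vec{e}_i\,d\bar{x}^i \tag{3.227}$$
$$= d\bar{x}^1\vec{e}_1 + d\bar{x}^2\vec{e}_2 + d\bar{x}^3\vec{e}_3. \tag{3.228}$$
Using the displacement vector [Eq. (3.228)], we can write the line element as
$$ds^2 = d\vec{r}\cdot d\vec{r} \tag{3.229}$$
$$= (\vec{e}_i\cdot\vec{e}_j)\,d\bar{x}^i d\bar{x}^j \tag{3.230}$$
$$= g_{ij}\,d\bar{x}^i d\bar{x}^j. \tag{3.231}$$
If we move along only one of the coordinate curves, say $\bar{x}^1$, the distance covered is
$$ds_1 = |\vec{e}_1|\,d\bar{x}^1 = \sqrt{g_{11}}\,d\bar{x}^1. \tag{3.232}$$
Similarly, for the displacements along the other axes we obtain
$$ds_2 = \sqrt{g_{22}}\,d\bar{x}^2, \tag{3.233}$$
$$ds_3 = \sqrt{g_{33}}\,d\bar{x}^3. \tag{3.234}$$
For a general displacement we have to use the line element [Eq. (3.231)]. For orthogonal generalized coordinates, where
$$(\vec{e}_i\cdot\vec{e}_j) = 0, \quad i \neq j, \tag{3.235}$$
the metric tensor has only the diagonal components and the line element reduces to
$$ds^2 = ds_1^2 + ds_2^2 + ds_3^2 \tag{3.236}$$
$$= g_{11}(d\bar{x}^1)^2 + g_{22}(d\bar{x}^2)^2 + g_{33}(d\bar{x}^3)^2 \tag{3.237}$$
$$= (\vec{e}_1\cdot\vec{e}_1)(d\bar{x}^1)^2 + (\vec{e}_2\cdot\vec{e}_2)(d\bar{x}^2)^2 + (\vec{e}_3\cdot\vec{e}_3)(d\bar{x}^3)^2. \tag{3.238}$$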
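3.4.7 Transformation of Scalar Functions and Line Integrals
As in orthogonal transformations, the value of a scalar function is invariant under generalized coordinate transformations; hence we write
$$\Phi(x^1, x^2, x^3) = \bar{\Phi}(\bar{x}^1, \bar{x}^2, \bar{x}^3) \tag{3.239}$$
or
$$\Phi(x^1, x^2, x^3) = \Phi(\bar{x}^1, \bar{x}^2, \bar{x}^3). \tag{3.240}$$
The scalar product of two vectors, $\vec{a}$ and $\vec{b}$, is also a scalar, thus preserving its value. In generalized coordinates we write it as
$$\bar{\vec{a}}\cdot\bar{\vec{b}} = g_{ij}\,\bar{a}^i\bar{b}^j \tag{3.241}$$
$$= \bar{a}^i\bar{b}_i. \tag{3.242}$$
Using the transformation equations,
$$\bar{a}^i = A^i{}_j\,a^j, \tag{3.243}$$
$$\bar{b}_i = \bar{A}^k{}_i\,b_k, \tag{3.244}$$
it is clear that it has the same value that it has in Cartesian coordinates:
$$\bar{\vec{a}}\cdot\bar{\vec{b}} = \bar{a}^i\bar{b}_i \tag{3.245}$$
$$= A^i{}_j\,a^j\,\bar{A}^k{}_i\,b_k \tag{3.246}$$
$$= \left[\bar{A}^k{}_i A^i{}_j\right]a^j b_k \tag{3.247}$$
$$= \delta^k{}_j\,a^j b_k \tag{3.248}$$
$$= a^k b_k = \vec{a}\cdot\vec{b}. \tag{3.249}$$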
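In the light of these, a given line integral in Cartesian coordinates,
$$I = \int\vec{v}\cdot d\vec{r}, \tag{3.250}$$
can be written in generalized coordinates as
$$I = \int\bar{\vec{v}}\cdot d\bar{\vec{r}} \tag{3.251}$$
$$= \int\bar{v}_i\,d\bar{x}^i. \tag{3.252}$$
We can also write $I$ as
$$I = \int g_{ij}\,\bar{v}^j\,d\bar{x}^i. \tag{3.253}$$
In orthogonal generalized coordinates, only the diagonal components of the metric tensor are nonzero; hence $I$ becomes
$$I = \int g_{11}\bar{v}^1 d\bar{x}^1 + g_{22}\bar{v}^2 d\bar{x}^2 + g_{33}\bar{v}^3 d\bar{x}^3. \tag{3.254}$$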
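It is important to keep in mind that a vector exists independent of the coordinate system used to represent it. In other words, whether we write $\vec{v}$ in Cartesian coordinates as
$$\vec{v} = v^1\hat{e}_1 + v^2\hat{e}_2 + v^3\hat{e}_3 = v^i\hat{e}_i, \tag{3.255}$$
or in generalized coordinates as
$$\bar{\vec{v}} = \bar{v}^1\vec{e}_1 + \bar{v}^2\vec{e}_2 + \bar{v}^3\vec{e}_3 \tag{3.256}$$
$$= \bar{v}^i\vec{e}_i, \tag{3.257}$$
it is the same vector. Hence the bar on $\bar{\vec{v}}$ is sometimes omitted. We remind the reader that $\vec{e}_i$ are not unit vectors in general. Covariant components of $\vec{v}$ are found as
$$\vec{v}\cdot\vec{e}_i = (\bar{v}^j\vec{e}_j)\cdot\vec{e}_i \tag{3.258}$$
$$= \bar{v}^j(\vec{e}_j\cdot\vec{e}_i) \tag{3.259}$$
$$= \bar{v}^j g_{ji} \tag{3.260}$$
$$= \bar{v}_i. \tag{3.261}$$
Similarly, using the inverse basis vectors, $\vec{e}^{\,i}$, we can find the contravariant components as
$$\vec{v}\cdot\vec{e}^{\,i} = (\bar{v}_j\vec{e}^{\,j})\cdot\vec{e}^{\,i} \tag{3.262}$$
$$= \bar{v}_j(\vec{e}^{\,j}\cdot\vec{e}^{\,i}) \tag{3.263}$$
$$= \bar{v}_j g^{ji} \tag{3.264}$$
$$= \bar{v}^i. \tag{3.265}$$
The two types of components are related by
$$\bar{v}_j = g_{ij}\bar{v}^i. \tag{3.266}$$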
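We can now write the line integral [Eq. (3.250)] in the following equivalent ways:
$$I = \int\vec{v}\cdot d\vec{r} \tag{3.267}$$
$$= \int\vec{v}\cdot(d\bar{x}^1\vec{e}_1 + d\bar{x}^2\vec{e}_2 + d\bar{x}^3\vec{e}_3) \tag{3.268}$$
$$= \int(\vec{v}\cdot\vec{e}_1)\,d\bar{x}^1 + (\vec{v}\cdot\vec{e}_2)\,d\bar{x}^2 + (\vec{v}\cdot\vec{e}_3)\,d\bar{x}^3 \tag{3.269}$$
$$= \int\bar{v}_1\,d\bar{x}^1 + \bar{v}_2\,d\bar{x}^2 + \bar{v}_3\,d\bar{x}^3. \tag{3.270}$$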
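The equivalence of the Cartesian and the generalized forms of the line integral can be checked numerically. The following sketch assumes plane polar coordinates, a hypothetical spiral path $\rho = 1 + t$, $\phi = \pi t$ with $0 \le t \le 1$, and the test field $\vec{v} = (-x^2, x^1)$; it uses a simple midpoint rule, and all names are our own.

```python
import math

def path_polar(t):
    """Hypothetical spiral path given in polar coordinates (rho, phi)."""
    return 1.0 + t, math.pi * t

def to_cartesian(rho, phi):
    return rho * math.cos(phi), rho * math.sin(phi)

def field_cartesian(x, y):
    """Test vector field v = (-y, x) in Cartesian components."""
    return -y, x

def covariant_components(rho, phi):
    """v_i = v . e_i with e_rho, e_phi the (non-unit) coordinate basis vectors."""
    x, y = to_cartesian(rho, phi)
    vx, vy = field_cartesian(x, y)
    e_rho = (math.cos(phi), math.sin(phi))
    e_phi = (-rho * math.sin(phi), rho * math.cos(phi))
    return vx * e_rho[0] + vy * e_rho[1], vx * e_phi[0] + vy * e_phi[1]

def line_integrals(n=2000):
    """Return the integral evaluated in Cartesian and in polar form."""
    cart, gen = 0.0, 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        rho0, phi0 = path_polar(t0)
        rho1, phi1 = path_polar(t1)
        rhom, phim = path_polar(0.5 * (t0 + t1))
        x0, y0 = to_cartesian(rho0, phi0)
        x1, y1 = to_cartesian(rho1, phi1)
        vx, vy = field_cartesian(*to_cartesian(rhom, phim))
        cart += vx * (x1 - x0) + vy * (y1 - y0)              # v . dr
        v_rho, v_phi = covariant_components(rhom, phim)
        gen += v_rho * (rho1 - rho0) + v_phi * (phi1 - phi0)  # v_i dxbar^i
    return cart, gen

cart_value, gen_value = line_integrals()
print(cart_value, gen_value, 7 * math.pi / 3)
```

For this field the covariant components are $\bar{v}_\rho = 0$ and $\bar{v}_\phi = \rho^2$, so the integral reduces to $\int\rho^2\,d\phi = 7\pi/3$ along the assumed path.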
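3.4.8 Area Element in Generalized Coordinates
Using the expression for the area of a parallelogram defined by two vectors, $\vec{a}$ and $\vec{b}$, as
$$\text{area} = |\vec{a}\times\vec{b}|, \tag{3.271}$$
we write the area element in generalized coordinates defined by the infinitesimal vectors $d\bar{x}^1\vec{e}_1$ and $d\bar{x}^2\vec{e}_2$ (Fig. 3.9) as
$$d\sigma_{\bar{x}^1\bar{x}^2} = |\vec{e}_1\times\vec{e}_2|\,d\bar{x}^1 d\bar{x}^2. \tag{3.272}$$
Figure 3.9 Area element in generalized coordinates.
Similarly, the other areas are defined:
$$d\sigma_{\bar{x}^2\bar{x}^3} = |\vec{e}_2\times\vec{e}_3|\,d\bar{x}^2 d\bar{x}^3, \tag{3.273}$$
$$d\sigma_{\bar{x}^3\bar{x}^1} = |\vec{e}_3\times\vec{e}_1|\,d\bar{x}^3 d\bar{x}^1. \tag{3.274}$$
In orthogonal generalized coordinates, where the unit basis vectors, $\hat{e}_i = \vec{e}_i/|\vec{e}_i|$, $i = 1, 2, 3$, satisfy
$$\hat{e}_1\times\hat{e}_2 = \hat{e}_3, \tag{3.275}$$
$$\hat{e}_2\times\hat{e}_3 = \hat{e}_1, \tag{3.276}$$
$$\hat{e}_3\times\hat{e}_1 = \hat{e}_2, \tag{3.277}$$
we can write
$$d\vec{\sigma}_{\bar{x}^1\bar{x}^2} = |\vec{e}_1||\vec{e}_2|\,d\bar{x}^1 d\bar{x}^2\,\hat{e}_3 = \sqrt{g_{11}}\sqrt{g_{22}}\,d\bar{x}^1 d\bar{x}^2\,\hat{e}_3,$$
where the area element is oriented in the $\hat{e}_3$ direction. Similarly, we can write the area elements
$$d\vec{\sigma}_{\bar{x}^2\bar{x}^3} = \sqrt{g_{22}}\sqrt{g_{33}}\,d\bar{x}^2 d\bar{x}^3\,\hat{e}_1 \tag{3.280}$$
and
$$d\vec{\sigma}_{\bar{x}^3\bar{x}^1} = \sqrt{g_{33}}\sqrt{g_{11}}\,d\bar{x}^3 d\bar{x}^1\,\hat{e}_2. \tag{3.281}$$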
Figure 3.10 The $u$ and $v$ coordinates defined on a surface.
3.4.9 Area of a Surface
A surface in three-dimensional Cartesian space can be defined either as
$$x^3 = f(x^1, x^2)$$
or in terms of two coordinates (parameters), $u$ and $v$, defined on the surface as (Fig. 3.10)
$$x^1 = x^1(u, v), \tag{3.284}$$
$$x^2 = x^2(u, v), \tag{3.285}$$
$$x^3 = x^3(u, v). \tag{3.286}$$
The $u$ and $v$ coordinates are essentially the contravariant components of the coordinates that a two-dimensional observer living on this surface would use, that is,
$$\bar{x}^1 = u, \tag{3.287}$$
$$\bar{x}^2 = v. \tag{3.288}$$
We can write the infinitesimal Cartesian coordinate differentials, $dx^1, dx^2, dx^3$, corresponding to infinitesimal displacements on the surface, in terms of the surface coordinate differentials, $du$ and $dv$, as
$$dx^1 = \frac{\partial x^1}{\partial u}du + \frac{\partial x^1}{\partial v}dv, \tag{3.289}$$
$$dx^2 = \frac{\partial x^2}{\partial u}du + \frac{\partial x^2}{\partial v}dv, \tag{3.290}$$
$$dx^3 = \frac{\partial x^3}{\partial u}du + \frac{\partial x^3}{\partial v}dv. \tag{3.291}$$
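We now write the distance $ds$ between two infinitesimally close points on the surface entirely in terms of the surface coordinates $u$ and $v$ as
$$ds^2 = (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \tag{3.292}$$
$$= \left[\left(\frac{\partial x^1}{\partial u}\right)^2 + \left(\frac{\partial x^2}{\partial u}\right)^2 + \left(\frac{\partial x^3}{\partial u}\right)^2\right]du^2 + 2\left[\frac{\partial x^1}{\partial u}\frac{\partial x^1}{\partial v} + \frac{\partial x^2}{\partial u}\frac{\partial x^2}{\partial v} + \frac{\partial x^3}{\partial u}\frac{\partial x^3}{\partial v}\right]du\,dv + \left[\left(\frac{\partial x^1}{\partial v}\right)^2 + \left(\frac{\partial x^2}{\partial v}\right)^2 + \left(\frac{\partial x^3}{\partial v}\right)^2\right]dv^2. \tag{3.293}$$
Comparing this with the line element for an observer living on the surface:
$$ds^2 = g_{ij}\,d\bar{x}^i d\bar{x}^j, \quad i, j = 1, 2, \tag{3.294}$$
$$= g_{uu}\,du^2 + 2g_{uv}\,du\,dv + g_{vv}\,dv^2, \tag{3.295}$$
we obtain the components of the metric tensor as
$$g_{uu} = \left(\frac{\partial x^1}{\partial u}\right)^2 + \left(\frac{\partial x^2}{\partial u}\right)^2 + \left(\frac{\partial x^3}{\partial u}\right)^2, \tag{3.296}$$
$$g_{uv} = \frac{\partial x^1}{\partial u}\frac{\partial x^1}{\partial v} + \frac{\partial x^2}{\partial u}\frac{\partial x^2}{\partial v} + \frac{\partial x^3}{\partial u}\frac{\partial x^3}{\partial v}, \tag{3.297}$$
$$g_{vv} = \left(\frac{\partial x^1}{\partial v}\right)^2 + \left(\frac{\partial x^2}{\partial v}\right)^2 + \left(\frac{\partial x^3}{\partial v}\right)^2. \tag{3.298}$$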
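Since the metric tensor can also be written in terms of the surface basis vectors, $\vec{e}_u$ and $\vec{e}_v$, as
$$ds^2 = (\vec{e}_u\cdot\vec{e}_u)\,du^2 + 2(\vec{e}_u\cdot\vec{e}_v)\,du\,dv + (\vec{e}_v\cdot\vec{e}_v)\,dv^2, \tag{3.299}$$
we can read $\vec{e}_u$ and $\vec{e}_v$ from Equations (3.296)-(3.298) as
$$\vec{e}_u = \frac{\partial x^1}{\partial u}\hat{e}_1 + \frac{\partial x^2}{\partial u}\hat{e}_2 + \frac{\partial x^3}{\partial u}\hat{e}_3, \tag{3.300}$$
$$\vec{e}_v = \frac{\partial x^1}{\partial v}\hat{e}_1 + \frac{\partial x^2}{\partial v}\hat{e}_2 + \frac{\partial x^3}{\partial v}\hat{e}_3. \tag{3.301}$$
Note that the surface basis vectors are given in terms of the Cartesian unit basis vectors $(\hat{e}_1, \hat{e}_2, \hat{e}_3)$. We can now write the area element of the surface in terms of the surface coordinates as
$$d\vec{\sigma}_{uv} = \vec{e}_u\,du\times\vec{e}_v\,dv, \tag{3.302}$$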
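which, after substituting Equations (3.300) and (3.301), leads to
$$d\vec{\sigma}_{uv} = \pm\left[\left(\frac{\partial x^2}{\partial u}\frac{\partial x^3}{\partial v} - \frac{\partial x^3}{\partial u}\frac{\partial x^2}{\partial v}\right)\hat{e}_1 + \left(\frac{\partial x^3}{\partial u}\frac{\partial x^1}{\partial v} - \frac{\partial x^1}{\partial u}\frac{\partial x^3}{\partial v}\right)\hat{e}_2 + \left(\frac{\partial x^1}{\partial u}\frac{\partial x^2}{\partial v} - \frac{\partial x^2}{\partial u}\frac{\partial x^1}{\partial v}\right)\hat{e}_3\right]du\,dv, \tag{3.303}$$
which can also be written as
$$d\vec{\sigma}_{uv} = \pm\left[\frac{\partial(x^2, x^3)}{\partial(u, v)}\hat{e}_1 + \frac{\partial(x^3, x^1)}{\partial(u, v)}\hat{e}_2 + \frac{\partial(x^1, x^2)}{\partial(u, v)}\hat{e}_3\right]du\,dv. \tag{3.304}$$
The signs $\pm$ correspond to proper and improper transformations, respectively. Using Equation (3.303), we can write the magnitude of the area element as
$$|d\vec{\sigma}_{uv}| = \sqrt{EG - F^2}\,du\,dv, \tag{3.305}$$
where
$$E = \left(\frac{\partial x^1}{\partial u}\right)^2 + \left(\frac{\partial x^2}{\partial u}\right)^2 + \left(\frac{\partial x^3}{\partial u}\right)^2, \tag{3.306}$$
$$F = \frac{\partial x^1}{\partial u}\frac{\partial x^1}{\partial v} + \frac{\partial x^2}{\partial u}\frac{\partial x^2}{\partial v} + \frac{\partial x^3}{\partial u}\frac{\partial x^3}{\partial v}, \tag{3.307}$$
$$G = \left(\frac{\partial x^1}{\partial v}\right)^2 + \left(\frac{\partial x^2}{\partial v}\right)^2 + \left(\frac{\partial x^3}{\partial v}\right)^2. \tag{3.308}$$
Integrating over the surface, we get the surface area
$$S = \iint|d\vec{\sigma}_{uv}| \tag{3.309}$$
$$= \iint\sqrt{EG - F^2}\,du\,dv, \tag{3.310}$$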
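which is nothing but Equation (2.205), which we have written for the area of a surface in the previous chapter. If the surface $S$ is defined in Cartesian coordinates as $x^3 - f(x^1, x^2) = 0$, we can project the surface element, $d\vec{\sigma}$, onto the $x^1x^2$-plane as $dx^1 dx^2 = (\hat{n}\cdot\hat{e}_3)\,d\sigma = \cos\gamma\,d\sigma$, where $\hat{n} = \vec{n}/|\vec{n}|$ is the unit normal to the surface, and integrate over the region $R_{x^1x^2}$, which is the projection of $S$ onto the $x^1x^2$-plane (Fig. 3.11).
Figure 3.11 Projection of the surface area element.
Since the normal is given as
$$\vec{n} = \left(-\frac{\partial f}{\partial x^1}, -\frac{\partial f}{\partial x^2}, 1\right),$$
we write the surface area as
$$S = \iint d\sigma = \iint_{R_{x^1x^2}}\frac{1}{\cos\gamma}\,dx^1 dx^2 = \iint_{R_{x^1x^2}}\sqrt{1 + \left(\frac{\partial f}{\partial x^1}\right)^2 + \left(\frac{\partial f}{\partial x^2}\right)^2}\,dx^1 dx^2. \tag{3.311}$$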
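The parametric form $\iint\sqrt{EG - F^2}\,du\,dv$ and the projected form $\iint(1/\cos\gamma)\,dx^1dx^2$ of the surface area can be compared on a simple example. The sketch below assumes the paraboloid patch $x^3 = [(x^1)^2 + (x^2)^2]/2$ over the unit disk, parameterized by $u = \rho$ and $v = \phi$; the surface, the grid sizes, and the function names are illustrative choices, and the integrals are crude midpoint sums.

```python
import math

def surface(u, v):
    """Paraboloid x3 = ((x1)^2 + (x2)^2)/2 with u = rho, v = phi (assumed example)."""
    return (u * math.cos(v), u * math.sin(v), 0.5 * u * u)

def efg(u, v, h=1e-6):
    """E, F, G from central-difference tangent vectors e_u and e_v."""
    e_u = [(a - b) / (2 * h) for a, b in zip(surface(u + h, v), surface(u - h, v))]
    e_v = [(a - b) / (2 * h) for a, b in zip(surface(u, v + h), surface(u, v - h))]
    E = sum(c * c for c in e_u)
    F = sum(a * b for a, b in zip(e_u, e_v))
    G = sum(c * c for c in e_v)
    return E, F, G

def area_parametric(nu=200, nv=64):
    """Midpoint-rule estimate of the integral of sqrt(EG - F^2) du dv."""
    du, dv = 1.0 / nu, 2 * math.pi / nv
    total = 0.0
    for i in range(nu):
        for j in range(nv):
            E, F, G = efg((i + 0.5) * du, (j + 0.5) * dv)
            total += math.sqrt(max(E * G - F * F, 0.0)) * du * dv
    return total

def area_projected(n=200):
    """Midpoint-rule estimate of the integral of (1/cos gamma) over the unit disk."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -1.0 + (i + 0.5) * h
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                # Here df/dx1 = x and df/dx2 = y, so 1/cos(gamma) = sqrt(1 + x^2 + y^2).
                total += math.sqrt(1.0 + x * x + y * y) * h * h
    return total

exact = 2 * math.pi / 3 * (2 ** 1.5 - 1)   # closed form for this patch
a_param, a_proj = area_parametric(), area_projected()
print(a_param, a_proj, exact)
```

The projected estimate is less accurate only because the square grid approximates the circular boundary roughly; both sums approach the same value as the grids are refined.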
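The two areas [Eqs. (3.310) and (3.311)] naturally agree.
Example 3.2. Curvilinear coordinates on the plane: Transformation from the Cartesian to curvilinear coordinates on the plane, say the $x^1x^2$-plane, is accomplished by the transformation equations
$$x^1 = x^1(u, v), \tag{3.312}$$
$$x^2 = x^2(u, v), \tag{3.313}$$
$$x^3 = 0. \tag{3.314}$$
The metric tensor can easily be constructed by using Equations (3.296)-(3.298). The area element is naturally in the $\hat{e}_3$ direction and is given as
$$d\vec{\sigma}_{uv} = \pm\left(\frac{\partial x^1}{\partial u}\frac{\partial x^2}{\partial v} - \frac{\partial x^2}{\partial u}\frac{\partial x^1}{\partial v}\right)du\,dv\,\hat{e}_3. \tag{3.315}$$
Taking the plus sign for proper transformations, we write the magnitude of the area element as
$$d\sigma_{uv} = \frac{\partial(x^1, x^2)}{\partial(u, v)}\,du\,dv. \tag{3.316}$$
In other words, under the above transformation [Eqs. (3.312)-(3.314)], the area element transforms as
$$dx^1 dx^2 \rightarrow \frac{\partial(x^1, x^2)}{\partial(u, v)}\,du\,dv. \tag{3.317}$$
Notice that on the $x^1x^2$-plane
$$\sqrt{EG - F^2} = \left|\frac{\partial(x^1, x^2)}{\partial(u, v)}\right|. \tag{3.318}$$
Applying these to the plane polar coordinates defined by the transformation equations
$$x^1 = \rho\cos\phi, \tag{3.319}$$
$$x^2 = \rho\sin\phi, \tag{3.320}$$
where $u = \rho$ and $v = \phi$, we can write the line element as
$$ds^2 = d\rho^2 + \rho^2 d\phi^2. \tag{3.321}$$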
Since
$$\frac{\partial(x^1, x^2)}{\partial(\rho, \phi)} = \rho, \tag{3.322}$$
the area element becomes
$$d\sigma = \rho\,d\rho\,d\phi. \tag{3.323}$$
3.4.10 Volume Element in Generalized Coordinates
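In Cartesian coordinates the scalar volume element is defined as
$$d\tau = \hat{e}_1\cdot(\hat{e}_2\times\hat{e}_3)\,dx^1 dx^2 dx^3. \tag{3.324}$$
Since the Cartesian basis vectors are mutually orthogonal and of unit magnitude, the infinitesimal volume element reduces to
$$d\tau = dx^1 dx^2 dx^3. \tag{3.325}$$
In generalized coordinates we can write the scalar volume element $d\tau'$, which is equal to $d\tau$, as the volume of the infinitesimal parallelepiped with the sides defined by the vectors
$$\vec{e}_1 d\bar{x}^1, \quad \vec{e}_2 d\bar{x}^2, \quad \vec{e}_3 d\bar{x}^3 \tag{3.326}$$
as
$$d\tau' = (\vec{e}_1 d\bar{x}^1)\cdot(\vec{e}_2 d\bar{x}^2\times\vec{e}_3 d\bar{x}^3) \tag{3.327}$$
$$= \vec{e}_1\cdot(\vec{e}_2\times\vec{e}_3)\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3. \tag{3.328}$$
Using Equation (3.143), this can also be written as
$$d\tau' = J\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3 = \frac{\partial(x^1, x^2, x^3)}{\partial(\bar{x}^1, \bar{x}^2, \bar{x}^3)}\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3. \tag{3.329}$$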
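As a numerical illustration of the volume element written with the Jacobian, the sketch below integrates $J\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3$ over a ball. It assumes spherical coordinates $(r, \theta, \phi)$, evaluates $J$ as the triple product of finite-difference basis vectors, and uses a coarse midpoint rule; the names and grid sizes are our own.

```python
import math

def cartesian(q):
    """Spherical (r, theta, phi) -> Cartesian (x1, x2, x3)."""
    r, theta, phi = q
    return [r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta)]

def jacobian(q, h=1e-6):
    """J = e1 . (e2 x e3), with e_i = dx/dq_i from central differences."""
    e = []
    for i in range(3):
        plus, minus = list(q), list(q)
        plus[i] += h
        minus[i] -= h
        xp, xm = cartesian(plus), cartesian(minus)
        e.append([(xp[k] - xm[k]) / (2 * h) for k in range(3)])
    e1, e2, e3 = e
    cross = [e2[1] * e3[2] - e2[2] * e3[1],
             e2[2] * e3[0] - e2[0] * e3[2],
             e2[0] * e3[1] - e2[1] * e3[0]]
    return sum(e1[k] * cross[k] for k in range(3))

def ball_volume(radius, nr=40, nt=40, nphi=8):
    """Midpoint-rule integral of J dr dtheta dphi over a ball."""
    dr, dt, dp = radius / nr, math.pi / nt, 2 * math.pi / nphi
    total = 0.0
    for i in range(nr):
        for j in range(nt):
            for k in range(nphi):
                q = [(i + 0.5) * dr, (j + 0.5) * dt, (k + 0.5) * dp]
                total += jacobian(q) * dr * dt * dp
    return total

volume = ball_volume(1.0)
print(volume, 4 * math.pi / 3)
```

For spherical coordinates the triple product gives $J = r^2\sin\theta$, so the sum approaches $4\pi/3$ for the unit ball.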
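A tensor that transforms as
$$\bar{T}^{i}{}_{j} = \left[\frac{\partial(x^1, x^2, x^3)}{\partial(\bar{x}^1, \bar{x}^2, \bar{x}^3)}\right]^{w} A^i{}_k\,\bar{A}^l{}_j\,T^{k}{}_{l} \tag{3.330}$$
is called a tensor density or a pseudotensor of weight $w$. Hence the coordinate volume element, $d\bar{x}^1 d\bar{x}^2 d\bar{x}^3$, which transforms as
$$d\bar{x}^1 d\bar{x}^2 d\bar{x}^3 = \frac{1}{J}\,dx^1 dx^2 dx^3, \tag{3.331}$$
is a scalar density of weight $-1$. The volume integral of a scalar function $\rho$ now transforms as
$$\iiint\rho(x^1, x^2, x^3)\,dx^1 dx^2 dx^3 = \iiint\rho(\bar{x}^1, \bar{x}^2, \bar{x}^3)\,J\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3. \tag{3.332}$$
In orthogonal generalized coordinates the volume element is given as
$$d\tau' = |\vec{e}_1 d\bar{x}^1|\,|\vec{e}_2 d\bar{x}^2|\,|\vec{e}_3 d\bar{x}^3| \tag{3.333}$$
$$= |\vec{e}_1||\vec{e}_2||\vec{e}_3|\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3 \tag{3.334}$$
$$= \sqrt{g_{11}}\sqrt{g_{22}}\sqrt{g_{33}}\,d\bar{x}^1 d\bar{x}^2 d\bar{x}^3. \tag{3.335}$$
3.4.11 Invariance and Covariance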
We have seen that scalars preserve their value under general coordinate transformations that do not involve scale changes. The magnitude of a vector and the trace of a second-rank tensor are other properties that do not change under such coordinate transformations. Properties that preserve their value under coordinate transformations are called invariants. Identification of invariants in nature is very important in understanding and developing new physical theories. An important property of tensors is that tensor equations preserve their form under coordinate transformations. For example, a tensor equation given as
transforms into
Even though the components of the individual tensors in a tensor equation change, the tensor equation itself preserves its form. This useful property is called covariance. Since the true laws of nature should not depend on the coordinate system we use, it should be possible to express them in a coordinate-independent formalism. In this regard, tensor calculus plays a very significant role in physics. In particular, it reaches its full potential with Einstein's special theory of relativity and the general theory of relativity.
3.5 DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
3.5.1 Gradient
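We first write the differential of a scalar function $\Phi(\bar{x}^i)$ as
$$d\Phi = \frac{\partial\Phi}{\partial\bar{x}^1}d\bar{x}^1 + \frac{\partial\Phi}{\partial\bar{x}^2}d\bar{x}^2 + \frac{\partial\Phi}{\partial\bar{x}^3}d\bar{x}^3. \tag{3.338}$$
Using the displacement vector written in terms of the generalized coordinates and the basis vectors $\vec{e}_i$ as
$$d\vec{r} = d\bar{x}^1\vec{e}_1 + d\bar{x}^2\vec{e}_2 + d\bar{x}^3\vec{e}_3, \tag{3.339}$$
we rewrite $d\Phi$ as
$$d\Phi = \vec{\nabla}\Phi\cdot d\vec{r} \tag{3.340}$$
to get
$$d\Phi = \vec{\nabla}\Phi\cdot(d\bar{x}^1\vec{e}_1 + d\bar{x}^2\vec{e}_2 + d\bar{x}^3\vec{e}_3) \tag{3.341}$$
$$= (\vec{\nabla}\Phi\cdot\vec{e}_1)\,d\bar{x}^1 + (\vec{\nabla}\Phi\cdot\vec{e}_2)\,d\bar{x}^2 + (\vec{\nabla}\Phi\cdot\vec{e}_3)\,d\bar{x}^3. \tag{3.342}$$
Comparing with Equation (3.261), this gives the covariant components of the gradient in terms of the generalized coordinates as
$$(\vec{\nabla}\Phi)_i = \vec{\nabla}\Phi\cdot\vec{e}_i = \frac{\partial\Phi}{\partial\bar{x}^i}. \tag{3.343}$$
In orthogonal generalized coordinates, where the unit basis vectors are defined as
$$\hat{e}_1 = \frac{\vec{e}_1}{|\vec{e}_1|} = \frac{\vec{e}_1}{\sqrt{g_{11}}}, \quad \hat{e}_2 = \frac{\vec{e}_2}{|\vec{e}_2|} = \frac{\vec{e}_2}{\sqrt{g_{22}}}, \quad \hat{e}_3 = \frac{\vec{e}_3}{|\vec{e}_3|} = \frac{\vec{e}_3}{\sqrt{g_{33}}}, \tag{3.344}$$
Equation (3.343) gives the gradient in terms of the generalized coordinates and their unit basis vectors:
$$\vec{\nabla}\Phi = \frac{1}{\sqrt{g_{11}}}\frac{\partial\Phi}{\partial\bar{x}^1}\hat{e}_1 + \frac{1}{\sqrt{g_{22}}}\frac{\partial\Phi}{\partial\bar{x}^2}\hat{e}_2 + \frac{1}{\sqrt{g_{33}}}\frac{\partial\Phi}{\partial\bar{x}^3}\hat{e}_3. \tag{3.345}$$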
Figure 3.12 Volume element used in the derivation of the divergence operator.
3.5.2 Divergence
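To obtain the divergence operator in generalized coordinates, we use the integral definition [Eq. (2.309)]
$$\vec{\nabla}\cdot\vec{A} = \lim_{\Delta V\to 0}\frac{1}{\Delta V}\oint_S\vec{A}\cdot d\vec{\sigma}, \tag{3.346}$$
where $S$ is a closed surface enclosing a small volume $\Delta V$. We confine ourselves to orthogonal generalized coordinates so that the denominator can be taken as
$$\Delta V = \sqrt{g_{11}}\sqrt{g_{22}}\sqrt{g_{33}}\,\Delta\bar{x}^1\Delta\bar{x}^2\Delta\bar{x}^3. \tag{3.347}$$
For the numerator we consider the flux through the closed surface enclosing the volume element shown in Figure 3.12. We first find the fluxes through the top and the bottom surfaces. We choose the location of the volume element such that the bottom surface is centered at $P_1 = (\bar{x}^1_0, \bar{x}^2_0, 0)$ and the top surface is centered at $P_2 = (\bar{x}^1_0, \bar{x}^2_0, \Delta\bar{x}^3)$. We write the flux through the bottom surface as
$$\int_{\text{bottom}}\vec{A}\cdot d\vec{\sigma} = -\left.(\vec{A}\cdot\hat{e}_3)\sqrt{g_{11}}\sqrt{g_{22}}\,\Delta\bar{x}^1\Delta\bar{x}^2\right|_{P_1} \tag{3.348}$$
$$= -\left.\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right|_{(\bar{x}^1_0, \bar{x}^2_0, 0)}\Delta\bar{x}^1\Delta\bar{x}^2, \tag{3.349}$$
where we used $\bar{A}_3$ for the component of $\vec{A}$ along the unit basis vector $\hat{e}_3$, that is,
$$\bar{A}_3 = \vec{A}\cdot\hat{e}_3. \tag{3.350}$$
The minus sign is due to the fact that $\hat{e}_3$ and the normal to the surface are in opposite directions. Note that $\sqrt{g_{11}}$, $\sqrt{g_{22}}$, and $\bar{A}_3$ are all functions of position. For the flux over the top surface we write
$$\int_{\text{top}}\vec{A}\cdot d\vec{\sigma} = \left.(\vec{A}\cdot\hat{e}_3)\sqrt{g_{11}}\sqrt{g_{22}}\,\Delta\bar{x}^1\Delta\bar{x}^2\right|_{P_2} \tag{3.351}$$
$$= \left.\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right|_{(\bar{x}^1_0, \bar{x}^2_0, \Delta\bar{x}^3)}\Delta\bar{x}^1\Delta\bar{x}^2. \tag{3.352}$$
We now have a plus sign, since $\hat{e}_3$ and the normal are in the same direction. Since we use orthogonal generalized coordinates, the other components of $\vec{A}$ do not contribute to the flux through these surfaces. Since the right-hand side of Equation (3.352) is to be evaluated at $(\bar{x}^1_0, \bar{x}^2_0, \Delta\bar{x}^3)$, we expand $(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3)$ in Taylor series about $(\bar{x}^1_0, \bar{x}^2_0, 0)$ and keep only the first-order terms:
$$\left.\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right|_{(\bar{x}^1_0, \bar{x}^2_0, \Delta\bar{x}^3)} = \left.\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right|_{(\bar{x}^1_0, \bar{x}^2_0, 0)} + \left.\frac{\partial}{\partial\bar{x}^3}\left(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right)\right|_{(\bar{x}^1_0, \bar{x}^2_0, 0)}\Delta\bar{x}^3. \tag{3.353}$$
Substituting this into Equation (3.352), we obtain
$$\int_{\text{top}}\vec{A}\cdot d\vec{\sigma} = \left[\left.\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right|_{(\bar{x}^1_0, \bar{x}^2_0, 0)} + \left.\frac{\partial}{\partial\bar{x}^3}\left(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right)\right|_{(\bar{x}^1_0, \bar{x}^2_0, 0)}\Delta\bar{x}^3\right]\Delta\bar{x}^1\Delta\bar{x}^2. \tag{3.354}$$
Since the location of the volume element is arbitrary, we drop the subscripts and write the net flux through the top and the bottom surfaces as
$$\frac{\partial}{\partial\bar{x}^3}\left(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right)\Delta\bar{x}^1\Delta\bar{x}^2\Delta\bar{x}^3. \tag{3.355}$$
Similar terms are written for the other two pairs of surfaces, giving
$$\oint_S\vec{A}\cdot d\vec{\sigma} = \left[\frac{\partial}{\partial\bar{x}^1}\left(\sqrt{g_{22}}\sqrt{g_{33}}\,\bar{A}_1\right) + \frac{\partial}{\partial\bar{x}^2}\left(\sqrt{g_{11}}\sqrt{g_{33}}\,\bar{A}_2\right) + \frac{\partial}{\partial\bar{x}^3}\left(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right)\right]\Delta\bar{x}^1\Delta\bar{x}^2\Delta\bar{x}^3. \tag{3.356}$$
Substituting this into Equation (3.346) with Equation (3.347), we obtain the divergence in orthogonal generalized coordinates as
$$\vec{\nabla}\cdot\vec{A} = \frac{1}{\sqrt{g_{11}}\sqrt{g_{22}}\sqrt{g_{33}}}\left[\frac{\partial}{\partial\bar{x}^1}\left(\sqrt{g_{22}}\sqrt{g_{33}}\,\bar{A}_1\right) + \frac{\partial}{\partial\bar{x}^2}\left(\sqrt{g_{11}}\sqrt{g_{33}}\,\bar{A}_2\right) + \frac{\partial}{\partial\bar{x}^3}\left(\sqrt{g_{11}}\sqrt{g_{22}}\,\bar{A}_3\right)\right]. \tag{3.357}$$
Figure 3.13 Closed path used in the definition of curl, where $\Delta\bar{x}^2$ and $\Delta\bar{x}^3$ represent the change in coordinates between the indicated points.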
3.5.3 Curl We now find the curl operator in orthogonal generalized coordinates by using the integral definition [Eq. (2.310)]:
(3.358)
where C is a small closed path bounding the oriented surface AS. The outward normal to d d is found by the right-hand rule. We pick a single component of x A by pointing d d in the desired direction, say gl.In Figure 3.13 we show the outward unit normal 6 found by the right-hand rule, pointing in the direction of Z1, that is, 6 = gl.We now write the complete line integral
(a'
->
DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
183
over C as +
J
i x - d f - ; Ca+Cb+Cc+Cd
A .d7
( 2 .&)
&I
= L.+C*+cc+cd
q
+ (2.
&2
+ ( 2 .T3)&3, (3.359)
where we have used Equation (3.268). We first consider the segments C, and C,. Along C, we write
+ where 2 2 = A . & . We now write the Taylor series expansion of about Po = (zi,?E;,Ti) with only the linear terms:
(z3-
zi).
6
3
2
(3.361)
(FA ,Z;,Zg)
Along C, we have T3 = T i and TI
= Ti;hence
Equation (3.360) becomes
(3.362) Next, to evaluate
+
jCC A . d 7 , we write
184
GENERALIZED COORDINATES AND TENSORS
-
We again use the Taylor series expansion [Eq.(3.361)]of 6 x 2 about Po = (zA,Ti,zi) with only the linear terms. Along the path C,, we have T3 = Ti AT; and z1= zA,which gives
+
(3.364)
Using this n Equation (3.363), we obtain
k x+
(3.365)
This allows us to combine the integrals in Equations (3.362) and (3.365) to yield
Since our choice of the point (3$, Tg,T i ) is arbitrary, we write this for a general point as (3.367)
A similar equation will be obtained from the other two segments as (3.368)
Addition of Equations (3.367) and (3.368) yields
Using this result in Equation (3.358) gives the component of $\vec{\nabla}\times\vec{A}$ in the direction of $\hat{e}_1$ as
A similar procedure yields the other two components as
and
The final expression for the curl of a vector field in orthogonal generalized coordinates can now be given as
(3.373) which can also be expressed conveniently as
. (3.374)
3.5.4 Laplacian
Using the results for the gradient and the divergence operators [Eqs. (3.345) and (3.357)], we write the Laplacian for orthogonal generalized coordinates as (3.375)
3.6 ORTHOGONAL GENERALIZED COORDINATES
The general formalism we have developed in the previous sections can be used to define new coordinate systems and to study their properties. Depending on the symmetries of a given system, the mathematics may be considerably easier to work through in certain coordinate systems. For this reason, many different coordinate systems have been designed: Cartesian, cylindrical, spherical, paraboloidal, elliptic, toroidal, bipolar, and oblate spherical coordinate systems, to name a few. Among these, the Cartesian, cylindrical, and spherical coordinate systems are the most frequently used, and we discuss them in detail. Historically, the Cartesian coordinate system is the oldest and was introduced by Descartes in 1637. He labeled the coordinate axes as $x$, $y$, and $z$:

$x^1 = x,$ (3.376)
$x^2 = y,$ (3.377)
$x^3 = z,$ (3.378)

and used $\hat{i}$, $\hat{j}$, $\hat{k}$ for the unit basis vectors:

$\hat{e}_1 = \hat{i},$ (3.379)
$\hat{e}_2 = \hat{j},$ (3.380)
$\hat{e}_3 = \hat{k}.$ (3.381)
In Cartesian coordinates the motion of a particle is described by the radius or position vector, $\vec{r}(t)$, as

$\vec{r}(t) = x(t)\,\hat{i} + y(t)\,\hat{j} + z(t)\,\hat{k},$ (3.382)

where the parameter $t$ is usually the time. The velocity, $\vec{v}(t)$, and the acceleration, $\vec{a}(t)$, are obtained as

$\vec{v}(t) = \frac{d\vec{r}}{dt}$ (3.383)
$= \dot{x}\,\hat{i} + \dot{y}\,\hat{j} + \dot{z}\,\hat{k},$ (3.384)
$\vec{a}(t) = \frac{d\vec{v}}{dt} = \frac{d^2\vec{r}}{dt^2}$ (3.385)
$= \ddot{x}\,\hat{i} + \ddot{y}\,\hat{j} + \ddot{z}\,\hat{k}.$ (3.386)
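The component-wise derivatives in Equations (3.383)-(3.386) can be checked numerically. The following is a minimal sketch, not part of the original text: the helical trajectory, the step size `h`, and all function names are assumptions chosen only for illustration.

```python
import math

def position(t):
    """Assumed illustrative trajectory: a helix r(t) = (cos t, sin t, 0.5 t)."""
    return (math.cos(t), math.sin(t), 0.5 * t)

def derivative(f, t, h=1e-4):
    """Central-difference estimate of df/dt, taken component by component."""
    f_plus, f_minus = f(t + h), f(t - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus))

def velocity(t):
    """v = dr/dt, Eq. (3.383)."""
    return derivative(position, t)

def acceleration(t):
    """a = dv/dt, Eq. (3.385)."""
    return derivative(velocity, t)

t0 = 0.7
v = velocity(t0)
a = acceleration(t0)
# Exact derivatives of the assumed helix, for comparison.
v_exact = (-math.sin(t0), math.cos(t0), 0.5)
a_exact = (-math.cos(t0), -math.sin(t0), 0.0)
```

Each Cartesian component is differentiated independently because the Cartesian unit vectors do not depend on position; this is what fails in curvilinear coordinates.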
Example 3.3. Circular motion: The motion of a particle executing circular motion can be described by the parametric equations

$x(t) = a_0\cos\omega t,$ (3.387)
$y(t) = a_0\sin\omega t,$ (3.388)
$z(t) = z_0.$ (3.389)

Using the radius vector

$\vec{r}(t) = a_0\cos\omega t\,\hat{i} + a_0\sin\omega t\,\hat{j} + z_0\,\hat{k},$ (3.390)

we can obtain the velocity $\vec{v}(t)$ as

$\vec{v}(t) = -a_0\omega\sin\omega t\,\hat{i} + a_0\omega\cos\omega t\,\hat{j}$ (3.391)

and the acceleration $\vec{a}(t)$ as

$\vec{a}(t) = -a_0\omega^2\cos\omega t\,\hat{i} - a_0\omega^2\sin\omega t\,\hat{j}$ (3.392)
$= -\omega^2\,[\vec{r}(t) - z_0\hat{k}].$ (3.393)

3.6.1 Cylindrical Coordinates
Cylindrical coordinates are defined by

$\bar{x}^1 = \rho,$ (3.394)
$\bar{x}^2 = \phi,$ (3.395)
$\bar{x}^3 = z.$ (3.396)

They are related to the Cartesian coordinates by the transformation equations (Fig. 3.14)

$x = \rho\cos\phi,$ (3.397)
$y = \rho\sin\phi,$ (3.398)
$z = z,$ (3.399)
where the ranges are given as $\rho\in[0,\infty)$, $\phi\in[0,2\pi]$, $z\in(-\infty,\infty)$. Inverse transformation equations are written as

$\rho = \sqrt{x^2+y^2},$ (3.400)
$\phi = \tan^{-1}\left(\frac{y}{x}\right),$ (3.401)
$z = z.$ (3.402)

Figure 3.14 Cylindrical coordinates: Coordinate surfaces for $\rho$, $\phi$, $z$ and the unit basis vectors.
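We find the basis vectors [Eq. (3.208)],

$\vec{e}_i = \frac{\partial x^k}{\partial\bar{x}^i}\,\hat{e}_k,$ (3.403)

as

$\vec{e}_1 = \vec{e}_\rho = \cos\phi\,\hat{i} + \sin\phi\,\hat{j}, \quad |\vec{e}_\rho| = 1,$ (3.404)
$\vec{e}_2 = \vec{e}_\phi = -\rho\sin\phi\,\hat{i} + \rho\cos\phi\,\hat{j}, \quad |\vec{e}_\phi| = \rho,$ (3.405)
$\vec{e}_3 = \vec{e}_z = \hat{k}, \quad |\vec{e}_z| = 1.$ (3.406)

The unit basis vectors are now written as

$\hat{e}_1 = \hat{e}_\rho = \cos\phi\,\hat{i} + \sin\phi\,\hat{j},$ (3.407)
$\hat{e}_2 = \hat{e}_\phi = -\sin\phi\,\hat{i} + \cos\phi\,\hat{j},$ (3.408)
$\hat{e}_3 = \hat{e}_z = \hat{k}.$ (3.409)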
Figure 3.15 Infinitesimal displacements in cylindrical coordinates.
It is easy to check that the basis vectors are mutually orthogonal; hence they satisfy the relations (Fig. 3.14)
(3.410)
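It is important to note that the basis vectors, $\vec{e}_i$, are mutually orthogonal; however, their direction and magnitude depend on position. We now write the position vector, $\vec{r}$, and the infinitesimal displacement vector, $d\vec{r}$ [Eq. (228)], as

$\vec{r} = \rho\,\hat{e}_\rho + z\,\hat{e}_z,$ (3.411)
$d\vec{r} = d\rho\,\vec{e}_\rho + d\phi\,\vec{e}_\phi + dz\,\vec{e}_z.$ (3.412)

From the line element

$ds^2 = d\vec{r}\cdot d\vec{r}$ (3.413)
$= d\rho^2\,(\vec{e}_\rho\cdot\vec{e}_\rho) + d\phi^2\,(\vec{e}_\phi\cdot\vec{e}_\phi) + dz^2\,(\vec{e}_z\cdot\vec{e}_z)$ (3.414)
$= d\rho^2 + \rho^2 d\phi^2 + dz^2$ (3.415)
$= g_{\rho\rho}\,d\rho^2 + g_{\phi\phi}\,d\phi^2 + g_{zz}\,dz^2,$ (3.416)

we obtain the metric tensor:

$g_{ij} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \rho^2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$ (3.417)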
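The construction of the metric from the basis vectors can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the book's procedure: it estimates $\vec{e}_i = \partial\vec{r}/\partial\bar{x}^i$ by central differences of the transformation equations (3.397)-(3.399) and forms $g_{ij} = \vec{e}_i\cdot\vec{e}_j$. The function names, the step size `h`, and the sample point are hypothetical.

```python
import math

def to_cartesian(rho, phi, z):
    """Transformation equations (3.397)-(3.399)."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)

def basis_vectors(q, h=1e-6):
    """Central-difference estimate of e_i = d(x, y, z)/d(q^i), with q = (rho, phi, z)."""
    vectors = []
    for i in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += h
        q_minus[i] -= h
        x_plus, x_minus = to_cartesian(*q_plus), to_cartesian(*q_minus)
        vectors.append(tuple((a - b) / (2.0 * h) for a, b in zip(x_plus, x_minus)))
    return vectors

def metric(q):
    """Covariant metric g_ij = e_i . e_j at the point q."""
    e = basis_vectors(q)
    return [[sum(a * b for a, b in zip(e[i], e[j])) for j in range(3)]
            for i in range(3)]

# Sample point rho = 2: the result should be close to diag(1, rho^2, 1) = diag(1, 4, 1).
g = metric((2.0, 0.6, -1.0))
```

The vanishing off-diagonal entries express the mutual orthogonality of the basis vectors, and the entry $g_{\phi\phi} = \rho^2$ reflects $|\vec{e}_\phi| = \rho$.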
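We construct the contravariant metric tensor, $g^{ij}$, by using the inverse basis vectors [Eq. (3.213)],

(3.418) (3.419) (3.420)

which are found by using the inverse transformation equations [Eqs. (3.400)-(3.402)] as

$\vec{e}^{\,1} = \vec{e}^{\,\rho} = \frac{\partial\rho}{\partial x}\,\hat{i} + \frac{\partial\rho}{\partial y}\,\hat{j} + \frac{\partial\rho}{\partial z}\,\hat{k} = \frac{x}{(x^2+y^2)^{1/2}}\,\hat{i} + \frac{y}{(x^2+y^2)^{1/2}}\,\hat{j} = \frac{x}{\rho}\,\hat{i} + \frac{y}{\rho}\,\hat{j} = \cos\phi\,\hat{i} + \sin\phi\,\hat{j},$ (3.421)-(3.424)

$\vec{e}^{\,2} = \vec{e}^{\,\phi} = \frac{\partial\phi}{\partial x}\,\hat{i} + \frac{\partial\phi}{\partial y}\,\hat{j} + \frac{\partial\phi}{\partial z}\,\hat{k} = -\frac{y}{x^2+y^2}\,\hat{i} + \frac{x}{x^2+y^2}\,\hat{j} = -\frac{\sin\phi}{\rho}\,\hat{i} + \frac{\cos\phi}{\rho}\,\hat{j},$ (3.425)-(3.428)

$\vec{e}^{\,3} = \vec{e}^{\,z} = \hat{k}.$ (3.429)

We can now write the contravariant metric tensor, $g^{ij}$, as

$g^{ij} = \vec{e}^{\,i}\cdot\vec{e}^{\,j}$ (3.430)
$= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\rho^2} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$ (3.431)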
Note that
(3.432) (3.433) (3.434)
Line integrals in cylindrical coordinates are written as
(3.435) (3.436)
where
(3.437) (3.438) (3.439)
Area elements in cylindrical coordinates (Fig. 3.15) are given as
(3.440) (3.441) (3.442)
while the volume element is

$d\tau = d\rho\,(\rho\,d\phi)\,dz$ (3.443)
$= \rho\,d\rho\,d\phi\,dz.$ (3.444)
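Applying our general results to cylindrical coordinates, we write the following differential operators:

Gradient [Eq. (3.345)]:

$\vec{\nabla}\Phi(\rho,\phi,z) = \frac{\partial\Phi}{\partial\rho}\,\hat{e}_\rho + \frac{1}{\rho}\frac{\partial\Phi}{\partial\phi}\,\hat{e}_\phi + \frac{\partial\Phi}{\partial z}\,\hat{e}_z.$ (3.445)

Divergence [Eq. (3.357)]:

$\vec{\nabla}\cdot\vec{A} = \frac{1}{\rho}\frac{\partial(\rho A_\rho)}{\partial\rho} + \frac{1}{\rho}\frac{\partial A_\phi}{\partial\phi} + \frac{\partial A_z}{\partial z},$ (3.446)

where

$A_\rho = \vec{A}\cdot\hat{e}_\rho, \quad A_\phi = \vec{A}\cdot\hat{e}_\phi, \quad A_z = \vec{A}\cdot\hat{k}.$ (3.447)

Curl [Eq. (3.373)]:

$\vec{\nabla}\times\vec{A} = \left[\frac{1}{\rho}\frac{\partial A_z}{\partial\phi} - \frac{\partial A_\phi}{\partial z}\right]\hat{e}_\rho + \left[\frac{\partial A_\rho}{\partial z} - \frac{\partial A_z}{\partial\rho}\right]\hat{e}_\phi + \frac{1}{\rho}\left[\frac{\partial(\rho A_\phi)}{\partial\rho} - \frac{\partial A_\rho}{\partial\phi}\right]\hat{e}_z,$ (3.448)

where $A_\rho = \vec{A}\cdot\hat{e}_\rho$, $A_\phi = \vec{A}\cdot\hat{e}_\phi$, $A_z = \vec{A}\cdot\hat{k}$. Curl can also be conveniently expressed as

$\vec{\nabla}\times\vec{A} = \frac{1}{\rho}\begin{vmatrix} \hat{e}_\rho & \rho\,\hat{e}_\phi & \hat{e}_z \\ \frac{\partial}{\partial\rho} & \frac{\partial}{\partial\phi} & \frac{\partial}{\partial z} \\ A_\rho & \rho A_\phi & A_z \end{vmatrix}.$ (3.449)

Laplacian [Eq. (3.375)]:

$\nabla^2\Phi(\rho,\phi,z) = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left[\rho\frac{\partial\Phi}{\partial\rho}\right] + \frac{1}{\rho^2}\frac{\partial^2\Phi}{\partial\phi^2} + \frac{\partial^2\Phi}{\partial z^2}.$ (3.450)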
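Example 3.4. Acceleration in cylindrical coordinates: In cylindrical coordinates the position vector is written as

$\vec{r} = \rho\cos\phi\,\hat{i} + \rho\sin\phi\,\hat{j} + z\,\hat{k}.$ (3.451)

Using the basis vectors $\hat{e}_\rho$, $\hat{e}_\phi$, $\hat{e}_z$ [Eqs. (3.407)-(3.409)], we can also write this as

$\vec{r} = (\vec{r}\cdot\hat{e}_\rho)\,\hat{e}_\rho + (\vec{r}\cdot\hat{e}_\phi)\,\hat{e}_\phi + (\vec{r}\cdot\hat{k})\,\hat{k}$ (3.452)
$= \rho\,\hat{e}_\rho + z\,\hat{k}.$ (3.453)

Since the basis vector $\hat{e}_\rho$ changes direction with position, the velocity of a particle is written as

$\vec{v} = \frac{d\vec{r}}{dt} = \dot{\rho}\,\hat{e}_\rho + \rho\,\dot{\hat{e}}_\rho + \dot{z}\,\hat{k}.$ (3.454)

Using Equation (3.407), we write the derivative of the basis vector, $\hat{e}_\rho$, as

$\dot{\hat{e}}_\rho = \dot{\phi}\,(-\sin\phi\,\hat{i} + \cos\phi\,\hat{j})$ (3.455)
$= \dot{\phi}\,\hat{e}_\phi,$ (3.456)

thus obtaining the velocity vector

$\vec{v} = \dot{\rho}\,\hat{e}_\rho + \rho\dot{\phi}\,\hat{e}_\phi + \dot{z}\,\hat{k}.$ (3.457)

To write the acceleration, we also have to consider the change in the direction of $\hat{e}_\phi$:

$\vec{a} = \frac{d\vec{v}}{dt} = \ddot{\rho}\,\hat{e}_\rho + \dot{\rho}\,\dot{\hat{e}}_\rho + (\dot{\rho}\dot{\phi} + \rho\ddot{\phi})\,\hat{e}_\phi + \rho\dot{\phi}\,\dot{\hat{e}}_\phi + \ddot{z}\,\hat{k}.$ (3.458)

Using Equation (3.456) and

$\dot{\hat{e}}_\phi = \dot{\phi}\,(-\cos\phi\,\hat{i} - \sin\phi\,\hat{j})$ (3.459)
$= -\dot{\phi}\,\hat{e}_\rho,$ (3.460)

we finally obtain

$\vec{a} = (\ddot{\rho} - \rho\dot{\phi}^2)\,\hat{e}_\rho + (\rho\ddot{\phi} + 2\dot{\rho}\dot{\phi})\,\hat{e}_\phi + \ddot{z}\,\hat{k}.$ (3.461)

3.6.2 Spherical Coordinates

Figure 3.16 Spherical coordinates: Coordinate planes for $r$, $\theta$, $\phi$ and the unit basis vectors.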
Spherical coordinates $(r,\theta,\phi)$ are related to the Cartesian coordinates by the transformation equations (Fig. 3.16)

$x = r\sin\theta\cos\phi,$ (3.462)
$y = r\sin\theta\sin\phi,$ (3.463)
$z = r\cos\theta,$ (3.464)

where the ranges are given as

$r\in[0,\infty), \quad \theta\in[0,\pi], \quad \phi\in[0,2\pi].$ (3.465)
The inverse transformations are
(3.466) (3.467) (3.468)
We write the radius vector as

$\vec{r} = x\,\hat{i} + y\,\hat{j} + z\,\hat{k}$ (3.469)
$= r\sin\theta\cos\phi\,\hat{i} + r\sin\theta\sin\phi\,\hat{j} + r\cos\theta\,\hat{k}.$ (3.470)
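Calling $\bar{x}^1 = r$, $\bar{x}^2 = \theta$, $\bar{x}^3 = \phi$, we write the basis vectors,

$\vec{e}_i = \frac{\partial x^j}{\partial\bar{x}^i}\,\hat{e}_j,$ (3.471)

as

$\vec{e}_1 = \vec{e}_r = \sin\theta\cos\phi\,\hat{i} + \sin\theta\sin\phi\,\hat{j} + \cos\theta\,\hat{k},$ (3.472)
$\vec{e}_2 = \vec{e}_\theta = r\cos\theta\cos\phi\,\hat{i} + r\cos\theta\sin\phi\,\hat{j} - r\sin\theta\,\hat{k},$ (3.473)
$\vec{e}_3 = \vec{e}_\phi = -r\sin\theta\sin\phi\,\hat{i} + r\sin\theta\cos\phi\,\hat{j}.$ (3.474)

Dividing with their respective magnitudes,

$|\vec{e}_r| = 1, \quad |\vec{e}_\theta| = r, \quad |\vec{e}_\phi| = r\sin\theta,$ (3.475)

gives us the unit basis vectors:

$\hat{e}_r = \sin\theta\cos\phi\,\hat{i} + \sin\theta\sin\phi\,\hat{j} + \cos\theta\,\hat{k},$ (3.476)
$\hat{e}_\theta = \cos\theta\cos\phi\,\hat{i} + \cos\theta\sin\phi\,\hat{j} - \sin\theta\,\hat{k},$ (3.477)
$\hat{e}_\phi = -\sin\phi\,\hat{i} + \cos\phi\,\hat{j},$ (3.478)

which satisfy the relations

$\hat{e}_r = \hat{e}_\theta\times\hat{e}_\phi, \quad \hat{e}_\theta = \hat{e}_\phi\times\hat{e}_r, \quad \hat{e}_\phi = \hat{e}_r\times\hat{e}_\theta.$ (3.479)

Using the basis vectors, we construct the metric tensor

$g_{ij} = \vec{e}_i\cdot\vec{e}_j$ (3.480)
$= \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2\sin^2\theta \end{pmatrix},$ (3.481)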
which gives the line element as

$ds^2 = g_{ij}\,d\bar{x}^i d\bar{x}^j$ (3.482)
$= dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2.$ (3.483)

Figure 3.17 Infinitesimal displacements in spherical coordinates.
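The surface area elements (Fig. 3.17) are now given as

$d\sigma_{r\theta} = r\,dr\,d\theta,$ (3.484)
$d\sigma_{r\phi} = r\sin\theta\,dr\,d\phi,$ (3.485)
$d\sigma_{\theta\phi} = r^2\sin\theta\,d\theta\,d\phi,$ (3.486)

while the volume element is

$d\tau = r^2\sin\theta\,dr\,d\theta\,d\phi.$ (3.487)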
Following similar steps to cylindrical coordinates, we write the contravariant components of the metric as

$g^{ij} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{r^2} & 0 \\ 0 & 0 & \frac{1}{r^2\sin^2\theta} \end{pmatrix}.$ (3.488)

Using the metric tensor, we can now write the following differential operators for spherical polar coordinates:
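Gradient [Eq. (3.345)]:

$\vec{\nabla}\Phi(r,\theta,\phi) = \frac{\partial\Phi}{\partial r}\,\hat{e}_r + \frac{1}{r}\frac{\partial\Phi}{\partial\theta}\,\hat{e}_\theta + \frac{1}{r\sin\theta}\frac{\partial\Phi}{\partial\phi}\,\hat{e}_\phi.$ (3.489)

Divergence [Eq. (3.357)]:

$\vec{\nabla}\cdot\vec{A} = \frac{1}{r^2\sin\theta}\left[\frac{\partial(r^2\sin\theta\,A_r)}{\partial r} + \frac{\partial(r\sin\theta\,A_\theta)}{\partial\theta} + \frac{\partial(r A_\phi)}{\partial\phi}\right]$ (3.490)
$= \frac{1}{r^2}\frac{\partial(r^2 A_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(\sin\theta\,A_\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial A_\phi}{\partial\phi},$ (3.491)

where $A_r = \vec{A}\cdot\hat{e}_r$, $A_\theta = \vec{A}\cdot\hat{e}_\theta$, $A_\phi = \vec{A}\cdot\hat{e}_\phi$.

Curl [Eq. (3.373)]:

$\vec{\nabla}\times\vec{A} = \frac{1}{r\sin\theta}\left[\frac{\partial(\sin\theta\,A_\phi)}{\partial\theta} - \frac{\partial A_\theta}{\partial\phi}\right]\hat{e}_r + \frac{1}{r}\left[\frac{1}{\sin\theta}\frac{\partial A_r}{\partial\phi} - \frac{\partial(rA_\phi)}{\partial r}\right]\hat{e}_\theta + \frac{1}{r}\left[\frac{\partial(rA_\theta)}{\partial r} - \frac{\partial A_r}{\partial\theta}\right]\hat{e}_\phi,$ (3.492)

which can also be conveniently expressed as

$\vec{\nabla}\times\vec{A} = \frac{1}{r^2\sin\theta}\begin{vmatrix} \hat{e}_r & r\,\hat{e}_\theta & r\sin\theta\,\hat{e}_\phi \\ \frac{\partial}{\partial r} & \frac{\partial}{\partial\theta} & \frac{\partial}{\partial\phi} \\ A_r & rA_\theta & r\sin\theta\,A_\phi \end{vmatrix},$ (3.493)

where $A_r = \vec{A}\cdot\hat{e}_r$, $A_\theta = \vec{A}\cdot\hat{e}_\theta$, $A_\phi = \vec{A}\cdot\hat{e}_\phi$.

Laplacian [Eq. (3.375)]:

$\nabla^2\Phi(r,\theta,\phi) = \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial r}\left(r^2\sin\theta\frac{\partial\Phi}{\partial r}\right) + \frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Phi}{\partial\theta}\right) + \frac{\partial}{\partial\phi}\left(\frac{1}{\sin\theta}\frac{\partial\Phi}{\partial\phi}\right)\right]$ (3.494)
$= \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Phi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Phi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Phi}{\partial\phi^2}.$ (3.495)-(3.496)

Example 3.5. Motion in spherical coordinates: In spherical coordinates the position vector,

$\vec{r} = r\sin\theta\cos\phi\,\hat{i} + r\sin\theta\sin\phi\,\hat{j} + r\cos\theta\,\hat{k},$ (3.497)

is written as

$\vec{r} = r\,\hat{e}_r.$ (3.498)

For a particle in motion the velocity vector is now written as

$\vec{v} = \dot{r}\,\hat{e}_r + r\,\dot{\hat{e}}_r.$ (3.499)

Since the unit basis vectors also change direction with position (Fig. 3.18), we write the derivatives $\dot{\hat{e}}_r$, $\dot{\hat{e}}_\theta$, and $\dot{\hat{e}}_\phi$ as

$\dot{\hat{e}}_r = (\cos\theta\cos\phi\,\dot{\theta} - \sin\theta\sin\phi\,\dot{\phi})\,\hat{i} + (\cos\theta\sin\phi\,\dot{\theta} + \sin\theta\cos\phi\,\dot{\phi})\,\hat{j} - \sin\theta\,\dot{\theta}\,\hat{k}$ (3.500)
$= \dot{\theta}\,\hat{e}_\theta + \sin\theta\,\dot{\phi}\,\hat{e}_\phi,$ (3.501)
$\dot{\hat{e}}_\theta = (-\sin\theta\cos\phi\,\dot{\theta} - \cos\theta\sin\phi\,\dot{\phi})\,\hat{i} + (-\sin\theta\sin\phi\,\dot{\theta} + \cos\theta\cos\phi\,\dot{\phi})\,\hat{j} - \cos\theta\,\dot{\theta}\,\hat{k}$ (3.502)
$= -\dot{\theta}\,\hat{e}_r + \cos\theta\,\dot{\phi}\,\hat{e}_\phi,$ (3.503)
$\dot{\hat{e}}_\phi = -\cos\phi\,\dot{\phi}\,\hat{i} - \sin\phi\,\dot{\phi}\,\hat{j}$ (3.504)
$= -\sin\theta\,\dot{\phi}\,\hat{e}_r - \cos\theta\,\dot{\phi}\,\hat{e}_\theta.$ (3.505)

Figure 3.18 Basis vectors along the trajectory of a particle.

Velocity [Eq. (3.499)] is now written as

$\vec{v} = \dot{r}\,\hat{e}_r + r\dot{\theta}\,\hat{e}_\theta + r\sin\theta\,\dot{\phi}\,\hat{e}_\phi,$

which also leads to the acceleration

$\vec{a} = (\ddot{r} - r\dot{\theta}^2 - r\sin^2\theta\,\dot{\phi}^2)\,\hat{e}_r + (r\ddot{\theta} + 2\dot{r}\dot{\theta} - r\sin\theta\cos\theta\,\dot{\phi}^2)\,\hat{e}_\theta + (r\sin\theta\,\ddot{\phi} + 2\dot{r}\dot{\phi}\sin\theta + 2r\cos\theta\,\dot{\theta}\dot{\phi})\,\hat{e}_\phi.$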
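The velocity and acceleration components of Example 3.5 can be cross-checked against direct differentiation of the Cartesian position. The sketch below is illustrative only: the trajectory in `coords`, the step size `h`, and all names are assumptions, and the component formulas used are the standard spherical expressions $v_r = \dot{r}$, $v_\theta = r\dot{\theta}$, $v_\phi = r\sin\theta\,\dot{\phi}$ and the corresponding acceleration components.

```python
import math

def coords(t):
    """Assumed trajectory (r, theta, phi); chosen only for illustration."""
    return (1.0 + 0.1 * t, 0.5 + 0.2 * t, 0.3 * t * t)

def rates(t):
    """Exact first and second time derivatives of the assumed trajectory."""
    return (0.1, 0.2, 0.6 * t), (0.0, 0.0, 0.6)

def cartesian(t):
    """Cartesian position from Eq. (3.497)."""
    r, th, ph = coords(t)
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def unit_basis(th, ph):
    """Unit basis vectors of Eqs. (3.476)-(3.478)."""
    e_r = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
    e_th = (math.cos(th) * math.cos(ph), math.cos(th) * math.sin(ph), -math.sin(th))
    e_ph = (-math.sin(ph), math.cos(ph), 0.0)
    return e_r, e_th, e_ph

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

t0, h = 0.8, 1e-4
x_minus, x_0, x_plus = cartesian(t0 - h), cartesian(t0), cartesian(t0 + h)
# Finite-difference Cartesian velocity and acceleration.
v_cart = tuple((p - m) / (2.0 * h) for p, m in zip(x_plus, x_minus))
a_cart = tuple((p - 2.0 * c + m) / h**2 for p, c, m in zip(x_plus, x_0, x_minus))

r, th, ph = coords(t0)
(rd, thd, phd), (rdd, thdd, phdd) = rates(t0)
basis = unit_basis(th, ph)

# Projections onto (e_r, e_theta, e_phi) versus the closed-form components.
v_projected = [dot(v_cart, e) for e in basis]
a_projected = [dot(a_cart, e) for e in basis]
v_formula = [rd, r * thd, r * math.sin(th) * phd]
a_formula = [
    rdd - r * thd**2 - r * math.sin(th)**2 * phd**2,
    r * thdd + 2.0 * rd * thd - r * math.sin(th) * math.cos(th) * phd**2,
    r * math.sin(th) * phdd + 2.0 * rd * phd * math.sin(th)
    + 2.0 * r * math.cos(th) * thd * phd,
]
```

Agreement between the projected and closed-form values confirms that the extra terms in the acceleration come entirely from the time dependence of the unit basis vectors.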
PROBLEMS
1. Show that the transformation matrices for counterclockwise rotations through an angle of 0 about the x1-,x2-,and x3-axes, respectively, are given as
1
0
0 -sin0 cosd
cos0
2 . ShowtJhat z’@b .
0
cos0 -sin0
sin0
the tensor product defined as
0
;5’ @ 3 is the transpose of
3. Show that the trace of a second-rank tensor,
is invariant under orthogonal transformations.
4. Using the permutation symbol, justify the formula
5 . Convert these tensor equations into index notation:
6)
(ii)
6. Write the components of the tensor equation u,
= vJvJwLl
i,j
=
1,2,3,
explicitly.
7. What are the ranks of the following tensors: (i) (ii) (iii)
KZJk1DkBrnAZJ, AZBB,WJkuk, A ~ J A ~ ~ B ~ .
8. Write the following tensors explicitly, take i, j = 1 , 2 , 3 :
(i) (ii)
9. Let
A i j , Bij,
AiBiWjkuk, AijAijBk.
and Ci be Cartesian tensors. Show that
is a first-rank Cartesian tensor. 10. Show that the following matrices represent proper orthogonal transformations and interpret them geometrically: (i)
(ii)
cos30O -sin30°
-sin3O0 -cos30° 0
0 0 -1
11. Show that
12. Show that in cylindrical coordinates the radius vector is written as $\vec{r} = \rho\,\hat{e}_\rho + z\,\hat{k}$.
13. Parabolic coordinates, $(\bar{x}^1, \bar{x}^2, \bar{x}^3)$, usually called $(\eta, \xi, \phi)$, are related to the Cartesian coordinates, $(x, y, z)$, by the transformation equations
x1 = x
x
2
= q 1x1 < 1. 1 . 2 . . .72 1
+ +
'
+
+
,
,
12. Using the binomial expansion, show the expansions
and
13. Find the Maclaurin expansions of sin2 x, cos2 x, and sinhx. Find their radius of convergence and show that they converge uniformly within this radius.
14. Find the Maclaurin expansion of 1/ cos x and find its radius of convergence.
15. Find the limit lim
Z+O
xe" sin 22 cos x - 2ex '
Using the series expansions of elementary functions about x = 0, find an approximate expression good to the fourth order in the neighborhood of x = 0.
16. Find the interval of convergence of the following series:
17. Evaluate the following definite integral to three decimal places:
18. Expand the following functions:
6)
1
(ii)
f(x) = G>
in Taylor series about x
=
f(x) =
x+2 (x 3)(x + 4) '
+
1 and find their radius of convergence.
19. Expand the following function in Maclaurin series:
20. Obtain the Maclaurin series
Show that the series converges for x = -1 and hence verify
c 00
log2
=
(- 1)"+' n n=l
21. Expand 1/x2 in Taylor series about x = 1. 22. Find the first three terms of the Maclaurin series of the following functions: (i)
f ( x ) = cos[ln(x
(ii)
sin x f ( x ) = -.
+ I)].
X
(iii)
1
f(x) =
JiT7G.
23. Using the binomial formula, write the first two nonzero terms of the series representations of the following expressions about x = 0:
6)
f(x) =
+d r n.
(4x - 1)* 1+ x 3
+7x4
24. Use L'Hôpital's rule to find the limit lim
2-0
d2T-G 4x3 - 3x2
and then verify your result by finding an appropriate series expansion.
25. Find the limit lim . z+o 5x3
+ 2x2
by using L'Hôpital's rule and then verify your result by finding an appropriate series expansion.
26. Evaluate the following limits and check your answers by using Maclaurin series: (i)
limx,o
1 - ex -
(ii)
limx-o
:[
X
-
1-
ex
1
-
1
.
CHAPTER 7
COMPLEX NUMBERS AND FUNCTIONS
As Gamow mentioned in his book, One Two Three ... Infinity: Facts and Speculations of Science, the 16th-century Italian mathematician Cardan was the first brave scientist to use in writing the mysterious number called the imaginary $i$, with the property $i^2 = -1$. Cardan introduced this number to express the roots of cubic and quartic polynomials, albeit with the reservation that it is probably meaningless or fictitious. All imaginary numbers can be written as proportional to the imaginary unit $i$. It is also possible to define hybrid numbers, $a + ib$, which are known as complex numbers. Complex analysis is the branch of mathematics that deals with the functions of complex numbers. Many theorems in the real domain can be proven considerably more easily with complex analysis. Many branches of science and engineering, like control theory and signal analysis, make widespread use of the techniques of complex analysis. As a mathematical tool, complex analysis offers tremendous help in the evaluation of series and improper integrals encountered in physical theories. With the discovery of quantum mechanics, it also became clear that complex numbers are not just convenient computational tools but also have a fundamental bearing on the inner workings of the universe. In this chapter, we introduce the basic elements of complex analysis, that is, complex numbers and their functions.
7.1 THE ALGEBRA OF COMPLEX NUMBERS
A well-known result from mathematical analysis states that a given quadratic equation,

$ax^2 + bx + c = 0, \quad a, b, c \in \mathbb{R},$ (7.1)

always has two roots, $x_1$ and $x_2$, and can be factored as

$(x - x_1)(x - x_2) = 0.$ (7.2)

A general expression for the roots of a quadratic equation exists, and it is given by the well-known formula

$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$ (7.3)

When the coefficients satisfy the inequality $b^2 - 4ac \geq 0$, both roots are real. However, when $b^2 - 4ac < 0$, no roots can be found within the set of real numbers. Hence, the number system has to be extended to include a new kind of number, the imaginary $i$ with the property

$i^2 = -1.$ (7.4)

Now the roots can be expressed in terms of complex numbers as

$x_{1,2} = \frac{-b \pm i\sqrt{4ac - b^2}}{2a}, \quad 4ac - b^2 > 0.$ (7.5)
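As a small illustration of Equations (7.3) and (7.5), the sketch below computes both roots with Python's standard `cmath` module, whose square root returns a purely imaginary value for a negative real argument. The function name `quadratic_roots` and the sample coefficients are assumptions made for this example.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a x^2 + b x + c = 0 for real a != 0, b, c, following Eq. (7.3)."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be nonzero")
    # cmath.sqrt handles b^2 - 4ac < 0, which is the case covered by Eq. (7.5).
    root_of_discriminant = cmath.sqrt(b * b - 4.0 * a * c)
    return ((-b + root_of_discriminant) / (2.0 * a),
            (-b - root_of_discriminant) / (2.0 * a))

# Sample case with b^2 - 4ac = -16 < 0: the roots form a complex-conjugate pair.
x1, x2 = quadratic_roots(1.0, 2.0, 5.0)
```

For real coefficients with a negative discriminant the two roots are always complex conjugates of each other, which is visible directly in Equation (7.5).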
In general a complex number, $z$, is written as

$z = x + iy,$ (7.6)

where $x$ and $y$ are two real numbers. The real and the imaginary parts of $z$ are written, respectively, as

$\mathrm{Re}\,z = x$ and $\mathrm{Im}\,z = y.$ (7.7)

Complex numbers are also written as ordered pairs of real numbers:

$z = (x, y).$ (7.8)

When $y = 0$, that is, $\mathrm{Im}\,z = 0$, we have a real number, $z = x$, and when $x = 0$, that is, $\mathrm{Re}\,z = 0$, we have a pure imaginary number, $z = iy$. Two complex numbers,

$z_1 = (x_1, y_1)$ (7.9)

and

$z_2 = (x_2, y_2),$ (7.10)

are equal if and only if their real and imaginary parts are equal:

$x_1 = x_2,$ (7.11)
$y_1 = y_2.$ (7.12)

Zero of the complex number system, $z = 0$, means $x = 0$ and $y = 0$. Two complex numbers, $z_1$ and $z_2$, can be added or subtracted by adding or subtracting their real and imaginary parts separately as

$z_1 \pm z_2 = (x_1 + iy_1) \pm (x_2 + iy_2)$ (7.13)
$= (x_1 \pm x_2) + i(y_1 \pm y_2).$ (7.14)
Two complex numbers, $z_1$ and $z_2$, can be multiplied by first writing

$z_1z_2 = (x_1 + iy_1)(x_2 + iy_2)$ (7.15)

and then by expanding the right-hand side formally as

$z_1z_2 = x_1x_2 + ix_1y_2 + iy_1x_2 + i^2y_1y_2.$ (7.16)

Using the property $i^2 = -1$, we finally obtain the product of two complex numbers as

$z_1z_2 = (x_1x_2 - y_1y_2) + i(x_1y_2 + y_1x_2).$ (7.17)
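The division of two complex numbers,

$\frac{z_1}{z_2} = \frac{x_1 + iy_1}{x_2 + iy_2},$ (7.18)

can be performed by multiplying and dividing the right-hand side by $(x_2 - iy_2)$:

$\frac{z_1}{z_2} = \frac{x_1 + iy_1}{x_2 + iy_2}\cdot\frac{x_2 - iy_2}{x_2 - iy_2}$ (7.19)
$= \frac{x_1x_2 + y_1y_2 + i(y_1x_2 - x_1y_2)}{x_2^2 + y_2^2}$ (7.20)
$= \frac{x_1x_2 + y_1y_2}{x_2^2 + y_2^2} + i\,\frac{y_1x_2 - x_1y_2}{x_2^2 + y_2^2}.$ (7.21)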
Division by zero is not defined. The following properties of algebraic operations on complex numbers can be shown to follow from the above properties:

$z_1 + z_2 = z_2 + z_1,$ (7.22)
$z_1z_2 = z_2z_1,$ (7.23)
$z_1 + (z_2 + z_3) = (z_1 + z_2) + z_3,$ (7.24)
$z_1(z_2z_3) = (z_1z_2)z_3,$ (7.25)
$z_1(z_2 + z_3) = z_1z_2 + z_1z_3.$ (7.26)
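The product and quotient rules, Equations (7.17) and (7.21), can be written directly for ordered pairs $(x, y)$ and compared with Python's built-in `complex` type. This is a minimal sketch; the function names and the sample values are assumptions for illustration only.

```python
def multiply(z1, z2):
    """Product of two ordered pairs (x, y), following Eq. (7.17)."""
    x1, y1 = z1
    x2, y2 = z2
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)

def divide(z1, z2):
    """Quotient of two ordered pairs (x, y), following Eq. (7.21)."""
    x1, y1 = z1
    x2, y2 = z2
    denominator = x2 * x2 + y2 * y2
    if denominator == 0:
        raise ZeroDivisionError("division by zero is not defined")
    return ((x1 * x2 + y1 * y2) / denominator,
            (y1 * x2 - x1 * y2) / denominator)

z1, z2 = (3.0, 2.0), (1.0, -4.0)
product = multiply(z1, z2)     # (3 + 2i)(1 - 4i)
quotient = divide(z1, z2)      # (3 + 2i)/(1 - 4i)

# Reference values from the built-in complex type.
reference_product = complex(*z1) * complex(*z2)
reference_quotient = complex(*z1) / complex(*z2)
```

The commutative property (7.23) can be seen in `multiply`, since exchanging the two arguments leaves both returned components unchanged.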
The complex conjugate, $z^*$, or simply the conjugate of a complex number, $z$, is defined as

$z^* = x - iy.$ (7.27)

In general the complex conjugate of a complex quantity is obtained by reversing the sign of the imaginary part or by replacing $i$ with $-i$. Conjugation of complex numbers has the following properties:

$(z_1 + z_2)^* = z_1^* + z_2^*,$ (7.28)
$(z_1z_2)^* = z_1^*z_2^*,$ (7.29)
(7.30)
$z + z^* = 2\,\mathrm{Re}\,z = 2x,$ (7.31)
$z - z^* = 2i\,\mathrm{Im}\,z = 2iy.$ (7.32)

The absolute value, $|z|$, also called the modulus, is the positive real number

$|z| = \sqrt{zz^*} = \sqrt{x^2 + y^2}.$ (7.33)-(7.34)

Modulus has the following properties:

$|z_1z_2| = |z_1||z_2|,$ (7.35)
$\left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|},$ (7.36)
$|z| = |z^*|,$ (7.37)
$|z|^2 = zz^*.$ (7.38)

Triangle inequalities,

$|z_1 + z_2| \leq |z_1| + |z_2|$ (7.39)

and

$|z_1 - z_2| \geq \big||z_1| - |z_2|\big|,$ (7.40)

derive their name from the geometric properties of triangles and they are very useful in applications. The set of all complex numbers with the above algebraic properties forms a field and is shown as $\mathbb{C}$. Naturally, the set of real numbers, $\mathbb{R}$, is a subfield of $\mathbb{C}$. A geometric representation of complex numbers is possible by introducing the complex $z$-plane, where the two orthogonal axes, $x$- and $y$-axes, represent the real and the imaginary parts of a complex number, respectively (Fig. 7.1).
Figure 7.1 Complex $z$-plane.

From the $z$-plane it is seen that the modulus of a complex number, $|z| = r$, is equal to the length of the line connecting the point $z$ to the origin $O$. Using the length of this line, $r$, and the angle that it makes with the positive $x$-axis, $\theta$, which is called the argument of $z$, usually written as $\arg z$, we introduce the polar representation of complex numbers as

$z = r(\cos\theta + i\sin\theta).$ (7.41)

The two representations, $z(x, y)$ and $z(r, \theta)$, are related by the transformation equations

$x = r\cos\theta, \quad y = r\sin\theta$ (7.42)

and with the inverse transformations

$r = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}\left(\frac{y}{x}\right).$ (7.43)

Using the polar representation, we can write the product of two complex numbers,

$z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ (7.44)

and

$z_2 = r_2(\cos\theta_2 + i\sin\theta_2),$ (7.45)

as

$z_1z_2 = r_1r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)]$ (7.46)
$= r_1r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)].$ (7.47)
In other words, the modulus of the product of two complex numbers is equal to the product of the moduli of the multiplied numbers,

$|z_1z_2| = |z_1||z_2| = r_1r_2,$ (7.48)

and the argument of the product is equal to the sum of the arguments:

$\arg z_1z_2 = \arg z_1 + \arg z_2 = \theta_1 + \theta_2.$ (7.49)

In particular, when a complex number $z$ is multiplied by $i$, the effect is to rotate $z$ by $\pi/2$:

$iz = \left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right)r(\cos\theta + i\sin\theta)$ (7.50)
$= r\left[\cos\left(\theta + \frac{\pi}{2}\right) + i\sin\left(\theta + \frac{\pi}{2}\right)\right].$ (7.51)

Using Equation (7.47), we can write

$z_1z_2\cdots z_n = r_1r_2\cdots r_n[\cos(\theta_1 + \theta_2 + \cdots + \theta_n) + i\sin(\theta_1 + \theta_2 + \cdots + \theta_n)].$ (7.52)

Consequently, if $z_1 = z_2 = \cdots = z_n = r(\cos\theta + i\sin\theta)$, we obtain

$z^n = r^n[\cos n\theta + i\sin n\theta].$ (7.53)

When $r = 1$, this becomes the famous DeMoivre's formula:

$[\cos\theta + i\sin\theta]^n = \cos n\theta + i\sin n\theta.$ (7.54)

The ratio of two complex numbers can be written in polar form as

$\frac{z_1}{z_2} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)], \quad r_2 \neq 0.$ (7.55)

As a special case of this, we can write

$z^{-1} = r^{-1}[\cos\theta - i\sin\theta], \quad r \neq 0,$ (7.56)

which leads to

$z^{-n} = r^{-n}[\cos n\theta - i\sin n\theta], \quad r \neq 0, \quad n > 0;$ (7.57)

thus DeMoivre's formula is also valid for negative powers.
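DeMoivre's formula, including its extension to negative powers, is easy to spot-check numerically. The sketch below compares both sides for a few integer exponents; the helper name `de_moivre_sides` and the sample angle are assumptions for this illustration.

```python
import math

def de_moivre_sides(theta, n):
    """Return (lhs, rhs) of [cos(theta) + i sin(theta)]^n = cos(n theta) + i sin(n theta)."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return lhs, rhs

theta = 0.37
# Positive and negative integer powers; each entry is |lhs - rhs|.
differences = {n: abs(lhs - rhs)
               for n in (-3, -1, 2, 7)
               for lhs, rhs in [de_moivre_sides(theta, n)]}
```

Every entry of `differences` should be at the level of floating-point rounding error.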
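7.2 ROOTS OF A COMPLEX NUMBER

Consider the polynomial

$z^n - z_0 = 0, \quad n = \text{positive integer},$ (7.58)

which has $n$ roots, $z = z_0^{1/n}$, in the complex $z$-plane. Using the polar representations of $z$ and $z_0$,

$z = r(\cos\theta + i\sin\theta),$ (7.59)
$z_0 = r_0(\cos\theta_0 + i\sin\theta_0),$ (7.60)

we can write Equation (7.58) as

$r^n(\cos n\theta + i\sin n\theta) = r_0(\cos\theta_0 + i\sin\theta_0),$ (7.61)

which offers the solutions

$r = r_0^{1/n}, \quad n\theta = \theta_0 + 2\pi k, \quad k = 0, 1, 2, \ldots.$ (7.62)

The first equation gives the common modulus of the $n$ roots, while the second equation gives their arguments. For $k \geq n$ no new roots emerge, since the arguments repeat modulo $2\pi$. Hence we obtain the arguments of the $n$ roots as

$\theta_k = \frac{\theta_0}{n} + \frac{2\pi k}{n}, \quad k = 0, 1, \ldots, n-1.$ (7.63)

These roots correspond to the $n$ vertices,

$z_k = r_0^{1/n}\left[\cos\left(\frac{\theta_0}{n} + \frac{2\pi k}{n}\right) + i\sin\left(\frac{\theta_0}{n} + \frac{2\pi k}{n}\right)\right], \quad k = 0, 1, \ldots, n-1,$ (7.64)

of an $n$-sided polygon inscribed in a circle of radius $r_0^{1/n}$. Arguments of these roots are given as

$\left(\frac{\theta_0}{n}\right), \left(\frac{\theta_0}{n} + \frac{2\pi}{n}\right), \left(\frac{\theta_0}{n} + 2\,\frac{2\pi}{n}\right), \ldots, \left(\frac{\theta_0}{n} + (n-1)\frac{2\pi}{n}\right),$ (7.65)

which are separated by

$\Delta\theta = \frac{2\pi}{n}.$ (7.66)

In Figure 7.2 we show the 5 roots of the equation

$z^5 - 1 = 0,$ (7.67)

where $n = 5$ and

$z_0 = 1$ (7.68)
$= \cos 0 + i\sin 0.$ (7.69)

Hence $r_0 = 1$ and

$\theta_0 = 0.$ (7.70)

Figure 7.2 Roots of $z^5 - 1 = 0$.

Using Equation (7.66), this gives the moduli of all the roots as 1 and their arguments, $\arg z_i = \arg z_{i-1} + \frac{2\pi}{5}$, $i = 2, \ldots, 5$, as

$\arg z_1 = 0,$
$\arg z_2 = 0 + \frac{2\pi}{5} = \frac{2\pi}{5},$
$\arg z_3 = \frac{2\pi}{5} + \frac{2\pi}{5} = \frac{4\pi}{5},$
$\arg z_4 = \frac{4\pi}{5} + \frac{2\pi}{5} = \frac{6\pi}{5},$
$\arg z_5 = \frac{6\pi}{5} + \frac{2\pi}{5} = \frac{8\pi}{5}.$ (7.71)

If $m$ and $n$ are positive integers with no common factors, then we can write

$z^{m/n} = (z^{1/n})^m = r^{m/n}\left[\cos\frac{m(\theta + 2\pi k)}{n} + i\sin\frac{m(\theta + 2\pi k)}{n}\right],$

where $k = 0, 1, 2, \ldots, n-1$.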
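The $n$ roots of a complex number can be generated directly from its polar form: a common modulus $r_0^{1/n}$ and arguments $(\theta_0 + 2\pi k)/n$ for $k = 0, \ldots, n-1$. The following sketch uses the standard `cmath.polar` and `cmath.rect` functions; the function name `nth_roots` is an assumption, and the fifth roots of unity are used as the sample case.

```python
import cmath
import math

def nth_roots(z0, n):
    """Return the n roots of z^n = z0 as a list of complex numbers."""
    r0, theta0 = cmath.polar(z0)          # modulus and argument of z0
    r = r0 ** (1.0 / n)                   # common modulus of all roots
    return [cmath.rect(r, (theta0 + 2.0 * math.pi * k) / n) for k in range(n)]

# Fifth roots of unity: vertices of a regular pentagon on the unit circle.
roots = nth_roots(1 + 0j, 5)
# Ratio of consecutive roots: a rotation by 2*pi/5.
step = cmath.rect(1.0, 2.0 * math.pi / 5.0)
```

Multiplying any root by `step` gives the next one, which is the numerical counterpart of the angular separation $2\pi/n$ between neighboring roots.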
7.3 INFINITY AND THE EXTENDED COMPLEX PLANE
In many applications we need to define a number, $\infty$, with the properties

$z + \infty = \infty + z = \infty$, for all finite $z$, (7.74)

and

$z\cdot\infty = \infty\cdot z = \infty$, for all $z \neq 0$ but including $z = \infty$. (7.75)

The number $\infty$, which represents infinity, allows us to write

$\frac{z}{0} = \infty, \quad z \neq 0,$ (7.76)

and

$\frac{z}{\infty} = 0, \quad z \neq \infty.$ (7.77)

In the complex $z$-plane, $\mathbb{C}$, there is no point with these properties; thus we introduce the extended complex plane, $\hat{\mathbb{C}}$, which includes this new point, $\infty$, called the infinity:

$\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}.$ (7.78)

A geometric model for the members of the extended complex plane is possible. Consider the three-dimensional unit sphere $S$:

$x_1^2 + x_2^2 + x_3^2 = 1.$ (7.79)

For every point on $S$, except the north pole $N$ at $(0, 0, 1)$, we associate a complex number

$z = \frac{x_1 + ix_2}{1 - x_3}.$ (7.80)

This is a one-to-one correspondence with the modulus squared,

$|z|^2 = \frac{x_1^2 + x_2^2}{(1 - x_3)^2},$ (7.81)

which, after using Equation (7.79), becomes

$|z|^2 = \frac{1 + x_3}{1 - x_3}.$ (7.82)

Solving this equation for $x_3$ and using Equations (7.79) and (7.80) to write $x_1$ and $x_2$, we obtain

$x_1 = \frac{z + z^*}{1 + |z|^2},$ (7.83)
$x_2 = \frac{z - z^*}{i(1 + |z|^2)},$ (7.84)
$x_3 = \frac{|z|^2 - 1}{|z|^2 + 1}.$ (7.85)
COMPLEX NUMBERS AND FUNCTIONS
T'
1N
Figure 7.3
Riemann sphere and stereorgraphic projections.
This is a one-to-one correspondence with every point of the z-plane with every point, except (0,0, l),on the surface of the unit sphere. The correspondence with the extended z-plane can be completed by identifying the point ( O , O , 1) with m. Note that from Equation (7.85) the lower hemisphere, 2 3 < 0, corresponds to the disc IzI < 1, while the upper hemisphere, 5 3 > 0, corresponds to its outside > 1. We identify the z-plane with the zlsz-plane, that is, the equatorial plane, and use 2 1 - and the 22-axes as the real and the imaginary axes of the z-plane, respectively. In function theory the unit sphere is called the Riemann sphere. If we write z = 2 iy and use Equations (7.83)-(7.85), we can establish the ratios
(zI
+
(7.86) which with a little help from Figure 7 . 3 shows that the points z , 2,and N lie on a straight line. In Figure 7 . 3 and Equation (7.86) the point N is the north pole, (O,O,l), of the Riemann sphere, the point 2 = ( Z ~ , Q , I C ~is) the point at which the straight line originating from N pierces the sphere and finally, z is the point where the straight line meets the equatorial plane, which defines the z-plane. This is called the stereographic projection. Geometrically a stereographic projection maps a straight line in the z-plane into a circle on S , which passes through the pole and vice versa. In general, any circle on the sphere corresponds to a circle or a straight line in the z-plane. Since a circle on the sphere can be defined by the intersection of a plane,
+ bxz +
U Z ~
C Z ~=
d, 0 5 d
< 1,
(7.87)
INFINITY AND THE EXTENDED COMPLEX PLANE
341
with the sphere
x:
+ + 2 52
2 23 =
(7.88)
1,
using Equations (7.83)-(7.85), we can write Equation (7.87) as U(Z
+ z*)
-
bi(z - z * )
+
~ ( 1 21 ~1) =
d(lzI2
+ 1)
(7.89)
or as
+
(d - c ) ( x 2 y2) - 2ax
-
2by
+ d + c = 0.
(7.90)
For d # c this is the equation of a circle, and it becomes a straight line for d = c. Since the transformation is one-to-one, conversely, all circles and straight lines on the z-plane correspond to stereographic projections of circles on the Riemann sphere. In stereographic projections there is significant difference between the distances on the Riemann sphere and their projections on the z-plane. Let ( x 1 , 2 2 ,x3) and (x:,xi,z j ) be two points on the sphere, that is,
x: xi2
+ x2 + 2
2 23 =
+ xi2 + x:
1,
(7.91)
= 1.
(7.92)
We write the distance, d ( z , z’), between these points as [ d ( z , z’)I2 = (XI =2 -2 ( x 4
+(
~ 2 +(23 + x2h. + 2 3 2 ; ) . -
-
(7.93) (7.94)
Using the transformation equations [Eqs. (7.83)-(7.85)] we can write the corresponding distance in the z-plane as (7.95)
If we take one of the points on the sphere as the north pole, N , that is, z’ Equation (7.95) gives
= 03,
(7.96)
+
*
Note that the point z = x iy, where x and/or y are infinity, belongs to the z-plane. Hence it is not the same point as the 00 introduced above.
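The distance formulas (7.93)-(7.96) can be compared numerically: the straight-line (chordal) distance between two image points on the sphere should agree with the closed-form expression in terms of $z$ and $z'$. The sketch below assumes the standard forms $d(z, z') = 2|z - z'|/\sqrt{(1+|z|^2)(1+|z'|^2)}$ and $d(z, \infty) = 2/\sqrt{1+|z|^2}$; all function names and sample points are hypothetical.

```python
import math

def plane_to_sphere(z):
    """Image of z on the unit sphere, Eqs. (7.83)-(7.85)."""
    m2 = abs(z) ** 2
    return (2.0 * z.real / (1.0 + m2), 2.0 * z.imag / (1.0 + m2),
            (m2 - 1.0) / (m2 + 1.0))

def chord_on_sphere(p, q):
    """Euclidean distance between two points of the sphere, Eq. (7.93)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chord_from_plane(z, w):
    """Same distance written in terms of z and w (assumed form of Eq. (7.95))."""
    return 2.0 * abs(z - w) / math.sqrt((1.0 + abs(z) ** 2) * (1.0 + abs(w) ** 2))

def chord_to_infinity(z):
    """Distance from the image of z to the north pole (assumed form of Eq. (7.96))."""
    return 2.0 / math.sqrt(1.0 + abs(z) ** 2)

z, w = complex(0.3, -1.2), complex(-2.0, 0.5)
d_sphere = chord_on_sphere(plane_to_sphere(z), plane_to_sphere(w))
d_plane = chord_from_plane(z, w)
d_pole = chord_on_sphere(plane_to_sphere(z), (0.0, 0.0, 1.0))
```

Because every chord of the unit sphere has length at most 2, this distance stays bounded even when $|z - z'|$ grows without limit.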
7.4 COMPLEX FUNCTIONS
We can define a real function, $f$, as a mapping that returns a value, $f(x)$, for each point, $x$, in its domain of definition:

$f : x \rightarrow f(x).$ (7.97)

Graphically, this can be conveniently represented by introducing the rectangular coordinates with two perpendicular axes called the $x$- and the $y$-axes. By plotting the value, $y = f(x)$, that the function returns along the $y$-axis directly above the location of $x$ along the $x$-axis, we obtain a curve as shown in Figure 7.4 called the graph of $f(x)$.

Figure 7.4 Graph of $f(x)$.

Complex-valued functions are defined similarly as relations that return a complex value for each $z$ in the domain of definition:

$f : z \rightarrow f(z).$ (7.98)

Analogous to real functions, we introduce a dependent variable $w$ and write a complex function as

$w = f(z).$ (7.99)

Since both dependent and independent variables have real and imaginary parts,

$z = x + iy,$ (7.100)
$w = u + iv,$ (7.101)

it is generally simpler to draw $w$ and $z$ on separate planes. Now the function $w = f(z)$, which gives the correspondence of the points in the $z$-plane to the points in the $w$-plane, is called a mapping or transformation. This allows us to view complex functions as operations that map curves and regions in their domain of definition to other curves and regions in the $w$-plane. For example, the function

$w = \sqrt{x^2 + y^2} + iy$ (7.102)

maps all points of a circle in the $z$-plane, $x^2 + y^2 = c^2$, $c \geq 0$, to $u = c$ and $v = y$ in the $w$-plane. Since the range of $y$ is $-c \leq y \leq c$, the interior of the circle is mapped into the region between the lines $v = \pm u$ and $u = c$, that is, $-u \leq v \leq u$ with $u \leq c$, in the $w$-plane (Fig. 7.5).

Figure 7.5 The $w$-plane.

The domain of definition, $D$, of $f$ means the set of values that $z$ is allowed to take, while the set of values, $w = f(z)$, that the function returns is called the range of $w$. A function is called single-valued in a domain $D$ if it returns a single value, $w$, for each $z$ in $D$. From now on, we use the term function only for the single-valued functions. Multiple-valued functions like $z^{1/2}$ or $\log z$ can be treated as single-valued functions by restricting them to one of their allowed values in a specified domain of the $z$-plane. Domain of definition of all polynomials,

$f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0,$ (7.103)

is the entire $z$-plane, while the function

$f(z) = \frac{1}{z^2 + 1}$ (7.104)

is undefined at the points $z = \pm i$. Each function has a specific real, $u(x, y)$, and an imaginary, $v(x, y)$, part:

$w = f(z)$ (7.105)
$= u(x, y) + iv(x, y).$ (7.106)
COMPLEX NUMBERS AND FUNCTIONS
Consider f ( z ) = z 3 . We can write
(7.107)
f ( z ) = z3
= z 2z = ((7: = =
(7.108)
+iy)2(z + i y )
[(x2- y2)
(7.109)
+ i(2zy)](z+ iy)
(7.110)
[(2- y2)z - 2xy2]+ i[y(xc”- y2) + 2z2y],
(7.111)
u ( z ,y) = ( x 2- y2)z - 2xy2
(7.112)
+ 2Z2Y.
(7.113)
thus obtaining
and u ( 5 ,y) = y(z2 - v 2 )
For w = sinz, the u((7:,y) and the v(x,y) functions are simply obtained from the expression w = sin(x + iy) as w = sin x cosh y i cos x sinh y.
+
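The real and imaginary parts obtained above for $z^3$ and $\sin z$ can be verified against direct complex evaluation. This is a minimal sketch; the function names and the sample point are assumptions for illustration.

```python
import cmath
import math

def uv_cubic(x, y):
    """u and v for f(z) = z^3, Eqs. (7.112) and (7.113)."""
    u = (x * x - y * y) * x - 2.0 * x * y * y
    v = y * (x * x - y * y) + 2.0 * x * x * y
    return u, v

def uv_sine(x, y):
    """u and v for w = sin z: sin x cosh y + i cos x sinh y."""
    return math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y)

x, y = 0.7, -1.3
z = complex(x, y)
cubic_from_parts = complex(*uv_cubic(x, y))
sine_from_parts = complex(*uv_sine(x, y))
cubic_direct = z ** 3
sine_direct = cmath.sin(z)
```

Both pairs should agree to rounding error, which confirms the decomposition $w = u(x, y) + iv(x, y)$ for these two functions.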
7.5 LIMITS AND CONTINUITY
Since a complex function can be written as

$w = u(x, y) + iv(x, y),$ (7.114)

its limit can be found in terms of the limits of the two real functions $u(x, y)$ and $v(x, y)$. Thus the properties of the limits of complex functions can be deduced from the properties of the limits of real functions. Basic results can be summarized in terms of the following theorems:

Theorem 7.1. Let

$f(z) = u(x, y) + iv(x, y), \quad z = x + iy \quad \text{and} \quad z_0 = x_0 + iy_0.$ (7.115)

The limit of $f(z)$ at $z_0$ exists, that is,

$\lim_{z\to z_0} f(z) = u_0 + iv_0,$ (7.116)

if and only if

$\lim_{(x,y)\to(x_0,y_0)} u(x, y) = u_0,$ (7.117)
$\lim_{(x,y)\to(x_0,y_0)} v(x, y) = v_0.$ (7.118)

Theorem 7.2. If $f_1(z)$ and $f_2(z)$ are two complex functions whose limits exist at $z_0$:

$\lim_{z\to z_0} f_1(z) = w_1,$ (7.119)
$\lim_{z\to z_0} f_2(z) = w_2,$ (7.120)

then the following limits are true:

$\lim_{z\to z_0} [f_1(z) + f_2(z)] = w_1 + w_2,$ (7.121)
$\lim_{z\to z_0} [f_1(z)f_2(z)] = w_1w_2,$ (7.122)
$\lim_{z\to z_0} \frac{f_1(z)}{f_2(z)} = \frac{w_1}{w_2}, \quad w_2 \neq 0.$ (7.123)

The continuity of complex functions can be understood in terms of the continuity of the real functions $u$ and $v$.

Theorem 7.3. A given function $f(z)$ is continuous at $z_0$ if and only if all the following three conditions are satisfied:

(i) $f(z_0)$ exists,
(ii) $\lim_{z\to z_0} f(z)$ exists,
(iii) $\lim_{z\to z_0} f(z) = f(z_0).$ (7.124)

This theorem implies that $f(z)$ is continuous if and only if $u(x, y)$ and $v(x, y)$ are continuous.

7.6 DIFFERENTIATION IN THE COMPLEX PLANE
As in real analysis, the derivative of a complex function at a point, $z$, in its domain of definition is defined as

$\frac{df}{dz} = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z}.$ (7.125)

Nevertheless, there is a fundamental difference between the differentiation of complex and real functions. In the complex $z$-plane a given point $z$ can be approached from infinitely many different directions (Fig. 7.6). Hence a meaningful definition of derivative should be independent of the direction of approach. If we approach the point $z$ parallel to the real axis, $\Delta z = \Delta x$, we obtain the derivative

$\frac{df}{dz} = \lim_{\Delta x\to 0}\frac{f(z + \Delta x) - f(z)}{\Delta x}$ (7.126)
$= \lim_{\Delta x\to 0}\left[\frac{u(x + \Delta x, y) - u(x, y)}{\Delta x} + i\,\frac{v(x + \Delta x, y) - v(x, y)}{\Delta x}\right]$ (7.127)
$= \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}.$ (7.128)

Figure 7.6 Differentiation in the complex plane.

On the other hand, if $z$ is approached parallel to the imaginary axis, $\Delta z = i\Delta y$, the derivative becomes

$\frac{df}{dz} = \lim_{i\Delta y\to 0}\frac{f(z + i\Delta y) - f(z)}{i\Delta y}$ (7.129)
$= \lim_{i\Delta y\to 0}\left[\frac{u(x, y + \Delta y) - u(x, y)}{i\Delta y} + i\,\frac{v(x, y + \Delta y) - v(x, y)}{i\Delta y}\right]$ (7.130)
$= -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}$ (7.131)

or

$\frac{df}{dz} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}.$ (7.132)

For a meaningful definition of derivative, these two expressions should agree, giving us the conditions for the existence of the derivative at $z$ as

$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},$ (7.133)
$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}.$ (7.134)

These are called the Cauchy-Riemann conditions. Note that choosing the direction of approach first along the $x$- and then along the $y$-axes is a matter of calculational convenience. A general treatment will also lead to the same conclusion. The Cauchy-Riemann conditions show that the real and the imaginary parts of a differentiable function are related. In summary, the Cauchy-Riemann conditions have to be satisfied for the derivative to exist at a given point. However, as we shall see, in general they are not sufficient conditions.
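Example 7.1. Cauchy-Riemann conditions: Consider the following simple function:
$$f(z) = z^2. \tag{7.135}$$
We can find its derivative as the limit
$$\frac{df}{dz} = \lim_{\delta\to 0}\frac{(z+\delta)^2 - z^2}{\delta} \tag{7.136}$$
$$= \lim_{\delta\to 0}\,(2z+\delta) \tag{7.137}$$
$$= 2z. \tag{7.138}$$
If we write the function, $f(z) = z^2$, as
$$f(z) = (x^2-y^2) + i\,2xy, \tag{7.139}$$
we can easily check that the Cauchy-Riemann conditions are satisfied everywhere in the $z$-plane:
$$\frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y}, \tag{7.140}$$
$$\frac{\partial v}{\partial x} = 2y = -\frac{\partial u}{\partial y}. \tag{7.141}$$
We now consider the function
$$f(z) = |z|^2$$
and write the limit
$$\frac{df}{dz} = \lim_{\delta\to 0}\frac{|z+\delta|^2 - |z|^2}{\delta} \tag{7.142}$$
$$= \lim_{\delta\to 0}\frac{(z+\delta)(z^*+\delta^*) - zz^*}{\delta} \tag{7.143}$$
$$= \lim_{\delta\to 0}\left[z^* + \delta^* + z\,\frac{\delta^*}{\delta}\right]. \tag{7.144}$$
At the origin, $z = 0$, regardless of the direction of approach, the above limit exists and its value is equal to 0; thus we can write the derivative
$$\left.\frac{df}{dz}\right|_{z=0} = 0. \tag{7.145}$$
For the other points, if $\delta$ approaches zero along the real axis, $\delta = \epsilon$, we obtain
$$\frac{df}{dz} = \lim_{\epsilon\to 0}\,[z^* + \epsilon + z] \tag{7.146}$$
$$= z^* + z, \tag{7.147}$$
and if $\delta$ approaches zero along the imaginary axis, $\delta = i\epsilon$, we find
$$\frac{df}{dz} = \lim_{\epsilon\to 0}\,[z^* - i\epsilon - z] \tag{7.148}$$
$$= z^* - z. \tag{7.149}$$
Hence the derivative of $f(z) = |z|^2$ does not exist except at $z = 0$. In fact, the Cauchy-Riemann conditions for $f(z) = |z|^2$, where $u = x^2+y^2$ and $v = 0$, are not satisfied,
$$\frac{\partial u}{\partial x} = 2x \neq \frac{\partial v}{\partial y} = 0, \tag{7.150}$$
$$\frac{\partial v}{\partial x} = 0 \neq -\frac{\partial u}{\partial y} = -2y, \tag{7.151}$$
unless $z = 0$.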
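The direction dependence found in Example 7.1 can also be seen numerically. The following is a minimal sketch, not part of the original text: the helper name `difference_quotient`, the step size `h`, and the sample point `z0` are illustrative assumptions. It evaluates the difference quotient for $z^2$ and $|z|^2$ along the real and the imaginary directions.

```python
def difference_quotient(f, z, direction, h=1e-6):
    """Approximate [f(z + delta) - f(z)] / delta with delta = h * direction."""
    delta = h * direction
    return (f(z + delta) - f(z)) / delta


def square(z):
    return z * z


def modulus_squared(z):
    return abs(z) ** 2


z0 = 1.0 + 2.0j  # arbitrary sample point away from the origin (assumption)

# Approach z0 parallel to the real axis (direction 1) and the imaginary axis (direction i).
square_real = difference_quotient(square, z0, 1.0)
square_imag = difference_quotient(square, z0, 1.0j)
mod_real = difference_quotient(modulus_squared, z0, 1.0)
mod_imag = difference_quotient(modulus_squared, z0, 1.0j)

print(square_real, square_imag)  # both close to 2*z0
print(mod_real, mod_imag)        # close to z0* + z0 and z0* - z0, which differ
```

For $z^2$ the two quotients agree, while for $|z|^2$ they approach $z^* + z$ and $z^* - z$, in line with Equations (7.147) and (7.149).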
Example 7 . 2 . Cauchy- Riemann conditions: Consider the function IZI
z
# 0,
(7.152)
= 0.
At z = 0 we can easily check that the Cauchy-Riemann conditions are satisfied: (7.153) (7.154) However, if we calculate the derivative using the limits in Equations (7.127) and (7.130), we find
dfo = dz
lim
u ( A x 0, ) - u ( O , O )
Az-0
+i
AX, 0) - v(0,O) AX
]
(7.155)
1
+ i-=l+i
(7.156) (7.157)
and df (0) = lim dz
iAy-0
~ ( 0iAy) , - u(0,O)
I -
= ~A lim Y-o = -1
+i.
(
+i
~ ( 0i A, y ) - v(0,O)
ZAY
1
( i ~ l y ) 1~ ( ~ A Y 1) ~ 2-i ~ i A~y ) ( ~~ A Yi A) y~
+
]
(7.158) (7.159) (7.160)
In other words, even though the Cauchy-Riemann conditions are satisfied at $z = 0$, the derivative, $f'(0)$, does not exist. That is, the Cauchy-Riemann conditions are necessary but not sufficient for the existence of the derivative. The following theorem (for a formal proof see Brown and Churchill) gives the sufficient condition for the existence of $f'(z)$:
Theorem 7.4. If $u(x,y)$ and $v(x,y)$ are real- and single-valued functions with continuous first-order partial derivatives at $(x_0,y_0)$, then the Cauchy-Riemann conditions at $(x_0,y_0)$ imply the existence of $f'(z_0)$.
What happened in Example 7.2 is that in order to satisfy the Cauchy-Riemann conditions at $(0,0)$, all we needed was the existence of the first-order partial derivatives of $u(x,y)$ and $v(x,y)$ at $(0,0)$. However, Theorem 7.4 not only demands the existence of the first partial derivatives of $u$ and $v$ at a given point but also needs their continuity at that point. This means that the first-order partial derivatives should also exist in the neighborhood of a given point for the function to be differentiable.
7.7 ANALYTIC FUNCTIONS
A function is said to be analytic at $z_0$ if its derivative, $f'(z)$, exists not only at $z_0$ but also at every other point in some neighborhood of $z_0$. Similarly, if a function is analytic at every point of some domain $D$, then it is called analytic in $D$. All polynomials,
$$f(z) = a_0 + a_1 z + \cdots + a_n z^n, \tag{7.161}$$
are analytic everywhere in the $z$-plane. Functions analytic everywhere in the $z$-plane are called entire functions. Since the derivative of
$$f(z) = |z|^2 \tag{7.162}$$
does not exist anywhere except at the origin, it is not analytic anywhere in the $z$-plane. If a function is analytic at every point in some neighborhood of a point $z_0$, except the point itself, then the point $z_0$ is called a singular point. For example, the function
(7.163) has a singular point at $z = 2$. If two functions are analytic, then their sum and product are also analytic. Their quotient is analytic except at the zeros of the denominator. If we let $f_1(z)$ be analytic in domain $D_1$ and let $f_2(z)$ be analytic in domain $D_2$, then the composite function (7.164)
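is also analytic in the domain $D_1$. For example, since the functions
$$f_1(z) = z^2 + 2 \quad\text{and}\quad f_2(z) = \exp(z) + 1 \tag{7.165}$$
are entire functions, the composite functions
$$f_1(f_2(z)) = [\exp(z)+1]^2 + 2 \tag{7.166}$$
and
$$f_2(f_1(z)) = \exp(z^2+2) + 1 \tag{7.167}$$
are also entire functions.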
7.8 HARMONIC FUNCTIONS
Given an analytic function, $f(z) = u + iv$, defined in some domain, $D$, of the $z$-plane, the Cauchy-Riemann conditions,
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \tag{7.168}$$
$$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}, \tag{7.169}$$
are satisfied at every point of $D$. Differentiating the first condition with respect to $x$ and the second condition with respect to $y$, we get
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 v}{\partial x\,\partial y}, \qquad \frac{\partial^2 u}{\partial y^2} = -\frac{\partial^2 v}{\partial y\,\partial x}. \tag{7.170}$$
For an analytic function the second-order partial derivatives of $u$ and $v$ are also continuous; hence the mixed derivatives, $\partial^2 v/\partial x\,\partial y$ and $\partial^2 v/\partial y\,\partial x$, are equal, and thus we obtain
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \tag{7.171}$$
That is, the real part of an analytic function, $u(x,y)$, satisfies the two-dimensional Laplace equation in the domain of definition $D$. Similarly, differentiating Equation (7.168) with respect to $y$ and Equation (7.169) with respect to $x$ and then adding the results, we obtain
$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{7.172}$$
In other words, the real and the imaginary parts of an analytic function satisfy the two-dimensional Laplace equation. Functions that satisfy the Laplace equation in two dimensions are called harmonic functions. They could be used either as the real or the imaginary part of an analytic function. Pairs of harmonic functions, $(u,v)$, connected by the Cauchy-Riemann conditions are called conjugate harmonic functions.
Example 7.3. Conjugate harmonic functions: Given the real function
$$u(x,y) = x^3 - 3y^2x, \tag{7.173}$$
it can be checked easily that it satisfies the Laplace equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 6x - 6x = 0. \tag{7.174}$$
Hence it is harmonic and can be used to construct an analytic function. Using it as the real part, $u = x^3 - 3y^2x$, we can find its conjugate pair as follows: Using the first Cauchy-Riemann condition, $\partial u/\partial x = \partial v/\partial y$, we write
$$\frac{\partial v}{\partial y} = 3x^2 - 3y^2, \tag{7.175}$$
which can be integrated immediately to get
$$v(x,y) = 3x^2y - y^3 + \Phi(x), \tag{7.176}$$
where $\Phi(x)$ is arbitrary at this point. We now use the second Cauchy-Riemann condition, $\partial v/\partial x = -\partial u/\partial y$, to obtain an ordinary differential equation for $\Phi(x)$:
$$6xy + \Phi'(x) = 6yx, \tag{7.177}$$
$$\Phi'(x) = 0, \tag{7.178}$$
the solution of which gives $\Phi(x) = c_0$. Substituting this into Equation (7.176) yields $v(x,y)$ as
$$v(x,y) = 3x^2y - y^3 + c_0. \tag{7.179}$$
It can easily be checked that $v$ is also harmonic.
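As a quick numerical cross-check of Example 7.3, the sketch below (an illustration, not from the original text) estimates the partial derivatives by central differences and confirms that the pair $(u,v)$ satisfies the Cauchy-Riemann conditions. It also checks that $u + iv$ with $c_0 = 0$ coincides with $z^3$. The helper `partial`, the step `h`, and the sample point are assumptions made for illustration.

```python
def u(x, y):
    return x**3 - 3.0 * y**2 * x


def v(x, y, c0=0.0):
    return 3.0 * x**2 * y - y**3 + c0


def partial(g, x, y, axis, h=1e-5):
    """Central-difference estimate of dg/dx (axis=0) or dg/dy (axis=1)."""
    if axis == 0:
        return (g(x + h, y) - g(x - h, y)) / (2.0 * h)
    return (g(x, y + h) - g(x, y - h)) / (2.0 * h)


x0, y0 = 0.7, -1.3  # arbitrary sample point (assumption)

cr1 = partial(u, x0, y0, 0) - partial(v, x0, y0, 1)  # du/dx - dv/dy
cr2 = partial(v, x0, y0, 0) + partial(u, x0, y0, 1)  # dv/dx + du/dy

z0 = complex(x0, y0)
mismatch = abs(complex(u(x0, y0), v(x0, y0)) - z0**3)

print(cr1, cr2, mismatch)  # all three should be close to zero
```

The last check reflects the fact that $(x+iy)^3 = (x^3 - 3xy^2) + i(3x^2y - y^3)$, so the analytic function constructed in the example is $z^3 + ic_0$.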
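Example 7.4. C-R conditions in polar coordinates: In polar representation a function can be written as
$$f(z) = u(r,\theta) + i\,v(r,\theta). \tag{7.180}$$
Using the transformation equations
$$x = r\cos\theta \quad\text{and}\quad y = r\sin\theta, \tag{7.181}$$
we can write the Cauchy-Riemann conditions as
$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial \theta}, \tag{7.182}$$
$$\frac{1}{r}\frac{\partial u}{\partial \theta} = -\frac{\partial v}{\partial r}. \tag{7.183}$$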
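Example 7.5. Derivative in polar representation: Let us write the derivative of an analytic function, $f(z) = u(r,\theta) + i\,v(r,\theta)$, in polar coordinates as
$$\frac{df}{dz} = \frac{\partial u}{\partial r}\frac{\partial r}{\partial z} + \frac{\partial u}{\partial \theta}\frac{\partial \theta}{\partial z} + i\,\frac{\partial v}{\partial r}\frac{\partial r}{\partial z} + i\,\frac{\partial v}{\partial \theta}\frac{\partial \theta}{\partial z}. \tag{7.184}$$
Substituting the Cauchy-Riemann conditions [Eqs. (7.182) and (7.183)] in Equation (7.184) we write
$$\frac{df}{dz} = \frac{\partial u}{\partial r}\frac{\partial r}{\partial z} - r\,\frac{\partial v}{\partial r}\frac{\partial \theta}{\partial z} + i\,\frac{\partial v}{\partial r}\frac{\partial r}{\partial z} + i\,r\,\frac{\partial u}{\partial r}\frac{\partial \theta}{\partial z} \tag{7.185}$$
$$= \frac{\partial u}{\partial r}\left[\frac{\partial r}{\partial z} + i\,r\,\frac{\partial \theta}{\partial z}\right] + i\,\frac{\partial v}{\partial r}\left[\frac{\partial r}{\partial z} + i\,r\,\frac{\partial \theta}{\partial z}\right]. \tag{7.186}$$
Since $z = re^{i\theta}$, we can write (7.187)
Hence the expression inside the square brackets in Equation (7.186) is
$$\frac{\partial r}{\partial z} + i\,r\,\frac{\partial \theta}{\partial z} = e^{-i\theta}, \tag{7.188}$$
which, when substituted into Equation (7.186), gives
$$\frac{df}{dz} = e^{-i\theta}\left[\frac{\partial u}{\partial r} + i\,\frac{\partial v}{\partial r}\right]. \tag{7.189}$$
Following similar steps in rectangular coordinates, we obtain
$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}. \tag{7.190}$$
7.9 BASIC DIFFERENTIATION FORMULAS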
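If the derivatives $w'$, $w_1'$, and $w_2'$ exist, then the basic differentiation formulas can be given as
$$\frac{dc}{dz} = 0,\ c \in \mathbb{C}, \quad\text{and}\quad \frac{dz}{dz} = 1, \tag{7.191}$$
$$\frac{d(cw)}{dz} = c\,\frac{dw}{dz}, \tag{7.192}$$
$$\frac{d}{dz}(w_1 + w_2) = \frac{dw_1}{dz} + \frac{dw_2}{dz}, \tag{7.193}$$
$$\frac{d(w_1w_2)}{dz} = w_1\frac{dw_2}{dz} + \frac{dw_1}{dz}\,w_2, \tag{7.194}$$
$$\frac{d}{dz}\left(\frac{w_1}{w_2}\right) = \frac{w_1'w_2 - w_1w_2'}{w_2^2}, \quad w_2 \neq 0, \tag{7.195}$$
$$\frac{d}{dz}\,w_1(w_2) = \frac{dw_1}{dw_2}\frac{dw_2}{dz}, \tag{7.196}$$
$$\frac{d}{dz}\,z^n = nz^{n-1}, \quad n > 0, \tag{7.197}$$
and $z \neq 0$ when $n < 0$ integer.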
7.10 ELEMENTARY FUNCTIONS
7.10.1 Polynomials
The simplest analytic function different from a constant is $z$. Since the product and the sum of analytic functions are also analytic, we conclude that every polynomial of order $n$,
$$P_n(z) = a_0 + a_1z + \cdots + a_nz^n, \quad a_n \neq 0, \tag{7.198}$$
is also an analytic function. All polynomials are also entire functions. The fundamental theorem of algebra states that when $n$ is positive, $P_n(z)$ has at least one root. This simple-sounding theorem, which was the doctoral dissertation of Gauss in 1799, has far-reaching consequences. Assuming that $z_1$ is a root of $P_n$, we can reduce its order as
$$P_n(z) = (z - z_1)P_{n-1}(z). \tag{7.199}$$
Similarly, if $z_2$ is a root of $P_{n-1}(z)$, we can reduce the order one more time to write
$$P_n(z) = (z - z_1)(z - z_2)P_{n-2}(z). \tag{7.200}$$
Cascading like this we eventually reach the bottom of the ladder as
$$P_n(z) = (z - z_1)(z - z_2)\cdots(z - z_n). \tag{7.201}$$
In other words, a polynomial of order $n$ has $n$, not necessarily all distinct, roots in the complex plane. The significance of this result becomes clear if we remember how complex algebra was introduced in the first place. When equations like
$$z^2 + 1 = 0 \tag{7.202}$$
are studied, it is seen that no roots can be found among the set of real numbers. Hence the number system has to be extended to include the complex numbers. We now see that polynomials with complex coefficients do not have any roots that lie outside the complex plane, $\mathbb{C}$; hence no further extension of the number system is necessary.
7.10.2 Exponential Function
Let us consider the series expansion of the exponential function with a pure imaginary argument as
$$e^{iy} = \sum_{n=0}^{\infty}\frac{(iy)^n}{n!}. \tag{7.203}$$
We write the even and the odd powers separately:
$$e^{iy} = \left[1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots\right] + i\left[y - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right] \tag{7.204}$$
$$= \sum_{n=0}^{\infty}\frac{(-1)^n y^{2n}}{(2n)!} + i\sum_{n=0}^{\infty}\frac{(-1)^n y^{2n+1}}{(2n+1)!}. \tag{7.205}$$
Recognizing the first and the second series as $\cos y$ and $\sin y$, respectively, we obtain
$$e^{iy} = \cos y + i\sin y, \tag{7.206}$$
which is also known as Euler's formula. Multiplying this with the real number $e^x$, we obtain the exponential function
$$e^z = e^x(\cos y + i\sin y). \tag{7.207}$$
Since the functions $u = e^x\cos y$ and $v = e^x\sin y$ have continuous first partial derivatives everywhere and satisfy the Cauchy-Riemann conditions, using Theorem 7.4, we conclude that the exponential function is an entire function. Using Equation (7.190), namely
$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}, \tag{7.208}$$
we obtain the derivative of the exponential function as the usual expression
$$\frac{de^z}{dz} = e^z. \tag{7.209}$$
Using the polar representation in the $w$-plane, $u = \rho\cos\phi$, $v = \rho\sin\phi$, we write
$$w = e^z = \rho(\cos\phi + i\sin\phi). \tag{7.210}$$
Comparing this with
$$e^z = e^x(\cos y + i\sin y), \tag{7.211}$$
we obtain $\rho = e^x$ and
$$\phi = y, \tag{7.212}$$
that is,
$$|e^z| = e^x \quad\text{and}\quad \arg e^z = y. \tag{7.213}$$
Using the polar representation for two points in the $w$-plane,
$$e^{z_1} = \rho_1(\cos\phi_1 + i\sin\phi_1), \tag{7.214}$$
$$e^{z_2} = \rho_2(\cos\phi_2 + i\sin\phi_2), \tag{7.215}$$
we can easily establish the following relations:
$$e^{z_1}e^{z_2} = e^{z_1+z_2}, \tag{7.216}$$
$$\frac{e^{z_1}}{e^{z_2}} = e^{z_1-z_2}, \tag{7.217}$$
$$(e^z)^n = e^{nz}. \tag{7.218}$$
In terms of the exponential function [Eq. (7.206)], the polar representation of $z$,
$$z = r(\cos\theta + i\sin\theta), \tag{7.219}$$
can be written as
$$z = re^{i\theta}, \tag{7.220}$$
which is quite useful in applications. Another useful property of $e^z$ is that for an integer $n$ we obtain
$$e^{2n\pi i} = (e^{2\pi i})^n = 1; \tag{7.221}$$
hence we can write
$$e^{z+2n\pi i} = e^z e^{2n\pi i} \tag{7.222}$$
$$= e^z. \tag{7.223}$$
In other words, $e^z$ is a periodic function with the period $2\pi i$. The series expansion of $e^z$ is given as
$$e^z = 1 + \frac{z}{1!} + \frac{z^2}{2!} + \cdots \tag{7.224}$$
$$= \sum_{n=0}^{\infty}\frac{z^n}{n!}. \tag{7.225}$$
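The properties of $e^z$ collected above can be spot-checked numerically. The following is a minimal sketch using Python's standard `cmath` module; the helper `exp_series`, the number of terms, and the sample value of `z` are illustrative assumptions. It compares the library exponential with Equation (7.207), with a shifted argument as in Equation (7.223), and with a partial sum of the series in Equation (7.225).

```python
import cmath
import math


def exp_series(z, terms=40):
    """Partial sum of sum_{n>=0} z**n / n!."""
    total = 0.0 + 0.0j
    term = 1.0 + 0.0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


z = 0.5 + 1.25j  # arbitrary sample point (assumption)

# e^x (cos y + i sin y), as in Eq. (7.207)
euler_form = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

# Shift the argument by 2*n*pi*i with n = 3, as in Eq. (7.223)
shifted = cmath.exp(z + 2j * math.pi * 3)

print(abs(cmath.exp(z) - euler_form))     # close to zero
print(abs(cmath.exp(z) - shifted))        # close to zero
print(abs(cmath.exp(z) - exp_series(z)))  # close to zero
```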
7.10.3 Trigonometric Functions
Trigonometric functions are defined as
$$\cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{7.226}$$
$$\sin z = \frac{e^{iz} - e^{-iz}}{2i}. \tag{7.227}$$
Using the series expansion of $e^z$ [Eq. (7.225)], we can justify these definitions as the usual series expansions:
$$\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots, \tag{7.228}$$
$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots. \tag{7.229}$$
Since $e^{iz}$ and $e^{-iz}$ are entire functions, $\cos z$ and $\sin z$ are also entire functions. Using these series expansions, we obtain the derivatives:
$$\frac{d}{dz}\sin z = \cos z, \tag{7.230}$$
$$\frac{d}{dz}\cos z = -\sin z. \tag{7.231}$$
The other trigonometric functions are defined as
$$\tan z = \frac{\sin z}{\cos z}, \quad \cot z = \frac{\cos z}{\sin z}, \tag{7.232}$$
$$\sec z = \frac{1}{\cos z}, \quad \csc z = \frac{1}{\sin z}. \tag{7.233}$$
The usual trigonometric identities are also valid in the complex domain:
$$\sin^2 z + \cos^2 z = 1, \tag{7.234}$$
$$\sin(z_1 \pm z_2) = \sin z_1\cos z_2 \pm \cos z_1\sin z_2, \tag{7.235}$$
$$\cos(z_1 \pm z_2) = \cos z_1\cos z_2 \mp \sin z_1\sin z_2, \tag{7.236}$$
$$\sin(-z) = -\sin z, \tag{7.237}$$
$$\cos(-z) = \cos z, \tag{7.238}$$
$$\sin\left(\frac{\pi}{2} - z\right) = \cos z, \tag{7.239}$$
$$\sin 2z = 2\sin z\cos z, \tag{7.240}$$
$$\cos 2z = \cos^2 z - \sin^2 z. \tag{7.241}$$
7.10.4 Hyperbolic Functions
Hyperbolic cosine and sine functions are defined as
$$\cosh z = \frac{e^z + e^{-z}}{2}, \tag{7.242}$$
$$\sinh z = \frac{e^z - e^{-z}}{2}. \tag{7.243}$$
Since $e^z$ and $e^{-z}$ are entire functions, $\cosh z$ and $\sinh z$ are also entire functions. The derivatives
$$\frac{d}{dz}\sinh z = \cosh z, \tag{7.244}$$
$$\frac{d}{dz}\cosh z = \sinh z \tag{7.245}$$
and some commonly used identities are given as
$$\cosh^2 z - \sinh^2 z = 1, \tag{7.246}$$
$$\sinh(z_1 \pm z_2) = \sinh z_1\cosh z_2 \pm \cosh z_1\sinh z_2, \tag{7.247}$$
$$\cosh(z_1 \pm z_2) = \cosh z_1\cosh z_2 \pm \sinh z_1\sinh z_2, \tag{7.248}$$
$$\sinh(-z) = -\sinh z, \tag{7.249}$$
$$\cosh(-z) = \cosh z, \tag{7.250}$$
$$\sinh 2z = 2\sinh z\cosh z. \tag{7.251}$$
Hyperbolic and trigonometric functions can be related through the formulas
$$\cos z = \cos(x + iy) = \frac{1}{2}\left(e^{ix-y} + e^{-ix+y}\right) \tag{7.252}$$
$$= \frac{1}{2}e^{-y}(\cos x + i\sin x) + \frac{1}{2}e^{y}(\cos x - i\sin x) \tag{7.253}$$
$$= \left(\frac{e^{y} + e^{-y}}{2}\right)\cos x - i\left(\frac{e^{y} - e^{-y}}{2}\right)\sin x \tag{7.254}$$
$$= \cos x\cosh y - i\sin x\sinh y, \tag{7.255}$$
and similarly,
$$\sin z = \sin x\cosh y + i\cos x\sinh y. \tag{7.256}$$
From these formulas we can deduce the relations
$$\sin(iy) = i\sinh y, \tag{7.257}$$
$$\cos(iy) = \cosh y. \tag{7.258}$$
7.10.5 Logarithmic Function
Using the polar representation, $z = re^{i\theta}$, we can define a logarithmic function,
$$w = \log z, \tag{7.259}$$
as
$$\log z = \ln r + i\theta, \quad r > 0. \tag{7.260}$$
Since $r$ is real and positive, an appropriate base for the $\ln r$ can be chosen. Since the points with the same $r$ but with the arguments $\theta \pm 2n\pi$, $n = 0, 1, \ldots$, correspond to the same point in the $z$-plane, $\log z$ is a multivalued function; that is, for a given point in the $z$-plane, there are infinitely many logarithms, which differ from each other by integral multiples of $2\pi i$:
$$w_n = \log z = \ln|z| + i\arg z \tag{7.261}$$
$$= \ln r + i(\theta \pm 2n\pi), \quad n = 0, 1, \ldots, \quad 0 \le \theta < 2\pi. \tag{7.262}$$
The value, $w_0$, corresponding to $n = 0$ is called the principal value or the principal branch of $\log z$. For $n \neq 0$, $w_n$ gives the $n$th branch value of $\log z$. For example, for $z = 5$, $z = -1$, and $z = 1 + i$ we obtain the following logarithms:
$$w_n = \log 5 = \ln 5 + i\arg 5 \tag{7.263}$$
$$= \ln 5 + i(0 \pm 2n\pi), \tag{7.264}$$
$$w_n = \log(-1) = \ln 1 + i(\pi \pm 2n\pi) \tag{7.265}$$
$$= i(\pi \pm 2n\pi), \tag{7.266}$$
$$w_n = \log(1 + i) = \ln\sqrt{2} + i\left(\frac{\pi}{4} \pm 2n\pi\right). \tag{7.267}$$
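The three logarithms above can be reproduced with a short script. This is a minimal sketch under stated assumptions: the helper `log_branch` is hypothetical, and it uses a signed integer `n` in place of the text's $\pm 2n\pi$ notation. Python's `cmath.phase` returns an angle in $(-\pi, \pi]$, so it is shifted to the range $0 \le \theta < 2\pi$ used in Equation (7.262).

```python
import cmath
import math


def log_branch(z, n):
    """n-th branch ln r + i(theta + 2*n*pi), with 0 <= theta < 2*pi."""
    r = abs(z)
    theta = cmath.phase(z) % (2.0 * math.pi)  # shift (-pi, pi] to [0, 2*pi)
    return complex(math.log(r), theta + 2.0 * math.pi * n)


for z in (5.0 + 0.0j, -1.0 + 0.0j, 1.0 + 1.0j):
    branches = [log_branch(z, n) for n in (-1, 0, 1)]
    print(z, branches)
```

Every branch returned by `log_branch` exponentiates back to the same point $z$, which is the multivaluedness described in the text.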
For a given value of $n$, the single-valued function
$$w_n = \log z = \ln|z| + i\arg z \tag{7.268}$$
$$= \ln r + i(\theta \pm 2n\pi), \quad 0 \le \theta < 2\pi, \tag{7.269}$$
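with the $u$ and the $v$ functions given as
$$u = \ln r, \tag{7.270}$$
$$v = \theta \pm 2n\pi, \tag{7.271}$$
has continuous first-order partial derivatives,
$$\frac{\partial u}{\partial r} = \frac{1}{r}, \quad \frac{\partial u}{\partial \theta} = 0, \quad \frac{\partial v}{\partial r} = 0, \quad \frac{\partial v}{\partial \theta} = 1, \tag{7.272}$$
which satisfy the Cauchy-Riemann conditions [Eqs. (7.182) and (7.183)]; hence Equation (7.269) defines an analytic function in its domain of definition. Using Equation (7.189), we can write the derivative of $\log z$ as the usual expression
$$\frac{d}{dz}\log z = e^{-i\theta}\left[\frac{\partial u}{\partial r} + i\,\frac{\partial v}{\partial r}\right] \tag{7.273}$$
$$= \frac{1}{re^{i\theta}} \tag{7.274}$$
$$= \frac{1}{z}. \tag{7.275}$$
Using the definition in Equation (7.269), one can easily show the familiar properties of the $\log z$ function as
$$\log z_1z_2 = \log z_1 + \log z_2, \tag{7.276}$$
$$\log\frac{z_1}{z_2} = \log z_1 - \log z_2. \tag{7.277}$$
Regardless of which branch is used, we can write the inverse of $w = \log z$ as
$$e^w = e^{\log z} \tag{7.278}$$
$$= e^{(\ln r + i\theta)} \tag{7.279}$$
$$= e^{\ln r}e^{i\theta} \tag{7.280}$$
$$= re^{i\theta} \tag{7.281}$$
$$= z. \tag{7.282}$$
Hence
$$e^{\log z} = z; \tag{7.283}$$
that is, the exp and the log functions are inverses of each other.
7.10.6 Powers of Complex Numbers
Let $m$ be a fixed positive integer. Using Equation (7.269), we can write
$$m\log z = m\ln r + im(\theta \pm 2n\pi), \quad n = 0, 1, \ldots. \tag{7.284}$$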
Using the periodicity of $e^z$ [Eq. (7.223)] and Equation (7.262), we can also write
$$\log z^m = \log\left[r^m e^{im(\theta \pm 2n\pi)}\right], \quad n = 0, 1, \ldots, \tag{7.285}$$
$$= \ln r^m + im(\theta \pm 2n\pi). \tag{7.286}$$
Comparing Equations (7.284) and (7.286), we obtain
$$m\log z = \log z^m. \tag{7.287}$$
Similarly, for a positive integer $p$ we write
$$\frac{1}{p}\log z = \frac{1}{p}\left[\ln r + i(\theta \pm 2n\pi)\right] \tag{7.288}$$
$$= \ln r^{1/p} + \frac{i}{p}(\theta \pm 2k\pi), \quad k = 0, 1, \ldots, (p-1). \tag{7.289}$$
We can also write
$$\log z^{1/p} = \log\left[r^{1/p}e^{(i/p)(\theta \pm 2n\pi)}\right], \quad n = 0, 1, \ldots, \tag{7.290}$$
$$= \ln r^{1/p} + \frac{i}{p}(\theta \pm 2k\pi), \quad k = 0, 1, \ldots, (p-1). \tag{7.291}$$
Note that due to the periodicity of the exponential function $e^{(i/p)(\theta \pm 2n\pi)}$, no new root emerges when $n$ is an integer multiple of $p$. Hence in Equations (7.289) and (7.291), we have defined a new integer $k = 0, 1, \ldots, (p-1)$. Comparing Equations (7.289) and (7.291) we obtain the familiar expression
$$\frac{1}{p}\log z = \log z^{1/p}. \tag{7.292}$$
In general we can write
$$\frac{m}{p}\log z = \log z^{m/p}, \tag{7.293}$$
or
$$z^{m/p} = e^{(m/p)\log z}. \tag{7.294}$$
In other words, the $p$ distinct values of $\log z^{m/p}$ give the number $z^{m/p}$. For example, for the principal value of $z^{5/3}$, that is, for $k = 0$, we obtain
$$z^{5/3} = e^{(5/3)\log z} \tag{7.295}$$
$$= e^{(5/3)(\ln r + i\theta)} \tag{7.296}$$
$$= r^{5/3}e^{i(5/3)\theta}. \tag{7.297}$$
All three of the branches are given as
$$z^{5/3} = r^{5/3}e^{i(5/3)(\theta \pm 2k\pi)}, \quad k = 0, 1, 2. \tag{7.298}$$
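To make the three branches concrete, the following sketch evaluates Equation (7.298) with the $+$ sign for a sample point. The function name `rational_power_branches` and the choice $z = 1 + i$ are illustrative assumptions, not part of the original text. Each value, when cubed, should reproduce $z^5$.

```python
import cmath
import math


def rational_power_branches(z, m, p):
    """Return the p values r**(m/p) * exp(i*(m/p)*(theta + 2*k*pi)), k = 0..p-1."""
    r = abs(z)
    theta = cmath.phase(z) % (2.0 * math.pi)  # argument in [0, 2*pi)
    return [
        r ** (m / p) * cmath.exp(1j * (m / p) * (theta + 2.0 * math.pi * k))
        for k in range(p)
    ]


z = 1.0 + 1.0j  # sample point (assumption)
values = rational_power_branches(z, 5, 3)
for k, w in enumerate(values):
    print(k, w, abs(w**3 - z**5))  # each value cubed should reproduce z**5
```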
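We now extend our discussion of powers to cases where the power is complex:
$$w = z^c \quad\text{or}\quad w = z^{-c}, \tag{7.299}$$
where $c$ is any complex number. For example, for $i^{-i}$ we write
$$i^{-i} = \exp(-i\log i) \tag{7.300}$$
$$= \exp\left\{-i\left[\ln 1 + i\left(\frac{\pi}{2} \pm 2n\pi\right)\right]\right\} \tag{7.301}$$
$$= \exp\left(\frac{\pi}{2} \pm 2n\pi\right), \quad n = 0, 1, \ldots. \tag{7.302}$$
Replacing $m/p$ in Equation (7.294) with $c$, we write
$$z^c = e^{c\log z}. \tag{7.303}$$
Using the principal value of $\log z$, we can write the derivative
$$\frac{dz^c}{dz} = \frac{d}{dz}\,e^{c\log z} \tag{7.304}$$
$$= \frac{c}{z}\,e^{c\log z} \tag{7.305}$$
$$= ce^{(c-1)\log z}. \tag{7.306}$$
The right-hand side is nothing but $cz^{c-1}$; hence we obtain the formula
$$\frac{dz^c}{dz} = cz^{c-1},$$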
(7.307) which also allows us to write zc
= c, log z > z f 0 .
(7.308)
Example 7.6. Complex exponents: Let us find $i^i$ for the principal branch:
$$i^i = e^{i\log i} \tag{7.309}$$
$$= e^{i[\ln 1 + i\pi/2]} \tag{7.310}$$
$$= e^{-\pi/2}. \tag{7.311}$$
As another example we find the principal branch of $(1+i)^i$:
$$(1+i)^i = e^{i\log(1+i)} \tag{7.312}$$
$$= e^{i[\ln\sqrt{2} + i\pi/4]} \tag{7.313}$$
$$= e^{i\ln\sqrt{2}}\,e^{-\pi/4} \tag{7.314}$$
$$= 2^{i/2}e^{-\pi/4} \tag{7.315}$$
$$= e^{(i/2)\ln 2}\,e^{-\pi/4}. \tag{7.316}$$
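Both results of Example 7.6 can be checked with Python's standard `cmath` module. This is a minimal sketch: the helper `principal_power` is a hypothetical name, and `cmath.log` uses the argument range $(-\pi, \pi]$ rather than $0 \le \theta < 2\pi$. For $i$ and $1+i$ the two conventions give the same argument, so the principal values agree.

```python
import cmath
import math


def principal_power(z, c):
    """Principal value of z**c, defined as exp(c * log z)."""
    return cmath.exp(c * cmath.log(z))


i_to_i = principal_power(1j, 1j)
one_plus_i_to_i = principal_power(1 + 1j, 1j)

# Compare with the closed forms in Eqs. (7.311) and (7.316).
print(i_to_i, math.exp(-math.pi / 2))
print(one_plus_i_to_i, cmath.exp(0.5j * math.log(2)) * math.exp(-math.pi / 4))
```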
7.10.7 Inverse Trigonometric Functions
Using the definition
$$z = \sin w = \frac{e^{iw} - e^{-iw}}{2i} \tag{7.317}$$
along with (7.318)
we solve for $e^{iw}$ to obtain
$$e^{iw} = iz + (1 - z^2)^{1/2}, \tag{7.319}$$
which allows us to write
$$w = -i\log\left[iz + (1 - z^2)^{1/2}\right]. \tag{7.320}$$
Thus the inverse sine function is defined as
$$\sin^{-1} z = -i\log\left[iz + (1 - z^2)^{1/2}\right], \tag{7.321}$$
which is a multiple-valued function with infinitely many branches. Similarly, one can write the inverses
$$\cos^{-1} z = -i\log\left[z + (z^2 - 1)^{1/2}\right], \tag{7.322}$$
$$\tan^{-1} z = \frac{i}{2}\log\frac{i + z}{i - z}. \tag{7.323}$$
PROBLEMS
1. Evaluate the following complex numbers:
(A+i ) + i(1 + id),(iv)
(i) (ii)
4 (1 i ) ( l + i)(2 - 2 ) ’ 2 3i (3 - 2 i ) ( l i ) ’ ~
(iii)
2.
(2,1)(1, -21,
(v)
(1 - i I 4 ,
(vi)
(2, -1)(1,3)(2,2)’
+
+ Evaluate the numbers z1 + 2 2 , z1 -
(1,1) 22,
and
z1z2
and show them graphi-
cally when (i) (ii)
ZI = 21 =
(1, I ) , 2 2 = (3, -I), (iii) z1 = (1,3), z2 = (4, -I), (xi, y i ) , zz = (21, -m), (iv) z1 = (1 - i12, z2 = 1 + 2i.
3. Prove the commutative law of multiplication: $z_1z_2 = z_2z_1$.
4. Prove the following associative laws:
5. Prove the distributive law $z_1(z_2 + z_3) = z_1z_2 + z_1z_3$.
6. Find
(z* + 2i)*,
(i)
(ii)
(2iz)*,
(iii)
2 (1- i ) ( l +i)*’
(iv)
[(I - i ) ‘ ] * .
7. Use the polar form to write
+ 24,
(i)
(z*
(ii)
(1 - i)(l i)*
(iii)
(iz+ I)*,
(iv)
(1 - i ) * ( 2 i).
5
+
+
8. Prove
and (24)*
= (2*)4
9. Prove the following: 121221 =
lz1l 1x21 ,
’
10. Prove and interpret the triangle inequalities:
11. Describe the region of the z-plane defined by (i) (ii) (iii)
1< Imz
< 2,
Iz - I1 2 2 Iz / z - 41 > 3.
+ 11,
12. Show that the equation
describes a circle. 13. Express l+i
[-]
in terms of rectangular and polar coordinates.
14. Find the roots of $z^4 + i = 0$.
15. If $z^2 = i$, find $z$.
16. Show that
17. Show that
$$\tanh(1 + \pi i) = \frac{e^2 - 1}{e^2 + 1}.$$
18. Find the complex number that is symmetrical to $(1 + 2i)$ with respect to the line $\arg z = \alpha_0$.
19. Find all the values of (i)
(ii) (iii) (iv) (v) (vi)
z
=
(3~)l/~,
z = (1 + i3)3/2, z = (-1) 1/3 , z = (1- z). 1/3 , z = (-8)1/3, z = (1p4.
and show them graphically.
20. Derive Equations (7.83)-(7.85)’ which are used in stereographic projections:
21. Show the following ratios [Eq. (7.86)] used in stereographic projections: y ---z 1
1
.
1-53
22
Verify that the points z,2,and N lie on a straight line. 22. Derive Equation (7.95):
used in stereographic projections to express the distance between two points on the Riemann sphere in terms of the coordinates on the z-plane.
23. Establish the relations
(e’)”
= en’.
24. Establish the sum
and then show
n8 sin [ ( n+ 1)8/2] + cos8 + cos 28 + . . . + cosn8 = cos (5) sin(Ql2) ’
(i)
1
(ii)
sin 8
+ sin 28 + . . . + sin n8 =
sin
(+)
+
sin [(n 1)8/2] sin(8/2) ’
where 0
< 0 < 27r
25. Show that the following functions are entire: (i) (ii)
f(z) = 22
-
z),
f(z) = -sinycoshz+icosysinhz,
(iii) f ( z ) (iv)
+ y + i(2y
+ isinrc),
= epY(cosI(:
f ( z ) = ezz2.
26. Find the singular points of
(ii)
3.2 z (z 2
+1 + 2) ’
and explain why these functions are analytic everywhere except at these points. What are the limits of these functions at the singular points?
27. Show that the functions
f ( z ) = 22y
+ iy
and
f ( z ) = e2Y(cos17:
+ i sin 2y)
are analytic nowhere.
28. Show that for an analytic function, f ( z ) harmonic, that is,
=u
+ iv,the imaginary part is
29. Show that f ( 2 , 1J)
= y2 - x2
+ 22
and
f(z, y)
= cosh II: cos y
are harmonic functions and find their conjugate harmonic functions.
30. In rectangular coordinates show that the derivative of an analytic function can be written as
.dv
df - d u ---22-
ax ax
dz
or as
31. Show that eli3.rri
-
e0
=
-e, 1.
32. Find all the values such that ez
=
ez
=1
e(2z+l) =
-2,
+ ifi,
1.
33. Explain why the function
is entire. 34. Justify the definitions cosz sinz
,iZ
+
eiz
-
=
e-iz
2
=
’ e-iz
2i
and find the inverse functions c0s-l z and sin-’ z .
35. Prove the identities
$$\sin(z_1 + z_2) = \sin z_1\cos z_2 + \cos z_1\sin z_2,$$
$$\cos(z_1 + z_2) = \cos z_1\cos z_2 - \sin z_1\sin z_2.$$
36. Find all the roots of
(i) $\cos z = 2$,
(ii) $\sin z = \cosh 2$.
37. Find the zeros of $\sinh z$ and $\cosh z$.
38. Evaluate (i) (ii) (iii)
(1 i i ) i , (-2)'IT, (1+ i&i)'+i.
39. What are the principal values of (1- i ) " , i2',( - i ) 2 f i ? 40. In polar coordinates, show that the derivative of an analytic function can be written as
CHAPTER 8
COMPLEX ANALYSIS
Line integrals, power series, and residues constitute an important part of complex analysis. Theorems of complex integration are usually concise but powerful. Many of the properties of analytic functions are quite difficult to prove without the use of these theorems. Complex contour integration also allows us to evaluate various difficult proper or improper integrals encountered in physical theories. Just as in real analysis, in complex integration we distinguish between definite and indefinite integrals. Since differentiation and integration are inverse operations of each other, indefinite integrals can be found by inverting the known differentiation formulas of analytic functions. Definite integrals evaluated over continuous, or at least piecewise continuous, paths are not restricted to analytic functions and thus can be defined by exactly the same limiting procedure used to define real integrals. Most complex definite integrals can be written in terms of two real integrals. Hence, in their discussion we rely heavily on the background established in Chapters 1 and 2 on real integrals. One of the most important places where the theorems of complex integration are put to use is the power series representation of analytic functions. In this regard, Laurent series play an important part in applications and also allow us to classify singular points.
8.1 CONTOUR INTEGRALS
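Each point in the complex plane is represented by two parameters; hence in contrast to their real counterparts, $\int_{x_1}^{x_2} f(x)\,dx$, complex definite integrals are defined with respect to a path or contour, $C$, connecting the upper and the lower bounds of the integral as
$$\int_{z_1}^{z_2} f(z)\,dz = \int_C f(z)\,dz. \tag{8.1}$$
If we write a complex function as $f(z) = u(x,y) + i\,v(x,y)$, the above integral can be expressed as the sum of two real integrals:
$$\int_C f(z)\,dz = \int_C (u + iv)(dx + i\,dy) \tag{8.2}$$
$$= \int_C [u\,dx - v\,dy] + i\int_C [v\,dx + u\,dy]. \tag{8.3}$$
Furthermore, if the path $C$ is parameterized in terms of a real parameter $t$:
$$x = x(t), \quad y = y(t), \quad t \in [t_1, t_2],$$
where the end points, $t_1$ and $t_2$, are found from
$$z_1 = x(t_1) + i\,y(t_1), \quad z_2 = x(t_2) + i\,y(t_2),$$
the complex integral in Equation (8.2) can also be written as
$$\int_C f(z)\,dz = \int_{t_1}^{t_2} [u\,x' - v\,y']\,dt + i\int_{t_1}^{t_2} [v\,x' + u\,y']\,dt. \tag{8.7}$$
In the above equations we have written
$$dz = dx + i\,dy$$
and
$$dz = [x'(t) + i\,y'(t)]\,dt.$$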
Integrals on the right-hand sides of Equations (8.3) and (8.7) are real; hence from the properties of real integrals, we can deduce the following: (8.10)
The two inequalities
(8.13)
and (8.14) where $|f(z)| \le M$ on $C$ and $L$ is the arclength, are very useful in calculations. When $z$ is a point on $C$, we can write an infinitesimal arclength as
$$|dz| = |x'(t) + i\,y'(t)|\,dt \tag{8.15}$$
$$= \sqrt{x'(t)^2 + y'(t)^2}\,dt. \tag{8.16}$$
The length of the contour $C$ is now given as
$$\int_C |dz| = L. \tag{8.17}$$
If we parameterize a point on the contour $C$ as $z = z(t)$, we can write (8.18) In another parametric representation of $C$, where the new parameter $\tau$ is related to $t$ by $t = t(\tau)$, we can write the integral [Eq. (8.18)] as (8.19)
which is nothing but
(8.20) Hence an important property of the contour integrals is that their value is independent of the parametric representation used.
8.2 TYPES OF CONTOURS
We now introduce the types of contours or paths that are most frequently encountered in the study of complex integrals. A continuous path is defined as the curve
$$z(t) = x(t) + i\,y(t), \quad t \in [t_1, t_2],$$
where $x(t)$ and $y(t)$ are continuous functions of the real parameter $t$. If the curve does not intersect itself, that is, when no two distinct values of $t$ in $[t_1, t_2]$ correspond to the same $(x,y)$, we call it a Jordan arc. If
$$z(t_1) = z(t_2)$$
but no other two distinct values of $t$ correspond to the same point $(x,y)$, we have a simple closed curve, which is also called a Jordan curve. A piecewise continuous curve like
$$y = t^2,\ x = t,\ t \in [1,2]; \qquad y = t^3,\ x = t,\ t \in (2,3], \tag{8.21}$$
is a Jordan arc. A circle with the unit radius,
$$x^2 + y^2 = 1, \tag{8.22}$$
which can be expressed in parametric form as
$$x = \cos t, \quad y = \sin t, \quad t \in [0, 2\pi], \tag{8.23}$$
is a simple closed curve. If the derivatives $x'(t)$ and $y'(t)$ are continuous and do not vanish simultaneously for any value of $t$, we have a smooth curve. For a smooth curve the length exists and is given as
$$L = \int_{t_1}^{t_2}\sqrt{x'(t)^2 + y'(t)^2}\,dt. \tag{8.24}$$
In general a contour $C$ is a continuous chain of a finite number of smooth curves, $C_1, C_2, \ldots, C_n$. Hence,
$$C = C_1 + C_2 + \cdots + C_n.$$
Figure 8.1 Contours for Example 8.1.
A contour integral over $C$ can now be written as the sum of its parts as
$$\int_C f(z)\,dz = \int_{C_1} f(z)\,dz + \int_{C_2} f(z)\,dz + \cdots + \int_{C_n} f(z)\,dz. \tag{8.25}$$
Contour integrals over closed paths are also written as $\oint_C f(z)\,dz$, where by definition the counterclockwise direction is taken as the positive direction.
Example 8.1. Contour integrals: Using the contours, $C_1$, $C_2$, and $C_3$, shown in Figure 8.1, let us evaluate the following integrals:
$$I_{C_1} = \int_{C_1[y=x^2]} z^2\,dz, \quad I_{C_2} = \int_{C_2[y=0]} z^2\,dz, \quad I_{C_3} = \int_{C_3[x=1]} z^2\,dz. \tag{8.26}$$
We first write $I = \int_C z^2\,dz$ as the sum of two real integrals [Eq. (8.3)]:
$$I = \int_C [(x^2 - y^2)\,dx - 2xy\,dy] + i\int_C [2xy\,dx + (x^2 - y^2)\,dy]. \tag{8.27}$$
For the path $C_1$ we have $y = x^2$ and $dy = 2x\,dx$; hence the above integral is evaluated as
$$I_{C_1} = \int_{C_1} [(x^2 - x^4)\,dx - 2x^3\,2x\,dx] + i\int_{C_1} [2x^3\,dx + (x^2 - x^4)\,2x\,dx] \tag{8.28}$$
$$= \int_0^1 (x^2 - 5x^4)\,dx + i\int_0^1 (4x^3 - 2x^5)\,dx \tag{8.29}$$
$$= -\frac{2}{3} + i\,\frac{2}{3}. \tag{8.30}$$
For the path $C_2$ we set $y = 0$ and $dy = 0$ in Equation (8.27). Hence we obtain
$$I_{C_2} = \int_0^1 x^2\,dx + i\,[0] \tag{8.31}$$
$$= \frac{1}{3}. \tag{8.32}$$
Finally, on the path $C_3$ we write $x = 1$ and $dx = 0$ to obtain
$$I_{C_3} = -\int_0^1 2y\,dy + i\int_0^1 (1 - y^2)\,dy \tag{8.33}$$
$$= -1 + i\,\frac{2}{3}. \tag{8.34}$$
Figure 8.2 Semicircular path.
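The three values obtained in Example 8.1 can be reproduced by a direct numerical approximation of the contour integral. The following is a minimal sketch, not part of the original text: the function `contour_integral`, the midpoint rule, and the number of steps are illustrative choices. The end points of the paths (from the origin to $(1,1)$ for $C_1$, from $0$ to $1$ along the real axis for $C_2$, and from $(1,0)$ to $(1,1)$ for $C_3$) are inferred from the limits in Equations (8.29) to (8.33).

```python
def contour_integral(f, path, t_start, t_end, steps=20000):
    """Midpoint-rule approximation of the integral of f(z) dz along z = path(t)."""
    dt = (t_end - t_start) / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        z_left = path(t_start + k * dt)
        z_right = path(t_start + (k + 1) * dt)
        z_mid = path(t_start + (k + 0.5) * dt)
        total += f(z_mid) * (z_right - z_left)
    return total


def f(z):
    return z * z


i_c1 = contour_integral(f, lambda t: complex(t, t * t), 0.0, 1.0)  # y = x**2
i_c2 = contour_integral(f, lambda t: complex(t, 0.0), 0.0, 1.0)    # y = 0
i_c3 = contour_integral(f, lambda t: complex(1.0, t), 0.0, 1.0)    # x = 1

print(i_c1, i_c2, i_c3)  # compare with Eqs. (8.30), (8.32), and (8.34)
```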
Example 8.2. Parametric representation of the contour: We now consider the semicircular path $C_1$ in Figure 8.2 for the integral $I_{C_1}$:
$$I_{C_1} = \int_{C_1} z^2\,dz \tag{8.35}$$
$$= \int_{C_1} [u\,x' - v\,y']\,dt + i\int_{C_1} [v\,x' + u\,y']\,dt, \tag{8.36}$$
with
$$u = x^2 - y^2, \tag{8.37}$$
$$v = 2xy. \tag{8.38}$$
When we use the parametric form of the path,
$$x(t) = \cos t, \quad x'(t) = -\sin t, \tag{8.39}$$
$$y(t) = \sin t, \quad y'(t) = \cos t, \quad t \in [0, \pi], \tag{8.40}$$
Equation (8.36) becomes
$$I_{C_1} = \int_0^{\pi} [-3\cos^2 t\,\sin t + \sin^3 t]\,dt + i\int_0^{\pi} [-3\sin^2 t\,\cos t + \cos^3 t]\,dt = -\frac{2}{3}. \tag{8.41}$$
For the path along the real axis, $C_2$ (Fig. 8.2), we can use $x$ as a parameter:
$$x = t, \quad y = 0, \quad u = t^2, \quad v = 0; \tag{8.42}$$
hence Equation (8.36) yields
$$I_{C_2} = \int_{-1}^{1} t^2\,dt + i\int_{-1}^{1} 0\,dt \tag{8.43}$$
$$= \left.\frac{t^3}{3}\right|_{-1}^{1} \tag{8.44}$$
$$= \frac{2}{3}. \tag{8.45}$$
Example 8.3. Simple closed curves: For the combined path in Example 8.2, that is, $C = C_1 + C_2$ (Fig. 8.2), which is a simple closed curve, the integral $I_C = \oint_C z^2\,dz$ becomes
$$I_C = I_{C_1} + I_{C_2} \tag{8.46}$$
$$= -\frac{2}{3} + \frac{2}{3} \tag{8.47}$$
$$= 0. \tag{8.48}$$
Similarly, for the closed path $C$ in Figure 8.3 we can use the results obtained in Example 8.1, with $C_1$ traversed in the reverse direction, to write
$$I_C = -I_{C_1} + I_{C_2} + I_{C_3} \tag{8.49}$$
$$= -\left(-\frac{2}{3} + i\,\frac{2}{3}\right) + \frac{1}{3} + \left(-1 + i\,\frac{2}{3}\right) \tag{8.50}$$
$$= 0. \tag{8.51}$$
Note that in the complex plane, the geometric interpretation of the integral as an area is no longer valid. As we shall see shortly, the fact that we have obtained zero for the integral $\oint z^2\,dz$ for two very different closed paths is by no means a coincidence. In the next section we elaborate on this point.
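The vanishing of the closed-path integral of $z^2$ can be contrasted numerically with a function that is not analytic. The sketch below is an illustration under stated assumptions: the helper `closed_integral`, the midpoint rule, and the comparison function $|z|^2$ are choices made here, not taken from the text. It integrates both functions around the closed semicircular path of Example 8.2.

```python
import cmath
import math


def closed_integral(f, pieces, steps=20000):
    """Sum midpoint-rule integrals of f(z) dz over pieces (path, t_start, t_end)."""
    total = 0.0 + 0.0j
    for path, t_start, t_end in pieces:
        dt = (t_end - t_start) / steps
        for k in range(steps):
            dz = path(t_start + (k + 1) * dt) - path(t_start + k * dt)
            total += f(path(t_start + (k + 0.5) * dt)) * dz
    return total


# Upper unit semicircle from z = 1 to z = -1, then the real axis back to z = 1.
semicircle_loop = [
    (lambda t: cmath.exp(1j * t), 0.0, math.pi),
    (lambda t: complex(t, 0.0), -1.0, 1.0),
]

analytic = closed_integral(lambda z: z * z, semicircle_loop)
non_analytic = closed_integral(lambda z: abs(z) ** 2, semicircle_loop)

print(abs(analytic))  # close to zero
print(non_analytic)   # not zero: |z|**2 is analytic nowhere
```

For $|z|^2$ the semicircle contributes $\int dz = -2$ (since $|z| = 1$ there) and the real axis contributes $2/3$, so the closed integral is $-4/3$ rather than zero.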
Figure 8.3 Closed simple path.
8.3 THE CAUCHY-GOURSAT THEOREM
We have seen that for a closed contour, $C$, the complex contour integral of $f(z)$ can be written in terms of two real integrals [Eq. (8.3)] as
$$\oint_C f(z)\,dz = \oint_C [u\,dx - v\,dy] + i\oint_C [v\,dx + u\,dy]. \tag{8.52}$$
Let us now look at this integral from the viewpoint of Green's theorem introduced in Chapter 2, which states that for two continuous functions, $P(x,y)$ and $Q(x,y)$, defined in a simply connected domain, $D$, with continuous first-order partial derivatives within and on a simple closed contour, $C$, we can write the integral
$$\oint_C [P\,dx + Q\,dy] = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx\,dy, \tag{8.53}$$
where the positive sense of the contour integral is taken as the counterclockwise direction and $R$ is the region enclosed by the closed contour $C$. If we apply Green's theorem to the real integrals defining the real and the imaginary parts of the integral in Equation (8.52), we obtain
$$\oint_C [u\,dx - v\,dy] = \iint_R \left(-\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right)dx\,dy, \tag{8.54}$$
$$\oint_C [v\,dx + u\,dy] = \iint_R \left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right)dx\,dy. \tag{8.55}$$
Figure 8.4 We stretch $C_2$ into $C_2 = L_1 + L_2$.
From the properties of analytic functions [Theorem 7.4], we have seen that a given analytic function,
$$f(z) = u(x,y) + i\,v(x,y), \tag{8.56}$$
defined in some domain $D$, has continuous first-order partial derivatives, $u_x$, $u_y$, $v_x$, $v_y$, and satisfies the Cauchy-Riemann conditions:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \tag{8.57}$$
$$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{8.58}$$
Hence for an analytic function, the right-hand sides of Equations (8.54) and (8.55) are zero. We now state this result as the Cauchy-Goursat theorem, a formal proof of which can be found in Brown and Churchill.
Theorem 8.1. Cauchy-Goursat theorem: If a function $f(z)$ is analytic within and on a simple closed contour $C$ in a simply connected domain, then
$$\oint_C f(z)\,dz = 0. \tag{8.59}$$
This is a remarkably simple but powerful theorem. For example, to evaluate the integral
$$I = \int_{C_1} f(z)\,dz \tag{8.60}$$
over some complicated path, $C_1$, we first form the closed path $C_1 + C_2$ (Fig. 8.4, left). If $f(z)$ is analytic on and within this closed path, $C_1 + C_2$, we can use the Cauchy-Goursat theorem to write
$$\oint_{C_1 + C_2} f(z)\,dz = \int_{C_1} f(z)\,dz + \int_{C_2} f(z)\,dz = 0, \tag{8.61}$$
which allows us to evaluate the desired integral as
$$\int_{C_1} f(z)\,dz = -\int_{C_2} f(z)\,dz. \tag{8.62}$$
The general idea is to deform $C_2$ into a form such that the integral $I$ can be evaluated easily. The Cauchy-Goursat theorem says that we can always do this, granted that $f(z)$ is analytic on and within the closed path $C_1 + C_2$. On the right-hand side in Figure 8.4, $C_2$ is composed of two straight line segments, $L_1$ and $L_2$. In Example 8.3, for two different closed paths we have explicitly shown that $\oint z^2\,dz$ is zero. Since $z^2$ is an entire function, the Cauchy-Goursat theorem says that for any simple closed path the result is zero. Similarly, all polynomials, $P_n(z)$, of order $n$ are entire functions; hence we can write
$$\oint_C P_n(z)\,dz = 0, \tag{8.63}$$
where $C$ is any simple closed contour.
Figure 8.5 Definite integrals.
Example 8.4. Cauchy-Goursat theorem: Let us evaluate the integral
$$I = \int_{C_1} (3z^2 + 1)\,dz \tag{8.64}$$
over any given path, $C_1$, as shown on the left in Figure 8.5. Since the integrand, $f(z) = 3z^2 + 1$, is an entire function, we can form the closed path on the right and use the Cauchy-Goursat theorem to write
$$\oint (3z^2 + 1)\,dz = 0, \tag{8.65}$$
which allows the integral over $C_1$ to be replaced by the integrals over the straight line segments $L_2$ and $L_1$ connecting the same end points:
$$\int_{C_1} (3z^2 + 1)\,dz = \int_{L_2} (3z^2 + 1)\,dz + \int_{L_1} (3z^2 + 1)\,dz. \tag{8.66}$$
From
$$f(z) = 3z^2 + 1 = [3(x^2 - y^2) + 1] + i\,(6xy), \tag{8.67}$$
we obtain the functions $u = [3(x^2 - y^2) + 1]$ and $v = 6xy$, which are needed in the general formula [Eq. (8.3)]. For $L_2$ we use the parameterization $x = x$, $y = 1$; hence we substitute $u(x,1) = 3x^2 - 2$ and $v(x,1) = 6x$ into Equation (8.3) to obtain
$$\int_{L_2} (3z^2 + 1)\,dz = \int_1^2 (3x^2 - 2)\,dx + i\int_1^2 6x\,dx = 5 + 9i. \tag{8.68}$$
Similarly, for $L_1$ we use the parameterization $x = 2$, $y = y$; hence we substitute $u(2,y) = -3y^2 + 13$, $v(2,y) = 12y$ into Equation (8.3) to get
$$\int_{L_1} (3z^2 + 1)\,dz = -\int_1^2 12y\,dy + i\int_1^2 (-3y^2 + 13)\,dy \tag{8.69}$$
$$= -18 + 6i. \tag{8.70}$$
Finally, using Equations (8.68) and (8.70) in Equation (8.66) we obtain
$$\int_{C_1} (3z^2 + 1)\,dz = (5 + 9i) + (-18 + 6i) \tag{8.71}$$
$$= -13 + 15i. \tag{8.72}$$
8.4 INDEFINITE INTEGRALS
Let $z_0$ and $z$ be two points in a simply connected domain $D$, where $f(z)$ is analytic (Fig. 8.6). If $C_1$ and $C_2$ are two paths connecting $z_0$ and $z$, then by using the Cauchy-Goursat theorem we can write

$$\int_{C_2} f(z')\,dz' - \int_{C_1} f(z')\,dz' = 0. \tag{8.73}$$

Figure 8.6 Indefinite integrals.

In other words, the integral

$$F(z) = \int_{z_0}^{z} f(z')\,dz' \tag{8.74}$$

has the same value for all continuous paths (Jordan arcs) connecting the points $z_0$ and $z$. In general we can write

(8.75)

That is, the integral of an analytic function is an analytic function of its upper limit, granted that the path of integration is included in a simply connected domain $D$, where $f(z)$ is analytic.

Example 8.5. Indefinite integrals: An indefinite integral of $f(z) = 3z^2 + 1$ exists and is given as

$$\int (3z^2 + 1)\,dz = z^3 + z. \tag{8.76}$$

Since $(z^3 + z)$ is an entire function with the derivative $(3z^2 + 1)$, for the integral in Equation (8.66) we can write

$$\int_{C_1} (3z^2 + 1)\,dz = (z^3 + z)\Big|_{(1,1)}^{(2,2)}, \tag{8.77}$$

where $C_1$ is any continuous path from $(1, 1)$ to $(2, 2)$. Substituting the numbers in the above equation, we naturally obtain the same result as in Equation (8.72):

$$\int_{C_1} f(z)\,dz = z(z^2 + 1)\Big|_{1+i}^{2+2i} = -13 + 15i. \tag{8.78}$$
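The path independence used in Examples 8.4 and 8.5 can be checked numerically. The sketch below is only an illustration: the helper `line_integral`, the midpoint rule, and the particular detour path are assumptions introduced here, not part of the text.

```python
def line_integral(f, path, steps=4000):
    """Midpoint-rule integral of f along the polyline through `path`."""
    total = 0j
    for a, b in zip(path, path[1:]):
        dz = (b - a) / steps
        for j in range(steps):
            total += f(a + (j + 0.5) * dz) * dz
    return total

f = lambda z: 3 * z**2 + 1
F = lambda z: z**3 + z                 # antiderivative from Eq. (8.76)

z_start, z_end = 1 + 1j, 2 + 2j

# Along y = 1 and then x = 2, as in Example 8.4.
via_corner = line_integral(f, [z_start, 2 + 1j, z_end])
# Straight diagonal.
direct = line_integral(f, [z_start, z_end])
# An arbitrary detour chosen only for this illustration.
detour = line_integral(f, [z_start, -1 + 3j, 4 + 0j, z_end])

exact = F(z_end) - F(z_start)
print(via_corner, direct, detour, exact)
```

All three numerical values agree with $F(2 + 2i) - F(1 + i)$ to within the discretization error of the midpoint rule.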
8.5 SIMPLY AND MULTIPLY CONNECTED DOMAINS
Simply and multiply connected domains are defined the same way as in real analysis. A simply connected domain is an open connected region, where every closed path in this region can be shrunk continuously to a point. An annular region between the two circles (Fig. 8.7) with radii $R_1$ and $R_2$, $R_2 > R_1$, is not simply connected, since the closed path $C_0$ cannot be shrunk to a point. A region that is not simply connected is called multiply connected. The Cauchy-Goursat theorem can be used in multiply connected domains by confining ourselves to a region that is simply connected. In the multiply connected domain shown in Figure 8.7 we have (8.79) however, for $C_1$ we can write (8.80) where $f(z)$ is analytic inside the region between the two circles.

Figure 8.7 Multiply connected domain between two concentric circles.
8.6 THE CAUCHY INTEGRAL FORMULA
The Cauchy-Goursat theorem [Eq. (8.59)] works in simply connected domains, $D$, where the integrand is analytic within and on the closed contour $C$ included in $D$. The next theorem is called the Cauchy integral formula. It is about cases where the integrand is of the form

$$F(z) = \frac{f(z)}{z - z_0}, \tag{8.81}$$

where $z_0$ is a point inside $C$ and $f(z)$ is an analytic function within and on $C$. In other words, the integrand in $\oint_C F(z)\,dz$ has an isolated singular point in $C$ (Fig. 8.8).

Figure 8.8 Singularity inside the contour.

Theorem 8.2. Cauchy integral formula: Let $f(z)$ be analytic at every point within and on a closed contour $C$ in a simply connected domain $D$. If $z_0$ is a point inside the region defined by $C$, then

$$f(z_0) = \frac{1}{2\pi i} \oint_{C[\circlearrowleft]} \frac{f(z)}{z - z_0}\,dz, \tag{8.82}$$

where $C[\circlearrowleft]$ means the contour $C$ is traced in the counterclockwise direction.

This is another remarkable result from the theory of analytic functions with far-reaching applications in pure and applied mathematics. It basically says that the value of an analytic function, $f(z_0)$, at a point, $z_0$, inside its domain $D$ of analyticity is determined entirely by the values it takes on a boundary $C$, which encloses $z_0$ and which is included in $D$. The shape of the boundary is not important. Once we decide on a boundary, we have no control over the values that $f(z)$ takes outside the boundary. However, if we change the values that a function takes on a boundary, it will affect the values it takes on the inside. Conversely, if we alter the values of $f(z)$ inside the boundary, a corresponding change has to be implemented on the boundary to preserve the analytic nature of the function.

Proof: To prove this theorem, we modify the path $C$ as shown in Figure 8.9, where we consider the contour $c_0$ in the limit as its radius goes to zero.
Figure 8.9 Modified path for the Cauchy integral formula.

Now the integrand, $f(z)/(z - z_0)$, is analytic within and on the combined path $C' = L_1[\downarrow] + L_2[\uparrow] + C[\circlearrowleft] + c_0[\circlearrowright]$. By the Cauchy-Goursat theorem we can write
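$$\left[ \int_{L_1[\downarrow]} + \int_{L_2[\uparrow]} + \oint_{C[\circlearrowleft]} + \oint_{c_0[\circlearrowright]} \right] \frac{f(z)}{z - z_0}\,dz = 0. \tag{8.83}$$

The two integrals along the straight-line segments cancel each other, thus leaving

$$\oint_{C[\circlearrowleft]} \frac{f(z)}{z - z_0}\,dz + \oint_{c_0[\circlearrowright]} \frac{f(z)}{z - z_0}\,dz = 0. \tag{8.84}$$

Evaluating both integrals counterclockwise, we write

$$\oint_{C[\circlearrowleft]} \frac{f(z)}{z - z_0}\,dz = \oint_{c_0[\circlearrowleft]} \frac{f(z)}{z - z_0}\,dz. \tag{8.85}$$

We modify the integral on the right-hand side as

$$\oint_{c_0} \frac{f(z)}{z - z_0}\,dz = f(z_0) \oint_{c_0} \frac{dz}{z - z_0} + \oint_{c_0} \frac{f(z) - f(z_0)}{z - z_0}\,dz \tag{8.86}$$

$$= I_1 + I_2. \tag{8.87}$$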
For a point on $c_0$ we can write

$$z - z_0 = r_0 e^{i\theta}, \quad dz = i r_0 e^{i\theta}\,d\theta; \tag{8.88}$$
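thus the first integral, $I_1$, on the right-hand side of Equation (8.86) becomes

$$I_1 = f(z_0) \oint_{c_0} \frac{dz}{z - z_0} \tag{8.89}$$

$$= f(z_0) \int_0^{2\pi} \frac{i r_0 e^{i\theta}}{r_0 e^{i\theta}}\,d\theta \tag{8.90}$$

$$= 2\pi i f(z_0). \tag{8.91}$$

For the second integral, $I_2$, when considered in the limit as $r_0 \to 0$, that is, when $z \to z_0$, we can write

$$I_2 = \lim_{r_0 \to 0} \oint_{c_0} \frac{f(z) - f(z_0)}{z - z_0}\,dz \tag{8.92}$$

$$= \oint_{c_0} \left[ \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \right] dz. \tag{8.93}$$

The limit

$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \tag{8.94}$$

is nothing but the definition of the derivative of $f(z)$ at $z_0$, that is,

$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} = \frac{df(z_0)}{dz}. \tag{8.95}$$

Since $f(z)$ is analytic within and on the contour $c_0$, this derivative exists with a finite modulus $|df(z_0)/dz|$; hence we can take it outside the integral to write

$$I_2 = \frac{df(z_0)}{dz} \oint_{c_0} dz. \tag{8.96}$$

Since 1 is an entire function, using the Cauchy-Goursat theorem, we can write

$$\oint_{c_0} dz = 0, \tag{8.97}$$

thus obtaining $I_2 = 0$. Substituting Equations (8.91) and (8.97) into Equation (8.87) completes the proof of the Cauchy integral formula.

8.7 DERIVATIVES OF ANALYTIC FUNCTIONS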
In the Cauchy integral formula, the location of the point $z_0$ inside the closed contour is entirely arbitrary; hence we can treat it as a parameter and differentiate with respect to it to write

$$f'(z_0) = \frac{1}{2\pi i} \oint_{C[\circlearrowleft]} \frac{f(z)}{(z - z_0)^2}\,dz. \tag{8.98}$$

A formal proof of this result can be found in Brown and Churchill. Successive differentiation of this formula leads to

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_{C[\circlearrowleft]} \frac{f(z)}{(z - z_0)^{n+1}}\,dz, \quad n = 1, 2, \ldots. \tag{8.99}$$

Assuming that this formula is true for any value of $n$, say $n = k$, one can show that it holds for $n = k + 1$. Based on this formula, we can now present an important result about analytic functions:

Theorem 8.3. If a function is analytic at a given point $z_0$, then its derivatives of all orders, $f'(z_0), f''(z_0), \ldots$, exist at that point.

In Chapter 7 [Eq. (7.190)] we have shown that the derivative of an analytic function can be written as

$$f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \tag{8.100}$$

or as

$$f'(z) = \frac{\partial v}{\partial y} - i \frac{\partial u}{\partial y}. \tag{8.101}$$

Also, Theorem 7.4 says that for a given analytic function, the partial derivatives $u_x$, $v_x$, $u_y$, and $v_y$ exist and they are continuous functions of $x$ and $y$. Using Theorem 8.3, we can now conclude that in fact the partial derivatives of all orders of $u$ and $v$ exist and are continuous functions of $x$ and $y$ at each point where $f(z)$ is analytic.
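Equation (8.99) can be tested numerically by sampling the integrand on a circle around $z_0$. The sketch below is a minimal illustration under stated assumptions: the helper name `cauchy_derivative`, the circular contour, the sample count, and the test function $e^z$ are choices made here and do not come from the text.

```python
import cmath
import math

def cauchy_derivative(f, z0, n, radius=1.0, samples=256):
    """Approximate f^(n)(z0) from Eq. (8.99) using equally spaced
    points on the circle |z - z0| = radius (n = 0 gives Eq. (8.82))."""
    total = 0j
    dtheta = 2 * math.pi / samples
    for j in range(samples):
        w = radius * cmath.exp(1j * j * dtheta)   # w = z - z0
        dz = 1j * w * dtheta                      # dz = i (z - z0) dtheta
        total += f(z0 + w) / w ** (n + 1) * dz
    return math.factorial(n) / (2j * math.pi) * total

z0 = 0.3 + 0.2j
# Every derivative of exp(z) equals exp(z), so all entries should agree.
approx = [cauchy_derivative(cmath.exp, z0, n) for n in range(4)]
print(approx)
print(cmath.exp(z0))
```

The values of the function on the circle alone reproduce $f(z_0)$ and its first few derivatives, in line with Theorem 8.3.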
8.8 COMPLEX POWER SERIES
Applications of complex analysis often require manipulations with explicit analytic expressions. To this effect, power series representations of analytic functions are very useful.

8.8.1 Taylor Series with the Remainder
Let $f(z)$ be analytic inside the boundary $B$ and let $C$ be a closed contour inside $B$ (Fig. 8.10). Using the Cauchy integral formula [Eq. (8.82)], we can write

$$f(z) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{z' - z}\,dz', \tag{8.102}$$

Figure 8.10 Taylor series: $|z - z_0| = r$, $|z' - z_0| = r'$.

where $z'$ is a point on $C$ and $z$ is any point within $C$. We rewrite the integrand as
where zo is any point within C that satisfies the inequality
/ z - zo/
2,
(8.176)
1+2"
n=O
which is valid outside the circle $|z| = 2$. Note that the series representations given above are all unique in their interval of convergence.

Example 8.11. Power series representations: Let us find the power series representation of

$$f(z) = \frac{1}{z^2 \cosh z}. \tag{8.177}$$

Substituting the series expansion of $\cosh z$ [Eq. (8.124)], we write

$$f(z) = \frac{1}{z^2 \left[ 1 + z^2/2! + z^4/4! + \cdots \right]} \tag{8.178}$$

$$= \frac{1}{z^2 + z^4/2! + z^6/4! + \cdots}. \tag{8.179}$$

Hence, in the neighborhood of the origin, $f(z)$ is analytic everywhere except at $z = 0$. Since the series in the denominator of Equation (8.179) does not vanish for $|z| < \pi/2$ except at the origin (the nearest zeros of $\cosh z$ are at $z = \pm i\pi/2$), we can perform a formal division of 1 by the denominator to obtain

$$\frac{1}{z^2 \cosh z} = \frac{1}{z^2} - \frac{1}{2} + \frac{5}{24} z^2 - \cdots, \quad 0 < |z| < \pi/2; \tag{8.180}$$

hence $z = 0$ is a pole of order 2.
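The formal division in Example 8.11 can be carried out with exact rational arithmetic. In the sketch below the helper `reciprocal_series` and the substitution $w = z^2$ are assumptions introduced for illustration; the code simply solves for the coefficients of the reciprocal of a power series term by term.

```python
from fractions import Fraction
from math import factorial

def reciprocal_series(c, n_terms):
    """Coefficients d_m such that (sum c_m w^m)(sum d_m w^m) = 1,
    assuming c[0] is nonzero."""
    d = [Fraction(1) / c[0]]
    for m in range(1, n_terms):
        s = sum(c[j] * d[m - j] for j in range(1, m + 1) if j < len(c))
        d.append(-s / c[0])
    return d

n_terms = 4
# cosh z = sum of w^m / (2m)!, written in the variable w = z^2.
cosh_coeffs = [Fraction(1, factorial(2 * m)) for m in range(n_terms)]
inv_cosh = reciprocal_series(cosh_coeffs, n_terms)
print(inv_cosh)
```

Dividing the resulting series for $1/\cosh z$ by $z^2$ gives the coefficients of $z^{-2}$, $z^0$, $z^2$, and $z^4$ in the Laurent expansion, the first three of which appear in Equation (8.180).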
8.11 RESIDUE THEOREM
Figure 8.13 Isolated poles.
The Cauchy integral formula [Eq. (8.82)] deals with cases where the integrand has a simple pole within the closed contour of integration. Armed with the Laurent series representation of functions, we can now tackle contour integrals,

$$\oint_C f(z)\,dz, \tag{8.181}$$

where the integrand has a finite number of isolated singular points of varying orders within the closed contour $C$ (Fig. 8.13). We modify the contour as shown in Figure 8.14, where $f(z)$ is analytic in and on the composite path $C'$:

$$C'[\circlearrowleft] = C[\circlearrowleft] + \sum_{j=1}^{n} l_j[\downarrow] + \sum_{j=1}^{n} l_j[\uparrow] + \sum_{j=1}^{n} c_j[\circlearrowright]. \tag{8.182}$$
We can now use the Cauchy-Goursat theorem to write (8.183)
Integrals over the straight line segments cancel each other, thus leaving (8.184) where all integrals are to be evaluated counterclockwise. Using Laurent series expansions [Eq. (8.149)] about the singular points, we write
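$$\oint_C f(z)\,dz = \sum_{j=1}^{n} \oint_{c_j} \left[ \sum_{k=0}^{\infty} a_{kj} (z - z_j)^k + \sum_{k=1}^{\infty} \frac{b_{kj}}{(z - z_j)^k} \right] dz \tag{8.185}$$

$$= \sum_{j=1}^{n} \sum_{k=0}^{\infty} a_{kj} \oint_{c_j} (z - z_j)^k\,dz + \sum_{j=1}^{n} \sum_{k=1}^{\infty} b_{kj} \oint_{c_j} \frac{dz}{(z - z_j)^k}, \tag{8.186}$$

where the expansion coefficients for the $j$th pole, $a_{kj}$ and $b_{kj}$, are given in Equations (8.150) and (8.151). Since $(z - z_j)^k$ is analytic within and on the contours $c_j$ for all $j$ and $k$, the first set of integrals vanish:

$$\oint_{c_j} (z - z_j)^k\,dz = 0, \quad j = 1, 2, \ldots, n, \quad k = 0, 1, \ldots. \tag{8.187}$$

For the second set of integrals, using the parameterization

$$z - z_j = r_j e^{i\theta}, \tag{8.188}$$

we find

$$\oint_{c_j} \frac{dz}{(z - z_j)^k} = \int_0^{2\pi} \frac{i r_j e^{i\theta}\,d\theta}{r_j^k e^{ik\theta}} \tag{8.189}$$

$$= \frac{i}{r_j^{k-1}} \int_0^{2\pi} e^{i(1-k)\theta}\,d\theta \tag{8.190}$$

$$= 2\pi i\,\delta_{k1}, \quad j = 1, 2, \ldots, n, \quad k = 1, 2, \ldots. \tag{8.191}$$

In other words,

$$\oint_{C[\circlearrowleft]} f(z)\,dz = 2\pi i \sum_{j=1}^{n} b_{1j}. \tag{8.192}$$

The coefficient of the $1/(z - z_j)$ term, that is, $b_{1j}$, is called the residue of the pole $z_j$. Hence, the integral $\oint_{C[\circlearrowleft]} f(z)\,dz$ is equal to $2\pi i$ times the sum of the residues of the $n$ isolated poles within the contour $C$. This important result is known as the residue theorem:

Theorem 8.6. If we let $f(z)$ be an analytic function within and on the closed contour $C$, except for a finite number of isolated singular points in $C$, then we obtain

$$\oint_{C} f(z)\,dz = 2\pi i \sum_{j=1}^{n} b_{1j}, \tag{8.193}$$

Figure 8.14 Modified path for the residue theorem.

where $b_{1j}$ is the residue of the $j$th pole, that is, the coefficient of $1/(z - z_j)$ in the Laurent series expansion of $f(z)$ about $z_j$. The integral definition of $b_{1j}$ is given as

$$b_{1j} = \frac{1}{2\pi i} \oint_{c_j} f(z)\,dz. \tag{8.194}$$

Integrals in Equations (8.193) and (8.194) are taken in the counterclockwise direction.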
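The statement of the residue theorem can be checked numerically for an illustrative integrand such as $(3z - 1)/[z(z - 1)]$, which has simple poles at $z = 0$ and $z = 1$. In this sketch the helper names `circle_integral` and `residue`, the circle radii, and the sample count are assumptions made for the example; each residue is estimated from the integral definition in Equation (8.194) on a small circle around the pole.

```python
import cmath
import math

def circle_integral(f, center, radius, samples=2000):
    """Counterclockwise integral of f over |z - center| = radius,
    using equally spaced sample points."""
    total = 0j
    dtheta = 2 * math.pi / samples
    for j in range(samples):
        w = radius * cmath.exp(1j * j * dtheta)
        total += f(center + w) * 1j * w * dtheta
    return total

def residue(f, pole, radius=0.25):
    """Residue from Eq. (8.194) on a small circle enclosing only `pole`."""
    return circle_integral(f, pole, radius) / (2j * math.pi)

f = lambda z: (3 * z - 1) / (z * (z - 1))

res0 = residue(f, 0)
res1 = residue(f, 1)
big = circle_integral(f, 0, 2.0)       # circle of radius 2 enclosing both poles

print(res0, res1)
print(big, 2j * math.pi * (res0 + res1))
```

The integral over the large circle matches $2\pi i$ times the sum of the two residues, as Equation (8.193) requires.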
Example 8.12. Residue theorem: Let us evaluate the integral

$$\oint_C \frac{3z - 1}{z(z - 1)}\,dz, \tag{8.195}$$

where $C$ is the circle of radius 2. Since both poles, 0 and 1, are within the contour, we need to find the residues at these points. For the first pole, $z = 0$, we use the expansion

$$\frac{3z - 1}{z(z - 1)} = \left( 3 - \frac{1}{z} \right) \left( \frac{1}{z - 1} \right) \tag{8.196}$$

$$= \left( 3 - \frac{1}{z} \right) (-1)(1 + z + z^2 + \cdots) \tag{8.197}$$

$$= \frac{1}{z} - 2 - 2z - 2z^2 - \cdots, \quad 0 < |z| < 1, \tag{8.198}$$

which yields the residue at $z = 0$ from the coefficient of $1/z$ as $b_1(0) = 1$. For the pole at $z = 1$ we need to expand $1/z$ in powers of $(z - 1)$, which is given [Eq. (8.129)] as

$$\frac{1}{z} = 1 - (z - 1) + (z - 1)^2 - \cdots$$
,
121 0 2 , by a noninteger, then the two linearly independent solutions are given as
2. If $(\alpha_1 - \alpha_2) = N$, where $N$ is a positive integer, then the two linearly independent solutions are given as

$$y_1(x) = |x - x_0|^{\alpha_1} \sum_{k=0}^{\infty} a_k (x - x_0)^k, \quad a_0 \neq 0, \tag{9.456}$$

and

$$y_2(x) = |x - x_0|^{\alpha_2} \sum_{k=0}^{\infty} b_k (x - x_0)^k + C y_1(x) \ln|x - x_0|, \quad b_0 \neq 0. \tag{9.457}$$

The second solution contains a logarithmic singularity, where $C$ is a constant that may or may not be zero. Sometimes $\alpha_2$ will contain both solutions; hence it is advisable to start with the smaller root with the hopes that it might provide the general solution.

3. If the indicial equation has a double root, $\alpha_1 = \alpha_2$, then the Frobenius method yields only one series solution. In this case the two linearly independent solutions can be taken as

(9.458)

where the second solution diverges logarithmically as $x \to x_0$. In the presence of a double root, the Frobenius method is usually modified by taking the two linearly independent solutions as

$$y_1(x) = |x - x_0|^{\alpha_1} \sum_{k=0}^{\infty} a_k (x - x_0)^k, \quad a_0 \neq 0, \tag{9.459}$$

and

$$y_2(x) = |x - x_0|^{\alpha_1 + 1} \sum_{k=0}^{\infty} b_k (x - x_0)^k + y_1(x) \ln|x - x_0|.$$

In all these cases the general solution is written as

$$y(x) = A_1 y_1(x) + A_2 y_2(x), \tag{9.460}$$

where $A_1$ and $A_2$ are integration constants.
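Example 9.21. A case with distinct roots: Consider the differential equation

$$x^2 y'' + \left( \frac{x}{2} \right) y' + 2x^2 y = 0. \tag{9.461}$$

Using the Frobenius method, we try a series solution about the regular singular point, $x_0 = 0$, as

$$y(x) = |x|^r \sum_{n=0}^{\infty} a_n x^n, \quad a_0 \neq 0. \tag{9.462}$$

Assuming that $x > 0$, we write

$$y(x) = \sum_{n=0}^{\infty} a_n x^{n+r}, \tag{9.463}$$

which gives the derivatives, $y'$ and $y''$, as

$$y' = \sum_{n=0}^{\infty} (n + r) a_n x^{n+r-1}, \tag{9.464}$$

$$y'' = \sum_{n=0}^{\infty} (n + r)(n + r - 1) a_n x^{n+r-2}. \tag{9.465}$$

Substituting $y$, $y'$, and $y''$ into Equation (9.461), we get

$$\sum_{n=0}^{\infty} (n + r)(n + r - 1) a_n x^{n+r} + \frac{1}{2} \sum_{n=0}^{\infty} (n + r) a_n x^{n+r} + 2 \sum_{n=0}^{\infty} a_n x^{n+r+2} = 0. \tag{9.466}$$

We express all the series in terms of $x^{n+r}$:

$$\sum_{n=0}^{\infty} (n + r)(n + r - 1) a_n x^{n+r} + \frac{1}{2} \sum_{n=0}^{\infty} (n + r) a_n x^{n+r} + 2 \sum_{n=2}^{\infty} a_{n-2} x^{n+r} = 0, \tag{9.467}$$

where we have made the variable change $n + 2 \to n'$ in the last series and dropped primes at the end. To start all the series from $n = 2$, we write the first two terms of the first two series explicitly:

$$\left[ r(r - 1) + \frac{r}{2} \right] a_0 x^r + \left[ (r + 1) r + \frac{r + 1}{2} \right] a_1 x^{r+1} + \sum_{n=2}^{\infty} \left\{ \left[ (n + r)(n + r - 1) + \frac{n + r}{2} \right] a_n + 2 a_{n-2} \right\} x^{n+r} = 0. \tag{9.468}$$

This equation can be satisfied for all $x$ if and only if all the coefficients vanish, that is,

$$\left[ r \left( r - \frac{1}{2} \right) \right] a_0 = 0, \tag{9.469}$$

$$\left[ (r + 1) \left( r + \frac{1}{2} \right) \right] a_1 = 0, \tag{9.470}$$

$$\left[ (n + r) \left( n + r - \frac{1}{2} \right) \right] a_n + 2 a_{n-2} = 0, \quad n = 2, 3, \ldots. \tag{9.471}$$

The coefficient of the first term [Eq. (9.469)] is the indicial equation and with the assumption $a_0 \neq 0$ gives the values of $r$ as

$$r_1 = \frac{1}{2} \quad \text{and} \quad r_2 = 0. \tag{9.472}$$

The second equation [Eq. (9.470)] gives $a_1 = 0$ for both values of $r$ and finally, the third equation [Eq. (9.471)] gives the recursion relation

$$a_n = -\frac{2 a_{n-2}}{(n + r)(n + r - \frac{1}{2})}, \quad n = 2, 3, \ldots. \tag{9.473}$$

We start with $r_1 = 1/2$, which gives the recursion relation

$$a_n = -\frac{2 a_{n-2}}{n (n + \frac{1}{2})}, \quad n = 2, 3, \ldots, \tag{9.474}$$

and hence the coefficients

$$a_1 = 0, \quad a_2 = -\frac{2 a_0}{5}, \quad a_3 = 0, \quad a_4 = \frac{2 a_0}{45}, \quad a_5 = 0, \quad a_6 = -\frac{4 a_0}{1755}, \quad \ldots. \tag{9.475}$$

The first solution is now obtained as

$$y_1 = a_0 x^{1/2} \left[ 1 - \frac{2x^2}{5} + \frac{2x^4}{45} - \frac{4x^6}{1755} + \cdots \right]. \tag{9.476}$$

Similarly, for the other root, $r_2 = 0$, we obtain the recursion relation

$$a_n = -\frac{2 a_{n-2}}{n (n - \frac{1}{2})}, \quad n = 2, 3, \ldots, \tag{9.477}$$

and the coefficients

$$a_1 = 0, \quad a_2 = -\frac{2 a_0}{3}, \quad a_3 = 0, \quad a_4 = \frac{2 a_0}{21}, \quad a_5 = 0, \quad a_6 = -\frac{4 a_0}{693}, \quad \ldots, \tag{9.478}$$

which gives the second solution as

$$y_2 = a_0 \left[ 1 - \frac{2x^2}{3} + \frac{2x^4}{21} - \frac{4x^6}{693} + \cdots \right]. \tag{9.479}$$

We can now write the general solution as the linear combination

$$y = c_1 x^{1/2} \left[ 1 - \frac{2x^2}{5} + \frac{2x^4}{45} - \frac{4x^6}{1755} + \cdots \right] + c_2 \left[ 1 - \frac{2x^2}{3} + \frac{2x^4}{21} - \frac{4x^6}{693} + \cdots \right]. \tag{9.480}$$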
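The coefficients quoted in Example 9.21 follow mechanically from the recursion relation in Equation (9.473). The sketch below iterates that recursion with exact fractions; the function name `frobenius_coeffs` and the normalization $a_0 = 1$ are assumptions made for this illustration.

```python
from fractions import Fraction

def frobenius_coeffs(r, n_max):
    """Coefficients a_0..a_n_max from Eq. (9.473),
    a_n = -2 a_{n-2} / ((n + r)(n + r - 1/2)), with a_0 = 1 and a_1 = 0."""
    a = [Fraction(1), Fraction(0)]
    for n in range(2, n_max + 1):
        a.append(-2 * a[n - 2] / ((n + r) * (n + r - Fraction(1, 2))))
    return a

first = frobenius_coeffs(Fraction(1, 2), 6)   # root r_1 = 1/2
second = frobenius_coeffs(Fraction(0), 6)     # root r_2 = 0

print(first[2::2])    # coefficients of x^2, x^4, x^6 inside y_1
print(second[2::2])   # coefficients of x^2, x^4, x^6 inside y_2
```

Working with `Fraction` avoids rounding, so the output can be compared digit for digit with Equations (9.475) and (9.478).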
Example 9.22. General expression of the nth term: In the previous example we have found the general solution in terms of infinite series. We now carry the solution one step further. That is, we write the general expression for the $n$th term. The first solution was given in terms of the even powers of $x$ as

$$y_1 = x^{1/2} \left[ 1 - \frac{2x^2}{5} + \frac{2x^4}{45} - \frac{4x^6}{1755} + \cdots \right] \tag{9.481}$$

$$= x^{1/2} \sum_{k=0}^{\infty} a_{2k} x^{2k}. \tag{9.482}$$

Since only the even terms are present, we use the recursion relation [Eq. (9.474)] to write the coefficient of the $k$th term in the above series as

$$a_{2k} = -\frac{2 a_{2(k-1)}}{2k (2k + \frac{1}{2})}, \quad k = 1, 2, \ldots. \tag{9.483}$$

We let $k \to k - 1$:

$$a_{2k-2} = -\frac{1}{(k - 1)(2k - 2 + \frac{1}{2})} a_{2k-4} \tag{9.484}$$

and use the result back in Equation (9.483) to write

$$a_{2k} = \frac{1}{k (2k + \frac{1}{2})} \cdot \frac{1}{(k - 1)(2k - 2 + \frac{1}{2})} a_{2k-4} = \frac{1}{k (k - 1)(2k + \frac{1}{2})(2k - 2 + \frac{1}{2})} a_{2(k-2)}. \tag{9.485}$$

We iterate once more. First we let $k \to k - 1$ in the above equation:

$$a_{2k-2} = \frac{1}{(k - 1)(k - 2)(2k - 2 + \frac{1}{2})(2k - 4 + \frac{1}{2})} a_{2(k-3)} \tag{9.486}$$

and then substitute the result back into Equation (9.483) to write

$$a_{2k} = -\frac{a_{2(k-3)}}{k (2k + \frac{1}{2})(k - 1)(k - 2)(2k - 2 + \frac{1}{2})(2k - 4 + \frac{1}{2})}$$

$$= -\frac{a_{2(k-3)}}{2 \cdot 2 \cdot 2 \cdot k (k - 1)(k - 2)(k + \frac{1}{4})(k - 1 + \frac{1}{4})(k - 2 + \frac{1}{4})}$$

$$= -\frac{a_{2(k-3)}}{2^3 k (k - 1)(k - 2)(k + \frac{1}{4})(k - 1 + \frac{1}{4})(k - 2 + \frac{1}{4})}. \tag{9.487}$$

After $k$ iterations we hit $a_0$. Setting $a_0 = 1$, we obtain

$$a_{2k} = \frac{(-1)^k}{2^k \left[ k (k - 1)(k - 2) \cdots 2 \cdot 1 \right] (k + \frac{1}{4})(k - 1 + \frac{1}{4})(k - 2 + \frac{1}{4}) \cdots (1 + \frac{1}{4})}. \tag{9.488}$$

We now use the gamma function:

$$\Gamma(x + 1) = x \Gamma(x), \quad x > 0, \quad \Gamma(1) = 1, \tag{9.489}$$

which allows us to extend the definition of factorial to continuous and fractional values as

$$\Gamma(n + 1) = n \Gamma(n) = n (n - 1) \Gamma(n - 1) = n (n - 1)(n - 2) \Gamma(n - 2) = \cdots. \tag{9.490}$$

This can also be written as

$$n (n - 1)(n - 2) \cdots (n - k) = \frac{\Gamma(n + 1)}{\Gamma(n - k)}, \quad n - k > 0. \tag{9.491}$$

Using the above formula, we can write

$$\left( k + \frac{1}{4} \right) \left( k - 1 + \frac{1}{4} \right) \cdots \left( 1 + \frac{1}{4} \right) = \frac{\Gamma(k + 5/4)}{\Gamma(5/4)}. \tag{9.492}$$

Substituting Equation (9.492) and $k (k - 1)(k - 2) \cdots 2 \cdot 1 = k!$ into Equation (9.488), we write $a_{2k}$ as

$$a_{2k} = \frac{(-1)^k \Gamma(5/4)}{2^k k!\, \Gamma(k + 5/4)}, \tag{9.493}$$

which allows us to express the first solution [Eq. (9.481)] in the following compact form:

$$y_1(x) = x^{1/2} \sum_{k=0}^{\infty} \frac{(-1)^k \Gamma(5/4)}{2^k k!\, \Gamma(k + 5/4)} x^{2k}. \tag{9.494}$$

Following similar steps we can write $y_2(x)$ as

$$y_2(x) = \sum_{k=0}^{\infty} \frac{(-1)^k \Gamma(3/4)}{2^k k!\, \Gamma(k + 3/4)} x^{2k}. \tag{9.495}$$
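The closed form in terms of the gamma function can be compared against the recursion directly. In this sketch the parameter `shift` is an assumption introduced for convenience: `shift = 1/4` corresponds to the root $r_1 = 1/2$ [Eq. (9.493)], and `shift = -1/4` corresponds to the analogous expression for $r_2 = 0$, where the recursion reads $a_{2k} = -a_{2k-2}/[2k(k - 1/4)]$.

```python
from math import gamma, factorial, isclose

def a2k_recursive(k, shift):
    """a_{2k} from a_{2k} = -a_{2k-2} / (2 k (k + shift)), with a_0 = 1."""
    a = 1.0
    for j in range(1, k + 1):
        a = -a / (2 * j * (j + shift))
    return a

def a2k_closed(k, shift):
    """(-1)^k Gamma(1 + shift) / (2^k k! Gamma(k + 1 + shift))."""
    return ((-1) ** k * gamma(1 + shift)
            / (2 ** k * factorial(k) * gamma(k + 1 + shift)))

for shift in (0.25, -0.25):
    for k in range(6):
        print(shift, k, a2k_recursive(k, shift), a2k_closed(k, shift))
```

The two columns agree to floating-point accuracy, which is a quick way to catch sign or index slips when deriving such general terms by hand.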
Example 9.23. When the roots differ by an integer: Consider the differential equation

$$x^3 \frac{d^2 y}{dx^2} + 3x^2 \frac{dy}{dx} + \left( x^3 + \frac{3}{4} x \right) y(x) = 0, \quad x \geq 0. \tag{9.496}$$

Since $x_0 = 0$ is a regular singular point, we try a series solution

$$y = \sum_{n=0}^{\infty} a_n x^{n+r}, \quad a_0 \neq 0, \tag{9.497}$$

where the derivatives are given as

$$y' = \sum_{n=0}^{\infty} (n + r) a_n x^{n+r-1}, \tag{9.498}$$

$$y'' = \sum_{n=0}^{\infty} (n + r)(n + r - 1) a_n x^{n+r-2}. \tag{9.499}$$

Substituting these into the differential equation [Eq. (9.496)] and rearranging terms as in the previous example, we obtain

$$\left[ r (r + 2) + \frac{3}{4} \right] a_0 x^{r+1} + \left[ (r + 1)(r + 3) + \frac{3}{4} \right] a_1 x^{r+2} + \sum_{n=2}^{\infty} \left\{ \left[ (n + r)(n + r + 2) + \frac{3}{4} \right] a_n + a_{n-2} \right\} x^{n+r+1} = 0. \tag{9.500}$$

Once again we set the coefficients of all the powers of $x$ to zero:

$$\left[ r (r + 2) + \frac{3}{4} \right] a_0 = 0, \quad a_0 \neq 0, \tag{9.501}$$

$$\left[ (r + 1)(r + 3) + \frac{3}{4} \right] a_1 = 0, \tag{9.502}$$

$$\left[ (n + r)(n + r + 2) + \frac{3}{4} \right] a_n + a_{n-2} = 0, \quad n \geq 2. \tag{9.503}$$

The first equation [Eq. (9.501)] is the indicial equation and with the assumption $a_0 \neq 0$ gives the values of $r$ as

$$r_1 = -\frac{1}{2} \quad \text{and} \quad r_2 = -\frac{3}{2}. \tag{9.504}$$

Let us start with the first root, $r_1 = -1/2$. The second equation [Eq. (9.502)] gives

$$\left[ \frac{1}{2} \cdot \frac{5}{2} + \frac{3}{4} \right] a_1 = 0, \tag{9.505}$$

$$2 a_1 = 0, \tag{9.506}$$

$$a_1 = 0. \tag{9.507}$$

The remaining coefficients are obtained from the recursion relation:

$$a_n = -\frac{a_{n-2}}{(n - \frac{1}{2})(n + \frac{3}{2}) + \frac{3}{4}} = -\frac{4 a_{n-2}}{(2n - 1)(2n + 3) + 3}, \quad n \geq 2. \tag{9.508}$$

All the odd terms are zero and the nonzero terms are obtained as

$$a_2 = -\frac{a_0}{6}, \quad a_4 = \frac{a_0}{120}, \quad \ldots, \quad a_{2n} = (-1)^n \frac{a_0}{(2n + 1)!}, \quad \ldots. \tag{9.509}$$

Hence the solution corresponding to $r_1 = -1/2$ is obtained as

$$y = a_0 x^{-1/2} \left[ 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots \right]. \tag{9.510}$$

We can write this as

$$y = a_0 x^{-3/2} \left[ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \right] \tag{9.511}$$

$$= a_0 x^{-3/2} \sin x. \tag{9.512}$$

For the other root, $r_2 = -3/2$, the second equation [Eq. (9.502)] becomes

$$\left[ -\frac{1}{2} \left( \frac{3}{2} \right) + \frac{3}{4} \right] a_1 = 0, \tag{9.513}$$

$$0 \cdot a_1 = 0, \tag{9.514}$$

thus $a_1 \neq 0$. The remaining coefficients are given by the recursion relation

$$a_n = -\frac{4 a_{n-2}}{(2n - 3)(2n + 1) + 3}, \quad n \geq 2, \tag{9.515}$$

as

$$a_2 = -\frac{a_0}{2}, \quad a_3 = -\frac{a_1}{6}, \quad a_4 = \frac{a_0}{24}, \quad a_5 = \frac{a_1}{120}, \quad \ldots. \tag{9.516}$$

Now the solution for $r_2 = -3/2$ becomes

$$y = a_0 x^{-3/2} \left( 1 - \frac{x^2}{2} + \frac{x^4}{24} - \cdots \right) + a_1 x^{-3/2} \left( x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots \right). \tag{9.517}$$

We recognize that the first series in Equation (9.517) is nothing but $\cos x$ and that the second series is $\sin x$; hence we write this solution as

$$y = a_0 x^{-3/2} \cos x + a_1 x^{-3/2} \sin x. \tag{9.518}$$

However, this solution may also contain a logarithmic singularity of the form $y_1(x) \ln|x|$:

$$y = a_0 x^{-3/2} \cos x + a_1 x^{-3/2} \sin x + C y_1(x) \ln|x|. \tag{9.519}$$

Substituting this back into Equation (9.496), we see that

$$C \left( 2 x^{1/2} \cos x - x^{-1/2} \sin x \right) = 0. \tag{9.520}$$

The quantity inside the brackets is in general different from zero; hence we set $C$ to zero. Since Equation (9.518) also contains the solution obtained for $r_1 = -1/2$, we write the general solution as

$$y = c_0 x^{-3/2} \cos x + c_1 x^{-3/2} \sin x, \tag{9.521}$$

where $c_0$ and $c_1$ are arbitrary constants. Notice that in this case the difference between the roots,

$$r_1 - r_2 = -\frac{1}{2} - \left( -\frac{3}{2} \right) = 1, \tag{9.522}$$

is an integer. Since the solution for the smaller root also contains the other solution, starting with the smaller root would have saved us some time. This is not true in general. However, when the roots differ by an integer, it is always advisable to start with the smaller root hoping that it yields both solutions.
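The identification of the two series with $\cos x$ and $\sin x$ can be confirmed by running the recursion relation in Equation (9.503) with exact fractions. The function name `coeffs` and the choice $a_0 = a_1 = 1$ for the smaller root are assumptions made for this sketch.

```python
from fractions import Fraction
from math import factorial

def coeffs(r, a0, a1, n_max):
    """Solve [(n + r)(n + r + 2) + 3/4] a_n + a_{n-2} = 0 for n >= 2."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(2, n_max + 1):
        a.append(-a[n - 2] / ((n + r) * (n + r + 2) + Fraction(3, 4)))
    return a

# Smaller root r_2 = -3/2 with both a_0 and a_1 free (set to 1 here).
small_root = coeffs(Fraction(-3, 2), 1, 1, 7)
# Taylor coefficients of cos x + sin x: signs +, +, -, -, +, +, ... over n!.
cos_plus_sin = [Fraction((-1) ** (n // 2), factorial(n)) for n in range(8)]

# Larger root r_1 = -1/2, where a_1 must vanish.
large_root = coeffs(Fraction(-1, 2), 1, 0, 6)

print(small_root == cos_plus_sin)
print(large_root[::2])   # compare with (-1)^n / (2n + 1)!
```

The smaller root reproduces both series at once, which is the computational content of the advice to start with the smaller root.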
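9.8.1 Frobenius Method and First-Order Equations
It is possible to use the Frobenius method for a certain class of first-order differential equations that could be written as

$$y' + p(x) y = 0. \tag{9.523}$$

A singular point of the differential equation, $x_0$, is now regular if it satisfies

$$\lim_{x \to x_0} (x - x_0) p(x) \to \text{finite}. \tag{9.524}$$

Let us demonstrate how the method works with the following differential equation:

$$x y' + (1 - x) y = 0. \tag{9.525}$$

Obviously, $x_0 = 0$ is a regular singular point. Hence we substitute

$$y = \sum_{n=0}^{\infty} a_n x^{n+r} \tag{9.526}$$

and its derivative,

$$y' = \sum_{n=0}^{\infty} a_n (n + r) x^{n+r-1}, \tag{9.527}$$

into Equation (9.525) to write

$$\sum_{n=0}^{\infty} a_n (n + r) x^{n+r} + \sum_{n=0}^{\infty} a_n x^{n+r} - \sum_{n=0}^{\infty} a_n x^{n+r+1} = 0. \tag{9.528}$$

Renaming the dummy variable in the first two series as $n \to n' + 1$ and then dropping primes, we obtain

$$\sum_{n=-1}^{\infty} a_{n+1} (n + r + 1) x^{n+r+1} + \sum_{n=-1}^{\infty} a_{n+1} x^{n+r+1} - \sum_{n=0}^{\infty} a_n x^{n+r+1} = 0, \tag{9.529}$$

which, after rearranging, becomes

$$(r + 1) x^r a_0 + \sum_{n=0}^{\infty} \left[ (n + r + 2) a_{n+1} - a_n \right] x^{n+r+1} = 0. \tag{9.530}$$

The indicial equation is obtained by setting the coefficient of the term with the lowest power of $x$ to zero:

$$(1 + r) a_0 = 0, \quad a_0 \neq 0, \tag{9.531}$$

which gives

$$r = -1. \tag{9.532}$$

Using this in the recursion relation

$$a_{n+1} = \frac{a_n}{n + r + 2}, \quad n = 0, 1, 2, \ldots, \tag{9.533}$$

we write

$$a_{n+1} = \frac{a_n}{n + 1}, \quad n = 0, 1, 2, \ldots, \tag{9.534}$$

and obtain the series solution as

$$y(x) = \frac{a_0}{x} \left[ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \right] \tag{9.535}$$

$$= \frac{a_0}{x} e^x. \tag{9.536}$$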
This can be justified by direct integration of the differential equation. Applications of the Frobenius method to differential equations of higher than second order can be found in Ince.

PROBLEMS
1. Classify the following differential equations:
dy dx
+ x2y2= 5xex.
(i)
-
(iii)
d4y d3y - + 5 - - x2y dx4 dx3 d2U
-
8x2
d2u -+8x2
(ii)
=
dY
0,
d2U ++ - = f(x,y, dy2 dz2 d2U
d2U
dydx
+ - ddz2 =2oU.
d3Y + x2y2 = 0. dx3
2).
(vi) (viii
+ x2y = 5.
-=$+-@. dr d2r ds
d4y d3y - + 4- 7y = 0. dx4 dx3 x3dx + y2dy = 0 .
2. Show that the function

$$y(x) = (8x^3 + C) e^{-6x}$$

satisfies the differential equation

$$\frac{dy}{dx} = -6y + 24x^2 e^{-6x}.$$

3. Show that the function

$$y(x) = 2 + c e^{-8x^2}$$

is the solution of the differential equation

$$\frac{dy}{dx} + 16xy = 32x.$$

4. Find the solution of

$$y' = \frac{2x - y + 9}{x - 3y + 2}.$$
5. Given the differential equation
show that its general solution is given as y = (4x2
+ C)ep2"
6. Show that [Eq. (9.72)]
satisfies the differential equation y'
+ a(x)y = b(x).
7. Solve thc following initial value problem:
*+ dx
2 1 =~ 1~6 ~ ~ e -~ ~( 0~=), 2.
8. Show that the following initial value problems have unique solutions:
(ii)
dY - 2Y2 -
(iii)
-
dx
dy dx
2-2'
+ 2y = Sxe-'",
y(1) = 0. y(0) = 2.
9. Which of the following equations are exact:

(i) $(3x + 4y)\,dx + (4x + 4y)\,dy = 0.$
(ii) $(2xy + 2)\,dx + (x^2 + 4y)\,dy = 0.$
(iii) $(y^2 + 1) \cos x\,dx + 2y (\sin x)\,dy = 0.$
(iv) $(3x + 2y)\,dx + (2x + y)\,dy = 0.$
(v) $(4xy - 3)\,dx + (x^2 + 4y)\,dy = 0.$
10. Solve the following initial value problems: (i) (ii)
(iii)
+ (4x2 + 4y) dy = 0, (2ye“ + 2e” + 4y2) dx + 2(e” + 4xy) dy = 0,
(8xy - 6) dx
9dx
2y - 2 -3+2y-222’
Y(2) = 2 y(0) = 3.
y(-1) = 2.
11. Solve the following first-order differential equations:

(i) $16xy\,dx + (4x^2 + 1)\,dy = 0.$
(ii) $x(4y^2 + 1)\,dx + (x^4 + 1)\,dy = 0.$
(iii) $\tan\theta\,dr + 2r\,d\theta = 0.$
(iv) $(x + 2y)\,dx - 2x\,dy = 0.$
(v) $(2x^2 + 2xy + y^2)\,dx + (x^2 + 2xy - y^2)\,dy = 0.$
12. Find the conditions for which the following equations are exact: (i)
(Aox+ A1y) dx + (Box+ B l y ) dy
= 0.
13. Solve the following equations:

(i) $y' + (3/x) y = 48x^2.$
(ii) $x y' + [(4x + 1)/(2x + 1)]\,y = 2x - 1.$
(iii) $y' - (1/x) y = -(1/x^2) y^2.$
(iv) $2x y' - 4y = 2x^4.$
(v) $2y' + (8y - 1/y^3) x = 0.$
14. The generalized Riccati equation is defined as

$$y'(x) = f(x) + g(x) y + h(x) y^2.$$

Using the transformation $y(x) = y_1 + u$, show that the generalized Riccati equation reduces to the Bernoulli equation:
+
15. If an equation, Ad dx N dy = 0, is exact and if M(x,y) and N ( z , y ) are homogeneous of the same degree different from -1, then show that t,he solution can be written without the need for a quadrature as
16. If there are two independent integrating factors, 11and 12, that are not constant multiples of each other, then show that the general solution of M ( r ,y) dx N ( x ,y) dy = 0 can be written as
+
11 (x,y )
= C12(2,Y).
17. Solve by finding an integrating factor:
+ 16y2 + 1) dx + (2x2 + 8xy) dy = 0. (8x7~’+ 2y) dx + (2y3 - 22) dy = 0. (1Ozy
(i)
(ii)
18. Show that the general solution of

$$(D - r)^2 y(x) = 0, \quad D = \frac{d}{dx},$$

is given as

$$y(x) = c_0 e^{rx} + c_1 x e^{rx}.$$
19. Which of the following sets are linearly independent:

(i) $\{e^x, e^{2x}, e^{-2x}\}.$
(ii) $\{x, x^2, x^3\}.$
(iii) $\{x, 2x, e^{3x}\}.$
(iv) $\{\sin x, \cos x, \sin 2x\}.$
(v) $\{e^x, x e^x, x^2 e^x\}.$
20. Given that y
= 22
is a solution, find the general solution of
(2+
i) 2
-
22-dY
dx
+ 2y = 0
by reduction of order.
21. Given the solution y = 2 2
+ 1, solve
22. Find the general solution for

(i) $y'' + 4y = 0.$
(ii) $y'' + 2y' + 3y = 0.$
(iii) $y'' - 2y' + 15y = 0.$
(iv) $y^{(iv)} + 6y'' + 9y = 0.$
(v) $y''' - 2y'' + y' - 2y = 0.$
23. Verify that the expression
satisfies the differential equation
+ P(z);Y'+ Q ( x ) =~ 0,
7~"
where y1 is a special solution.
24. Use the method of undetermined coefficients to write the general solution of y"
+ y = co sin x + c1 cos x.
25. Show that a particular solution of the nonhomogeneous equation d2Y + n2y -
dx2
= n2f(x)
is given as y=nsinnx
.c
f(x)cosnx dx-ncosnz
26. Use the method of undetermined coefficients to solve 5y' - 3y = x2ex. 2y' - 3y = sinx. (ii) y" 2y' - 3y = xe". (iii) y" 2y' - 3y = 2 sin x 52e" (iv) ~ ( 2 " ) + y' = x2 + sinx + ex. (v) (i)
2y"
y"
-
-
+
27. Show that the transformation x differential equation,
= et reduces
dn-1 d"Y uOxn u12"-1dz" dzn-
+
the nth-order Cauchy-Euler
+ ' . + any = F ( x ) , '
into a linear differential equation with constant coefficients.
Hint: First show for n = 2 and then generalize. 28. Find the general solution of d2Y dx2
X2 -+ x -
dy + n 2 y = xm dx
29. Solve the following Cauchy-Euler equations: (i)
2x2-d2Y - 52-dY dx2 dx
(ii)
d2Y x2dx2
-
d2Y (iii) 2x2dx2 d2Y (iv) x2-dx2 d3y x3dx3
(v)
dY 2xdx
-
-
3y = 0.
3y = 0.
dY + y = 0. + 32-dx
dY - 6y = 0. + x-dx dy + 22- - 2y = Inx. dx2 dx d2y
-
z2-
d2Y dY (vi) 22’- dx2 - 52-dx
+ y = x3.
30. Classify the singular points for the following differential equations:
+
+ + + + + + + + + +
(x2 3x - 9)y” (x 2)y’ - 3x2y = 0. (i) (ii) x 3 ( i - x)y” x2 sinxy’ - 3xy = 0. (iii) (x2 - 1)y” 2xy’ y = 0. (iv) (x2 - 4)y” (x 2)y’ 4y = 0. (v) (x2 - 2 ) ; ~ ” (X - 1 ) ~-’ 6y = 0. (vi) x2y” - 2xy’ 2y = 0. 31. Find series solutions about x ential equations: (i) (ii) (iii) (iv) (v)
=0
and for x > 0 for the following differ-
+
x(x3 - 1)y” - (3x3 - 2)y’ 2x2y = 0. xy” + 49’ - 4y = 0. (4x2 1)y” 4xy’ 16xy = 0, y(0) = 2, y’(0) 3xy” (2 - x)y’ - y = 0. xy” 4y’ - 4xy = 0.
+ + +
+
+
= 6.
Discuss the convergence of the series solutions you have found.
32. Find the general expression for the series in Example 9.21:

$$y_2(x) = 1 - \frac{2x^2}{3} + \frac{2x^4}{21} - \frac{4x^6}{693} + \cdots,$$

where the recursion relation is given as

$$a_n = -\frac{2 a_{n-2}}{n (n - \frac{1}{2})}, \quad n = 2, 3, \ldots.$$
CHAPTER 10
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Applications of differential equations are usually accompanied by boundary or initial conditions. In the previous chapter, we have concentrated on techniques for finding analytic solutions to ordinary differential equations. We have basically assumed that boundary conditions can in principle be satisfied by a suitable choice of integration constants in the solution. The general solution of a second-order ordinary differential equation contains two arbitrary constants, which requires two boundary conditions for their determination. The needed information is usually supplied either by giving the value of the solution and its derivative at some point, or by giving the value of the solution at two different points. As in chaotic processes, where the system exhibits instabilities with respect to initial conditions, the effect of boundary conditions on the final result can be drastic. In this chapter, we discuss three of the most frequently encountered second-order ordinary differential equations of physics: Legendre, Laguerre, and Hermite equations. We approach these equations from the point of view of the Frobenius method and discuss their solutions in detail. We show that the boundary conditions impose severe restrictions on not just the integration constants but also on the parameters that the differential equation itself includes. Restrictions on such parameters may have rather dramatic effects like the quantization of energy and angular momentum in physical theories.

Essentials of Mathematical Methods in Science and Engineering. By Ş. Selçuk Bayin. Copyright © 2008 John Wiley & Sons, Inc.
10.1 LEGENDRE EQUATION
The Legendre equation is defined as

$$(1 - x^2) \frac{d^2 y}{dx^2} - 2x \frac{dy}{dx} + k y = 0, \tag{10.1}$$

where $k$ is a constant parameter with no prior restrictions. In applications, $k$ is related to physical properties like angular momentum, frequency, etc. The range of the independent variable $x$ is $[-1, 1]$.
10.1.1 Series Solution
The Legendre equation has two regular singular points at the end points $x = \pm 1$. Since $x = 0$ is a regular point, we use the Frobenius method and try a series solution of the form

$$y(x) = \sum_{n=0}^{\infty} a_n x^{n+s}, \quad a_0 \neq 0. \tag{10.2}$$

Our goal is to find a finite solution in the entire interval $[-1, 1]$. Substituting Equation (10.2) and its derivatives,

$$y' = \sum_{n=0}^{\infty} a_n (n + s) x^{n+s-1}, \tag{10.3}$$

$$y'' = \sum_{n=0}^{\infty} a_n (n + s)(n + s - 1) x^{n+s-2}, \tag{10.4}$$

into the Legendre equation, we obtain

$$\sum_{n=0}^{\infty} a_n (n + s)(n + s - 1) x^{n+s-2} - \sum_{n=0}^{\infty} a_n (n + s)(n + s - 1) x^{n+s} - 2 \sum_{n=0}^{\infty} a_n (n + s) x^{n+s} + k \sum_{n=0}^{\infty} a_n x^{n+s} = 0, \tag{10.5}$$

$$\sum_{n=0}^{\infty} a_n (n + s)(n + s - 1) x^{n+s-2} + \sum_{n=0}^{\infty} a_n \left[ -(n + s)(n + s - 1) - 2(n + s) + k \right] x^{n+s} = 0. \tag{10.6}$$
To equate the powers of $x$, we substitute

$$n - 2 = n' \tag{10.7}$$

into the first series and drop primes to write

$$\sum_{n=-2}^{\infty} a_{n+2} (n + s + 2)(n + s + 1) x^{n+s} + \sum_{n=0}^{\infty} a_n \left[ -(n + s)(n + s + 1) + k \right] x^{n+s} = 0. \tag{10.8}$$

Writing the first two terms of the first series explicitly, we get

$$a_0 (-2 + s + 2)(-2 + s + 1) x^{s-2} + a_1 (-1 + s + 2)(-1 + s + 1) x^{s-1} + \sum_{n=0}^{\infty} \left\{ a_{n+2} (n + s + 2)(n + s + 1) + a_n \left[ -(n + s)(n + s + 1) + k \right] \right\} x^{n+s} = 0. \tag{10.9}$$

Since this equation can be satisfied for all $x$ only when the coefficients of all powers of $x$ vanish simultaneously, we write

$$a_0 s (s - 1) = 0, \quad a_0 \neq 0, \tag{10.10}$$

$$a_1 s (s + 1) = 0, \tag{10.11}$$

$$a_{n+2} = -a_n \frac{-(n + s)(n + s + 1) + k}{(n + s + 2)(n + s + 1)}, \quad n = 0, 1, 2, \ldots. \tag{10.12}$$

The first equation [Eq. (10.10)] is the indicial equation and its roots give the values of $s$ as

$$s_1 = 0 \quad \text{and} \quad s_2 = 1. \tag{10.13}$$

Starting with the second root, $s = 1$, Equation (10.11) gives

$$a_1 = 0. \tag{10.14}$$
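From Equation (10.12) we obtain the recursion relation for the remaining coefficients as

$$a_{n+2} = a_n \frac{(n + 1)(n + 2) - k}{(n + 2)(n + 3)}, \quad n = 0, 1, 2, \ldots, \tag{10.15}$$

which gives the coefficients

$$a_2 = \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] a_0, \tag{10.16}$$

$$a_3 = \left[ \frac{2 \cdot 3 - k}{3 \cdot 4} \right] a_1 = 0, \tag{10.17}$$

$$a_4 = \left[ \frac{3 \cdot 4 - k}{4 \cdot 5} \right] a_2 = \left[ \frac{3 \cdot 4 - k}{4 \cdot 5} \right] \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] a_0, \tag{10.18}$$

$$a_5 = \left[ \frac{4 \cdot 5 - k}{5 \cdot 6} \right] a_3 = \left[ \frac{4 \cdot 5 - k}{5 \cdot 6} \right] \left[ \frac{2 \cdot 3 - k}{3 \cdot 4} \right] a_1 = 0, \tag{10.19}$$

and the series solution

$$y(x) = a_0 x \left[ 1 + \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] x^2 + \left[ \frac{3 \cdot 4 - k}{4 \cdot 5} \right] \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] x^4 + \cdots \right]. \tag{10.20}$$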
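For the time being, we leave this solution aside and continue with the other root, $s = 0$. The recursion relation and the coefficients now become

$$a_{n+2} = a_n \frac{n (n + 1) - k}{(n + 1)(n + 2)}, \quad n = 0, 1, 2, \ldots, \tag{10.21}$$

$$a_0 \neq 0, \tag{10.22}$$

$$a_1 \neq 0, \tag{10.23}$$

$$a_2 = -\frac{k}{1 \cdot 2} a_0, \tag{10.24}$$

$$a_3 = \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] a_1, \tag{10.25}$$

$$a_4 = \left[ \frac{2 \cdot 3 - k}{3 \cdot 4} \right] a_2 = -\left[ \frac{2 \cdot 3 - k}{3 \cdot 4} \right] \frac{k}{1 \cdot 2} a_0, \tag{10.26}$$

$$a_5 = \left[ \frac{3 \cdot 4 - k}{4 \cdot 5} \right] \left[ \frac{1 \cdot 2 - k}{2 \cdot 3} \right] a_1. \tag{10.27}$$

This gives the series solution

$$y(x) = a_0 \left[ 1 - \frac{k}{1 \cdot 2} x^2 - \frac{k (2 \cdot 3 - k)}{1 \cdot 2 \cdot 3 \cdot 4} x^4 + \cdots \right] + a_1 \left[ x + \frac{(1 \cdot 2 - k)}{2 \cdot 3} x^3 + \frac{(3 \cdot 4 - k)(1 \cdot 2 - k)}{2 \cdot 3 \cdot 4 \cdot 5} x^5 + \cdots \right] \tag{10.28}$$

$$= a_0 y_1(x) + a_1 y_2(x). \tag{10.29}$$

Note that this solution contains the previous solution [Eq. (10.20)]. Hence we can take it as the general solution of the Legendre equation, where $y_1$ and $y_2$ are the two linearly independent solutions and the coefficients, $a_0$ and $a_1$, are the integration constants. In the Frobenius method, when the roots of the indicial equation differ by an integer, it is always advisable to start with the smaller root with the hopes that it will give both solutions.

10.1.2 Effect of Boundary Conditions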
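To check the convergence of these series, we write Equation (10.28) as
$$y(x) = a_0\sum_{n=0}^{\infty} c_{2n}x^{2n} + a_1\sum_{n=0}^{\infty} c_{2n+1}x^{2n+1} \tag{10.30}$$
and consider only the first series with the even powers. Applying the ratio test with the general term, $u_{2n} = c_{2n}x^{2n}$, and the recursion relation
$$c_{2n+2} = c_{2n}\frac{2n(2n+1)-k}{(2n+1)(2n+2)}, \quad n = 0,1,2,\ldots, \tag{10.31}$$
we obtain
$$\lim_{n\to\infty}\left|\frac{u_{2n+2}}{u_{2n}}\right| = \lim_{n\to\infty}\left|\frac{c_{2n+2}x^{2n+2}}{c_{2n}x^{2n}}\right| \tag{10.32}$$
$$= \lim_{n\to\infty}\left|\frac{2n(2n+1)-k}{(2n+1)(2n+2)}\right|x^2 = x^2. \tag{10.33}$$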
For convergence we need this limit to be less than 1. This means that the series converges for the interior of the interval [-1,1], that is, for $|x| < 1$. For the end points, $x = \pm1$, the ratio test is inconclusive. We now examine the large n behavior of the series. Since $\lim_{n\to\infty} c_{2n+2}/c_{2n} \to 1$, we can write the high n end of the series as
$$y_1 = \left[1 - \frac{kx^2}{1\cdot2} - \cdots + c_{2n}x^{2n}\left(1 + x^2 + x^4 + \cdots\right)\right], \tag{10.34}$$
which diverges at the end points as
$$1 + x^2 + x^4 + \cdots = \frac{1}{1-x^2}. \tag{10.35}$$
The conclusion for the second series with odd powers is exactly the same. Since we want finite solutions everywhere, the divergence at the end points is unacceptable. A finite solution cannot be obtained just by fixing the integration constants. Hence we turn to the parameter, k, in our equation. If we restrict the values of k as
$$k = l(l+1), \quad l = 0,1,2,\ldots, \tag{10.36}$$
one of the series in Equation (10.28) terminates after a finite number of terms, while the other series continues to diverge at the end points. However, we still have the coefficients, $a_0$ and $a_1$, at our disposal [Eq. (10.29)]; hence we set the coefficient in front of the divergent series to zero and keep only the finite polynomial solution as the meaningful solution in the entire interval [-1,1]. For example, if we take l = 1, hence k = 2, only the first term survives in the second series, $y_2$, [Eqs. (10.28) and (10.29)], thus giving
$$y(x) = a_0\left[1 - \frac{2}{1\cdot2}x^2 - \frac{2(2\cdot3-2)}{1\cdot2\cdot3\cdot4}x^4 + \cdots\right] + a_1 x. \tag{10.37}$$
Since the remaining series in Equation (10.37) diverges at the end points, we set the integration constant $a_0$ to zero, thus obtaining the finite polynomial solution as
$$y_{l=1}(x) = a_1 x. \tag{10.38}$$
Similarly, for l = 2, hence k = 6, Equation (10.28) becomes
$$y(x) = a_0\left[1 - \frac{6}{1\cdot2}x^2\right] + a_1\left[x + \frac{(1\cdot2-6)}{2\cdot3}x^3 + \cdots\right]. \tag{10.39}$$
We now set $a_1 = 0$ to obtain the polynomial solution
$$y_{l=2}(x) = a_0\left[1 - 3x^2\right]. \tag{10.40}$$
In general, the solution is of the form (10.41)
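The termination argument above can be checked with a short sketch. The function name `legendre_series_coeffs` and the choice $a_0 = a_1 = 1$ are illustrative assumptions, not part of the text; the code simply iterates the recursion relation (10.21) with exact rational arithmetic.

```python
from fractions import Fraction

def legendre_series_coeffs(k, n_max, a0=1, a1=1):
    """Coefficients a_0..a_{n_max} from a_{n+2} = a_n [n(n+1) - k] / [(n+1)(n+2)]."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_max - 1):
        a.append(a[n] * Fraction(n * (n + 1) - k, (n + 1) * (n + 2)))
    return a

# l = 2, hence k = l(l + 1) = 6: the even series stops after the x^2 term,
# while the odd series keeps producing nonzero coefficients.
coeffs = legendre_series_coeffs(k=6, n_max=8)
even = coeffs[0::2]   # a_0, a_2, a_4, ...
odd = coeffs[1::2]    # a_1, a_3, a_5, ...
print("even:", even)
print("odd: ", odd)
```

With k = 6 the even coefficients reduce to 1, -3, 0, 0, ..., which is the polynomial $1 - 3x^2$ of Equation (10.40) for $a_0 = 1$.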
10.1.3 Legendre Polynomials

To find a general expression for the coefficients, we substitute $k = l(l+1)$ and write [Eq. (10.21)],
$$a_n = -a_{n+2}\frac{(n+2)(n+1)}{(l-n)(l+n+1)}, \tag{10.42}$$
as
$$a_{n-2} = -a_n\frac{n(n-1)}{(l-n+2)(l+n-1)}. \tag{10.43}$$
Now the coefficients of the decreasing powers of x can be obtained as
$$a_{n-4} = -a_{n-2}\frac{(n-2)(n-3)}{(l-n+4)(l+n-3)}. \tag{10.44}$$
Starting with the coefficient of the highest power, $a_l$, we write the subsequent coefficients in the polynomials as
$$a_l, \tag{10.45}$$
$$a_{l-2} = -a_l\frac{l(l-1)}{2(2l-1)}, \tag{10.46}$$
$$a_{l-4} = -a_{l-2}\frac{(l-2)(l-3)}{4(2l-3)} = a_l\frac{l(l-1)(l-2)(l-3)}{2\cdot4\,(2l-1)(2l-3)}. \tag{10.47}$$
Now a trend begins to appear, and after s iterations we obtain
$$a_{l-2s} = a_l(-1)^s\frac{l(l-1)(l-2)\cdots(l-2s+1)}{2\cdot4\cdots(2s)\,(2l-1)(2l-3)\cdots(2l-2s+1)}. \tag{10.48}$$
The general expression for the polynomials can be written as
$$y_l(x) = \sum_{s=0}^{[l/2]} a_{l-2s}\,x^{l-2s}, \tag{10.49}$$
where $[l/2]$ stands for the greatest integer less than or equal to $l/2$. For some l values, $[l/2]$ and the number of terms in $y_l$ are given as

l             0  1  2  3  4  5  6  7  8
[l/2]         0  0  1  1  2  2  3  3  4
# of terms    1  1  2  2  3  3  4  4  5
(10.50)

To get a compact expression for $a_{l-2s}$, we write the numerator of Equation (10.48) as
$$l(l-1)(l-2)\cdots(l-2s+1)\,\frac{(l-2s)!}{(l-2s)!} = \frac{l!}{(l-2s)!}. \tag{10.51}$$
Similarly, by multiplying and dividing with the appropriate factors the denominator is written as
$$2\cdot4\cdots(2s)\,(2l-1)(2l-3)\cdots(2l-2s+1)$$
$$= (2\cdot1)(2\cdot2)(2\cdot3)\cdots(2\cdot s)\,\frac{(2l-1)[2l-2](2l-3)[2l-4]\cdots(2l-2s+1)\,[2l-2s]!}{[2l-2][2l-4]\cdots[2l-2s+2]\,[2l-2s]!} \tag{10.52}$$
$$= 2^s s!\,\frac{[2l](2l-1)[2l-2](2l-3)[2l-4]\cdots(2l-2s+1)\,[2l-2s]!\,[l-s]!}{2^s\left[l(l-1)(l-2)\cdots(l-s+1)[l-s]!\right][2l-2s]!} \tag{10.53}$$
$$= \frac{s!\,(2l)!\,(l-s)!}{l!\,(2l-2s)!}. \tag{10.54}$$
Combining Equations (10.51) and (10.54) in Equation (10.48), we obtain
$$a_{l-2s} = a_l(-1)^s\,\frac{l!}{(l-2s)!}\,\frac{l!\,(2l-2s)!}{s!\,(2l)!\,(l-s)!} \tag{10.55}$$
$$= a_l(-1)^s\,\frac{(l!)^2(2l-2s)!}{(l-2s)!\,s!\,(2l)!\,(l-s)!}, \tag{10.56}$$
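which allows us to write the polynomial solutions as
$$y_l(x) = \sum_{s=0}^{[l/2]} a_l(-1)^s\,\frac{(l!)^2(2l-2s)!}{(l-2s)!\,s!\,(2l)!\,(l-s)!}\,x^{l-2s} \tag{10.57}$$
$$= a_l\,\frac{(l!)^2 2^l}{(2l)!}\sum_{s=0}^{[l/2]}\frac{(-1)^s}{2^l}\,\frac{(2l-2s)!}{(l-2s)!\,s!\,(l-s)!}\,x^{l-2s}. \tag{10.58}$$
Legendre polynomials are defined by setting
$$a_l = \frac{(2l)!}{(l!)^2 2^l}, \tag{10.59}$$
as
$$P_l(x) = \sum_{s=0}^{[l/2]}\frac{(-1)^s}{2^l}\,\frac{(2l-2s)!}{(l-2s)!\,s!\,(l-s)!}\,x^{l-2s}. \tag{10.60}$$
These are the finite polynomial solutions of the Legendre equation [Eq. (10.1)] in the entire interval [-1,1]: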
Legendre Polynomials

$P_0(x) = 1$,
$P_1(x) = x$,
$P_2(x) = \frac{1}{2}\left[3x^2 - 1\right]$,
$P_3(x) = \frac{1}{2}\left[5x^3 - 3x\right]$,
$P_4(x) = \frac{1}{8}\left[35x^4 - 30x^2 + 3\right]$,
$P_5(x) = \frac{1}{8}\left[63x^5 - 70x^3 + 15x\right]$,
$P_6(x) = \frac{1}{16}\left[231x^6 - 315x^4 + 105x^2 - 5\right]$.
(10.61)
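As a cross-check of the table, the finite sum in Equation (10.60) can be evaluated with exact fractions. This is a minimal sketch; the helper name `legendre_coeffs` and the dictionary layout (power of x mapped to coefficient) are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import factorial

def legendre_coeffs(l):
    """Return {power: coefficient} of P_l(x) from the finite sum in Eq. (10.60)."""
    coeffs = {}
    for s in range(l // 2 + 1):
        num = (-1) ** s * factorial(2 * l - 2 * s)
        den = 2 ** l * factorial(l - 2 * s) * factorial(s) * factorial(l - s)
        coeffs[l - 2 * s] = Fraction(num, den)
    return coeffs

for l in range(7):
    print(l, legendre_coeffs(l))
```

For l = 4 this gives the coefficients 35/8, -30/8 (printed in lowest terms as -15/4), and 3/8, in agreement with $P_4(x)$ above. The coefficients of each polynomial also sum to 1, reflecting $P_l(1) = 1$.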
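10.1.4 Rodriguez Formula

Legendre polynomials are also defined by the Rodriguez formula
$$P_l(x) = \frac{1}{2^l l!}\frac{d^l}{dx^l}\left(x^2-1\right)^l. \tag{10.62}$$
To show its equivalence with the previous formula [Eq. (10.60)], we use the binomial formula and expand $(x^2-1)^l$ as
$$\left(x^2-1\right)^l = \sum_{s=0}^{l}\binom{l}{s}(-1)^s x^{2(l-s)}, \tag{10.63}$$
where the binomial coefficients are defined as
$$\binom{l}{s} = \frac{l!}{s!\,(l-s)!}. \tag{10.64}$$
We now write Equation (10.62) as
$$P_l(x) = \frac{1}{2^l l!}\frac{d^l}{dx^l}\sum_{s=0}^{l}(-1)^s\frac{l!}{s!\,(l-s)!}\,x^{2(l-s)} \tag{10.65}$$
$$= \frac{1}{2^l l!}\sum_{s=0}^{l}(-1)^s\frac{l!}{s!\,(l-s)!}\,\frac{d^l}{dx^l}x^{2(l-s)}. \tag{10.66}$$
When the order of the derivative is greater than the power of x, we get zero:
$$\frac{d^l}{dx^l}x^{2(l-s)} = 0, \quad 2(l-s) < l. \tag{10.67}$$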
Hence we can write
r41
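$$\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = \sum_{n=0}^{\infty}\sum_{j=0}^{n}\frac{(-1)^j 2^{n-j}x^{n-j}t^{n+j}}{(n-j)!\,j!} \tag{10.206}$$
$$= \sum_{n=0}^{\infty}\sum_{j=0}^{n}\frac{n!}{n!}\,\frac{(-1)^j 2^{n-j}x^{n-j}t^{n+j}}{(n-j)!\,j!}, \tag{10.207}$$
where the second equation is multiplied and divided by n!. We rearrange Equation (10.207) as
$$\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = \sum_{n=0}^{\infty}\frac{t^n}{n!}\sum_{j=0}^{n}\frac{n!}{(n-j)!\,j!}(2x)^{n-j}(-t)^j. \tag{10.208}$$
The second series on the right-hand side is the binomial expansion
$$\sum_{j=0}^{n}\frac{n!}{(n-j)!\,j!}(2x)^{n-j}(-t)^j = (2x-t)^n; \tag{10.209}$$
hence Equation (10.208) becomes
$$\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = \sum_{n=0}^{\infty}\frac{t^n(2x-t)^n}{n!}. \tag{10.210}$$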
Furthermore, the series on the right-hand side is nothing but the exponential function with the argument $t(2x-t)$:
$$\sum_{n=0}^{\infty}\frac{t^n(2x-t)^n}{n!} = e^{t(2x-t)}, \tag{10.211}$$
thus giving us the generating function, T(x,t), of $H_n(x)$ as
$$T(x,t) = e^{2xt-t^2} = \sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!}. \tag{10.212}$$
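10.2.6 Special Values

Using the generating function, we can find the special values at the origin as
$$T(0,t) = e^{-t^2} = \sum_{n=0}^{\infty}\frac{H_n(0)t^n}{n!}, \tag{10.213}$$
$$e^{-t^2} = \sum_{n=0}^{\infty}\frac{(-1)^n t^{2n}}{n!}, \tag{10.214}$$
which gives
$$H_{2n}(0) = (-1)^n\frac{(2n)!}{n!}, \tag{10.215}$$
$$H_{2n+1}(0) = 0. \tag{10.216}$$

10.2.7 Recursion Relations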
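By using the generating function, we can derive the two basic recursion relations for the Hermite polynomials. Differentiating T(x,t) with respect to x we write
$$2t\,e^{2xt-t^2} = \frac{\partial T(x,t)}{\partial x}. \tag{10.217}$$
Substituting the definition of the generating function to the left-hand side, we write this as
$$2t\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = \sum_{n=0}^{\infty}\frac{H_n'(x)t^n}{n!}, \tag{10.218}$$
$$2\sum_{n=0}^{\infty}\frac{H_n(x)t^{n+1}}{n!} = \sum_{n=0}^{\infty}\frac{H_n'(x)t^n}{n!}. \tag{10.219}$$
Making a dummy variable change in the first series:
$$n \to n' - 1 \tag{10.220}$$
and dropping primes, we write
$$2\sum_{n=1}^{\infty}\frac{H_{n-1}(x)t^n}{(n-1)!} = \sum_{n=0}^{\infty}\frac{H_n'(x)t^n}{n!}, \tag{10.221}$$
$$\sum_{n=1}^{\infty}2nH_{n-1}(x)\frac{t^n}{n!} = \sum_{n=1}^{\infty}H_n'(x)\frac{t^n}{n!}, \tag{10.222}$$
which gives the first recursion relation as
$$2nH_{n-1}(x) = H_n'(x), \quad n = 1,2,\ldots. \tag{10.223}$$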
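Note that $H_0' = 0$. For the second recursion relation we differentiate T(x,t) with respect to t and write
$$\frac{\partial T(x,t)}{\partial t} = (2x-2t)\,e^{2xt-t^2}, \tag{10.224}$$
$$\sum_{n=1}^{\infty}\frac{H_n(x)\,n\,t^{n-1}}{n!} = (2x-2t)\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!}, \tag{10.225}$$
$$\sum_{n=1}^{\infty}\frac{H_n(x)t^{n-1}}{(n-1)!} = 2x\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} - 2\sum_{n=0}^{\infty}\frac{H_n(x)t^{n+1}}{n!}, \tag{10.226}$$
$$2x\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} - 2\sum_{n=0}^{\infty}\frac{H_n(x)t^{n+1}}{n!} - \sum_{n=1}^{\infty}\frac{H_n(x)t^{n-1}}{(n-1)!} = 0. \tag{10.227}$$
To equate the powers of t, we let $n \to n'-1$ in the second series and $n \to n''+1$ in the third series and then drop primes to write
$$\sum_{n=0}^{\infty}\left[2xH_n - 2nH_{n-1} - H_{n+1}\right]\frac{t^n}{n!} = 0. \tag{10.228}$$
Setting the coefficients of all the powers of t to zero gives us the second recursion relation:
$$2xH_n - 2nH_{n-1} = H_{n+1}. \tag{10.229}$$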
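The two recursion relations lend themselves to a quick consistency check. The sketch below is illustrative only: it assumes the starting values $H_0 = 1$ and $H_1 = 2x$, stores each polynomial as a list of integer coefficients (index = power of x), and uses the hypothetical helper names `hermite_polys` and `derivative`.

```python
from math import factorial

def hermite_polys(n_max):
    """Coefficient lists for H_0..H_{n_max}, built from 2x H_n - 2n H_{n-1} = H_{n+1}."""
    H = [[1], [0, 2]]
    for n in range(1, n_max):
        two_x_Hn = [0] + [2 * c for c in H[n]]          # coefficients of 2x H_n
        prev = [2 * n * c for c in H[n - 1]] + [0, 0]   # 2n H_{n-1}, padded to equal length
        H.append([a - b for a, b in zip(two_x_Hn, prev)])
    return H[: n_max + 1]

def derivative(p):
    """Coefficient list of dp/dx."""
    return [i * c for i, c in enumerate(p)][1:]

H = hermite_polys(6)
for n in range(1, 7):
    # First recursion relation: H_n' = 2n H_{n-1}
    assert derivative(H[n]) == [2 * n * c for c in H[n - 1]]
for m in range(4):
    # Special values at the origin: H_{2m}(0) = (-1)^m (2m)!/m!
    assert H[2 * m][0] == (-1) ** m * factorial(2 * m) // factorial(m)
print(H[3], H[4])
```

Here `H[3]` is `[0, -12, 0, 8]`, that is, $H_3(x) = 8x^3 - 12x$.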
10.2.8 Orthogonality

We write the Hermite equation [Eq. (10.147)] as
$$H_n'' - 2xH_n' = -2nH_n, \tag{10.230}$$
where we have substituted $2n = \epsilon - 1$. The left-hand side can be made exact (Chapter 9) by multiplying the Hermite equation with $e^{-x^2}$ as
$$e^{-x^2}H_n'' - 2xe^{-x^2}H_n' = -2ne^{-x^2}H_n, \tag{10.231}$$
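which can now be written as
$$\frac{d}{dx}\left[e^{-x^2}\frac{dH_n}{dx}\right] = -2ne^{-x^2}H_n. \tag{10.232}$$
We now multiply both sides by $H_m(x)$ and integrate over $[-\infty,\infty]$ to write
$$\int_{-\infty}^{\infty}H_m\frac{d}{dx}\left[e^{-x^2}\frac{dH_n}{dx}\right]dx = -2n\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx, \tag{10.233}$$
which, after integration by parts, becomes
$$\left.H_mH_n'e^{-x^2}\right|_{-\infty}^{\infty} - \int_{-\infty}^{\infty}H_m'H_n'e^{-x^2}dx = -2n\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx. \tag{10.234}$$
Since the surface term vanishes, we have
$$\int_{-\infty}^{\infty}H_m'H_n'e^{-x^2}dx = 2n\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx. \tag{10.235}$$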
Interchanging n and m gives another equation:
$$\int_{-\infty}^{\infty}H_n'H_m'e^{-x^2}dx = 2m\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx, \tag{10.236}$$
which, when subtracted from the original equation [Eq. (10.235)], gives
$$2(m-n)\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx = 0. \tag{10.237}$$
We have two cases:
$$\int_{-\infty}^{\infty}H_m(x)H_n(x)e^{-x^2}dx = \begin{cases} 0, & m \neq n, \\ N_n, & m = n, \end{cases} \tag{10.238}$$
which shows that the Hermite polynomials are orthogonal with respect to the weight factor $e^{-x^2}$. To complete the orthogonality relation we need to calculate the normalization constant. Replace n by n - 1 in the recursion relation (10.229) to write
$$2xH_{n-1} - 2(n-1)H_{n-2} = H_n. \tag{10.239}$$
Multiply this by $H_n$:
$$2xH_nH_{n-1} - 2nH_nH_{n-2} + 2H_nH_{n-2} = H_n^2. \tag{10.240}$$
We also multiply the recursion relation (10.229) by $H_{n-1}$ to obtain a second equation:
$$2xH_{n-1}H_n - 2nH_{n-1}^2 = H_{n-1}H_{n+1}. \tag{10.241}$$
Subtracting Equation (10.241) from (10.240), we get
$$2xH_nH_{n-1} - 2nH_nH_{n-2} + 2H_nH_{n-2} - 2xH_{n-1}H_n + 2nH_{n-1}^2 = H_n^2 - H_{n-1}H_{n+1} \tag{10.242}$$
or
$$-2nH_nH_{n-2} + 2H_nH_{n-2} + 2nH_{n-1}^2 + H_{n-1}H_{n+1} = H_n^2, \tag{10.243}$$
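which, after multiplying by the weight factor $e^{-x^2}$ and integrating over $[-\infty,\infty]$, becomes
$$-2n\int_{-\infty}^{\infty}dx\,e^{-x^2}H_nH_{n-2} + 2\int_{-\infty}^{\infty}dx\,e^{-x^2}H_nH_{n-2} + 2n\int_{-\infty}^{\infty}dx\,e^{-x^2}H_{n-1}^2 + \int_{-\infty}^{\infty}dx\,e^{-x^2}H_{n-1}H_{n+1} = \int_{-\infty}^{\infty}dx\,e^{-x^2}H_n^2. \tag{10.244}$$
Using the orthogonality relation [Eq. (10.238)], this simplifies to
$$2n\int_{-\infty}^{\infty}dx\,e^{-x^2}H_{n-1}^2 = \int_{-\infty}^{\infty}dx\,e^{-x^2}H_n^2, \tag{10.245}$$
$$2nN_{n-1} = N_n, \quad n = 1,2,3,\ldots. \tag{10.246}$$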
Starting with $N_n$, we iterate this formula to write
$$N_n = 2nN_{n-1} = 2n\,2(n-1)N_{n-2} = 2n\,2(n-1)\,2(n-2)N_{n-3} = \cdots = 2^{j+1}n(n-1)\cdots(n-j)N_{n-j-1}. \tag{10.247}$$
We continue until we hit j = n - 1, thus
$$N_n = 2^n n!\,N_0. \tag{10.248}$$
We evaluate $N_0$ using $H_0 = 1$ as
$$N_0 = \int_{-\infty}^{\infty}e^{-x^2}H_0^2(x)\,dx \tag{10.249}$$
$$= \int_{-\infty}^{\infty}e^{-x^2}dx \tag{10.250}$$
$$= \sqrt{\pi}, \tag{10.251}$$
which yields $N_n$ as
$$N_n = 2^n n!\sqrt{\pi}, \quad n = 0,1,2,\ldots. \tag{10.252}$$
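The orthogonality relation and the normalization constant can be verified exactly for low orders. The sketch below is an illustration under one added assumption: it uses the standard Gaussian moments $\int_{-\infty}^{\infty}x^{2j}e^{-x^2}dx = \sqrt{\pi}\,(2j)!/(4^j j!)$, which are not derived in this section. All integrals are returned in units of $\sqrt{\pi}$, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def hermite(n):
    """Coefficient list of H_n (index = power), from 2x H_m - 2m H_{m-1} = H_{m+1}."""
    prev, cur = [1], [0, 2]
    if n == 0:
        return prev
    for m in range(1, n):
        nxt = [0] + [2 * c for c in cur]
        for i, c in enumerate(prev):
            nxt[i] -= 2 * m * c
        prev, cur = cur, nxt
    return cur

def gauss_moment(p):
    """Integral of x^p e^{-x^2} over the real line, in units of sqrt(pi)."""
    if p % 2:
        return Fraction(0)
    j = p // 2
    return Fraction(factorial(2 * j), 4 ** j * factorial(j))

def inner(m, n):
    """Integral of H_m H_n e^{-x^2} over the real line, in units of sqrt(pi)."""
    total = Fraction(0)
    for i, a in enumerate(hermite(m)):
        for j, b in enumerate(hermite(n)):
            total += a * b * gauss_moment(i + j)
    return total

for n in range(6):
    print(n, inner(n, n), 2 ** n * factorial(n))
```

The two printed columns agree, consistent with $N_n = 2^n n!\sqrt{\pi}$, and `inner(m, n)` vanishes whenever m and n differ.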
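10.2.9 Series Expansions in Hermite Polynomials

A series expansion for any sufficiently smooth function, f(x), defined in the infinite interval $(-\infty,\infty)$ can be given as
$$f(x) = \sum_{n=0}^{\infty}c_nH_n(x). \tag{10.253}$$
Using the orthogonality relation,
$$\int_{-\infty}^{\infty}e^{-x^2}H_m(x)H_n(x)\,dx = 2^n n!\sqrt{\pi}\,\delta_{mn}, \tag{10.254}$$
we can evaluate the expansion coefficients, $c_n$, as
$$c_n = \frac{1}{2^n n!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-x^2}f(x)H_n(x)\,dx. \tag{10.255}$$
Convergence of this series is assured, granted that the real function, f(x), defined in the infinite interval $(-\infty,\infty)$ is piecewise smooth in every subinterval [-a, a] and the integral
$$\int_{-\infty}^{\infty}e^{-x^2}f^2(x)\,dx \tag{10.256}$$
is finite. At the points of discontinuity the series converges to
$$\frac{1}{2}\left[f(x+0) + f(x-0)\right]. \tag{10.257}$$
A proof of this theorem can be found in Lebedev.

Example 10.4. Expansion of $f(x) = e^{ax}$, a is a constant: Since f(x) is a sufficiently smooth function, we can write the convergent series:
$$e^{ax} = \sum_{n=0}^{\infty}c_nH_n(x), \tag{10.258}$$
where the coefficients are
$$c_n = \frac{1}{2^n n!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-x^2}e^{ax}H_n(x)\,dx. \tag{10.259}$$
Using the Rodriguez formula [Eq. (10.201)], we write
$$c_n = \frac{(-1)^n}{2^n n!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{ax}\frac{d^n}{dx^n}\left(e^{-x^2}\right)dx \tag{10.260}$$
$$= \frac{a^n}{2^n n!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{ax-x^2}dx \tag{10.261}$$
$$= \frac{a^n e^{a^2/4}}{2^n n!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(x-a/2)^2}dx \tag{10.262}$$
$$= \frac{a^n e^{a^2/4}}{2^n n!}. \tag{10.263}$$
We have used n-fold integration by parts in Equation (10.261) and completed the square in the last integral. Now the final result can be expressed as
$$e^{ax} = e^{a^2/4}\sum_{n=0}^{\infty}\frac{a^n}{2^n n!}H_n(x). \tag{10.264}$$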
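The result of Example 10.4 can be checked numerically by summing the series $e^{a^2/4}\sum_n a^n H_n(x)/(2^n n!)$ and comparing it with $e^{ax}$. This is a minimal sketch: the function names, the truncation at 40 terms, and the sample values of a and x are illustrative assumptions, and $H_n(x)$ is evaluated with the recursion relation (10.229).

```python
import math

def hermite_values(n_max, x):
    """H_0(x)..H_{n_max}(x) from the recursion H_{n+1} = 2x H_n - 2n H_{n-1}."""
    vals = [1.0, 2.0 * x]
    for n in range(1, n_max):
        vals.append(2.0 * x * vals[n] - 2.0 * n * vals[n - 1])
    return vals[: n_max + 1]

def exp_hermite_series(a, x, n_max=40):
    """Partial sum of e^{a^2/4} * sum_n a^n / (2^n n!) * H_n(x)."""
    H = hermite_values(n_max, x)
    total = sum(a ** n / (2 ** n * math.factorial(n)) * H[n] for n in range(n_max + 1))
    return math.exp(a * a / 4.0) * total

a, x = 1.5, 0.7
print(exp_hermite_series(a, x), math.exp(a * x))
```

The same identity follows from the generating function (10.212) with t = a/2, which is a useful independent check on the coefficients.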
10.3 LAGUERRE EQUATION

Laguerre equation is defined as
$$x\frac{d^2y}{dx^2} + (1-x)\frac{dy}{dx} + ny = 0, \quad x \in [0,\infty], \tag{10.265}$$
where n is a real continuous parameter. It is usually encountered in the study of single electron atoms in quantum mechanics. The free parameter is related to the energy of the atom.

10.3.1 Series Solution

Since x = 0 is a regular singular point, we can use the Frobenius method and attempt a series solution of the form
$$y(x,s) = \sum_{r=0}^{\infty}a_rx^{r+s}, \tag{10.266}$$
with the derivatives
$$y'(x,s) = \sum_{r=0}^{\infty}a_r(r+s)x^{r+s-1}, \tag{10.267}$$
$$y''(x,s) = \sum_{r=0}^{\infty}a_r(r+s)(r+s-1)x^{r+s-2}. \tag{10.268}$$
Substituting these into the Laguerre equation, we get
$$\sum_{r=0}^{\infty}a_r(r+s)(r+s-1)x^{r+s-1} + \sum_{r=0}^{\infty}a_r(r+s)x^{r+s-1} - \sum_{r=0}^{\infty}a_r(r+s)x^{r+s} + n\sum_{r=0}^{\infty}a_rx^{r+s} = 0, \tag{10.269}$$
$$\sum_{r=0}^{\infty}a_r(r+s)^2x^{r+s-1} - \sum_{r=0}^{\infty}a_r(r+s-n)x^{r+s} = 0. \tag{10.270}$$
In the first series we let r - 1 = r' and drop primes at the end to write
$$\sum_{r=-1}^{\infty}a_{r+1}(r+s+1)^2x^{r+s} - \sum_{r=0}^{\infty}a_r(r+s-n)x^{r+s} = 0, \tag{10.271}$$
$$a_0s^2x^{s-1} + \sum_{r=0}^{\infty}\left[a_{r+1}(r+s+1)^2 - a_r(r+s-n)\right]x^{r+s} = 0. \tag{10.272}$$
Equating all the coefficients of the equal powers of x to zero we get
$$a_0s^2 = 0, \quad a_0 \neq 0, \tag{10.273}$$
$$a_{r+1} = a_r\frac{(r+s-n)}{(r+s+1)^2}, \quad r = 0,1,\ldots. \tag{10.274}$$
In this case the indicial equation (10.273) has a double root, s = 0. The recursion relation becomes
$$a_{r+1} = -a_r\frac{n-r}{(r+1)^2}, \quad r = 0,1,\ldots, \tag{10.275}$$
which leads to the series solution
$$y(x) = a_0\left\{1 - \frac{n}{1^2}x + \frac{n(n-1)}{(2!)^2}x^2 + \cdots + (-1)^r\frac{n(n-1)\cdots(n-r+1)}{(r!)^2}x^r + \cdots\right\}. \tag{10.276}$$
This can be written as
$$y(x) = a_0\sum_{r=0}^{\infty}(-1)^r\frac{n(n-1)\cdots(n-r+1)}{(r!)^2}x^r. \tag{10.277}$$
Laguerre equation also has a second linearly independent solution. However, it diverges logarithmically as $x \to 0$, hence we set its coefficient in the general solution to zero and continue with the series solution given above.
10.3.2
Laguerre Polynomials
As we add more and more terms, this series behaves as
[::
y(x) = a0 1 - -x+
. ' . +arxr
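hence it diverges as $e^x$ as $x \to \infty$. For a finite solution everywhere in $[0,\infty]$ we have no choice but to restrict n to integer values, the effect of which is to terminate the series [Eq. (10.276)] after a finite number of terms, thus leading to the polynomial solutions:
$$y_n(x) = a_0\sum_{r=0}^{n}(-1)^r\frac{n(n-1)\cdots(n-r+1)}{(r!)^2}x^r \tag{10.279}$$
$$= a_0\sum_{r=0}^{n}(-1)^r\frac{n!}{(n-r)!\,(r!)^2}x^r. \tag{10.280}$$
Polynomials defined as
$$L_n(x) = \sum_{r=0}^{n}(-1)^r\frac{n!\,x^r}{(n-r)!\,(r!)^2} \tag{10.281}$$
are called the Laguerre polynomials and constitute the everywhere finite solutions of the Laguerre equation:
$$x\frac{d^2L_n}{dx^2} + (1-x)\frac{dL_n}{dx} + nL_n = 0. \tag{10.282}$$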
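That the polynomials of Equation (10.281) solve the Laguerre equation can be confirmed coefficient by coefficient. The following sketch uses exact fractions; `laguerre_coeffs` and `laguerre_residual` are hypothetical helper names introduced only for this check.

```python
from fractions import Fraction
from math import factorial

def laguerre_coeffs(n):
    """Coefficients of L_n(x) (index = power of x) from Eq. (10.281)."""
    return [Fraction((-1) ** r * factorial(n), factorial(n - r) * factorial(r) ** 2)
            for r in range(n + 1)]

def laguerre_residual(n):
    """Coefficients of x L'' + (1 - x) L' + n L, which should all vanish."""
    c = laguerre_coeffs(n)
    res = [Fraction(0)] * (n + 1)
    for r, cr in enumerate(c):
        if r >= 1:
            res[r - 1] += r * (r - 1) * cr + r * cr   # contributions of x L'' and L'
        res[r] += (n - r) * cr                        # contributions of -x L' and n L
    return res

print(laguerre_coeffs(3))
print(laguerre_residual(3))
```

For n = 3 the coefficients are 1, -3, 3/2, -1/6, and every entry of the residual is zero.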
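10.3.3 Contour Integral Representation

Laguerre polynomials are also defined by the complex contour integral
$$L_n(x) = \frac{1}{2\pi i}\oint_C\frac{z^ne^{x-z}}{(z-x)^{n+1}}\,dz, \tag{10.283}$$
where the contour C is any closed path enclosing the point z = x. To show the equivalence of the two definitions, we evaluate the contour integral by using the residue theorem as
$$\oint_C\frac{z^ne^{x-z}}{(z-x)^{n+1}}\,dz = 2\pi i\,\operatorname{Res}\left[\frac{z^ne^{x-z}}{(z-x)^{n+1}}\right]_{z=x}. \tag{10.284}$$
To find the residue, we use the expansions
$$z^n = (z+x-x)^n \tag{10.285}$$
$$= [(z-x)+x]^n \tag{10.286}$$
$$= \sum_{l=0}^{n}\frac{n!}{l!\,(n-l)!}(z-x)^lx^{n-l} \tag{10.287}$$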
and
$$e^{x-z} = e^{-(z-x)} \tag{10.288}$$
$$= \sum_{m=0}^{\infty}(-1)^m\frac{(z-x)^m}{m!}. \tag{10.289}$$
Using these, we write the integrand of the contour integral [Eq. (10.283)] as
$$\frac{z^ne^{x-z}}{(z-x)^{n+1}} = \sum_{m=0}^{\infty}\sum_{l=0}^{n}\frac{n!}{l!\,(n-l)!}\frac{(z-x)^lx^{n-l}}{(z-x)^{n+1}}(-1)^m\frac{(z-x)^m}{m!} \tag{10.290}$$
$$= \sum_{m=0}^{\infty}\sum_{l=0}^{n}\frac{(-1)^mn!}{l!\,(n-l)!\,m!}(z-x)^{l-n-1+m}x^{n-l}. \tag{10.291}$$
For the residue we need the coefficient of the $(z-x)^{-1}$ term, that is, the terms with
$$l - n - 1 + m = -1, \tag{10.292}$$
$$l = n - m. \tag{10.293}$$
Therefore the residue is obtained as
$$\sum_{m=0}^{n}(-1)^m\frac{n!}{(n-m)!\,(m!)^2}x^m. \tag{10.294}$$
Substituting into Equation (10.284), we obtain
$$L_n(x) = \sum_{m=0}^{n}(-1)^m\frac{n!\,x^m}{(n-m)!\,(m!)^2}, \tag{10.295}$$
which agrees with our previous definition [Eq. (10.281)].
10.3.4 Rodriguez Formula

Using the contour integral representation [Eq. (10.283)] and the Cauchy derivative formula:
$$\frac{2\pi i}{n!}f^{(n)}(z_0) = \oint_C\frac{f(z)\,dz}{(z-z_0)^{n+1}}, \tag{10.296}$$
we can write the Rodriguez formula of Laguerre polynomials as
$$L_n(x) = \frac{e^x}{n!}\frac{d^n(x^ne^{-x})}{dx^n}. \tag{10.297}$$
10.3.5 Generating Function

To obtain the generating function, T(x,t), we multiply $L_n(x)$ [Eq. (10.295)] by $t^n$ and sum over n to write
$$T(x,t) = \sum_{n=0}^{\infty}L_n(x)t^n \tag{10.298}$$
$$= \sum_{n=0}^{\infty}\sum_{r=0}^{n}\frac{(-1)^rn!\,x^rt^n}{(n-r)!\,(r!)^2}. \tag{10.299}$$
We introduce a new dummy variable s as n = r + s to write
$$T(x,t) = \sum_{r=0}^{\infty}\sum_{s=0}^{\infty}\frac{(-1)^r(r+s)!\,x^rt^{r+s}}{s!\,(r!)^2}. \tag{10.300}$$
Note that both sums now run from zero to infinity. We rearrange this as
$$T(x,t) = \sum_{r=0}^{\infty}\frac{(-1)^rx^rt^r}{r!}\sum_{s=0}^{\infty}\frac{(r+s)!\,t^s}{(r!)\,s!}. \tag{10.301}$$
If we note that the second sum is nothing but (Dwight)
$$\sum_{s=0}^{\infty}\frac{(r+s)!\,t^s}{r!\,s!} = \frac{1}{(1-t)^{r+1}}, \tag{10.302}$$
we can rewrite Equation (10.301) as
$$T(x,t) = \frac{1}{(1-t)}\sum_{r=0}^{\infty}\frac{1}{r!}\left[\frac{-xt}{1-t}\right]^r. \tag{10.303}$$
Finally, using $e^x = \sum_{r=0}^{\infty}x^r/r!$ we obtain the generating function definition of $L_n(x)$:
$$T(x,t) = \frac{1}{(1-t)}\exp\left[\frac{-xt}{(1-t)}\right] = \sum_{n=0}^{\infty}L_n(x)t^n. \tag{10.304}$$
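The generating function can be compared numerically with the series it is meant to sum. The sketch below is illustrative: it evaluates $L_n(x)$ from the explicit sum (10.295), truncates the series at 60 terms (an arbitrary choice that is ample for |t| well below 1), and uses hypothetical function names.

```python
import math

def laguerre(n, x):
    """L_n(x) from the explicit sum, Eq. (10.295)."""
    return sum((-1) ** m * math.factorial(n) * x ** m
               / (math.factorial(n - m) * math.factorial(m) ** 2)
               for m in range(n + 1))

def generating_sum(x, t, n_max=60):
    """Truncated series sum_n L_n(x) t^n."""
    return sum(laguerre(n, x) * t ** n for n in range(n_max + 1))

def generating_closed(x, t):
    """Closed form exp[-x t / (1 - t)] / (1 - t), Eq. (10.304)."""
    return math.exp(-x * t / (1.0 - t)) / (1.0 - t)

x, t = 0.8, 0.3
print(generating_sum(x, t), generating_closed(x, t))
```

Setting x = 0 reduces the closed form to the geometric series 1/(1 - t), so every $L_n(0)$ must equal 1, which the explicit sum reproduces.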
10.3.6 Special Values and Recursion Relations

Using the generating function, and the geometric series, $1/(1-t) = \sum_{n=0}^{\infty}t^n$, we easily obtain the special value
$$L_n(0) = 1. \tag{10.305}$$
From the Laguerre equation [Eq. (10.265)] we also get by inspection
$$L_n'(0) = -n. \tag{10.306}$$
Differentiating the generating function with respect to t gives us
$$(n+1)L_{n+1}(x) = (2n+1-x)L_n(x) - nL_{n-1}(x) \tag{10.307}$$
and differentiating with respect to x, we obtain
$$L_{n+1}'(x) - L_n'(x) = -L_n(x). \tag{10.308}$$
Using the first recursion relation [Eq. (10.307)], the second recursion relation can also be written as
$$xL_n'(x) = nL_n(x) - nL_{n-1}(x). \tag{10.309}$$
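10.3.7 Orthogonality

If we multiply the Laguerre equation by $e^{-x}$ as
$$e^{-x}x\frac{d^2L_n}{dx^2} + (1-x)e^{-x}\frac{dL_n}{dx} = -ne^{-x}L_n(x), \tag{10.310}$$
the left-hand side becomes exact and can be written as (see Chapter 9)
$$\frac{d}{dx}\left[xe^{-x}\frac{dL_n}{dx}\right] = -ne^{-x}L_n. \tag{10.311}$$
We first multiply both sides by $L_m(x)$ and then integrate by parts:
$$\int_0^{\infty}L_m\frac{d}{dx}\left[xe^{-x}\frac{dL_n}{dx}\right]dx = -n\int_0^{\infty}e^{-x}L_nL_m\,dx, \tag{10.312}$$
$$\left.L_mxe^{-x}\frac{dL_n}{dx}\right|_0^{\infty} - \int_0^{\infty}\frac{dL_m}{dx}\left[xe^{-x}\frac{dL_n}{dx}\right]dx = -n\int_0^{\infty}e^{-x}L_nL_m\,dx, \tag{10.313}$$
$$-\int_0^{\infty}\frac{dL_m}{dx}\left[xe^{-x}\frac{dL_n}{dx}\right]dx = -n\int_0^{\infty}e^{-x}L_nL_m\,dx. \tag{10.314}$$
Interchanging m and n, we obtain another equation:
$$-\int_0^{\infty}\frac{dL_n}{dx}\left[xe^{-x}\frac{dL_m}{dx}\right]dx = -m\int_0^{\infty}e^{-x}L_mL_n\,dx, \tag{10.315}$$
which, when subtracted from Equation (10.314), gives
$$(m-n)\int_0^{\infty}e^{-x}L_mL_n\,dx = 0. \tag{10.316}$$
This gives the orthogonality relation as
$$\int_0^{\infty}e^{-x}L_mL_n\,dx = N_n\delta_{nm}. \tag{10.317}$$
Using the generating function [Eq. (10.304)], the normalization constant, $N_n$, can be obtained as 1.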
10.3.8 Series Expansions in Laguerre Polynomials

Like the previous special functions we have studied, any sufficiently smooth real function in the interval $[0,\infty)$ can be expanded in terms of the Laguerre polynomials as
$$f(x) = \sum_{n=0}^{\infty}c_nL_n(x), \tag{10.318}$$
where the coefficients $c_n$ are found by using the orthogonality relation:
$$c_n = \int_0^{\infty}e^{-x}f(x)L_n(x)\,dx. \tag{10.319}$$
Convergence of this series to f(x) is guaranteed when the real function, f(x), is piecewise smooth in every subinterval, $[x_1,x_2]$, where $0 < x_1 < x_2 < \infty$, of $[0,\infty)$ and x is not a point of discontinuity, and the integral
$$\int_0^{\infty}e^{-x}f^2(x)\,dx \tag{10.320}$$
is finite. At the points of discontinuity the series converges to (Lebedev)
$$\frac{1}{2}\left[f(x+0) + f(x-0)\right]. \tag{10.321}$$

Example 10.5. Laguerre series of $e^{-ax}$: This function satisfies the conditions stated in Section 10.3.8 for a > 0; hence we can write the series
$$e^{-ax} = \sum_{n=0}^{\infty}c_nL_n(x), \tag{10.322}$$
where the expansion coefficients are obtained as
$$c_n = \int_0^{\infty}e^{-x}e^{-ax}L_n(x)\,dx \tag{10.323}$$
$$= \frac{1}{n!}\int_0^{\infty}e^{-ax}\frac{d^n}{dx^n}\left(e^{-x}x^n\right)dx \tag{10.324}$$
$$= \frac{a^n}{n!}\int_0^{\infty}e^{-(a+1)x}x^n\,dx \tag{10.325}$$
$$= \frac{a^n}{(a+1)^{n+1}}, \quad n = 0,1,\ldots. \tag{10.326}$$
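The coefficients of Example 10.5 can be tested by summing the Laguerre series numerically. In this sketch, which is only illustrative, $L_n(x)$ is generated with the recursion relation (10.307); the function names, the truncation at 200 terms, and the sample values of a and x are assumptions made for the check.

```python
import math

def laguerre_values(n_max, x):
    """L_0(x)..L_{n_max}(x) via (n+1) L_{n+1} = (2n+1-x) L_n - n L_{n-1}."""
    vals = [1.0, 1.0 - x]
    for n in range(1, n_max):
        vals.append(((2 * n + 1 - x) * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals[: n_max + 1]

def exp_laguerre_series(a, x, n_max=200):
    """Partial sum of sum_n a^n / (a+1)^(n+1) * L_n(x), which should approach e^{-a x}."""
    L = laguerre_values(n_max, x)
    return sum(a ** n / (a + 1) ** (n + 1) * L[n] for n in range(n_max + 1))

a, x = 1.0, 2.0
print(exp_laguerre_series(a, x), math.exp(-a * x))
```

The same sum follows from the generating function (10.304) with t = a/(a + 1), since then 1/(1 - t) = a + 1 and xt/(1 - t) = ax.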
PROBLEMS
1. Find Legendre series expansion of the step function:
Discuss the behavior of the series you found at x = a. Hint: Use the asymptotic form of the Legendre series given as
where $\epsilon$ is any positive number (Lebedev).

2. Show the parity relation of Legendre polynomials:
$$P_l(-x) = (-1)^lP_l(x).$$

3. Using the basic recursion relations [Eqs. (10.102) and (10.103)], derive
(i) $P_{l+1}'(x) = (l+1)P_l(x) + xP_l'(x)$.

4. Show the relation
$$\frac{1-t^2}{(1-2xt+t^2)^{3/2}} = \sum_{l=0}^{\infty}(2l+1)P_l(x)t^l.$$
6. Show that Hermite polynomials satisfy the parity relation
Hn(x)= (-1yHn(-z). 7. (i) Show that Hermite polynomials can also be defined as
508
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
(ii) Define your contour and evaluate the integral t o justify your result.
8. Show the integral
9. Show that 00
z2e-"2H,(z)Hm(x) dz
= 2n-W2(2n
+ l)n!6,, + 2"7W(n + 2)!5,+a,m
+ 2n-27r1'2n!5,
- 2 ,m .
10. Show the Laguerre expansion m
xm
=
C c , ~ , ( x ) , m = 0 , 1 , 2 , .. . , n=O
where
11. Using the generating function definition of Laguerre polynomials, show that the normalization constant, N,, in
is 1.
r
e-"L,L,dx
= N,6,,
12. Prove the basic recursion relations of the Laguerre polynomials:
13. Using basic recursion relations obtained in Problem 10.12, derive
z L k ( x ) = n L n ( x )- nL,-l(x)
CHAPTER 11
BESSEL’S EQUATION AND BESSEL FUNCTIONS
Bessel functions are among the most frequently encountered special functions in physics and engineering. They are very useful in quantum mechanics in WKB approximations. Since they are usually encountered in solving potential problems with cylindrical boundaries, they are also called cylinder functions. Bessel functions are used even in abstract number theory and mathematical analysis. Like the other special functions, they form a complete and an orthogonal set. Therefore, any sufficiently smooth function can be expanded in terms of Bessel functions. However, their orthogonality is not with respect to their order but with respect to a parameter in their argument, which usually assumes the values of the infinitely many roots of the Bessel function. In this chapter, we introduce the basic Bessel functions and their properties. We also discuss the modified Bessel functions and the spherical Bessel functions. There exists a wealth of literature on special functions. Like the classic treatise by Watson, some of them are solely devoted to Bessel functions and their applications.
Essentials of Mathematical Methods in Science and Engineering. By Ş. Selçuk Bayın. Copyright © 2008 John Wiley & Sons, Inc.
11.1 BESSEL'S EQUATION AND ITS SERIES SOLUTION

Bessel's equation is defined as
$$x^2\frac{d^2y_m}{dx^2} + x\frac{dy_m}{dx} + (x^2-m^2)y_m = 0, \quad x \geq 0, \tag{11.1}$$
where the range of the independent variable could be taken as the entire real axis or even the entire complex plane. At this point we restrict m to positive and real values. Since x = 0 is a regular singular point, we can try a series solution of the form
$$y_m(x) = \sum_{k=0}^{\infty}c_kx^{k+r}, \tag{11.2}$$
with the derivatives
$$y_m' = \sum_{k=0}^{\infty}c_k(k+r)x^{k+r-1}, \tag{11.3}$$
$$y_m'' = \sum_{k=0}^{\infty}c_k(k+r)(k+r-1)x^{k+r-2}. \tag{11.4}$$
Substituting these into the Bessel's equation we write
$$\sum_{k=0}^{\infty}c_k(k+r)(k+r-1)x^{k+r} + \sum_{k=0}^{\infty}c_k(k+r)x^{k+r} + \sum_{k=0}^{\infty}c_kx^{k+r+2} - m^2\sum_{k=0}^{\infty}c_kx^{k+r} = 0, \tag{11.5}$$
which can be arranged as
$$\sum_{k=0}^{\infty}\left[(k+r)(k+r-1) + (k+r) - m^2\right]c_kx^{k+r} + \sum_{k=0}^{\infty}c_kx^{k+r+2} = 0. \tag{11.6}$$
We now let k + 2 = k' in the second sum and drop primes to write
$$\sum_{k=0}^{\infty}\left[(k+r)(k+r-1) + (k+r) - m^2\right]c_kx^{k+r} + \sum_{k=2}^{\infty}c_{k-2}x^{k+r} = 0. \tag{11.7}$$
Writing the first two terms of the first series explicitly, we can have both sums starting from k = 2, thus
$$(r^2-m^2)c_0x^r + \left[(r+1)^2-m^2\right]c_1x^{r+1} + \sum_{k=2}^{\infty}\left[\left((k+r)^2-m^2\right)c_k + c_{k-2}\right]x^{k+r} = 0. \tag{11.8}$$
Equating coefficients of the equal powers of x to zero, we obtain
$$(r^2-m^2)c_0 = 0, \quad c_0 \neq 0, \tag{11.9}$$
$$\left[(r+1)^2-m^2\right]c_1 = 0, \tag{11.10}$$
$$\left[(k+r)^2-m^2\right]c_k + c_{k-2} = 0, \quad k = 2,3,\ldots. \tag{11.11}$$
The first equation is the indicial equation, the solution of which gives the values of r as
$$r = \pm m, \quad m > 0. \tag{11.12}$$
For the time being, we take r = m, hence the second equation [Eq. (11.10)] gives
$$\left[(m+1)^2-m^2\right]c_1 = 0, \tag{11.13}$$
$$(2m+1)c_1 = 0, \tag{11.14}$$
which determines $c_1$ as zero. Finally, the third equation [Eq. (11.11)] gives the recursion relation for the remaining coefficients as
$$c_k = -\frac{c_{k-2}}{\left[(k+r)^2-m^2\right]}, \quad k = 2,3,\ldots. \tag{11.15}$$
Now, with r = m, all the nonzero coefficients become
$$c_0 \neq 0, \tag{11.16}$$
$$c_2 = -\frac{c_0}{(m+2)^2-m^2}, \tag{11.17}$$
$$c_4 = \frac{c_0}{\left[(m+2)^2-m^2\right]\left[(m+4)^2-m^2\right]}, \quad \ldots. \tag{11.18}$$
A similar procedure gives the series solution for r = -m; hence for r = m we can write
$$y_m(x) = c_0x^m\left[1 - \frac{x^2}{(m+2)^2-m^2} + \frac{x^4}{\left[(m+2)^2-m^2\right]\left[(m+4)^2-m^2\right]} - \cdots\right] \tag{11.19}$$
$$= \sum_{k=0}^{\infty}c_{2k}x^{m+2k}, \quad m > 0. \tag{11.20}$$
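The recursion relation (11.15) can be iterated exactly and compared with a closed form. The sketch below is illustrative and makes one assumption that is not stated in this section: for non-negative integer m the even coefficients are expected to follow $c_{2j} = c_0(-1)^j m!/[2^{2j}j!\,(m+j)!]$, the familiar pattern of the Bessel function series. The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def frobenius_coeffs(m, k_max, c0=1):
    """Even coefficients c_0, c_2, ..., c_{2 k_max} from Eq. (11.15) with r = m."""
    c = [Fraction(c0)]
    for j in range(1, k_max + 1):
        k = 2 * j
        c.append(-c[-1] / ((k + m) ** 2 - m ** 2))
    return c

def closed_form(m, j, c0=1):
    """Assumed closed form c_{2j} = c0 (-1)^j m! / (2^{2j} j! (m+j)!), integer m >= 0."""
    return Fraction(c0 * (-1) ** j * factorial(m), 4 ** j * factorial(j) * factorial(m + j))

for m in (1, 2, 3):
    assert frobenius_coeffs(m, 6) == [closed_form(m, j) for j in range(7)]
print(frobenius_coeffs(1, 3))
```

For m = 1 the first coefficients are 1, -1/8, 1/192, -1/9216, which alternate in sign and shrink rapidly, in line with the factor $(k+m)^2 - m^2 = k(k+2m)$ in the denominator of the recursion.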
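To check convergence, we write the general term as
$$u_k = c_{2k}x^{m+2k}. \tag{11.21}$$
From the limit
$$\lim_{k\to\infty}\left|\frac{u_k}{u_{k-1}}\right| = \lim_{k\to\infty}\left|\frac{c_{2k}x^{m+2k}}{c_{2(k-1)}x^{m+2(k-1)}}\right| \tag{11.22}$$
$$= \lim_{k\to\infty}\left|\frac{c_{2k}}{c_{2(k-1)}}\right|x^2 \tag{11.23}$$
$$= \lim_{k\to\infty}\frac{x^2}{(2k+m)^2-m^2} = \lim_{k\to\infty}\frac{x^2}{4k^2} \to 0.$$

To evaluate this integral we make use of the integral representation in Equation (11.104). Replacing $J_0$ with its integral representation we obtain
$$\int_0^{\infty}e^{-kx}J_0(lx)\,dx = \int_0^{\infty}dx\,e^{-kx}\,\frac{2}{\pi}\int_0^{\pi/2}\cos[lx\sin\varphi]\,d\varphi \tag{11.197}$$
$$= \frac{2}{\pi}\int_0^{\pi/2}d\varphi\int_0^{\infty}dx\,e^{-kx}\cos[lx\sin\varphi] \tag{11.198}$$
$$= \frac{2}{\pi}\int_0^{\pi/2}\frac{k\,d\varphi}{k^2+l^2\sin^2\varphi} \tag{11.199}$$
$$= \frac{1}{\sqrt{k^2+l^2}}, \quad k,l > 0. \tag{11.200}$$
Since the integral $\int_0^{\infty}e^{-kx}J_0(lx)\,dx$ is convergent, we have interchanged the order of the $\varphi$ and x integrals in Equation (11.198).

Example 11.2. Evaluate $\int_0^{\infty}e^{-k^2x^2}J_m(lx)x^{m+1}dx$: This is also called the Weber integral, where k, l > 0 and m > -1. We use the series representation of $J_m$ [Eq. (11.50)] to write
$$\int_0^{\infty}e^{-k^2x^2}J_m(lx)x^{m+1}dx = \int_0^{\infty}e^{-k^2x^2}x^{m+1}\sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,\Gamma(m+n+1)}\left(\frac{lx}{2}\right)^{m+2n}dx \tag{11.201}$$
$$= \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,\Gamma(m+n+1)}\left(\frac{l}{2}\right)^{m+2n}\int_0^{\infty}e^{-k^2x^2}x^{2m+2n+1}dx \tag{11.202}$$
$$= \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,\Gamma(m+n+1)}\left(\frac{l}{2}\right)^{m+2n}\frac{1}{2k^{2m+2n+2}}\int_0^{\infty}e^{-t}t^{m+n}dt \tag{11.203}$$
$$= \frac{l^m}{(2k^2)^{m+1}}\sum_{n=0}^{\infty}\frac{(-1)^n}{n!}\left(\frac{l^2}{4k^2}\right)^n \tag{11.204}$$
$$= \frac{l^m}{(2k^2)^{m+1}}e^{-l^2/4k^2}, \quad k,l > 0 \text{ and } m > -1. \tag{11.205}$$
Since the sum converges absolutely, we have interchanged the summation and the integration signs and defined a new variable, $t = k^2x^2$, in Equation (11.203).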
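Both closed-form results can be checked by direct numerical integration. The sketch below is an illustration under stated assumptions: it restricts m to non-negative integers, evaluates $J_m$ from its power series (adequate only for moderate arguments), truncates the Weber integral at a finite upper limit, and uses a composite Simpson rule; all function names and parameter values are hypothetical.

```python
import math

def bessel_j(m, x, terms=60):
    """J_m(x) for integer m >= 0 from its power series (moderate x only)."""
    return sum((-1) ** n / (math.factorial(n) * math.factorial(n + m)) * (x / 2.0) ** (2 * n + m)
               for n in range(terms))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def angular_integral(k, l):
    """(2/pi) * integral over [0, pi/2] of k / (k^2 + l^2 sin^2(phi)), as in Eq. (11.199)."""
    return 2 / math.pi * simpson(lambda p: k / (k * k + (l * math.sin(p)) ** 2), 0.0, math.pi / 2)

def weber_numeric(k, l, m, x_max=8.0):
    """Weber integral truncated at x_max; the Gaussian factor makes the tail negligible."""
    f = lambda x: math.exp(-(k * x) ** 2) * bessel_j(m, l * x) * x ** (m + 1)
    return simpson(f, 0.0, x_max)

def weber_closed(k, l, m):
    """Closed form l^m / (2 k^2)^(m+1) * exp(-l^2 / (4 k^2)), Eq. (11.205)."""
    return l ** m / (2 * k * k) ** (m + 1) * math.exp(-l * l / (4 * k * k))

print(angular_integral(1.0, 2.0), 1 / math.sqrt(5.0))
print(weber_numeric(1.0, 2.0, 1), weber_closed(1.0, 2.0, 1))
```

The first line compares the angular integral with $1/\sqrt{k^2+l^2}$, and the second compares the truncated Weber integral with its closed form.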
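Example 11.3. Crane problem: We now consider small oscillations of a mass raised (or lowered) by a crane with uniform velocity. Equation of motion is given by
$$\frac{d}{dt}\left(ml^2\dot{\theta}\right) + mgl\sin\theta = 0, \tag{11.206}$$
where l is the length of the cable and m is the mass raised. For small oscillations we can take
$$\sin\theta \simeq \theta. \tag{11.207}$$
For a crane operator changing the length of the cable with uniform velocity, $v_0$, we write
$$\frac{dl}{dt} = v_0 \tag{11.208}$$
and the equation of motion becomes
$$l\ddot{\theta} + 2v_0\dot{\theta} + g\theta = 0. \tag{11.209}$$
We now switch to l as our independent variable. Using the derivatives
$$\dot{\theta} = v_0\frac{d\theta}{dl}, \tag{11.210}$$
$$\ddot{\theta} = v_0^2\frac{d^2\theta}{dl^2}, \tag{11.211}$$
we can write the equation of motion in terms of l as
$$\frac{d^2\theta}{dl^2} + \frac{2}{l}\frac{d\theta}{dl} + \frac{g}{lv_0^2}\theta(l) = 0. \tag{11.212}$$
In applications we usually encounter differential equations of the form
$$y''(x) + \left(\frac{1-2a}{x}\right)y'(x) + \left[\left(bcx^{c-1}\right)^2 + \frac{a^2-p^2c^2}{x^2}\right]y(x) = 0, \tag{11.213}$$
solutions of which can be expressed in terms of Bessel functions as
$$y(x) = x^a\left[A_0J_p(bx^c) + A_1N_p(bx^c)\right]. \tag{11.214}$$
Applying to our case, we identify
$$1 - 2a = 2, \tag{11.215}$$
$$a^2 - p^2c^2 = 0, \tag{11.216}$$
$$2(c-1) = -1, \tag{11.217}$$
$$b^2c^2 = \frac{g}{v_0^2}, \tag{11.218}$$
which gives
$$a = -\frac{1}{2}, \quad c = \frac{1}{2}, \quad p = 1, \quad b = \frac{2\sqrt{g}}{v_0}. \tag{11.219}$$
We can now write the general solution of Equation (11.212) as
$$\theta(l) = \frac{1}{\sqrt{l}}\left[A_0J_1\left(\frac{2\sqrt{gl}}{v_0}\right) + A_1N_1\left(\frac{2\sqrt{gl}}{v_0}\right)\right]. \tag{11.220}$$
Time-dependent solution, $\theta(t)$, is obtained with the substitution
$$l(t) = l_0 + v_0t. \tag{11.221}$$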
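As a sanity check on the parameter identification, one can substitute the $J_1$ part of the solution back into the equation of motion written in terms of l. The sketch below is illustrative only: it sets $A_0 = 1$ and $A_1 = 0$, evaluates $J_1$ from its power series, approximates the derivatives with central differences, and uses made-up values g = 9.81, $v_0$ = 1, and l = 2.

```python
import math

def bessel_j1(x, terms=60):
    """J_1(x) from its power series (adequate for moderate x)."""
    return sum((-1) ** n / (math.factorial(n) * math.factorial(n + 1)) * (x / 2.0) ** (2 * n + 1)
               for n in range(terms))

def theta(l, g=9.81, v0=1.0):
    """theta(l) = l^(-1/2) J_1(2 sqrt(g l) / v0), i.e. A_0 = 1 and A_1 = 0."""
    return bessel_j1(2.0 * math.sqrt(g * l) / v0) / math.sqrt(l)

def ode_residual(l, g=9.81, v0=1.0, h=1e-3):
    """Central-difference estimate of theta'' + (2/l) theta' + g/(l v0^2) theta."""
    t_plus, t_0, t_minus = theta(l + h, g, v0), theta(l, g, v0), theta(l - h, g, v0)
    d1 = (t_plus - t_minus) / (2 * h)
    d2 = (t_plus - 2 * t_0 + t_minus) / (h * h)
    return d2 + 2.0 / l * d1 + g / (l * v0 * v0) * t_0

print(theta(2.0), ode_residual(2.0))
```

The residual is small compared with the individual terms of the equation and is limited by the finite-difference step, which is what one expects if a = -1/2, c = 1/2, p = 1, and $b = 2\sqrt{g}/v_0$ are the right identifications.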
PROBLEMS

1. Derive the recursion relations
$$J_{m-1}(x) + J_{m+1}(x) = \frac{2m}{x}J_m(x), \quad m = 1,2,\ldots$$
and
$$J_{m-1}(x) - J_{m+1}(x) = 2J_m'(x), \quad m = 1,2,\ldots.$$
Use the first equation to express a Bessel function of arbitrary order (m = 0,1,2,...) in terms of $J_0(x)$ and $J_1(x)$. Also show that for m = 0 the second equation is replaced by $J_0'(x) = -J_1(x)$.

2. Derive Equation (11.58):
and Equation (11.63):
3. Verify the following Wronskians: W [ J m ,H E ' ] =
22 --
7rX
,
4i
w [ H : ) , H g ) ] = --
n-2
2
W [ J m , N m ]= E'
,
4. Find the constant, C, in the Wronskian
$$W[I_m(x), K_m(x)] = \frac{C}{x}.$$

5. Verify the Wronskian
$$W[I_m, I_{-m}] = -\frac{2\sin m\pi}{\pi x}.$$
6. Gamma function: To extend the definition of factorial to noninteger values, we can use the integral
$$\Gamma(x+1) = \int_0^{\infty}t^xe^{-t}dt,$$
where $\Gamma(x)$ is called the gamma function. Using integration by parts, show that for $x \geq 1$
$$\Gamma(x+1) = x\Gamma(x).$$
Use the integral definition to establish that $\Gamma(1) = 1$ and then show that when n is a positive integer $\Gamma(n+1) = n!$. Because of the divergence at x = 0, the integral definition does not work for $x \leq -1$. However, definition of the gamma function can be extended to negative values of x by writing the above formula as
$$\Gamma(x) = \frac{1}{x}\Gamma(x+1),$$
provided that for negative integers we define
$$\frac{1}{\Gamma(-n)} = 0, \quad n \text{ is integer}.$$
Using these first, show that
r(-1/2)= J;; and then find the value of r(-3/2). Also evaluate the following values of the gamma function:
7. Evaluate the integral d z , a, b
3
> 0, 2n + - > m > -1. 2
Hint: Use the substitution
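8. For the integer values of n prove that
$$N_{-n}(x) = (-1)^nN_n(x).$$

9. Use the generating function
$$\exp\left[\frac{x}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{n=-\infty}^{\infty}J_n(x)t^n$$
to prove the relations
$$J_n(-x) = (-1)^nJ_n(x)$$
and
$$J_n(x+y) = \sum_{r=-\infty}^{\infty}J_r(x)J_{n-r}(y),$$
which is also known as the addition formula of Bessel functions.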
10. Prove the formula
$$J_n(x) = (-1)^nx^n\left(\frac{1}{x}\frac{d}{dx}\right)^nJ_0(x)$$
by induction, that is, assume it to be true for n = N and then show for N + 1.

11. Derive the formula
$$e^{iz\cos\theta} = \sum_{m=-\infty}^{\infty}i^mJ_m(z)e^{im\theta},$$
where z is a point in the complex plane. This is called the Jacobi-Anger expansion and gives a plane wave in terms of cylindrical waves.

12. In Fraunhofer diffraction from a circular aperture we encounter the integral
$$I = \int_0^ar\,dr\int_0^{2\pi}d\theta\,e^{ibr\cos\theta},$$
where a and b are constants depending on the physical parameters of the problem. Using the integral definition
$$J_m(x) = \frac{1}{\pi}\int_0^{\pi}\cos[m\varphi - x\sin\varphi]\,d\varphi, \quad m = 0,\pm1,\pm2,\ldots,$$
first show that
$$I = 2\pi\int_0^aJ_0(br)\,r\,dr$$
and then integrate to find
$$I = (2\pi a/b)J_1(ab).$$
Hint: Using the recursion relation:
$$\frac{m}{x}J_m + J_m' = J_{m-1},$$
first prove that
$$\frac{d}{dx}\left[x^mJ_m(x)\right] = x^mJ_{m-1}(x).$$
Also note that a similar equation,
$$\frac{d}{dx}\left[x^{-m}J_m(x)\right] = -x^{-m}J_{m+1}(x),$$
can be obtained via
$$-\frac{m}{x}J_m + J_m' = -J_{m+1}.$$
13. Prove the following integral definition of the Bessel functions:
14. Using the generating function definition of Bessel functions, show the integral representation
$$J_m(x) = \frac{1}{\pi}\int_0^{\pi}\cos[m\varphi - x\sin\varphi]\,d\varphi, \quad m = 0,\pm1,\pm2,\ldots.$$

15. Prove
where n is positive integer.

16. Show that
17. Show that the spherical Bessel functions satisfy the differential equation
18. Spherical Bessel functions, jl(x) and n~(x),can also be defined as
Q(X) =
(-x)l
(:$
(---) cos x
Prove these formulas by induction 19. Using series representations of the Bessel functions, y 7 n and fl ,.L, show that the series representations of the spherical Bessel functions, j l and n ~are , given as
c 00
jl(X) = 2lX1
n=O
( - l y ( l + n)! X2n, n!(2n 21 l)!
+ +
n=O
(-l)"(n - l ) ! zn X n!(2n - 21)!
Hint: Use the duplication formula:
$$2^{2k}\Gamma(k+1)\Gamma(k+1/2) = \Gamma(1/2)\Gamma(2k+1) = \sqrt{\pi}\,(2k)!.$$
20. Using the recursion relations given in Equations (11.84) and (11.85), J"-l(X)
and
= x-"-
d [X"J"(X)] dx
show that spherical Bessel functions satisfy the following recursion relations:
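where $y_l$ stands for any one of the spherical Bessel functions, $j_l$, $n_l$, or $h_l^{(1,2)}$.

21. Show that the solution of the general equation
$$y''(x) + \left(\frac{1-2a}{x}\right)y'(x) + \left[\left(bcx^{c-1}\right)^2 + \frac{a^2-p^2c^2}{x^2}\right]y(x) = 0$$
can be expressed in terms of the Bessel functions as
$$y(x) = x^a\left[A_0J_p(bx^c) + A_1N_p(bx^c)\right].$$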
CHAPTER 12
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
The majority of the differential equations of physics and engineering are partial differential equations. The Laplace equation,
$$\vec{\nabla}^2\Psi(\vec{r}) = 0, \tag{12.1}$$
which plays a central role in potential theory, is used in electrostatics, magnetostatics, and stationary flow problems. Diffusion and flow or transfer problems are commonly described by the equation
$$\vec{\nabla}^2\Psi(\vec{r},t) - \frac{1}{\alpha^2}\frac{\partial\Psi(\vec{r},t)}{\partial t} = 0, \tag{12.2}$$
where $\alpha$ is a physical constant depending on the characteristics of the environment. The wave equation,
$$\vec{\nabla}^2\Psi(\vec{r},t) - \frac{1}{v^2}\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2} = 0, \tag{12.3}$$
where v stands for the wave velocity, is used to study wave phenomena in many different branches of science and engineering. The Helmholtz equation,
$$\vec{\nabla}^2\Psi(\vec{r}) + k_0^2\Psi(\vec{r}) = 0, \tag{12.4}$$
is encountered in the study of waves and oscillations. Nonhomogeneous versions of these equations, where the right-hand side is nonzero, are also frequently encountered. In general, the nonhomogeneous term represents sources, sinks, or interactions that may be present. In quantum mechanics, the time-independent Schrödinger equation is written as
$$-\frac{\hbar^2}{2m}\vec{\nabla}^2\Psi(\vec{r}) + V(\vec{r})\Psi(\vec{r}) = E\Psi(\vec{r}), \tag{12.5}$$
while the time-dependent Schrödinger equation is given as
$$-\frac{\hbar^2}{2m}\vec{\nabla}^2\Psi(\vec{r},t) + V(\vec{r})\Psi(\vec{r},t) = i\hbar\frac{\partial\Psi(\vec{r},t)}{\partial t}. \tag{12.6}$$
Partial differential equations are in general more difficult to solve. Integral transforms and Green's functions are among the most commonly used techniques to find analytic solutions (Bayin). However, in many of the interesting cases it is possible to convert a partial differential equation to a set of ordinary differential equations by the method of separation of variables. The majority of the partial differential equations of physics and engineering can be written as a special case of the general equation:
$$\vec{\nabla}^2\Psi(\vec{r},t) + \kappa\Psi(\vec{r},t) = a\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2} + b\frac{\partial\Psi(\vec{r},t)}{\partial t}, \tag{12.7}$$
where a and b are usually constants but $\kappa$ could be a function of $\vec{r}$. In this chapter we discuss treatment of this general equation by the method of separation of variables in Cartesian, spherical, and cylindrical coordinates. Our results can be adapted to specific cases by an appropriate choice of the parameters $\kappa$, a, and b.
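12.1 SEPARATION OF VARIABLES IN CARTESIAN COORDINATES

In Cartesian coordinates we start by separating the time variable in Equation (12.7) by the substitution
$$\Psi(\vec{r},t) = F(\vec{r})T(t) \tag{12.8}$$
and write
$$T(t)\vec{\nabla}^2F(\vec{r}) + \kappa F(\vec{r})T(t) = F(\vec{r})\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right], \tag{12.9}$$
where we take $\kappa$ as a constant. Dividing both sides by $F(\vec{r})T(t)$ gives
$$\frac{1}{F(\vec{r})}\vec{\nabla}^2F(\vec{r}) + \kappa = \frac{1}{T(t)}\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right], \tag{12.10}$$
where the left-hand side is only a function of $\vec{r}$ and the right-hand side is only a function of t. Since $\vec{r}$ and t are independent variables, the only way this equation can be true for all $\vec{r}$ and t is when both sides are equal to the same constant. Calling this constant $-k^2$, we obtain two equations:
$$\frac{1}{T(t)}\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right] = -k^2 \tag{12.11}$$
and
$$\frac{1}{F(\vec{r})}\vec{\nabla}^2F(\vec{r}) + \kappa = -k^2. \tag{12.12}$$
The choice of a minus sign in front of $k^2$ is arbitrary. In some problems, boundary conditions may require a plus sign if we want to keep k as a real parameter. In Cartesian coordinates the second equation is written as
$$\frac{\partial^2F}{\partial x^2} + \frac{\partial^2F}{\partial y^2} + \frac{\partial^2F}{\partial z^2} + (\kappa + k^2)F(x,y,z) = 0. \tag{12.13}$$
We now separate the x variable by the substitution
$$F(x,y,z) = X(x)G(y,z) \tag{12.14}$$
and write
$$G(y,z)\frac{d^2X}{dx^2} + X(x)\left[\frac{\partial^2G}{\partial y^2} + \frac{\partial^2G}{\partial z^2}\right] + (\kappa + k^2)X(x)G(y,z) = 0, \tag{12.15}$$
which, after division by X(x)G(y,z), becomes
$$\frac{1}{X(x)}\frac{d^2X}{dx^2} = -\frac{1}{G(y,z)}\left[\frac{\partial^2G}{\partial y^2} + \frac{\partial^2G}{\partial z^2}\right] - (\kappa + k^2). \tag{12.16}$$
Similarly, the only way this equality can hold for all x and (y,z) is when both sides are equal to the same constant, $-k_x^2$, which gives the equations
$$\frac{1}{X(x)}\frac{d^2X(x)}{dx^2} + k_x^2 = 0 \tag{12.17}$$
and
$$\frac{\partial^2G}{\partial y^2} + \frac{\partial^2G}{\partial z^2} + (\kappa + k^2 - k_x^2)G(y,z) = 0. \tag{12.18}$$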
Finally, we separate the last equation by the substitution
$$G(y,z) = Y(y)Z(z), \tag{12.19}$$
which gives
$$Z(z)\frac{d^2Y}{dy^2} + Y(y)\frac{d^2Z}{dz^2} + (\kappa + k^2 - k_x^2)Y(y)Z(z) = 0. \tag{12.20}$$
Dividing by $Y(y)Z(z)$, we write
$$-\frac{1}{Y(y)}\frac{d^2Y(y)}{dy^2} = \frac{1}{Z(z)}\left[\frac{d^2Z(z)}{dz^2} + (\kappa + k^2 - k_x^2)Z(z)\right]; \tag{12.21}$$
and by using the same argument used for the other variables, we set both sides equal to the constant $k_y^2$ to obtain
$$\frac{d^2Y(y)}{dy^2} + k_y^2Y(y) = 0, \tag{12.22}$$
$$\frac{d^2Z(z)}{dz^2} + (\kappa + k^2 - k_x^2 - k_y^2)Z(z) = 0. \tag{12.23}$$
In summary, after separating the variables, we have reduced the partial differential equation [Eq. (12.7)] to four ordinary differential equations:
$$a\frac{d^2T}{dt^2} + b\frac{dT}{dt} + k^2T(t) = 0, \tag{12.24}$$
$$\frac{d^2X(x)}{dx^2} + k_x^2X(x) = 0, \tag{12.25}$$
$$\frac{d^2Y(y)}{dy^2} + k_y^2Y(y) = 0, \tag{12.26}$$
$$\frac{d^2Z(z)}{dz^2} + (\kappa + k^2 - k_x^2 - k_y^2)Z(z) = 0. \tag{12.27}$$
During this process, three constants, $k$, $k_x$, and $k_y$, which are called the separation constants, have entered into our equations. The final solution is now written as
$$\Psi(\vec{r},t) = T(t)X(x)Y(y)Z(z). \tag{12.28}$$
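As a quick sanity check of the separated equations, the following sketch builds one product solution and evaluates the residual of the general equation [Eq. (12.7)] by central finite differences. The numerical values of $a$, $b$, $\kappa$, $k$, $k_x$, $k_y$ and the helper names (`psi`, `d1`, `d2`, `pde_residual`) are illustrative assumptions, not taken from the text.

```python
import math

# Illustrative values only (assumptions, not from the text).
a, b, kappa = 1.0, 3.0, 0.5      # parameters of Eq. (12.7)
k2 = 2.0                         # separation constant k^2
kx, ky = 1.0, 0.5                # separation constants k_x and k_y
kz = math.sqrt(kappa + k2 - kx**2 - ky**2)   # square root of the coefficient in Eq. (12.27)

# T(t) = exp(s*t) solves Eq. (12.24) when a*s^2 + b*s + k^2 = 0.
s = (-b + math.sqrt(b**2 - 4.0*a*k2)) / (2.0*a)

def psi(x, y, z, t):
    """Product solution T(t) X(x) Y(y) Z(z), as in Eq. (12.28)."""
    return math.exp(s*t) * math.cos(kx*x) * math.sin(ky*y) * math.cos(kz*z)

def shifted(p, i, h):
    """Copy of the point p with coordinate i displaced by h."""
    q = list(p)
    q[i] += h
    return q

def d1(f, p, i, h=1e-3):
    """Central first difference of f along coordinate i."""
    return (f(*shifted(p, i, h)) - f(*shifted(p, i, -h))) / (2.0*h)

def d2(f, p, i, h=1e-3):
    """Central second difference of f along coordinate i."""
    return (f(*shifted(p, i, h)) - 2.0*f(*p) + f(*shifted(p, i, -h))) / h**2

def pde_residual(p):
    """Left side minus right side of Eq. (12.7) at p = (x, y, z, t)."""
    laplacian = sum(d2(psi, p, i) for i in range(3))
    lhs = laplacian + kappa*psi(*p)
    rhs = a*d2(psi, p, 3) + b*d1(psi, p, 3)
    return lhs - rhs

print(pde_residual((0.3, 0.7, -0.2, 0.5)))   # should be close to zero
```

The residual is limited only by the finite-difference error, which suggests that any product of solutions of Eqs. (12.24)-(12.27) satisfies Eq. (12.7) for the same constants.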
12.1.1 Wave Equation
One of the most frequently encountered partial differential equations of physics and engineering is the wave equation:
$$\nabla^2\Psi(\vec{r},t) - \frac{1}{v^2}\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2} = 0. \tag{12.29}$$
For its separable solutions we set
$$a = \frac{1}{v^2},\quad b = 0,\quad \kappa = 0, \tag{12.30}$$
where $v$ is the wave speed. Introducing $\omega$,
$$\omega = kv,\quad k^2 = k_x^2 + k_y^2 + k_z^2, \tag{12.31}$$
which stands for the angular frequency, we find the equations to be solved in Cartesian coordinates as
$$\frac{d^2T}{dt^2} + \omega^2T(t) = 0, \tag{12.32}$$
$$\frac{d^2X(x)}{dx^2} + k_x^2X(x) = 0, \tag{12.33}$$
$$\frac{d^2Y(y)}{dy^2} + k_y^2Y(y) = 0, \tag{12.34}$$
$$\frac{d^2Z(z)}{dz^2} + k_z^2Z(z) = 0. \tag{12.35}$$
All these equations are of the same type. If we concentrate on the first equation, the two linearly independent solutions are $\cos\omega t$ and $\sin\omega t$. Hence the general solution can be written as
$$T(t) = a_0\cos\omega t + a_1\sin\omega t$$
or as
$$T(t) = A\cos(\omega t + \delta),$$
where $(a_0, a_1)$ and $(A, \delta)$ are arbitrary constants to be determined from the boundary conditions. In anticipation of applications to quantum mechanics, one can also take the two linearly independent solutions as $e^{\pm i\omega t}$. Now the solutions of Equations (12.32)-(12.35) can be conveniently combined to write
where
(12.37) (12.38)
These are called the plane wave solutions of the wave equation, and $\Psi(\vec{r},t)$ corresponds to the superposition of two plane waves moving in opposite directions.
12.1.2 Laplace Equation
In Equation (12.7) if we set
$$\kappa = 0 \tag{12.39}$$
and assume no time dependence, we obtain the Laplace equation
$$\nabla^2\Psi(\vec{r}) = 0. \tag{12.40}$$
Since there is no time dependence, in Equation (12.11) we also set $k = 0$ and $T(t) = 1$, thus obtaining the equations to be solved for a separable solution as
$$\frac{d^2X(x)}{dx^2} + k_x^2X(x) = 0, \tag{12.41}$$
$$\frac{d^2Y(y)}{dy^2} + k_y^2Y(y) = 0, \tag{12.42}$$
$$\frac{d^2Z(z)}{dz^2} - (k_x^2 + k_y^2)Z(z) = 0. \tag{12.43}$$
Depending on the boundary conditions, solutions of these equations are given in terms of trigonometric or hyperbolic functions.
Example 12.1. Laplace equation inside a rectangular region: If a problem has translational symmetry along one of the Cartesian axes, say the $z$-axis, then the solution is independent of $z$. Hence we solve the Laplace equation in two dimensions:
$$\frac{\partial^2\Psi(x,y)}{\partial x^2} + \frac{\partial^2\Psi(x,y)}{\partial y^2} = 0, \tag{12.44}$$
solution of which consists of a family of curves in the $xy$-plane. Solutions in three dimensions are obtained by extending these curves along the $z$ direction to form a family of surfaces. Consider a rectangular region (Fig. 12.1) defined by
$$x\in[0,a],\quad y\in[0,b], \tag{12.45}$$
and the boundary conditions given as
$$\Psi(x,0) = f(x), \tag{12.46}$$
$$\Psi(x,b) = 0, \tag{12.47}$$
$$\Psi(0,y) = 0, \tag{12.48}$$
$$\Psi(a,y) = 0. \tag{12.49}$$
Figure 12.1 Laplace equation inside a rectangular region.
In the general equation [Eq. (12.7)] we set
$$a = b = \kappa = 0. \tag{12.50}$$
No time dependence gives $k = 0$ and $T(t) = 1$. Since there is no $z$ dependence, Equation (12.27) also gives $k_x^2 + k_y^2 = 0$; hence we define
$$\lambda^2 = k_x^2 = -k_y^2, \tag{12.51}$$
which gives the equations to be solved for a separable solution as
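$$\frac{d^2X(x)}{dx^2} + \lambda^2X(x) = 0, \tag{12.52}$$
$$\frac{d^2Y(y)}{dy^2} - \lambda^2Y(y) = 0. \tag{12.53}$$
Solutions of these equations can be written immediately as
$$X(x) = a_0\sin\lambda x + a_1\cos\lambda x, \tag{12.54}$$
$$Y(y) = b_0\sinh\lambda y + b_1\cosh\lambda y. \tag{12.55}$$
Imposing the third boundary condition [Eq. (12.48)], we set $a_1 = 0$, which yields
$$X(x) = a_0\sin\lambda x. \tag{12.56}$$
Using the last condition [Eq. (12.49)], we find the allowed values of $\lambda$ as
$$\lambda_n = \frac{n\pi}{a},\quad n = 1, 2, \ldots, \tag{12.57}$$
which gives the solutions
$$X_n(x) = a_0\sin\frac{n\pi x}{a},\quad n = 1, 2, \ldots, \tag{12.58}$$
$$Y_n(y) = b_0\sinh\frac{n\pi y}{a} + b_1\cosh\frac{n\pi y}{a}. \tag{12.59}$$
Hence, the solution of the Laplace equation becomes
$$\Psi_n(x,y) = a_0\sin\frac{n\pi x}{a}\left[b_0\sinh\frac{n\pi y}{a} + b_1\cosh\frac{n\pi y}{a}\right]. \tag{12.60}$$
Without any loss of generality, we can also write this as
$$\Psi_n(x,y) = A\left[\sin\frac{n\pi x}{a}\right]\left[B\sinh\frac{n\pi y}{a} + \cosh\frac{n\pi y}{a}\right]. \tag{12.61}$$
We now impose the second condition [Eq. (12.47)] to write
$$\Psi_n(x,b) = A\left[\sin\frac{n\pi x}{a}\right]\left[B\sinh\frac{n\pi b}{a} + \cosh\frac{n\pi b}{a}\right] = 0 \tag{12.62}$$
and obtain $B$ as
$$B = -\frac{\cosh\frac{n\pi b}{a}}{\sinh\frac{n\pi b}{a}}. \tag{12.63}$$
Substituting this back into Equation (12.61), we write
$$\Psi_n(x,y) = A\left[\sin\frac{n\pi x}{a}\right]\left[\frac{\sinh\frac{n\pi}{a}(b-y)}{\sinh\frac{n\pi b}{a}}\right]. \tag{12.64}$$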
So far we have satisfied all the boundary conditions except the first one, that is, Equation (12.46). However, the solution set
$$\left\{X_n(x) = a_0\sin\frac{n\pi x}{a}\right\},\quad n = 1, 2, \ldots, \tag{12.65}$$
like the special functions we have seen in Chapters 10 and 11, forms an orthogonal set satisfying the orthogonality relation
$$\int_0^a\left[\sin\frac{m\pi x}{a}\right]\left[\sin\frac{n\pi x}{a}\right]dx = \left(\frac{a}{2}\right)\delta_{mn}. \tag{12.66}$$
Using these base solutions, we can express a general solution as the infinite series
$$\Psi(x,y) = \sum_{n=1}^{\infty}c_n\Psi_n(x,y) = \sum_{n=1}^{\infty}c_n\left[\sin\frac{n\pi x}{a}\right]\left[\frac{\sinh\frac{n\pi}{a}(b-y)}{\sinh\frac{n\pi b}{a}}\right]. \tag{12.67}$$
Figure 12.2 A different choice for the boundary conditions.
Note that the set
$$\{\Psi_n(x,y)\},\quad n = 1, 2, \ldots, \tag{12.68}$$
is also orthogonal. At this point, it suffices to say that this series converges to $\Psi(x,y)$ for any continuous and sufficiently smooth function. Since the above series is basically a Fourier series, we will be more specific about what is meant by sufficiently smooth when we introduce the (trigonometric) Fourier series in Chapter 13. We now impose the final boundary condition [Eq. (12.46)] to write
$$\Psi(x,0) = f(x) = \sum_{n=1}^{\infty}c_n\sin\frac{n\pi x}{a}. \tag{12.69}$$
To find the expansion coefficients we multiply the above equation by $\sin\frac{m\pi x}{a}$, integrate over $[0,a]$, and then use the orthogonality relation [Eq. (12.66)] to obtain
$$c_n = \left(\frac{2}{a}\right)\int_0^af(x)\sin\frac{n\pi x}{a}\,dx. \tag{12.70}$$
Since each term in the series (12.67) satisfies the homogeneous boundary conditions [Eqs. (12.47)-(12.49)], so does the sum. Now, let us consider a different set of boundary conditions (Fig. 12.2):
$$\Psi(x,0) = 0, \tag{12.71}$$
$$\Psi(x,b) = 0, \tag{12.72}$$
$$\Psi(0,y) = F(y), \tag{12.73}$$
$$\Psi(a,y) = 0. \tag{12.74}$$
In this case the solution can be found by following similar steps:
$$\Psi(x,y) = \sum_{n=1}^{\infty}C_n\left[\sin\frac{n\pi y}{b}\right]\left[\frac{\sinh\frac{n\pi}{b}(a-x)}{\sinh\frac{n\pi a}{b}}\right], \tag{12.75}$$
where
$$C_n = \left(\frac{2}{b}\right)\int_0^bF(y)\sin\frac{n\pi y}{b}\,dy. \tag{12.76}$$
Note that in this case the boundary conditions force us to take $\lambda^2 = -k_x^2 = k_y^2$ in Equation (12.51). Solution for the more general boundary conditions (Fig. 12.3)
$$\Psi(x,0) = f(x), \tag{12.77}$$
$$\Psi(x,b) = 0, \tag{12.78}$$
$$\Psi(0,y) = F(y), \tag{12.79}$$
$$\Psi(a,y) = 0 \tag{12.80}$$
can now be written as the linear combination of the solutions given in Equations (12.67) and (12.75) as
$$\Psi(x,y) = \sum_{n=1}^{\infty}c_n\left[\sin\frac{n\pi x}{a}\right]\left[\frac{\sinh\frac{n\pi}{a}(b-y)}{\sinh\frac{n\pi b}{a}}\right] + \sum_{n=1}^{\infty}C_n\left[\sin\frac{n\pi y}{b}\right]\left[\frac{\sinh\frac{n\pi}{b}(a-x)}{\sinh\frac{n\pi a}{b}}\right],$$
where the coefficients are found as in Equations (12.70) and (12.76). Similarly, when none of the boundary conditions is homogeneous, the general solution is written as a superposition of all four cases.
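The series of Equation (12.67), with coefficients from Equation (12.70), can be sketched numerically as follows. This is a minimal illustration using only the Python standard library; the boundary function $f(x) = x(a-x)$, the truncation at 40 terms, and the helper names (`simpson`, `coefficients`, `laplace_rectangle`) are assumptions made for the example.

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i*h)
    return total * h / 3.0

def coefficients(f, a, n_terms):
    """c_n = (2/a) * integral_0^a f(x) sin(n pi x / a) dx, as in Eq. (12.70)."""
    return [2.0/a * simpson(lambda x, n=n: f(x)*math.sin(n*math.pi*x/a), 0.0, a)
            for n in range(1, n_terms + 1)]

def laplace_rectangle(x, y, c, a, b):
    """Partial sum of the series in Eq. (12.67)."""
    total = 0.0
    for n, cn in enumerate(c, start=1):
        alpha = n*math.pi/a
        total += cn * math.sin(alpha*x) * math.sinh(alpha*(b - y)) / math.sinh(alpha*b)
    return total

a_len, b_len = 1.0, 1.0                 # sides of the rectangle (illustrative)
f = lambda x: x*(a_len - x)             # assumed boundary data on y = 0
c = coefficients(f, a_len, 40)

print(laplace_rectangle(0.5, 0.0, c, a_len, b_len))    # close to f(0.5) = 0.25
print(laplace_rectangle(0.5, b_len, c, a_len, b_len))  # vanishes on y = b
```

For much larger truncation orders the ratio of hyperbolic sines would be better evaluated with exponentials to avoid overflow; 40 terms is well within range here.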
12.1.3 Diffusion and Heat Flow Equations
For the heat flow and diffusion problems, we need to solve the equation
$$\nabla^2\Psi(\vec{r},t) - \frac{1}{\alpha^2}\frac{\partial\Psi(\vec{r},t)}{\partial t} = 0, \tag{12.81}$$
which can be obtained from the general equation [Eq. (12.7)] by setting
$$\kappa = 0,\quad a = 0,\quad\text{and}\quad b = \frac{1}{\alpha^2}. \tag{12.82}$$
Figure 12.3 For more general boundary conditions.
Now the equations to be solved for a separable solution become
$$\frac{1}{\alpha^2}\frac{dT}{dt} + k^2T(t) = 0, \tag{12.83}$$
$$\frac{d^2X(x)}{dx^2} + k_x^2X(x) = 0, \tag{12.84}$$
$$\frac{d^2Y(y)}{dy^2} + k_y^2Y(y) = 0, \tag{12.85}$$
$$\frac{d^2Z(z)}{dz^2} + k_z^2Z(z) = 0. \tag{12.86}$$
In the last equation we have substituted
$$k^2 - k_x^2 - k_y^2 = k_z^2. \tag{12.87}$$
Solution of the first equation gives the time dependence as
$$T(t) = T_0e^{-k^2\alpha^2t}, \tag{12.88}$$
while the remaining equations have the solutions
$$X(x) = a_0\cos k_xx + a_1\sin k_xx, \tag{12.89}$$
$$Y(y) = b_0\cos k_yy + b_1\sin k_yy, \tag{12.90}$$
$$Z(z) = c_0\cos k_zz + c_1\sin k_zz. \tag{12.91}$$
Example 12.2. Heat transfer equation in a rectangular region: First consider the one-dimensional problem
$$\frac{\partial^2\Psi(x,t)}{\partial x^2} = \frac{1}{\alpha^2}\frac{\partial\Psi(x,t)}{\partial t},\quad x\in[0,a],\quad t>0, \tag{12.92}$$
with the boundary conditions
$$\Psi(0,t) = 0, \tag{12.93}$$
$$\Psi(a,t) = 0, \tag{12.94}$$
$$\Psi(x,0) = f(x). \tag{12.95}$$
Using the time dependence given in Equation (12.88) and the Fourier series method of Example 12.1, we can write the general solution as
$$\Psi(x,t) = \sum_{n=1}^{\infty}a_n\left(\sin\frac{n\pi x}{a}\right)e^{-(n^2\pi^2\alpha^2/a^2)t}, \tag{12.96}$$
where the boundary conditions determined $k$ as $k = n\pi/a$, $n = 1, 2, \ldots$, and the expansion coefficients are given as
$$a_n = \left(\frac{2}{a}\right)\int_0^af(x)\sin\frac{n\pi x}{a}\,dx. \tag{12.97}$$
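The decaying Fourier series of Equation (12.96) can be evaluated along the same lines. The sketch below, again a standard-library illustration with assumed parameter values, uses an initial profile made of a single sine mode so that the series can be compared against the closed form $f(x)\,e^{-(3\pi\alpha/a)^2t}$; the function names are hypothetical.

```python
import math

def sine_coefficient(f, n, a, m=2000):
    """a_n = (2/a) * integral_0^a f(x) sin(n pi x / a) dx, by the midpoint rule."""
    h = a / m
    return 2.0/a * h * sum(f((i + 0.5)*h) * math.sin(n*math.pi*(i + 0.5)*h/a)
                           for i in range(m))

def heat_series(x, t, f, a, alpha, n_terms=30):
    """Partial sum of Eq. (12.96)."""
    total = 0.0
    for n in range(1, n_terms + 1):
        decay = math.exp(-(n*math.pi*alpha/a)**2 * t)
        total += sine_coefficient(f, n, a) * math.sin(n*math.pi*x/a) * decay
    return total

a_len, alpha = 2.0, 0.5                          # illustrative values
f = lambda x: math.sin(3*math.pi*x/a_len)        # assumed initial profile: the n = 3 mode
x0, t0 = 0.4, 0.3
exact = f(x0) * math.exp(-(3*math.pi*alpha/a_len)**2 * t0)
print(heat_series(x0, t0, f, a_len, alpha), exact)   # the two values should agree
```

Higher modes decay faster, since the exponent grows as $n^2$, which is why a modest number of terms usually suffices for $t > 0$.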
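For the heat transfer equation in two dimensions,
$$\frac{\partial^2\Psi(x,y,t)}{\partial x^2} + \frac{\partial^2\Psi(x,y,t)}{\partial y^2} = \frac{1}{\alpha^2}\frac{\partial\Psi(x,y,t)}{\partial t}, \tag{12.98}$$
the solution over the rectangular region
$$x\in[0,a],\quad y\in[0,b],\quad t>0, \tag{12.99}$$
satisfying the boundary conditions
$$\Psi(0,y,t) = 0, \tag{12.100}$$
$$\Psi(a,y,t) = 0, \tag{12.101}$$
$$\Psi(x,0,t) = 0, \tag{12.102}$$
$$\Psi(x,b,t) = 0, \tag{12.103}$$
$$\Psi(x,y,0) = f(x,y), \tag{12.104}$$
can be written as the series
$$\Psi(x,y,t) = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty}A_{mn}\left[\sin\frac{m\pi x}{a}\right]\left[\sin\frac{n\pi y}{b}\right]\exp\left[-\alpha^2\pi^2\left(\frac{m^2}{a^2} + \frac{n^2}{b^2}\right)t\right]. \tag{12.105}$$
The expansion coefficients are now obtained from the integral
$$A_{mn} = \left(\frac{4}{ab}\right)\int_0^a\int_0^bf(x,y)\left[\sin\frac{m\pi x}{a}\right]\left[\sin\frac{n\pi y}{b}\right]dx\,dy. \tag{12.106}$$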
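12.2 SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
In spherical coordinates, Equation (12.7) is written as
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Psi}{\partial\phi^2} + \kappa\Psi = a\frac{\partial^2\Psi}{\partial t^2} + b\frac{\partial\Psi}{\partial t}, \tag{12.107}$$
where the ranges of the independent variables are
$$r\in[0,\infty),\quad\theta\in[0,\pi],\quad\phi\in[0,2\pi]. \tag{12.108}$$
We first substitute a solution of the form
$$\Psi(\vec{r},t) = F(\vec{r})T(t) \tag{12.109}$$
and write Equation (12.107) as
$$T(t)\left[\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial F}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial F}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2F}{\partial\phi^2}\right] + \kappa F(\vec{r})T(t) = F(\vec{r})\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right]. \tag{12.110}$$
Multiplying the above equation by
$$\frac{1}{F(\vec{r})T(t)} \tag{12.111}$$
and collecting the position dependence on the left-hand side and the time dependence on the right-hand side, we obtain
$$\frac{1}{F(\vec{r})}\left[\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial F}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial F}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2F}{\partial\phi^2}\right] + \kappa = \frac{1}{T(t)}\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right]. \tag{12.112}$$
Since $\vec{r}$ and $t$ are independent variables, the only way to satisfy this equation for all $\vec{r}$ and $t$ is to set both sides equal to the same constant, say $-k^2$. Hence, we obtain the following two equations:
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial F}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial F}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2F}{\partial\phi^2} + (\kappa + k^2)F(\vec{r}) = 0 \tag{12.113}$$
and
$$a\frac{d^2T}{dt^2} + b\frac{dT}{dt} = -k^2T(t), \tag{12.114}$$
where the equation for $T(t)$ is an ordinary differential equation. We continue separating variables by using the substitution
$$F(\vec{r}) = R(r)Y(\theta,\phi), \tag{12.115}$$
and write Equation (12.113) as
$$\frac{Y(\theta,\phi)}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \frac{R(r)}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial Y}{\partial\theta}\right) + \frac{R(r)}{r^2\sin^2\theta}\frac{\partial^2Y}{\partial\phi^2} + (\kappa + k^2)R(r)Y(\theta,\phi) = 0. \tag{12.116}$$
Multiplying both sides by
$$\frac{r^2}{R(r)Y(\theta,\phi)}, \tag{12.117}$$
we obtain
$$\frac{1}{R(r)}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + (\kappa + k^2)r^2 = -\frac{1}{Y(\theta,\phi)}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial Y}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2Y}{\partial\phi^2}\right]. \tag{12.118}$$
Since $r$ and $(\theta,\phi)$ are independent variables, this equation can only be satisfied for all $r$ and $(\theta,\phi)$ when both sides of the equation are equal to the same constant. We call this constant $\lambda$ and write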
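$$\frac{d}{dr}\left(r^2\frac{dR(r)}{dr}\right) + \left[(\kappa + k^2)r^2 - \lambda\right]R(r) = 0 \tag{12.119}$$
and
$$\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial Y}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2Y}{\partial\phi^2} + \lambda Y(\theta,\phi) = 0. \tag{12.120}$$
Equation (12.119) for $R(r)$ is now an ordinary differential equation. We finally separate the $\theta$ and $\phi$ variables in $Y(\theta,\phi)$ as
$$Y(\theta,\phi) = \Theta(\theta)\Phi(\phi) \tag{12.121}$$
and write
$$\frac{\Phi(\phi)}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \lambda\Theta(\theta)\Phi(\phi) = -\frac{\Theta(\theta)}{\sin^2\theta}\frac{d^2\Phi(\phi)}{d\phi^2}. \tag{12.122}$$
Multiplying both sides by
$$\frac{\sin^2\theta}{\Theta(\theta)\Phi(\phi)} \tag{12.123}$$
and calling the new separation constant $m^2$, we write
$$\frac{\sin\theta}{\Theta(\theta)}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \lambda\sin^2\theta = -\frac{1}{\Phi(\phi)}\frac{d^2\Phi(\phi)}{d\phi^2} = m^2. \tag{12.124}$$
We now obtain the differential equations to be solved for $\Theta(\theta)$ and $\Phi(\phi)$ as
$$\sin\theta\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \left[\lambda\sin^2\theta - m^2\right]\Theta(\theta) = 0 \tag{12.125}$$
and
$$\frac{d^2\Phi(\phi)}{d\phi^2} + m^2\Phi(\phi) = 0. \tag{12.126}$$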
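In summary, via the method of separation of variables, in spherical coordinates we have reduced the partial differential equation
$$\nabla^2\Psi(\vec{r},t) + \kappa\Psi(\vec{r},t) = a\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2} + b\frac{\partial\Psi(\vec{r},t)}{\partial t} \tag{12.127}$$
to four ordinary differential equations:
$$a\frac{d^2T_k(t)}{dt^2} + b\frac{dT_k(t)}{dt} + k^2T_k(t) = 0, \tag{12.128}$$
$$\frac{d}{dr}\left(r^2\frac{dR_{k\lambda}(r)}{dr}\right) + \left[(\kappa + k^2)r^2 - \lambda\right]R_{k\lambda}(r) = 0, \tag{12.129}$$
$$\sin\theta\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta_{\lambda m}(\theta)}{d\theta}\right) + \left[\lambda\sin^2\theta - m^2\right]\Theta_{\lambda m}(\theta) = 0, \tag{12.130}$$
$$\frac{d^2\Phi_m(\phi)}{d\phi^2} + m^2\Phi_m(\phi) = 0, \tag{12.131}$$
which have to be solved simultaneously with the appropriate boundary conditions to yield the final solution as
$$\Psi(\vec{r},t) = T_k(t)R_{k\lambda}(r)\Theta_{\lambda m}(\theta)\Phi_m(\phi). \tag{12.132}$$
During this process three separation constants, $k$, $\lambda$, and $m$, indicated as subscripts, have entered into our equations. For the time-independent cases we set $a = b = k = 0$ and
$$T(t) = 1. \tag{12.133}$$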
For problems with azimuthal symmetry, where there is no $\phi$ dependence, we take $m = 0$, $\Phi(\phi) = 1$. For the $\phi$ dependent solutions, we impose the periodic boundary condition
$$\Phi_m(\phi + 2\pi) = \Phi_m(\phi) \tag{12.134}$$
to write the general solution of Equation (12.131) as
$$\Phi_m(\phi) = a_0\cos m\phi + a_1\sin m\phi,\quad m = 0, 1, 2, \ldots. \tag{12.135}$$
Note that with applications to quantum mechanics in mind, the general solution can also be written as
$$\Phi_m(\phi) = a_0e^{im\phi} + a_1e^{-im\phi}. \tag{12.136}$$
Since the set
$$\{\cos m\phi, \sin m\phi\},\quad m = 0, 1, 2, \ldots \tag{12.137}$$
is complete and orthogonal, an arbitrary solution satisfying the periodic boundary conditions can be expanded as
$$\Phi(\phi) = \sum_{m=0}^{\infty}A_m\cos m\phi + B_m\sin m\phi. \tag{12.138}$$
This is basically the trigonometric Fourier series. We postpone a formal treatment of Fourier series to Chapter 13 and continue with Equation (12.130). Defining a new independent variable, namely
$$x = \cos\theta,\quad x\in[-1,1],$$
we write Equation (12.130) as
$$(1 - x^2)\frac{d^2\Theta_{\lambda m}(x)}{dx^2} - 2x\frac{d\Theta_{\lambda m}(x)}{dx} + \left[\lambda - \frac{m^2}{1 - x^2}\right]\Theta_{\lambda m}(x) = 0. \tag{12.139}$$
For $m = 0$, this reduces to the Legendre equation. If we impose the boundary condition that $\Theta_{\lambda 0}(x)$ be finite over the entire interval including the end points, the separation constant $\lambda$ has to be restricted to the values
$$\lambda = l(l+1),\quad l = 0, 1, 2, \ldots. \tag{12.140}$$
Thus, the finite solutions of Equation (12.139) become the Legendre polynomials (Chapter 10):
$$\Theta_{l0}(x) = P_l(x). \tag{12.141}$$
Since the Legendre polynomials form a complete and orthogonal set, a general solution can be expressed in terms of the Legendre series as
$$\Theta(x) = \sum_{l=0}^{\infty}C_lP_l(x). \tag{12.142}$$
For the cases with $m \neq 0$, Equation (12.139) is called the associated Legendre equation, polynomial solutions of which are given as
(12.143)
For a solution with general angular dependence, $Y(\theta,\phi)$, we expand in terms of the combined complete and orthogonal set
$$u_{lm} = P_l^m(\cos\theta)\left[A_{lm}\cos m\phi + B_{lm}\sin m\phi\right],\quad l = 0, 1, \ldots,\quad m = 0, 1, \ldots, l.$$
A particular complete and orthogonal set constructed by using the $\theta$ and $\phi$ solutions is called the spherical harmonics. A detailed treatment of the associated Legendre polynomials and spherical harmonics is given in Bayin; here it suffices to say that the set $\{P_l^m(x),\ l = 0, 1, \ldots\}$ is also complete and orthogonal. A general solution of Equation (12.139) can now be written as the series
$$\Theta(\theta) = \sum_{l=0}^{\infty}C_lP_l^m(\cos\theta). \tag{12.144}$$
So far, nothing has been said about the parameters $a$, $b$, and $\kappa$; hence the solutions found for the $\phi$ and $\theta$ dependences, $\Phi(\phi)$ and $\Theta(\theta)$, are usable for a large class of cases. To proceed with the remaining equations that determine the $t$ and the $r$ dependences [Eqs. (12.128) and (12.129)], we have to specify the values of these parameters. A number of cases are predominantly encountered in applications.
12.2.1 Laplace Equation
To obtain the Laplace equation,
$$\nabla^2\Psi(\vec{r}) = 0, \tag{12.145}$$
we set $\kappa = a = b = 0$ in Equation (12.7). Since there is no time dependence, $k$ is also zero; hence the radial equation [Eq. (12.129)] along with Equation (12.140) becomes
$$\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) - l(l+1)R(r) = 0 \tag{12.146}$$
or
$$r^2\frac{d^2R}{dr^2} + 2r\frac{dR}{dr} - l(l+1)R(r) = 0. \tag{12.147}$$
This is nothing but the Cauchy-Euler equation (Chapter 9), the general solution of which can be written as
$$R_l(r) = C_0r^l + C_1\frac{1}{r^{l+1}}. \tag{12.148}$$
We can now write the general solution of the Laplace equation in spherical coordinates as
$$\Psi(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=0}^{l}\left[A_{lm}r^l + B_{lm}\frac{1}{r^{l+1}}\right]P_l^m(\cos\theta)\left[a_{lm}\cos m\phi + b_{lm}\sin m\phi\right], \tag{12.149}$$
where $A_{lm}$, $B_{lm}$, $a_{lm}$, and $b_{lm}$ are the expansion coefficients to be determined from the boundary conditions. In problems with azimuthal or axial symmetry, the solution does not depend on the variable $\phi$; hence we set $m = 0$, thus obtaining the series solution as
$$\Psi(r,\theta) = \sum_{l=0}^{\infty}\left[A_lr^l + B_l\frac{1}{r^{l+1}}\right]P_l(\cos\theta). \tag{12.150}$$
12.2.2 Boundary Conditions for a Spherical Boundary
Boundary conditions for $\Psi(r,\theta)$ on a spherical boundary with radius $a$ are usually given as one of the following three types:
I. The Dirichlet boundary condition is defined by specifying the value of $\Psi(r,\theta)$ on the boundary, $r = a$, as
$$\Psi(a,\theta) = f(\theta). \tag{12.151}$$
II. When the derivative is specified,
$$\left.\frac{d\Psi(r,\theta)}{dr}\right|_{r=a} = g(\theta), \tag{12.152}$$
we have the Neumann boundary condition.
III. When the boundary condition is given as
$$\left[\frac{d\Psi(r,\theta)}{dr} + d_0\Psi(r,\theta)\right]_{r=a} = h(\theta), \tag{12.153}$$
where $d_0$ could be a function of $\theta$, it is called the general boundary condition.
For finite solutions inside a sphere, we set $B_l = 0$ in Equation (12.150) and take the solution as
$$\Psi(r,\theta) = \sum_{l=0}^{\infty}A_lr^lP_l(\cos\theta). \tag{12.154}$$
For the Dirichlet condition
$$\Psi(a,\theta) = f(\theta), \tag{12.155}$$
the remaining coefficients, $A_l$, can be evaluated by using the orthogonality relation of the Legendre polynomials as
$$A_l = \left(l + \frac{1}{2}\right)\frac{1}{a^l}\int_0^{\pi}f(\theta)P_l(\cos\theta)\sin\theta\,d\theta. \tag{12.156}$$
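A numerical sketch of the interior Dirichlet problem follows: the coefficients of Equation (12.156) are computed by a midpoint rule and inserted into the series of Equation (12.154). The boundary function $f(\theta) = \cos^2\theta$, the radius, and the helper names are assumptions chosen for illustration; for this $f$ the exact expansion is $\frac{1}{3}P_0 + \frac{2}{3}P_2$, which gives a direct check.

```python
import math

def legendre(l, x):
    """P_l(x) from the recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2*n + 1)*x*p - n*p_prev) / (n + 1)
    return p

def interior_coefficients(f, a, l_max, m=4000):
    """A_l of Eq. (12.156), with the theta integral done by the midpoint rule."""
    h = math.pi / m
    thetas = [(i + 0.5)*h for i in range(m)]
    coeffs = []
    for l in range(l_max + 1):
        integral = h * sum(f(th) * legendre(l, math.cos(th)) * math.sin(th)
                           for th in thetas)
        coeffs.append((l + 0.5) * integral / a**l)
    return coeffs

def interior_potential(r, theta, coeffs):
    """Partial sum of Eq. (12.154)."""
    return sum(A * r**l * legendre(l, math.cos(theta))
               for l, A in enumerate(coeffs))

radius = 2.0                               # illustrative sphere radius
f = lambda th: math.cos(th)**2             # assumed boundary data
A = interior_coefficients(f, radius, 4)
print(A)   # A[0] near 1/3, A[2] near 2/(3*radius**2), the rest near 0
print(interior_potential(radius, 0.7, A), f(0.7))   # boundary value is reproduced
```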
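Outside the spherical boundary and for finite solutions at infinity we set $A_l = 0$ and take the solution as
$$\Psi(r,\theta) = \sum_{l=0}^{\infty}B_l\frac{1}{r^{l+1}}P_l(\cos\theta). \tag{12.157}$$
Now the expansion coefficients are found from
$$B_l = \left(l + \frac{1}{2}\right)a^{l+1}\int_0^{\pi}f(\theta)P_l(\cos\theta)\sin\theta\,d\theta. \tag{12.158}$$
For a domain bounded by two concentric spheres with radii $a$ and $b$, both $A_l$ and $B_l$ in Equation (12.150) are nonzero. For Dirichlet conditions
$$\Psi(a,\theta) = f_1(\theta) \tag{12.159}$$
and
$$\Psi(b,\theta) = f_2(\theta), \tag{12.160}$$
we now write
$$f_1(\theta) = \sum_{l=0}^{\infty}\left[A_la^l + B_l\frac{1}{a^{l+1}}\right]P_l(\cos\theta) \tag{12.161}$$
and
$$f_2(\theta) = \sum_{l=0}^{\infty}\left[A_lb^l + B_l\frac{1}{b^{l+1}}\right]P_l(\cos\theta). \tag{12.162}$$
Using the orthogonality relation of the Legendre polynomials, we obtain two linear equations,
$$A_la^l + B_l\frac{1}{a^{l+1}} = \left(l + \frac{1}{2}\right)\int_0^{\pi}f_1(\theta)P_l(\cos\theta)\sin\theta\,d\theta \tag{12.163}$$
and
$$A_lb^l + B_l\frac{1}{b^{l+1}} = \left(l + \frac{1}{2}\right)\int_0^{\pi}f_2(\theta)P_l(\cos\theta)\sin\theta\,d\theta, \tag{12.164}$$
which can be solved for $A_l$ and $B_l$. Solutions satisfying the Neumann boundary condition [Eq. (12.152)] or the general boundary conditions [Eq. (12.153)] are obtained similarly. For more general cases involving both angular variables, $\theta$ and $\phi$, the general solution [Eq. (12.149)] is given in terms of the associated Legendre polynomials. This time the Dirichlet condition for a spherical boundary is given as
$$\Psi(a,\theta,\phi) = f(\theta,\phi), \tag{12.165}$$
and the coefficients, $A_{lm}$ and $B_{lm}$, in Equation (12.149) are evaluated by using the orthogonality relation of the new basis functions:
$$u_{lm}(\theta,\phi) = P_l^m(\cos\theta)\left[a_1\cos m\phi + a_2\sin m\phi\right]. \tag{12.166}$$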
Example 12.3. Potential of a point charge inside a sphere: Consider a hollow conducting sphere of radius $a$ held at zero potential. We place a charge $q$ at point $A$ along the $z$-axis at $\vec{r}\,'$ as shown in Figure 12.4. Due to the linearity of the Laplace equation, we can write the potential, $\Omega(\vec{r})$, at a point inside the conductor as the sum of the potential of the point charge and the potential due to the induced charge on the conductor, $\Psi(\vec{r})$, as
$$\Omega(\vec{r}) = \frac{q}{|\vec{r} - \vec{r}\,'|} + \Psi(\vec{r}), \tag{12.167}$$
where
Due to axial symmetry $\Omega(\vec{r})$ has no $\phi$ dependence, hence we write it as $\Omega(r,\theta)$. Since $\Omega(r,\theta)$ must vanish on the surface of the sphere, we have a Dirichlet problem. The boundary condition we have to satisfy is now written as
$$-\frac{q}{\sqrt{a^2 + r'^2 - 2ar'\cos\theta}} = \Psi(a,\theta). \tag{12.169}$$
Figure 12.4 Point charge inside a grounded conducting sphere.
Using the generating function definition of the Legendre polynomials:
$$\frac{1}{\sqrt{1 - 2xt + t^2}} = \sum_{l=0}^{\infty}P_l(x)t^l,\quad |t| < 1, \tag{12.170}$$
we can write the left-hand side of Equation (12.169) as
$$-\frac{q}{a}\sum_{l=0}^{\infty}\left(\frac{r'}{a}\right)^lP_l(\cos\theta). \tag{12.171}$$
Since $\frac{r'}{a} < 1$, the above series is uniformly convergent. Using the Legendre expansion of $\Psi(r,\theta)$:
$$\Psi(r,\theta) = \sum_{l=0}^{\infty}A_lr^lP_l(\cos\theta), \tag{12.172}$$
we can also write the Dirichlet condition, $\Psi(a,\theta)$, as
$$\Psi(a,\theta) = \sum_{l=0}^{\infty}A_la^lP_l(\cos\theta). \tag{12.173}$$
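Comparing the two expressions for $\Psi(a,\theta)$ [Eqs. (12.171) and (12.173)] we obtain the expansion coefficients:
$$A_l = -\frac{q}{a}\frac{r'^{\,l}}{a^{2l}}, \tag{12.174}$$
which allows us to write $\Psi(r,\theta)$ [Eq. (12.172)] as
$$\Psi(r,\theta) = -\frac{q}{a}\sum_{l=0}^{\infty}\left(\frac{rr'}{a^2}\right)^lP_l(\cos\theta). \tag{12.175}$$
Using the generating function definition of $P_l$ [Eq. (12.170)], we rewrite $\Psi(r,\theta)$ as
$$\Psi(r,\theta) = -\frac{q}{a}\frac{1}{\sqrt{1 - 2\left(\frac{rr'}{a^2}\right)\cos\theta + \left(\frac{rr'}{a^2}\right)^2}}. \tag{12.176}$$
Now the potential at $\vec{r}$ becomes
$$\Omega(\vec{r}) = \frac{q}{|\vec{r} - \vec{r}\,'|} - \frac{q}{a}\frac{1}{\sqrt{1 - 2\left(\frac{rr'}{a^2}\right)\cos\theta + \left(\frac{rr'}{a^2}\right)^2}}. \tag{12.177}$$
We rearrange this as
$$\Omega(\vec{r}) = \frac{q}{|\vec{r} - \vec{r}\,'|} - \frac{q\,a/r'}{\sqrt{r^2 + \left(\frac{a^2}{r'}\right)^2 - 2r\left(\frac{a^2}{r'}\right)\cos\theta}}. \tag{12.178}$$
If we call
$$\rho = |\vec{r} - \vec{r}\,'| \tag{12.179}$$
and introduce $q'$, $r''$, and $\rho'$ such that
$$q' = -q\frac{a}{r'},\quad r'' = \frac{a^2}{r'},\quad \rho' = \sqrt{r^2 + r''^2 - 2rr''\cos\theta}, \tag{12.180}$$
we can also write $\Omega(\vec{r})$ as
$$\Omega(\vec{r}) = \frac{q}{\rho} + \frac{q'}{\rho'}. \tag{12.181}$$
Note that this is the result that one would get by using the image method, where an image charge, $q'$, is located along the $z$-axis at $A'$, at a distance of $r''$ from the origin (Fig. 12.4).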
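The agreement between the Legendre series of Equation (12.175) and the image-charge form of Equations (12.180) and (12.181) can be checked numerically. The sketch below is an illustration with assumed values of $q$, $a$, and $r'$; the function names are hypothetical, and the series is truncated at a finite order.

```python
import math

def induced_series(r, theta, q, a, r_src, l_max=200):
    """Partial sum of Eq. (12.175), with P_l generated by recurrence."""
    x = math.cos(theta)
    t = r * r_src / a**2
    p_prev, p = 1.0, x            # P_0 and P_1
    total = 1.0 + t*x             # l = 0 and l = 1 terms
    t_pow = t
    for l in range(1, l_max):
        # After this update p holds P_{l+1}(x) and t_pow holds t^(l+1).
        p_prev, p = p, ((2*l + 1)*x*p - l*p_prev) / (l + 1)
        t_pow *= t
        total += t_pow * p
    return -(q/a) * total

def image_potential(r, theta, q, a, r_src):
    """q'/rho' with q', r'' and rho' defined as in Eq. (12.180)."""
    q_img = -q * a / r_src
    r_img = a**2 / r_src
    rho_img = math.sqrt(r**2 + r_img**2 - 2.0*r*r_img*math.cos(theta))
    return q_img / rho_img

def total_potential(r, theta, q, a, r_src):
    """Omega of Eq. (12.181) for a source charge on the z-axis at distance r_src."""
    rho = math.sqrt(r**2 + r_src**2 - 2.0*r*r_src*math.cos(theta))
    return q/rho + image_potential(r, theta, q, a, r_src)

q, a, r_src = 1.0, 1.0, 0.5       # illustrative values
print(induced_series(0.6, 1.1, q, a, r_src), image_potential(0.6, 1.1, q, a, r_src))
print(total_potential(a, 0.8, q, a, r_src))   # vanishes on the grounded sphere
```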
12.2.3 Helmholtz Equation
In our general equation [Eq. (12.7)] we set
$$a = b = 0,\quad \kappa = k_0^2, \tag{12.182}$$
to obtain the Helmholtz equation:
$$\nabla^2\Psi(\vec{r}) + k_0^2\Psi(\vec{r}) = 0. \tag{12.183}$$
Since there is no time dependence, in the separated equations [Eqs. (12.128)-(12.131)] we also set $k = 0$ and $T(t) = 1$. The radial part of the Helmholtz equation [Eq. (12.129)] becomes
$$\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left[k_0^2r^2 - l(l+1)\right]R(r) = 0 \tag{12.184}$$
or
$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k_0^2 - \frac{l(l+1)}{r^2}\right]R(r) = 0. \tag{12.185}$$
The general solution of Equation (12.185) is given in terms of the spherical Bessel functions as
$$R_l(r) = c_0j_l(k_0r) + c_1n_l(k_0r). \tag{12.186}$$
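To illustrate that $j_l(k_0r)$ does satisfy the radial equation [Eq. (12.185)], the following sketch evaluates the left-hand side of that equation by central differences. The upward recurrence used for $j_l$ is a standard one but is only reliable when the argument is not small compared with $l$; the function names and the sample values of $k_0$ and $r$ are assumptions.

```python
import math

def sph_bessel_j(l, x):
    """j_l(x) by upward recurrence from j_0 and j_1 (adequate when x is not << l)."""
    j_prev = math.sin(x) / x
    if l == 0:
        return j_prev
    j_cur = math.sin(x) / x**2 - math.cos(x) / x
    for n in range(1, l):
        # j_{n+1}(x) = (2n+1)/x * j_n(x) - j_{n-1}(x)
        j_prev, j_cur = j_cur, (2*n + 1)/x * j_cur - j_prev
    return j_cur

def radial_residual(l, k0, r, h=1e-3):
    """Left-hand side of Eq. (12.185) for R(r) = j_l(k0 r), by central differences."""
    R = lambda s: sph_bessel_j(l, k0*s)
    d1 = (R(r + h) - R(r - h)) / (2.0*h)
    d2 = (R(r + h) - 2.0*R(r) + R(r - h)) / h**2
    return d2 + 2.0/r*d1 + (k0**2 - l*(l + 1)/r**2)*R(r)

for l in range(4):
    print(l, radial_residual(l, 1.5, 2.0))   # each residual should be close to zero
```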
Now, the general solution of the Helmholtz equation in spherical coordinates can be written as the series
$$\Psi(\vec{r}) = \sum_{l=0}^{\infty}\sum_{m=0}^{l}\left[A_{lm}j_l(k_0r) + B_{lm}n_l(k_0r)\right]P_l^m(\cos\theta)\cos(m\phi + \delta_{lm}), \tag{12.187}$$
where the coefficients $A_{lm}$, $B_{lm}$, $\delta_{lm}$ are to be determined from the boundary conditions. Including the important problem of diffraction of electromagnetic waves from the earth's surface, many of the problems of mathematical physics can be expressed as such superpositions. In problems involving steady state oscillations with time dependence described by $e^{i\omega t}$ or $e^{-i\omega t}$, $n_l(k_0r)$ in Equation (12.187) is replaced by $h_l^{(2)}(k_0r)$ or $h_l^{(1)}(k_0r)$, respectively.
12.2.4 Wave Equation
In Equation (12.7) we set
$$\kappa = 0,\quad a = \frac{1}{v^2},\quad b = 0 \tag{12.188}$$
and obtain the wave equation
$$\nabla^2\Psi(\vec{r},t) = \frac{1}{v^2}\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2}, \tag{12.189}$$
where $v$ stands for the wave velocity. The time-dependent part of the solution, $T(t)$, is now determined by the equation [Eq. (12.128)]
$$\frac{d^2T}{dt^2} = -k^2v^2T(t), \tag{12.190}$$
the solution of which can be written in the following equivalent ways:
$$T(t) = a_0\cos\omega t + a_1\sin\omega t \tag{12.191}$$
$$\phantom{T(t)} = A_0\cos(\omega t + A_1),\quad \omega = kv, \tag{12.192}$$
where $a_0$, $a_1$, $A_0$, $A_1$ are integration constants. The radial equation [Eq. (12.129)] is now written as
$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k^2 - \frac{l(l+1)}{r^2}\right]R(r) = 0, \tag{12.193}$$
where the solution is given in terms of the spherical Bessel functions as
$$R_l(r) = c_0j_l(kr) + c_1n_l(kr). \tag{12.194}$$
We can now write the general solution of the wave equation [Eq. (12.189)] as
$$\Psi(\vec{r},t) = \sum_{l=0}^{\infty}\sum_{m=0}^{l}\left[A_{lm}j_l(kr) + B_{lm}n_l(kr)\right]P_l^m(\cos\theta)\cos(m\phi + \delta_{lm})\cos(\omega t + A_l), \tag{12.195}$$
where the coefficients $A_{lm}$, $B_{lm}$, $\delta_{lm}$, $A_l$ are to be determined from the boundary conditions.
12.2.5 Diffusion and Heat Flow Equations
In Equation (12.7) if we set
$$\kappa = 0,\quad b \neq 0,\quad a = 0, \tag{12.196}$$
we obtain
$$\nabla^2\Psi(\vec{r},t) = b\frac{\partial\Psi(\vec{r},t)}{\partial t}, \tag{12.197}$$
which is the governing equation for diffusion or heat flow phenomena. Since $k^2 \neq 0$, using Equation (12.128) we write the differential equation to be solved for $T(t)$ as
$$b\frac{dT(t)}{dt} + k^2T(t) = 0, \tag{12.198}$$
which gives the time dependence as
$$T(t) = Ce^{-k^2t/b}, \tag{12.199}$$
where $C$ is an integration constant to be determined from the initial conditions. Radial dependence is determined by Equation (12.129),
$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k^2 - \frac{l(l+1)}{r^2}\right]R(r) = 0, \tag{12.200}$$
solutions of which are given in terms of the spherical Bessel functions as
$$R_l(r) = A_0j_l(kr) + B_0n_l(kr). \tag{12.201}$$
Now the general solution of the diffusion equation can be written as
$$\Psi(\vec{r},t) = \sum_{l=0}^{\infty}\sum_{m=0}^{l}\left[A_{lm}j_l(kr) + B_{lm}n_l(kr)\right]P_l^m(\cos\theta)\cos(m\phi + \delta_{lm})\,e^{-k^2t/b}, \tag{12.202}$$
where the coefficients $A_{lm}$, $B_{lm}$, $\delta_{lm}$ are to be determined from the boundary conditions.
12.2.6 Time-Independent Schrödinger Equation
For a particle of mass $m$ moving under the influence of a central potential, $V(r)$, the time-independent Schrödinger equation is written as
$$-\frac{\hbar^2}{2m}\nabla^2\Psi(\vec{r}) + V(r)\Psi(\vec{r}) = E\Psi(\vec{r}), \tag{12.203}$$
where $E$ stands for the energy of the system. To compare with Equation (12.7) we rewrite this as
$$\nabla^2\Psi(\vec{r}) + \left[\frac{2mE}{\hbar^2} - \frac{2mV(r)}{\hbar^2}\right]\Psi(\vec{r}) = 0. \tag{12.204}$$
In the general equation [Eq. (12.7)] we now set
$$a = b = 0. \tag{12.205}$$
Using Equation (12.128) we also set $k = 0$ and $T(t) = 1$. Note that $\kappa$ is now a function of $r$:
$$\kappa(r) = \frac{2m}{\hbar^2}\left[E - V(r)\right]; \tag{12.206}$$
hence the radial equation [Eq. (12.129)] becomes
$$\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left(\frac{2mr^2}{\hbar^2}\left[E - V(r)\right] - l(l+1)\right)R(r) = 0. \tag{12.207}$$
For the Coulomb potential, solutions of this equation are given in terms of the associated Laguerre polynomials, which are closely related to the Laguerre polynomials (Bayin).
12.2.7 Time-Dependent Schrödinger Equation
For the central force problems the time-dependent Schrödinger equation is
$$-\frac{\hbar^2}{2m}\nabla^2\Psi(\vec{r},t) + V(r)\Psi(\vec{r},t) = i\hbar\frac{\partial\Psi(\vec{r},t)}{\partial t}, \tag{12.208}$$
which can be rewritten as
$$\nabla^2\Psi(\vec{r},t) - \frac{2mV(r)}{\hbar^2}\Psi(\vec{r},t) = -\frac{2mi}{\hbar}\frac{\partial\Psi(\vec{r},t)}{\partial t}. \tag{12.209}$$
We now have
$$\kappa = -\frac{2mV(r)}{\hbar^2},\quad a = 0,\quad\text{and}\quad b = -\frac{2mi}{\hbar}. \tag{12.210}$$
The time-dependent part of the solution satisfies
$$-\frac{2mi}{\hbar}\frac{dT}{dt} + k^2T = 0, \tag{12.211}$$
$$\frac{dT}{dt} = -\frac{i\hbar k^2}{2m}T(t). \tag{12.212}$$
We relate the separation constant $k^2$ with the energy, $E$, as
$$k^2 = \frac{2mE}{\hbar^2}; \tag{12.213}$$
hence $T(t)$ is written as
$$T(t) = T_0e^{-iEt/\hbar}. \tag{12.214}$$
The radial part of the Schrödinger equation [Eq. (12.129)] is now given as
$$\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left(\frac{2mr^2}{\hbar^2}\left[E - V(r)\right] - l(l+1)\right)R(r) = 0, \tag{12.215}$$
where $l = 0, 1, \ldots$. For the Coulomb potential, solutions of Equation (12.215) are given in terms of the associated Laguerre polynomials. The angular part of the solution, $\Psi(\vec{r},t)$, comes from Equations (12.130) and (12.131), and can be expressed in terms of the spherical harmonics $Y_{lm}(\theta,\phi)$.
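12.3 SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
In cylindrical coordinates we can write the general equation,
$$\nabla^2\Psi(\vec{r},t) + \kappa\Psi(\vec{r},t) = a\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2} + b\frac{\partial\Psi(\vec{r},t)}{\partial t}, \tag{12.216}$$
as
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\phi^2} + \frac{\partial^2\Psi}{\partial z^2} + \kappa\Psi = a\frac{\partial^2\Psi}{\partial t^2} + b\frac{\partial\Psi}{\partial t}. \tag{12.217}$$
Separating the time variable as
$$\Psi(\vec{r},t) = F(\vec{r})T(t), \tag{12.218}$$
we write
$$T(t)\left[\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial F}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2F}{\partial\phi^2} + \frac{\partial^2F}{\partial z^2} + \kappa F\right] = F(\vec{r})\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right]. \tag{12.219}$$
Dividing by $F(\vec{r})T(t)$ gives us the separated equation
$$\frac{1}{F(\vec{r})}\left[\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial F}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2F}{\partial\phi^2} + \frac{\partial^2F}{\partial z^2}\right] + \kappa = \frac{1}{T(t)}\left[a\frac{d^2T}{dt^2} + b\frac{dT}{dt}\right]. \tag{12.220}$$
Setting both sides equal to the same constant, $-\chi^2$, we obtain
$$a\frac{d^2T}{dt^2} + b\frac{dT}{dt} = -\chi^2T(t) \tag{12.221}$$
and
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial F}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2F}{\partial\phi^2} + \frac{\partial^2F}{\partial z^2} + (\kappa + \chi^2)F(\vec{r}) = 0. \tag{12.222}$$
We now separate the $z$ variable by the substitution
$$F(\vec{r}) = G(r,\phi)Z(z) \tag{12.223}$$
to write
$$Z(z)\left[\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial G}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2G}{\partial\phi^2}\right] + G(r,\phi)\frac{d^2Z}{dz^2} + (\kappa + \chi^2)G(r,\phi)Z(z) = 0. \tag{12.224}$$
Dividing by $G(r,\phi)Z(z)$ and setting both sides equal to the same constant, $-\lambda^2$, we get
$$\frac{d^2Z(z)}{dz^2} + (\kappa - \lambda^2)Z(z) = 0 \tag{12.225}$$
and
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial G}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2G}{\partial\phi^2} + (\chi^2 + \lambda^2)G(r,\phi) = 0. \tag{12.226}$$
Finally, we substitute
$$G(r,\phi) = R(r)\Phi(\phi) \tag{12.227}$$
to write
$$\frac{\Phi(\phi)}{r}\frac{d}{dr}\left(r\frac{dR}{dr}\right) + \frac{R(r)}{r^2}\frac{d^2\Phi}{d\phi^2} + (\chi^2 + \lambda^2)R(r)\Phi(\phi) = 0. \tag{12.228}$$
Dividing by $R(r)\Phi(\phi)/r^2$ and setting both sides to the constant $\mu^2$ gives us the last two equations as
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{dR(r)}{dr}\right) + \left(\chi^2 + \lambda^2 - \frac{\mu^2}{r^2}\right)R(r) = 0 \tag{12.229}$$
and
$$\frac{d^2\Phi(\phi)}{d\phi^2} + \mu^2\Phi(\phi) = 0. \tag{12.230}$$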
In summary, we have reduced the partial differential equation [Eq. (12.216)] in cylindrical coordinates to the following ordinary differential equations:
$$a\frac{d^2T(t)}{dt^2} + b\frac{dT(t)}{dt} + \chi^2T(t) = 0, \tag{12.231}$$
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{dR(r)}{dr}\right) + \left(\chi^2 + \lambda^2 - \frac{\mu^2}{r^2}\right)R(r) = 0, \tag{12.232}$$
$$\frac{d^2\Phi(\phi)}{d\phi^2} + \mu^2\Phi(\phi) = 0, \tag{12.233}$$
$$\frac{d^2Z(z)}{dz^2} - (\lambda^2 - \kappa)Z(z) = 0. \tag{12.234}$$
Combining the solutions of these equations with the appropriate boundary conditions, we write the general solution of Equation (12.216) as
$$\Psi(\vec{r},t) = T(t)R(r)\Phi(\phi)Z(z). \tag{12.235}$$
When there is no time dependence, we set in Equations (12.231)-(12.234)
$$a = 0,\quad b = 0,\quad \chi = 0,\quad T(t) = 1. \tag{12.236}$$
For azimuthal symmetry there is no $\phi$ dependence, hence we set
$$\mu = 0,\quad \Phi(\phi) = 1. \tag{12.237}$$
12.3.1 Laplace Equation
When we set
$$\kappa = 0,\quad a = 0,\quad b = 0, \tag{12.238}$$
Equation (12.216) becomes the Laplace equation:
$$\nabla^2\Psi(\vec{r}) = 0. \tag{12.239}$$
For a time-independent separable solution, namely
$$\Psi(\vec{r}) = R(r)\Phi(\phi)Z(z), \tag{12.240}$$
we also set $\chi = 0$ and $T = 1$ in Equation (12.231); hence the equations to be solved become
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{dR(r)}{dr}\right) + \left(\lambda^2 - \frac{\mu^2}{r^2}\right)R(r) = 0, \tag{12.241}$$
$$\frac{d^2\Phi(\phi)}{d\phi^2} + \mu^2\Phi(\phi) = 0, \tag{12.242}$$
$$\frac{d^2Z(z)}{dz^2} - \lambda^2Z(z) = 0. \tag{12.243}$$
Solutions can be written, respectively, as
$$R(r) = a_0J_\mu(\lambda r) + a_1N_\mu(\lambda r), \tag{12.244}$$
$$\Phi(\phi) = b_0\cos\mu\phi + b_1\sin\mu\phi, \tag{12.245}$$
$$Z(z) = c_0\cosh\lambda z + c_1\sinh\lambda z. \tag{12.246}$$
12.3.2
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Helmholtz Equation
In Equation (12.216), when we set
~ = 2k a ,= 0 , b = 0 ,
(12.247)
we obtain the Helmholtz equation,
a‘”(7) + I c 2 8 ( ? )
(12.248)
= 0,
which when a separable solutions of the form Q(7) = R(r)@(q6)Z(z) is substituted, leads t o the following differential equations: (12.249) (12.250) (12.251) In terms of the separation constants the solution is now written as
x
Note that in Equation (12.231) we have set = 0 and T = 1 for no time dependence. We can now write the solution of the radial equation as
Solution of Equation (12.250) gives the q6 dependence as
@(@)= bo cos pq5
+ bl sin &.
(12.254)
Finally, for the solution of the z equation [Eq. (12.25l)l we define 2 - k2 =
k20
sinh koz
} { x - kki > o } .
( 12.255)
to write the choices
Z ( z ) = co 12.3.3
{
cOskOz
cash koz
}+ { c1
for
-
(12.256)
Wave Equation
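For the choices
$$\kappa = 0,\quad b = 0,\quad\text{and}\quad a = \frac{1}{v^2}, \tag{12.257}$$
Equation (12.216) becomes the wave equation,
$$\nabla^2\Psi(\vec{r},t) = \frac{1}{v^2}\frac{\partial^2\Psi(\vec{r},t)}{\partial t^2}, \tag{12.258}$$
where $v$ stands for the wave velocity. For a separable solution, namely
$$\Psi(\vec{r},t) = T(t)R(r)\Phi(\phi)Z(z),$$
equations to be solved become [Eqs. (12.231)-(12.234)]
$$\frac{1}{v^2}\frac{d^2T(t)}{dt^2} + \chi^2T(t) = 0, \tag{12.259}$$
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{dR(r)}{dr}\right) + \left(\chi^2 + \lambda^2 - \frac{\mu^2}{r^2}\right)R(r) = 0, \tag{12.260}$$
$$\frac{d^2\Phi(\phi)}{d\phi^2} + \mu^2\Phi(\phi) = 0, \tag{12.261}$$
$$\frac{d^2Z(z)}{dz^2} - \lambda^2Z(z) = 0, \tag{12.262}$$
where $\chi$, $\lambda$, and $\mu$ are the separation constants. Solution of the time-dependent equation gives
$$T(t) = a_0\cos\omega t + a_1\sin\omega t, \tag{12.263}$$
where we have defined
$$\omega = v\chi. \tag{12.264}$$
Defining a new parameter, namely
$$\chi^2 + \lambda^2 = m^2, \tag{12.265}$$
the solution of the radial equation is immediately written in terms of Bessel functions as
$$R(r) = a_0J_\mu(mr) + a_1N_\mu(mr). \tag{12.266}$$
The solution of Equation (12.261) is
$$\Phi(\phi) = b_0\cos\mu\phi + b_1\sin\mu\phi \tag{12.267}$$
and for the solution of the $z$ equation we write
$$Z(z) = c_0\cosh\lambda z + c_1\sinh\lambda z. \tag{12.268}$$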
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
12.3.4
Diffusion and Heat Flow Equations
For the diffusion and the heat flow Equations, in Equation (12.216) we set
n = 0 , a = 0 , andb#O
(12.269)
to obtain
a*(?, V2Q(?,t) =b
t)
(12.270)
at
we have to solve the following differential equations:
1d dR(r) --z (r7)
bdll‘o + x2T(t) = 0 , dt
+ (x2+ X2
-
,)R(r) P2 r
= 0,
(12.272) (12.273) (12.274)
d22(z) dz2
_ _ _ - X”(z)
= 0.
(12.275)
The time-dependent part can be solved immediately to yield
( 12.276)
T ( t )= Toe-X2t/b, while the remaining equations have the solutions
R ( r ) = aoJp(rnr)+ U l N , ( r n T ) , m2 = x2 @(4)= bo cos pq5 bl sin p 4 , Z ( z ) = co cosh Xz + c1 sinh Xz .
+
+ X2,
(12.277) (12.278) (12.279)
Example 12.4. Dirichlet problem for the Laplace equation: Consider the following Dirichlet conditions for a cylindrical domain (Fig. 12.5):
$$\Psi(a,z) = 0, \tag{12.280}$$
$$\Psi(r,0) = f_0(r), \tag{12.281}$$
$$\Psi(r,l) = f_1(r); \tag{12.282}$$
for the Laplace equation we have
$$\nabla^2\Psi(r,\phi,z) = 0. \tag{12.283}$$
Figure 12.5 Laplace equation with Dirichlet conditions.
This could be a problem where we find the temperature distribution inside a cylinder when the temperature distributions at its top and bottom surfaces are given as shown in Figure 12.5 and the side surface is held at zero temperature. Since the boundary conditions are independent of $\phi$, we search for axially symmetric separable solutions of the form
$$\Psi(r,z) = R(r)Z(z).$$
Setting $\mu = 0$, $\Phi = 1$ for no $\phi$ dependence, we use Equations (12.244) and (12.246) to write $R(r)$ and $Z(z)$ as
$$R(r) = a_0J_0(\lambda r) + a_1N_0(\lambda r), \tag{12.284}$$
$$Z(z) = c_0\cosh\lambda z + c_1\sinh\lambda z. \tag{12.285}$$
Since $N_0(\lambda r)$ diverges as $r \to 0$, for physically meaningful solutions that are finite along the $z$-axis, we set $a_1 = 0$ in Equation (12.284). Using the first boundary condition [Eq. (12.280)], we write
$$J_0(\lambda a) = 0 \tag{12.286}$$
and obtain the admissible values of $\lambda$ as
$$\lambda_n = \frac{x_{0n}}{a},\quad n = 1, 2, \ldots, \tag{12.287}$$
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
where XO, are the zeros of the Bessel function Jo(x).Now a general solution can be written in terms of the complete and orthogonal set, {Q,(T,z)
=
I).?(
[co,cosh ( % z )
Jo
+cl,sinh
(Tr)} , (12.288)
n = 1 , 2 , . . . as
c 00
Q ( r , z )=
[n,cosh
n=l
(12.289) Using the remaining boundary conditions [Eqs. (12.281) and (12.282)], we also write 00
,
A,Jo ( %ar )
f o ( r )= Q(r,O) =
(12.290)
n=l
c 00
fl(r) =
Q ( r , l )=
+ B,sinh
[A, cosh ( % l )
n.=l
(12.291) Using the orthogonality relation,
La’
(Fr) (Tr)
~ J o
JO
dr
=
a’ 2 5 [JI (xo,)] , ,,S
(12.292)
we can evaluate the expansion coefficients, A , and B,, as 2J:rfo(r)J0
( F r )dr
A, =
(12.293) a2 [Ji(xon)I’
and
(-)
2 [ L a r f ~ ( r ) J 0X O n r dr - cosh
Bn
=
a a2 [ J1(Q,)]
(%) (%)
(-)
Larf0(r)Jo
dr]
a
sinh
(12.294)
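The coefficient formulas of Equations (12.293) and (12.294) can be tried out with a small standard-library sketch. Here the Bessel functions are summed from their power series, the zeros $x_{0n}$ are located by bisection, and the integrals are approximated by the midpoint rule; these numerical choices, the test data $f_0(r) = J_0(x_{01}r/a)$, $f_1(r) = 0$, and all function names are assumptions for illustration.

```python
import math

def bessel_j(nu, x, terms=30):
    """Integer-order J_nu(x) from its power series (adequate for moderate x)."""
    return sum((-1)**k / (math.factorial(k) * math.factorial(k + nu))
               * (x/2.0)**(2*k + nu) for k in range(terms))

def j0_zero(n, tol=1e-12):
    """n-th positive zero of J_0, bracketed in [(n - 1/2) pi, n pi] and bisected."""
    lo, hi = (n - 0.5)*math.pi, n*math.pi
    f_lo = bessel_j(0, lo)
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        f_mid = bessel_j(0, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5*(lo + hi)

def midpoint(g, lo, hi, m=800):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(g(lo + (i + 0.5)*h) for i in range(m))

def cylinder_coefficients(f0, f1, a, length, n_terms):
    """A_n and B_n following Eqs. (12.293) and (12.294)."""
    A, B = [], []
    for n in range(1, n_terms + 1):
        x0n = j0_zero(n)
        norm = a**2 * bessel_j(1, x0n)**2
        i0 = midpoint(lambda r: r*f0(r)*bessel_j(0, x0n*r/a), 0.0, a)
        i1 = midpoint(lambda r: r*f1(r)*bessel_j(0, x0n*r/a), 0.0, a)
        A.append(2.0*i0/norm)
        B.append(2.0*(i1 - math.cosh(x0n*length/a)*i0)
                 / (norm*math.sinh(x0n*length/a)))
    return A, B

a_rad, height = 1.0, 1.0                       # illustrative cylinder dimensions
x01 = j0_zero(1)
f0 = lambda r: bessel_j(0, x01*r/a_rad)        # assumed bottom profile: first radial mode
f1 = lambda r: 0.0                             # assumed top profile
A, B = cylinder_coefficients(f0, f1, a_rad, height, 3)
print(x01)     # first zero of J_0
print(A, B)    # A[0] near 1, B[0] near -coth(x01*height/a_rad), the rest near 0
```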
Example 12.5. Another boundary condition for the Laplace equation: We now solve the Laplace equation with the following boundary conditions:
$$\Psi(a,z) = F(z), \tag{12.295}$$
$$\Psi(r,0) = 0, \tag{12.296}$$
$$\Psi(r,l) = 0. \tag{12.297}$$
Because of axial symmetry, we again take $\mu = 0$, $\Phi = 1$ in Equation (12.242). To satisfy the second condition, we set $c_0 = 0$ in Equation (12.246):
$$Z(z) = c_0\cosh\lambda z + c_1\sinh\lambda z, \tag{12.298}$$
to write $Z(z)$ as
$$Z(z) = c_1\sinh\lambda z. \tag{12.299}$$
If we write sinh as sin with an imaginary argument, that is,
$$\sinh\lambda z = -i\sin i\lambda z, \tag{12.300}$$
and use the third condition [Eq. (12.297)], the allowed values of $\lambda$ are found as
$$\lambda_n = \frac{n\pi}{l},\quad n = 1, 2, \ldots. \tag{12.301}$$
Now the solutions can be written in terms of the modified Bessel functions as
$$R(r) = a_0I_0\left(\frac{n\pi}{l}r\right) + a_1K_0\left(\frac{n\pi}{l}r\right),\quad Z(z) = c_1\sin\left(\frac{n\pi}{l}z\right). \tag{12.302}$$
Since
$$K_0\left(\frac{n\pi}{l}r\right) \to \infty\quad\text{as}\quad r \to 0,$$
we also set $a_1 = 0$, thus obtaining the complete and orthogonal set
$$\left\{\Psi_n(r,z) = a_0I_0\left(\frac{n\pi}{l}r\right)\sin\left(\frac{n\pi}{l}z\right)\right\},\quad n = 1, 2, \ldots. \tag{12.303}$$
We can now write a general solution as the series
$$\Psi(r,z) = \sum_{n=1}^{\infty}A_nI_0\left(\frac{n\pi}{l}r\right)\sin\left(\frac{n\pi}{l}z\right). \tag{12.304}$$
Using the orthogonality relation
$$\int_0^l\sin\left(\frac{n\pi}{l}z\right)\sin\left(\frac{m\pi}{l}z\right)dz = \frac{l}{2}\delta_{mn}, \tag{12.305}$$
we find the expansion coefficients as
$$A_n = \left(\frac{2}{l}\right)\left[I_0\left(\frac{n\pi}{l}a\right)\right]^{-1}\int_0^lF(z)\sin\left(\frac{n\pi}{l}z\right)dz. \tag{12.306}$$
Example 12.6. Periodic boundary conditions: Consider the Laplace equation with the following periodic boundary conditions:
$$ \Psi(r, 0, z) = \Psi(r, 2\pi, z), \tag{12.307} $$
(12.308)
and
$$ \Psi(a, \phi, z) = 0. \tag{12.309} $$
Using the first two conditions [Eqs. (12.307) and (12.308)] with Equation (12.245):
$$ \Phi(\phi) = b_0 \cos \mu\phi + b_1 \sin \mu\phi, $$
which we write as
$$ \Phi(\phi) = \delta_0 \cos(\mu\phi + \delta_m), \tag{12.310} $$
where $\delta_0$ and $\delta_m$ are constants, we obtain the allowed values of $\mu$ as
$$ \mu = m = 0, 1, 2, \ldots. \tag{12.311} $$
For finite solutions along the z-axis, we set $a_1 = 0$ in the radial solution [Eq. (12.244)] to write
$$ R(r) = a_0 J_m(\lambda r). \tag{12.312} $$
Imposing the final boundary condition [Eq. (12.309)]:
$$ J_m(\lambda a) = 0, \tag{12.313} $$
we obtain the admissible values of $\lambda$ as
$$ \lambda_n = \frac{x_{mn}}{a}, \quad n = 1, 2, \ldots, \tag{12.314} $$
where $x_{mn}$ are the roots of $J_m(x)$. Finally, for the z-dependent solution we use Equation (12.246) and take the basis functions as the set
$$ \left\{ \Psi_{mn} = J_m\left(\frac{x_{mn}}{a} r\right) \left[ \cos(m\phi + \delta_m) \right] \left[ c_0 \cosh\frac{x_{mn} z}{a} + c_1 \sinh\frac{x_{mn} z}{a} \right] \right\}, \tag{12.315} $$
where $m = 0, 1, \ldots$ and $n = 1, 2, \ldots$. We can now write a general solution as the series
$$ \Psi = \sum_{m=0}^{\infty} \sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a} r\right) \cos(m\phi + \delta_m) \left[ A_{mn} \cosh\frac{x_{mn} z}{a} + B_{mn} \sinh\frac{x_{mn} z}{a} \right]. \tag{12.316} $$
Example 12.7. Cooling of a long circular cylinder: Consider a long circular cylinder with radius $a$, initially heated to a uniform temperature $T_1$, while its surface is maintained at a constant temperature $T_0$. Assume the length to be so large that the z-dependence of the temperature can be ignored. Since we basically have a two-dimensional problem, using the cylindrical coordinates, we write the heat transfer equation as
$$ b \frac{\partial \Psi(r, t)}{\partial t} = \frac{\partial^2 \Psi(r, t)}{\partial r^2} + \frac{1}{r} \frac{\partial \Psi(r, t)}{\partial r}, \tag{12.317} $$
where $b$ is a constant depending on the physical parameters of the system. We take the boundary condition at the surface as
$$ \Psi(a, t) = T_0, \quad 0 < t < \infty, \tag{12.318} $$
while the initial condition at $t = 0$ is
$$ \Psi(r, 0) = T_1, \quad 0 \le r < a. \tag{12.319} $$
We can work with the homogeneous boundary condition by defining a new dependent variable as
$$ \Omega(r, t) = \Psi(r, t) - T_0, \tag{12.320} $$
where $\Omega(r, t)$ satisfies the differential equation
$$ b \frac{\partial \Omega(r, t)}{\partial t} = \frac{\partial^2 \Omega(r, t)}{\partial r^2} + \frac{1}{r} \frac{\partial \Omega(r, t)}{\partial r} \tag{12.321} $$
with the boundary conditions
$$ \Omega(a, t) = 0, \quad 0 < t < \infty \tag{12.322} $$
and
$$ \Omega(r, 0) = T_1 - T_0, \quad 0 \le r < a. \tag{12.323} $$
We need a solution that is finite for $0 \le r < a$ and that satisfies $\Omega(r, t) \to 0$ as $t \to \infty$. Substituting a separable solution of the form
$$ \Omega(r, t) = R(r) T(t), \tag{12.324} $$
we obtain the differential equations to be solved for $R(r)$ and $T(t)$ as
$$ b \frac{dT}{dt} + \chi^2 T = 0 \tag{12.325} $$
and
$$ \frac{1}{r} \frac{d}{dr} \left[ r \frac{dR}{dr} \right] + \chi^2 R = 0, \tag{12.326} $$
where $\chi$ is the separation constant. Note that Equations (12.325) and (12.326) can be obtained by first choosing $a = 0$ and $\kappa = 0$ in Equation (12.216) and then from Equations (12.231)-(12.234) with the choices $\mu = 0$, $\Phi = 1$ and $\lambda = 0$, $Z(z) = 1$. The solution of the time-dependent equation [Eq. (12.325)] can be written immediately as
$$ T(t) = C e^{-\chi^2 t / b}, \tag{12.327} $$
while the solution of the radial equation [Eq. (12.326)] is
$$ R(r) = a_0 J_0(\chi r) + a_1 N_0(\chi r). \tag{12.328} $$
Since $N_0(\chi r)$ diverges as $r \to 0$, we set $a_1$ to 0, thus obtaining
$$ \Omega(r, t) = a_0 J_0(\chi r) e^{-\chi^2 t / b}. \tag{12.329} $$
To satisfy the condition in Equation (12.322), we write
$$ J_0(\chi a) = 0, \tag{12.330} $$
which gives the allowed values of $\chi$ in terms of the zeros of $J_0(x)$:
$$ \chi_n = \frac{x_{0n}}{a}, \quad n = 1, 2, \ldots. \tag{12.331} $$
Now the solution becomes
$$ \Omega_n(r, t) = A_n J_0\left(\frac{x_{0n}}{a} r\right) e^{-x_{0n}^2 t / a^2 b}. \tag{12.332} $$
Since these solutions form a complete and orthogonal set, we can write a general solution as the series
$$ \Omega(r, t) = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a} r\right) e^{-x_{0n}^2 t / a^2 b}. \tag{12.333} $$
Since each $\Omega_n(r, t)$ satisfies all the conditions except Equation (12.323), their linear combination will also satisfy the same conditions. To satisfy the remaining condition [Eq. (12.323)], we write
$$ T_1 - T_0 = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a} r\right). \tag{12.334} $$
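We now use the orthogonality relation:
$$ \int_0^a r J_0\left(\frac{x_{0m}}{a} r\right) J_0\left(\frac{x_{0n}}{a} r\right) dr = \frac{a^2}{2} \left[ J_1(x_{0n}) \right]^2 \delta_{mn}, \tag{12.335} $$
along with the recursion relation
$$ x^m J_{m-1}(x) = \frac{d}{dx} \left[ x^m J_m(x) \right] \tag{12.336} $$
and the special value $J_1(0) = 0$, to write the expansion coefficients as
$$ A_n = \frac{2 (T_1 - T_0)}{x_{0n} J_1(x_{0n})}. \tag{12.337} $$

Example 12.8. Symmetric vibrations of a circular drumhead: Consider a circular membrane fixed at its rim and oscillating freely. For oscillations symmetric about the origin we have only $r$ dependence. Hence we write the wave equation [Eq. (12.258)] in cylindrical coordinates as
$$ \frac{\partial^2 \Psi(r, t)}{\partial r^2} + \frac{1}{r} \frac{\partial \Psi(r, t)}{\partial r} = \frac{1}{v^2} \frac{\partial^2 \Psi(r, t)}{\partial t^2}, \tag{12.338} $$
where $\Psi(r, t)$ represents the vertical displacement of the membrane from its equilibrium position. For a separable solution, $\Psi(r, t) = R(r) T(t)$, the corresponding equations to be solved are
$$ \frac{1}{v^2} \frac{d^2 T}{dt^2} + \chi^2 T = 0, \tag{12.339} $$
$$ \frac{1}{r} \frac{d}{dr} \left[ r \frac{dR}{dr} \right] + \chi^2 R = 0, \tag{12.340} $$
where $\chi$ is the separation constant. These equations can again be obtained from our basic equation [Eq. (12.216)] with the substitution $\kappa = 0$, $b = 0$, $a = 1/v^2$ and $\lambda = 0$, $Z = 1$, $\mu = 0$, $\Phi = 1$ in Equations (12.259)-(12.262). The time-dependent equation can be solved immediately as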
$$ T(t) = \delta_0 \cos(\omega t + \delta_1), \tag{12.341} $$
where we have defined $\omega^2 = \chi^2 v^2$. The general solution of Equation (12.340) is
$$ R(r) = a_0 J_0(\chi r) + a_1 N_0(\chi r). \tag{12.342} $$
We again set $a_1 = 0$ for regular solutions at the origin, which leads to
$$ \Psi(r, t) = A_0 J_0\left(\frac{\omega}{v} r\right) \cos(\omega t + \delta_1). \tag{12.343} $$
Since the membrane is fixed at its rim, $\Psi(a, t) = 0$, we write
$$ J_0\left(\frac{\omega}{v} a\right) = 0. \tag{12.344} $$
This gives the allowed frequencies as
$$ \omega_n = x_{0n} \frac{v}{a}, \quad n = 1, 2, \ldots, \tag{12.345} $$
where $x_{0n}$ are the zeros of $J_0(x)$. We now have the complete set of functions
$$ \Psi_n(r, t) = J_0\left(\frac{x_{0n}}{a} r\right) \cos\left(x_{0n} \frac{v}{a} t + \delta_n\right), \quad n = 1, 2, \ldots, \tag{12.346} $$
which can be used to write a general solution as
$$ \Psi(r, t) = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a} r\right) \cos\left(x_{0n} \frac{v}{a} t + \delta_n\right). \tag{12.347} $$
Expansion coefficients A, and the phases 6, come from the initial conditions,
(12.349) as
6, = tan-' -,Yon
Xon
(12.350) (12.351)
(12.352) (12.353)
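The series solutions of Examples 12.7 and 12.8 are easy to evaluate numerically. The sketch below, which uses only the Python standard library, sums the cooling series with the coefficients $A_n = 2(T_1 - T_0)/[x_{0n} J_1(x_{0n})]$ and lists the drumhead frequencies $\omega_n = x_{0n} v / a$. The function names, the default parameter values ($a = b = 1$, $T_0 = 0$, $T_1 = 1$), and the use of Bessel's integral representation to evaluate $J_n$ are assumptions made for illustration only.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x), integer n, from (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau (trapezoid rule)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))  # endpoint values at tau = 0 and tau = pi
    for k in range(1, steps):
        tau = k * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def j0_zeros(count, step=0.5):
    """First `count` positive zeros of J_0 by sign-change scan and bisection."""
    zeros, lo = [], 1.0
    while len(zeros) < count:
        hi = lo + step
        if bessel_j(0, lo) * bessel_j(0, hi) < 0.0:
            left, right = lo, hi
            for _ in range(60):
                mid = 0.5 * (left + right)
                if bessel_j(0, left) * bessel_j(0, mid) <= 0.0:
                    right = mid
                else:
                    left = mid
            zeros.append(0.5 * (left + right))
        lo = hi
    return zeros

ZEROS = j0_zeros(8)  # x_01 ... x_08

def cooling_temperature(r, t, a=1.0, b=1.0, T0=0.0, T1=1.0):
    """Psi(r, t) = T0 + sum_n A_n J0(x_0n r/a) exp(-x_0n^2 t / (a^2 b)), truncated to len(ZEROS) terms."""
    total = T0
    for x0n in ZEROS:
        A_n = 2.0 * (T1 - T0) / (x0n * bessel_j(1, x0n))
        total += A_n * bessel_j(0, x0n * r / a) * math.exp(-x0n**2 * t / (a * a * b))
    return total

def drum_frequencies(v, a, count=3):
    """Angular frequencies omega_n = x_0n * v / a of the symmetric drumhead modes."""
    return [x0n * v / a for x0n in ZEROS[:count]]

center = cooling_temperature(0.0, 0.05)   # temperature on the axis shortly after t = 0
surface = cooling_temperature(1.0, 0.05)  # stays at T0 on the surface r = a
omegas = drum_frequencies(v=1.0, a=1.0)
overtone_ratio = omegas[1] / omegas[0]    # not an integer: the overtones are not harmonic
```

For $t > 0$ the exponential factors make the series converge quickly, so a handful of terms is enough; at $t = 0$ the series for the constant $T_1 - T_0$ converges only slowly. The ratio `overtone_ratio` illustrates that the overtones of a drumhead are not integer multiples of the fundamental frequency.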
PROBLEMS
1. Solve the two-dimensional Laplace equation in Cartesian coordinates, inside a rectangular region, $x \in [0, a]$ and $y \in [0, b]$, with the following boundary conditions:

2. Solve the Laplace equation in Cartesian coordinates, inside a rectangular region, $x \in [0, a]$ and $y \in [0, b]$, with the following boundary conditions:
where $f_0$ and $f_1$ are constants (Fig. 12.6).

Figure 12.6 Boundary conditions for Problem 12.2.
3 . In Example 12.1 show that the Laplace equation with the boundary conditions Q ( x ,0) = 0, 9 ( z ,b) = 0, Q‘(0,Y) = Q l ( % Y ) = 0,
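leads to the solution
$$ \Psi(x, y) = \sum_{n=1}^{\infty} C_n \sin\left(\frac{n\pi}{b} y\right) \frac{\sinh\left[\frac{n\pi}{b}(a - x)\right]}{\sinh\left(\frac{n\pi}{b} a\right)}, $$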
where
4. Under what conditions will the Helmholtz equation:
$$ \nabla^2 \Psi(\vec{r}) + k^2(\vec{r}) \Psi(\vec{r}) = 0, $$
be separable in Cartesian coordinates?
5. Toroidal coordinates $(\alpha, \beta, \phi)$ are defined as
$$ x = \frac{c \sinh\alpha \cos\phi}{\cosh\alpha - \cos\beta}, \quad y = \frac{c \sinh\alpha \sin\phi}{\cosh\alpha - \cos\beta}, \quad z = \frac{c \sin\beta}{\cosh\alpha - \cos\beta}, $$
where $\alpha \in [0, \infty)$, $\beta \in (-\pi, \pi]$, $\phi \in (-\pi, \pi]$ and the scale factor $c$ is positive definite. Toroidal coordinates are useful in solving problems with a torus as the bounding surface or domains bounded by two intersecting spheres (see Lebedev for a discussion of various coordinate systems available).
(i) Show that the Laplace equation
$$ \nabla^2 \Psi(\alpha, \beta, \phi) = 0 $$
in toroidal coordinates is given as
$$ \frac{\partial}{\partial\alpha} \left[ \frac{\sinh\alpha}{\cosh\alpha - \cos\beta} \frac{\partial\Psi}{\partial\alpha} \right] + \frac{\partial}{\partial\beta} \left[ \frac{\sinh\alpha}{\cosh\alpha - \cos\beta} \frac{\partial\Psi}{\partial\beta} \right] + \frac{1}{(\cosh\alpha - \cos\beta) \sinh\alpha} \frac{\partial^2 \Psi}{\partial\phi^2} = 0. $$
(ii) Show that as it stands this equation is not separable.
(iii) However, show that with the substitution

the resulting equation is separable as
$$ \Omega(\alpha, \beta, \phi) = A(\alpha) B(\beta) C(\phi) $$
and find the corresponding ordinary differential equations for $A(\alpha)$, $B(\beta)$ and $C(\phi)$.
6. Using your result in Problem 12.5, find separable solutions of the heat flow equation in toroidal coordinates.

7. Consider a cylinder of length $l$ and radius $a$ whose ends are kept at temperature zero. Find the steady-state distribution of temperature inside the cylinder when the rest of the surface is maintained at temperature $T_0$.

8. Find the electrostatic potential inside a closed cylindrical conductor of length $l$ and radius $a$, with the bottom and the lateral surfaces held at potential $V$ and the top surface held at zero potential. The top surface is separated by a thin insulator from the rest of the cylinder.

9. Show that the stationary distribution of temperature in the upper half-space, $z > 0$, satisfying the boundary condition
$$ T(x, y, 0) = F(r) = \begin{cases} T_0, & r < a, \\ 0, & r > a, \end{cases} $$
is given as
$$ T(r, z) = T_0 a \int_0^{\infty} e^{-\lambda z} J_0(\lambda r) J_1(\lambda a) \, d\lambda. $$
Hint: Use the relation
$$ \frac{1}{r} \delta(r - r') = \int_0^{\infty} \lambda J_n(\lambda r) J_n(\lambda r') \, d\lambda. $$
Can you derive it?
CHAPTER 13
FOURIER SERIES
In 1807 Fourier announced in a seminal paper that a large class of functions can be written as linear combinations of sines and cosines. Today, the infinite series representation of functions in terms of sinusoidal functions is called the Fourier series, which has become an indispensable tool in signal analysis. Spectroscopy is the branch of science that deals with the analysis of a given signal in terms of its components. Image processing and data compression are among the other important areas of application for Fourier series.
13.1 ORTHOGONAL SYSTEMS OF FUNCTIONS

After the introduction of Fourier series, it became clear that they are only a part of a much more general branch of mathematics called the theory of orthogonal functions. Legendre polynomials, Hermite polynomials, and Bessel functions are among the other commonly used orthogonal function sets. Certain features of this theory are strikingly similar to those of geometric vectors, where in $n$ dimensions a given vector can be written as a linear combination of $n$ linearly independent basis vectors. In the theory of orthogonal functions, we can express almost any arbitrary function as a linear combination of a set of basis functions. Many of the tools used in the study of ordinary vectors have counterparts in the theory of orthogonal functions. Among the most important ones is the definition of inner product, which is the analog of the scalar or dot product for ordinary vectors.

Definition 13.1. If $f$ and $g$ are two complex-valued functions, both (Riemann) integrable in the interval $[a, b]$, their inner product is defined as the integral
$$ (f, g) = \int_a^b f^*(x) g(x) \, dx. \tag{13.1} $$
For real-valued functions the complex conjugate becomes redundant. From this definition, it follows that the inner product satisfies the properties

(13.2) (13.3) (13.4) (13.5) (13.6)

where $c$ is a complex number. The nonnegative number, $(f, f)^{1/2}$, is called the norm of $f$. It is usually denoted by $\|f\|$ and it is the analog of the magnitude of a vector. The following inequalities follow directly from the properties of the inner product:

Cauchy-Schwarz inequality:
$$ |(f, g)| \le \|f\| \, \|g\|. \tag{13.7} $$
Minkowski inequality:
$$ \|f + g\| \le \|f\| + \|g\|, \tag{13.8} $$
which is the analog of the triangle inequality.

Definition 13.2. Let $S = \{u_0, u_1, \ldots\}$ be a set of integrable functions in the interval $[a, b]$. If
$$ (u_m, u_n) = 0 \quad \text{for all } m \ne n, \tag{13.9} $$
then the set $S$ is called orthogonal. Furthermore, when the norm of each element of $S$ is normalized to unity, $\|u_n\| = 1$, we have
$$ (u_m, u_n) = \delta_{mn} \tag{13.10} $$
and the set $S$ is called orthonormal. We have seen that Legendre polynomials, Hermite polynomials and Bessel functions are orthogonal sets. As the reader can verify, the set
$$ \left\{ u_n(x) = \frac{e^{inx}}{\sqrt{2\pi}} \right\}, \quad n = 0, 1, \ldots, \quad x \in [0, 2\pi] \tag{13.11} $$
is one of the most important orthonormal sets in applications. Linear independence of function sets is defined similarly to that of ordinary vectors. A set of functions,
$$ S = \{u_0, u_1, \ldots, u_n\}, \tag{13.12} $$
is called linearly independent in the interval $[a, b]$ if the equation
$$ c_0 u_0 + c_1 u_1 + \cdots + c_n u_n = 0, \tag{13.13} $$
where $c_0, c_1, \ldots, c_n$ are in general complex numbers, cannot be satisfied unless all $c_i$ are zero:
$$ c_0 = c_1 = \cdots = c_n = 0. \tag{13.14} $$
An infinite set, $S = \{u_0, u_1, \ldots\}$, is called linearly independent in $[a, b]$ if every finite subset of $S$ is linearly independent. It is clear that every orthogonal/orthonormal set is linearly independent. All the similarities with vectors suggest that an arbitrary function may be expressed as a linear combination of the elements of an orthonormal set. An expansion of this sort will naturally look like
$$ f(x) = \sum_{n=0}^{\infty} c_n u_n(x), \quad x \in [a, b]. \tag{13.15} $$
There are basically two important questions to be addressed: first, how to find the expansion coefficients and, second, will the series $\sum_{n=0}^{\infty} c_n u_n$ converge to $f(x)$? Finding the coefficients, at least formally, is possible. Using the orthonormality relation, $\int_a^b u_m^* u_n \, dx = \delta_{mn}$, we can find $c_n$ as
$$ \int_a^b u_n^* f \, dx = \sum_{m=0}^{\infty} c_m \int_a^b u_n^* u_m \, dx \tag{13.16} $$
$$ = \sum_{m=0}^{\infty} c_m \delta_{nm} \tag{13.17} $$
$$ = c_n. \tag{13.18} $$
In other words, the coefficients of expansion are found as
$$ c_n = \int_a^b u_n^* f \, dx. \tag{13.19} $$
As far as the convergence of the series $\sum_{n=0}^{\infty} c_n u_n$ is concerned, the following questions come to mind: Does it converge uniformly? Does it converge only at certain points of the interval $[a, b]$? Does it converge pointwise, that is, for all the points of the interval $[a, b]$? Uniform convergence implies the integrability of $f(x)$ and justifies the steps leading to Equation (13.19). In other words, when the series $\sum_{n=0}^{\infty} c_n u_n$ converges uniformly to a function $f(x)$, then that function is integrable and the coefficients, $c_n$, are found as in Equation (13.19). Note that uniform convergence implies pointwise convergence but not vice versa. For an orthonormal set, $S = \{u_0, u_1, \ldots\}$, defined over the interval $[a, b]$, an integrable function can be written as the series
$$ f(x) = \sum_{n=0}^{\infty} c_n u_n(x), \quad c_n = (u_n, f), \tag{13.20} $$
which is called the generalized Fourier series of $f(x)$ with respect to the set $S$. A general discussion of the convergence of these series is beyond the scope of this book, but for the majority of physically interesting problems they converge (Bayin). However, for the (trigonometric) Fourier series we address this question in detail. In the meantime, to obtain further justification of the series representation in Equation (13.20), consider the partial sum
$$ S_n(x) = \sum_{k=0}^{n} c_k u_k(x) \tag{13.21} $$
and the finite sum
$$ t_n(x) = \sum_{k=0}^{n} b_k u_k(x), \tag{13.22} $$
where $b_k$ are arbitrary complex numbers. We now write the expression
$$ \int_a^b |f - t_n|^2 \, dx = (f - t_n, f - t_n) \tag{13.23} $$
$$ = \|f\|^2 + (t_n, t_n) - (f, t_n) - (t_n, f). \tag{13.24} $$
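Using the properties of the inner product, along with the orthogonality relation [Eq. (13.10)], we can write
$$ (t_n, t_n) = \left( \sum_{k=0}^{n} b_k u_k(x), \sum_{l=0}^{n} b_l u_l(x) \right) \tag{13.25} $$
$$ = \sum_{k=0}^{n} \sum_{l=0}^{n} b_k^* b_l (u_k, u_l) \tag{13.26} $$
$$ = \sum_{k=0}^{n} |b_k|^2, \tag{13.27} $$
$$ (f, t_n) = \left( f, \sum_{k=0}^{n} b_k u_k(x) \right) \tag{13.28} $$
$$ = \sum_{k=0}^{n} b_k (f, u_k) \tag{13.29} $$
$$ = \sum_{k=0}^{n} b_k c_k^* \tag{13.30} $$
and
$$ (t_n, f) = (f, t_n)^* = \sum_{k=0}^{n} b_k^* c_k. \tag{13.31} $$
Using Equations (13.27), (13.30), and (13.31), Equation (13.24) can be written as
$$ \int_a^b |f - t_n|^2 \, dx = \|f\|^2 + \sum_{k=0}^{n} |b_k|^2 - \sum_{k=0}^{n} b_k c_k^* - \sum_{k=0}^{n} b_k^* c_k \tag{13.32} $$
$$ = \|f\|^2 + \sum_{k=0}^{n} (b_k - c_k)^* (b_k - c_k) - \sum_{k=0}^{n} |c_k|^2 \tag{13.33} $$
$$ = \|f\|^2 - \sum_{k=0}^{n} |c_k|^2 + \sum_{k=0}^{n} |b_k - c_k|^2. \tag{13.34} $$
Since the right-hand side [Eq. (13.34)] is smallest when $b_k = c_k$, we can also write
$$ \int_a^b |f - S_n|^2 \, dx \le \int_a^b |f - t_n|^2 \, dx. \tag{13.35} $$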
The significance of these results becomes clear if we notice that each linear combination, $t_n$, can be thought of as an approximation of the function $f(x)$. The integral on the left-hand side of Equation (13.35) can now be interpreted as the mean square error of this approximation. The inequality in Equation (13.35) states that among all possible approximations, $t_n = \sum_{k=0}^{n} b_k u_k$, of $f(x)$, the nth partial sum of the generalized Fourier series represents the best approximation in terms of the mean square error. Using Equation (13.34), we can obtain two more useful expressions. If we set $b_k = c_k$ and notice that the left-hand side, being the integral of $|f - S_n|^2$, is never negative, we can write
$$ \sum_{k=0}^{n} |c_k|^2 \le \int_a^b |f|^2 \, dx, \tag{13.36} $$
which is called Bessel's inequality. With the substitution $b_k = c_k$, Equation (13.34) also implies
$$ \|f - S_n\|^2 = \int_a^b |f - S_n|^2 \, dx \tag{13.37} $$
$$ = \int_a^b |f|^2 \, dx - \sum_{k=0}^{n} |c_k|^2. \tag{13.38} $$
Hence, whenever $\|f - S_n\| \to 0$ as $n \to \infty$, we obtain
$$ \sum_{k=0}^{\infty} |c_k|^2 = \int_a^b |f|^2 \, dx, \tag{13.39} $$
which is known as Parseval's formula. We conclude this section with the definition of completeness.

Definition 13.3. Given an orthonormal set of integrable functions defined over the interval $[a, b]$:
$$ S = \{u_0, u_1, \ldots\}. \tag{13.40} $$
In the expansion
$$ f(x) \sim \sum_{n=0}^{\infty} c_n u_n(x), \tag{13.41} $$
if the limit
$$ \lim_{n \to \infty} \|f - S_n\| = 0 \tag{13.42} $$
is true for every integrable $f$, then the set $S$ is called complete and we say the series $\sum_{n=0}^{\infty} c_n u_n(x)$ converges in the mean to $f(x)$. Convergence in the mean is not as strong as uniform or pointwise convergence, but for most practical purposes it is sufficient.

From Bessel's inequality [Eq. (13.36)] it is seen that for absolutely integrable functions, that is, when the integral $\int_a^b |f| \, dx$ exists, the series $\sum_{k=0}^{\infty} |c_k|^2$ also converges, hence the nth term of this series necessarily goes to 0 as $n \to \infty$. In particular, if we use the complete set
$$ S = \{u_n = e^{inx}\}, \quad n = 0, 1, 2, \ldots, \quad x \in [0, 2\pi], \tag{13.43} $$
we obtain the limit
$$ \lim_{n \to \infty} \int_0^{2\pi} f(x) e^{-inx} \, dx = 0, \tag{13.44} $$
which can be used to deduce the following useful formulas:
$$ \lim_{n \to \infty} \int_0^{2\pi} f(x) \cos nx \, dx = 0, \tag{13.45} $$
$$ \lim_{n \to \infty} \int_0^{2\pi} f(x) \sin nx \, dx = 0. \tag{13.46} $$
This result can be generalized as the Riemann-Lebesgue lemma for absolutely integrable functions (Apostol, p. 471) as
$$ \lim_{\alpha \to \infty} \int_a^b f(x) \sin(\alpha x + \beta) \, dx = 0, \tag{13.47} $$
which holds for every real $\beta$, and where the lower limit, $a$, could be $-\infty$ and the upper limit, $b$, could be $\infty$.
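The statements of this section can be illustrated numerically. The sketch below takes the orthonormal set $u_n = e^{inx}/\sqrt{2\pi}$ on $[0, 2\pi]$, computes the coefficients $c_k = (u_k, f)$ for the sample function $f(x) = x$, and checks Bessel's inequality together with the best-approximation property of Equation (13.34). The sample function, the cutoff `N = 5`, the perturbation `0.1`, and all helper names are illustrative assumptions; the integrals are evaluated with Simpson's rule.

```python
import cmath
import math

A, B = 0.0, 2.0 * math.pi  # the interval [a, b]

def inner(f, g, n=2000):
    """Inner product (f, g) = int_a^b conj(f(x)) g(x) dx by Simpson's rule."""
    h = (B - A) / n
    s = f(A).conjugate() * g(A) + f(B).conjugate() * g(B)
    for k in range(1, n):
        x = A + k * h
        s += (4 if k % 2 else 2) * f(x).conjugate() * g(x)
    return s * h / 3.0

def u(k):
    """Orthonormal basis function u_k(x) = exp(i k x) / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * k * x) / math.sqrt(2.0 * math.pi)

def f(x):
    """Sample function to be expanded (an arbitrary choice for this sketch)."""
    return complex(x, 0.0)

N = 5
c = [inner(u(k), f) for k in range(N + 1)]   # generalized Fourier coefficients

def mean_square_error(b):
    """int_a^b |f - sum_k b_k u_k|^2 dx for a given list of coefficients b."""
    def diff(x):
        return f(x) - sum(bk * u(k)(x) for k, bk in enumerate(b))
    return inner(diff, diff).real

norm_sq = inner(f, f).real                       # ||f||^2
bessel_sum = sum(abs(ck) ** 2 for ck in c)       # sum of |c_k|^2
err_best = mean_square_error(c)                  # b_k = c_k
err_other = mean_square_error([ck + 0.1 for ck in c])  # any other choice of b_k
```

Here `err_other - err_best` equals $\sum_k |b_k - c_k|^2 = 0.01\,(N+1)$, as Equation (13.34) predicts, and `bessel_sum` stays below `norm_sq`. The inequality is strict in this sketch because only nonnegative $n$ are included in the set.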
13.2 FOURIER SERIES

The term Fourier series usually refers to series expansions in terms of the orthogonal system
$$ S = \{u_0, u_{2n-1}, u_{2n}\}, \quad n = 1, 2, \ldots, \quad x \in [-\pi, \pi], \tag{13.48} $$
where
(13.49)
which satisfy the orthogonality relations
$$ \frac{1}{\pi} \int_{-\pi}^{\pi} \cos nx \cos mx \, dx = \delta_{nm}, \tag{13.50} $$
$$ \frac{1}{\pi} \int_{-\pi}^{\pi} \sin nx \sin mx \, dx = \delta_{nm}, \tag{13.51} $$
$$ \frac{1}{\pi} \int_{-\pi}^{\pi} \sin nx \cos mx \, dx = 0. \tag{13.52} $$
Since the basis functions are periodic with the period $2\pi$, their linear combinations are also periodic with the same period. We shall see that every periodic function satisfying certain smoothness conditions can be expressed as the Fourier series
$$ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx). \tag{13.53} $$
Using the orthogonality relations, we can evaluate the expansion coefficients as
$$ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos nt \, dt, \quad n = 0, 1, 2, \ldots, \tag{13.54} $$
$$ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin nt \, dt, \quad n = 1, 2, \ldots. \tag{13.55} $$
Substituting the coefficients [Eqs. (13.54) and (13.55)] back into Equation (13.53) and using trigonometric identities, we can write the Fourier series in more compact form as
$$ f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \, dt + \frac{1}{\pi} \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} f(t) \cos n(x - t) \, dt. \tag{13.56} $$
Note that the first term, $a_0/2$, is the mean value of $f(x)$ in the interval $[-\pi, \pi]$.
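As a concrete illustration of Equations (13.54) and (13.55), the following sketch evaluates the coefficients numerically for the sample function $f(x) = x$ on $[-\pi, \pi]$, for which the exact values are $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$. The choice of $f$, the cutoff `N_MAX = 50`, and the helper names are assumptions made for this example; the integrals are approximated with Simpson's rule.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0

def fourier_coefficients(f, n_max):
    """Return lists a[0..n_max], b[0..n_max] (b[0] is unused and set to 0)."""
    a = [simpson(lambda t: f(t) * math.cos(n * t), -math.pi, math.pi) / math.pi
         for n in range(n_max + 1)]
    b = [0.0] + [simpson(lambda t: f(t) * math.sin(n * t), -math.pi, math.pi) / math.pi
                 for n in range(1, n_max + 1)]
    return a, b

def partial_sum(a, b, x):
    """a_0/2 + sum_{n>=1} (a_n cos nx + b_n sin nx), truncated at len(a) - 1."""
    return a[0] / 2.0 + sum(a[n] * math.cos(n * x) + b[n] * math.sin(n * x)
                            for n in range(1, len(a)))

N_MAX = 50
a, b = fourier_coefficients(lambda x: x, N_MAX)
value_inside = partial_sum(a, b, 1.0)        # close to f(1) = 1, slowly improving with N_MAX
value_at_end = partial_sum(a, b, math.pi)    # 0, the mean of the limits +pi and -pi
```

The slow, roughly $1/N$ convergence inside the interval and the value 0 at the end point reflect the jump of the periodic extension of $f(x) = x$ at $x = \pm\pi$.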
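13.3 EXPONENTIAL FORM OF THE FOURIER SERIES

Using the relations
$$ \cos nx = \frac{e^{inx} + e^{-inx}}{2}, \quad \sin nx = \frac{e^{inx} - e^{-inx}}{2i}, \tag{13.57} $$
we can express the Fourier series in exponential form as
$$ f(x) = c_0 + \sum_{n=1}^{\infty} \left( c_n e^{inx} + c_n^* e^{-inx} \right), \tag{13.58} $$
where
$$ c_n = \frac{1}{2}(a_n - i b_n), \quad c_n^* = \frac{1}{2}(a_n + i b_n), \quad c_0 = \frac{a_0}{2}. \tag{13.59} $$
Since $c_n^* = c_{-n}$, we can also write this in compact form as
$$ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}. \tag{13.60} $$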
13.4
593
CONVERGENCE OF FOURIER SERIES
We now turn t o the convergence problem of Fourier series. The whole theory can be built on two fundamental formulas. Let us now write the partial sum
(13.61) Substituting the definitions of the coefficients (13.55)]into the above equation, we obtain
+
n
(cos k t cos k x
ak
and
bk
[Eqs. (13.54) and
+ sin kt sin k x )
( 13.62)
k=l
(13.63) k=l
(13.64) where we introduced the function 1
Dn(t) = 5
+ CCOS kt.
(13.65)
k=l
Since both f ( t ) and D n ( t ) are periodic functions with the period 27r, after a variable change, t - x -+ t , we can write
Sn(x)= 1 7T
-"-x
f(x
+ t ) D n ( t )dt
(13.66) (13.67)
Using the fact that D n ( - t ) = D n ( t ) , this can also be written as
(13.68) Using the trigonometric relation (see Prob. 13.6):
t 2 (13.69)
594
FOURIER SERIES
we can also write o n ( % ) as
D,(2t) =
sin(2n + 1)t , 2 sin t
I (n+%
t # mr, m is an integer, (13.70) t
= mr, m
is an integer,
thus obtaining the partial sum as
which is called the integral representation of Fourier series. It basically says that the Fourier series written for f(z) converges at the point z, if and only if the following limit exists:
f(z + at) + f(x - at) sin t
2
]
dt.
(13.72)
In the case that this limit exists, it is equal to the sum of the Fourier series. We now write the Dirichlet integral:
$$ \lim_{n \to \infty} \frac{2}{\pi} \int_0^{\delta} g(t) \frac{\sin nt}{t} \, dt = g(0^+), \tag{13.73} $$
which Jordan has shown to be true when $g(t)$ is of bounded variation. That is, when we move along the x-axis the total change in $g(x)$ is finite (Apostol, p. 473). Basically, the Dirichlet integral says that the value of the integral depends entirely on the local behavior of the function $g(t)$ near 0. Using the Riemann-Lebesgue lemma, granted that $g(t)/t$ is absolutely integrable in the interval $[\epsilon, \delta]$, $0 < \epsilon < \delta$, we can replace $t$ in the denominator of the Dirichlet integral with another function, like $\sin t$, that has the same behavior near 0 without affecting the result. In the light of these, since the function

(13.74)

is continuous at $t = 0$, we can write the integral

(13.75)

Now, the convergence problem of the Fourier series reduces to finding the conditions on $f(x)$ which guarantee the existence of the limit [Eq. (13.72)]
$$ \lim_{n \to \infty} \frac{2}{\pi} \int_0^{\pi/2} \left[ \frac{f(x + 2t) + f(x - 2t)}{2} \right] \frac{\sin(2n + 1)t}{t} \, dt. \tag{13.76} $$
Employing the Riemann-Lebesgue lemma one more time, we can replace the upper limit of this integral with $\delta$, where $\delta$ is any positive number less than $\pi/2$. This result, which follows from the fact that the limit
$$ \lim_{n \to \infty} \frac{2}{\pi} \int_{\delta}^{\pi/2} \left[ \frac{f(x + 2t) + f(x - 2t)}{2} \right] \frac{\sin(2n + 1)t}{t} \, dt \to 0 \tag{13.77} $$
is true, is quoted as the Riemann localization theorem:

Theorem 13.1. Riemann localization theorem: Assume that $f(x)$ is absolutely integrable in the interval $[0, 2\pi]$ and periodic with the period $2\pi$. Then, the Fourier series produced by $f(x)$ converges for a given $x$ if and only if the following limit exists:
$$ \lim_{n \to \infty} \frac{2}{\pi} \int_0^{\delta} \left[ \frac{f(x + 2t) + f(x - 2t)}{2} \right] \frac{\sin(2n + 1)t}{t} \, dt, \tag{13.78} $$
where $\delta < \pi/2$ is a positive number. When the limit exists, it is equal to the sum of the Fourier series produced by $f(x)$.

The importance of this result lies in the fact that the convergence of a Fourier series at a given point is determined by the behavior of $f(x)$ in the neighborhood of that point. This is surprising, since the coefficients [Eqs. (13.54) and (13.55)] are determined through integrals over the entire interval.
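The closed form of the kernel $D_n(t)$ used in this section is easy to confirm numerically. The following sketch compares the defining sum $\frac{1}{2} + \sum_{k=1}^{n} \cos kt$ with $\sin\left[\left(n + \frac{1}{2}\right)t\right] / \left(2 \sin\frac{t}{2}\right)$, including the limiting value $n + \frac{1}{2}$ at $t = 2m\pi$. The function names and the tolerance used to detect the removable singularity are illustrative assumptions.

```python
import math

def dirichlet_sum(n, t):
    """D_n(t) from its definition: 1/2 + sum_{k=1}^{n} cos(k t)."""
    return 0.5 + sum(math.cos(k * t) for k in range(1, n + 1))

def dirichlet_closed(n, t):
    """D_n(t) from the closed form sin((n + 1/2) t) / (2 sin(t / 2))."""
    s = math.sin(0.5 * t)
    if abs(s) < 1e-12:          # t is (numerically) a multiple of 2*pi
        return n + 0.5
    return math.sin((n + 0.5) * t) / (2.0 * s)

# Largest discrepancy between the two forms on a small grid of sample points.
max_gap = max(abs(dirichlet_sum(7, t) - dirichlet_closed(7, t))
              for t in [0.1 * j for j in range(1, 60)])
```

Because the kernel concentrates near $t = 0$ as $n$ grows, with peak height $n + \frac{1}{2}$, the partial sum $S_n(x)$ is dominated by the values of $f$ near $x$, which is the content of the localization theorem.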
13.5 SUFFICIENT CONDITIONS FOR CONVERGENCE

We now present a theorem due to Jordan, which gives the sufficient conditions for the convergence of a Fourier series at a point (Apostol, p. 478).

Theorem 13.2. Let $f(x)$ be absolutely integrable in the interval $(0, 2\pi)$ with the period $2\pi$ and consider the interval $[x - \delta, x + \delta]$ centered at $x$ in which $f(x)$ is of bounded variation. Then, the Fourier series generated by $f(x)$ converges for this value of $x$ to the sum
$$ \frac{f(x^+) + f(x^-)}{2}. \tag{13.79} $$
Furthermore, if $f(x)$ is continuous at $x$, then the series converges to $f(x)$.

The proof of this theorem is based on showing that the limit in the Riemann localization theorem:
$$ \lim_{n \to \infty} \frac{2}{\pi} \int_0^{\delta} g(t) \frac{\sin(2n + 1)t}{t} \, dt \tag{13.80} $$
exists for
$$ g(t) = \frac{f(x + 2t) + f(x - 2t)}{2} \tag{13.81} $$
and equals $g(0^+)$. This theorem is about the convergence of Fourier series at a given point. However, it says nothing about uniform convergence. For this, we present the so-called fundamental theorem. We first define the concepts of piecewise continuous, smooth and very smooth functions. A function defined in the closed interval $[a, b]$ is piecewise continuous if the interval can be subdivided into a finite number of subintervals, where in each of these intervals the function is continuous and has finite limits at both ends of the interval. Furthermore, if the function, $f(x)$, coincides with a continuous function, $f_i(x)$, in the ith subinterval and if $f_i(x)$ has continuous first derivatives, then we say that $f(x)$ is piecewise smooth. If, in addition, the function $f_i(x)$ has continuous second derivatives, we say $f(x)$ is piecewise very smooth.
13.6 THE FUNDAMENTAL THEOREM

Theorem 13.3. Let $f(x)$ be a piecewise very smooth function in the interval $[-\pi, \pi]$ with the period $2\pi$, then the Fourier series
$$ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx), \tag{13.82} $$
where the coefficients are given as
$$ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx, \quad n = 0, 1, \ldots, \tag{13.83} $$
$$ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx, \quad n = 1, 2, \ldots, \tag{13.84} $$
converges uniformly to $f(x)$ in every closed interval where $f(x)$ is continuous. At each point of discontinuity, $x_1$, inside the interval $[-\pi, \pi]$, the Fourier series converges to
$$ \frac{1}{2} \left[ \lim_{x \to x_1^-} f(x) + \lim_{x \to x_1^+} f(x) \right] \tag{13.85} $$
and at the end points $x = \pm\pi$ to
$$ \frac{1}{2} \left[ \lim_{x \to \pi^-} f(x) + \lim_{x \to -\pi^+} f(x) \right]. \tag{13.86} $$
For most practical situations the requirement can be weakened from very smooth to smooth. For the proof of this theorem we refer the reader to Kaplan (p. 490). We remind the reader that all that is required for the convergence of a Fourier series is the piecewise continuity of the first and the second derivatives of the function. This result is remarkable in itself, since for the convergence of Taylor series, derivatives of all orders have to exist and the remainder term has to go to zero.
13.7 UNIQUENESS OF FOURIER SERIES

Theorem 13.4. Let $f(x)$ and $g(x)$ be two piecewise continuous functions in the interval $[-\pi, \pi]$ with the same Fourier coefficients, that is,
$$ \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx = \frac{1}{\pi} \int_{-\pi}^{\pi} g(x) \cos nx \, dx, \tag{13.87} $$
$$ \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx = \frac{1}{\pi} \int_{-\pi}^{\pi} g(x) \sin nx \, dx. \tag{13.88} $$
Then $f(x) = g(x)$, except perhaps at the points of discontinuity.

The proof of the uniqueness theorem follows at once if we define a new piecewise continuous function as
$$ h(x) = f(x) - g(x), \tag{13.89} $$
write the Fourier expansion of $h(x)$, and use Equations (13.87) and (13.88).

13.8 EXAMPLES OF FOURIER SERIES
13.8.1 Square Wave
A square wave is defined as the periodic extension of the function (Fig. 13.1) -1,
-7r b
(9)   $t \sin bt$      $2bs/(s^2 + b^2)^2$,  $s > 0$
(10)  $t \cos bt$      $(s^2 - b^2)/(s^2 + b^2)^2$,  $s > 0$
(11)  $\delta(t - a)$  $e^{-as}$,  $a \ge 0$
>0 > 0 , n > -1
+b2), s > 0 + b2), s > 0 + a)n+', n > -1,
sf a
>0
>b
The inverse of a Laplace transform is shown as $\mathcal{L}^{-1}$, which is also a linear operator:
$$ \mathcal{L}^{-1}\{X(s) + Y(s)\} = \mathcal{L}^{-1}\{X(s)\} + \mathcal{L}^{-1}\{Y(s)\}, \tag{14.80} $$
$$ \mathcal{L}^{-1}\{aX(s)\} = a \mathcal{L}^{-1}\{X(s)\}, \quad a \text{ is a constant}. \tag{14.81} $$
The above table can also be used to write inverse Laplace transforms. For example, using the first entry, we can write the inverse transform
$$ \mathcal{L}^{-1}\left\{\frac{1}{s}\right\} = 1. \tag{14.82} $$
Two useful properties of Laplace transforms are given as
$$ \mathcal{L}\{x(t - c)\} = e^{-cs} X(s), \quad c > 0, \ t > c, \tag{14.83} $$
and
$$ \mathcal{L}\{e^{bt} x(t)\} = X(s - b). \tag{14.84} $$
More such relations can be found in Bayin. The convolution, $x * y = z$, is defined as
$$ z(t) = \int_0^t x(t') y(t - t') \, dt'. \tag{14.85} $$
It can be shown that the convolution of two functions, $x(t)$ and $y(t)$, is the inverse Laplace transform of the product of their Laplace transforms:
$$ x * y = \mathcal{L}^{-1}\{X(s) Y(s)\}. \tag{14.86} $$
In most cases, by using the above properties along with the linearity of the Laplace transforms, the needed inverse can be generated from a list of elementary transforms.
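The convolution theorem can be checked numerically on a simple pair of functions. The sketch below takes $x(t) = e^{-t}$ and $y(t) = e^{-3t}$, whose transforms are $1/(s+1)$ and $1/(s+3)$, evaluates the convolution integral with Simpson's rule, and compares its numerically computed Laplace transform with the product $X(s)Y(s)$. The choice of functions, the truncation of the transform integral at `t_max = 20`, and the helper names are assumptions of this example.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0

def x(t):
    return math.exp(-t)

def y(t):
    return math.exp(-3.0 * t)

def convolve(t, n=100):
    """z(t) = int_0^t x(t') y(t - t') dt'."""
    if t == 0.0:
        return 0.0
    return simpson(lambda tp: x(tp) * y(t - tp), 0.0, t, n)

def laplace(g, s, t_max=20.0, n=400):
    """Truncated Laplace transform int_0^t_max g(t) exp(-s t) dt."""
    return simpson(lambda t: g(t) * math.exp(-s * t), 0.0, t_max, n)

z_numeric = convolve(1.0)
z_exact = 0.5 * (math.exp(-1.0) - math.exp(-3.0))   # inverse of 1/((s+1)(s+3)) at t = 1
Z_at_2 = laplace(convolve, 2.0)                      # should be close to X(2) Y(2) = 1/15
```

The closed form used for `z_exact` follows from the partial fraction expansion $\frac{1}{(s+1)(s+3)} = \frac{1}{2}\left[\frac{1}{s+1} - \frac{1}{s+3}\right]$.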
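Example 14.1. Inverse Laplace transforms: Let us find the inverses of the following Laplace transforms:
$$ X_1(s) = \frac{s}{(s + 1)(s + 3)}, \tag{14.87} $$
$$ X_2(s) = \frac{1}{s^2 + 2s + 3}, \tag{14.88} $$
$$ X_3(s) = \frac{s \, e^{-cs}}{s^2 - 1}. \tag{14.89} $$
Using partial fractions (Bayin), we can write $X_1(s)$ as
$$ X_1(s) = \frac{s}{(s + 1)(s + 3)} = -\frac{1}{2(s + 1)} + \frac{3}{2(s + 3)}. \tag{14.90} $$
Using the linearity of $\mathcal{L}^{-1}$ and the third entry in the table we obtain
$$ \mathcal{L}^{-1}\{X_1(s)\} = -\frac{1}{2} e^{-t} + \frac{3}{2} e^{-3t}. \tag{14.91} $$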
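The partial fraction step and the resulting inverse can be verified with a few lines of code. The sketch below computes the residues of a rational function with simple poles by the cover-up rule, builds the corresponding sum of exponentials, and then checks it by transforming back numerically. The helper names, the truncation at `t_max = 40`, and the use of Simpson's rule are illustrative assumptions, and the cover-up rule applies only to distinct simple poles.

```python
import math

def residues(numerator, poles):
    """Cover-up rule for N(s) / prod_j (s - p_j) with distinct simple poles p_j."""
    result = []
    for i, p in enumerate(poles):
        denom = 1.0
        for j, q in enumerate(poles):
            if j != i:
                denom *= (p - q)
        result.append(numerator(p) / denom)
    return result

def inverse_transform(res, poles):
    """x(t) = sum_i r_i exp(p_i t), the inverse of sum_i r_i / (s - p_i)."""
    return lambda t: sum(r * math.exp(p * t) for r, p in zip(res, poles))

def laplace(g, s, t_max=40.0, n=4000):
    """Truncated Laplace transform by Simpson's rule."""
    h = t_max / n
    total = g(0.0) + g(t_max) * math.exp(-s * t_max)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * g(t) * math.exp(-s * t)
    return total * h / 3.0

poles = [-1.0, -3.0]                      # zeros of (s + 1)(s + 3)
res = residues(lambda s: s, poles)        # expected: [-1/2, 3/2]
x1 = inverse_transform(res, poles)        # -(1/2) e^{-t} + (3/2) e^{-3t}
X1_at_1 = laplace(x1, 1.0)                # should reproduce 1 / ((1+1)(1+3)) = 1/8
```

Transforming the result back and comparing with the original $X_1(s)$ at a few values of $s$ is a cheap safeguard against sign errors in the partial fraction coefficients.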
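For the second inverse, we complete the square to write
$$ \mathcal{L}^{-1}\{X_2(s)\} = \mathcal{L}^{-1}\left\{\frac{1}{s^2 + 2s + 3}\right\} = \mathcal{L}^{-1}\left\{\frac{1}{s^2 + 2s + 1 + 2}\right\} = \mathcal{L}^{-1}\left\{\frac{1}{(s + 1)^2 + 2}\right\}. \tag{14.92} $$
We now use the fourth entry in the table to write
$$ \mathcal{L}^{-1}\left\{\frac{1}{s^2 + 2}\right\} = \frac{1}{\sqrt{2}} \sin\sqrt{2}\, t \tag{14.93} $$
and employ the inverse of the property in Equation (14.84):
$$ \mathcal{L}^{-1}\{X(s - b)\} = e^{bt} x(t), \tag{14.94} $$
to obtain
$$ \mathcal{L}^{-1}\left\{\frac{1}{(s + 1)^2 + 2}\right\} = \frac{1}{\sqrt{2}} e^{-t} \sin\sqrt{2}\, t. \tag{14.95} $$
For the third inverse, we use the property in Equation (14.83) to write the inverse:
$$ \mathcal{L}^{-1}\{e^{-cs} X(s)\} = x(t - c), \quad c > 0, \ t > c, \tag{14.96} $$
along with
$$ \mathcal{L}^{-1}\left\{\frac{s}{s^2 - 1}\right\} = \cosh t, \tag{14.97} $$
thus obtaining the desired inverse as
$$ \mathcal{L}^{-1}\{X_3(s)\} = \cosh(t - c). \tag{14.98} $$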
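14.11 DIFFERENTIAL EQUATIONS AND LAPLACE TRANSFORMS

An important application of the Laplace transforms is to ordinary differential equations with constant coefficients. We first write the Laplace transform of a derivative as
$$ \mathcal{L}\left\{\frac{dx}{dt}\right\} = \int_0^{\infty} e^{-st} \frac{dx}{dt} \, dt, \tag{14.99} $$
which, after integration by parts, becomes
$$ \mathcal{L}\left\{\frac{dx}{dt}\right\} = x(t) e^{-st} \Big|_0^{\infty} + s \int_0^{\infty} x(t) e^{-st} \, dt. \tag{14.100} $$
Assuming that $s > 0$ and that $x(t) e^{-st} \to 0$ as $t \to \infty$, we obtain
$$ \mathcal{L}\left\{\frac{dx}{dt}\right\} = s X(s) - x(0^+), \tag{14.101} $$
where $x(0^+)$ means the origin is approached from the positive t-axis. Similarly, we find
$$ \mathcal{L}\left\{\frac{d^2 x}{dt^2}\right\} = s \mathcal{L}\left\{\frac{dx}{dt}\right\} - x'(0^+) \tag{14.102} $$
$$ = s^2 X(s) - s x(0^+) - x'(0^+), \tag{14.103} $$
where we have assumed that all the surface terms vanish in the limit as $t \to \infty$ and $s > 0$. Under similar conditions, for the nth derivative we can write
$$ \mathcal{L}\{x^{(n)}(t)\} = s^n X(s) - s^{n-1} x(0^+) - s^{n-2} x'(0^+) - \cdots - x^{(n-1)}(0^+). \tag{14.104} $$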
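Example 14.2. Solution of differential equations: Consider the following ordinary differential equation with constant coefficients and with a nonhomogeneous term:
$$ \frac{d^2 x}{dt^2} + 2 \frac{dx}{dt} + 4 x(t) = \sin 2t, \tag{14.105} $$
where the initial conditions are given as
$$ x(0) = 0 \quad \text{and} \quad x'(0) = 0. \tag{14.106} $$
Assuming that the Laplace transform of the solution exists, $X(s) = \mathcal{L}\{x(t)\}$, and using the fact that $\mathcal{L}$ is a linear operator, we write
$$ \mathcal{L}\left\{\frac{d^2 x}{dt^2} + 2 \frac{dx}{dt} + 4 x(t)\right\} = \mathcal{L}\{\sin 2t\}, \tag{14.107} $$
$$ \mathcal{L}\left\{\frac{d^2 x}{dt^2}\right\} + 2 \mathcal{L}\left\{\frac{dx}{dt}\right\} + 4 \mathcal{L}\{x(t)\} = \mathcal{L}\{\sin 2t\}, \tag{14.108} $$
$$ \left[ s^2 X(s) - s x(0) - x'(0) \right] + 2 \left[ s X(s) - x(0) \right] + 4 X(s) = \frac{2}{s^2 + 4}. \tag{14.109} $$
By imposing the initial conditions [Eq. (14.106)] we obtain the Laplace transform of the solution as
$$ s^2 X(s) + 2 s X(s) + 4 X(s) = \frac{2}{s^2 + 4}, \tag{14.110} $$
$$ X(s) = \frac{2}{(s^2 + 2s + 4)(s^2 + 4)}. \tag{14.111} $$
To find the solution, we now have to find the inverse transform
$$ x(t) = \mathcal{L}^{-1}\left\{\frac{2}{(s^2 + 2s + 4)(s^2 + 4)}\right\}. \tag{14.112} $$
Using partial fractions we can write this as
$$ x(t) = \mathcal{L}^{-1}\left\{-\frac{1}{4} \frac{s}{s^2 + 4} + \frac{1}{4} \frac{s + 2}{s^2 + 2s + 4}\right\} \tag{14.113} $$
$$ = -\frac{1}{4} \mathcal{L}^{-1}\left\{\frac{s}{s^2 + 4}\right\} + \frac{1}{4} \mathcal{L}^{-1}\left\{\frac{s + 2}{s^2 + 2s + 4}\right\} \tag{14.114} $$
$$ = -\frac{1}{4} \mathcal{L}^{-1}\left\{\frac{s}{s^2 + 4}\right\} + \frac{1}{4} \mathcal{L}^{-1}\left\{\frac{s + 1}{(s + 1)^2 + 3} + \frac{1}{(s + 1)^2 + 3}\right\}. \tag{14.115} $$
Using the fourth and the fifth entries in the table and Equation (14.84) we obtain the solution as
$$ x(t) = -\frac{1}{4} \cos 2t + \frac{1}{4} e^{-t} \left( \cos\sqrt{3}\, t + \frac{1}{\sqrt{3}} \sin\sqrt{3}\, t \right). \tag{14.116} $$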
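A solution obtained by transform methods can always be cross-checked by integrating the differential equation directly. The sketch below integrates Equation (14.105) with the initial conditions (14.106) using a classical fourth-order Runge-Kutta scheme and compares the result with the closed form (14.116). The step size `h = 1e-3`, the final time `t_end = 5.0`, and the function names are illustrative assumptions.

```python
import math

def closed_form(t):
    """x(t) = -(1/4) cos 2t + (1/4) e^{-t} (cos sqrt(3) t + sin sqrt(3) t / sqrt(3))."""
    r3 = math.sqrt(3.0)
    return (-0.25 * math.cos(2.0 * t)
            + 0.25 * math.exp(-t) * (math.cos(r3 * t) + math.sin(r3 * t) / r3))

def rhs(t, x, v):
    """First-order system: x' = v, v' = sin 2t - 2 v - 4 x."""
    return v, math.sin(2.0 * t) - 2.0 * v - 4.0 * x

def integrate(t_end, h=1e-3):
    """Classical RK4 from x(0) = 0, x'(0) = 0 up to t_end."""
    t, x, v = 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / h))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + 0.5 * h, x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = rhs(t + 0.5 * h, x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return x

t_end = 5.0
x_numeric = integrate(t_end)
x_exact = closed_form(t_end)
```

The two values agree to many digits, and `closed_form(0.0)` returns zero, as required by the initial condition.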
14.12 TRANSFER FUNCTIONS AND SIGNAL PROCESSORS

There are extensive applications of Laplace transforms to signal processing, control theory, and communications. Here we consider only some of the basic applications, which require the introduction of the transfer function. We now introduce a signal processor as a general device, which for a given input signal, $u(t)$, produces an output signal $x(t)$. For electromagnetic signals the internal structure of a signal processor is composed of electronic circuits. The effect of the device on the input signal can be represented by a differential operator, which we take to be a linear ordinary differential operator with constant coefficients, say
$$ O = a \frac{d}{dt} + 1, \quad a > 0. \tag{14.117} $$
The role of $O$ is to relate the input signal, $u(t)$, to the output signal, $x(t)$, as
$$ O x(t) = u(t), \tag{14.118} $$
$$ a \frac{d x(t)}{dt} + x(t) = u(t). \tag{14.119} $$
Taking the Laplace transform of this equation, we obtain
$$ a s X(s) - a x(0) + X(s) = U(s). \tag{14.120} $$
Since there is no signal out when there is no signal in, we take the initial conditions as
$$ x(0) = 0 \quad \text{when} \quad u(0) = 0. \tag{14.121} $$
Hence, we write
$$ (a s + 1) X(s) = U(s). \tag{14.122} $$
The function defined as
$$ G(s) = \frac{X(s)}{U(s)} = \frac{1}{a s + 1} \tag{14.123} $$
is called the transfer function. A general linear signal processor, $G(s)$, allows us to obtain the Laplace transform of the output signal, $X(s)$, from the Laplace transform, $U(s)$, of the input signal as
$$ X(s) = G(s) U(s). \tag{14.124} $$

Figure 14.10 A single signal processor.
A single-component signal processor can be shown as in Figure 14.10. For the signal processor represented by
$G(s) = \frac{1}{1 + as},$
(14.125)
consider a sinusoidal input as
$u(t) = \sin\omega t.$
(14.126)
Since the Laplace transform of $u(t)$ is
$U(s) = \frac{\omega}{s^2 + \omega^2},$
(14.127)
Equation (14.124) gives us the Laplace transform of the output signal as
$X(s) = \frac{\omega}{(s^2 + \omega^2)(1 + as)}.$
(14.128)
Using partial fractions, we can write the inverse transform as
$x(t) = \mathcal{L}^{-1}\{X(s)\},$
(14.129)
Figure 14.11 Series connection of signal processors.
Figure 14.12 Parallel connection of signal processors.
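which yields
$x(t) = \left[\frac{1}{1 + \omega^2a^2}\right]\sin\omega t - \left[\frac{\omega a}{1 + \omega^2a^2}\right]\cos\omega t + \left[\frac{\omega a}{1 + \omega^2a^2}\right]e^{-t/a}.$
(14.130)
The last term is called the transient signal, which dies out for large times, hence leaving the stationary signal as
$x(t) = \left[\frac{1}{1 + \omega^2a^2}\right]\sin\omega t - \left[\frac{\omega a}{1 + \omega^2a^2}\right]\cos\omega t.$
(14.131)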
This can also be written as
$x(t) = \frac{1}{\sqrt{1 + \omega^2a^2}}\sin(\omega t - \delta),$
(14.132)
where $\delta = \tan^{-1}a\omega$. In summary, for the processor represented by the differential operator
$O = a\frac{d}{dt} + 1, \quad a > 0,$
(14.133)
when the input signal is a sine wave with zero phase, unit amplitude, and angular frequency $\omega$, the output signal is again a sine wave with the same angular frequency $\omega$ but with the amplitude $(1 + \omega^2a^2)^{-1/2}$ and phase lag $\delta = \tan^{-1}a\omega$, both of which depend on $\omega$.
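The relation between the full response and the stationary signal can be checked numerically. The following Python sketch is only an illustration: the function names and the values of $a$ and $\omega$ are our own choices, not part of the text. It evaluates the full output of Equation (14.130) and the amplitude-phase form with amplitude $(1 + \omega^2a^2)^{-1/2}$ and lag $\delta = \tan^{-1}a\omega$, and shows that the two agree once the transient has died out.

```python
import math

def full_response(t, a, w):
    """Output of a*dx/dt + x = sin(w*t) with x(0) = 0, as in Eq. (14.130)."""
    d = 1.0 + (w * a) ** 2
    return (math.sin(w * t) - w * a * math.cos(w * t)
            + w * a * math.exp(-t / a)) / d

def stationary_response(t, a, w):
    """Amplitude-phase form: a sine wave lagging the input by delta."""
    amplitude = 1.0 / math.sqrt(1.0 + (w * a) ** 2)
    delta = math.atan(a * w)
    return amplitude * math.sin(w * t - delta)

a, w = 0.5, 2.0  # illustrative values only
for t in (0.0, 1.0, 5.0, 20.0):
    # The difference between the two columns is the transient term.
    print(t, full_response(t, a, w), stationary_response(t, a, w))
```

For these values the transient decays as $e^{-t/a} = e^{-2t}$, so the two columns become indistinguishable after a few time units.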
14.13
CONNECTION OF SIGNAL PROCESSORS
In practice we may need to connect several signal processors to obtain the desired effect. For example, if we connect two signal processors in series (Fig. 14.11), thus feeding the output of the first processor into the second one as the input, that is,
$X_1(s) = G_1(s)U_1(s),$
(14.134)
$X_2(s) = G_2(s)X_1(s),$
(14.135)
we obtain
$X_2(s) = G_2(s)G_1(s)U_1(s).$
(14.136)
Figure 14.13 $G = G_1(G_3G_4 + G_2G_5G_6)G_7$.
In other words, the effective transfer function of the two processors, $G_1$ and $G_2$, connected in series becomes their product:
$G(s) = G_2(s)G_1(s).$
(14.137)
On the other hand, if we connect two processors in parallel (Fig. 14.12), thus feeding the same input into both processors,
$X_1(s) = G_1(s)U(s),$
(14.138)
$X_2(s) = G_2(s)U(s),$
(14.139)
along with combining their outputs,
$X(s) = X_1(s) + X_2(s),$
(14.140)
we obtain the effective transfer function as their sum:
$G(s) = G_2(s) + G_1(s).$
(14.141)
For the combination in Figure 14.13 the effective transfer function is given as
$G = G_1(G_3G_4 + G_2G_5G_6)G_7.$
(14.142)
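The series and parallel rules can be sketched in a few lines of Python by treating each transfer function as a callable of $s$. The helper names `series` and `parallel` and the sample transfer functions below are hypothetical choices made for illustration; exact rational arithmetic is used so that the results can be compared by hand.

```python
from fractions import Fraction
from functools import reduce

def series(*processors):
    """Effective transfer function of processors in series: the product."""
    def g(s):
        return reduce(lambda acc, p: acc * p(s), processors, 1)
    return g

def parallel(*processors):
    """Effective transfer function of processors in parallel: the sum."""
    def g(s):
        return sum(p(s) for p in processors)
    return g

# Two sample processors (illustrative): G1 = 1/(1+s), G2 = 1/(s^2+2s+4).
g1 = lambda s: 1 / (1 + s)
g2 = lambda s: 1 / (s**2 + 2 * s + 4)

s = Fraction(1)
print(series(g1, g2)(s))    # product G2*G1 evaluated at s = 1
print(parallel(g1, g2)(s))  # sum G2+G1 evaluated at s = 1

# A combination with the structure G1*(G3*G4 + G2*G5*G6)*G7, using
# placeholder processors for G3..G7 (all taken equal to g1 here).
g3 = g4 = g5 = g6 = g7 = g1
combo = series(g1, parallel(series(g3, g4), series(g2, g5, g6)), g7)
print(combo(s))
```

Because the helpers return ordinary callables, nested combinations are built simply by nesting the calls.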
Example 14.3. Signal processors in series: Consider two linear signal processors represented by the following differential equations:
$\dot{x}(t) + x(t) = u(t)$
(14.143)
with
$x(0) = 0$
(14.144)
and
$\ddot{x}(t) + 2\dot{x}(t) + 4x(t) = u(t)$
(14.145)
with
$x(0) = \dot{x}(0) = 0.$
(14.146)
The individual transfer functions are
$G_1(s) = \frac{1}{1 + s},$
(14.147)
$G_2(s) = \frac{1}{s^2 + 2s + 4};$
(14.148)
thus for their series connection we write the effective transfer function as
$G(s) = G_2(s)G_1(s) = \frac{1}{(s^2 + 2s + 4)(1 + s)}.$
(14.149)
For an input signal represented by the sine function,
$u(t) = \sin t,$
(14.150)
the Laplace transform is written as
$\mathcal{L}\{u(t)\} = U(s) = \frac{1}{1 + s^2}.$
(14.151)
We can now write the Laplace transform of the output signal as
$X_2(s) = G_2(s)G_1(s)U(s)$
(14.152)
$= \frac{1}{(s^2 + 2s + 4)(1 + s)(s^2 + 1)}.$
(14.153)
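Using partial fractions this can be written as
$X_2(s) = \frac{\frac{14}{3(26)} + \frac{2s}{3(26)}}{s^2 + 2s + 4} + \frac{1/6}{1 + s} + \frac{\frac{1}{26} - \frac{5s}{26}}{s^2 + 1}.$
(14.154)
We rewrite this so that the inverse transform can be found easily as
$X_2(s) = \frac{2}{3(26)}\,\frac{s + 1}{(s + 1)^2 + 3} + \frac{12}{3(26)}\,\frac{1}{(s + 1)^2 + 3} + \frac{1}{6}\,\frac{1}{1 + s} + \frac{1}{26}\,\frac{1}{1 + s^2} - \frac{5}{26}\,\frac{s}{1 + s^2}.$
(14.155)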
This is the Laplace transform of the output signal. Using our table of Laplace transforms and Equation (14.84), we take its inverse to find the physical signal as
$x_2(t) = \frac{2}{3(26)}e^{-t}\cos\sqrt{3}\,t + \frac{12}{3(26)\sqrt{3}}e^{-t}\sin\sqrt{3}\,t + \frac{1}{6}e^{-t} + \frac{1}{\sqrt{26}}\sin(t + \delta),$
(14.156)
where $\delta = -\tan^{-1}5$. Note that the output of the first processor, $G_1$, is
$x_1(t) = \frac{1}{2}\left[e^{-t} + \sin t - \cos t\right],$
which satisfies the initial conditions in Equation (14.146).
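The partial-fraction step of this example is easy to get wrong by hand, so a quick exact check may be useful. The Python sketch below compares the product form $1/[(s^2 + 2s + 4)(1 + s)(s^2 + 1)]$ with a decomposition using the coefficients $14/3(26)$, $2/3(26)$, $1/6$, $1/26$, and $-5/26$, as we read them from the example. The function names are our own. Since both sides are rational functions with the same denominator and a numerator of degree at most four, agreement at five distinct points is enough to establish the identity.

```python
from fractions import Fraction as F

def x2_product(s):
    """X2(s) written as the product G2(s) G1(s) U(s)."""
    return 1 / ((s**2 + 2 * s + 4) * (1 + s) * (s**2 + 1))

def x2_partial(s):
    """X2(s) written as a sum of partial fractions."""
    return ((F(14, 78) + F(2, 78) * s) / (s**2 + 2 * s + 4)
            + F(1, 6) / (1 + s)
            + (F(1, 26) - F(5, 26) * s) / (s**2 + 1))

# Exact comparison at five sample points (none is a pole of X2).
for s in (F(0), F(1), F(2), F(-3), F(7, 5)):
    print(s, x2_product(s) == x2_partial(s))
```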
PROBLEMS
1. The correlation coefficient is defined as
$r = \frac{\langle xy\rangle - \langle x\rangle\langle y\rangle}{\sigma_x\sigma_y},$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of the samples. Show that doubling all the values of $x$ does not change the value of $r$.
2. Show that the correlation function Ro [Eq. (14.9)] can be written as
+ 2T(f
-
fo)
3. Show that the correlation function
s -
O -
24.f
+ fo)
-
24.f 4. Show that
-
fo)
SO [Eq. (14.19)] is given as
'>
2 7 r ( f o + f),
sin00
5. Show that the amplitude spectrum is an even function and the phase spectrum is an odd function, that is,
$A(f) = A(-f), \quad \theta(f) = -\theta(-f).$
6. Find the Fourier transform of a Dirac-delta function.
7. Dirac-delta functions are very useful in representing discontinuities in physical theories. Using the Dirac-delta function, express the three-dimensional density of an infinitesimally thin shell of mass $M$. Verify your answer by taking a volume integral.
8. Find the Fourier transform of a Gaussian:
$f(x) = \frac{\alpha}{\sqrt{\pi}}\exp\left[-\alpha^2x^2\right].$
Also show that 00
where
9. If $X(s)$ is the Fourier transform of $x(t)$, show that the Fourier transform of its derivative $dx/dt$ is given as
$X'(s) = (i2\pi f)X(s),$
granted that $x(t) \to 0$ as $t \to \pm\infty$. Under similar conditions, generalize this to derivatives of arbitrary order.
10. Given the signal $4e^{-t}$,
f ( t )=
{o,
t 2 0,
t c;.
3. Consider solids of revolution generated by all parabolas passing through $(0, 0)$ and $(2, 1)$ and rotated about the $x$-axis. Find the one with the least volume between $x = 0$ and $x = 1$.
Hint: Take $y = x + C_0x(2 - x)$ and determine $C_0$.
4. In the derivation of the alternate form of the Euler equation verify the equivalence of the relations
d2F dy'2
and
(E)
+
d2F
Byay'
(2) dzdy' d2F
+
dF
-
dy = o
5. Write the Euler equations for the following functionals:
(i) $F = 2y'^2 + xyy' - y^2$,
(ii) $F = y'^2 + c\sin y$,
(iii) $F = x^3y'^2 - x^2y^2 + 2yy'$.
6. A geodesic is a curve on a surface, which gives the shortest distance between two points. On a plane, geodesics are straight lines. Find the geodesics for the following surfaces: (i) Right circular cylinder. (ii) Right circular cone. (iii) Sphere.
7. Determine the stationary functions of the functional
for the following boundary conditions: (i) The end conditions y(0) = 0 and y(1) = 1 are satisfied.
(ii) Only the condition y(0) = 0 is prescribed. (iii) Only the condition y(1) = 1 is prescribed.
(iv) No end conditions are prescribed.
8. The brachistochrone problem: Find the shape of the curve joining two points, along which a particle initially at rest falls freely under the influence of gravity from the higher point to the lower point in the least amount of time.
9. Find the Euler equation for the problem
$\delta\int_{x_1}^{x_2}F(x, y, y', y'')\,dx = 0$
and discuss the associated boundary conditions.
10. Derive the Euler equation for the problem
subject to the condition that u ( x ,y ) is prescribed on the closed boundary of R.
11. Derive the Euler equation for the problem
$\delta\iint_R F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy})\,dx\,dy = 0.$
What are the associated natural boundary conditions in this case?
12. Write Hamilton's principle for a particle of mass $m$ moving vertically under the action of uniform gravity and a resistive force proportional to the displacement from an equilibrium position. Write Lagrange's equation of motion for this particle.
13. Write Hamilton's principle for a particle of mass $m$ moving vertically under the action of uniform gravity and a drag force proportional to its velocity.
14. Write Lagrange's equations of motion for a triple pendulum consisting of equal masses, $m$, connected with inextensible strings of equal length $l$. Use the angles $\theta_1$, $\theta_2$, and $\theta_3$ that each pendulum makes with the vertical as the generalized coordinates. For small displacements from equilibrium show that the Lagrangian reduces to
15. Small deflections of a rotating shaft of length $l$, subjected to an axial end load $P$ and a transverse load of intensity $p(x)$, are described by the differential equation
$\frac{d^2}{dx^2}\left(EI\frac{d^2y}{dx^2}\right) + P\frac{d^2y}{dx^2} - \rho\omega^2y - p(x) = 0,$
where $EI$ is the bending stiffness of the shaft, $\rho$ is the density, and $\omega$ is the angular frequency of rotation. Show that this differential equation can be obtained from the variational principle
$\delta\int_0^l\left[\frac{1}{2}EIy''^2 - \frac{1}{2}Py'^2 - \frac{1}{2}\rho\omega^2y^2 - py\right]dx = 0.$
What boundary conditions did you impose? For other examples from the theory of elasticity see Hildebrand.
16. A pendulum that is not restricted to oscillate on a plane is called a spherical pendulum. Using spherical coordinates, obtain the equations of motion corresponding to $r$, $\theta$, $\phi$:
$mg\cos\theta - T = -m(s\dot{\theta}^2 + s\sin^2\theta\,\dot{\phi}^2),$
$-mg\sin\theta = m(s\ddot{\theta} - s\sin\theta\cos\theta\,\dot{\phi}^2),$
$0 = \frac{1}{s\sin\theta}\frac{d}{dt}(ms^2\sin^2\theta\,\dot{\phi}),$
where $s$ is the length of the pendulum and $T$ is the tension in the rope. Since there are basically two independent coordinates, $\theta$ and $\phi$, show that the equations of motion can be written as
$\ddot{\theta} - \dot{\phi}^2\sin\theta\cos\theta + \frac{g}{s}\sin\theta = 0$ and $ms^2\sin^2\theta\,\dot{\phi} = l,$
where $l$ is a constant. Show that the constant $l$ is actually the $x_3$ component of the angular momentum:
$l_3 = m(x_1\dot{x}_2 - \dot{x}_1x_2).$
17. When we introduced the natural boundary conditions we used Equation (15.22):
where for stationary paths we set
and
Explain.
18. Write the components of the generalized momentum for the following problems: (i) plane pendulum, (ii) spherical pendulum, (iii) motion of earth around the sun and discuss whether they are conserved or not.
CHAPTER 16
PROBABILITY THEORY AND DISTRIBUTIONS
Probability theory is the science of random events. It has long been known that there are definite regularities among large numbers of random events. In ordinary scientific parlance, certain initial conditions, which can be rather complicated, lead to certain events. For example, if we know the initial position and velocity of a planet, we can be certain of its position and velocity at a later time. In fact, one of the early successes of Newton's theory was its prediction of the solar and lunar eclipses for decades and centuries ahead of time. An event that is definitely going to happen when certain conditions are met is called certain. If there is no set of conditions that could make an event happen, then that event is called impossible. If under certain conditions an event may or may not happen, we call it random. From here, it is clear that the certainty, impossibility, and randomness of an event depend on the set of existing conditions. Randomness could result from a number of reasons. Some of these are the presence of large numbers of interacting parts, insufficient knowledge about the initial conditions, properties of the system, and also the environment.
Probability is also a word commonly used in everyday language. Using the available information, we often base our decisions on how probable or improbable we think certain chains of events are going to unravel. The severity of the consequences of our decisions can vary greatly. For example, deciding not to take an umbrella, thinking that it will probably not rain, may just result in getting wet. However, in other situations the consequences could be much more dire. They could even jeopardize many lives including ours, corporate welfare, and in some cases even our national security. Probability not only depends on the information available but also has strong correlations with the intended purpose and expectations of the observer. In this regard, many different approaches to probability have been developed, which can be grouped under three categories: (1) Definitions of mathematical probability based on the knowledge and the priorities of the observer. (2) Definitions that reduce to a reproducible concept of probability for the same initial conditions for all observers. This is also called the classical definition of probability. The most primitive concept of this definition is equal probability for each possible event or outcome. (3) Statistical definitions, which follow from the frequency of the occurrence of an event among a large number of trials with the same initial conditions. We discuss (1) in Chapter 17 within the context of information theory. In this chapter, we concentrate on (2) and (3), which are related.
16.1
INTRODUCTION TO PROBABILITY THEORY
The origin of probability theory can be traced as far back as the communications between Pascal (1623-1662) and Fermat (1601-1665). An extensive discussion of the history of probability can be found in Todhunter. Modern probability theory was essentially founded by Kolmogorov (1903-1987).
16.1.1
Fundamental Concepts
We define a sample space, $S$, as the set of all possible outcomes, $A, B, \ldots$, of an experiment:
$S = \{A, B, \ldots\}.$
(16.1)
If we roll a die, the sample space is
$S = \{1, 2, 3, 4, 5, 6\},$
(16.2)
where each term corresponds to the number of dots showing on each side. We obviously exclude the events where the die breaks or stands on edge. These events could be rendered practically impossible by adjusting the conditions. For a single toss of a coin the sample space is composed of two elements:
$S = \{\text{head}, \text{tail}\}.$
(16.3)
An event, $E$, is defined as a set of points chosen from the sample space. For example,
$E = \{1, 3, 5\}$
corresponds to the case where the die comes up with an odd number, that is, either a 1, a 3, or a 5. An event may be a single element, such as
$E = \{4\},$
(16.4)
where the die comes up 4. Events that are single elements are called simple or elementary events. Events that are not single elements are called compound events.
16.1.2
Basic Axioms of Probability
We say that $S$ is a probability space if for every event $E$ in $S$ we can find a number $P(E)$ with
(1) $P(E) \ge 0$,
(2) $P(S) = 1$,
(3) If $E_1$ and $E_2$ are two mutually exclusive events in $S$, that is, their intersection is a null set, $E_1 \cap E_2 = \emptyset$, then the probability of their union is the sum of their probabilities:
$P(E_1 \cup E_2) = P(E_1) + P(E_2).$
(16.5)
Any function $P(E)$ satisfying the properties (1)-(3) defined on the events of $S$ is called the probability of $E$. These axioms are due to Kolmogorov. They are sufficient for sample spaces with a finite number of elements. For sample spaces with an infinite number of events, they have to be modified.
16.1.3
Basic Theorems of Probability
Based on these axioms, we now list the basic theorems of probability theory. Proofs can be found in Harris.
Theorem 16.1. If $E_1$ is a subset of $E_2$, that is, $E_1 \subset E_2$, then
$P(E_1) \le P(E_2).$
Theorem 16.2. For any event $E$ in $S$,
Theorem 16.3. The probability of the complementary set, $E^c$, of $E$ is
$P(E^c) = 1 - P(E),$
where $E^c + E = S$, also shown as $E^c \cup E = S$.
Theorem 16.4. The probability of no event happening is zero, that is,
$P(\emptyset) = 0,$
where $\emptyset$ denotes the null set.
Theorem 16.5. For any two events $E_1$ and $E_2$ in $S$, we can write
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2).$
Theorem 16.6. If $E_1, E_2, \ldots, E_m$ are mutually exclusive events with $E_i \cap E_j = \emptyset$, $1 \le i, j \le m$ and $i \ne j$, then
$P(E_1 \cup E_2 \cup \cdots \cup E_m) = \sum_{i=1}^{m}P(E_i).$
(16.6)
So far we said nothing about how to find $P(E)$ explicitly. When $S$ is finite with elementary events $E_1, E_2, \ldots, E_m$, any choice of positive numbers $P_1, P_2, \ldots, P_m$ satisfying
$\sum_{i=1}^{m}P_i = 1$
(16.7)
will satisfy the three axioms. In some situations, symmetry of the problem allows us to assign equal probability to each elementary event. For example, when a die is manufactured properly, there is no reason to favor one face over the other. Hence, we assign equal probability to each face as
$P_1 = P_2 = \cdots = P_6 = \frac{1}{6}.$
(16.8)
To clarify some of these points, let us choose the following events for a die rolling experiment:
$E_1 = \{1, 3, 5\}, \quad E_2 = \{1, 5, 6\}, \quad \text{and} \quad E_3 = \{3\},$
(16.9)
which give
$P(E_1) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2},$
$P(E_2) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2},$
$P(E_3) = \frac{1}{6}.$
(16.10)
Note that in agreement with Theorem 16.1, since $E_3 \subset E_1$, we have
$P(E_3) \le P(E_1).$
(16.11)
We can also use
$E_1^c = \{2, 4, 6\}, \quad E_3^c = \{1, 2, 4, 5, 6\}, \quad E_1 \cup E_2 = \{1, 3, 5, 6\},$
(16.12)
and
$E_1 \cap E_2 = \{1, 5\}$
(16.13)
to write
$P(E_1^c) = 1 - P(E_1) = 1 - \frac{1}{2} = \frac{1}{2},$
(16.14)
$P(E_3^c) = 1 - P(E_3) = 1 - \frac{1}{6} = \frac{5}{6},$
(16.15)
which are in agreement with Theorem 16.3. Also in conformity with Theorem 16.5, we have
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
(16.16)
$= \frac{1}{2} + \frac{1}{2} - \frac{2}{6} = \frac{2}{3}.$
(16.17)
For any finite sample space with $N$ elements, for an event
$E = E_1 \cup E_2 \cup E_3 \cup \cdots \cup E_m, \quad m \le N,$
(16.19)
where $E_1, E_2, \ldots, E_m$ are elementary events with $E_i \cap E_j = \emptyset$, $1 \le i, j \le m$, $i \ne j$, we can write
$P(E) = \sum_{i=1}^{m}P(E_i).$
(16.20)
If the problem is symmetric so that we can assign equal probability, $1/N$, to each elementary event, then the probability of $E$ becomes
$P(E) = \frac{m}{N},$
(16.21)
where $m$ is the number of elements of $S$ in $E$. In such cases, finding $P(E)$ reduces to simply counting the number of elements of $S$ in $E$. We come back to these points after we introduce permutations and combinations.
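For a symmetric problem the counting rule $P(E) = m/N$ translates directly into set operations. The short Python sketch below is an illustration only, with a helper name `prob` of our own choosing. It uses the die events $E_1 = \{1, 3, 5\}$, $E_2 = \{1, 5, 6\}$, and $E_3 = \{3\}$ to check the complement rule (Theorem 16.3) and the union rule (Theorem 16.5) with exact fractions.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of a fair die

def prob(event):
    """Classical probability m/N: count the elements of S that lie in the event."""
    return Fraction(len(event & S), len(S))

E1 = frozenset({1, 3, 5})
E2 = frozenset({1, 5, 6})
E3 = frozenset({3})

print(prob(E1), prob(E2), prob(E3))
print(prob(S - E3) == 1 - prob(E3))                          # complement rule
print(prob(E1 | E2) == prob(E1) + prob(E2) - prob(E1 & E2))  # union rule
```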
16.1.4
Statistical Definition of Probability
In the previous section using the symmetries of the system, we have assigned, a priori, equal probability to each elementary event. In the case of a “perfect” die, for each face this gives a probability of 1/6 and for a coin toss it assigns equal probability of 1/2 to the two possibilities: heads or tails. An experimental justification of these probabilities can be obtained by repeating die rolls or coin tosses sufficiently many times and by recording the frequency of occurrence of each elementary event. The catch here is, How identical can we make the conditions and how many times is sufficient? Obviously, in each roll or toss, the conditions are slightly different. The twist of our wrist, the positioning of our fingers, etc., all contribute to a different velocity, position and orientation of the die or the coin at the instant it leaves our hand. We also have to consider variation of the conditions where the die or the coin lands. However, unless we intentionally control our wrist movements and the initial positioning of the die or the coin to change the odds, for a large number of rolls or tosses we expect these variations, which are not necessarily small, to be random and to cancel each other. Hence, we can define the statistical probability of an event E as the frequency of occurrences:
$P(E) = \lim_{n\to\infty}\frac{\text{number of occurrences of } E}{n}.$
(16.22)
Obviously, for this definition to work, the limit must exist. It turns out that for situations where it is possible to define a classical probability, the frequency fluctuates about the probability of the event and the magnitude of the fluctuations dies out as the number of tries increases. There are plenty of data to verify this fact. In the case of a coin toss experiment the probability for heads or tails quickly converges to 1/2. In the case of a loaded die, frequencies may yield the probabilities
$P(1) = \frac{1}{2}, \quad P(2) = \frac{1}{8}, \quad P(3) = \frac{1}{4}, \quad P(4) = \frac{1}{16}, \quad P(5) = \frac{1}{32}, \quad P(6) = \frac{1}{32},$
(16.23)
which also satisfy the axioms of probability. In fact, one of the ways to find out that a die is loaded or manufactured improperly is to determine its probability distribution. When it comes to scientific and technological applications, the classical definition of probability usually runs into serious difficulties. First of all, it is generally difficult to isolate the equiprobable elements of the sample space. In some cases the sample space could be infinite with an infinite number of possible outcomes, or the possible outcomes could be distributed continuously, thus making them difficult, if not impossible, to enumerate. In order to circumvent some of these difficulties, the statistical probability concept comes in very handy.
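The statistical definition can be illustrated with a simulated experiment. In the Python sketch below, the function name, the seed, and the numbers of trials are arbitrary choices for illustration; the loaded-die weights are those of Equation (16.23) as we read them, scaled to integers. The relative frequency of an outcome is computed for increasing numbers of trials.

```python
import random

def relative_frequency(n, outcomes, weights, target, seed=0):
    """Fraction of n simulated trials in which `target` occurs."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    draws = rng.choices(outcomes, weights=weights, k=n)
    return draws.count(target) / n

coin = ["head", "tail"]
die = [1, 2, 3, 4, 5, 6]
loaded = [16, 4, 8, 2, 1, 1]  # proportional to 1/2, 1/8, 1/4, 1/16, 1/32, 1/32

for n in (100, 10_000, 200_000):
    print(n,
          relative_frequency(n, coin, [1, 1], "head"),
          relative_frequency(n, die, loaded, 6))
```

As $n$ grows, the first frequency should settle near 1/2 and the second near 1/32, with fluctuations that shrink roughly like $1/\sqrt{n}$.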
16.1.5
Conditional Probability and Multiplication Theorem
Let us now consider an experiment where two dice are rolled. The possibilities are given as
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
The first number in parentheses gives the outcome of the first die, and the second number is the outcome of the second die. If we are interested in the outcomes, $\{A\}$, where the sum is 6, obviously there are 5 desired results out of a total of 36 possibilities. Thus the probability is
$P(A) = \frac{5}{36}.$
(16.24)
We now look for the probability that the sum 6 comes (event $A$), if it is known that the sum is an even number (event $B$). In this case the sample space contains only 18 elements. Since we look for the event $A$ after the event $B$ has been realized, the probability is given as
$P(A/B) = \frac{5}{18}.$
(16.25)
$P(A/B)$, which is also shown as $P(A \mid B)$, is called the conditional probability of $A$. It is the probability of $A$ occurring after $B$ has occurred. Let us now generalize this to a case where $\{C_1, C_2, \ldots, C_n\}$ is the set of uniquely possible, mutually exclusive, and equiprobable events. Among this set let
$m \le n$ denote the number of events acceptable by $A$,
$k \le n$ denote the number of events acceptable by $B$,
$r$ denote the number of events acceptable by both $A$ and $B$.
We show the events acceptable by both $A$ and $B$ as $AB$ or $A \cap B$. Obviously, $r \le k$ and $r \le m$. This means that the probability of $A$ happening after $B$ has happened is
$P(A/B) = \frac{r}{k} = \frac{r/n}{k/n} = \frac{P(AB)}{P(B)}.$
(16.26)
Similarly,
$P(B/A) = \frac{r}{m} = \frac{P(AB)}{P(A)}.$
(16.27)
Note that if $B$ is an impossible event, that is, $P(B) = 0$, then Equation (16.26) becomes meaningless. Similarly, Equation (16.27) is meaningless when $P(A) = 0$. Equations (16.26) and (16.27), which are equivalent, represent the multiplication theorem, and we write them as
P ( A B ) = P(A)P(B/A) = P(B)P(A/B).
(16.28)
For independent events, that is, when the occurrence of $A$ (or $B$) is independent of $B$ (or $A$) occurring, the multiplication theorem takes on a simple form:
P ( A B )= P ( A ) P ( B ) .
(16.29)
For example, in an experiment we first roll a die, $A$, and then toss a coin, $B$. Clearly, the two events are independent. The probability of getting the number 5 in the die roll and a head in the coin toss is
$P(AB) = \frac{1}{6}\cdot\frac{1}{2} = \frac{1}{12}.$
(16.30)
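The two-dice discussion can be reproduced by direct enumeration. The Python sketch below is illustrative, and the helper `prob` and the event names are our own. It lists the 36 equiprobable outcomes, restricts the sample space to the conditioning event, and checks the multiplication theorem $P(AB) = P(B)P(A/B)$ with exact fractions.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the 36 ordered pairs

def prob(event, given=None):
    """P(event), or P(event/given) when a conditioning event is supplied."""
    space = [o for o in outcomes if given is None or given(o)]
    hits = [o for o in space if event(o)]
    return Fraction(len(hits), len(space))

A = lambda o: sum(o) == 6      # the sum is 6
B = lambda o: sum(o) % 2 == 0  # the sum is even

p_A = prob(A)
p_A_given_B = prob(A, given=B)
p_AB = prob(lambda o: A(o) and B(o))

print(p_A, p_A_given_B)
print(p_AB == prob(B) * p_A_given_B)  # multiplication theorem
```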
16.1.6
Bayes' Theorem
Let us now consider a tetrahedron with its faces colored as follows: the first face is red, $A$; the second face is green, $B$; the third face is blue, $C$; and finally the fourth face is in all three colors, $ABC$. In a roll of the tetrahedron the color red has the probability
$P(A) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.$
(16.31)
This follows from the fact that the color red shows in 2 of the 4 faces. Similarly, we can write the following probabilities:
$P(B) = P(C) = \frac{1}{2},$
$P(A/B) = P(B/C) = \frac{1}{2},$
$P(C/A) = P(B/A) = \frac{1}{2},$
$P(C/B) = P(A/C) = \frac{1}{2}.$
(16.32)
This means that the events $A$, $B$, $C$ are pairwise independent. However, if it is known that $B$ and $C$ have occurred, then we can be certain that $A$ has also occurred, that is,
$P(A/BC) = 1,$
(16.33)
which means that events $A$, $B$, $C$ are collectively dependent. Let us now consider an event $B$ that can occur together with one and only one of the $n$ mutually exclusive events:
$\{A_1, A_2, \ldots, A_n\}.$
(16.34)
Since $BA_i$ and $BA_j$ with $i \ne j$ are also mutually exclusive events, we can use the addition theorem of probabilities to write
$P(B) = \sum_{i=1}^{n}P(BA_i).$
(16.35)
Using the multiplication theorem [Eq. (16.28)], this becomes
$P(B) = \sum_{i=1}^{n}P(A_i)P(B/A_i),$
(16.36)
which is also called the total probability. We now derive an important formula called Bayes' formula. It is required to find the probability of event $A_i$ provided that event $B$ has already occurred. Using the multiplication theorem [Eq. (16.28)], we can write
$P(A_iB) = P(B)P(A_i/B) = P(A_i)P(B/A_i),$
(16.37)
which gives
$P(A_i/B) = \frac{P(A_i)P(B/A_i)}{P(B)}.$
(16.38)
Using the formula of total probability [Eq. (16.36)], this gives Bayes' formula:
$P(A_i/B) = \frac{P(A_i)P(B/A_i)}{\sum_{j=1}^{n}P(A_j)P(B/A_j)},$
(16.39)
which gives the probability of event $A_i$ provided that $B$ has occurred first. Bayes' formula has interesting applications in decision theory and is also used extensively for data analysis in physical sciences (Bather; Sivia and Skilling).
Example 16.1. Colored balls in six bags: Three bags have composition $A_1$ with 2 white and 1 black ball each, one bag has composition $A_2$ with 9 black balls, and the remaining 2 bags have composition $A_3$ with 3 white balls and 1 black ball each. We select a bag randomly and draw one ball from it. What is the probability that this ball is white? Call this event $B$. Since the ball could come from any one of the six bags with compositions $A_1$, $A_2$, and $A_3$, we can write
$B = A_1B + A_2B + A_3B.$
(16.40)
Using the formula of total probability, we write
$P(B) = P(A_1)P(B/A_1) + P(A_2)P(B/A_2) + P(A_3)P(B/A_3),$
(16.41)
where
$P(A_1) = \frac{3}{6}, \quad P(A_2) = \frac{1}{6}, \quad P(A_3) = \frac{2}{6},$
(16.42)
$P(B/A_1) = \frac{2}{3}, \quad P(B/A_2) = 0, \quad P(B/A_3) = \frac{3}{4},$
(16.43)
to obtain
$P(B) = \frac{3}{6}\cdot\frac{2}{3} + \frac{1}{6}\cdot 0 + \frac{2}{6}\cdot\frac{3}{4} = \frac{7}{12}.$
(16.44)
Example 16.2. Given six identical bags: The bags have the following contents:
3 bags with contents $A_1$ composed of 2 white and 3 black balls each,
2 bags with contents $A_2$ composed of 1 white and 4 black balls each,
1 bag with contents $A_3$ composed of 4 white balls and 1 black ball.
We pick a ball from a randomly selected bag, which turns out to be white. Call this event $B$. We now want to find, after the ball is picked, the probability that the ball was taken from the bag of the third composition. We have the following probabilities:
$P(A_1) = \frac{3}{6}, \quad P(A_2) = \frac{2}{6}, \quad P(A_3) = \frac{1}{6},$
(16.45)
$P(B/A_1) = \frac{2}{5}, \quad P(B/A_2) = \frac{1}{5}, \quad P(B/A_3) = \frac{4}{5}.$
(16.46)
Using Bayes' formula [Eq. (16.39)], we obtain
$P(A_3/B) = \frac{\frac{1}{6}\cdot\frac{4}{5}}{\frac{3}{6}\cdot\frac{2}{5} + \frac{2}{6}\cdot\frac{1}{5} + \frac{1}{6}\cdot\frac{4}{5}} = \frac{1}{3}.$
(16.47)
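Both bag examples reduce to the same two formulas, so they can be checked with a pair of small helpers. The Python sketch below is a minimal illustration; the function names `total_probability` and `bayes` are our own. The priors and conditional probabilities are the bag fractions and white-ball fractions stated in Examples 16.1 and 16.2.

```python
from fractions import Fraction as F

def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(A_i) P(B/A_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, i):
    """P(A_i/B) = P(A_i) P(B/A_i) / P(B)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)

# Example 16.1: probability of drawing a white ball.
p_white = total_probability([F(3, 6), F(1, 6), F(2, 6)],
                            [F(2, 3), F(0), F(3, 4)])

# Example 16.2: probability that a white ball came from the third composition.
posterior = bayes([F(3, 6), F(2, 6), F(1, 6)],
                  [F(2, 5), F(1, 5), F(4, 5)], 2)

print(p_white, posterior)
```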
Figure 16.1 Buffon's needle problem.
16.1.7
Geometric Probability and Buffon's Needle Problem
We have mentioned that the classical definition of probability is insufficient when we have infinite sample spaces. It also fails when the possible outcomes of an experiment are distributed continuously. Consider the following general problem: On a plane we have a region $R$ and in it another region $r$. We want to define the probability of a thrown point landing in the region $r$. Another way to pose this problem is: What is the probability of a point, the coordinates of which are picked randomly, falling into the region $r$? Guided by our intuition, we can define this probability as
$P = \frac{\text{area of } r}{\text{area of } R},$
(16.48)
which satisfies the three basic axioms. We can generalize this formula as
$P = \frac{\text{measure of } r}{\text{measure of } R},$
(16.49)
where the measure stands for length, area, volume, etc.
Example 16.3. Buffon's needle problem: We partition a plane by two parallel lines separated by a distance of $2a$. A needle of length $2l$ is thrown randomly onto this plane ($l < a$). We want to find the probability of the needle intersecting one of the lines. We denote the distance from the center of the needle to the closest line by $x$ and the angle of the needle by $\theta$ (Fig. 16.1). The configuration of the needle is completely specified by $x$ and $\theta$. For the needle to cross one of the lines, it is necessary and sufficient that
$x \le l\sin\theta$
(16.50)
be satisfied. Now the probability is the ratio of the area of the region under the curve $x = l\sin\theta$ to the area of the rectangle, $a\pi$, in Figure 16.2:
$P = \frac{\int_0^{\pi}l\sin\theta\,d\theta}{a\pi} = \frac{2l}{\pi a}.$
(16.51)
Figure 16.2 Area in the Buffon's needle problem.
Historically, Buffon's needle problem was the starting point in solving certain problems in the theory of gunfire with varying shell sizes. It has also been used for purposes of estimating the approximate value of $\pi$. For more on the geometric definition of probability and its limitations, we refer to Gnedenko.
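The geometric probability of Buffon's needle problem lends itself to a Monte Carlo check. In the Python sketch below, the function name, the seed, and the sample size are arbitrary illustrative choices. The distance $x$ is drawn uniformly from $[0, a]$ and the angle $\theta$ uniformly from $[0, \pi]$; a crossing is counted whenever $x \le l\sin\theta$, and the resulting frequency is compared with $2l/(\pi a)$.

```python
import math
import random

def buffon_estimate(l, a, n, seed=0):
    """Fraction of n random needle configurations with x <= l*sin(theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0.0, a)            # center-to-nearest-line distance
        theta = rng.uniform(0.0, math.pi)  # needle angle
        if x <= l * math.sin(theta):
            hits += 1
    return hits / n

l, a = 1.0, 2.0  # half-length of the needle and half-separation of the lines
exact = 2 * l / (math.pi * a)
estimate = buffon_estimate(l, a, 200_000)

print(exact, estimate)
print(2 * l / (a * estimate))  # corresponding rough estimate of pi
```

The statistical error of the estimate decreases only like $1/\sqrt{n}$, which is why this is a slow way of approximating $\pi$.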
16.2
PERMUTATIONS AND COMBINATIONS
We mentioned that in symmetric situations, where the sample space is finite and each event is equally probable, assigning probabilities reduces to a simple counting process. To introduce the basic ideas, we use a bag containing a number of balls numbered as $1, 2, \ldots, N$. As we shall see, the bag and the balls could actually stand for many things in scientific and technological applications.
16.2.1
The Case of Distinguishable Balls with Replacement
We draw a ball from the bag, record the number, and then throw the ball back into the bag. Repeating this process $k$ times, we form a $k$-tuple of numbers, $(x_1, x_2, \ldots, x_i, \ldots, x_k)$, where $x_i$ denotes the number of the $i$th draw. Let $S$ be the totality of such $k$-tuples. In the first draw, we could get any one of the $N$ balls; hence there are $N$ possible and equiprobable outcomes. In the second draw, since the ball is thrown back into the bag or replaced with an identical ball, we again have $N$ possible outcomes. All together, this gives $N^2$ possible outcomes for the first two draws. For $k$ draws, naturally the sample space contains
$N^k$
(16.52)
$k$-tuples as possible outcomes.
16.2.2
The Case of Distinguishable Balls Without Replacement
We now repeat the same process but this time do not replace the balls. In the first draw, there are $N$ independent possibilities. For the second draw, there are $N - 1$ balls left in the bag, hence only $N - 1$ possible outcomes. For the $r$th draw, $r \le k$, there will be $N - r + 1$ balls left, thus giving only $N - r + 1$ possibilities. For $k$ draws, the sample space will contain $N^{(k)}$ elements:
$N^{(k)} = N(N - 1)(N - 2)\cdots(N - k + 1),$
(16.53)
which can also be written as
$N^{(k)} = \frac{N(N - 1)\cdots(N - k + 1)\,[(N - k)(N - k - 1)\cdots 2\cdot 1]}{[(N - k)(N - k - 1)\cdots 2\cdot 1]} = \frac{N!}{(N - k)!}.$
(16.54)
A permutation is a selection of objects with a definite order. $N$ objects distributed into $k$ numbered spaces have $N^{(k)}$ distinct possibilities, which is also written as ${}_NP_k$. When $k = N$, we have $N!$ possibilities, thus we can write
$N^{(N)} = \frac{N!}{0!} = N!.$
(16.55)
This is also taken as the definition of $0!$ as
$0! = 1.$
(16.56)
Example 16.4. A coin is tossed 6 times: If we identify heads with 1 and tails with 2, this is identical to the bag problem with replacement, where $N = 2$ and $k = 6$. Thus there are
$2^6 = 64$
possibilities for the 6-tuple numbers. The probability of any one of them coming, say $E = (1, 2, 1, 1, 1, 2)$, is
$P(E) = \frac{1}{64}.$
(16.57)
Example 16.5. A die is rolled five times: This is identical to the bag problem with replacement, where $N = 6$ and $k = 5$. We now have
$6^5 = 7776$
(16.58)
possible outcomes.
Example 16.6. Five cards selected from a deck of 52 playing cards: If we assume that the order in which the cards are selected matters, then this is equivalent to the bag problem without replacement. Now, $N = 52$ and $k = 5$, which gives
$52^{(5)} = \frac{52!}{(52 - 5)!} = 311{,}875{,}200$
(16.59)
possible outcomes.
Example 16.7. Friends to visit: Let us say that we arrived at our home town and have 5 friends to visit. There are $5! = 120$ different orders in which we can do this. If we have time for only three visits, then there are $5^{(3)} = 60$ different ways.
Example 16.8. Number of different numbers: How many different numbers can we make from the digits 1, 2, 3, 4? If we use two digits and if repeats are permitted, there are $4^2 = 16$ possibilities. If we do not allow repeats, then there are only $4^{(2)} = 12$ possibilities.
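The counts in these examples can be reproduced with the Python standard library. The sketch below is illustrative only: the helper `falling` is our own name for $N^{(k)} = N(N - 1)\cdots(N - k + 1)$, and `math.perm` (available in Python 3.8 and later) is used as an independent check of the same quantity.

```python
import math

def falling(N, k):
    """N^(k) = N (N-1) ... (N-k+1): ordered draws without replacement."""
    result = 1
    for j in range(k):
        result *= N - j
    return result

print(2**6)             # coin tossed 6 times (with replacement)
print(6**5)             # die rolled 5 times (with replacement)
print(falling(52, 5))   # 5 cards in a definite order
print(math.factorial(5), falling(5, 3))  # visiting 5 friends, or only 3
print(4**2, falling(4, 2))               # two-digit numbers from 1, 2, 3, 4

# Cross-check against the library routine for permutations.
print(all(falling(N, k) == math.perm(N, k)
          for N in range(8) for k in range(N + 1)))
```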
16.2.3 The Case of Indistinguishable Balls Lct us iiow consider N balls, where not all of them are different. Let there be 7 7 1 balls of one kind, 122 balls of the second kind,. . . , n k balls of the kth kind SUCll that n1
+ n2 + . . . + nk = N .
(16.60)
We also assume that balls of the same kind are indistinguishable. A natural question to ask is, In how many distinct ways, N(”ln2. n k ) (also written as N P , , ~ ,r l,A~) , can we arrange these balls? When all the balls are distinct, wc have N ! possibilities but n1 balls of the first kind are indistinguishable. Thus. n l ! of these possibilities, that is, the permutations of the n1 balls among themselves, lead to identical configurations. Hence, for distinct arrangements
PERMUTATIONS AND COMBINATIONS
681
we have t o divide N ! by nl!. Arguing the same way for the other kinds, we obtain
A permutation is an outcome with a particular ordering. For example, 1234 is a different permutation from 4231. In many situations we are interested in the selection of objects with no regard to their order. We call such arrangements combinations and show them as
$${}_nC_r, \tag{16.62}$$
which means the number of ways $r$ objects can be selected out of $n$ objects with no attention paid to their order. Since the order of the remaining $n - r$ objects is also irrelevant, among the ${}_nP_r = n!/r!$ permutations (the $n!$ orderings with the order of the $r$ selected objects disregarded) there are $(n - r)!$ that give the same combination, thus
$${}_nC_r = \frac{{}_nP_r}{(n-r)!} \tag{16.63}$$
$$= \frac{n!}{(n-r)!\,r!}. \tag{16.64}$$
Combinations are often shown as
$${}_nC_r = \binom{n}{r}. \tag{16.65}$$
It is easy to show that
$$\binom{n}{r} = \binom{n}{n-r}. \tag{16.66}$$
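Equations (16.64) and (16.66) are easy to check numerically. The sketch below writes the combination count out with factorials and compares it with Python's built-in `math.comb`; the function name `combinations_count` and the values $n = 10$, $r = 3$ are our own choices for illustration.

```python
import math

def combinations_count(n: int, r: int) -> int:
    """n! / ((n - r)! r!), written out with factorials as in Eq. (16.64)."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

n, r = 10, 3
print(combinations_count(n, r))
# Agreement with the library routine:
print(combinations_count(n, r) == math.comb(n, r))
# Symmetry of Eq. (16.66):
print(combinations_count(n, r) == combinations_count(n, n - r))
```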
Example 16.9. Number of poker hands: In a poker hand there are 5 cards from an ordinary deck of 52 cards. These 52 cards can be arranged in $52!$ different ways. Since the order of the 5 cards in a player's hand and the order of the remaining 47 cards do not matter, the number of possible poker hands is
$$\binom{52}{5} = \frac{52!}{(5!)(47!)} = 2{,}598{,}960. \tag{16.67}$$
16.2.4 Binomial and Multinomial Coefficients
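Since the ${}_nC_r$ also appear in the binomial expansion
$$(x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}, \tag{16.68}$$
they are also called the binomial coefficients. Similarly, the multinomial expansion is given as
$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{k_1, k_2, \ldots, k_r} \frac{n!}{k_1!\,k_2! \cdots k_r!}\, x_1^{k_1} x_2^{k_2} \cdots x_r^{k_r}, \tag{16.69}$$
where the sum is over all nonnegative integer $r$-tuples $(k_1, k_2, \ldots, k_r)$ with their sum $k_1 + k_2 + \cdots + k_r = n$. The coefficients defined as
$$\binom{n}{k_1, k_2, \ldots, k_r} = \frac{n!}{k_1!\,k_2! \cdots k_r!} \tag{16.70}$$
are called the multinomial coefficients. Some useful properties of the binomial coefficients are: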
(16.71) (16.72) (16.73)
( 16.74) (16.75)
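A short numerical illustration of the binomial and multinomial coefficients follows. It is a sketch under our own naming (`multinomial`, `multinomial_sum`) and only checks two standard consequences of the expansions (16.68) and (16.69), obtained by setting all variables to 1; it is not meant to reproduce the list of properties above.

```python
import math
from itertools import product

def multinomial(*ks: int) -> int:
    """n! / (k1! k2! ... kr!) with n = k1 + k2 + ... + kr."""
    result = math.factorial(sum(ks))
    for k in ks:
        result //= math.factorial(k)
    return result

def multinomial_sum(n: int, r: int) -> int:
    """Sum of the multinomial coefficients over all r-tuples adding up to n."""
    return sum(multinomial(*ks)
               for ks in product(range(n + 1), repeat=r) if sum(ks) == n)

print(multinomial(2, 1, 1))                     # 4! / (2! 1! 1!)
print(multinomial_sum(4, 3), 3 ** 4)            # x1 = x2 = x3 = 1 in Eq. (16.69)
print(sum(math.comb(5, j) for j in range(6)), 2 ** 5)  # x = y = 1 in Eq. (16.68)
```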
16.3 APPLICATIONS TO STATISTICAL MECHANICS
An important application of the probability concepts discussed so far comes from statistical mechanics. For most practical applications, it is sufficient to consider gases or solids as collections of independent particles, which move freely except for the brief moments during collisions. In a solid, we can consider the atoms as vibrating freely, essentially independent of each other. According to quantum mechanics, such quasi-independent particles can only have certain discrete energies given by the energy eigenvalues $\varepsilon_1, \varepsilon_2, \ldots$. Specific values of these energies depend on the details of the system. At a given moment and at a certain temperature, the state of a system can be described by giving the
$$\begin{aligned} &n_1 = \text{number of particles with energy } \varepsilon_1, \\ &n_2 = \text{number of particles with energy } \varepsilon_2, \\ &\ \ \vdots \end{aligned} \tag{16.76}$$
Our basic goal in statistical mechanics is to find how these particles are distributed among these energy levels subject to the conditions
$$\sum_i n_i = N, \tag{16.77}$$
$$\sum_i n_i \varepsilon_i = U, \tag{16.78}$$
where $N$ is the total number of particles and $U$ is the internal energy of the system. To clarify some of these points, consider a simple model with 3 atoms, $a$, $b$, and $c$ (Wilks). Let the available energies be $0, \varepsilon, 2\varepsilon$, and $3\varepsilon$. Let us also assume that the internal energy of the system is $3\varepsilon$. Among the three atoms, this energy could be distributed as
Complexion   1    2    3    4    5    6    7    8    9    10
a            ε    3ε   0    0    2ε   2ε   ε    ε    0    0
b            ε    0    3ε   0    ε    0    2ε   0    2ε   ε
c            ε    0    0    3ε   0    ε    0    2ε   ε    2ε
It is seen that altogether there are 10 possible configurations or complexions, also called the microstates, in which the $3\varepsilon$ amount of energy can be distributed among the three atoms. Since atoms interact, no matter how briefly, through collisions, the system fluctuates between these possible complexions. We now introduce the fundamental assumption of statistical mechanics by postulating, a priori, that all possible complexions are equally probable. If we look at the complexions a little more carefully, we see that they can be grouped into three states, $S_1$, $S_2$, $S_3$, with respect to their occupancy numbers, that is, the numbers of atoms $n_0, n_1, n_2, n_3$ found with the energies $0, \varepsilon, 2\varepsilon, 3\varepsilon$, as
State   n_0   n_1   n_2   n_3   Complexions
S_1     0     3     0     0     1
S_2     2     0     0     1     3
S_3     1     1     1     0     6
Note that only 1 complexion corresponds to state $S_1$, 3 complexions to $S_2$, and 6 complexions to state $S_3$. Since all the complexions are equiprobable, the probabilities of finding the system in the states $S_1$, $S_2$, and $S_3$ are $\frac{1}{10}$, $\frac{3}{10}$, and $\frac{6}{10}$, respectively. This means that if we make sufficiently many observations, 6 out of 10 times the system will be seen in state $S_3$, 3 out of 10 times it will be in state $S_2$, and only 1 out of 10 times it will be in state $S_1$. In terms of a time parameter, given sufficient time, the system can be seen in all three states. However, this simple 3-atom model will spend most of its time in state $S_3$, which can be considered as its equilibrium state.
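The counting in the 3-atom model can be verified by brute force. The following sketch enumerates every assignment of the energies $0, \varepsilon, 2\varepsilon, 3\varepsilon$ to the atoms $a$, $b$, $c$ with total energy $3\varepsilon$ and groups the complexions by their occupancy numbers. The variable and function names are ours, and energies are measured in units of $\varepsilon$.

```python
from collections import Counter
from itertools import product

LEVELS = (0, 1, 2, 3)   # allowed energies in units of epsilon
TOTAL = 3               # internal energy U = 3 epsilon

# A complexion assigns one energy to each of the atoms a, b, c.
complexions = [c for c in product(LEVELS, repeat=3) if sum(c) == TOTAL]

def occupancy(complexion):
    """Occupancy numbers (atoms per level) of one complexion."""
    return tuple(complexion.count(level) for level in LEVELS)

# Complexions sharing the same occupancy numbers belong to the same state.
states = Counter(occupancy(c) for c in complexions)

print(len(complexions))
for occ, count in sorted(states.items(), key=lambda kv: kv[1]):
    print(occ, count, count / len(complexions))
```

The state with the largest number of complexions is the one in which the three atoms have three different energies, in agreement with the discussion above.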
16.3.1 Boltzmann Distribution for Solids
We now extend this simple model to a solid with energy eigenvalues $\varepsilon_1, \varepsilon_2, \ldots$. A particular state can be specified by giving the occupancy numbers $n_1, n_2, \ldots$ of the energy levels. We now have a problem of $N$ distinguishable atoms distributed into boxes labeled $\varepsilon_1, \varepsilon_2, \ldots$, so that there are $n_1$ atoms in box 1, $n_2$ atoms in box 2, etc. Atoms are considered as distinguishable in the sense that we can identify them in terms of their locations in the lattice. Since how the atoms are arranged within each box, that is, within each energy level, is irrelevant, the number of complexions corresponding to a particular state is given as
$$W = \frac{N!}{n_1!\,n_2! \cdots}. \tag{16.79}$$
The most probable state is naturally the one with the maximum number of complexions subject to the two constraints
$$N = \sum_i n_i \tag{16.80}$$
and
$$U = \sum_i n_i \varepsilon_i. \tag{16.81}$$
Mathematically, this problem is solved by finding the occupancy numbers that make $W$ a maximum. For reasons to be clear shortly, we maximize $\ln W$ and write
$$\delta \ln W = \frac{\partial (\ln W)}{\partial n_1}\,\delta n_1 + \frac{\partial (\ln W)}{\partial n_2}\,\delta n_2 + \cdots \tag{16.82}$$
$$= \sum_i \frac{\partial (\ln W)}{\partial n_i}\,\delta n_i. \tag{16.83}$$
The maximum number of complexions satisfies the condition
$$\delta \ln W = 0, \tag{16.84}$$
subject to the constraints
$$\sum_i \delta n_i = 0, \tag{16.85}$$
$$\sum_i \varepsilon_i\,\delta n_i = 0. \tag{16.86}$$
We now introduce two Lagrange undetermined multipliers, $\alpha$ and $\beta$. Multiplying Equation (16.85) by $\alpha$ and Equation (16.86) by $\beta$ and then adding to Equation (16.83) gives
$$\sum_i \left[\frac{\partial (\ln W)}{\partial n_i} + \alpha + \beta\varepsilon_i\right]\delta n_i = 0. \tag{16.87}$$
With the introduction of the Lagrange undetermined multipliers, we can treat all $\delta n_i$ in Equation (16.87) as independent and set their coefficients to zero:
$$\frac{\partial (\ln W)}{\partial n_i} + \alpha + \beta\varepsilon_i = 0. \tag{16.88}$$
We now turn to Equation (16.79) and write $\ln W$ as
$$\ln W = \ln \frac{N!}{n_1!\,n_2! \cdots} = \ln N! - \sum_i \ln(n_i!). \tag{16.89}$$
Using the Stirling approximation for the factorial of a large number, namely
$$\ln n_i! \approx n_i \ln n_i - n_i, \tag{16.90}$$
this can also be written as
$$\ln W = \ln N! - \sum_i \left(n_i \ln n_i - n_i\right). \tag{16.91}$$
After differentiation, we obtain
$$\frac{\partial (\ln W)}{\partial n_i} = -\ln n_i. \tag{16.92}$$
Substituting this into Equation (16.88), we write
$$-\ln n_i + \alpha + \beta\varepsilon_i = 0 \tag{16.93}$$
to obtain
$$n_i = A e^{\beta\varepsilon_i}, \tag{16.94}$$
where we have called $A = e^{\alpha}$. This is the well-known Boltzmann formula. As we shall see shortly, $\beta$ is given as
$$\beta = -\frac{1}{kT}, \tag{16.95}$$
where $k$ is the Boltzmann constant and $T$ is the temperature, and $A$ is determined from the condition which gives the total number of particles as $N = \sum_i n_i$.
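To make the Boltzmann formula concrete, the sketch below evaluates $n_i = A e^{\beta\varepsilon_i}$ for a small set of levels, fixing $A$ from $N = \sum_i n_i$, and also looks at how accurate the Stirling approximation (16.90) is for a large argument. The four equally spaced levels, the value $N = 1000$, and the function names are hypothetical choices made only for this illustration; energies are measured in units of $kT$.

```python
import math

def boltzmann_occupations(energies, n_total, kT):
    """Occupation numbers n_i = A * exp(beta * e_i) with beta = -1/kT.

    A is fixed by the condition sum(n_i) = n_total.
    """
    beta = -1.0 / kT
    weights = [math.exp(beta * e) for e in energies]
    A = n_total / sum(weights)
    return [A * w for w in weights]

def stirling_ln_factorial(n):
    """Stirling approximation ln n! ~ n ln n - n, Eq. (16.90)."""
    return n * math.log(n) - n

# Hypothetical example: four equally spaced levels, energies in units of kT.
occ = boltzmann_occupations([0.0, 1.0, 2.0, 3.0], n_total=1000, kT=1.0)
print([round(n, 1) for n in occ])

# Relative error of the Stirling approximation for a large argument.
n = 10_000
exact = math.lgamma(n + 1)   # ln n!
print(abs(stirling_ln_factorial(n) - exact) / exact)
```

The occupation numbers decrease by a factor of $e$ from one level to the next, as expected when the level spacing equals $kT$.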
16.3.2 Boltzmann Distribution for Gases
Compared to solids, the case of gases has basically two differences. First of all, atoms are now free to move within the entire volume of the system. Hence, they are not localized. Second, the distribution of energy eigenvalues is practically continuous. There are many more energy levels than the number of atoms, hence the occupancy number of each level is usually either 0 or 1, mostly 0 and almost never greater than 1. For example, in 1 cc of helium gas at 1 atm and 290 K there are approximately $10^6$ times more levels than atoms. Since atoms cannot be localized, we treat them as indistinguishable particles. For the second difference, we group neighboring energy levels in bundles so that a complexion is now described by saying
$$\begin{aligned} &n_1 \text{ particles in the 1st bundle of } g_1 \text{ levels with energy } \varepsilon_1, \\ &n_2 \text{ particles in the 2nd bundle of } g_2 \text{ levels with energy } \varepsilon_2, \\ &\ \ \vdots \end{aligned}$$
The choice of $g_k$ is quite arbitrary, except that $n_k$ has to be large enough to warrant usage of the Stirling approximation of the factorial. Also, $g_k$ must be large but not too large, so that each bundle can be approximated by the average energy $\varepsilon_k$. As before, the most probable values of $n_k$ are the ones corresponding to the maximum number of complexions. However, $W$ is now more complicated. We first concentrate on the $k$th bundle, where there are $g_k$ levels available for $n_k$ particles. For the first particle, naturally all $g_k$ levels are available. For the second particle, there will be $(g_k - 1)$ levels left. If we keep going on like this, we find
$$g_k (g_k - 1)(g_k - 2) \cdots (g_k - n_k + 1) = \frac{g_k!}{(g_k - n_k)!}$$
different possibilities. Since $g_k \gg n_k$, we can write this as
$$g_k^{n_k}. \tag{16.97}$$
Within the $k$th bundle, it does not matter how we order the $n_k$ particles, thus we divide $g_k^{n_k}$ by $n_k!$. This gives the number of distinct complexions for the $k$th bundle as
$$\frac{g_k^{n_k}}{n_k!}. \tag{16.98}$$
Similar expressions for all the other bundles can be written. Hence the total number of complexions becomes
$$W = \prod_k \frac{g_k^{n_k}}{n_k!}. \tag{16.99}$$
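This has to be maximized subject to the conditions
$$N = \sum_k n_k, \tag{16.100}$$
$$U = \sum_k n_k \varepsilon_k. \tag{16.101}$$
Proceeding as for the solids and introducing two Lagrange undetermined multipliers, $\alpha$ and $\beta$, we write the variation of $\ln W$ as
$$\sum_k \left[\ln \frac{g_k}{n_k} + \alpha + \beta\varepsilon_k\right]\delta n_k = 0. \tag{16.102}$$
This gives the occupation numbers as
$$n_k = A g_k e^{\beta\varepsilon_k}, \tag{16.103}$$
where $A$ comes from $N = \sum_k n_k$, and $\beta$ is again equal to $-1/kT$.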
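The step from the exact product $g_k(g_k - 1)\cdots(g_k - n_k + 1)$ to $g_k^{n_k}$ relies on $g_k \gg n_k$. The sketch below shows how quickly the ratio of the two approaches 1 as the number of levels grows; the function names and the sample values of $g$ and $n$ are ours and purely illustrative.

```python
import math

def ordered_placements(g: int, n: int) -> int:
    """g (g - 1) ... (g - n + 1): ordered placements, one particle per level."""
    return math.perm(g, n)

def dilute_complexions(g: int, n: int) -> float:
    """Approximate count g**n / n! for one bundle, Eq. (16.98), valid for g >> n."""
    return g ** n / math.factorial(n)

n = 5
for g in (10, 100, 10_000, 1_000_000):
    exact = ordered_placements(g, n)
    ratio = exact / g ** n   # approaches 1 as g / n grows
    print(g, ratio)

# In the dilute limit, g**n / n! is close to the count with at most one particle per level.
print(dilute_complexions(10 ** 6, n) / math.comb(10 ** 6, n))
```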
16.3.3 Bose-Einstein Distribution for Perfect Gases
We now remove the restriction on $n_k$. For the Bose-Einstein distribution there is no restriction on the number of particles that one can put in each level. We first consider the number of different ways that we can distribute $n_k$ particles over the $g_k$ levels of the $k$th bundle. This is equivalent to finding the number of different ways that one can arrange $n_k$ indistinguishable particles in $g_k$ boxes. Let us consider a specific case with 2 balls ($n_k = 2$) and three boxes ($g_k = 3$). The 6 distinct possibilities, which can be described by the formula
$$6 = \frac{[2 + (3 - 1)]!}{2!\,(3 - 1)!}, \tag{16.104}$$
are shown below:
| ++ || -  || -  |
| +  || +  || -  |
| +  || -  || +  |
| -  || ++ || -  |
| -  || +  || +  |
| -  || -  || ++ |
This can be understood by the fact that for three boxes there are two partitions, shown by the double lines, which is one less than the number of boxes. The numerator in Equation (16.104), $[2 + (3 - 1)]!$, gives the number of permutations of the number of balls plus the number of partitions. However, the permutations of the balls, $2!$, and the permutations of the partitions, $(3 - 1)!$, among themselves do not lead to any new configurations, which explains the denominator. Since the number of partitions is always 1 less than the number of boxes, this formula can be generalized as
$$\frac{(n_k + g_k - 1)!}{n_k!\,(g_k - 1)!}. \tag{16.105}$$
For the whole system this gives $W$ as
$$W = \prod_k \frac{(n_k + g_k - 1)!}{n_k!\,(g_k - 1)!}. \tag{16.106}$$
Proceeding as in the previous cases, that is, by introducing two Lagrange undetermined multipliers, $\alpha$ and $\beta$, for the two constraints, $N = \sum_k n_k$ and $U = \sum_k n_k \varepsilon_k$, respectively, and then maximizing $\ln W$, we obtain the Bose-Einstein distribution as
$$n_k = \frac{g_k}{e^{-\alpha - \beta\varepsilon_k} - 1}. \tag{16.107}$$
One can again show that $\beta = -1/kT$. Notice that for high temperatures, where the $-1$ in the denominator is negligible, the Bose-Einstein distribution reduces to the Boltzmann distribution [Eq. (16.103)].
Bose-Einstein Condensation: Using the method of ensembles, one can show that the distribution is also written as
$$n_i = \frac{1}{e^{-\alpha + \varepsilon_i/kT} - 1}, \tag{16.108}$$
where $n_i$ is now the average number of particles in the $i$th level, not the group of levels, with the energy $\varepsilon_i$. For the lowest level, $i = 1$, this becomes
$$n_1 = \frac{1}{e^{-\alpha + \varepsilon_1/kT} - 1}, \tag{16.109}$$
which means that we can populate the lowest level by as many particles as we desire by making $\alpha$ very close to $\varepsilon_1/kT$. This phenomenon, with very interesting applications, is called the Bose-Einstein condensation.
16.3.4 Fermi-Dirac Distribution
In the case of the Fermi-Dirac distribution the derivation of $n_i$ proceeds exactly the same way as in the Bose-Einstein distribution, with the exception that, due to the Pauli exclusion principle, each level can be occupied by only one particle. For the first particle there are $g_k$ levels available, which leaves only $(g_k - 1)$ levels for the second particle and so on, thus giving the number of arrangements as
$$g_k (g_k - 1) \cdots (g_k - n_k + 1) = \frac{g_k!}{(g_k - n_k)!}. \tag{16.110}$$
Since the particles are indistinguishable, the $n_k!$ arrangements among themselves have no significance, which for the $k$th bundle gives the number of possible arrangements as
$$\frac{g_k!}{n_k!\,(g_k - n_k)!}. \tag{16.111}$$
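The counting formula (16.111) and its Bose-Einstein counterpart (16.105) can both be checked by listing the arrangements explicitly. In the sketch below, an arrangement of indistinguishable particles is represented by the collection of occupied level labels: a multiset when multiple occupancy is allowed, and a set when it is not. The function names are ours; the case of 2 particles in 3 levels is the one used in the Bose-Einstein example.

```python
import math
from itertools import combinations, combinations_with_replacement

def bose_einstein_count(n: int, g: int) -> int:
    """(n + g - 1)! / (n! (g - 1)!): any number of particles per level."""
    return math.comb(n + g - 1, n)

def fermi_dirac_count(n: int, g: int) -> int:
    """g! / (n! (g - n)!): at most one particle per level."""
    return math.comb(g, n)

n, g = 2, 3   # 2 balls in 3 boxes

# Multisets of level labels (bosons) and sets of level labels (fermions).
boson_states = list(combinations_with_replacement(range(g), n))
fermion_states = list(combinations(range(g), n))

print(len(boson_states), bose_einstein_count(n, g))
print(len(fermion_states), fermi_dirac_count(n, g))
```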
For the whole system this gives
$$W = \prod_k \frac{g_k!}{n_k!\,(g_k - n_k)!}. \tag{16.112}$$
Using the method of Lagrange undetermined multipliers and the constraints $N = \sum_k n_k$ and $U = \sum_k n_k \varepsilon_k$, one obtains the Fermi-Dirac distribution function as
$$n_k = \frac{g_k}{e^{-\alpha + \varepsilon_k/kT} + 1}. \tag{16.113}$$
We have again written $\beta = -1/kT$, and $\alpha$ is to be determined from the condition $N = \sum_k n_k$. With respect to the Bose-Einstein distribution [Eq. (16.107)], the change in sign in the denominator is crucial. It is the source of the enormous pressures that hold up white dwarfs and neutron stars.
16.4 STATISTICAL MECHANICS AND THERMODYNAMICS
All the distribution functions considered so far contained two arbitrary constants, $\alpha$ and $\beta$, which were introduced as Lagrange undetermined multipliers. In order to be able to determine the values of these constants, we have to make contact with thermodynamics. In other words, we have to establish the relation between the microscopic properties like the occupation numbers, energy levels, number of complexions, etc., and the macroscopic properties like the volume ($V$), density ($\rho$), pressure ($P$), and entropy ($S$).
16.4.1 Probability and Entropy
We know that in reaching equilibrium, isolated systems acquire their most probable state, that is, the state with the largest number of complexions. This is analogous to the second law of thermodynamics, which says that isolated systems seek their maximum entropy state. In this regard, it is natural to expect a connection between the number of complexions, $W$, and the thermodynamic entropy, $S$. To find this connection, let us bring two thermodynamic systems, $A$ and $B$, with their respective entropies, $S_A$ and $S_B$, in thermal contact with each other. The total entropy of the system is
$$S = S_A + S_B. \tag{16.114}$$
If $W_A$ and $W_B$ are their respective numbers of complexions, also called microstates, the total number of complexions is
$$W = W_A \cdot W_B. \tag{16.115}$$
If we call the desired relation $S = f(W)$, Equation (16.114) means that
$$f(W_A) + f(W_B) = f(W_A W_B). \tag{16.116}$$
Differentiating with respect to $W_B$ gives us
$$f'(W_B) = W_A f'(W_A W_B). \tag{16.117}$$
Differentiating once more, but this time with respect to $W_A$, we get
$$0 = f'(W_A W_B) + W_A W_B f''(W_A W_B), \quad \text{that is,} \quad \frac{f''(W)}{f'(W)} = -\frac{1}{W}. \tag{16.118}$$
The first integral of this gives
$$\ln f'(W) = -\ln W + \text{constant} \tag{16.119}$$
or
$$f'(W) = \frac{k}{W}. \tag{16.120}$$
Integrating once more, we obtain $f(W) = k \ln W + \text{constant}$, where $k$ is some constant to be determined. We can now write the relation between the entropy and the number of complexions as
$$S = k \ln W + S_0. \tag{16.121}$$
If we define the entropy of a completely ordered state, that is, $W = 1$, as 0, we obtain the final expression for the relation between the thermodynamic entropy and the number of complexions, $W$, as
$$S = k \ln W. \tag{16.122}$$
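The defining property used in this derivation, namely that the entropy is additive while the number of complexions is multiplicative, can be illustrated with a few lines of code. This is a minimal sketch: the numbers of complexions $W_A = 6$ and $W_B = 10$ are hypothetical, and the constant $k$ is taken to be the Boltzmann constant in SI units, an identification that the text makes only later.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (assumed value of k)

def entropy(W: float, k: float = K_B) -> float:
    """S = k ln W for a state with W complexions, Eq. (16.122)."""
    return k * math.log(W)

W_A, W_B = 6, 10            # hypothetical numbers of complexions
S_total = entropy(W_A * W_B)

# Additivity of the entropy, Eqs. (16.114) and (16.115).
print(math.isclose(S_total, entropy(W_A) + entropy(W_B)))
# A completely ordered state, W = 1, has zero entropy.
print(entropy(1))
```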
16.4.2 Derivation of $\beta$
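Consider two systems, one containing $N$ and the other $N'$ particles, brought into thermal contact with each other. The state of the first system can be described by giving the occupation numbers as
$$\begin{aligned} &n_1 \text{ particles in the energy states } \varepsilon_1, \\ &n_2 \text{ particles in the energy states } \varepsilon_2, \\ &\ \ \vdots \end{aligned}$$
Similarly, the second system can be described by giving the occupation numbers as
$$\begin{aligned} &n'_1 \text{ particles in the energy states } \varepsilon'_1, \\ &n'_2 \text{ particles in the energy states } \varepsilon'_2, \\ &\ \ \vdots \end{aligned}$$
Now the total number of complexions for the combined system is
$$W = W_1 \cdot W_2 \tag{16.123}$$
$$= \frac{N!}{n_1!\,n_2! \cdots} \cdot \frac{N'!}{n'_1!\,n'_2! \cdots}. \tag{16.124}$$
When both systems reach thermal equilibrium, their occupation numbers become such that $\ln W$ is a maximum subject to the conditions
$$N = \sum_i n_i, \tag{16.125}$$
$$N' = \sum_i n'_i, \tag{16.126}$$
$$U = \sum_i n_i \varepsilon_i + \sum_i n'_i \varepsilon'_i, \tag{16.127}$$
where $N$, $N'$, and the total energy, $U$, are constants. Introducing the Lagrange undetermined multipliers, $\alpha$, $\alpha'$, and $\beta$, we write
$$\alpha \sum_i \delta n_i = 0, \tag{16.128}$$
$$\alpha' \sum_i \delta n'_i = 0, \tag{16.129}$$
$$\beta \left[\sum_i \varepsilon_i\,\delta n_i + \sum_i \varepsilon'_i\,\delta n'_i\right] = 0. \tag{16.130}$$
Proceeding as in the previous cases, we now write $\delta \ln W = 0$ and employ Stirling's formula [Eq. (16.90)] to obtain
$$\sum_i \left(-\ln n_i + \alpha + \beta\varepsilon_i\right)\delta n_i + \sum_j \left(-\ln n'_j + \alpha' + \beta\varepsilon'_j\right)\delta n'_j = 0. \tag{16.131}$$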
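For this to be true for all $\delta n_i$ and $\delta n'_i$, we have to have
$$n_i = A e^{\beta\varepsilon_i} \quad \text{and} \quad n'_i = A' e^{\beta\varepsilon'_i}, \tag{16.132}$$
where $A = e^{\alpha}$ and $A' = e^{\alpha'}$. In other words, $\beta$ is the same for two systems in thermal equilibrium. To find an explicit expression for $\beta$, we slowly add $dQ$ amount of heat into a system in equilibrium. During this process, which is taking place reversibly at constant temperature $T$, the allowed energy values remain the same but the occupation numbers, $n_i$, of each level change such that $W$ is still a maximum after the heat is added. Hence, using Equation (16.83),
$$\delta \ln W = \sum_i \frac{\partial (\ln W)}{\partial n_i}\,\delta n_i, \tag{16.133}$$
and [Eq. (16.87)]:
$$\sum_i \left[\frac{\partial (\ln W)}{\partial n_i} + \alpha + \beta\varepsilon_i\right]\delta n_i = 0, \tag{16.134}$$
the change in $\ln W$ is written as
$$\delta \ln W = -\alpha \sum_i \delta n_i - \beta \sum_i \varepsilon_i\,\delta n_i. \tag{16.135}$$
During this process, the total number of particles does not change:
$$\sum_i \delta n_i = 0. \tag{16.136}$$
Since the heat added to the system can be written as
$$dQ = \sum_i \varepsilon_i\,\delta n_i, \tag{16.137}$$
Equation (16.135) becomes
$$\beta = -\frac{d \ln W}{dQ}. \tag{16.138}$$
Using the definition of entropy obtained in Equation (16.122), $S = k \ln W$, Equation (16.138) can also be written as
$$dQ = -\frac{1}{k\beta}\,dS. \tag{16.139}$$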
Figure 16.3 Time function X(t).
In thermodynamics, in any reversible heat exchange taking place at constant temperature, $dQ$ is related to the change in entropy as
$$dQ = T\,dS. \tag{16.140}$$
Comparing Equations (16.139) and (16.140), we obtain the desired relation as
$$\beta = -\frac{1}{kT}, \tag{16.141}$$
where $k$ can also be identified as the Boltzmann constant by further comparisons with thermodynamics.
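The relation $\beta = -d\ln W/dQ$ of Equation (16.138) can be checked numerically for the Boltzmann distribution of Section 16.3.1. The sketch below is an illustration under our own assumptions: four hypothetical levels in arbitrary units, $N = 10^6$ particles, factorials continued to non-integer occupation numbers with the log-gamma function, and a small change in $\beta$ standing in for the slow addition of heat.

```python
import math

def boltzmann_occupations(energies, n_total, beta):
    """n_i = A exp(beta e_i), with A fixed by sum(n_i) = n_total."""
    weights = [math.exp(beta * e) for e in energies]
    A = n_total / sum(weights)
    return [A * w for w in weights]

def ln_W(occupations):
    """ln W = ln N! - sum ln n_i!, with factorials continued by lgamma."""
    n_total = sum(occupations)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in occupations)

energies = [0.0, 1.0, 2.0, 3.0]    # hypothetical levels, arbitrary units
N = 1.0e6
beta = -1.0
d_beta = 1.0e-4                    # small change, i.e. a small heat input

occ_1 = boltzmann_occupations(energies, N, beta)
occ_2 = boltzmann_occupations(energies, N, beta + d_beta)

# Heat added at fixed energy levels, Eq. (16.137), and the change in ln W.
dQ = sum(e * (n2 - n1) for e, n1, n2 in zip(energies, occ_1, occ_2))
d_lnW = ln_W(occ_2) - ln_W(occ_1)

print(-d_lnW / dQ)   # close to beta
```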
16.5 RANDOM VARIABLES AND DISTRIBUTIONS
The concept of random variable is one of the most important elements of probability theory. The number of rain drops impinging on a selected area is a random variable, which depends on a number of random factors. The number of passengers arriving at a subway station at certain times of the day is also a random variable. Velocities of gas molecules take on different values depending on the random collisions with the other molecules. In the case of electronic noise, voltages and currents change from observation to observation in a random way. All these examples show that random variables are encountered in many different branches of science and technology. Despite the diversity of these examples, the mathematical description is similar. Under random effects, each of these variables is capable of taking a variety of values. It is imperative that we know the range of values that a random variable can take. However, this is not sufficient. We also need to know the frequencies with which a random variable assumes these values. Since random variables could be continuous or discrete, we need a unified formalism to study their behavior. Hence, we introduce the distribution function of probabilities of the random variable $X$ as
$$F_X(x) = P(X \le x). \tag{16.142}$$
From now on we show random variables with the uppercase Latin letters, $X, Y, \ldots$, and the values that they can take with the lowercase Latin letters, $x, y, \ldots$. Before we introduce what exactly $F_X(x)$ means, consider a time function $X(t)$ shown as in Figure 16.3. The independent variable $t$ usually stands for time, but it could also be considered as any parameter that changes continuously. We now define the distribution function $F_X(x)$ as
$$F_X(x) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} C_x[X(t)]\,dt, \tag{16.143}$$
where $C_x$ is defined as
$$C_x[X(t)] = \begin{cases} 1, & X(t) \le x, \\ 0, & X(t) > x. \end{cases} \tag{16.144}$$
The role of $C_x$ can be understood from the next figure (Fig. 16.4), where the integral in Equation (16.143) is evaluated over the total duration of time during which $X(t)$ is less than or equal to $x$ in the interval $[-T, T]$. The intervals over which $X \le x$ are indicated by thick lines. Thus,
$$\frac{1}{2T} \int_{-T}^{T} C_x[X(t)]\,dt \tag{16.145}$$
is the time average evaluated over the fraction of the time that $X(t)$ is less than or equal to $x$. Note that the distribution, $F_X(x)$, gives not a single time average but an infinite number of time averages, that is, one for each $x$.
Example 16.10. Arcsine distribution: Let us now consider the function
$$X(t) = \sin \omega t. \tag{16.146}$$
During the period $[0, \frac{2\pi}{\omega}]$, $X(t)$ is less than $x$ in the intervals indicated by thick lines in Figure 16.5. When $x > 1$, $X(t)$ is always less than $x$, hence
$$F_X(x) = 1, \quad x > 1. \tag{16.147}$$
Figure 16.4 Time average of X(t).
When $x < -1$, $X(t)$ is always greater than $x$, hence
$$F_X(x) = 0, \quad x < -1. \tag{16.148}$$
For the regions indicated in Figure 16.5 we write
$$\int_0^{2\pi/\omega} C_x[X(t)]\,dt = \frac{1}{\omega}\left[\pi + 2\sin^{-1}x\right], \tag{16.149}$$
thus obtaining the arcsine distribution as
$$F_X(x) = \begin{cases} 1, & x > 1, \\ \dfrac{1}{2} + \dfrac{1}{\pi}\sin^{-1}x, & |x| \le 1, \\ 0, & x < -1. \end{cases} \tag{16.150}$$
The arcsine distribution is used in communication problems, where an interfering signal in the form of a constant sinusoid of unknown phase may be thought to hide the desired signal.
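The time-average definition of the distribution function can be tested directly on this example. Since $\sin\omega t$ is periodic, the fraction of time spent below $x$ over one period equals the limit in Equation (16.143). The sketch below estimates that fraction by sampling one period uniformly and compares it with Equation (16.150); the function names, the sample count, and the choice $\omega = 1$ are ours.

```python
import math

def arcsine_cdf(x: float) -> float:
    """F_X(x) for X(t) = sin(omega t), Eq. (16.150)."""
    if x > 1:
        return 1.0
    if x < -1:
        return 0.0
    return 0.5 + math.asin(x) / math.pi

def time_fraction_below(x: float, omega: float = 1.0, samples: int = 200_000) -> float:
    """Fraction of one period during which sin(omega t) <= x (midpoint sampling)."""
    period = 2.0 * math.pi / omega
    hits = 0
    for j in range(samples):
        t = (j + 0.5) * period / samples
        if math.sin(omega * t) <= x:   # this is C_x[X(t)] of Eq. (16.144)
            hits += 1
    return hits / samples

for x in (-0.5, 0.0, 0.5):
    print(x, time_fraction_below(x), arcsine_cdf(x))
```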
Figure 16.5 Arcsine distribution.
16.6 DISTRIBUTION FUNCTIONS AND PROBABILITY
In the above example, the evaluation of the distribution function was simple, since the time function, $X(t)$, was given in terms of a simple mathematical expression. In most practical situations, due to the random nature of the conditions, $X(t)$ cannot be known ahead of time. However, the distribution function may still be determined by some other means. The point that needs to be emphasized is that random processes in nature are usually defined in terms of certain averages like distribution functions. If we remember the geometric definition of probability given in Section 16.1.7, the fraction of time that $X(t) \le x$ is actually the probability of the event $\{X(t) \le x\}$ happening. By the same token, we call the fraction of time that $x_1 < X(t) \le x_2$ the probability of the event $\{x_1 < X(t) \le x_2\}$ and show it as $P\{x_1 < X(t) \le x_2\}$. From this, it follows that
$$P\{X(t) \le x\} = F_X(x). \tag{16.151}$$
Since
$$C_b(x) - C_a(x) = \begin{cases} 1, & a < x \le b, \\ 0, & \text{otherwise}, \end{cases} \tag{16.152}$$
we can write
$$P\{x_1 < X(t) \le x_2\} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left\{C_{x_2}[X(t)] - C_{x_1}[X(t)]\right\} dt = F_X(x_2) - F_X(x_1). \tag{16.153}$$
Thus, the probability $P\{x_1 < X(t) \le x_2\}$ is expressed in terms of the distribution function $F_X(x)$. This argument can be extended to nonoverlapping intervals $[x_1, x_2], [x_3, x_4], \ldots$ as
$$P\{x_1 < X(t) \le x_2,\ x_3 < X(t) \le x_4, \ldots\} = [F_X(x_2) - F_X(x_1)] + [F_X(x_4) - F_X(x_3)] + \cdots. \tag{16.154}$$
From the definition of the distribution function, also called the cumulative distribution function, one can easily check that the following conditions are satisfied:
(i) $0 \le F_X(x) \le 1$,
(ii) $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$,
(iii) $F_X(x) \le F_X(x')$ whenever $x \le x'$.
This means that $F_X(x)$ satisfies all the basic axioms of probability given in Section 16.1.2. It can be proven that a real valued function of a real variable which satisfies the above conditions is a distribution function. In other words, it is possible to construct at least one time function, $X(t)$, the distribution function of which coincides with the given function. This result removes any doubts about the existence of the limit
$$F_X(x) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} C_x[X(t)]\,dt. \tag{16.155}$$
Note that condition (iii) implies that wherever the derivative of $F_X(x)$ exists, it is always nonnegative. At the points of discontinuity, we can use the Dirac-delta function to represent the derivative of $F_X(x)$. This is usually sufficient to cover the large majority of the physically meaningful cases. With this understanding, we can write all distributions as integrals:
$$F_X(x) = \int_{-\infty}^{x} p_X(x')\,dx', \tag{16.156}$$
where
$$p_X(x) = \frac{dF_X(x)}{dx}. \tag{16.157}$$
The converse of this statement is that if $p_X(x)$ is any nonnegative integrable function,
$$\int_{-\infty}^{\infty} p_X(x')\,dx' = 1, \tag{16.158}$$
then $F_X(x)$ defined in Equation (16.156) satisfies the conditions (i)-(iii). The function, $p_X(x)$, obtained from $F_X(x)$ is called the probability density function. This name is justified if we write the probability of the event $\{X(t) \text{ in } D\}$, where $D$ represents some set of possible outcomes over the real axis, as
$$P\{X(t) \text{ in } D\} = \int_D p_X(x)\,dx \tag{16.159}$$
or as
$$P\{x \le X(t) \le x + dx\} = p_X(x)\,dx. \tag{16.160}$$
Figure 16.6 The uniform distribution.
16.7 EXAMPLES OF CONTINUOUS DISTRIBUTIONS
In this section we introduce some of the most commonly encountered continuous distribution functions.
16.7.1 Uniform Distribution
Probability density for the uniform distribution is given as
$$p_X(x) = \begin{cases} \dfrac{1}{2a}, & |x| \le a, \\ 0, & |x| > a, \end{cases} \tag{16.161}$$
where $a$ is a positive number. The distribution function $F_X(x)$ is easily obtained from Equation (16.156) as (Fig. 16.6)
$$F_X(x) = \begin{cases} 0, & x < -a, \\ \dfrac{x + a}{2a}, & |x| \le a, \\ 1, & x > a. \end{cases} \tag{16.162}$$
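16.7.2 Gaussian or Normal Distribution
The bell-shaped Gauss distribution is defined by the probability density
$$p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2\sigma^2)(x - m)^2}, \quad \sigma > 0, \quad -\infty < x < \infty, \tag{16.163}$$
and the distribution function (Fig. 16.7)
$$F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - m}{\sqrt{2}\,\sigma}\right)\right], \tag{16.164}$$
where
$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt. \tag{16.165}$$
The Gaussian distribution is extremely useful in many different branches of science and technology. In the limit as $\sigma \to 0$, $p_X(x)$ becomes one of the most commonly used representations of the Dirac-delta function. In this sense, the Dirac-delta function is a probability density.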
Figure 16.7 The Gauss or the normal distribution.
16.7.3 Gamma Distribution
The Gamma distribution is defined by the probability density
$$p_X(x) = \begin{cases} \dfrac{a^n e^{-ax} x^{n-1}}{\Gamma(n)}, & x > 0, \\ 0, & x \le 0, \end{cases} \tag{16.166}$$
where $a, n > 0$. It is clear that $p_X(x) \ge 0$ and $\int_{-\infty}^{\infty} p_X(x)\,dx = 1$. There are two cases that deserve mentioning:
(i) The case where $n = 1$ is called the exponential distribution:
$$p_X(x) = \begin{cases} a e^{-ax}, & x > 0, \\ 0, & x \le 0. \end{cases} \tag{16.167}$$
(ii) The case where $a = 1/2$, $n = m/2$, where $m$ is a positive integer, is called the $\chi^2$ distribution (Chi-square), with $m$ degrees of freedom:
$$p_X(x) = \begin{cases} \dfrac{e^{-x/2} x^{(m/2)-1}}{2^{m/2}\,\Gamma(m/2)}, & x > 0, \\ 0, & x \le 0. \end{cases} \tag{16.168}$$
In general, the integral
$$F_X(x) = \int_0^x p_X(x')\,dx' \tag{16.169}$$
cannot be evaluated analytically for the $\chi^2$ distribution; however, there exist extensive tables for the values of $F_X(x)$. The $\chi^2$ distribution is extremely useful in checking the fit of experimental data to a theoretical one.
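In place of tables, the integral in Equation (16.169) can be evaluated numerically. The sketch below implements the Gamma density (16.166), obtains the $\chi^2$ density as the special case $a = 1/2$, $n = m/2$, and integrates it with a simple midpoint rule. The function names, the choice $m = 4$, and the upper cutoff of 80 used for the normalization check are our own assumptions.

```python
import math

def gamma_pdf(x: float, a: float, n: float) -> float:
    """Gamma density a**n * exp(-a x) * x**(n - 1) / Gamma(n) for x > 0."""
    if x <= 0:
        return 0.0
    return a ** n * math.exp(-a * x) * x ** (n - 1) / math.gamma(n)

def chi_square_pdf(x: float, m: int) -> float:
    """Chi-square density with m degrees of freedom (a = 1/2, n = m/2)."""
    return gamma_pdf(x, a=0.5, n=m / 2)

def integrate(f, lower: float, upper: float, steps: int = 100_000) -> float:
    """Midpoint rule; adequate for these smooth, rapidly decaying densities."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (j + 0.5) * h) for j in range(steps))

# Normalization of a chi-square density with 4 degrees of freedom.
print(integrate(lambda x: chi_square_pdf(x, 4), 0.0, 80.0))

# F_X(x) from Eq. (16.169), evaluated numerically at x = 3.
print(integrate(lambda x: chi_square_pdf(x, 4), 0.0, 3.0))
```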
16.8 DISCRETE PROBABILITY DISTRIBUTIONS
When $X$ is a discrete random variable, the distribution function, $F_X(x)$, becomes a step function. Hence it can be specified by giving a sequence of numbers, $x_1, x_2, \ldots$, and the sequence of probabilities, $p_X(x_1), p_X(x_2), \ldots$, satisfying the following conditions:
$$p_X(x_i) > 0, \quad i = 1, 2, \ldots, \qquad \sum_i p_X(x_i) = 1. \tag{16.170}$$
Now, the distribution function, $F_X(x)$, is given as
$$F_X(x) = \sum_{x_i \le x} p_X(x_i). \tag{16.171}$$
Some of the commonly encountered discrete distributions are given below:
16.8.1 Uniform Distribution
Given a bag with $N$ balls numbered as $1, 2, \ldots, N$, let $X$ be the number of the ball drawn. When one ball is drawn at random, the probability of any outcome $x = 1, 2, \ldots, N$ is
$$p_X(x) = \frac{1}{N}. \tag{16.172}$$
The (cumulative) distribution function is given in terms of the step function as
$$F_X(x) = \frac{1}{N} \sum_{i=1}^{N} \theta(x - i), \tag{16.173}$$
where $i$ is an integer and
$$\theta(x - i) = \begin{cases} 1, & x \ge i, \\ 0, & x < i. \end{cases}$$
16.8.2
x 0. Using this write the mean and the variance .
38. Given the distribution function F x ( x ) = 22/37r --1/2xe-"4/2 1
x>0,
find the mean and the variance. 39. Complete the details of the integrals involved in the derivation of the moments of the Gaussian distribution.
40. Given the following probability density of X : c0x4(1
-
x)4, 0
Figure 17.1 Pinball machine with one pin.
discuss other aspects of information and its relation to decision theory. Our system is composed of a car with a driver and a road map, where at each junction the road bifurcates. The road could have as many junctions as one desires and naturally a corresponding number of potential destinations. We can also have an observer who can find out at which destination the car has arrived by checking each terminal. If the driver flips a coin at each junction to choose which way to go, we basically have Roederer's pinball machine, where the car is replaced by a ball and gravity does the driving. Junctions are the pins of the pinball machine that deflect the balls to the right or the left with a definite probability, usually equal, and the terminals are now the bins of the pinball machine. For the time being, let us continue with the pinball machine with one pin and two bins as shown in Figure 17.1. Using binary notation, we show the state of absence of a ball in one of the bins, left or right, as $|0\rangle$ and its presence as $|1\rangle$. Note that observing only one of the bins is sufficient. In classical information theory, if the ball is not in one of the bins, we can be certain that it is in the other bin. If the machine is operated $N$ times, the probability of observing $|0\rangle$ is $p_0 = N_0/N$, where $N_0$ is the number of occurrences of $|0\rangle$ in $N$ runs. Similarly, the probability of observing the state $|1\rangle$ is $p_1 = N_1/N$, where $N_1$ is the number of occurrences of the state $|1\rangle$. Naturally, $N = N_0 + N_1$ and $p_0 + p_1 = 1$. In the symmetric situation, $p_0 = p_1 = 0.5$, by checking only one of the bins an observer gains on the average 1b of information, that is, the amount of information equivalent to the answer of a yes or no question. In the classical theory, the ball being in the right or the left bin has nothing to do with the observer checking the bins. Even if the observer is not interested in checking the bins, the event has occurred and the result does not change.
Let us now consider the case where the observer has adjusted the pin so that the ball always falls into the left bin. In this case $p_0 = 0$ and $p_1 = 1$. Since the observer is certain that the ball is in the left bin, by checking the bins, left or right, there will be no information gain. In other words, the observer already knows the outcome, and no matter how many times the pinball machine is run, the result will not change. This is a case where the observer may respond with "What else is new?" If the pin is adjusted so that $p_1$ is very close to zero, but not zero, then among $N$ observations, the observer will find the ball in the left bin only in very few of the cases. Now the response is more likely to be "Wow!" The classical information theory also defines an objective information value, or as Roederer calls it, the "Wow" factor. It is also called the novelty value or the information content, which is basically a measure of how surprising or how rare the result is to the observer. The information value, $I$, will naturally be a function of how probable that event is among all the other possible outcomes, that is, $I(p)$. To find a suitable expression, let us call the information value of observing the state $|0\rangle$ as $I_0$ and $|1\rangle$ as $I_1$. For the case where we adjusted the pin so that the ball always falls into the same bin, that is, $p_i = 1$ ($i = 0$ or 1), we define the information value as $I_i = 0$. This is so, since the result is certain and no information is to be gained by an observation. We also define $I_i \to \infty$ for the cases where $p_i \to 0$. For independent events, $a, b, \ldots$, we expect the information value to be additive, that is, $I = I_a + I_b + \cdots$. From probability theory, we know that for independent events with probabilities $p_a, p_b, \ldots$, the total probability is given as the product $p = p_a p_b \cdots$, hence we need a function satisfying the relation
$$I(p_a p_b \cdots) = I_a(p_a) + I_b(p_b) + \cdots = I(p_a) + I(p_b) + \cdots. \tag{17.2}$$
We also require the information value to satisfy the inequality $I(p_i) > I(p_j)$ when $p_i < p_j$, so that rarer outcomes carry more information. A logarithmic function of the form
$$I(p) = -C \log p, \quad C = \text{const.}, \tag{17.3}$$
satisfies all these conditions. Since $p \le 1$, the negative sign is needed to ensure $I \ge 0$. To determine the constant $C$, we use the symmetric case of the pinball machine, where the average information gain is 1b. Setting $I(0.5)$ to 1, namely
$$I(0.5) = -C \log(0.5) = 1, \tag{17.4}$$
we find $C = 1/\log 2$. Thus, the information value for the pinball machine can be written as
$$I_i = -\frac{1}{\log 2} \log p_i = -\log_2 p_i, \quad i = 0 \text{ or } 1, \tag{17.5}$$
where $\log_2$ is the logarithm with respect to base 2.
Figure 17.2 Shannon's H function for the pinball machine.
17.2.1 Prior Uncertainty and Entropy of Information Since Shannon and Weaver were interested only in the information content of the message received with respect to the set of all potentially possible messages that could be received, they called I the information content. One of the most important quantities introduced by the Shannon’s theory answers the question: Given the probabilities of each alternative, how much information on the average can we expect to gain when one of the possibilities is realized beforehand? In other words, what is the prior uncertainty of the outcome? For this, Shannon introduced the entropy of the source of information or in short the entropy of information, H , as
(17.6) Note that H is basically the average information value or the expected information gain. For the pinball machine, if the probability of finding the ball in the left bin is p , then the probability of not finding it is (1 - p ) , which gives the H function as (17.7) For a pinball machine with 2 possible outcomes, the entropy of information, H , is shown in Figure 17.2. For the symmetric case, p = 0.5, the expected gain of information is l b . When p = 0 or p = 1, H is zero since there is no prior uncertainty and we already now the result. We can extend our road map or the pinball machine to cases with more than two junctions or pins. At each junction or pin there will be two possibilities.
Figure 17.3 Pinball machine with 4 possible equiprobable outcomes.
In general we can write H as
\[ H = -\sum_{i=0}^{N-1} p_i \log_2 p_i, \tag{17.8} \]
where N is the number of possible final outcomes and \(\sum_i p_i = 1\). When all the possible outcomes are equiprobable, H has an absolute maximum
\[ H = \log_2 N. \tag{17.9} \]
From the above definition [Eq. (17.8)], it is clear that H is zero only when all \(p_i\) except one are zero. Since \(\sum_i p_i = 1\), the nonzero \(p_i\) is equal to 1. Note that when \(p_i = 0\), we define \(p_i \log_2 p_i\) as 0. In the case of four equiprobable outcomes (Fig. 17.3), the probability of finding the ball in one of the bins in the final level is 1/4. In this case H = 2 b.
Example 17.1. Pinball machine: In the binary pinball machine, if the pin is adjusted so that \(p_0 = 2/3\) and \(p_1 = 1/3\), the expected information gain, H, is
\[ H = \frac{2}{3} \log_2 \frac{3}{2} + \frac{1}{3} \log_2 3 = 0.92\ \text{b}. \tag{17.10} \]
Example 17.2. Expected information gain: With the prior probabilities
\[ 0.3,\ 0.2,\ 0.2,\ 0.1,\ 0.2, \tag{17.11} \]
H is found as
\[ H = -[0.3 \log_2 0.3 + 3(0.2) \log_2 0.2 + 0.1 \log_2 0.1] = 2.25\ \text{b}. \tag{17.12} \]
Similarly, for a single roll of an unbiased die, which can be viewed as a pinball machine with a single pin where the ball has 6 equiprobable paths to take, the value of H is obtained as
\[ H = -\sum_{i=1}^{6} \frac{1}{6} \log_2 \frac{1}{6} = \log_2 6 = 2.58\ \text{b}. \tag{17.13} \]
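The entropy values quoted in Examples 17.1 and 17.2 and for the die can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `entropy` and the variable names are our own illustrative choices, and the convention that a term with zero probability contributes nothing follows the definition given above.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of information, H = -sum_i p_i log p_i, with 0 log 0 taken as 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 are skipped, which implements the 0 log 0 = 0 convention.
    return -sum(p * math.log(p) / math.log(base) for p in probs if p > 0)

h_pinball = entropy([2 / 3, 1 / 3])              # Example 17.1
h_five = entropy([0.3, 0.2, 0.2, 0.1, 0.2])      # Example 17.2
h_die = entropy([1 / 6] * 6)                     # unbiased die
h_four = entropy([0.25] * 4)                     # four equiprobable outcomes

print(round(h_pinball, 2), round(h_five, 2), round(h_die, 2), round(h_four, 2))
```

For N equiprobable outcomes the same function returns log2 N, in agreement with Eq. (17.9).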
Example 17.3. Scale of H: We have the freedom to choose the basis of the logarithm function in the definition of H. In general we can use \(\log_r\), where r is the number of possible outcomes. In the most frequently used binary case, r is 2. However, a change of basis from r to s is always possible. Taking the logarithm of \(x = r^{\log_r x}\) to the base s, we write
\[ \log_s x = [\log_s r] \log_r x, \quad x > 0, \tag{17.14} \]
and obtain the scale relation between \(H_s\) and \(H_r\) as
\[ H_s = [\log_s r] H_r. \tag{17.15} \]
We can now use logarithms with respect to base 10 to find \(H_2\) as
\[ H_2 = 3.32\, H_{10}. \tag{17.16} \]
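The scale relation can be verified on any distribution. The sketch below, which is our own illustration, reuses the probabilities of Example 17.2 as test data; the helper `entropy_in_base` is a hypothetical name introduced here.

```python
import math

def entropy_in_base(probs, base):
    """H computed with logarithms taken to the given base."""
    return -sum(p * math.log(p) / math.log(base) for p in probs if p > 0)

probs = [0.3, 0.2, 0.2, 0.1, 0.2]
h2 = entropy_in_base(probs, 2)
h10 = entropy_in_base(probs, 10)

scale = math.log(10) / math.log(2)   # log_2 10, the factor in H_2 = [log_2 10] H_10
print(round(scale, 2), abs(h2 - scale * h10) < 1e-12)
```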
In what follows, unless otherwise specified, we use the binary basis. We also use the notation where log means the logarithm with respect to base 10 and ln means the logarithm with respect to base e as usual.

17.2.2 Joint and Conditional Entropies of Information
The Case of Joint Events: Consider two chance events, A and B, with m and n possibilities, respectively. Let p(i, j) be the joint probability of the ith and jth possibilities occurring for A and B, respectively. For the joint event we write the entropy of information as
\[ H(A,B) = -\sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 p(i,j). \tag{17.17} \]
For the individual events, A and B, we have, respectively,
\[ H(A) = -\sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 \Big[ \sum_{j'}^{n} p(i,j') \Big], \tag{17.18} \]
\[ H(B) = -\sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 \Big[ \sum_{i'}^{m} p(i',j) \Big]. \tag{17.19} \]
Comparing with Equation (17.17), it is easily seen that the inequality
\[ H(A,B) \le H(A) + H(B) \tag{17.20} \]
holds. Equality is true for independent events, where
\[ p(i,j) = p(i)\, p(j). \tag{17.21} \]
Equation (17.20) says that the uncertainty or the average information value of a joint event is never greater than the sum for the individual events.

The Case of Conditional Probabilities: We write the conditional probability [Eq. (16.27)] of B assuming the value j after A has assumed the value i as
\[ p(j/i) = \frac{p(i,j)}{\sum_{j'} p(i,j')}, \tag{17.22} \]
where the denominator is the total probability of A assuming the value i, summed over all the values available to B. The conditional entropy of information of B, that is, H(B/A), is now defined as the average of the conditional entropy
\[ -\sum_{j}^{n} p(j/i) \log_2 p(j/i) \tag{17.23} \]
as
\[ H(B/A) = -\sum_{i}^{m} \Big[ \sum_{j'} p(i,j') \Big] \sum_{j}^{n} p(j/i) \log_2 p(j/i) = -\sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 p(j/i), \tag{17.24} \]
where in the last step we have substituted \(p(i,j)\) for \(\big[\sum_{j'} p(i,j')\big]\, p(j/i)\) [Eq. (17.22)]. The quantity H(B/A) is a measure of the average uncertainty in B when A has occurred. In other words, it is the expected average information gain by observing B after A has happened. Substituting the value of p(j/i) [Eq. (17.22)] and using Equation (17.18), and after rearranging, we obtain
\[ H(B/A) = -\sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 p(i,j) + \sum_{i}^{m} \sum_{j}^{n} p(i,j) \log_2 \Big[ \sum_{j'} p(i,j') \Big] = H(A,B) - H(A). \tag{17.25} \]
Thus,
\[ H(A,B) = H(A) + H(B/A). \tag{17.26} \]
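The relations among the joint, individual, and conditional entropies can be illustrated numerically. The following sketch uses a made-up 3 x 2 joint distribution (the numbers in `joint` are hypothetical and chosen only so that they sum to 1); the variable names are our own.

```python
import math

def h(values):
    """-sum v log2 v over the nonzero entries."""
    return -sum(v * math.log2(v) for v in values if v > 0)

# Hypothetical joint probabilities p(i, j): rows are outcomes of A, columns of B.
joint = [[0.30, 0.10],
         [0.05, 0.25],
         [0.10, 0.20]]

p_a = [sum(row) for row in joint]            # sum over j of p(i, j)
p_b = [sum(col) for col in zip(*joint)]      # sum over i of p(i, j)

h_ab = h(v for row in joint for v in row)    # joint entropy H(A, B)
h_a = h(p_a)                                 # H(A)
h_b = h(p_b)                                 # H(B)

# H(B/A) = -sum_ij p(i, j) log2 p(j/i), with p(j/i) = p(i, j) / sum_j p(i, j)
h_b_given_a = -sum(
    pij * math.log2(pij / p_a[i])
    for i, row in enumerate(joint)
    for pij in row
    if pij > 0
)

print(h_ab <= h_a + h_b, abs(h_ab - (h_a + h_b_given_a)) < 1e-12)
```

Replacing `joint` with a product distribution p(i)p(j) turns the inequality (17.20) into an equality.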
Using Equations (17.20) and (17.26), we can write the inequality
\[ H(A) + H(B) \ge H(A,B) = H(A) + H(B/A), \tag{17.27} \]
hence
\[ H(B) \ge H(B/A). \tag{17.28} \]
In other words, the average information value to be gained by observing B after A has been realized can never be greater than the average information value of the event B alone.

Entropy of Information for Continuous Distributions: For a discrete set of probabilities, \(p_1, p_2, \ldots, p_n\), the entropy of information was defined as
\[ H = -\sum_{i}^{n} p_i \log_2 p_i. \tag{17.29} \]
For a continuous distribution of probabilities, H is defined as
\[ H = -\int_{-\infty}^{\infty} p(x) \log_2 p(x)\, dx. \tag{17.30} \]
For probabilities with two arguments we can write H(x, y) as
\[ H(x,y) = -\iint p(x,y) \log_2 p(x,y)\, dx\, dy. \tag{17.31} \]
Now the conditional probabilities become
(17.32)
(17.33)
Figure 17.4 Car and driver with 8 possible targets. Each of the 8 terminals has \(p_i = 1/2^3\) and \(I_i = 3\).
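Example 17.4. H for the Gaussian distribution: The Gaussian probability distribution is given as
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}. \tag{17.34} \]
We can find H as
\[ H = -\int_{-\infty}^{\infty} p(x) \log_2 p(x)\, dx = \log_2 \left( \sqrt{2\pi e}\, \sigma \right). \tag{17.35} \]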
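As a numerical sanity check of this example, the integral in Eq. (17.30) can be evaluated for a Gaussian and compared with the closed form log2(sigma sqrt(2 pi e)). The sketch below assumes a zero-mean Gaussian with standard deviation sigma = 1.5 (an arbitrary illustrative value) and uses a simple midpoint rule on a finite interval; the function names are our own.

```python
import math

def gaussian(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma (assumed form)."""
    return math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def continuous_entropy(pdf, lo, hi, steps=20_000):
    """Midpoint-rule estimate of -integral p(x) log2 p(x) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * dx)
        if p > 0:
            total -= p * math.log2(p) * dx
    return total

sigma = 1.5
numeric = continuous_entropy(lambda x: gaussian(x, sigma), -12 * sigma, 12 * sigma)
closed_form = math.log2(sigma * math.sqrt(2 * math.pi * math.e))
print(abs(numeric - closed_form) < 1e-6)
```

Truncating the range at 12 standard deviations is a design choice of this sketch; the neglected tails are far below the tolerance used in the comparison.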
17.2.3 Decision Theory

We now go back to our car and driver model. For the sake of argument we use three junctions or “decision” points shown as 1, 2, and 3 (Fig. 17.4). At each junction the road bifurcates and the driver has to decide which way to go. This allows us to add purpose, decisions, and strategy into our model, where the driver has a specific target, say, a friend in terminal 3. When \(p_i = 1/8\), i = 1, ..., 8, we have the maximum entropy of information of this system, which is 3 b:
\[ H = -\sum_{i=1}^{8} p_i \log_2 p_i = -\sum_{i=1}^{8} \frac{1}{8} \log_2 \frac{1}{8} = 3\ \text{b}. \tag{17.36} \]
If the driver is not given any instructions and flips a fair coin at each junction to decide which way to go, the probability of reaching terminal 3 in a given try is 1/8. Reaching terminal 3 will naturally require numerous tries involving backtracking and recalling past decisions and then reversing to the other alternative. On average, the driver will be successful in only one out of 8 tries. Once terminal 3 is reached, the driver will have acquired information with the information value \(I_3\) = 3 b:
\[ I_3 = -\log_2 \frac{1}{8} = -\frac{1}{\log 2} \log 0.125 = 3\ \text{b}. \tag{17.37} \]
If the driver is given the sequence of instructions about which way to turn at the junctions, like right - left - right, the information content of which is 3 b, then he or she can reach the desired terminal with certainty. However, the information expected to be gained by the driver when he or she gets there is now zero. The information given by the driver’s friend has removed all the prior uncertainty. Now, consider a case where some of the roads are blocked as shown in Figure 17.5. Now the H value of the system is
\[ H = -\sum_{i=1}^{8} p_i \log_2 p_i = -[3(0.25) \log_2 0.25 + 2(0.125) \log_2 0.125] = -\frac{1}{\log 2} [3(0.25) \log 0.25 + 2(0.125) \log 0.125] = 2.25\ \text{b}. \tag{17.38} \]
Since the prior uncertainty has decreased, H is naturally less than 3 b. However, the driver still has to make three binary decisions. Hence, the needed information to reach terminal 3 is still worth 3 b. This is also equal to the information (novelty) value, \(I_3\), of terminal 3. The difference originates from the fact that in Shannon’s theory, H is defined independent of purpose. It is basically the average information expected to be gained by an observer from the entire system. It doesn’t matter which terminal is intended and why. On the other hand, if the driver aims to reach terminal 7, then there are only two binary decisions to be made, left - right; hence the amount of information needed is 2 b. Note that at junction 2B (Fig. 17.5) the right road is blocked. Hence, there is no need for a decision by the driver. The information value, \(I_7\), of terminal 7, is also 2 b.

Figure 17.5 Car and driver with some of the roads blocked.

Terminal   1     2     3     4     5     6     7     8
p_i        1/4   0     1/8   1/8   0     0     1/4   1/4
I_i        2     0     3     3     0     0     2     2

17.2.4 Decision Theory and Game Theory
To demonstrate the basic elements of decision theory and its connections with game theory, we consider a case where a merchant has to decide whether he/she should expand his/her business or not. In this case, each terminal of the decision tree has a prize/profit or penalty/loss waiting for our subject. Advisors say that if the merchant expands the electronic goods department (EGD), in recession (R) he/she will lose $50,000. However, if the economy remains good (EG), the electronic goods department will make $120,000. On the other hand, if the merchant expands the household items department (HHI) and gets caught in recession, he/she will lose $30,000, but if the economy remains good, he/she will make $80,000. Finally, if the merchant does not expand (DNE), in recession he/she will make $2,000, and if the economy remains good, he/she will make $30,000. This merchant thinks that the probabilities for the economy going into recession and remaining good are 2/3 and 1/3, respectively. The merchant also thinks that if he/she expands, the probabilities for the electronic goods and household items departments outperforming each other are 1/3 and 2/3, respectively. What should this merchant decide? We can draw the decision tree shown in Figure 17.6.

Figure 17.6 Decision tree for the merchant.

We can calculate the merchant’s expected losses or gains for the next fiscal year as follows: If the merchant expands the electronic goods department, the expected gain is
\[ -(50{,}000)\, \frac{2}{3} \cdot \frac{1}{3} + (120{,}000)\, \frac{1}{3} \cdot \frac{1}{3} = 2{,}222. \tag{17.39} \]
If the merchant expands the household items department, the expected gain is
\[ -(30{,}000)\, \frac{2}{3} \cdot \frac{2}{3} + (80{,}000)\, \frac{1}{3} \cdot \frac{2}{3} = 4{,}444. \tag{17.40} \]
If the merchant does not expand, the expected gain is
\[ 2{,}000 \left( \frac{2}{3} \right) + 30{,}000 \left( \frac{1}{3} \right) = 11{,}333. \tag{17.41} \]
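The three expected gains above can be reproduced with exact fractions. This is a minimal sketch of the arithmetic only; the dictionary keys EGD, HHI, and DNE follow the abbreviations used in the text, while the variable names are our own.

```python
from fractions import Fraction as F

# Probabilities stated by the merchant.
p_recession, p_good = F(2, 3), F(1, 3)
p_egd, p_hhi = F(1, 3), F(2, 3)   # chance of each department outperforming the other

expected = {
    "EGD": (-50_000 * p_recession + 120_000 * p_good) * p_egd,   # Eq. (17.39)
    "HHI": (-30_000 * p_recession + 80_000 * p_good) * p_hhi,    # Eq. (17.40)
    "DNE": 2_000 * p_recession + 30_000 * p_good,                # Eq. (17.41)
}

best = max(expected, key=expected.get)
for option, gain in expected.items():
    print(option, int(gain))   # truncated to whole dollars, as in the text
print("largest expected gain:", best)
```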
Since the merchant’s expected profit in the last option, $11,333, is greater than in the previous two cases, this merchant should delay expanding capacity for another year. This method is called Bayes’ criterion and works when we can identify the decision points and assess the probabilities. If the merchant has no idea about the probabilities and if he/she is a pessimist afraid of losing money, then the merchant should decide to wait for another year and avoid the risk of losing $50,000. This is called the minimax criterion. That is, you minimize the maximum expected loss. The minimax criterion is among the many that can be used in such situations.

In this example, the merchant appears as if he/she is playing a game with the economy. The merchant has four moves: expand or wait and, if he/she decides to expand, expand the electronic goods department or the household items department. The merchant also gets to make the first move. On the other hand, the economy has two moves, go into recession or remain good, and it does not care about what the merchant has decided. For a game with two players, A and B, each having two moves, a1, a2 and b1, b2, respectively, we can write the following payoff matrix, which is also called the normal form representation:

                     Player A
                   a1           a2
Player B   b1    L(a1, b1)    L(a2, b1)
           b2    L(a1, b2)    L(a2, b2)        (17.42)
In this representation, \(L(a_i, b_j)\) is the loss function for the player A, when A chooses strategy \(a_i\) and B chooses strategy \(b_j\), where i, j = 1, 2. For the player A, the loss function, \(L(a_i, b_j)\), is positive for losses and negative for gains, and vice versa for the player B. Since whatever one player wins, the other player loses, such games are called zero-sum games. In other words, there is no cut for the house and capital is neither generated nor lost. Zero-sum games could be symmetric or asymmetric under the change of the identities of the players. That is, in general, \(L(a_i, b_j) \ne -L(b_j, a_i)\). Depending on which player makes the first move, the above payoff matrix can also be shown with the decision trees in Figure 17.7. These are called extensive form representations. In extensive form games, players act sequentially and they are aware of the earlier moves made. There are also games where players act without knowing what their opponent has decided. Games where players act simultaneously are essentially games of this type. Games where the players act with incomplete information about their opponent’s moves have also been designed. In such games, normal form representation is usually preferred over the extensive form.

Figure 17.7 Decision trees for the players A and B, where \(L_{ij} = L(a_i, b_j)\), i, j = 1, 2.

Let us now consider the game depicted in Figure 17.8, where the player A makes the first move at the first decision point, where he/she has two alternatives, \(a_1\) and \(a_2\). At the second decision point, player B gets to make his/her move with four choices: \(b_1, b_2\) when A decides \(a_1\) and \(b_3, b_4\) when A decides \(a_2\). We now introduce two random variables, x and y, that can only take the values 0 and 1 at the decision points 1 and 2, respectively. The decision function, \(d_1(x)\), at point 1 is defined as
\[ d_1(x) = \begin{cases} a_1, & x = 0, \\ a_2, & x = 1. \end{cases} \tag{17.43} \]
Similarly, the decision functions for the player B are defined as
\[ D_1(y) = \begin{cases} b_1, & y = 0, \\ b_2, & y = 1, \end{cases} \qquad D_2(y) = \begin{cases} b_3, & y = 0, \\ b_4, & y = 1. \end{cases} \tag{17.44} \]
In this game the random variables can be tied to the actions of a third element. For example, in designing a strategy for a political confrontation one often has to factor in the potential responses of other countries. Let the player A assign the probabilities p and q for the potential actions of this third element, which affects the decisions of both A and B (Fig. 17.8). To compare the merits of all these decisions for A, we also define the expected risk/loss function \(R(a_i, b_j)\) as
\[ R(a_i, b_j) = E\{ L(d_1(x), D_j(y)) \}, \tag{17.45} \]
where the expected value, E, has to be taken with respect to the random variables x and y as
Player A can now write the payoff matrix
and minimize the expected maximum losses or risks by using the minimax criterion.

Figure 17.8 Statistical game, where \(L_{ij} = L(a_i, b_j)\), i, j = 0, 1.
Example 17.5. Normal form games and payoff matrices: In decision theory, given the alternatives, it is important to make an informed choice. Depending on the situation, costs or gains could stand for many different things and the payoff matrices can be made as complex as one desires. Consider the following payoff matrix:

                   Player A
                  a1      a2
Player B   b1      7      -4
           b2      3       5        (17.46)

The corresponding decision trees are given in Figure 17.9. When no probabilities can be assigned, using the first decision tree and the minimax criterion, player A chooses strategy \(a_2\) to avoid the risk of losing 7 points in case everything goes bad (remember that the plus sign is for losses for player A).

Figure 17.9 Decision trees for the players A and B in Example 17.5.
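The minimax choice can be written as a two-step computation: find the worst-case loss of each strategy, then pick the strategy whose worst case is smallest. The sketch below applies it to the matrix in Eq. (17.46) and, with gains entered as negative losses, to the merchant's problem. The nested-dictionary layout and all names are our own illustrative choices.

```python
def minimax(loss_table):
    """Return (strategy, worst_case) minimizing the maximum loss over the opponent's moves."""
    worst_case = {own: max(row.values()) for own, row in loss_table.items()}
    choice = min(worst_case, key=worst_case.get)
    return choice, worst_case

# Losses for player A from Eq. (17.46): positive numbers are losses for A.
game_loss = {
    "a1": {"b1": 7, "b2": 3},
    "a2": {"b1": -4, "b2": 5},
}
minimax_choice, game_worst = minimax(game_loss)

# Merchant's problem: R = recession, EG = economy remains good; gains are negative losses.
merchant_loss = {
    "EGD": {"R": 50_000, "EG": -120_000},
    "HHI": {"R": 30_000, "EG": -80_000},
    "DNE": {"R": -2_000, "EG": -30_000},
}
merchant_choice, merchant_worst = minimax(merchant_loss)

print(minimax_choice, game_worst)
print(merchant_choice, merchant_worst)
```

No probabilities enter this computation, which is why the criterion is usable when they cannot be assessed.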
Example 17.6. Another game and strategy for A: Let us consider player A in the second decision tree (Fig. 17.9, right). Now, B makes the first move and A has to decide without knowing what B has decided. Since A acts without seeing his/her opponent’s move, we connected points 2 and 3 by a dotted line. Another way to look at this game is that A and B decide simultaneously. We now show how A can use a random number generator to minimize his/her maximum loss by adjusting the odds for the two choices, \(a_1\) and \(a_2\), as y and (1 - y) at points 2 and 3, respectively. If B chooses strategy \(b_1\), then A can expect to lose
\[ E_{b_1} = 7y - 4(1 - y) \tag{17.47} \]
points. If B chooses strategy \(b_2\), then A can expect to lose
\[ E_{b_2} = 3y + 5(1 - y) \tag{17.48} \]
points. If we plot \(E_{b_1}\) and \(E_{b_2}\) as shown in Figure 17.10, we see that the two lines cross at y = 9/13, so that A can minimize his/her maximum expected loss by following strategy \(a_1\) 9 out of 13 times and strategy \(a_2\) 4 out of 13 times. If this is a one-time decision, A can use a random number generator to pick the appropriate strategy with the odds adjusted accordingly.

Figure 17.10 \(E_{b_1}\) and \(E_{b_2}\).

The theory of games is a relatively new branch of mathematics closely related to probability theory, information theory, and decision theory. Since the players and payoffs could have many different forms, it has found a wide range of applications to economic phenomena such as auctions and bargaining. Other important applications are given in biology, computer science, political science, and philosophy (Szabo and Fath; Miller and Miller; Bather; Osborne).
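Returning to Example 17.6, the odds 9/13 and 4/13 follow from setting the two expected losses equal. The sketch below solves that equation with exact fractions and then confirms on a grid of y values that the crossing point minimizes the larger of the two losses. The grid spacing of 1/130 is an arbitrary choice made so that 9/13 lies on the grid; all names are our own.

```python
from fractions import Fraction

def expected_losses(y):
    """Expected losses of A against b1 and b2 when a1 is played with probability y."""
    e_b1 = 7 * y - 4 * (1 - y)    # Eq. (17.47)
    e_b2 = 3 * y + 5 * (1 - y)    # Eq. (17.48)
    return e_b1, e_b2

# 7y - 4(1 - y) = 3y + 5(1 - y)  =>  11y - 4 = 5 - 2y  =>  13y = 9
y_star = Fraction(5 + 4, 11 + 2)
loss_at_crossing = expected_losses(y_star)

# Brute-force check: minimize the maximum expected loss over a grid on [0, 1].
grid = [Fraction(k, 130) for k in range(131)]
y_best = min(grid, key=lambda y: max(expected_losses(y)))

print(y_star, 1 - y_star, loss_at_crossing[0], y_best == y_star)
```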
17.2.5 Traveler’s Dilemma and Nash Equilibrium
An interesting non-zero-sum game that Basu introduced in 1994, where each player tries to maximize their return with no concern for what the other player is getting, has attracted a lot of attention among game theorists. Even though the game can be played among any number of players, it is usually presented in terms of two players as the traveler’s dilemma.
An airline loses two pieces of luggage, both of which contain identical antiques that belong to two separate travelers. Being afraid that the travelers will claim inflated prices, the airline manager separates the passengers and tells them that the company is liable for up to $100 per piece of luggage and asks them to write an integer between and including 2 and 100. The manager also adds that if they both write the same number, the company will honor that as the actual price of the antique and pay both travelers that amount. In case they write different numbers, the company will take the lower number as the actual price of the antique and pay that amount to both travelers. However, in this case the company will deduct $2 from the traveler who wrote the larger number as penalty and add $2 as bonus to the traveler who wrote the smaller number. According to these rules, we can now construct the following payoff matrix, where the first column represents the choices for the first traveler, John, and the first row represents the choices for the second traveler, Mary. The numbers in parentheses represent the reimbursements that John and Mary will receive, respectively. Payoff matrix for the traveler’s game:

John \ Mary      2        3        4       ...      100
    2          (2,2)    (4,0)    (4,0)     ...     (4,0)
    3          (0,4)    (3,3)    (5,1)     ...     (5,1)
    4          (0,4)    (1,5)    (4,4)     ...     (6,2)
   ...
   100         (0,4)    (1,5)    (2,6)     ...   (100,100)
For example, the entry in the third column of the second row is (5,1), which means that when John chooses 3 and Mary chooses 4, John gets $5 and Mary gets $1. In this case the lower number is 3; hence they both get $3, but since John gives the lower number, 3, a $2 bonus is added to his reimbursement, while a $2 penalty is deducted from Mary’s, thus bringing their total reimbursements to $5 and $1, respectively. The question is, What numbers or strategy should the travelers choose? Assuming that both travelers are rational, let us see what the game theory predicts. We start with John. Since his aim is to get the maximum possible amount as reimbursement, he first thinks of writing 100. However, he immediately realizes that for the same reason Mary could also write 100. Hence, by lowering his claim to 99, John expects to pick up the $2 bonus, thus increasing his return to $101. Then on a second thought, he realizes that Mary, being a rational person like himself, could also argue the same way and write 99, thus
reducing his return to $99. Now, John could do better by pulling his claim down to 98, which will allow him to get $100 with the bonus. Continuing along this line of reasoning, John cascades down to the smallest number 2. Since they are both assumed to be rational, Mary also comes up with the same number. Hence, the game theory predicts that both travelers write the number 2, which is the Nash equilibrium for this game. However, in practice almost all participants pick the number 100 or a number very close to it. The fact that the majority of the players get such high rewards by deviating so much from the Nash equilibrium is not easy to explain mathematically. Since it is not possible to refer to the vast majority of people as being irrational, some game theorists have questioned the merits of this game, while others have proposed various modifications.

In analyzing payoff matrices a critical concept is the Nash equilibrium. If each player has chosen a strategy and no player can improve his or her situation unilaterally, that is, when the other players keep their strategy unchanged, the set of strategies and the corresponding payoffs correspond to a Nash equilibrium. In 1950 Nash showed in his dissertation that a Nash equilibrium, possibly in mixed strategies, exists for all finite games with any number of players. There is an easy way to identify Nash equilibria for pure strategy games, which is particularly helpful when there are two players with each player having more than two strategies available. In such cases, formal analysis of the payoff matrix could be quite tedious. For a given pair of numbers in a cell, we simply check whether the first member of the pair is the maximum of its column and whether the second member of the pair is the maximum of its row. When these conditions are met, as in the cell with (2, 2) in the traveler’s dilemma, then that cell represents a Nash equilibrium. An N x N payoff matrix can have up to N x N pure strategy Nash equilibria. In the traveler’s dilemma there is only one Nash equilibrium.
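The column-maximum and row-maximum test described above can be run over the full 99 x 99 payoff matrix of the traveler's dilemma. The following sketch encodes the airline's rules in a function `payoff` and searches all cells; the function names and the default `bonus=2` parameter are our own illustrative choices.

```python
def payoff(john, mary, bonus=2):
    """Reimbursements (John, Mary) under the airline's rules."""
    if john == mary:
        return john, mary
    low = min(john, mary)
    if john < mary:
        return low + bonus, low - bonus   # John wrote the smaller number
    return low - bonus, low + bonus       # Mary wrote the smaller number

choices = range(2, 101)

def is_nash(john, mary):
    """True if neither traveler can gain by changing only his or her own number."""
    john_ok = all(payoff(john, mary)[0] >= payoff(j, mary)[0] for j in choices)
    mary_ok = all(payoff(john, mary)[1] >= payoff(john, m)[1] for m in choices)
    return john_ok and mary_ok

equilibria = [(j, m) for j in choices for m in choices if is_nash(j, m)]
print(payoff(3, 4), payoff(99, 100), equilibria)
```

The exhaustive search is affordable here because the matrix is small; it examines pure strategies only.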
A mixed strategy is a strategy where the players make their moves randomly according to a probability distribution that tells how frequently each move is to be made. A mixed strategy can be understood in contrast to a pure strategy, where players choose a certain strategy with the probability 1. For example, in the traveler’s dilemma game, if at least one of the travelers chooses his or her number by using a random number generator, then the game is called a mixed strategy game. The concept of stability is very important in physical systems and has been investigated for many different kinds of equilibrium. Stability of Nash equilibria in mixed strategy games can be defined with respect to infinitesimal variations of the probabilities as follows: In a given Nash equilibrium, when the probabilities for one of the players are varied infinitesimally, the Nash equilibrium is stable (i) if the player who did not change has no better strategy in the new situation and (ii) if the player who did change is now playing with a strictly worse strategy.
When these conditions are met, a player with infinitesimally altered probabilities will quickly return to the Nash equilibrium. An important point is that stability of the Nash equilibrium is related to but not identical with the stability of a strategy. The dilemma in the traveler’s dilemma game lies in our difficulty in explaining why people choose something which the game theory deems irrational and yet get such high rewards. With the hopes of coming up with an explanation, game theorists have introduced a number of different equilibrium concepts like strict equilibrium, the rationalizable solution, perfect equilibrium, and more. Yet in all these cases one reaches the prediction (2,2) for the traveler’s dilemma (Basu). Game theory assumes that both travelers are rational and have the time to construct and analyze the payoff matrix correctly. The fact that so many players decide diametrically opposite to what the theory predicts and do well means that there are other important factors in the decision making process that we use. In real-life situations, aside from being rational, we can also assume that, on the average, people are honest and will not view this as an opportunity to make extra cash, hence would be glad to get out even. However, in order to be able to exercise this option, players need to have access to a critical piece of information which the game lacks, namely how much they have actually paid for the antique. A realistic modification of the game would be to give each player a number chosen with a certain distribution representing the actual cost of the antique for that player. Most of the players will be given numbers close to each other within a reasonable bargaining range, say 10%. Fewer players will have paid a lot more or a lot less than the mean price for various reasons.
When these factors are missing from the game, players will naturally view picking any number from 2 to 100 as their declared rightful choice; hence to maximize their return, they will not hesitate to choose a number very close to the upper limit. Since the players are not given any clue whatsoever about the actual cost of the antique and since they are not taking away the other traveler’s right to write whatever he or she wants, and also considering that the airline company is the strong hand, which after all is at fault, they will have no problem in rationalizing their action. In summary, what is rational or irrational largely depends on the circumstances under which we are forced to make a decision and how much time and information we have available. In some situations, panicking may even be a rational thing to do. In fact, in times of crisis when there is no hope or time for coming up with a strategy that will resolve our situation, to avoid freezing or decision paralysis, our brains are designed to panic. This helps us to come up with a strategy of some sort with the hopes that it will be the right one. Decisions reached through panic are not entirely arbitrary. During this time, our brain and our subconscious mind go through a storm of ideas and potential solutions and somehow pick one. Once panic sets in, the decision that comes out is out of our control. All we can do is to hope that it turns out to be the right one or one that is close to being right. When all else fails, in line with the famous saying A bad decision is better than no decision, the ability
to panic may actually be an invaluable advantage in evolution. Of course, unnecessary and premature activation of this mechanism, when there is still time and the means of coming up with the correct decision, is a pathology that needs to be cured. The thought processes that lead to the experimental findings of the traveler’s dilemma game still remain unknown. The game has been applied to situations like arms races and competing companies cutting prices, where the players find themselves in slowly but gradually worsening situations. While the game theory and the Nash equilibrium may not pinpoint what the right decision is for a given situation, they may help to lay out how far things can escalate and what some, and only some, of the options are. When one finds a stable Nash equilibrium, there is no guarantee that others, nearby or far away, with much higher rewards do not exist. In such cases, information channels called weak links may give players the hints or the signals of the existence of other equilibria. Once a stable Nash equilibrium is reached, whether one should stay put or decide to abandon that position and search for other equilibria with potentially higher rewards depends largely on one’s insight, ability to interpret and utilize such weak signals, and courage. When these games are applied to economics, the payoff is money, and in biology it is gene transmission, both of which are crucial in survival. All this being said, the role of leadership in critical decisions can neither be overlooked nor underestimated. It is the leadership qualities that open up new options, notice changing parameters, and create new channels of communication, which to others appear nonexistent or impossible. As evidenced in the behavior of many different complex systems, weak links play a crucial role in the processes of decision making (Csermely).
As for the traveler’s dilemma, the two players and the airline manager are parts of the same social network, hence they can never be considered as totally isolated. Their minds continuously gather information through weak links from their collective memory, which has important bearing on their final decisions. For example, one, or both, of the passengers may remember from the news that an equally reputable company once declined to pay when both passengers claimed reimbursements very close to the high end. Such considerations are important even in the experimental game.
17.2.6 Classical Bit or Cbit
The classical bit is defined as the physical realization of a binary system, which can exist only in two mutually exclusive states and which can be read or measured by an appropriate device. If the two states are equiprobable, the amount of information gained in such a measurement is 1 bit. This is also the maximum amount of information that can be obtained from any binary device. We now introduce Cbit as a device or a register, not as a unit of information. Cbit is a stable device whose state does not change by measurement. However, its state can be changed by some externally driven operation. An external device that sets or resets a Cbit is called a gate. In preparation for our discussion on its quantum version, Qbit, we represent the states of a Cbit by two orthonormal 2-vectors:
\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{17.49} \]
As far as the observer is concerned, H refers to the potential knowledge, that is, the knowledge that the observer does not have but expects to gain on the average after the outcome is learned. After the observation is made and the state of the Cbit is found, H collapses to zero, since subsequent observations cannot change the result for that observer. However, for a second observer, who does not know the outcome, H is still 1 b. In other words, collapse is in the mind of the observer. Cbit is always either in state \(|0\rangle\) or \(|1\rangle\); whether an observation has been made or not does not change this fact. The observer’s state of mind, which is basically a certain pattern of neural networks, wanders between the two possibilities, \(|0\rangle\) or \(|1\rangle\); and after the observation is made, it collapses to a new state, from “I wonder what it is?” to “I see” or from not knowing to knowing.

Figure 17.11 Necker cubes.

This reminds us of the Necker cubes (Fig. 17.11). If we relax and look continuously at the center of the cube on the left, we see two surfaces oscillating back and front. Since we are looking at the projection of a three-dimensional cube onto a plane, our brain does not have sufficient information to decide about which side is the closer one. Hence, it shows us the two possibilities by oscillating the surfaces with a definite frequency, which is probably a function of the processing speed of our brain. On the other hand, if we look at the cube on the right, which is still two-dimensional but with the missing information about the surfaces included, the two surfaces no longer oscillate. We shall not go into this any further; however, the inclusion of human mind as an
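information processor always makes the problem much more interesting and difficult.

For nontrivial calculations, one needs more than one Cbit. A 2-bit classical system can be constructed by combining two Cbits, \(|A\rangle\) and \(|B\rangle\), each of which has its own two possible states. Now the combined state, \(|AB\rangle\), has four possible states given as
\[ |00\rangle = (1, 0, 0, 0)^T, \tag{17.50} \]
\[ |01\rangle = (0, 1, 0, 0)^T, \tag{17.51} \]
\[ |10\rangle = (0, 0, 1, 0)^T, \tag{17.52} \]
\[ |11\rangle = (0, 0, 0, 1)^T. \tag{17.53} \]
Any measurement will yield 2 b worth of information culminating with only one of the above 4 states. In general, one represents the states of n Cbits in terms of \(2^n\) orthonormal vectors in \(2^n\) dimensions. Since technological applications always involve multiple Cbits, it is worth getting acquainted with the nomenclature. The four states of a 2-Cbit system are usually written in terms of tensor products as
\[ |0\rangle \otimes |0\rangle, \quad |0\rangle \otimes |1\rangle, \quad |1\rangle \otimes |0\rangle, \quad |1\rangle \otimes |1\rangle. \tag{17.54} \]
Sometimes we omit \(\otimes\) and simply write
\[ |0\rangle |0\rangle, \quad |0\rangle |1\rangle, \quad |1\rangle |0\rangle, \quad |1\rangle |1\rangle. \tag{17.55} \]
Other equivalent ways of writing these are
\[ |00\rangle, \quad |01\rangle, \quad |10\rangle, \quad |11\rangle \tag{17.56} \]
and
\[ |0\rangle_2, \quad |1\rangle_2, \quad |2\rangle_2, \quad |3\rangle_2. \tag{17.57} \]
In the last case the subscript stands for the number of Cbits in the system, which in this case is 2, and the numbers 0, 1, 2, 3 correspond to the zeroth,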
first, second, and third states of the 2-Cbit system. In general, for an n-Cbit system there are 2% mutually exclusive states, hence we write
lx), , where
J:
is integer 0 5 x < 2%.
(17.58)
The analogy with vector spaces and orthonormal basis vectors gains its true meaning only in quantum mechanics, where we can construct physical states by using their linear combinations with complex coefficients in Hilbert space. In classical systems, the only meaningful reversible operations on n-Cbit systems are the $(2^n)!$ distinct permutations of the $2^n$ basis vectors. We can write the 4-Cbit state $|0\rangle|0\rangle|1\rangle|0\rangle$ in the following equivalent ways:
$$|0\rangle\otimes|0\rangle\otimes|1\rangle\otimes|0\rangle = |0\rangle|0\rangle|1\rangle|0\rangle = |0010\rangle = |2\rangle_4.$$ (17.59)
In general a 4-Cbit state is written as
$$|x_3x_2x_1x_0\rangle = |x\rangle_4,$$ (17.60)
where x is a number given by the binary expansion
$$x = 8x_3 + 4x_2 + 2x_1 + x_0.$$ (17.61)
The states are enumerated starting with zero on the right. For example, for the state $|0010\rangle$ we have $x_0 = 0$, $x_1 = 1$, $x_2 = 0$, $x_3 = 0$, thus x becomes
$$x = 8\cdot 0 + 4\cdot 0 + 2\cdot 1 + 1\cdot 0 = 2.$$ (17.62)
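The enumeration rule of Eqs. (17.60)-(17.62) can be sketched in a few lines of Python. This is only an illustration; the function names `state_index` and `state_label` are hypothetical and not part of the text.

```python
def state_index(bits):
    """Return x for the Cbit string x_{n-1}...x_1 x_0 (bit 0 is the rightmost character)."""
    x = 0
    for ch in bits:
        x = 2 * x + int(ch)  # shift the running value by one bit, then add the next bit
    return x


def state_label(x, n):
    """Return the n-character bit string of the state |x>_n."""
    return format(x, "0{}b".format(n))


print(state_index("0010"))  # 8*0 + 4*0 + 2*1 + 1*0 = 2
print(state_label(2, 4))    # 0010
```

Reading the string from the left and doubling at each step is equivalent to the weighted sum in Eq. (17.61), with the zeroth Cbit on the right.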
The tensor product is defined as
(17.63)
For example,
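$$|001\rangle = |1\rangle_3 = \begin{pmatrix}1\\0\end{pmatrix}\otimes\begin{pmatrix}1\\0\end{pmatrix}\otimes\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}\otimes\begin{pmatrix}0\\1\\0\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\\0\\0\\0\\0\\0\end{pmatrix}.$$ (17.64)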
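As a rough check of the example above, the following Python sketch builds basis vectors with the standard Kronecker product of column vectors. It assumes the usual convention $|0\rangle = (1, 0)^T$ and $|1\rangle = (0, 1)^T$; the helper names `kron_vec` and `basis_state` are hypothetical.

```python
def kron_vec(a, b):
    """Kronecker product of two column vectors given as flat lists."""
    return [x * y for x in a for y in b]


KET0 = [1, 0]  # assumed representation of |0>
KET1 = [0, 1]  # assumed representation of |1>


def basis_state(bits):
    """Build the vector of |bits>, e.g. "001", one Cbit at a time from the left."""
    vec = [1]
    for ch in bits:
        vec = kron_vec(vec, KET1 if ch == "1" else KET0)
    return vec


print(basis_state("001"))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

The single nonzero entry sits at position x, counted from zero, where x is the integer whose binary expansion is the bit string.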
17.2.7 Operations on Cbits
In quantum information theory all operations (gates) on Qbits are reversible. In the classical theory there are only two reversible operations that can be performed on a single Cbit, do nothing and flip, which can be represented as matrices. Do nothing or the identity operator I:
$$I|0\rangle = \begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} = |0\rangle.$$ (17.65)
Flip operator X:
(17.66) An example of an irreversible operation is the action of the erase operator El:
(17.67) (17.68)
It is irreversible, since the original states cannot be reconstructed from the output states. We also have the following operators:
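$$Z|0\rangle = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} = |0\rangle, \quad Z|1\rangle = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = -\begin{pmatrix}0\\1\end{pmatrix} = -|1\rangle,$$ (17.69)
$$Y|0\rangle = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix} = |1\rangle, \quad Y|1\rangle = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = -\begin{pmatrix}1\\0\end{pmatrix} = -|0\rangle.$$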
(17.70) Even though these operations are mathematically well-defined, they are meaningless in the classical context. Only the states $|0\rangle$ and $|1\rangle$ have meaning within the context of Cbits. However, operators like Z, which are meaningless for a single Cbit, when used in conjunction with other meaningless operators, could gain classical meaning in multi-Cbit systems. For example, the operator
$$\frac{1}{2}(I + Z_1Z_0)$$ (17.71)
acts as the identity operator for the 2-Cbit states $|0\rangle|0\rangle$ and $|1\rangle|1\rangle$. On the other hand, it produces zero, another classically meaningless result, when operated on $|0\rangle|1\rangle$ and $|1\rangle|0\rangle$. The subscript on Z indicates the Cbit on which it acts. For example, on a 4-Cbit state,
the flip operator, $X_1$, acting on the first Cbit is defined as
$$X_1 = I\otimes I\otimes X\otimes I,$$ (17.73)
where
We cannot emphasize enough that we start counting with the zeroth Cbit on the right. Similarly,
$$\frac{1}{2}(I - Z_1Z_0)$$ (17.75)
is the identity operator for $|0\rangle|1\rangle$ and $|1\rangle|0\rangle$ and produces 0 when operated on $|0\rangle|0\rangle$ and $|1\rangle|1\rangle$. For multiple-Cbit systems, another operator that represents a reversible operation is the swap operator $S_{ij}$. It exchanges the values of the Cbits represented by the indices i and j:
Another useful operation on 2-Cbit systems is the reversible XOR or the controlled-NOT gate implemented by the operator $C_{10}$ as
The task of $C_{10}$ is to flip the value of the target bit, $|x_0\rangle$, whenever the control bit, $|x_1\rangle$, has the value 1. One can easily show that $C_{10}$ can be constructed from single-Cbit operators as
$$C_{10} = \frac{1}{2}(I + Z_1 + X_0 - X_0Z_1).$$ (17.78)
Other examples of Cbits and their gates can be found in Mermin.
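The operator identities of this section can be checked numerically. The sketch below is a minimal illustration, assuming the basis ordering $|x_1x_0\rangle = |00\rangle, |01\rangle, |10\rangle, |11\rangle$ and the usual matrices for X and Z; the helpers `kron`, `matmul`, and `combine` are hypothetical names introduced here.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def combine(terms):
    """Return half the signed sum of equally sized integer matrices.

    terms is a list of (sign, matrix) pairs; every entry of the sums used below is even.
    """
    n = len(terms[0][1])
    return [[sum(s * M[i][j] for s, M in terms) // 2 for j in range(n)]
            for i in range(n)]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

I4 = kron(I2, I2)
Z1 = kron(Z, I2)   # Z acting on Cbit 1 (the left factor)
Z0 = kron(I2, Z)   # Z acting on Cbit 0 (the right factor)
X0 = kron(I2, X)   # X acting on Cbit 0

# Eq. (17.78): controlled-NOT with control Cbit 1 and target Cbit 0
C10 = combine([(1, I4), (1, Z1), (1, X0), (-1, matmul(X0, Z1))])
# Eq. (17.71): identity on |00> and |11>, zero on |01> and |10>
P_same = combine([(1, I4), (1, matmul(Z1, Z0))])

for row in C10:
    print(row)
```

The resulting `C10` leaves $|00\rangle$ and $|01\rangle$ unchanged and exchanges $|10\rangle$ and $|11\rangle$, which is the permutation expected of a controlled-NOT gate.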
17.3 QUANTUM INFORMATION THEORY
In the previous sections we have introduced the basic elements of Shannon’s theory and discussed Cbits, which are binary devices with two mutually exclusive and usually equiprobable states. Even though the technological applications of Cbits involve quantum processes like the photoelectric effect, semiconductivity, tunneling, etc., they are basically classical devices working with classical currents of particles. Recently, the use of single particles and quantum systems has opened up the possibility of designing new information processing devices with vastly different properties and merits. Even though classical information theory has been around for over 60 years with many interesting interdisciplinary applications, quantum information theory is still in its infancy, with its technological applications not yet in sight. However, considering its potential in both theory and practice, quantum information theory is bound to be a center of attraction for many years to come. In what follows we start with a quick review of the basics of quantum mechanics.

17.3.1 Basic Quantum Theory
A quantum system is by definition a microscopic system, where the classical laws of physics break down and the laws of quantum mechanics have to be used. The most fundamental difference between classical physics and quantum mechanics is about the effect of measurement on the state of a system. Measurement of a system always involves some kind of interaction between the measuring device and the system. For example, to measure the temperature of an object, we may bring it in contact with a mercury thermometer. During the measurement process, the thermometer and the object exchange a certain amount of heat. Finally, they reach thermal equilibrium at some common temperature and the mercury column in the thermometer settles at a new level, which allows us to read the temperature from a scale. At the end of the measurement process, neither the thermometer nor the object will be at its initial temperature. However, the amount of mercury inside the bulb is usually so small that it reaches thermal equilibrium with the object with only a tiny amount of heat exchanged, hence the temperature of the object does not change appreciably during this process. This shows that even in classical physics measurement affects the state of a system. However, what separates classical physics from quantum mechanics is that in classical physics these effects can either be minimized by a suitable choice of instrumentation or algorithms can be designed to take corrective measures. In this regard, in classical information theory a Cbit remains in its original state no matter how many times it is measured. It changes its state only by the action of some external devices called gates. In classical physics there are particles and waves. These are mutually exclusive properties of matter. Particles are localized objects with no dimensions,
while waves are spread out in entire space. One of the surprising features of quantum mechanics is the duality between the particle and wave properties. Electrons and photons sometimes behave like particles and sometimes like waves. Furthermore, the nature of the experimental setup determines whether electrons or photons behave as particles or as waves. The de Broglie relation
$$\lambda = \frac{h}{p},$$ (17.79)
where $h = 6.63\times 10^{-27}\ \mathrm{cm^2\,g\,s^{-1}}$ is the famous Planck constant, establishes the relation between a wave of wavelength $\lambda$ and a particle of momentum p. Similarly, the Planck formula
$$E = h\nu$$ (17.80)
establishes the particle property of light by giving the energy of photons in terms of the frequency of the electromagnetic waves. In quantum mechanics, measurement on a system changes the state of the system irreversibly and there are limits on the accuracy with which certain pairs of observables can be measured simultaneously. Position and momentum are two such conjugate observables. Heisenberg’s uncertainty principle, which could be considered the single most important statement of quantum mechanics, states that position and momentum cannot be determined simultaneously with greater precision than
$$\Delta x\,\Delta p \ge \frac{\hbar}{2},$$ (17.81)
where $\hbar = h/2\pi$. The uncertainty principle does not say that we cannot determine the position or momentum as accurately as we want. But it says that if we want to know the momentum of a particle precisely, that is, as $\Delta p \to 0$, then the price we pay is to lose all information about its position:
$$\lim_{\Delta p\to 0}\Delta x \ge \frac{\hbar}{2\Delta p} \to \infty.$$ (17.82)
In other words, particles begin to act like pure waves extended throughout the entire space, hence they are everywhere. Similarly, if we want to know the position precisely, then we lose all information about its momentum. As Feynman says, “The uncertainty principle protects quantum mechanics. Heisenberg recognized that if it were possible to measure the momentum and the position simultaneously with greater accuracy, the quantum mechanics would collapse. So he proposed that it must be impossible.” Since 1926, Heisenberg’s uncertainty principle and quantum mechanics have been victorious over many experimental and conceptual challenges and still stand. The mathematical formulation of quantum mechanics is quite different from that of classical physics and is based on a few principles:
(I) The state of a system is completely described by the state vector, $|\Psi\rangle$, defined in the abstract Hilbert space, which is the linear vector space of square integrable functions. The state vector is a complex-valued function of real arguments. When continuous variables like position are involved, $|\Psi\rangle$ can be expressed as $\Psi(x)$. In this form it is also called the wave function or the state function. When there is no room for confusion we use both. The absolute value square of the state function, $|\Psi(x)|^2$, gives the probability density, and $|\Psi(x)|^2dx$ is the probability of finding the system in the interval between x and x + dx. Since it is certain that the system is somewhere between $-\infty$ and $\infty$, the state function satisfies the normalization condition
$$\int_{-\infty}^{\infty}|\Psi(x)|^2\,dx = 1.$$ (17.83)
(II) In quantum mechanics the order in which certain dynamical variables are measured is important. It is for this reason that observables are represented by Hermitian differential operators or Hermitian matrices acting on the state vectors in Hilbert space. Due to their Hermitian property, these operators have real eigenvalues and their eigenvectors, also called the eigenstates, form a complete and orthogonal set that spans the Hilbert space. For a given operator, A, with the eigenvalues $a_i$ and the eigenstates $|u_i\rangle$, or $u_i(x)$, we can express these properties as
$$A|u_i\rangle = a_i|u_i\rangle,$$ (17.84)
$$\text{Orthogonality:}\quad \int u_i^*(x)u_j(x)\,dx = \delta_{ij},$$ (17.85)
$$\text{Completeness:}\quad \sum_i u_i^*(x')u_i(x) = \delta(x'-x).$$ (17.86)
Eigenvalues, $a_i$, which are real, correspond to the measurable values of the dynamical variable A. When an observable has discrete eigenstates, its observed value can only be one of the corresponding eigenvalues. Using the completeness property [Eq. (17.86)], we can express the general state vector of a quantum system as a linear combination of the eigenstates of A as
$$|\Psi\rangle = \sum_i c_i|u_i\rangle,$$ (17.87)
where $|u_i\rangle$ are also called the basis states. Expansion coefficients, $c_i$, which are in general complex numbers, can be found by using the orthogonality relation [Eq. (17.85)] as
$$c_i = \int u_i^*(x)\Psi(x)\,dx.$$ (17.88)
These complex numbers, $c_i$, are called the probability amplitudes, and from the normalization condition (17.83) satisfy the condition
$$\sum_i |c_i|^2 = 1.$$ (17.89)
Now the expectation value, $\langle A\rangle$, of a dynamical variable is found as
$$\langle A\rangle = \int \Psi^*(x)A\Psi(x)\,dx.$$ (17.90)
Using the orthogonality relation (17.85), we can write $\langle A\rangle$ as
$$\langle A\rangle = \sum_i |c_i|^2a_i = \sum_i p_ia_i.$$ (17.91)
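A small numerical sketch may help fix the relation between amplitudes, probabilities, and the expectation value. The amplitudes and eigenvalues below are hypothetical values chosen only so that the normalization condition holds; the function names are likewise illustrative.

```python
import math


def probabilities(c):
    """p_i = |c_i|^2 for a list of complex probability amplitudes."""
    return [abs(ci) ** 2 for ci in c]


def expectation(c, a):
    """<A> as the probability-weighted sum of the eigenvalues a_i."""
    return sum(p * ai for p, ai in zip(probabilities(c), a))


# Hypothetical three-state example: probabilities 1/2, 1/4, 1/4
c = [1 / math.sqrt(2), 0.5j, 0.5]
a = [1.0, 2.0, 3.0]

print(sum(probabilities(c)))  # close to 1, the normalization condition
print(expectation(c, a))      # close to 1.75
```

Only the moduli of the amplitudes enter here; the phase of the second amplitude has no effect on either number.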
A quantum state prepared as
$$|\Psi\rangle = \sum_i c_i|u_i\rangle$$ (17.92)
is called a mixed state. When a measurement of the dynamical variable A is performed on a mixed state, the result is one of the eigenvalues. The probability of the result being the mth eigenvalue, $a_m$, is given by $p_m = |c_m|^2$. The important thing here is that once a measurement is done on the system, it is left in one of the basis states and all the information regarding the initial state, that is, all the $c_i$s in Equation (17.92), is erased. This is called the collapse of the state vector or the state function. From the collapsed state vector, it is no longer possible to construct the original state vector. That is, all information about the original state contained in the coefficients, $c_i$, is lost irreversibly once a measurement is made. How exactly the collapse takes place is still debated and is beyond conventional quantum mechanics. Expansion coefficients, $c_i$, in the state function can only be determined by collecting statistical information on the probabilities, $p_i$, by repeating the experiment many times on identically
prepared states. It is also possible to prepare a system in the mth eigenstate of A as
$$|\Psi\rangle = |u_m\rangle.$$ (17.93)
In this case, all $c_i$s except $c_m$, which is equal to 1, are zero. Such states are called pure states. We have to remind the reader that for another observable, Equation (17.93) will not, in general, be a pure state, since $|u_m\rangle$ is now a mixed state of the new observable’s eigenstates. Applying all this to Shannon’s theory, we see that for a pure state, that is, $p_i = 0$, $i \ne m$, and $p_m = 1$, the Shannon entropy, H, is zero; in other words, there is zero new information. No matter how many times the measurement is repeated on identically prepared states, there will be no surprises. For a mixed state, Shannon entropy is given as
$$H = -\sum_{k=1}^{N} p_k\log_2 p_k, \quad \text{where}\quad \sum_k |c_k|^2 = \sum_k p_k = 1,$$ (17.94)
where N is the dimension of the Hilbert space. From a mixed state, the maximum obtainable information is
$$H = \log_2 N,$$ (17.95)
which occurs when all the probabilities are equal, that is, when $p_i = 1/N$ for all i. Note that once the state vector collapses, further measurements of A will yield the same value. In other words, with respect to that observable, the system behaves classically. Using the definition of the expectation value, we can write the uncertainty principle for two dynamic variables, A and B, as
$$\Delta A\,\Delta B \ge \frac{1}{2}\left|\langle[A,B]\rangle\right|,$$ (17.96)
where $[A,B] = AB - BA$ is called the commutator of A and B. From the above equation, it is seen that unless A and B commute, AB = BA, they cannot be measured simultaneously with arbitrary precision. Furthermore, if two operators do not commute, $AB \ne BA$, then the order in which they are measured becomes important (Merzbacher). In some cases, we can ignore most of the parameters of a quantum system and concentrate on just a few of the variables. This is very useful when we are interested in some discrete set of states like the spin with up/down states, the polarization with vertical/horizontal states, or, as in the quantum pinball machine, which we shall discuss in detail, the path with right/left. In such cases, we can express the state vector as a superposition of the orthonormal basis states, $|\Psi_1\rangle$ and $|\Psi_2\rangle$, corresponding to the two alternatives as
$$|\Psi\rangle = c_1|\Psi_1\rangle + c_2|\Psi_2\rangle,$$ (17.97)
where
$$|c_1|^2 + |c_2|^2 = 1.$$ (17.98)
When a which-one or which-way measurement is done, $p_1 = |c_1|^2$ and $p_2 = |c_2|^2$ are the respective probabilities of one of the basis states, $|\Psi_1\rangle$ or $|\Psi_2\rangle$, being seen. Statistically speaking, if we repeat the measurement N times on identically prepared states, $p_1$ and $p_2$ will be equal to the following limits:
$$p_1 = \lim_{N\to\infty}\frac{N_1}{N}, \quad p_2 = \lim_{N\to\infty}\frac{N_2}{N}, \quad N = N_1 + N_2,$$ (17.99)
where $N_1$ and $N_2$ are the number of occurrences of the states $|\Psi_1\rangle$ and $|\Psi_2\rangle$, respectively. If we do not perform a which-one or which-way measurement on the system, the system will be in the superposed state [Eq. (17.97)] with the probability density given as
$$|\Psi|^2 = (c_1\Psi_1 + c_2\Psi_2)^*(c_1\Psi_1 + c_2\Psi_2)$$
$$= |c_1|^2|\Psi_1|^2 + c_1^*c_2\Psi_1^*\Psi_2 + c_2^*c_1\Psi_2^*\Psi_1 + |c_2|^2|\Psi_2|^2$$
$$= p_1|\Psi_1|^2 + c_1^*c_2\Psi_1^*\Psi_2 + c_2^*c_1\Psi_2^*\Psi_1 + p_2|\Psi_2|^2.$$ (17.100)
The two terms in the middle are the interference terms responsible for the fringes seen in a quantum double slit experiment. In classical physics, the interference terms are absent and the joint probability density reduces to (Feynman et al.)
$$|\Psi|^2 = p_1|\Psi_1|^2 + p_2|\Psi_2|^2.$$ (17.101)
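The difference between the quantum and the classical densities can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the two component waves are taken as hypothetical plane waves $e^{ikx}$ and $e^{-ikx}$ with equal amplitudes, which is not a choice made in the text.

```python
import cmath
import math


def density_quantum(c1, c2, psi1, psi2):
    """|c1*psi1 + c2*psi2|^2, the full expression with interference."""
    return abs(c1 * psi1 + c2 * psi2) ** 2


def density_classical(c1, c2, psi1, psi2):
    """p1*|psi1|^2 + p2*|psi2|^2, the expression without interference terms."""
    return abs(c1) ** 2 * abs(psi1) ** 2 + abs(c2) ** 2 * abs(psi2) ** 2


def interference_terms(c1, c2, psi1, psi2):
    """Sum of the two middle (cross) terms of the expanded density."""
    cross = complex(c1).conjugate() * c2 * complex(psi1).conjugate() * psi2
    return (cross + cross.conjugate()).real


k = 1.0                           # hypothetical wave number
c1 = c2 = 1 / math.sqrt(2)        # equal amplitudes, p1 = p2 = 1/2
for x in [0.0, 0.5, 1.0, math.pi / 2]:
    psi1 = cmath.exp(1j * k * x)   # assumed component wave
    psi2 = cmath.exp(-1j * k * x)  # assumed component wave
    q = density_quantum(c1, c2, psi1, psi2)
    cl = density_classical(c1, c2, psi1, psi2)
    print(round(x, 3), round(q, 6), round(cl, 6))
```

For this choice the classical density is flat, while the quantum density oscillates with x because of the cross terms; those oscillations are the fringes.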
In classical physics, regardless of the presence of an observer, the system is always in one of the states: $|\Psi_1\rangle$ or $|\Psi_2\rangle$. When we make an observation, we find the system in one or the other state with its respective probability, $p_1$ or $p_2$. However, in quantum mechanics, until a measurement is done and the state function has collapsed, the system is in both states simultaneously, that is, in the superposed state:
$$|\Psi\rangle = c_1|\Psi_1\rangle + c_2|\Psi_2\rangle.$$ (17.102)
Evolution of a state vector is determined by the Schrödinger equation,
$$H|\Psi\rangle = i\hbar\frac{\partial|\Psi\rangle}{\partial t},$$ (17.103)
where H is the Hamiltonian operator, which is usually obtained from its classical expression by replacing $\vec{r}$ and $\vec{p}$ with their operator counterparts. For example, for a particle of mass m moving under the influence of a conservative force field, $V(\vec{r})$, the Hamiltonian operator is obtained from the classical Hamiltonian:
$$H = \frac{p^2}{2m} + V(\vec{r})$$ (17.104)
with the replacements
$$\vec{r}\to\vec{r}, \quad \vec{p}\to -i\hbar\vec{\nabla}$$ (17.105)
as
$$H = -\frac{\hbar^2}{2m}\nabla^2 + V(\vec{r}).$$ (17.106)
In technological applications, changes in the state vector,
are usually managed through the actions of reversible transformations, U, called gates, which are represented by unitary transformations satisfying the relation
$$UU^{\dagger} = I,$$ (17.108)
where I is the identity operator and $U^{\dagger}$ is the Hermitian conjugate defined as the complex conjugate of the transpose of U, that is,
$$U^{\dagger} = \widetilde{U}^{*}.$$ (17.109)
17.3.2 Single-Particle Systems and Quantum Information

To demonstrate the profound differences between the classical and the quantum information processing devices, we start with the experimental setup shown in Figure 17.12. In this setup, we have a light source emitting a coherent monochromatic light beam with the intensity I. The beam impinges on a beam splitter, B, and then separates into two, each with intensity I/2. Aside from its intensity, the transmitted beam on the left goes through the beam splitter unaffected. The reflected beam on the right undergoes a phase shift of $\pi/2$ with respect to the transmitted beam. Naturally, the detectors $D_0$ and $D_1$ receive the transmitted and the reflected beams, respectively.

Figure 17.12 Quantum pinball machine. T for transmitted and R for reflected.

Next, we consider this experiment in the limit as the intensity of the beam is reduced to almost zero. Technically, it is possible to control the intensity so that we are actually sending one photon at a time to the beam splitter, which diverts these photons with equal probability to the left or the right channels. The detectors respond to individual photons with equal probability. For an experiment repeated many times, half of the time $D_0$ and the other half of the time $D_1$ will click. So far everything looks like the pinball machine. The difference between the two cases begins to appear when we seek an answer to the question: Which path did the photon take? In the case of the pinball machine, the ball has two possibilities; it will go to either the left or the right. The source of randomness is the pin, which diverts the ball to the left or the right with equal probabilities. Once the ball clears the pin, it has a definite trajectory and ends up in either the left or the right bin. The observer just doesn’t know it yet. Whether the observer actually checks the bins or not has no bearing whatsoever on the result. In the case of photons, there are two possible basis states, $|\Psi_0\rangle$ and $|\Psi_1\rangle$, corresponding to the eigenstates of the “which-way” operator. State $|\Psi_0\rangle$ corresponds to the photon following the left path and $|\Psi_1\rangle$ corresponds to the photon following the right path. When the detectors are turned off or absent, the photon is in the superposed state
$$|\Psi\rangle = c_0|\Psi_0\rangle + c_1|\Psi_1\rangle,$$ (17.110)
where $c_0$ and $c_1$ are in general complex numbers satisfying the normalization condition
$$|c_0|^2 + |c_1|^2 = 1.$$ (17.111)
In other words, the photon is neither in the left channel, $|\Psi_0\rangle$, nor in the right channel, $|\Psi_1\rangle$, but it is in both of them, simultaneously. To find out which way the photon goes through, we turn on the detectors. One of the detectors clicks and the state function collapses to either $|\Psi_0\rangle$ or $|\Psi_1\rangle$. Once the state function has collapsed, the photon is in a pure state, $|\Psi_0\rangle$ or $|\Psi_1\rangle$. During this process we gain 1 b of information about the system, but the price we pay is that we have lost (destroyed) the initial state function [Eq. (17.110)] irreversibly. It can no longer be constructed from the collapsed state:
$$|\Psi\rangle = |\Psi_0\rangle \quad\text{or}\quad |\Psi\rangle = |\Psi_1\rangle.$$ (17.112)
By repeating the experiment many times on identically prepared setups, all we can gain is statistical information about the initial state, that is, the squares of the absolute values of $c_0$ and $c_1$, which are related to the probabilities of the initial state vector collapsing to either $|\Psi_0\rangle$ or $|\Psi_1\rangle$ as
$$p_0 = |c_0|^2 \quad\text{and}\quad p_1 = |c_1|^2.$$ (17.113)
In the case of a symmetric beam splitter, the probabilities are equal, $p_0 = p_1 = 1/2$, and the state function can be given in any one of the following forms:
$$|\Psi\rangle = \frac{1}{\sqrt{2}}\left[|\Psi_0\rangle + |\Psi_1\rangle\right] \quad\text{or}\quad |\Psi\rangle = \frac{1}{\sqrt{2}}\left[|\Psi_0\rangle - |\Psi_1\rangle\right].$$ (17.114)
In the classical pinball machine the ball is always in a “pure” state. It is following either the left or the right path. In other words, it is either in state $|\Psi_0\rangle$ or in state $|\Psi_1\rangle$. This has nothing to do with the presence of an observer, knowing or not knowing, measuring or not measuring, peeking or not peeking through the cracks of the pinball machine. In classical physics, observation or measurement never has the same dramatic effect on the state of the system that it has in quantum systems. In the following section we discuss how all this can be verified in the laboratory through the use of the Mach-Zehnder interferometer.
17.3.3 Mach-Zehnder Interferometer
In a Mach-Zehnder interferometer (Fig. 17.13), after the first beam splitter, $B_1$, the transmitted and the reflected beams are reflected at the mirrors $M_L$ and $M_R$, respectively, and allowed to go through a second beam splitter, $B_2$, before being picked up by the detectors $D_1$ and $D_0$. We refer to the transmitted and the reflected beams of the first beam splitter as the left and the right beams, respectively. We first consider the case of a coherent monochromatic beam with intensity I produced by the light source S. Keep in mind that each time light gets reflected, it leads the incident wave by a phase difference of $\pi/2$, while the transmitted wave suffers no phase shift. The beam that gets transmitted at $B_1$ follows the left path, gets reflected at $M_L$, and finally splits into its reflected and transmitted parts at $B_2$. They are joined by the parts of the wave reflected at $B_1$, which follows the right path. The left beam reflected at $B_2$ meets the right beam transmitted at $B_2$. Since they both suffered two reflections, they are in phase and interfere constructively to shine on $D_1$ with intensity I. The part of the left beam transmitted at $B_2$ is joined by the part of the right beam reflected at $B_2$. Since the right beam has suffered three reflections, while the left beam has suffered only one, they are out of phase by $\pi$, thus interfering destructively to produce zero intensity at $D_0$. In summary, $D_1$ gets the full original beam with intensity I, while $D_0$ gets nothing. Note that all this is true when all the legs of the interferometer are equal in length.

Figure 17.13 Mach-Zehnder interferometer.

We now turn down the intensity so that we are sending one photon at a time to $B_1$. The experimental result is completely consistent with the macroscopic result; that is, all photons are detected by $D_1$ and no photon is detected by $D_0$. To understand all this in terms of the interference of electromagnetic waves is easy. However, in the case of individual photons we find ourselves in the position of accepting the view that the photon has followed both paths to interfere with itself to produce no response at $D_0$ and a sure response at $D_1$. From the information theory point of view, we already know the answer, that is, the detector $D_1$ responds for sure. However, we know nothing about which path the photon has followed to get there. To learn the path that the photon follows, we remove $B_2$, and either $D_0$ or $D_1$ clicks. We now know which path the photon follows, and we gain 1 b of information in finding that out. In the Mach-Zehnder interferometer, we know exactly which detector responds, that is, $D_1$, but we have no knowledge of how the photon gets there. There is a region of irremovable uncertainty about where the photon is in our device. There is no classical counterpart of this. If we try the same experiment with the classical pinball machine, we see the ball half of the time in bin 1 and the other half of the time in bin 2 (Fig. 17.14, left). This is because the events corresponding to the ball following the left path and the right path to reach bin 1 are mutually exclusive, each with its respective probability of 1/4 (Fig. 17.14, right). If one happens, the other one does not. Thus, their joint probability is given as their sum: $\frac{1}{4} + \frac{1}{4} = \frac{1}{2}$. A symmetric argument works for the ball reaching bin 2. In other words, no matter how many times we run the experiment, we never see a case where the ball that followed the left path collides or interferes with itself that followed the right path. This is the strange position that we always find ourselves in when trying to understand quantum phenomena in terms of our intuition, which is predominantly shaped by our classical experiences.

Figure 17.14 Mach-Zehnder experiment with a pinball machine. The two mutually exclusive paths for the balls seen in Bin 1 are shown on the left.
17.3.4 Mathematics of the Mach-Zehnder Interferometer

Figure 17.15 Undisturbed paths for the eigenstates of the “which way” operator: $|\Psi_{left}\rangle = \Psi_L$ and $|\Psi_{right}\rangle = \Psi_R$.
In a Mach-Zehnder interferometer let us choose the orthonormal basis states as
$$|\Psi_{left}\rangle = \begin{pmatrix}1\\0\end{pmatrix}, \quad |\Psi_{right}\rangle = \begin{pmatrix}0\\1\end{pmatrix}.$$ (17.115)
These are the eigenstates of the which-way operator. They correspond to the undisturbed paths, that is, the paths in the absence of both beam splitters (Fig. 17.13). For a photon incident from the right (Fig. 17.15) the left channel is defined as $SB_1M_LB_2D_0$, while the second one defines the right channel for the undisturbed path for a photon incident from the left: $SB_1M_RB_2D_1$. Strictly speaking, these are meaningless statements. What we need is the full solution of the time-dependent Schrödinger equation, which exhibits the change to the superposition of the right and left path solutions after going through the beam splitter. However, for our purposes it is perfectly all right to work with the reduced degrees of freedom represented by the left and the right phrases. Between the source, S, and the beam splitter, $B_1$, the photon is in the basis state (Fig. 17.13)
$$|\Psi_{SB_1}\rangle = |\Psi_{left}\rangle.$$ (17.116)
After the first beam splitter, $B_1$, and between $B_1$ and $B_2$, the solution is transformed into the mixed, that is, the superposed state of the transmitted and the reflected parts as
$$|\Psi_{B_1B_2}\rangle = c_0|\Psi_{left}\rangle + c_1|\Psi_{right}\rangle.$$ (17.117)
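We now write a mathematical expression for the action of the beam splitter, $B_1$, which acts on state $|\Psi_{left}\rangle$ [Eq. (17.116)] to produce the superposed state [Eq. (17.117)]. Since we use a symmetric beam splitter, we have
$$|c_0|^2 = |c_1|^2 = \frac{1}{2}.$$ (17.118)
We also know that the reflected photon in Figure 17.13 has suffered a phase shift of $\pi/2$ with respect to the state $|\Psi_{left}\rangle$; hence without any loss of generality we can take
$$c_0 = \frac{1}{\sqrt{2}} \quad\text{and}\quad c_1 = \frac{e^{i\pi/2}}{\sqrt{2}} = \frac{i}{\sqrt{2}}.$$ (17.119)
Thus,
$$|\Psi_{B_1B_2}\rangle = \frac{1}{\sqrt{2}}\left[|\Psi_{left}\rangle + i|\Psi_{right}\rangle\right]$$ (17.120)
$$= \frac{1}{\sqrt{2}}\left[\begin{pmatrix}1\\0\end{pmatrix} + i\begin{pmatrix}0\\1\end{pmatrix}\right].$$ (17.121)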
We now write a matrix, B, that acts on the initial state [Eq. (17.116)] and produces the mixed state [Eq. (17.121)] as
$$|\Psi_{B_1B_2}\rangle = B|\Psi_{SB_1}\rangle.$$ (17.122)
We can easily verify that
$$B = \frac{1}{\sqrt{2}}\begin{pmatrix}1&i\\i&1\end{pmatrix}$$ (17.123)
$$= \frac{1}{\sqrt{2}}(I + iX)$$ (17.124)
accomplishes this task, where
$$I = \begin{pmatrix}1&0\\0&1\end{pmatrix}, \quad X = \begin{pmatrix}0&1\\1&0\end{pmatrix}.$$ (17.125)
This can be understood by the fact that half of the incident wave goes unaffected into the left channel, thus explaining the factor $1/\sqrt{2}$ and the identity operator I, while the other half gets reflected into the right channel with a phase shift of $\pi/2$, thus explaining the flip operator, X, and the phase shift factor $e^{i\pi/2}$:
$$B = \frac{1}{\sqrt{2}}\left(I + e^{i\pi/2}X\right).$$ (17.126)
Note that the left channel is the path followed by the undisturbed photon incident from the right and vice versa. Some of the important properties of B are: (i) It is a transformation, not an observable. (ii) Since $BB^{\dagger} = I$, where $B^{\dagger} = \widetilde{B}^{*}$, it is a unitary transformation. (iii) Since it is a unitary transformation, its action is reversible with its inverse given as $B^{-1} = B^{\dagger}$. (iv) $B^2 = iX$. After the second beam splitter, $B_2$, the final state of the photon between $B_2$ and the detectors $D_0$ or $D_1$ is given as
$$|\Psi_{B_2D_1\ \mathrm{or}\ D_0}\rangle = B|\Psi_{B_1B_2}\rangle$$ (17.127)
$$= BB|\Psi_{SB_1}\rangle$$ (17.128)
$$= iX|\Psi_{SB_1}\rangle$$ (17.129)
$$= iX\begin{pmatrix}1\\0\end{pmatrix} = i\begin{pmatrix}0\\1\end{pmatrix}.$$ (17.130)
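The properties listed for B are easy to check numerically. The following Python sketch assumes the matrix form $B = (I + iX)/\sqrt{2}$ and the column vector $(1, 0)^T$ for the incoming state; the helper functions are hypothetical and written out only to keep the example self-contained.

```python
import math

S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
B = [[S, 1j * S], [1j * S, S]]  # assumed form B = (I + iX)/sqrt(2)


def matmul(A, M):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(A):
    """Hermitian conjugate: complex conjugate of the transpose."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def close(A, M, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(A[i][j] - M[i][j]) < tol
               for i in range(len(A)) for j in range(len(A[0])))


iX = [[1j * X[i][j] for j in range(2)] for i in range(2)]
print(close(matmul(B, dagger(B)), I2))   # unitarity, B B^dagger = I
print(close(matmul(B, B), iX))           # B^2 = iX

psi_out = matvec(matmul(B, B), [1, 0])   # two beam splitters in sequence
print([round(abs(z) ** 2, 12) for z in psi_out])  # detection probabilities
```

With equal arm lengths the probabilities come out as 0 for the first component and 1 for the second, in line with the statement that only one detector responds.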
Notice that the phase factor, $i = e^{i\pi/2}$, is physically unimportant and has no experimental consequence. However, we shall see that the individual phases in a superposition are very important. We now introduce the channel blockers represented by the operators
$$E_R = \begin{pmatrix}1&0\\0&0\end{pmatrix} \quad\text{and}\quad E_L = \begin{pmatrix}0&0\\0&1\end{pmatrix}.$$ (17.131)
They represent the actions of blocking devices which block the right and the left channels, respectively. They eliminate one of the components in the superposition and let the other one remain. Another useful operator is the phase shift operator $\Phi(\phi)$,
$$\Phi(\phi) = \begin{pmatrix}e^{-i\phi/2}&0\\0&e^{i\phi/2}\end{pmatrix},$$ (17.132)
which introduces a relative phase shift of $\phi$ between the left and the right channels. Note that we have written $\Phi(\phi)$ symmetrically so that the operator delays the left channel by $\phi/2$, while the right channel is advanced by $\phi/2$. In other words, its action is equivalent to
$$\begin{pmatrix}1&0\\0&e^{i\phi}\end{pmatrix}.$$ (17.133)
If we insert an extra phase shift device between the two beam splitters, we can write the most general output state of the Mach-Zehnder interferometer as
$$|\Psi_{output}\rangle = B\Phi(\phi)B|\Psi_{input}\rangle.$$ (17.134)
For a wave incident from the right, $|\Psi_{input}\rangle = \begin{pmatrix}1\\0\end{pmatrix}$, this gives
$$|\Psi_{output}\rangle = \frac{1}{2}\left[(1 - e^{i\phi})\begin{pmatrix}1\\0\end{pmatrix} + i(1 + e^{i\phi})\begin{pmatrix}0\\1\end{pmatrix}\right].$$ (17.135)
I=
X
=
Z= Y=
( h ;) ( ; ;) (0 ) -1 XZ ( )
(17.136)
: identity, :
( 17.137)
shifts the phase of
(17.138)
:
-1
=
flips two basis states,
0
:
phase shift by
T
followed by a flip,
(17.139)
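Among these, I, X, Y, Z, and $\Phi$ are unitary operators, while E is an irreversible operator. Notice that unitary transformations produce reversible changes in the quantum system, while the changes induced by a detector are irreversible. In addition to these, there is another important reversible operator called the Hadamard operator, which converts a pure state into a mixed state:
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix} = \frac{1}{\sqrt{2}}(X + Z).$$ (17.140)
The Hadamard operator, H, is a unitary operator with the actions
$$H\begin{pmatrix}1\\0\end{pmatrix} = \frac{1}{\sqrt{2}}\left[\begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix}\right]$$ (17.141)
and
$$H\begin{pmatrix}0\\1\end{pmatrix} = \frac{1}{\sqrt{2}}\left[\begin{pmatrix}1\\0\end{pmatrix} - \begin{pmatrix}0\\1\end{pmatrix}\right].$$ (17.142)
The Hadamard operator is the workhorse of quantum computing, which converts a pure state into a superposition.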
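A short numerical sketch of the Hadamard operator follows. It assumes the usual matrices for X and Z and builds H from their sum; the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
# H = (X + Z)/sqrt(2), built entry by entry
H = [[S * (X[i][j] + Z[i][j]) for j in range(2)] for i in range(2)]


def apply(M, v):
    """Apply a 2x2 matrix to a two-component state."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


plus = apply(H, [1, 0])    # image of the first basis state
minus = apply(H, [0, 1])   # image of the second basis state
print(plus, minus)

# Acting twice returns the original basis state, so H is its own inverse
back = apply(H, plus)
print([round(x, 12) for x in back])
```

Both images have equal detection probabilities of 1/2; they differ only in the relative sign of the second component, which is exactly the kind of phase information a single which-way measurement cannot reveal.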
17.3.5 Quantum Bit or Qbit
We have defined a Cbit as a classical system with two mutually exclusive states. It is also possible to design binary devices working at the quantum level. In fact, the Mach-Zehnder interferometer is one example. A Qbit is basically a quantum system that can be prepared in a superposition of two states like $|0\rangle$ and $|1\rangle$. The word superposition on its own implies that these two states are no longer mutually exclusive as in Cbits. Furthermore, in general, quantum mechanics forces us to use complex coefficients in constructing these superposed states. We used the word forces deliberately, since complex numbers in quantum mechanics are not just a matter of convenience, as they are in classical wave problems, but a requirement imposed on us by nature. Real results that can be measured in the laboratory are assured not by singling out the real or the imaginary part of the state function, which in general cannot be done, but by interpreting the square of its absolute value, $|\Psi|^2$, as the probability density and by using Hermitian operators that have real eigenvalues to represent observables. The superposed state of a Qbit is defined as
$$|\Psi\rangle = c_0|0\rangle + c_1|1\rangle,$$ (17.143)
where $c_0$ and $c_1$ are complex amplitudes, which satisfy the normalization condition:
$$|c_0|^2 + |c_1|^2 = 1.$$ (17.144)
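If we write $c_0$ and $c_1$ as
$$c_0 = a_0e^{ia_1}, \quad c_1 = b_0e^{ib_1}, \quad a_0 > 0,\ b_0 > 0,$$ (17.145)
we obtain
$$|\Psi\rangle = e^{ia_1}\left[a_0|0\rangle + b_0e^{i(b_1 - a_1)}|1\rangle\right].$$ (17.146)
Since it has no observable consequence, we can ignore the overall phase factor $e^{ia_1}$. To guarantee normalization [Eq. (17.144)], we can also define two new real parameters, $\theta$ and $\phi$, as
$$a_0 = \cos\theta, \quad b_0 = \sin\theta \quad\text{and}\quad (b_1 - a_1) = \phi,$$ (17.147)
so that $|\Psi\rangle$ is written as
$$|\Psi\rangle = \cos\theta\,|0\rangle + \sin\theta\,e^{i\phi}|1\rangle.$$ (17.148)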
Note that the probabilities,
$$p_0 = |c_0|^2 = \cos^2\theta \quad\text{and}\quad p_1 = |c_1|^2 = \sin^2\theta,$$ (17.149)
are not affected by the phase, $\phi$, at all. When $\theta = \pi/4$, we have
$$p_0 = p_1 = \frac{1}{2}.$$ (17.150)
This is the equiprobable case with the Shannon entropy of information, H, of one bit, which is the maximum amount of useful average information that can be obtained from any binary system. In other words, whether we measure one Qbit or make measurements on many identically prepared Qbits, the maximum average information that can be gained from a single Qbit is one bit. This does not change the fact that we need both $\theta$ and $\phi$ to specify the state of a Qbit completely. What happens to this additional degree of freedom and the information carried by the phase $\phi$? Unfortunately, all this wealth of information carried by the phase is lost irreversibly once a measurement is made and the state function has collapsed. Once a Qbit is measured, it behaves just like a Cbit. Given two black boxes, one containing a Cbit and the other a Qbit, there is no way to tell which one is which. If you are given a collection of 1000 Qbits prepared under identical conditions, by making a measurement on each one of them, all you will obtain is the probabilities, $p_0$ and $p_1$, deduced from the number of occurrences, $N_0$ and $N_1$, of the states $|0\rangle$ and $|1\rangle$, respectively, as (17.151) Furthermore, there is another restriction on quantum information, which is stated in terms of a theorem first proposed by Ghirardi. It is known as the no-cloning theorem, which says that the state of a Qbit cannot be copied. In other words, given a Qbit whose method of preparation is unknown, the no-cloning theorem says that you cannot produce its identical twin. If it were possible, then one would be able to produce as many identical copies as needed and use them to determine statistically its probability distribution as accurately as desired, while still having the original, which has not been disturbed by any measurement. In other words, there is absolute inaccessibility of the quantum information buried in a superposed state.
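The statistical point made above can be illustrated with a small simulation. This is only a sketch under stated assumptions: measurement outcomes are drawn from a seeded pseudo-random generator with $p_0 = \cos^2\theta$, and the names `measure_many` and `shannon_entropy` are hypothetical.

```python
import math
import random


def measure_many(theta, phi, shots, seed=0):
    """Simulate measuring `shots` identically prepared Qbits
    cos(theta)|0> + sin(theta) e^{i phi}|1> and count the outcomes."""
    c0 = math.cos(theta)
    c1 = math.sin(theta) * complex(math.cos(phi), math.sin(phi))
    p0 = abs(c0) ** 2                  # the phase phi enters only c1, and |c1| ignores it
    assert abs(p0 + abs(c1) ** 2 - 1.0) < 1e-12
    rng = random.Random(seed)          # seeded so that runs are repeatable
    n0 = sum(1 for _ in range(shots) if rng.random() < p0)
    return n0, shots - n0


def shannon_entropy(probs):
    """H = -sum p log2 p, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


n0, n1 = measure_many(math.pi / 4, 0.0, 1000)
print(n0 / 1000, n1 / 1000)            # estimates of p0 and p1
print(shannon_entropy([0.5, 0.5]))     # 1.0 bit for the equiprobable case
```

Changing `phi` while keeping the seed fixed produces exactly the same counts in this simulation, since the outcome probabilities depend on $\theta$ alone; the counts carry no trace of the phase.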
You have to make a measurement to find out, and when you do, you destroy the original state, with your entire expected gain being one bit. Is there then no way to harvest this wealth of quantum information hidden in the two real numbers $\theta$ and $\phi$? Well, if you do not tamper with the quantum state, there is. The no-cloning theorem says that you cannot copy a Qbit, but you can manufacture many Qbits in the same state, manage them through gates representing reversible unitary operations, and have them interact with other Qbits. As long as you do not meddle in the inner workings of this network of Qbits, at the end you can extract the needed information by a measurement, which itself is an irreversible operation. If we act on a
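superposition,
$|\Psi_{sup.1}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\phi}|1\rangle\right),$ (17.152)
with the Hadamard operator [Eq. (17.140)], $H$, we get
$|\Psi_{sup.2}\rangle = H|\Psi_{sup.1}\rangle = H\frac{1}{\sqrt{2}}\left[\begin{pmatrix}1\\0\end{pmatrix} + e^{i\phi}\begin{pmatrix}0\\1\end{pmatrix}\right]$ (17.153)
$= \frac{1}{2}\left[(1+e^{i\phi})|0\rangle + (1-e^{i\phi})|1\rangle\right],$ (17.154)
which is another superposition, $|\Psi_{sup.2}\rangle$, with different phases. The probabilities for the state $|\Psi_{sup.1}\rangle$,
$p_0 = \frac{1}{2}, \quad p_1 = \frac{1}{2},$ (17.155)
have now changed with the second state, $|\Psi_{sup.2}\rangle$, to
$p_0 = \frac{1}{2}(1+\cos\phi), \quad p_1 = \frac{1}{2}(1-\cos\phi),$ (17.156)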
thus demonstrating how one can manipulate these phases with reversible operators and with observable consequences. Other relations between reversible operators that are important for designing quantum computing systems can be given as (Mermin)
$XZ = -ZX,$ (17.157)
$HXH = Z,$ (17.158)
$HZH = X,$
(17.159) -iB2
( 17.160)
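The effect of the Hadamard operator on the phase, and the operator relations quoted from Mermin, can be verified with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the operators are taken in their standard $2\times 2$ matrix forms, and the helper names (`matmul`, `apply`, `close`, `probabilities_after_hadamard`) are hypothetical, introduced only for this illustration.

```python
import cmath
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[S, S], [S, -S]]  # Hadamard operator, H = (X + Z)/sqrt(2)


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Action of a 2x2 matrix on a two-component state vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def probabilities_after_hadamard(phi):
    """p0 and p1 for H acting on (|0> + exp(i*phi)|1>)/sqrt(2)."""
    psi = [S, S * cmath.exp(1j * phi)]
    out = apply(H, psi)
    return abs(out[0]) ** 2, abs(out[1]) ** 2


if __name__ == "__main__":
    phi = 1.0  # arbitrary illustrative phase
    print(probabilities_after_hadamard(phi))
    print(0.5 * (1 + math.cos(phi)), 0.5 * (1 - math.cos(phi)))
    print(close(matmul(matmul(H, X), H), Z), close(matmul(matmul(H, Z), H), X))
```

Before the Hadamard operation the probabilities are 1/2 each for every $\phi$; afterwards they depend on $\phi$, which is the observable consequence referred to in the text.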
On the practical side of the problem, it is the task of quantum-computational engineering to find ways to physically realize and utilize these unitary transformations. For practical purposes, most of the existing unitary transformations are restricted to those that act on single Qbits or at most on pairs of Qbits. An important part of the challenge for the software designers is to construct the transformations that they may need as combinations of these basic elements. Any quantum system with binary states like $|0\rangle$ and $|1\rangle$ can be used as a Qbit. In practice, it is desirable to work with stable systems so that the superposed states are not lost through decoherence, that is, through interactions with the background on the scale of the experiment. Photons, electrons, atoms, and quantum dots can all be used as Qbits. It is also possible to use internal states like polarization, spin, and energy levels of an atom as Qbits.
17.3.6 The No-Cloning Theorem
Using the Dirac bra-ket notation and the properties of inner product spaces introduced in Chapter 5, we can prove the no-cloning theorem easily. Let a given Qbit to be cloned, called the control Qbit, be in the state $|\Psi_A\rangle$. A second quantum system in state $|\chi\rangle$, called the target Qbit, is supposed to be transformed into $|\Psi_A\rangle$ via a copying device. We represent the initial state of the copying device as $|\Phi\rangle$. The state of the composite system can now be written as $|\Psi_A\rangle|\chi\rangle|\Phi\rangle$. Similar to an office copying device, the whole process should be universal and could be described by a unitary operator $U_c$. The effect of $U_c$ on the composite system can be written as
$U_c|\Psi_A\rangle|\chi\rangle|\Phi\rangle = |\Psi_A\rangle|\Psi_A\rangle|\Phi_A\rangle,$ (17.161)
where $|\Phi_A\rangle$ is the state of the copier after the cloning process. For a universal copier, another state, $|\Psi_B\rangle$, not orthogonal to $|\Psi_A\rangle$, is transformed as
$U_c|\Psi_B\rangle|\chi\rangle|\Phi\rangle = |\Psi_B\rangle|\Psi_B\rangle|\Phi_B\rangle,$ (17.162)
where all the states are normalized, that is,
$\langle\Psi_A|\Psi_A\rangle = \langle\Psi_B|\Psi_B\rangle = \langle\chi|\chi\rangle = \langle\Phi|\Phi\rangle = \langle\Phi_A|\Phi_A\rangle = \langle\Phi_B|\Phi_B\rangle = 1.$ (17.163)
Note that in bra-ket notation the inner product of two states, $\int_{-\infty}^{\infty}\Psi^*(x)\Phi(x)\,dx$, is written as $\langle\Psi|\Phi\rangle$. Since $|\Psi_A\rangle$ and $|\Psi_B\rangle$ are not orthogonal, we have
$\langle\Psi_A|\Psi_B\rangle \neq 0.$ (17.164)
From the properties of the inner product, we also have the inequalities
$|\langle\Phi_A|\Phi_B\rangle| \leq 1$ and $|\langle\Psi_A|\Psi_B\rangle| \leq 1,$ (17.165)
where the bars stand for the absolute value. Since a unitary operator, $U_c^{\dagger}U_c = I$, preserves the inner product, the inner product of the composite states before the operation:
$\langle\Phi|\langle\chi|\langle\Psi_A|\,U_c^{\dagger}U_c\,|\Psi_B\rangle|\chi\rangle|\Phi\rangle = \langle\Psi_A|\Psi_B\rangle\langle\chi|\chi\rangle\langle\Phi|\Phi\rangle = \langle\Psi_A|\Psi_B\rangle,$ (17.166)
has to be equal to the inner product after the operation. Hence we can write
$\langle\Psi_A|\Psi_B\rangle = \langle\Psi_A|\Psi_B\rangle\langle\Psi_A|\Psi_B\rangle\langle\Phi_A|\Phi_B\rangle = \langle\Psi_A|\Psi_B\rangle^2\langle\Phi_A|\Phi_B\rangle$ (17.167)
or
$1 = \langle\Psi_A|\Psi_B\rangle\langle\Phi_A|\Phi_B\rangle.$ (17.168)
Taking absolute values, this also becomes
$1 = |\langle\Psi_A|\Psi_B\rangle|\,|\langle\Phi_A|\Phi_B\rangle|.$ (17.169)
Since for nonorthogonal states the inequalities $|\langle\Psi_A|\Psi_B\rangle| \leq 1$ and $|\langle\Phi_A|\Phi_B\rangle| \leq 1$ are true, the equality in Equation (17.169) can only be satisfied when both absolute values are equal to 1, that is, when
$|\Psi_A\rangle = |\Psi_B\rangle.$ (17.170)
Hence, the unitary operator $U_c$ does not exist. In other words, no machine can make a perfect copy of another Qbit state, $|\Psi_B\rangle$, that is not orthogonal to $|\Psi_A\rangle$. This is called the no-cloning theorem (for other types of cloning see Audretsch; Bruß and Leuchs).
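A concrete candidate copier makes the theorem tangible. The sketch below is an illustration under assumptions that are not in the original text: it takes the controlled-NOT gate as the would-be copying operation and a target Qbit that starts in $|0\rangle$, and all function names are hypothetical. The gate copies the orthogonal basis states $|0\rangle$ and $|1\rangle$ perfectly, but fails for a superposition, as the theorem requires.

```python
import math


def kron(u, v):
    """Tensor product of two state vectors, ordered |00>, |01>, |10>, |11>."""
    return [a * b for a in u for b in v]


def cnot(state):
    """Controlled-NOT with the first Qbit as control: |x, y> -> |x, y XOR x>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def fidelity(u, v):
    """Squared magnitude of the inner product of two normalized states."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v))) ** 2


def cloning_fidelity(psi):
    """Overlap between the ideal clone |psi>|psi> and CNOT acting on |psi>|0>."""
    blank = [1, 0]
    attempt = cnot(kron(psi, blank))
    return fidelity(kron(psi, psi), attempt)


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    print(cloning_fidelity([1, 0]))  # basis state |0>
    print(cloning_fidelity([0, 1]))  # basis state |1>
    print(cloning_fidelity([s, s]))  # equal superposition
```

For the equal superposition the gate produces the entangled state $(|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2}$ rather than two independent copies, so the overlap with the ideal clone is only 1/2.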
17.3.7 Entanglement and Bell States
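Let us consider two Qbits, A and B, with no common origin and no interaction between them. We can write their states as
$|\chi_A\rangle = \alpha_A|0\rangle_A + \beta_A|1\rangle_A$ (17.171)
and
$|\chi_B\rangle = \alpha_B|0\rangle_B + \beta_B|1\rangle_B,$ (17.172)
where $|0\rangle$ and $|1\rangle$ refer to their respective basis states:
$\{|0\rangle_A, |1\rangle_A\}$ and $\{|0\rangle_B, |1\rangle_B\}.$ (17.173)
Each pair of amplitudes satisfies the normalization condition separately as
$|\alpha_A|^2 + |\beta_A|^2 = 1$ (17.174)
and
$|\alpha_B|^2 + |\beta_B|^2 = 1.$ (17.175)
Since there is no interaction between them, both of them preserve their identity. Their joint state, $|\chi_{AB}\rangle$, is given as the tensor product $|\chi_A\rangle \otimes |\chi_B\rangle$, which is also written as $|\chi_A\rangle|\chi_B\rangle$:
$|\chi_{AB}\rangle = |\chi_A\rangle \otimes |\chi_B\rangle = [\alpha_A|0\rangle_A + \beta_A|1\rangle_A][\alpha_B|0\rangle_B + \beta_B|1\rangle_B]$ (17.176)
$= \alpha_A\alpha_B|0\rangle_A|0\rangle_B + \alpha_A\beta_B|0\rangle_A|1\rangle_B + \beta_A\alpha_B|1\rangle_A|0\rangle_B + \beta_A\beta_B|1\rangle_A|1\rangle_B.$ (17.177)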
However, this is only a special two-Qbit state, which is composed of two noninteracting single Qbits. A general two-Qbit state will be a superposition of the basis vectors:
$|0\rangle_A|0\rangle_B, \quad |0\rangle_A|1\rangle_B, \quad |1\rangle_A|0\rangle_B, \quad |1\rangle_A|1\rangle_B,$ (17.178)
which span the four-dimensional two-Qbit space as
$|\chi_{AB}\rangle = \alpha_{00}|0\rangle_A|0\rangle_B + \alpha_{01}|0\rangle_A|1\rangle_B + \alpha_{10}|1\rangle_A|0\rangle_B + \alpha_{11}|1\rangle_A|1\rangle_B.$ (17.179)
Complex amplitudes, $\alpha_{ij}$, satisfy the normalization condition
$\sum_{i,j=0}^{1}|\alpha_{ij}|^2 = 1.$ (17.180)
In general, it is not possible to decompose $|\chi_{AB}\rangle$ in Equation (17.179) as the tensor product, $|\chi_A\rangle \otimes |\chi_B\rangle$, of two Qbit states: $|\chi_A\rangle$ and $|\chi_B\rangle$. Note that only under the additional assumption of
$\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10},$ (17.181)
Equation (17.179) reduces to Equation (17.177). Qbit states that cannot be decomposed as the tensor product of individual single Qbit states are called entangled. As in the single Qbit case, a measurement on a two-Qbit state causes the state function to collapse into one of the basis states in Equation (17.178) with the probabilities given as
$p_{ij} = |\alpha_{ij}|^2.$ (17.182)
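The product-state condition of Equation (17.181) and the measurement probabilities of Equation (17.182) translate directly into a short check. This is a minimal sketch; the function names and the sample amplitudes are hypothetical and serve only as an illustration.

```python
import math


def is_product_state(a00, a01, a10, a11, tol=1e-12):
    """True when a00*a11 = a01*a10, the condition in Eq. (17.181)."""
    return abs(a00 * a11 - a01 * a10) < tol


def measurement_probabilities(a00, a01, a10, a11):
    """Probabilities p_ij = |a_ij|^2 for the outcomes 00, 01, 10, 11."""
    return [abs(a) ** 2 for a in (a00, a01, a10, a11)]


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    # Product of (0.6|0> + 0.8|1>) and (|0> + |1>)/sqrt(2): not entangled.
    product = (0.6 * s, 0.6 * s, 0.8 * s, 0.8 * s)
    # Amplitudes (0, s, -s, 0): violates the condition, hence entangled.
    entangled = (0.0, s, -s, 0.0)
    print(is_product_state(*product), is_product_state(*entangled))
    print(measurement_probabilities(*entangled))
```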
The maximum average information that can be obtained from a two-Qbit system, two bits, is attained when all the coefficients are equal. For entangled states, this may not be the case. As in the case of single Qbits, where we can prepare maximally superposed states, we can also prepare maximally entangled two-Qbit states. There are four possibilities for the maximally entangled states, which are also called the Bell states. Following Josza and Roederer, we write them for
$|\alpha_{01}|^2 + |\alpha_{10}|^2 = 1, \quad \alpha_{00} = \alpha_{11} = 0$ (17.183)
as
$|\Psi^-\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B\right),$ (17.184)
$|\Psi^+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B + |1\rangle_A|0\rangle_B\right),$ (17.185)
and for
$|\alpha_{00}|^2 + |\alpha_{11}|^2 = 1, \quad \alpha_{01} = \alpha_{10} = 0$ (17.186)
as
$|\Phi^-\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B - |1\rangle_A|1\rangle_B\right),$ (17.187)
$|\Phi^+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B\right).$ (17.188)
These four Bell states are orthonormal and span the four-dimensional Hilbert space. Hence, any general two-Qbit state can be expressed in terms of the Bell basis states as
$|\chi_{AB}\rangle = C_{11}|\Psi^-\rangle + C_{12}|\Psi^+\rangle + C_{21}|\Phi^-\rangle + C_{22}|\Phi^+\rangle,$ (17.189)
where $C_{ij}$ are complex numbers. Since the Bell states are constructed from linear combinations of the original basis states in Equation (17.178), $C_{ij}$ are also linear combinations of $\alpha_{ij}$. If we consider the actions of the unitary operators I, X, Y, and Z on Bell states, we find
$|\Psi^-\rangle = I_A|\Psi^-\rangle,$ (17.190)
$|\Psi^+\rangle = Z_A|\Psi^-\rangle,$ (17.191)
$|\Phi^-\rangle = -X_A|\Psi^-\rangle,$ (17.192)
$|\Phi^+\rangle = Y_A|\Psi^-\rangle.$ (17.193)
The subscript indicates the Qbit on which the operator is acting. To see what all this means, consider a pair of entangled electrons produced in a common process and sent in opposite directions to observers A and B. The electrons are also produced such that if the spin of the electron going toward observer A is up, $|0\rangle_A$, then the other one must be down, $|1\rangle_B$, and vice versa. Obviously, the pair is in one of the two Bell states given by Equations (17.184) and (17.185). For the sake of argument, let us assume that it is the symmetric one, that is,
$|\Psi^+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_B + |1\rangle_A|0\rangle_B\right).$ (17.194)
A measurement by one of the observers, say A, collapses the state function to either $|0\rangle_A|1\rangle_B$ or $|1\rangle_A|0\rangle_B$ with the equal probability of $\frac{1}{2}$. This removes all the uncertainty in any measurement that B will make on the second electron. In other words, when A makes a measurement, then any measurement of B will have zero information value, since all prior uncertainty will be gone. That is, despite the fact that there are two Qbits involved, this Bell state carries only one bit of classical information. In this experiment, the separation of A and B could be as large as one desires. This immediately brings to mind action at a distance and the possibility of superluminal communication via quantum systems. As soon as A (usually called Alice) makes a measurement on her particle, the spin of the electron at the location of B (usually called Bob) adjusts itself instantaneously. It would be a flagrant violation of causality if Alice could communicate with Bob by manipulating the spin of her particle. This worried none other than Einstein himself (see the literature on the EPR paradox). A way out of this conundrum is to notice that Bob has to make an independent measurement on his particle, and still he has to wait for Alice to send him the relevant information, which can only be done by classical means at subluminal speeds, so that he can decode whatever message was sent to him.

Let us review the problem once more. Alice and Bob share an entangled pair of electrons. Alice conducts a measurement on her electron. She has a 50/50 chance of finding its spin up or down. Let us say that she found spin up. Instantaneously, Bob's electron assumes the spin down state. However, Bob does not know this until he performs an independent measurement on his electron. He still thinks that he has a 50/50 chance of seeing either spin, but Alice knows that the wave function has collapsed and that for sure he will get spin down. Bob conducts his measurement and indeed sees spin down. But to him this is normal; he has just seen one of the possibilities. Now, Alice calls Bob and tells him that he must have seen spin down. Actually, she could also call Bob before he makes his measurement and tell him that he will see spin down. In either case, it would be hard for Alice to impress Bob, since Bob will think that Alice has, after all, a 50/50 chance of guessing the right answer anyway. To convince Bob, they share a collection of identically prepared entangled electrons. One by one, Alice measures her electrons, calls Bob, and tells him that she observed the sequence ↑↓↓↑↓↓↑ ... and that he should observe the sequence ↓↑↑↓↑↑↓ ... . When Bob measures his electrons, he is now impressed by the uncanny precision of Alice's prediction. This experiment can be repeated, this time with Alice calling Bob after he has conducted his measurements. Alice will still be able to predict Bob's results with 100% accuracy.

In this experiment, quantum mechanics says that the wave function collapses instantaneously no matter how far apart Alice and Bob are. However, they still cannot use this to communicate superluminally. First of all, in order to communicate they have to agree on a code. Since Alice does not know what sequence of spins she will get until she performs her measurements, they cannot do this beforehand. Once she does measure her set of particles, she is certain of what Bob will observe. Hence, she embeds the message into the sequence that she has observed by some kind of mapping. For Bob to be able to read Alice's message, Alice has to send him that mapping, which can only be done through classical channels. Even if somebody intercepts Alice's message, it will be useless without the sequence that Bob has. Hence, Alice and Bob can establish spy-proof communication through entangled states. One of the main technical challenges in quantum information is decoherence, which is the destruction of the entangled states by interactions with the environment. It is for this reason that internal states like spin or stable energy states of atoms, which are less susceptible to external influences by gravitational and electromagnetic interactions, are preferred for constructing Qbits.
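The statistics of the Alice and Bob experiment described above can be mimicked by sampling measurement outcomes from the amplitudes of the symmetric Bell state in Equation (17.194) with the probabilities $|\alpha_{ij}|^2$. This is only a sketch of the outcome statistics for measurements along a common axis, not a simulation of the underlying physics; the names `PSI_PLUS` and `measure_pairs` and the seed value are hypothetical.

```python
import math
import random

# Amplitudes of the symmetric Bell state; keys are the outcomes "AB",
# with 0 standing for spin up and 1 for spin down.
PSI_PLUS = {"01": 1 / math.sqrt(2), "10": 1 / math.sqrt(2)}


def measure_pairs(state, n_pairs, seed=0):
    """Sample joint outcomes with probabilities |amplitude|^2."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    outcomes = list(state)
    weights = [abs(state[o]) ** 2 for o in outcomes]
    results = rng.choices(outcomes, weights=weights, k=n_pairs)
    alice = [int(r[0]) for r in results]
    bob = [int(r[1]) for r in results]
    return alice, bob


if __name__ == "__main__":
    alice, bob = measure_pairs(PSI_PLUS, 7)
    print("Alice:", alice)
    print("Bob:  ", bob)
```

Each sequence on its own looks like a string of fair coin tosses, yet Bob's outcomes are always the complement of Alice's, which is why Alice can predict them with certainty but cannot choose them.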
Example 17.7. Quantum cryptology - The Vernam coding: Alice wants to send Bob a message, say the nine directions to open a safe, where the dial can be turned only one step, clockwise (CW) or counterclockwise (CCW), at a time. They agree to use binary notation, CW = 1, CCW = 0, to write the message as

101010011

Afraid of the message being eavesdropped by a third party, Alice and Bob share 9 ordered pairs of entangled electrons. Alice measures her particles one by one and obtains the sequence

010010110,

where 0 stands for spin up and 1 stands for spin down. Using this as a key, she adds the two sets of binary numbers according to the rules of modulo 2, which can be summarized as

0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0,

to obtain the coded text, that is, the cryptograph as

message      1 0 1 0 1 0 0 1 1
key          0 1 0 0 1 0 1 1 0
cryptograph  1 1 1 0 0 0 1 0 1
Now Alice sends the cryptograph to Bob via conventional means. Bob measures his ordered set of electrons to obtain the key and thus obtains the message by adding the key to the cryptograph with the same rules as

cryptograph  1 1 1 0 0 0 1 0 1
key          0 1 0 0 1 0 1 1 0
message      1 0 1 0 1 0 0 1 1
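The modulo-2 addition used for both coding and decoding is the bitwise exclusive OR, so the two tables above can be reproduced in a few lines. This is a minimal sketch; the function name `vernam` is hypothetical, while the message and key are the ones used in the example.

```python
def vernam(bits, key):
    """Add two equal-length binary strings digit by digit, modulo 2."""
    if len(bits) != len(key):
        raise ValueError("message and key must have the same length")
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, key))


if __name__ == "__main__":
    message = "101010011"
    key = "010010110"
    cryptograph = vernam(message, key)
    print(cryptograph)               # what Alice sends
    print(vernam(cryptograph, key))  # what Bob recovers
```

The same function serves for coding and decoding because adding the key twice, modulo 2, leaves every digit unchanged.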
Since the key is a completely random sequence of zeros and ones, the cryptograph is also a completely random sequence of zeros and ones. Hence, it has no value whatsoever to anybody who intercepts it without the key that Bob has. This is called the Vernam coding, which cannot be broken. However, the problem that this procedure poses in practice is that for each message that Bob and Alice want to exchange they need a new key. That is, a key can only be used once, which is why this is also called a one-time-pad system. Another source of major concern is that during this process, the key may somehow be obtained by the eavesdropper. On top of all this, during the transmission, quantum systems are susceptible to interference (decoherence); hence one needs algorithms to minimize and correct for errors. To attack these problems, various quantum-cryptographic methods, which are called protocols, have been developed (Audretsch; Bruß and Leuchs; Trigg; more references can be found at the back of this book).

17.3.8 Quantum Dense Coding
We have seen that entanglement does not help to communicate superluminally. However, it does play a very important role in quantum computing. Quantum dense coding is one example where we can send two bits of classical information by just using a single Qbit, thus potentially doubling the capacity of the information transfer channel. Furthermore, the communication is spy proof. Let us say that Alice has two bits of secret information to be sent to Bob. Two bits of classical information can be coded in terms of a pair of binary digits as
00, 10, 01, 11. (17.195)
First Alice and Bob agree to associate these digits with the following unitary transformations:
$U_{00} = I,$ (17.196)
$U_{01} = Z,$ (17.197)
$U_{10} = X,$ (17.198)
$U_{11} = Y.$ (17.199)
Then, Alice and Bob each receive one Qbit from an entangled pair prepared, say, in the antisymmetric Bell state $|\Psi^-\rangle$. Alice first performs the unitary transformation whose subscripts match the pair of digits that she is aiming to send Bob safely and then sends her Qbit to Bob as if it were a piece of mail. Anyone who tampers with this Qbit will destroy the superposed state, hence the message. The unitary transformation that Alice has performed changes the Bell state according to the corresponding formulas in Equations (17.190) - (17.193). When Bob receives the particle that Alice has sent, he makes a Bell state measurement on both particles to determine which one of the four states in Equations (17.190) - (17.193) it has assumed. The result tells him what Alice's transformation was, hence the pair of binary digits that she wanted to send him. Quantum dense coding was the first experimental demonstration of quantum communication. It was first realized by the Innsbruck group in 1996 (Mattle et al.). The crucial part of these experiments is the measurement of the Bell state of the entangled pair without destroying the entanglement (Roederer; Audretsch).
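The dense coding procedure can be traced with explicit amplitudes. The following is a minimal sketch under stated assumptions: two-Qbit states are stored as four amplitudes ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with Qbit A first, $Y$ is taken as $XZ$, and the ideal Bell state measurement is modeled as picking the Bell state with the largest overlap. The names `ENCODERS`, `BELL`, `apply_on_a`, `bell_measurement`, and `dense_code` are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
Y = [[0, -1], [1, 0]]  # Y = XZ

# Transformations agreed upon in Eqs. (17.196)-(17.199).
ENCODERS = {"00": I2, "01": Z, "10": X, "11": Y}

# Bell states labeled by the digits they decode to.
BELL = {
    "00": [0, S, -S, 0],  # |Psi->
    "01": [0, S, S, 0],   # |Psi+>
    "10": [S, 0, 0, -S],  # |Phi->
    "11": [S, 0, 0, S],   # |Phi+>
}


def apply_on_a(u, state):
    """Apply a single-Qbit operator to Qbit A of a two-Qbit state."""
    out = [0, 0, 0, 0]
    for i in range(2):
        for b in range(2):
            out[2 * i + b] = sum(u[i][k] * state[2 * k + b] for k in range(2))
    return out


def bell_measurement(state):
    """Return the label of the Bell state with the largest overlap."""
    probs = {bits: abs(sum(x * y for x, y in zip(vec, state))) ** 2
             for bits, vec in BELL.items()}
    return max(probs, key=probs.get)


def dense_code(bits):
    """Alice encodes two bits on her Qbit; Bob decodes them."""
    sent = apply_on_a(ENCODERS[bits], BELL["00"])  # the pair starts in |Psi->
    return bell_measurement(sent)


if __name__ == "__main__":
    for bits in ("00", "01", "10", "11"):
        print(bits, "->", dense_code(bits))
```

Note that $X_A$ turns $|\Psi^-\rangle$ into $-|\Phi^-\rangle$; the overall sign is an unobservable phase, so taking the squared magnitude of the overlap identifies the state correctly.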
17.3.9 Quantum Teleportation

Consider that Alice has an object that she wants to send Bob. Aside from conventional means of transportation, she could somehow scan the object and send all the information contained to Bob. With a suitable technology, Bob then reconstructs the object. Unfortunately, such a technology neither exists nor can be constructed because of the no-cloning theorem of quantum mechanics. However, the next best thing, which guarantees Bob that his object will have the same properties as the original that Alice has, is possible. And most importantly, they do not have to know the properties of the original. We start with Alice and Bob sharing a pair of entangled Qbits, A and B, which could be two electrons or two photons. We assume that the entangled pair is in the Bell state $|\Psi^-\rangle_{AB}$. A third Qbit, the teleportee, which is the same type of particle as A and B and is in the general superposed state
$|\chi\rangle_T = \alpha_T|0\rangle_T + \beta_T|1\rangle_T,$ (17.200)
is available to Alice. Any attempt to determine the exact state of $|\chi\rangle_T$ will destroy it. Our aim is to have Alice transport her Qbit, $|\chi\rangle_T$, to Bob without physically taking it there. In other words, Bob will have to reconstruct $|\chi\rangle_T$ at his location. In this process, due to the no-cloning theorem, we have to satisfy the following two conditions: (i) At no time is the exact state $|\chi\rangle_T$ revealed to either Alice or Bob. (ii) At the end, the copy in Alice's hand has to be destroyed. Otherwise, there will be two copies of $|\chi\rangle_T$. We now write the complete state vector of the three particles as
$|\chi\rangle_{ABT} = |\Psi^-\rangle_{AB}|\chi\rangle_T.$ (17.201)
We can express $|\chi\rangle_{ABT}$ in terms of the Bell states of the particles A and T held by Alice, that is, in terms of the set [Eqs. (17.184), (17.185), (17.187), and (17.188)]
$\{|\Psi^-\rangle_{AT}, |\Psi^+\rangle_{AT}, |\Phi^-\rangle_{AT}, |\Phi^+\rangle_{AT}\}.$ (17.202)
The expansion coefficient for the basis state
$|\Psi^-\rangle_{AT} = \frac{1}{\sqrt{2}}\left(|0\rangle_A|1\rangle_T - |1\rangle_A|0\rangle_T\right)$ (17.203)
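is found by projecting $|\chi\rangle_{ABT}$ along $|\Psi^-\rangle_{AT}$ as
${}_{AT}\langle\Psi^-|\chi\rangle_{ABT} = \frac{1}{2}\left(\alpha_T|0\rangle_B + \beta_T|1\rangle_B\right),$ (17.204)
where we have used the Dirac bra-ket notation (Chapter 5) and the orthogonality relations
$\langle 0|0\rangle = \langle 1|1\rangle = 1,$ (17.205)
$\langle 0|1\rangle = \langle 1|0\rangle = 0,$ (17.206)
for both A and T. Similarly, evaluating the other coefficients, we write the complete state vector, $|\chi\rangle_{ABT}$, as
$|\chi\rangle_{ABT} = \frac{1}{2}\left[|\Psi^-\rangle_{AT}\left(\alpha_T|0\rangle_B + \beta_T|1\rangle_B\right) + |\Psi^+\rangle_{AT}\left(-\alpha_T|0\rangle_B + \beta_T|1\rangle_B\right) + |\Phi^-\rangle_{AT}\left(\beta_T|0\rangle_B + \alpha_T|1\rangle_B\right) + |\Phi^+\rangle_{AT}\left(-\beta_T|0\rangle_B + \alpha_T|1\rangle_B\right)\right].$ (17.207)
Now, Alice performs a Bell state measurement on her particles A and T that collapses $|\chi\rangle_{ABT}$ into one of the four Bell states in Equation (17.202) with the equal probability of $\frac{1}{4}$. In no way does this process provide Alice any information about $|\chi\rangle_T$, that is, the probability amplitudes $\alpha_T$ and $\beta_T$, but the particle B, which Bob holds, jumps into a state connected to whatever Bell state Alice has observed, that is, one of
$\alpha_T|0\rangle_B + \beta_T|1\rangle_B,$ (17.208)
$-\alpha_T|0\rangle_B + \beta_T|1\rangle_B,$ (17.209)
$\beta_T|0\rangle_B + \alpha_T|1\rangle_B,$ (17.210)
$-\beta_T|0\rangle_B + \alpha_T|1\rangle_B.$ (17.211)
None of these states is yet the desired $|\chi\rangle_T$. However, due to the Bell state measurement that Alice has performed on $|\chi\rangle_{ABT}$, they are related to the state of the teleportee, $|\chi\rangle_T$:
$|\chi\rangle_T = \alpha_T|0\rangle_T + \beta_T|1\rangle_T,$ (17.212)
by the unitary transformation corresponding to the Bell state that Alice has observed, that is, one of
$\alpha_T|0\rangle_B + \beta_T|1\rangle_B = I|\chi\rangle_T,$ (17.213)
$-\alpha_T|0\rangle_B + \beta_T|1\rangle_B = -Z|\chi\rangle_T,$ (17.214)
$\beta_T|0\rangle_B + \alpha_T|1\rangle_B = X|\chi\rangle_T,$ (17.215)
$-\beta_T|0\rangle_B + \alpha_T|1\rangle_B = Y|\chi\rangle_T,$ (17.216)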
where the operators are defined in Equations (17.136)-(17.139). At this point, only Alice knows which transformation to use, that is, which Bell state the complete state function, $|\chi\rangle_{ABT}$, has collapsed to. She calls and gives Bob the necessary two-bit information, that is, the two digits of the subscripts of the operator $U_{ij}$, which they have agreed upon before the experiment to have the components
$U_{00} = I,$ (17.217)
$U_{01} = -Z,$ (17.218)
$U_{10} = X,$ (17.219)
$U_{11} = Y,$ (17.220)
corresponding to the four Bell states, $|\Psi^-\rangle_{AT}$, $|\Psi^+\rangle_{AT}$, $|\Phi^-\rangle_{AT}$, $|\Phi^+\rangle_{AT}$, respectively. Now, Bob uses the inverse transformation, $U_{ij}^{-1}$, on the particle B that he has and obtains an exact replica of the teleportee, $|\chi\rangle_T$. For example, if Alice observes $|\Phi^+\rangle_{AT}$ when $|\chi\rangle_{ABT}$ collapses, the two digits she gives Bob are 11. Bob then operates with $Y^{-1}$ on the particle B that he has, which is in state $-\beta_T|0\rangle_B + \alpha_T|1\rangle_B$, to obtain $|\chi\rangle_T$ as
$Y^{-1}\left(-\beta_T|0\rangle_B + \alpha_T|1\rangle_B\right) = Y^{-1}Y|\chi\rangle_T = \alpha_T|0\rangle_B + \beta_T|1\rangle_B = |\chi\rangle_T.$ (17.221)
Let us summarize what has been accomplished: (i) Since the teleportee is the same type of particle as A and B, we have obtained an exact replica of $|\chi\rangle_T$ at the location of Bob, who has the particle B. (ii) The original, $|\chi\rangle_T$, that Alice had is destroyed. That is, neither the particle A nor the other particle that Alice is left with, that is, the teleportee whose properties have been transferred to B at Bob's location, is in state $|\chi\rangle_T$. (iii) If somebody had spied on Alice and Bob, the two-bit information, that is, the subscripts of the unitary transformation, would have no practical value without the particle, B, that Bob holds. Notice that Alice and Bob have communicated a wealth of quantum information hidden in the complex amplitudes of
$|\chi\rangle_T = \alpha_T|0\rangle_T + \beta_T|1\rangle_T$
by just sending 2 bits. Neither Alice nor Bob has gained knowledge about the exact nature of the state $|\chi\rangle_T$. In other words, neither of them knows what $\alpha_T$ and $\beta_T$ are. This intriguing experiment was first realized by the Innsbruck group in 1997 (Bouwmeester et al.).
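The teleportation steps can be followed numerically for each of Alice's four possible measurement results. The sketch below works under stated assumptions: Bell states are stored as $2\times 2$ amplitude tables, the correction operators are taken as the inverses of $I$, $-Z$, $X$, and $Y = XZ$, and the measurement outcome is supplied by hand rather than sampled. The names `BELL`, `INVERSE`, and `teleport`, and the sample amplitudes 0.6 and 0.8i, are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Bell states as amplitude tables indexed [first Qbit][second Qbit].
BELL = {
    "00": [[0, S], [-S, 0]],  # |Psi->
    "01": [[0, S], [S, 0]],   # |Psi+>
    "10": [[S, 0], [0, -S]],  # |Phi->
    "11": [[S, 0], [0, S]],   # |Phi+>
}

# Inverses of I, -Z, X, and Y, labeled by the two digits Alice sends.
INVERSE = {
    "00": [[1, 0], [0, 1]],
    "01": [[-1, 0], [0, 1]],
    "10": [[0, 1], [1, 0]],
    "11": [[0, 1], [-1, 0]],
}


def teleport(alpha, beta, outcome):
    """Return (probability of the outcome, Bob's state after his correction)."""
    chi = [alpha, beta]
    bell_at = BELL[outcome]  # Bell state of A and T that Alice observes
    shared = BELL["00"]      # A and B start in |Psi->
    # Project Qbits A and T onto the observed Bell state (real amplitudes).
    bob = [sum(bell_at[a][t] * shared[a][b] * chi[t]
               for a in range(2) for t in range(2))
           for b in range(2)]
    prob = sum(abs(x) ** 2 for x in bob)
    bob = [x / math.sqrt(prob) for x in bob]  # state of B after the collapse
    inv = INVERSE[outcome]
    fixed = [sum(inv[i][k] * bob[k] for k in range(2)) for i in range(2)]
    return prob, fixed


if __name__ == "__main__":
    for outcome in ("00", "01", "10", "11"):
        print(outcome, teleport(0.6, 0.8j, outcome))
```

Each outcome occurs with probability 1/4 regardless of the amplitudes, so Alice learns nothing about the state, while Bob recovers it only after applying the correction named by the two classical bits.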
PROBLEMS

1. Consider two chance events, A and B, both with 2 possibilities, where $p(i,j)$ is the joint probability of the $i$th and $j$th possibilities occurring for A and B, respectively. We write the entropy of information for the joint event as
$H(A,B) = -\sum_{i=1}^{2}\sum_{j=1}^{2} p(i,j)\log_2 p(i,j).$
For the individual events we write
$H(A) = -\sum_{i=1}^{2}\left[\sum_{j=1}^{2} p(i,j)\right]\log_2\left[\sum_{j=1}^{2} p(i,j)\right]$
and
$H(B) = -\sum_{j=1}^{2}\left[\sum_{i=1}^{2} p(i,j)\right]\log_2\left[\sum_{i=1}^{2} p(i,j)\right].$
Show that the following inequality holds:
$H(A,B) \leq H(A) + H(B).$
Also show that the equality is true for independent events, where
$p(A,B) = p(A)p(B).$
Apply this to the case where two fair coins are tossed independently of each other.

2. Two chance events (A, B) have the following joint probability distribution:
AJ\B+
1
2
3
4 L
1
L
8
16
L
L
3
1 _ 32
1 _ 32
&
4
1 1 1 0 32 32 16
16
4
Find
and interpret your results.
3. Analyze the following payoff matrices [Eq. (17.42)] for zero-sum games for both players for optimum strategies: (i)
Player A
Player B
I b~ I PI
6
I I
1. l1 I
-3
Note that it is foolish for player B to choose $b_1$, since $b_2$ yields more regardless of what A decides. In such cases we say that $b_2$ dominates $b_1$ and hence discard strategy $b_1$. Finding dominant strategies may help simplify payoff matrices.
(ii) Player A
Player B
)b11-2)
I
Ib2l
5 1-11
I
I
4. Consider the following zero-sum game: Player A
(i) Find the randomized strategy that A has to follow to minimize maximum expected loss.
(ii) Find the randomized strategy that B has to follow to maximize minimum expected gain.
5. The two-player competition game is defined as follows: Both players simultaneously choose a whole number from 0 to 3. Both players win the smaller of the two numbers in points. In addition, the player who chose the larger number gives up 2 points to the other player. Construct the payoff matrix and identify the Nash equilibria. 6. Identify the Nash equilibria in the following payoff matrices: (i)
(ii)
7. Two drivers on a road have two strategies each, to drive either on the left or on the right, with the payoff matrix

                      Drive on the left   Drive on the right
Drive on the left     (100,100)           (0,0)
Drive on the right    (0,0)               (100,100)

where the payoff 100 means no crash and 0 means crash. Identify the Nash equilibria for (i) the pure strategy game, (ii) the mixed strategy game with the probabilities (50%, 50%).
Which of these are stable?
8. In a 2-Cbit operation, find the action of the operator
$\frac{1}{2}(I + Z_1Z_0)$
on the 2-Cbit states
9. In a 2-Cbit operation, find the action of the operator
on the 2-Cbit states
10. Prove the following operator representation of the operator $S_{10}$, which exchanges the values of Cbits 1 and 0:
or
$S_{10} = \frac{1}{2}\left[I + Z_1Z_0 + X_1X_0 - Y_1Y_0\right],$
where
$Y = XZ.$

11. Another useful operation on 2-Cbit systems is the reversible XOR, or the controlled-NOT, or in short c-NOT, gate executed by the operator $C_{10}$ as
$C_{10}|x_1\rangle|x_0\rangle = X_0^{x_1}|x_1\rangle|x_0\rangle.$
The task of $C_{10}$ is to flip the value of the target bit, $|x_0\rangle$, whenever the control bit, $|x_1\rangle$, has the value 1.
(i) Show that $C_{10}$ can be constructed from single Cbit operators as
$C_{10} = \frac{1}{2}(I + X_0) + \frac{1}{2}Z_1(I - X_0)$
or as
$C_{10} = \frac{1}{2}(I + Z_1) + \frac{1}{2}X_0(I - Z_1).$
(ii) Show that the c-NOT operator can be generalized as
$C_{ij} = \frac{1}{2}(I + X_j) + \frac{1}{2}Z_i(I - X_j)$
or as
$C_{ij} = \frac{1}{2}(I + Z_i) + \frac{1}{2}X_j(I - Z_i).$
(iii) What is the effect of interchanging X and Z?

12. The Hadamard operator,
$H = \frac{1}{\sqrt{2}}(X + Z) = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix},$
is classically meaningless. Show that it takes the Cbit states $|0\rangle$ and $|1\rangle$ into two classically meaningless superpositions:
$\frac{1}{\sqrt{2}}\left(|0\rangle \pm |1\rangle\right).$
13. Verify the following operator relations:
$X^2 = I,$
$XZ = -ZX,$
$HX = \frac{1}{\sqrt{2}}(I + ZX),$
$HXH = Z,$
$HZH = X.$

14. Find the effect of the operator
$C_{01} = (H_1H_0)C_{10}(H_1H_0)$
on Cbits. Using the relations
$HXH = Z, \quad HZH = X$
and
$C_{ij} = \frac{1}{2}(I + X_j) + \frac{1}{2}Z_i(I - X_j),$
also show that
$C_{ji} = (H_iH_j)C_{ij}(H_iH_j).$
This seemingly simple relation has remarkable uses in quantum computers (Mermin).

15. Show that the Bell states
and
are unit vectors and that they are also orthogonal to each other.
16. Show the following Bell state transformations:
$|\Psi^-\rangle = I_A|\Psi^-\rangle,$
$|\Psi^+\rangle = Z_A|\Psi^-\rangle,$
$|\Phi^-\rangle = -X_A|\Psi^-\rangle,$
$|\Phi^+\rangle = Y_A|\Psi^-\rangle.$
The subscript A indicates the Qbit that the operator is acting on. 17. Given the states
and
can be written in terms of the Bell states,
of the particles A and T as
18. Verify the following transformations used in quantum teleportation and discuss what happens if Bob makes a mistake and disturbs the state of the particle, B, that he holds:
$\alpha_T|0\rangle_B + \beta_T|1\rangle_B = I|\chi\rangle_T,$
$-\alpha_T|0\rangle_B + \beta_T|1\rangle_B = -Z|\chi\rangle_T,$
$\beta_T|0\rangle_B + \alpha_T|1\rangle_B = X|\chi\rangle_T,$
$-\beta_T|0\rangle_B + \alpha_T|1\rangle_B = Y|\chi\rangle_T.$
References
Akhiezer, N.I., The Calculus of Variations, Blaisdell, New York, 1962.
Ahlfors, L.V., Complex Analysis, McGraw-Hill, New York, 1966.
Andel, J., Mathematics of Chance, Wiley, New York, 2001.
Apostol, T.M., Mathematical Analysis, Addison-Wesley, Reading, MA, fourth printing, 1971.
Appel, W., Mathematics for Physics and Physicists, Princeton University Press, Princeton, NJ, 2007.
Arfken, G.B., and H.J. Weber, Mathematical Methods of Physics, Elsevier, Boston, sixth edition, 2005.
Artin, E., The Gamma Function, Holt, Rinehart and Winston, New York, 1964.
Audretsch, J., Entangled Systems, Wiley-VCH, Weinheim, 2007.
Audretsch, J., editor, Entangled World: The Fascination of Quantum Information and Computation, Wiley-VCH, Weinheim, 2006.
Basu, K., The Traveler’s Dilemma, Scientific American, p. 68, June 2007.
Bather, J.A., Decision Theory, An Introduction to Programming and Sequential Decisions, Wiley, Chichester, 2000.
Bayin, S.S., Mathematical Methods in Science and Engineering, Wiley, Hoboken, NJ, 2006.
Bell, W.W., Special Functions for Scientists and Engineers, D. Van Nostrand, Princeton, NJ, 1968.
Boas, M.L., Mathematical Methods in the Physical Sciences, Wiley, Hoboken, NJ, third edition, 2006.
Bouwmeester, D., J.W. Pan, K. Mattle, M. Eibl, H. Weinfurter, and A. Zeilinger, Experimental Quantum Teleportation, Nature, vol. 390, pp. 575-579, 1997.
Bradbury, T.C., Theoretical Mechanics, Wiley, New York, 1968.
Bromwich, T.J.I., An Introduction to the Infinite Series, Chelsea Publishing Company, New York, 1991.
Brown, J.W., and R.V. Churchill, Complex Variables and Applications, McGraw-Hill, New York, 1995.
Bruß, D., and G. Leuchs, editors, Lectures on Quantum Information, Wiley-VCH, Weinheim, 2007.
Buck, R.C., Advanced Calculus, McGraw-Hill, New York, 1965.
Butkov, E., Mathematical Physics, Addison-Wesley, New York, 1968.
Byron, Jr. F.W., and R.W. Fuller, Mathematics of Classical and Quantum Physics, Dover, New York, 1992.
Churchill, R.V., Fourier Series and Boundary Value Problems, McGraw-Hill, New York, 1963.
Cover, T.M., and J.A. Thomas, Elements of Information Theory, Wiley, Hoboken, NJ, second edition, 2006.
Csermely, P., Weak Links, Stabilizers of Complex Systems from Proteins to Social Networks, Springer, Berlin, 2006.
Dirac, P.A.M., The Principles of Quantum Mechanics, Clarendon Press, Oxford, fourth edition, 1982.
Dennery, P., and A. Krzywicki, Mathematics for Physics, Dover Publications, New York, 1995.
Dwight, H.B., Tables of Integrals and Other Mathematical Data, Macmillan, New York, fourth edition, 1961.
Erdelyi, A., Oberhettinger, M.W., and Tricomi, F.G., Higher Transcendental Functions, vol. I, Krieger, New York, 1981.
Feynman, R., R.B. Leighton, and M. Sands, The Feynman Lectures on Physics, Addison-Wesley, Reading, MA, 1966.
Franklin, P.A., A Treatise on Advanced Calculus, Wiley, New York, 1940.
Gamow, G., One Two Three ... Infinity: Facts and Speculations of Science, Dover Publications, 1988.
Gantmacher, F.R., The Theory of Matrices, Chelsea Publishing Company, New York, 1960.
Gasiorowicz, S., Quantum Physics, Wiley, Hoboken, NJ, third edition, 2003.
Ghirardi, G., Sneaking a Look at God's Cards, Princeton University Press, Princeton, NJ, 2004.
Gnedenko, B.V., The Theory of Probability, MIR Publishers, Moscow, second printing, 1973.
Goldstein, H., C. Poole, and J. Safko, Classical Mechanics, Addison-Wesley, San Francisco, third edition, 2002.
Griffiths, D.J., Introduction to Electrodynamics, Benjamin Cummings, third edition, 1998.
Grimmett, G.R., and D.R. Stirzaker, Probability and Random Processes, Clarendon, Oxford, third edition, 2001.
Harris, B., Theory of Probability, Addison-Wesley, Reading, MA, 1966.
Hartle, J.B., An Introduction to Einstein's General Relativity, Addison-Wesley, San Francisco, 2003.
Hassani, S., Mathematical Methods: For Students of Physics and Related Fields, Springer Verlag, New York, 2000.
Hassani, S., Mathematical Physics, Springer Verlag, New York, second edition, 2002.
Hauser, W., Introduction to Principles of Mechanics, Addison-Wesley, Reading, MA, first printing, 1966.
Haykin, S., Neural Networks, A Comprehensive Foundation, Prentice Hall, Upper Saddle River, 1999.
Hildebrand, F.B., Methods of Applied Mathematics, Dover Publications, New York, second reprint edition, 1992.
Hoffman, K., and R. Kunze, Linear Algebra, Prentice Hall, Upper Saddle River, NJ, second edition, 1971.
Inan, U.S., and A.S. Inan (a), Engineering Electrodynamics, Prentice Hall, Upper Saddle River, 1998.
Inan, U.S., and A.S. Inan (b), Electromagnetic Waves, Prentice Hall, Upper Saddle River, 1999.
Ince, E.L., Ordinary Differential Equations, Dover Publications, New York, 1958.
Jones, G.A., and J.M. Jones, Information and Coding Theory, Springer, London, 2006.
Josza, R. in H.-K. Lo, S. Popescu, and T. Spiller, editors, Introduction to Quantum Computation and Information, World Scientific, Singapore, 1998.
Kaplan, W., Advanced Calculus, Addison-Wesley, Reading, third edition, 1984.
Kelly, J.J., Graduate Mathematical Physics, With Mathematica Supplements + CD, Wiley-VCH, Weinheim, 2007.
Kolmogorov, A.N., Foundations of the Theory of Probability, Chelsea Publishing Company, New York, 1950.
Kusse, B.R., and E.A. Westwig, Mathematical Physics: Applied Mathematics for Scientists and Engineers, Wiley-VCH, Weinheim, second edition, 2006.
Kyrala, A., Applied Functions of a Complex Variable, Wiley, New York, 1972.
Lang, S., Linear Algebra, Addison-Wesley, Reading, MA, 1966.
Lebedev, N.N., Special Functions and Their Applications, Prentice-Hall, Englewood Cliffs, NJ, 1965.
Lebedev, N.N., I.P. Skalskaya, and Y.S. Uflyand, Problems of Mathematical Physics, Prentice-Hall, Englewood Cliffs, NJ, 1965.
Margenau, H., and G.M. Murphy, editors, The Mathematics of Physics and Chemistry, Van Nostrand, Princeton, NJ, 1964.
Marion, J.B., Classical Dynamics of Particles and Systems, Academic Press, New York, second edition, 1970.
Mathews, J., and R.W. Walker, Mathematical Methods of Physics, Addison-Wesley, Menlo Park, CA, second edition, 1970.
Mattle, K., H. Weinfurter, P.G. Kwiat, and A. Zeilinger, Dense Coding in Experimental Quantum Communication, Phys. Rev. Lett., vol. 76, pp. 4656-4659, 1996.
McCollum, P.A., and B.F. Brown, Laplace Transform Tables and Theorems, Holt, Rinehart and Winston, New York, 1965.
McMahon, D., Quantum Computing Explained, Wiley-IEEE Computer Society Press, Hoboken, NJ, 2007.
Medina, P.K., and S. Merino, Mathematical Finance and Probability, Birkhauser Verlag, Basel, 2003.
Mermin, N.D., Quantum Computer Science, Cambridge University Press, Cambridge, 2007.
Merzbacher, E., Quantum Mechanics, Wiley, New York, 1998.
Miller, I., and M. Miller, John E. Freund’s Mathematical Statistics With Applications, Pearson Prentice Hall, Upper Saddle River, NJ, seventh edition, 2004.
Morsch, O., Quantum Bits and Quantum Secrets: How Quantum Physics Is Revolutionizing Codes and Computers, Wiley-VCH, Weinheim, 2008.
Morse, P.M., and H. Feshbach, Methods of Theoretical Physics, McGraw-Hill, New York, 1953.
Murphy, G.M., Ordinary Differential Equations and Their Solutions, Van Nostrand, Princeton, NJ, 1960.
Myerson, R.B., Game Theory, Analysis of Conflict, Harvard University Press, Cambridge, MA, 1991.
Nagle, R.K., E.B. Saff, and A.D. Snider, Fundamentals of Differential Equations and Boundary Value Problems, Addison-Wesley, Boston, 2004.
Osborne, M.J., An Introduction to Game Theory, Oxford University Press, New York, 2004.
Peters, E.E., Complexity, Risk, and Financial Markets, Wiley, New York, 1999.
Pathria, R.K., Statistical Mechanics, Pergamon Press, Oxford, 1984.
Rektorys, K., Survey of Applicable Mathematics Volumes I and II, Springer, Berlin, second revised edition, 1994.
Roederer, J.G., Information and Its Role in Nature, Springer, Berlin, 2005.
Ross, S.L., Differential Equations, Wiley, New York, third edition, 1984.
Saff, E.B., and A.D. Snider, Fundamentals of Complex Analysis with Applications to Engineering and Science, Prentice Hall, Upper Saddle River, NJ, 2003.
Shannon, C.E., A Mathematical Theory of Communication, The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.
Shannon, C.E., and W. Weaver, The Mathematical Theory of Communication, The University of Illinois Press, Urbana, IL, 1949.
Sivia, D.S., and J. Skilling, Data Analysis: A Bayesian Tutorial, Oxford, New York, second edition, 2006.
Spiegel, M.R., Advanced Mathematics for Engineers and Scientists: Schaum’s Outline Series in Mathematics, McGraw-Hill, New York, 1971.
Stapp, H.P., Mind Matter and Quantum Mechanics, Springer, Berlin, second edition, 2004.
Stolze, J., and D. Suter, Quantum Computing: A Short Course from Theory to Experiment, Wiley-VCH, Weinheim, 2004.
Szabo, G., and G. Fath, Evolutionary Games on Graphs, Physics Reports, vol. 446, pp. 97-216, 2007.
Szekeres, P., A Course in Modern Mathematical Physics: Groups, Hilbert Space and Differential Geometry, Cambridge University Press, New York, 2004.
Titchmarsh, E.C., The Theory of Functions, Oxford University Press, New York, 1939.
Thomas, G.B. Jr., and R.L. Finney, Thomas’ Calculus, Addison-Wesley, Boston, alternate edition, 2000.
Todhunter, I., A History of the Theory of Probability From the Time of Pascal to Laplace, Chelsea Publishing Company, New York, 1949.
Trigg, G.L., editor, Mathematical Tools for Physicists, Wiley-VCH, Weinheim, 2005.
Wan, F.Y.M., Introduction to the Calculus of Variations and its Applications, Chapman and Hall, New York, 1995.
Wang, F.Y., Physics with Maple: The Computer Algebra Resource for Mathematical Methods in Physics, Wiley-VCH, Weinheim, 2006.
Watson, G.N., A Treatise on the Theory of Bessel Functions, Cambridge University Press, London, second edition, 1962.
Weber, H.J., and G.B. Arfken, Essential Mathematical Methods for Physicists, Academic Press, San Diego, 2003.
Wilks, J., The Third Law of Thermodynamics, Oxford University Press, London, 1961.
Whittaker, E.T., and G.N. Watson, A Course of Modern Analysis, Cambridge University Press, New York, 1958.
Woolfson, M.M., and M.S. Woolfson, Mathematics for Physics, Oxford University Press, Oxford, 2007.
Zeilinger, A., Quantum Information, Physics World, vol. 11, no. 3, March 1998.
Zeilinger, A., Quantum Teleportation, Scientific American, pg. 32, April 2000.
Ziemer, R.E., Elements of Engineering Probability and Statistics, Prentice Hall, Upper Saddle River, NJ, 1997.
INDEX
Absolute maximum, 14 Absolute minimum, 14 Absolutely integrable, 591 Action, 653 Action at a distance, 109 Addition formula Bessel functions, 537 Alternating series, 313 Amplitude spectrum, 609 Analytic functions, 349 derivative, 384 Taylor series, 11 Antiderivative primitive, 36 Arc length, 83 Area of a surface, 173 Argument, 335 function, 3 Associated Laguerre polynomials, 566 Average function, 35 Baker-Hausdorff formula, 294 Basis states, 754 Basis vectors, 141, 167, 245 Bayes’ criteria, 738
Bayes’ formula, 675 Bell states entanglement, 771 Bernoulli equation, 423 Bessel function addition formula, 537 Jacobi-Anger expansion, 537 Bessel functions boundary conditions, 531 expansion theorem, 531 first-kind, 513 generating functions, 519 integral definitions, 521 orthogonality roots, 527 recursion relations, 518 second-kind, 514 third-kind, 517 Weber integral, 533 Wronskians, 522 Bessel’s equation series solution, 510 Bessel’s inequality, 590 Binomial coefficients, 681 Binomial distribution, 701 moments, 707
Binomial formula binomial coefficients, 323 Bit, 723 Boltzmann distribution gases, 686 solids, 684 Bose-Einstein condensation, 688 Bose-Einstein distribution, 687 Boundary conditions, 4 spherical coordinates, 558 Boundary point, 2 Bounded variation, 594 Bra-ket vectors, 297 Brachistochrone problem, 664 Buffon’s needle, 677 Cartesian coordinates, 62 Cartesian tensors, 148 Cauchy criteria, 306 Cauchy integral formula, 382 Cauchy principal value, 41 Cauchy product, 316 Cauchy-Euler equation explicit solutions, 441 Cauchy-Goursat theorem, 376 Cauchy-Riemann conditions, 346 polar coordinates, 351 Cauchy-Schwarz inequality, 586 Cbit, 746, 767 operations, 750 Central moment, 705 Change of basis, 254 Channel blocker, 765 Characteristic equation, 258 Characteristic value eigenvalue, 257 Chebyshev’s theorem, 710 Chi-square, 700 Clairaut equation, 428 Closed set, 2 Collectively independent events, 675 Combinations, 681 Commutator, 756 Comparison test, 309 Completeness, 274, 754 Complex algebra, 332 Complex conjugate, 334 Complex correlation function modified, 615 Complex functions exponentials, 354 hyperbolic functions, 357 inverse trigonometric functions, 362
limits and continuity, 344 logarithmic function, 358 polynomials, 354 powers, 359 trigonometric functions, 356 Complex infinity, 339 Complex integrals contour integrals, 370 indefinite integrals, 379 Complex plane extended, 339 Complex series convergence, 393 Laurent series, 389 Maclaurin series, 388 Taylor series, 385 Components, 275 covariant/contravariant, 159 Compressible flow, 81 Conditional probability, 673 Conjugate harmonic functions, 351 Conjugate variables, 753 Conservative forces, 118 Constraints, 659 Continuity, 4 piecewise, 596 Continuity equation, 129 Contour integrals, 370, 373 Contraction tensors, 150 Contravariant components, 159 Control bit, 770 Convergence absolute, 309 conditional, 309 integrals conditionally convergent, 37 series, 309 uniform, 309 Convergence tests, 309 Convolution, 621 Coordinate axes, 155 Coordinate curves, 155 Coordinate surfaces, 155 Coordinates components, 246 Correlation coefficient, 610 modified, 611 Correlation function, 610 Coulomb gauge, 126 Covariant components, 159 Cramer’s rule, 226 Critical point, 16 Cross product
vector product, 61 Cryptograph, 775 Cryptography, 775 Cumulative distribution, 697 Curl, 77 Curl-meter, 105 Curvilinear coordinates, 154 Cylindrical coordinates, 187, 191 Darboux sum, 33 De Broglie relation, 753 Decision theory, 735 Decoherence, 769, 775 Del operator gradient, 74 DeMoivre’s formula, 336 Dense coding, 776 Dependent variable function, 3 Derivatives chain rule, 22 Determinants, 220 Laplace development, 222 minor, 221 order, 221 properties, 223 rank, 222 Differential equations exact equations integrating factors, 442 explicit solutions, 408 first-order, 410 exact, 417 integrating factors, 419 linear, 416 methods of solutions, 412 Frobenius method, 452 first-order equations, 462 general solution, 408 harmonic oscillator, 435 homogeneous nonhomogeneous, 409 implicit solution, 408 initial conditions, 409, 452 linear and higher order, 450 operator approach, 437 particular solution, 408, 444 quadratures, 408 second-order, 429 methods of solution, 430 singular solution, 408 uniqueness of solutions, 452 Differential operators differential equations, 409
Diffusion equation Cartesian coordinates, 550 cylindrical coordinates, 572 heat flow equation, 541 spherical coordinates, 564 Dirac’s bra-ket vectors, 297 Dirac-delta function, 618, 699 Direction cosines, 142 Directional derivative, 75 Dirichlet boundary condition, 558 Discrete distributions, 700 binomial, 701 Poisson, 703 uniform, 701 Displacement vector, 168 Distribution function, 694 Distribution functions arcsine, 694 Cauchy, 717 chi-square, 700 double triangle, 717 exponential, 699 gamma, 699 Gaussian, 699 hypergeometric, 719 Polya, 718 probability theory, 696 Rayleigh, 718 uniform, 698 Distributions expected value, 705 mean, 705 standard deviation, 705 variance, 705 Divergence div operator, 77 integral definition, 82 Divergence theorem Gauss’s theorem, 82 Domain function, 3 Domain of definition, 343 Dominant strategies, 781 Double integrals, 47 properties, 49 Dual spaces, 297 Duality, 753 Dummy index tensors, 151 Duplication formula, 521 Eigenstates, 754 Eigenvalue characteristic value, 257
degenerate, 257 Eigenvalue problem symmetric matrices, 277 degenerate roots, 278 distinct roots, 278 Eigenvectors, 258 Electrostatics, 128 Entanglement Bell states, 771 Entire function, 349 Entropy solids, 689 Entropy of information, 729 Equation of continuity, 81 Equilibrium, 16 Essential singular point, 394 Euler constant, 516 Euler equation alternate form, 642 variational analysis, 642 Euler’s formula, 354 Events certain, 667 collectively independent, 675 impossible, 667 independent, 674 mutually exclusive, 669 pairwise independent, 675 random, 667 Exact differentials path independence, 114 Expectation value, 755 Expected gain, 739 Expected loss, 739 Expected value, 705 Extensive forms, 739 Extremum local absolute, 15 maximum minimum, 15 with conditions, 18 Extremum points, 637 Fermi-Dirac distribution, 689 Fields, 242 Flip operator, 766 Fourier series change of interval, 602 convergence, 593 differentiation, 603 Dirichlet integral, 594 exponential form, 592 fundamental theorem, 596
generalized, 588 Gibbs phenomenon, 598 integral representation, 594 integration, 603 periodic extension, 600 Riemann localization theorem, 595 sine/cosine series, 602 square wave, 597 triangular wave, 599 trigonometric, 591 uniqueness, 597 Fourier transform correlation function, 615 derivative, 621 existence, 620 inverse, 615 properties, 621 Free index tensors, 151 Frequency of occurrence, 672 Frequency spectrum, 609, 617 Frobenius method, 452 Function, 2 Functionals, 638 Fundamental theorem averages, 704 calculus, 36 Game theory, 737 Gamma distribution, 699 Gamma function, 458, 521, 526, 536 duplication formula, 526, 527 Gates, 747, 752, 758 Gauss’s law, 118 Gauss’s method linear equations, 217 Gauss’s theorem divergence theorem, 82 Gauss-Jordan reduction, 218 Gaussian distribution, 699 moments, 706 Gaussian surface, 111 General boundary condition, 558 General solution, 409 Generalized coordinates, 154, 653 area element, 171 curl, 185 divergence, 182 gradient, 179 Laplacian, 186 orthogonal, 186 volume element, 177 Geometric probability, 677 Geometric series, 310
Gibbs phenomenon, 598 Gradient del operator, 74 generalized coordinates, 179 Gram-Schmidt orthogonalization, 276 Gramian, 275 Gravitational field, 108 Birkhoff’s theorem, 112 stars, 111 Gravitational potential, 116 Gravitational potential energy uniform sphere, 121 Green’s first identity, 107 Green’s second identity, 107, 137 Green’s theorem, 91 Cauchy-Goursat theorem, 376 multiply connected domains, 96 Hadamard operator, 766 Hamilton’s principle, 651 Hamiltonian operator, 758 Hankel functions, 517 Harmonic functions, 350 Harmonic series, 310 Heat flow equation Cartesian coordinates, 550 cylindrical coordinates, 572 spherical coordinates, 564 Heisenberg uncertainty, 753 Helmholtz spherical coordinates, 563 Helmholtz equation, 542 cylindrical coordinates, 570 Helmholtz theorem, 122 Hermite equation series solution, 487 Hermite polynomials, 491 contour integral definition, 492 generating function, 494 Hermite series, 499 orthogonality, 496 recursion relations, 495 Rodriguez formula, 493 special values, 495 weight function, 497 Hermitian, 289 Hermitian operators, 294 Hilbert space, 296, 754 completeness, 754 orthogonality, 754 Homogeneous differential equation, 409
Identity matrix unit matrix, 209 Identity operator, 766 Identity tensor, 152 Implicit functions, 25 Implicit solution, 408 Improper transformations, 140 Impulse function Dirac-delta function, 618 Incompressible fluids, 129 Independent variable function, 3 Indicial equation, 453 Inflection point, 15 Information conditional probabilities, 733 continuous distributions, 733 H-function, 729 joint events, 732 unit, 723 Information content, 728 Information processing, 726 Information value, 728 Initial conditions boundary conditions, 409 Inner product, 272, 586 norm, 586 Inner product space, 274 Integral indefinite, 36 Integral test, 309 Integrals absolutely convergent conditionally convergent, 37 Cauchy principal value, 41 Darboux sum, 33 double triple, 47 improper, 37 M-test, 42 multiple, 50 with a parameter, 42 Integrating factor, 419 Integration by parts, 37 Integration constant, 409 Interference, 757 Interferometer Mach-Zehnder, 760 Invariants, 147, 178 Inverse basis vectors, 167 Inverse Fourier transform, 615 Inverse functions, 30 Inverse matrix, 230 Inverse transformation, 144
Irrotational flow, 129 Isolated singular points, 394 Jacobi determinant implicit functions, 27 Jacobi identity, 130 Jacobi-Anger expansion Bessel function, 537 Jacobian, 157 inverse functions, 30 Jordan arc, 372 Kinetic energy, 87 Kronecker delta, 63 identity tensor, 152 L'Hôpital's rule limits, 6 Lagrange multiplier extremum problems, 20 Lagrange's equation, 426 Lagrangian, 653, 657 constraints, 659 Laguerre equation, 500 series solution, 500 Laguerre polynomials, 502 contour integral definition, 502 generating function, 504 Laguerre series, 506 orthogonality, 505 Rodriguez formula, 503 special values, 504 Laplace development, 222 Laplace equation, 119, 541, 650 Cartesian coordinates, 546 cylindrical coordinates, 569 spherical coordinates, 557 Laplace transform, 622 differential equation, 625 inverse, 623 transfer functions, 627 Laplacian, 105 Laurent series, 389 Law of large numbers, 712 ergodic theorems, 705 Left derivative, 6 Legendre equation, 470 polynomial solutions, 474 series solution, 470 Legendre polynomials, 474 generating function, 478 Legendre series, 484 orthogonality, 482 recursion formulas, 481 Rodriguez formula, 477
special values, 480 Leibnitz's rule, 43 Levi-Civita symbol permutation symbol, 152 Limit comparison test, 309 Limits, 5 Line element, 164, 168 Line integrals arc length, 83 Linear combination, 244 Linear equations, 216 homogeneous, 233 Linear independence, 244, 275 Linear spaces vector space, 242 Linear transformations matrix representation, 293 operators, 249 Lines, 68 Liouville theorem, 402 Lorentz gauge, 129 M-test integrals, 42 Mach-Zehnder interferometer, 760 mathematics, 763 Maclaurin series, 11, 324, 388 Magnetostatics, 128 Magnitude, 58 Mapping function, 2 Matrices adjoint, 232 algebra, 209 cofactor, 231 diagonal, 209 dimension, 207 Hermitian, 294 self-adjoint, 289 identity matrix, 209 inverse matrix, 230 linear equations, 216 orthogonal, 287 rectangular, 207 row matrix column matrix, 208 spur trace, 211 square order, 208 submatrix partitioned matrix, 215 symmetry, 209
transpose, 208 unitary, 291 zero matrix null matrix, 209 Maxwell’s equations, 128 Mean, 705 function, 35 Mean square error, 590 Mean value theorem Rolle’s theorem, 36 Median, 706 Method of elimination, 218 Metric tensor, 165 Minimax criteria, 738 Minkowski inequality, 586 Minor determinants, 221 Mixed state, 755 Modified Bessel functions, 523 Modulus, 334 Moment of inertia scalar, 285 Moment of inertia tensor, 265 Multinomial coefficients, 681 Multiple integrals, 50 Multiple-to-one functions, 3 Multiplication theorem, 673, 674 Multiply connected domain, 381 Multivalued functions, 3 Multivalued functions complex functions, 358 principal value, 358 Mutually exclusive events, 762 Nash equilibrium, 742 Natural boundary conditions, 642 Necker cubes, 748 Neighborhood, 2 Neumann boundary condition, 558 Neumann function, 515 No-cloning theorem, 768 control Qbit target Qbit, 770 Norm, 58 magnitude, 274 Riemann integral, 34 Normal distribution, 699 Normal forms, 738 Novelty value, 728 Null matrix zero matrix, 209
Null set, 2 Numbers scalars, 242 Nyquist sampling frequency, 609 One-time-pad quantum cryptography, 776 Open set, 2 Operators on Cbits, 750 Ordinary derivative, 6 Orthogonal functions completeness, 590 convergence mean, 590 inner product, 586 linear independence, 587 theory, 586 Orthogonal matrices, 287 Orthogonal transformations, 140 Orthogonality, 274, 754 Orthogonality condition, 143 Outer product tensors, 149 Pairwise independent events, 675 Parseval’s formula, 590 Parseval’s theorem, 622 Partial derivative, 6 Particular solution, 409 Partitioned matrices symmetry, 214 Path independence, 113 Payoff matrix, 738 Permutation symbol, 65 Levi-Civita tensor, 152 Permutations, 681 Phase shift operator, 766 Phase spectrum, 609 Piecewise continuous, 5 Planck formula, 753 Planes equation, 69 Poisson distribution, 703 moments, 708 Poisson’s equation, 119 Poles singular points, 394 Potential energy gravitational, 117 Power series, 321 Primitive antiderivative, 36 Principal coordinates, 265
Principal directions, 265 Principal moments of inertia, 265 Prior uncertainty, 729 Probability classical definition, 668 entropy, 689 Probability amplitudes, 755 Probability density, 754 Probability density function, 698 Probability theory basic theorems, 669 Bayes’ formula, 675 Buffon’s needle, 677 Chebyshev’s theorem, 710 combinations, 678 compound element, 669 conditional probability, 673 distribution function, 694 elementary event, 669 event, 669 frequency of occurrence, 672 fundamental theorem, 704 geometric probability, 677 law of large numbers, 705 multiplication theorem, 673, 674 permutations, 678 random variables, 693 sample space, 668 simple event, 669 statistical definition, 672 total probability, 675 Proper transformations, 140 Protocols quantum cryptography, 776 Pseudotensors, 178 Cartesian tensors, 153 Pure state, 756 Qbit, 767 Qbit operators, 766 Qbit versus Cbit, 767 Quadratic forms, 285 Quadratures differential equations, 408 Quantum cryptography protocols, 776 Quantum dense coding, 776 Quantum information cryptography, 775 Vernam coding, 775 Quantum mechanics, 752 Radius of convergence, 322 Random variables, 693
Range, 343 function, 3 Rank, 222 tensors, 147 Ratio test, 310 Residue theorem, 398 Riccati equation, 424 Riemann integral, 34 Riemann localization theorem, 595 Riemann sphere, 340 Riemann-Lebesgue lemma, 591 Right derivative, 6 Rolle’s theorem mean value theorem, 36 Root test, 310 Row matrix, 208 Sample space, 668 Sampling property, 618 Sampling theorem Shannon, 609 Scalar field, 71 Scalar product dot product inner product, 60 Schrödinger equation time-dependent spherical coordinates, 566 time-independent spherical coordinates, 565 Schwarz inequality, 37, 65 Solenoidal fields, 82 Self-adjoint operators, 294 Self-energy gravitational, 121 Separation of variables, 542 Cartesian coordinates, 542 cylindrical coordinates, 567 spherical coordinates, 553 Sequences Cauchy criteria, 304 upper/lower limit, 307 Series Cauchy product, 316 convergence, 309 grouping, 314 indeterminate forms, 325 infinite, 308 multiplication, 314, 316 rearrangement, 315 Series of functions uniform convergence, 316 Series operations, 314 Signals, 608
Similarity transformation, 256 Simple closed curve Jordan curve, 372 Simple pole, 394 Simply connected domain, 381 Singular point, 349 Singular points classification, 394 Singular solution, 409 Smooth curve, 372 Smooth functions very smooth functions, 596 Spectrum eigenvalues, 258 Spherical Bessel functions, 525 Spherical coordinates, 193 Spherical pendulum, 666 Spur tensors, 150 trace, 211 Square matrix, 208 Standard deviation, 705 State function wave function, 297, 754 State vector collapse, 755 wave function, 754 Stationary functions, 639 Stationary points, 637 Stationary values, 638 Statistical information, 727 Statistical probability, 672 Stereographic projection, 340 Stirling’s approximation, 312 Stokes’s theorem, 97, 102 Strain tensor, 153 Streamlines, 71 Stress tensor, 153 Submatrices, 214 Summation convention Einstein, 163 Superposed state, 757 Surface integrals, 88 Tangent plane to a surface, 75 Target bit, 770 Taylor series, 11, 324, 388 radius of convergence, 388 remainder, 387 Teleportation, 777 Teleportee, 777 Temperature, 691
Tensor density, 178 Cartesian tensors, 153 Tensors algebra, 148 rank, 147 spur trace, 150 tensor product outer product, 149 transpose, 149 Total differential, 10 Total probability, 675 Trace spur, 211 tensors, 150 Trace formula, 295 Transfer functions Laplace transforms, 627 Transformation matrix, 143 Transformations active/passive, 286 algebra, 252 inverse, 254 linear, 249 matrix representation, 250 product, 253 similar, 255 unitary, 291 Transpose, 149, 208 Traveler’s dilemma, 742 Triangle inequality, 65 Triple product, 66 Uncertainty principle, 753 Uniform convergence M-test, 42 properties, 319 Weierstrass M-test, 318 Uniform distribution, 698, 701 Union, 2 Unitary matrices, 291 Unitary space, 274 Unitary transformation, 758 Variational analysis Euler equation, 642 functionals, 638 general case, 647 inverse problem, 650 Laplace equation, 650 minimal surfaces, 649 natural boundary conditions, 642 notation, 645 stationary functions, 639
stationary paths, 638 Vector algebra, 62 Vector field, 71 Vector multiplication, 60 Vector product cross product, 61 Vector spaces, 242 basis vectors, 245 dimension, 246 generators, 244 Vectors addition, 60 differentiation, 72 magnitude norm, 58 vector spaces, 242 Velocity potential, 129 Vernam coding, 775
Wave equation, 541 Cartesian coordinates, 544 cylindrical coordinates, 570 spherical coordinates, 563 Wave function state vector, 754 Weak links, 746 Weber integral Bessel functions, 533 Work done, 84, 113 Wronskian, 429 differential equations, 440 Zero matrix null matrix, 209 Zero-sum games, 738