Applied and Numerical Harmonic Analysis
Series Editor
John J. Benedetto
University of Maryland

Editorial Advisory Board
Akram Aldroubi, NIH, Biomedical Engineering/Instrumentation
Ingrid Daubechies, Princeton University
Christopher Heil, Georgia Institute of Technology
James McClellan, Georgia Institute of Technology
Michael Unser, NIH, Biomedical Engineering/Instrumentation
M. Victor Wickerhauser, Washington University
Douglas Cochran, Arizona State University
Hans G. Feichtinger, University of Vienna
Murat Kunt, Swiss Federal Institute of Technology, Lausanne
Wim Sweldens, Lucent Technologies Bell Laboratories
Martin Vetterli, Swiss Federal Institute of Technology, Lausanne
Applied and Numerical Harmonic Analysis
J.M. Cooper: Introduction to Partial Differential Equations with MATLAB (ISBN 0-8176-3967-5)
C.E. D'Attellis and E.M. Fernandez-Berdaguer: Wavelet Theory and Harmonic Analysis in Applied Sciences (ISBN 0-8176-3953-5)
H.G. Feichtinger and T. Strohmer: Gabor Analysis and Algorithms (ISBN 0-8176-3959-4)
T.M. Peters, J.H.T. Bates, G.B. Pike, P. Munger, and J.C. Williams: Fourier Transforms and Biomedical Engineering (ISBN 0-8176-3941-1)
A.I. Saichev and W.A. Woyczynski: Distributions in the Physical and Engineering Sciences (ISBN 0-8176-3924-1)
R. Tolimieri and M. An: Time-Frequency Representations (ISBN 0-8176-3918-7)
G.T. Herman: Geometry of Digital Spaces (ISBN 0-8176-3897-0)
A. Prochazka, J. Uhlíř, P.J.W. Rayner, and N.G. Kingsbury: Signal Analysis and Prediction (ISBN 0-8176-4042-8)
J. Ramanathan: Methods of Applied Fourier Analysis (ISBN 0-8176-3963-2)
A. Teolis: Computational Signal Processing with Wavelets (ISBN 0-8176-3909-8)
W.O. Bray and C.V. Stanojevic: Analysis of Divergence (ISBN 0-8176-4058-4)
G.T. Herman and A. Kuba: Discrete Tomography (ISBN 0-8176-4101-7)
J.J. Benedetto and P.J.S.G. Ferreira (Eds.): Modern Sampling Theory (ISBN 0-8176-4023-1)
P. Das, A. Abbate, and C. DeCusatis: Wavelets and Subbands (ISBN 0-8176-4136-X)
L. Debnath: Wavelet Transforms and Time-Frequency Signal Analysis (ISBN 0-8176-4104-1)
K. Gröchenig: Foundations of Time-Frequency Analysis (ISBN 0-8176-4022-3)
D.F. Walnut: An Introduction to Wavelet Analysis (ISBN 0-8176-3962-4)
David F. Walnut
An Introduction to Wavelet Analysis
With 88 Figures
Birkhauser Boston Basel Berlin
David F. Walnut
Department of Mathematical Sciences
George Mason University
Fairfax, VA 22030
USA
Library of Congress Cataloging-in-Publication Data
Walnut, David F.
An introduction to wavelet analysis / David F. Walnut.
p. cm. (Applied and numerical harmonic analysis)
Includes bibliographical references and index.
ISBN 0-8176-3962-4 (alk. paper)
1. Wavelets (Mathematics) I. Title. II. Series.
QA403.3 .W335 2001
515'.2433 dc21    2001025367 CIP
Printed on acid-free paper. © 2002 Birkhauser Boston
Birkhauser
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhauser Boston, c/o Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
ISBN 0-8176-3962-4
ISBN 3-7643-3962-4
SPIN 10574019
Production managed by Louise Farkas; manufacturing supervised by Jacqui Ashri. Typeset by the author in LaTeX2e. Printed and bound by Edwards Brothers, Inc., Ann Arbor, MI. Printed in the United States of America.
Birkhauser Boston Basel Berlin
A member of BertelsmannSpringer Science+Business Media GmbH
To my parents
and to Megan
Unless the LORD builds the house, its builders labor in vain.
Psalm 127:1a (NIV)
Contents

Preface . . . xiii

I Preliminaries . . . 1

1 Functions and Convergence . . . 3
1.1 Functions . . . 3
1.1.1 Bounded ($L^\infty$) Functions . . . 3
1.1.2 Integrable ($L^1$) Functions . . . 3
1.1.3 Square Integrable ($L^2$) Functions . . . 6
1.1.4 Differentiable ($C^n$) Functions . . . 9
1.2 Convergence of Sequences of Functions . . . 11
1.2.1 Numerical Convergence . . . 11
1.2.2 Pointwise Convergence . . . 13
1.2.3 Uniform ($L^\infty$) Convergence . . . 14
1.2.4 Mean ($L^1$) Convergence . . . 17
1.2.5 Mean-square ($L^2$) Convergence . . . 19
1.2.6 Interchange of Limits and Integrals . . . 21

2 Fourier Series . . . 27
2.1 Trigonometric Series . . . 27
2.1.1 Periodic Functions . . . 27
2.1.2 The Trigonometric System . . . 28
2.1.3 The Fourier Coefficients . . . 30
2.1.4 Convergence of Fourier Series . . . 32
2.2 Approximate Identities . . . 37
2.2.1 Motivation from Fourier Series . . . 38
2.2.2 Definition and Examples . . . 40
2.2.3 Convergence Theorems . . . 42
2.3 Generalized Fourier Series . . . 47
2.3.1 Orthogonality . . . 47
2.3.2 Generalized Fourier Series . . . 49
2.3.3 Completeness . . . 52

3 The Fourier Transform . . . 59
3.1 Motivation and Definition . . . 59
3.2 Basic Properties of the Fourier Transform . . . 63
3.3 Fourier Inversion . . . 65
3.4 Convolution . . . 68
3.5 Plancherel's Formula . . . 72
3.6 The Fourier Transform for $L^2$ Functions . . . 75
3.7 Smoothness versus Decay . . . 76
3.8 Dilation, Translation, and Modulation . . . 79
3.9 Bandlimited Functions and the Sampling Formula . . . 81

4 Signals and Systems . . . 87
4.1 Signals . . . 88
4.2 Systems . . . 90
4.2.1 Causality and Stability . . . 95
4.3 Periodic Signals and the Discrete Fourier Transform . . . 101
4.3.1 The Discrete Fourier Transform . . . 102
4.4 The Fast Fourier Transform . . . 107
4.5 $L^2$ Fourier Series . . . 109

II The Haar System

5 The Haar System . . . 115
5.1 Dyadic Step Functions . . . 115
5.1.1 The Dyadic Intervals . . . 115
5.1.2 The Scale $j$ Dyadic Step Functions . . . 116
5.2 The Haar System . . . 117
5.2.1 The Haar Scaling Functions and the Haar Functions . . . 117
5.2.2 Orthogonality of the Haar System . . . 118
5.2.3 The Splitting Lemma . . . 120
5.3 Haar Bases on $[0, 1]$ . . . 122
5.4 Comparison of Haar Series with Fourier Series . . . 127
5.4.1 Representation of Functions with Small Support . . . 128
5.4.2 Behavior of Haar Coefficients Near Jump Discontinuities . . . 130
5.4.3 Haar Coefficients and Global Smoothness . . . 132
5.5 Haar Bases on $\mathbb{R}$ . . . 133
5.5.1 The Approximation and Detail Operators . . . 134
5.5.2 The Scale $J$ Haar System on $\mathbb{R}$ . . . 138
5.5.3 The Haar System on $\mathbb{R}$ . . . 138

6 The Discrete Haar Transform . . . 141
6.1 Motivation . . . 141
6.1.1 The Discrete Haar Transform (DHT) . . . 142
6.2 The DHT in Two Dimensions . . . 146
6.2.1 The Row-wise and Column-wise Approximations and Details . . . 146
6.2.2 The DHT for Matrices . . . 147
6.3 Image Analysis with the DHT . . . 150
6.3.1 Approximation and Blurring . . . 151
6.3.2 Horizontal, Vertical, and Diagonal Edges . . . 153
6.3.3 "Naive" Image Compression . . . 154

III Orthonormal Wavelet Bases . . . 161

7 Multiresolution Analysis . . . 163
7.1 Orthonormal Systems of Translates . . . 164
7.2 Definition of Multiresolution Analysis . . . 169
7.2.1 Some Basic Properties of MRAs . . . 170
7.3 Examples of Multiresolution Analysis . . . 174
7.3.1 The Haar MRA . . . 174
7.3.2 The Piecewise Linear MRA . . . 174
7.3.3 The Bandlimited MRA . . . 179
7.3.4 The Meyer MRA . . . 180
7.4 Construction and Examples of Orthonormal Wavelet Bases . . . 185
7.4.1 Examples of Wavelet Bases . . . 186
7.4.2 Wavelets in Two Dimensions . . . 190
7.4.3 Localization of Wavelet Bases . . . 193
7.5 Proof of Theorem 7.35 . . . 196
7.5.1 Sufficient Conditions for a Wavelet Basis . . . 197
7.5.2 Proof of Theorem 7.35 . . . 199
7.6 Necessary Properties of the Scaling Function . . . 203
7.7 General Spline Wavelets . . . 206
7.7.1 Basic Properties of Spline Functions . . . 206
7.7.2 Spline Multiresolution Analyses . . . 208

8 The Discrete Wavelet Transform . . . 215
8.1 Motivation: From MRA to a Discrete Transform . . . 215
8.2 The Quadrature Mirror Filter Conditions . . . 218
8.2.1 Motivation from MRA . . . 218
8.2.2 The Approximation and Detail Operators and Their Adjoints . . . 221
8.2.3 The Quadrature Mirror Filter (QMF) Conditions . . . 223
8.3 The Discrete Wavelet Transform (DWT) . . . 231
8.3.1 The DWT for Signals . . . 231
8.3.2 The DWT for Finite Signals . . . 231
8.3.3 The DWT as an Orthogonal Transformation . . . 232
8.4 Scaling Functions from Scaling Sequences . . . 236
8.4.1 The Infinite Product Formula . . . 237
8.4.2 The Cascade Algorithm . . . 243
8.4.3 The Support of the Scaling Function . . . 245

9 Smooth, Compactly Supported Wavelets . . . 249
9.1 Vanishing Moments . . . 249
9.1.1 Vanishing Moments and Smoothness . . . 250
9.1.2 Vanishing Moments and Approximation . . . 254
9.1.3 Vanishing Moments and the Reproduction of Polynomials . . . 257
9.1.4 Equivalent Conditions for Vanishing Moments . . . 260
9.2 The Daubechies Wavelets . . . 264
9.2.1 The Daubechies Polynomials . . . 264
9.2.2 Spectral Factorization . . . 269
9.3 Image Analysis with Smooth Wavelets . . . 277
9.3.1 Approximation and Blurring . . . 278
9.3.2 "Naive" Image Compression with Smooth Wavelets . . . 278

IV Other Wavelet Constructions

10 Biorthogonal Wavelets . . . 289
10.1 Linear Independence and Biorthogonality . . . 289
10.2 Riesz Bases and the Frame Condition . . . 290
10.3 Riesz Bases of Translates . . . 293
10.4 Generalized Multiresolution Analysis (GMRA) . . . 300
10.4.1 Basic Properties of GMRA . . . 301
10.4.2 Dual GMRA and Riesz Bases of Wavelets . . . 302
10.5 Riesz Bases Orthogonal Across Scales . . . 311
10.5.1 Example: The Piecewise Linear GMRA . . . 313
10.6 A Discrete Transform for Biorthogonal Wavelets . . . 315
10.6.1 Motivation from GMRA . . . 315
10.6.2 The QMF Conditions . . . 317
10.7 Compactly Supported Biorthogonal Wavelets . . . 319
10.7.1 Compactly Supported Spline Wavelets . . . 320
10.7.2 Symmetric Biorthogonal Wavelets . . . 324
10.7.3 Using Symmetry in the DWT . . . 328

11 Wavelet Packets . . . 335
11.1 Motivation: Completing the Wavelet Tree . . . 335
11.2 Localization of Wavelet Packets . . . 337
11.2.1 Time/Spatial Localization . . . 337
11.2.2 Frequency Localization . . . 338
11.3 Orthogonality and Completeness Properties of Wavelet Packets . . . 346
11.3.1 Wavelet Packet Bases with a Fixed Scale . . . 347

B.1.2 Wavelets with Rational Noninteger Dilation Factors . . . 434
B.1.3 Local Cosine Bases . . . 434
B.1.4 The Continuous Wavelet Transform . . . 435
B.1.5 Non-MRA Wavelets . . . 436
B.1.6 Multiwavelets . . . 436
B.2 Wavelets in Other Domains . . . 437
B.2.1 Wavelets on Intervals . . . 437
B.2.2 Wavelets in Higher Dimensions . . . 438
B.2.3 The Lifting Scheme . . . 438
B.3 Applications of Wavelets . . . 439
B.3.1 Wavelet Denoising . . . 439
B.3.2 Multiscale Edge Detection . . . 439
B.3.3 The FBI Fingerprint Compression Standard . . . 439

C References Cited in the Text . . . 441
Index
Preface

These days there are dozens of wavelet books on the market, some of which are destined to be classics in the field. So a natural question to ask is: Why another one? In short, I wrote this book to supply the particular needs of students in a graduate course on wavelets that I have taught several times since 1991 at George Mason University. As is typical with such offerings, the course drew an audience with widely varying backgrounds and widely varying expectations. The difficult if not impossible task for me, the instructor, was to present the beauty, usefulness, and mathematical depth of the subject to such an audience. It would be insane to claim that I have been entirely successful in this task. However, through much trial and error, I have arrived at some basic principles that are reflected in the structure of this book. I believe that this makes this book distinct from existing texts, and I hope that others may find the book useful.

(1) Consistent assumptions of mathematical preparation. In some ways, the subject of wavelets is deceptively easy. It is not difficult to understand and implement a discrete wavelet transform and from there to analyze and process signals and images with great success. However, the underlying ideas and connections that make wavelets such a fascinating subject require some considerable mathematical sophistication. There have been some excellent books written on wavelets emphasizing their elementary nature (e.g., Kaiser, A Friendly Guide to Wavelets; Strang and Nguyen, Wavelets and Filter Banks; Walker, Primer on Wavelets and their Scientific Applications; Frazier, Introduction to Wavelets through Linear Algebra; Nievergelt, Wavelets Made Easy; Meyer, Wavelets: Algorithms and Applications). For my own purposes,
such texts required quite a bit of "filling in the gaps" in order to make some connections and to prepare the student for more advanced books and research articles in wavelet theory. This book assumes an upper-level undergraduate semester of advanced calculus. Sufficient preparation would come from, for example, Chapters 1-5 of Buck, Advanced Calculus. I have tried very hard not to depart from this assumption at any point in the book. This has required at times sacrificing elegance and generality for accessibility. However, all proofs are completely rigorous and contain the gist of the more general argument. In this way, it is hoped that the reader will be prepared to tackle more sophisticated books and articles on wavelet theory.

(2) Proceeding from the continuous to the discrete. I have always found it more meaningful and ultimately easier to start with a presentation of wavelets and wavelet bases in the continuous domain and use this to motivate the discrete theory, even though the discrete theory hangs together in its own right and is easy to understand. This can be frustrating for the student whose primary interest is in applications, but I believe that a better understanding of applications can ultimately be achieved by doing things in this order.
(3) Prepare readers to explore wavelet theory on their own. Wavelets is too broad a subject to cover in a single book and is most interesting to study when the students have a particular interest in what they are studying. In choosing what to include in the book, I have tried to ensure that students are equipped to pursue more advanced topics on their own. I have included an appendix called Excursions in Wavelet Theory (Appendix B) that gives some guidance toward what I consider to be the most readable articles on some selected topics. The suggested topics in this appendix can also be used as the basis of semester projects for the students.
Structure of the Book

The book is divided into five parts: Preliminaries, The Haar System, Multiresolution Analysis and Orthonormal Wavelet Bases, Other Wavelet Constructions, and Applications.

Preliminaries

Wavelet theory is really very hard to appreciate outside the context of the language and ideas of Fourier analysis. Chapters 1-4 of the book provide a background in some of these ideas and include everything that is subsequently used in the text. These chapters are designed to be more than just a reference but less than a "book-within-a-book" on Fourier analysis. Depending on the background of the reader or of the class in which this book is being used, these chapters are intended to be dipped into either superficially or in detail as appropriate. Naturally there are a great many books on Fourier analysis that cover the same material better and more thoroughly than do Chapters 1-4 and at the same level (more or less) of mathematical sophistication. I will list some of my favorites: Walker, Fourier Analysis; Kammler, A First Course in Fourier Analysis; Churchill and Brown, Fourier Series and Boundary Value Problems; Dym and McKean, Fourier Series and Integrals; Körner, Fourier Analysis; and Benedetto, Harmonic Analysis and Applications.

The Haar System

Chapters 5 and 6 provide a self-contained exposition of the Haar system, the earliest example of an orthonormal wavelet basis. These chapters could
be presented as is in a course on advanced calculus, or an undergraduate Fourier analysis course. In the context of the rest of the book, these chapters are designed to motivate the search for more general wavelet bases with different properties, and also to illustrate some of the more advanced concepts such as multiresolution analysis that are used throughout the rest of the book. Chapter 5 contains a description of the Haar basis on $[0, 1]$ and on $\mathbb{R}$, and Chapter 6 shows how to implement a discrete version of the Haar basis in one and two dimensions. Some examples of images analyzed with the Haar wavelet are also included.
Orthonormal Wavelet Bases

Chapters 7-9 represent the heart of the book. Chapter 7 contains an exposition of the general notion of a multiresolution analysis (MRA) together with several examples. Next, we describe the recipe that gives the construction of a wavelet basis from an MRA, and then construct corresponding examples of orthonormal wavelet bases. Chapter 8 describes the passage from the continuous domain to the discrete domain. First, properties of MRA are used to motivate and define the quadrature mirror filter (QMF) conditions that any orthonormal wavelet filter must satisfy. Then the discrete wavelet transform (DWT) is defined for infinite signals, periodic signals, and finite sets of data. Finally, the techniques used to pass from discrete filters satisfying the QMF conditions to continuously defined wavelet functions are described. Chapter 9 presents the construction of compactly supported orthonormal wavelet bases due to Daubechies. Daubechies's approach is motivated by a lengthy discussion of the importance of vanishing moments in the design of wavelet filters.
Other Wavelet Constructions

Chapters 10 and 11 contain a discussion of two important variations on the theme of the construction of orthonormal wavelet bases. The first, in Chapter 10, shows what happens when you consider nonorthogonal wavelet systems. This chapter contains a discussion of Riesz bases, and describes the semi-orthogonal wavelets of Chui and Wang, as well as the notion of dual MRA and the fully biorthogonal wavelets of Daubechies, Cohen, and Feauveau. Chapter 11 discusses wavelet packets, another natural variation on orthonormal wavelet bases. The motivation here is to consider what happens to the DWT when the "full wavelet tree" is computed. Wavelet packet functions are described, their time and frequency localization properties are discussed, and necessary and sufficient conditions are given under which a collection of scaled and shifted wavelet packets constitutes an orthonormal basis on $\mathbb{R}$. Finally, the notion of a best basis is described, and the so-called best basis algorithm (due to Coifman and Wickerhauser) is given.
Applications

Many wavelet books have been written emphasizing applications of the theory, most notably, Strang and Nguyen, Wavelets and Filter Banks, and Mallat's comprehensive A Wavelet Tour of Signal Processing. The book by Wickerhauser, Applied Wavelet Analysis from Theory to Software, also contains descriptions of several applications. The reader is encouraged to consult these texts and the references therein to learn more about wavelet applications. The description of applications in this book is limited to a brief description of two fundamental examples of wavelet applications. The first, described in Chapter 12, is to image compression. The basic components of a transform image coder as well as how wavelets fit into this picture are described. Chapter 13 describes the Beylkin-Coifman-Rokhlin (BCR) algorithm, which is useful for numerically estimating certain integral operators known as singular integral operators. The algorithm is very effective and uses the same basic properties of wavelets that make them useful for image compression. Several examples of singular integral operators arising in ordinary differential equations, complex variable theory, and image processing are given before the BCR algorithm is described.
Acknowledgments

I want to express my thanks to the many folks who made this book possible. First and foremost, I want to thank my advisor and friend John Benedetto for encouraging me to take on this project and for graciously agreeing to publish it in his book series. Thanks also to Wayne Yuhasz, Lauren Schultz, Louise Farkas, and Shoshanna Grossman at Birkhauser for their advice and support. I want to thank Margaret Mitchell for LaTeX advice and Jim Houston and Clovis L. Tondo for modifying some of the figures to make them more readable. All of the figures in this book were created by me using MATLAB and the Wavelet Toolbox. Thanks to the MathWorks for creating such superior products. I would also like to thank the National Science Foundation for its support and the George Mason University Mathematics Department (especially Bob Sachs) for their constant encouragement. I also want to thank the students in my wavelets course who were guinea pigs for an early version of this text and who provided valuable feedback on organization and found numerous typos in the text. Thanks to Ben Crain, James Holdener, Amin Jazaeri, Jim Kelliher, Sami Nefissi, Matt Parker, and Jim Timper. I also want to thank Bill Heller, Joe Lakey, and Paul Salamonowicz for their careful reading of the text and their useful comments. Special thanks go to David Weiland for his willingness to use the manuscript in an undergraduate course at Swarthmore College. The book is all the better
for his insights, and those of the unnamed students in the class. I want to give special thanks to my Dad, with whom I had many conversations about book-writing. He passed away suddenly while this book was in production and never saw the finished product. He was pleased and proud to have another published author in the family. He is greatly missed. Finally, I want to thank my wife Megan for her constant love and support, and my delightful children John and Genna who will someday read their names here and wonder how their old man actually did it.
Fairfax, Virginia
David F. Walnut
Albrecht Dürer (1471-1528), Melencholia I (engraving). Courtesy of the Fogg Art Museum, Harvard University Art Museums, Gift of William Gray from the collection of Francis Calley Gray. Photograph by Rick Stafford, © President and Fellows of Harvard College. A detail of this engraving, a portion of the magic square, is used as the sample image in 22 figures in this book. The file processed is a portion of the image file detail.mat packaged with MATLAB version 5.0.
Part I
Preliminaries
Chapter 1
Functions and Convergence

1.1 Functions

1.1.1 Bounded ($L^\infty$) Functions
Definition 1.1. A piecewise continuous function $f(x)$ defined on an interval $I$ is bounded (or $L^\infty$) on $I$ if there is a number $M > 0$ such that $|f(x)| \le M$ for all $x \in I$. The $L^\infty$-norm of a function $f(x)$ is defined by
$$\|f\|_\infty = \sup\{|f(x)| : x \in I\}. \qquad (1.1)$$
Example 1.2. (a) If $I$ is a closed, finite interval, then any function $f(x)$ continuous on $I$ is also $L^\infty$ on $I$ (Theorem A.3).

(b) The function $f(x) = 1/x$ is continuous and has a finite value at each point of the interval $(0, 1]$ but is not bounded on $(0, 1]$ (Figure 1.1).

(c) The functions $f(x) = \sin(x)$ and $f(x) = \cos(x)$ are $L^\infty$ on $\mathbb{R}$. Also, the complex-valued function $f(x) = e^{ix}$ is $L^\infty$ on $\mathbb{R}$. In fact, $\|\sin\|_\infty = \|\cos\|_\infty = \|e^{ix}\|_\infty = 1$.

(d) Any nonconstant polynomial function $p(x)$ is not $L^\infty$ on $\mathbb{R}$ but is $L^\infty$ on every finite subinterval of $\mathbb{R}$.

(e) Any piecewise continuous function with only jump discontinuities is $L^\infty$ on any finite interval $I$.
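The supremum in (1.1) can be explored numerically. The following is a minimal Python sketch, not taken from the text: the helper `sup_norm_estimate` (a hypothetical name) samples $|f|$ on a uniform grid, which gives only a lower bound for $\|f\|_\infty$ and cannot by itself certify that a function is bounded. It checks Example 1.2(c) and shows the growth behind Example 1.2(b).

```python
import cmath
import math


def sup_norm_estimate(f, a, b, samples=10_001):
    """Estimate sup{|f(x)| : a <= x <= b} by sampling a uniform grid.

    The result is a lower bound for the true L^infinity norm on [a, b]:
    a finite grid may miss the exact point where |f| is largest.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step)) for k in range(samples))


# Example 1.2(c): sin, cos, and e^{ix} all have sup norm 1 over one period.
for name, f in [("sin", math.sin), ("cos", math.cos),
                ("exp(ix)", lambda x: cmath.exp(1j * x))]:
    print(name, sup_norm_estimate(f, 0.0, 2 * math.pi))

# Example 1.2(b): on [delta, 1] the maximum of 1/x is 1/delta, which grows
# without bound as delta -> 0, so 1/x is not bounded on (0, 1].
for delta in (1e-1, 1e-3, 1e-6):
    print(delta, sup_norm_estimate(lambda x: 1 / x, delta, 1.0))
```

Sampling is used here only for illustration; the exact statements in Example 1.2 rest on the arguments given there.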
1.1.2 Integrable ($L^1$) Functions

Definition 1.3. A piecewise continuous function $f(x)$ defined on an interval $I$ is integrable (or of class $L^1$, or simply $L^1$) on $I$ if the integral
$$\int_I |f(x)|\,dx$$
is finite. The $L^1$-norm of a function $f(x)$ is defined by
$$\|f\|_1 = \int_I |f(x)|\,dx.$$
FIGURE 1.1. Left: $f(x) = 1/x$ is finite-valued but unbounded on $(0, 1]$. Right: $\sin(x)$ (solid) and $\cos(x)$ (dashed) are $L^\infty$ on $\mathbb{R}$.
Example 1.4. (a) If $f(x)$ is $L^\infty$ on a finite interval $I$, then $f(x)$ is $L^1$ on $I$. (b) Any function continuous on a finite closed interval $I$ is $L^1$ on $I$. This is because such a function must be $L^\infty$ on $I$ (Theorem A.3). (c) Any function piecewise continuous with only jump discontinuities on a finite closed interval $I$ is $L^1$ on $I$.
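(d) For any $0 < \alpha < 1$, the function $f(x) = |x|^{-\alpha}$ is $L^1$ on the interval $[-1, 1]$. Clearly $f(x)$ is piecewise continuous with an infinite discontinuity at $x = 0$. Thus the integral $\int_{-1}^{1} |f(x)|\,dx$ is improper and must be evaluated as an improper integral as follows:
$$\int_{-1}^{1} |x|^{-\alpha}\,dx = \lim_{\epsilon\to 0^+} \int_{-1}^{-\epsilon} |x|^{-\alpha}\,dx + \lim_{\epsilon\to 0^+} \int_{\epsilon}^{1} |x|^{-\alpha}\,dx = 2\lim_{\epsilon\to 0^+} \frac{1 - \epsilon^{1-\alpha}}{1-\alpha} = \frac{2}{1-\alpha}.$$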
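The above example shows that an $L^1$ function need not be $L^\infty$. If $\alpha \ge 1$, then $f(x)$ is not $L^1$ on $[-1, 1]$.

(e) If $\alpha > 1$, the function $f(x) = x^{-\alpha}$ is $L^1$ on the interval $[1, \infty)$ since the improper Riemann integral
$$\int_{1}^{\infty} x^{-\alpha}\,dx = \lim_{b\to\infty} \frac{1 - b^{1-\alpha}}{\alpha - 1} = \frac{1}{\alpha - 1}$$
converges.

(f) If $0 \le \alpha \le 1$, then $f(x) = x^{-\alpha}$ is not $L^1$ on $[1, \infty)$. But $f(x)$ is $L^\infty$ on $[1, \infty)$. This shows that an $L^\infty$ function need not be $L^1$ on $I$ if $I$ is infinite.

(g) The function $f(x) = e^{-|x|}$ is integrable on $\mathbb{R}$ since the improper Riemann integral
$$\int_{-\infty}^{\infty} e^{-|x|}\,dx$$
converges. In fact,
$$\int_{-\infty}^{\infty} e^{-|x|}\,dx = 2\int_{0}^{\infty} e^{-x}\,dx = 2.$$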
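The limits in (d) and (g) can be checked numerically. Below is a minimal Python sketch, not from the text, with two hypothetical helpers: `midpoint_integral` applies the composite midpoint rule on $[a, b]$, and `truncated_power_integral` integrates $|x|^{-\alpha}$ over $[-1, -\epsilon] \cup [\epsilon, 1]$, mirroring the limit definition of the improper integral in (d). The numbers only illustrate the limits; they do not prove convergence.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def truncated_power_integral(alpha, eps):
    """Integral of |x|**(-alpha) over [-1, -eps] union [eps, 1].

    The integrand is even, so this is twice the integral over [eps, 1].
    Cutting out (-eps, eps) avoids the infinite discontinuity at x = 0.
    """
    return 2 * midpoint_integral(lambda x: x ** (-alpha), eps, 1.0)


# Example 1.4(d) with alpha = 1/2: the limit should be 2 / (1 - alpha) = 4.
alpha = 0.5
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, truncated_power_integral(alpha, eps), 2 / (1 - alpha))

# Example 1.4(g): integrals of e^{-|x|} over [-R, R] approach 2 as R grows.
for R in (1.0, 5.0, 20.0):
    print(R, midpoint_integral(lambda x: math.exp(-abs(x)), -R, R))
```

The midpoint rule is chosen because it never evaluates the integrand at an endpoint, which is convenient near the singularity at $x = 0$.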
We present below our first approximation theorem. It says that any function that is $L^1$ on $\mathbb{R}$ can be approximated arbitrarily closely, in the sense of the $L^1$-norm, by a function with compact support. Theorem 1.5 is illustrated in Figure 1.2.
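Theorem 1.5. Let $f(x)$ be $L^1$ on $\mathbb{R}$, and let $\epsilon > 0$ be given. Then there exists a number $R$ such that if
$$g(x) = \begin{cases} f(x) & \text{if } |x| \le R,\\ 0 & \text{if } |x| > R,\end{cases}$$
then
$$\int_{-\infty}^{\infty} |f(x) - g(x)|\,dx < \epsilon.$$

Proof: Since $f(x)$ is integrable, the definition of the improper Riemann integral implies that there is a number $\nu$ such that
$$\lim_{r\to\infty} \int_{-r}^{r} |f(x)|\,dx = \nu.$$
Hence, given $\epsilon > 0$, there is a number $r_0 > 0$ such that if $r > r_0$, then
$$\nu - \int_{-r}^{r} |f(x)|\,dx = \int_{|x| > r} |f(x)|\,dx < \epsilon.$$
Pick a number $R > r_0$, and define $g(x)$ as in the statement of the theorem. Then,
$$\int_{-\infty}^{\infty} |f(x) - g(x)|\,dx = \int_{|x| > R} |f(x)|\,dx < \epsilon. \qquad \square$$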
FIGURE 1.2. Illustration of Theorem 1.5. Left: Graph of $f(x)$. Area of shaded region is $< \epsilon$. Right: Graph of $g(x)$ with $R = 10$.
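The proof of Theorem 1.5 is constructive once the tail integral can be evaluated. As a minimal sketch under stated assumptions, the Python code below uses $f(x) = e^{-|x|}$ from Example 1.4(g), for which $\nu = 2$ and $\int_{-R}^{R} |f(x)|\,dx = 2(1 - e^{-R})$ in closed form. The helpers `truncate` and `choose_radius` are hypothetical names, and doubling $R$ is just one convenient way to land on some $R > r_0$.

```python
import math


def truncate(f, R):
    """Return g with g(x) = f(x) for |x| <= R and g(x) = 0 for |x| > R."""
    return lambda x: f(x) if abs(x) <= R else 0.0


def choose_radius(integral_over, nu, eps, R=1.0):
    """Enlarge R until nu - integral_over(R), the integral of |f| over |x| > R,
    is less than eps.

    integral_over(R) must return the integral of |f| over [-R, R], and nu must
    be the integral of |f| over the whole real line.
    """
    while nu - integral_over(R) >= eps:
        R *= 2  # doubling is one convenient way to reach some R > r_0
    return R


def f(x):
    # Example 1.4(g): nu = 2 and the integral over [-R, R] is 2(1 - e^{-R}).
    return math.exp(-abs(x))


R = choose_radius(lambda r: 2 * (1 - math.exp(-r)), nu=2.0, eps=1e-3)
g = truncate(f, R)
print(R, 2 * math.exp(-R))   # chosen R and the error ||f - g||_1 = 2e^{-R}
print(g(1.0) == f(1.0), g(R + 1.0))
```

For a general $f$, the closed-form `integral_over` would have to be replaced by a numerical quadrature, which is an additional approximation the theorem itself does not need.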
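1.1.3 Square Integrable ($L^2$) Functions

Definition 1.6. A piecewise continuous function $f(x)$ defined on an interval $I$ is square-integrable (or of class $L^2$, or simply $L^2$) on $I$ if the integral
$$\int_I |f(x)|^2\,dx$$
is finite. The $L^2$-norm of a function $f(x)$ is defined by
$$\|f\|_2 = \left(\int_I |f(x)|^2\,dx\right)^{1/2}.$$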
Example 1.7. (a) Any function bounded on a finite interval $I$ is also $L^2$ on $I$. This includes functions continuous on closed intervals and functions piecewise continuous on closed intervals with only jump discontinuities. (b) Any function that is $L^\infty$ and $L^1$ on any interval $I$ (finite or infinite) is also $L^2$ on $I$.
(c) For any $0 < \alpha < 1/2$, the function $f(x) = |x|^{-\alpha}$ is $L^2$ on the interval $[-1, 1]$. Therefore a square-integrable function need not be bounded. If $\alpha \ge 1/2$, then the corresponding $f(x)$ is not $L^2$ on $[-1, 1]$.
(d) If $\alpha > 1/2$, the function $f(x) = x^{-\alpha}$ is $L^2$ on the interval $[1, \infty)$. If $0 < \alpha \le 1/2$, then the corresponding $f(x)$ is not $L^2$ on $[1, \infty)$.
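A rough numerical look at Example 1.7(d), as a minimal Python sketch: the hypothetical helper `l2_norm` approximates $\|f\|_2$ on $[1, b]$ with the midpoint rule. Since $\int_1^b x^{-2}\,dx = 1 - 1/b$ and $\int_1^b x^{-1}\,dx = \ln b$, the estimates should level off near 1 for $\alpha = 1$ and keep growing for $\alpha = 1/2$.

```python
import math


def l2_norm(f, a, b, n=100_000):
    """Midpoint-rule estimate of ||f||_2 = (integral of |f(x)|^2 over [a, b])^(1/2)."""
    h = (b - a) / n
    return math.sqrt(h * sum(abs(f(a + (k + 0.5) * h)) ** 2 for k in range(n)))


# Example 1.7(d) on [1, b] for growing b:
#   alpha = 1   -> integral of x^{-2} is 1 - 1/b, so the norms level off near 1;
#   alpha = 1/2 -> integral of x^{-1} is ln b, so the norms grow without bound.
for alpha in (1.0, 0.5):
    for b in (10.0, 100.0, 1000.0):
        print(alpha, b, l2_norm(lambda x: x ** (-alpha), 1.0, b))
```

A finite $b$ can only suggest divergence; the growth like $\sqrt{\ln b}$ for $\alpha = 1/2$ is slow, which is why the exact computation above is the real argument.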
0, there is an $N > 0$ such that if $n \ge N$, then $|a_n - a| < \epsilon$. In this case, we write $a_n \to a$ as $n \to \infty$, or $\lim_{n\to\infty} a_n = a$. A numerical series, denoted $\sum_{n=1}^{\infty} a_n$, converges to a number $S$ if the sequence of partial sums $\{s_N\}_{N\in\mathbb{N}}$ defined by $s_N = \sum_{n=1}^{N} a_n$ converges to $S$. In this case, we write $\sum_{n=1}^{\infty} a_n = S$. We will frequently denote the series $\sum_{n=1}^{\infty} a_n$ by $\sum_{n\in\mathbb{N}} a_n$.
A series $\sum_{n\in\mathbb{N}} a_n$ converges absolutely if $\sum_{n\in\mathbb{N}} |a_n|$ converges.
Remark 1.22. (a) A fundamental property of the real numbers is known as the completeness property. The completeness property for the real numbers says that every set of real numbers bounded above has a supremum (or least upper bound). (b) A sequence of real numbers $\{a_n\}_{n\in\mathbb{N}}$ is an increasing sequence if $a_n \le a_{n+1}$ for all $n \in \mathbb{N}$. The completeness property of the real numbers implies that any bounded, increasing sequence of real numbers always converges to its least upper bound. The partial sums of a series with nonnegative terms (i.e., $\sum_{n\in\mathbb{N}} a_n$ where $a_n \ge 0$) form an increasing sequence. Therefore, if the partial sums of a series of nonnegative terms are bounded, then the series converges.
¹Since we assume that $f(a)$ is undefined if $f(x)$ has a discontinuity at $x = a$, the statement $f(x) \equiv 0$ means "$f(x) = 0$ at each point of continuity of $f(x)$."
(c) A sequence of real or complex numbers $\{a_n\}_{n\in\mathbb{N}}$ is Cauchy if for every $\epsilon > 0$, there is an $N > 0$ such that if $n, m \ge N$, then $|a_n - a_m| < \epsilon$. Another consequence of the completeness property for real numbers is the following: Every Cauchy sequence of numbers converges.
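Example 1.23. (a) Consider the series $\sum_{n=0}^{\infty} r^n$. The partial sum sequence can be computed since, for $r \ne 1$,
$$s_N = \sum_{n=0}^{N} r^n = \frac{1 - r^{N+1}}{1 - r}.$$
If $|r| < 1$, then $s_N \to 1/(1 - r)$ as $N \to \infty$. Therefore, if $|r| < 1$, then
$$\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r}.$$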
(b) Consider the series $\sum_{n=1}^{\infty} 1/n^2$. Since $1/n^2 \le \int_{n-1}^{n} dx/x^2$ for $n \ge 2$,
$$s_N = \sum_{n=1}^{N}\frac{1}{n^2} \le 1 + \int_{1}^{N}\frac{dx}{x^2} = 2 - \frac{1}{N}.$$
Therefore, $s_N \le 2$ for all $N$. Since each of the terms $1/n^2$ is positive, $\{s_N\}_{N\in\mathbb{N}}$ is a bounded, increasing sequence. Therefore, it converges to its least upper bound and the series $\sum_{n=1}^{\infty} 1/n^2$ converges. Note that we have proved that the series converges but we have made no statement about the value of its limit. The same argument can be used to show that the series $\sum_{n=1}^{\infty} 1/n^p$ converges for every $p > 1$, but again does not give the value of its limit.
(c) The Weierstrass M-test is a well-known test for convergence of a series. Consider the series $\sum_n a_n$. The Weierstrass M-test says that if $|a_n| \le b_n$ for all $n$ and if $\sum_{n\in\mathbb{N}} b_n$ converges, then $\sum_{n\in\mathbb{N}} a_n$ converges. For example, consider the series $\sum_{n=1}^{\infty}\cos(n)/n^2$. Since $|\cos(n)/n^2| \le 1/n^2$ for all $n$ and since $\sum_{n=1}^{\infty} 1/n^2$ converges, so does $\sum_{n=1}^{\infty}\cos(n)/n^2$. Note that again we have proved that the series converges but have not given the value of its limit.
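The two facts used in Example 1.23(b) and (c) can be checked numerically. The sketch below is an illustration only (floating-point sums cannot prove convergence): it confirms that the partial sums of $\sum 1/n^2$ increase and satisfy $s_N \le 2 - 1/N$, and that $|\cos(n)/n^2| \le 1/n^2$ term by term, which is what the M-test needs.

```python
import math

def partial_sums(term, N):
    """Return the list [s_1, ..., s_N] of partial sums of sum_{n>=1} term(n)."""
    sums, s = [], 0.0
    for n in range(1, N + 1):
        s += term(n)
        sums.append(s)
    return sums

N = 10_000
s = partial_sums(lambda n: 1.0 / n**2, N)

# Example 1.23(b): the partial sums increase and obey s_N <= 2 - 1/N.
increasing = all(a < b for a, b in zip(s, s[1:]))
bounded = all(sN <= 2.0 - 1.0 / k for k, sN in enumerate(s, start=1))
print(increasing, bounded, s[-1])

# Example 1.23(c): |cos(n)/n^2| <= 1/n^2 for every n, so the M-test applies.
dominated = all(abs(math.cos(n)) / n**2 <= 1.0 / n**2 for n in range(1, N + 1))
print(dominated)
```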
(d) A consequence of the Weierstrass M-test is the following. If a series converges absolutely, then the original series also converges. Absolute convergence is equivalent to saying that the series converges regardless of the order in which the terms are summed. It is not true that all convergent series are absolutely convergent. For example, it is shown in most calculus books that the series $\sum_{n=1}^{\infty}(-1)^n/n$ converges but the harmonic series $\sum_{n=1}^{\infty} 1/n$ does not.
(e) A doubly infinite sequence is a sequence of the form $\{a_n\}_{n\in\mathbb{Z}}$. In discussing the convergence of such sequences, we look at two limits, namely, $\lim_{n\to\infty} a_n$ and $\lim_{n\to\infty} a_{-n}$. If both converge to the same number, say, to $a$, then we write $\lim_{|n|\to\infty} a_n = a$.
(f) A doubly infinite series is a series of the form $\sum_{n=-\infty}^{\infty} a_n$. In discussing the convergence of such series, we look at two series, namely, $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=1}^{\infty} a_{-n}$. If both of these series converge, then there is no problem. If $\sum_{n=0}^{\infty} a_n = S_+$ and $\sum_{n=1}^{\infty} a_{-n} = S_-$, then $\sum_{n=-\infty}^{\infty} a_n = S_+ + S_- = S$. In this case, we also write $\lim_{M,N\to\infty}\sum_{n=-M}^{N} a_n = S$. We will frequently denote the series $\sum_{n=-\infty}^{\infty} a_n$ by $\sum_{n\in\mathbb{Z}} a_n$ or simply by $\sum_n a_n$.
(g) If a doubly infinite series converges absolutely, then it converges regardless of the order in which the terms are summed. This is not the case with series that do not converge absolutely. Consider the series $\sum_{n=-\infty}^{\infty} 1/n$, where the $n = 0$ term is understood to be zero. Clearly, this series does not converge absolutely. However, because of cancellation, $s_N = \sum_{n=-N}^{N} 1/n = 0$. Hence, the symmetric partial sums converge to zero. However, if we define $s_N = \sum_{n=-N}^{N^2} 1/n$, then the terms with $|n| \le N$ cancel, and for $N \ge 2$,
$$s_N = \sum_{n=N+1}^{N^2}\frac{1}{n} \ge \int_{N+1}^{N^2+1}\frac{dx}{x} = \ln\frac{N^2+1}{N+1} \ge \ln(N-1) \to \infty$$
as $N \to \infty$. Therefore, if a doubly infinite series does not converge absolutely, then the form of the partial sums must be given explicitly in order to discuss the convergence of the series. This is true of any series that converges but not absolutely.
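The dependence on the form of the partial sums in Example 1.23(g) is easy to observe numerically. This sketch sums $1/n$ (with the $n = 0$ term set to zero) once symmetrically and once with upper limit $N^2$, and compares the second sum with the lower bound $\ln\big((N^2+1)/(N+1)\big)$ derived above. The symmetric sums are zero up to floating-point rounding.

```python
import math

def partial_sum(lo, hi):
    """Sum of 1/n for lo <= n <= hi, with the n = 0 term taken to be zero."""
    return sum(1.0 / n for n in range(lo, hi + 1) if n != 0)

for N in (10, 100, 1000):
    symmetric = partial_sum(-N, N)        # terms cancel in pairs
    lopsided = partial_sum(-N, N * N)     # s_N with upper limit N^2
    lower_bound = math.log((N * N + 1) / (N + 1))
    print(f"N={N}: symmetric={symmetric:.1e}, s_N={lopsided:.4f}, "
          f"bound={lower_bound:.4f}")
```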
1.2.2 Pointwise Convergence
Definition 1.24. A sequence of functions $\{f_n(x)\}_{n\in\mathbb{N}}$ defined on an interval $I$ converges pointwise to a function $f(x)$ if for each $x_0 \in I$, the numerical sequence $\{f_n(x_0)\}_{n\in\mathbb{N}}$ converges to $f(x_0)$. We write $f_n(x) \to f(x)$ pointwise on $I$, as $n \to \infty$.
The series $\sum_{n=1}^{\infty} f_n(x) = f(x)$ pointwise on an interval $I$ if, for each $x_0 \in I$, $\sum_{n=1}^{\infty} f_n(x_0) = f(x_0)$.
Example 1.25. (a) Let $f_n(x) = x^n$, $x \in [0, 1)$, for all $n \in \mathbb{N}$. Then $f_n(x) \to 0$ pointwise on $[0, 1)$ as $n \to \infty$. See Figure 1.4(a).
(b) Let
$$f_n(x) = \begin{cases} 2nx & \text{if } x \in [0, 1/(2n)), \\ 2 - 2nx & \text{if } x \in [1/(2n), 1/n), \\ 0 & \text{if } x \in [1/n, 1]. \end{cases}$$
Then $f_n(x) \to 0$ pointwise on $[0, 1]$. See Figure 1.4(b).
(c) Let
$$f_n(x) = \begin{cases} 2n^2x & \text{if } x \in [0, 1/(2n)), \\ 2n - 2n^2x & \text{if } x \in [1/(2n), 1/n), \\ 0 & \text{if } x \in [1/n, 1]. \end{cases}$$
Then $f_n(x) \to 0$ pointwise on $[0, 1]$. See Figure 1.4(c).
(d) The series $\sum_{n=0}^{\infty} x^n = \dfrac{1}{1 - x}$ pointwise on $(-1, 1)$.
(f) The series $\sum_{n=1}^{\infty}\dfrac{\cos(nx)}{n^2}$ converges pointwise on $\mathbb{R}$ to its limit by the Weierstrass M-test.
(g) The series $\sum_{n=1}^{\infty}\dfrac{\cos(nx)}{n}$ converges at odd multiples of $\pi$ (since it reduces to the alternating series $\sum_{n=1}^{\infty}(-1)^n/n$) but diverges at even multiples of $\pi$ (since it reduces to the harmonic series). In fact, it can be shown that the series converges for all $x$ that are not even multiples of $\pi$.
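The contrast between Example 1.25(b) and (c) can be seen by evaluating both sequences at a fixed point and at the peak $x = 1/(2n)$. In this sketch, $f_n(x_0) = 0$ as soon as $n \ge 1/x_0$ (here $x_0 = 0.25$, so from $n = 4$ on), which is pointwise convergence, while the peak values stay at $1$ for (b) and grow like $n$ for (c); Remark 1.27 below explains why this rules out uniform convergence. The function names are only illustrative.

```python
def f_b(n, x):
    """f_n from Example 1.25(b): a tent of height 1 supported on [0, 1/n]."""
    if 0 <= x < 1 / (2 * n):
        return 2 * n * x
    if 1 / (2 * n) <= x < 1 / n:
        return 2 - 2 * n * x
    return 0.0

def f_c(n, x):
    """f_n from Example 1.25(c): the same tent scaled to height n."""
    return n * f_b(n, x)

x0 = 0.25
for n in (10, 100, 1000):
    peak = 1 / (2 * n)
    print(n, f_b(n, x0), f_c(n, x0), f_b(n, peak), f_c(n, peak))
```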
1.2.3 Uniform ($L^\infty$) Convergence
Definition 1.26. The sequence $\{f_n(x)\}_{n\in\mathbb{N}}$ converges uniformly on $I$ to the function $f(x)$ if for every $\epsilon > 0$, there is an $N > 0$ such that if $n \ge N$, then $|f_n(x) - f(x)| < \epsilon$ for all $x \in I$. We write $f_n(x) \to f(x)$ uniformly on $I$ as $n \to \infty$. The series $\sum_{n=1}^{\infty} f_n(x) = f(x)$ uniformly on $I$ if the sequence of partial sums $S_N(x) = \sum_{n=1}^{N} f_n(x)$ converges uniformly to $f(x)$ on $I$.
Remark 1.27. (a) With uniform convergence, for a given $\epsilon$ the same $N$ works for all $x \in I$, whereas with pointwise convergence $N$ may depend on both $\epsilon$ and $x$. In other words, uniform convergence says that given $\epsilon > 0$ there is an $N > 0$ such that for all $n \ge N$, the maximum difference between $f_n(x)$ and $f(x)$ on $I$ is smaller than $\epsilon$. Because of this, uniform convergence is also called $L^\infty$ convergence. That is, $f_n(x) \to f(x)$ uniformly on $I$ if and only if $\|f_n - f\|_\infty \to 0$ as $n \to \infty$.
FIGURE 1.4. Top Left: Graph of $f_n(x) = x^n$ on $[0, 1)$ for $n = 2, 4, 8$. Top Right: Graph of $f_n(x)$ on $[0, 1]$ where $f_n(x)$ is defined in Example 1.25(b). Bottom: Graph of $f_n(x)$ on $[0, 1]$ where $f_n(x)$ is defined in Example 1.25(c).
(b) In Example 1.25(b), the convergence of $f_n(x)$ to 0 is pointwise but not uniform. This is because the maximum difference between $f_n(x)$ and the limit function $f(x) = 0$ is 1 no matter what $n$ is. In other words, $\|f_n - f\|_\infty = \|f_n\|_\infty = 1$ for all $n$, and so $\|f_n - f\|_\infty \not\to 0$ as $n \to \infty$.
(c) In Example 1.25(c), the convergence of $f_n(x)$ to 0 is also pointwise but not uniform. In fact, in this case, $\|f_n - f\|_\infty = \|f_n\|_\infty = n$ for all $n$. Therefore $\|f_n - f\|_\infty \to \infty$ as $n \to \infty$. On the other hand, there are no examples of sequences that converge uniformly on an interval but not pointwise. In other words, the following theorem holds.
Theorem 1.28. If $f_n(x) \to f(x)$ in $L^\infty$ on an interval $I$, then $f_n(x) \to f(x)$ pointwise on $I$.
Proof: Exercise 1.44.
An important theorem from advanced calculus is the following. Its proof is left as an exercise but can be found in almost any advanced calculus book (for example, Buck, p. 266, Theorem 3).
Theorem 1.29. If $f_n(x) \to f(x)$ uniformly on the interval $I$, and if each $f_n(x)$ is continuous on $I$, then $f(x)$ is continuous on $I$.
Proof: Exercise 1.45.
Example 1.30. As an illustration of Theorem 1.29, let
Then each $f_n(x)$ is continuous on $[-1, 1]$ and $\{f_n(x)\}_{n\in\mathbb{N}}$ converges pointwise to the function $f(x)$ defined by
which has a jump discontinuity at $x = 0$ (see Figure 1.5). It can be shown directly that $f_n(x)$ does not converge to $f(x)$ in $L^\infty$ on $[-1, 1]$, but a different argument utilizing Theorem 1.29 would be as follows. If $f_n(x) \to f(x)$ in $L^\infty$ on $[-1, 1]$, then since each $f_n(x)$ is continuous, Theorem 1.29 implies that $f(x)$ should also be continuous. Since this is not the case, the convergence cannot be in $L^\infty$.
Example 1.31. (a) The sequence $\{x^n\}_{n\in\mathbb{N}}$ converges uniformly to zero on $[-a, a]$ for all $0 < a < 1$ but does not converge uniformly to zero on $(-1, 1)$.
(b) The series $\sum_{n=0}^{\infty} x^n = \dfrac{1}{1 - x}$ uniformly on $[-a, a]$ for all $0 < a < 1$, but not on $(-1, 1)$.
(c) The series $\sum_{n=0}^{\infty}\dfrac{x^n}{n!} = e^x$ uniformly on every finite interval $I$, but not on $\mathbb{R}$.
(d) The series $\sum_{n=1}^{\infty}\dfrac{\cos(nx)}{n^2}$ converges uniformly to its limit on $\mathbb{R}$ by the Weierstrass M-test.
FIGURE 1.5. Left: Graph of $f_n(x)$ of Example 1.30 for $n = 2, 4, 8$. Right: Graph of the limit function $f(x)$.
1.2.4 Mean ($L^1$) Convergence
Definition 1.32. The sequence $\{f_n(x)\}_{n\in\mathbb{N}}$ defined on an interval $I$ converges in mean to the function $f(x)$ on $I$ if
$$\lim_{n\to\infty}\int_I |f_n(x) - f(x)|\,dx = 0.$$
We write $f_n(x) \to f(x)$ in mean on $I$ as $n \to \infty$. Mean convergence is also referred to as $L^1$ convergence because $f_n(x) \to f(x)$ in mean on $I$ as $n \to \infty$ is identical to the statement that $\lim_{n\to\infty}\|f_n - f\|_1 = 0$. The series $\sum_{n=1}^{\infty} f_n(x) = f(x)$ in mean on $I$ if the sequence of partial sums $S_N(x) = \sum_{n=1}^{N} f_n(x)$ converges in mean to $f(x)$ on $I$.
Mean convergence can be interpreted as saying that the area between the curves $y = f_n(x)$ and $y = f(x)$ goes to zero as $n \to \infty$. This type of convergence allows point values of $f_n(x)$ and $f(x)$ to differ considerably but says that on average the functions $f_n(x)$ and $f(x)$ are close for large $n$.
Example 1.33. (a) Let $f_n(x) = x^n$, $x \in [0, 1)$, for all $n \in \mathbb{N}$. As we have seen in Example 1.25(a), this sequence converges to $f(x) = 0$ pointwise on $[0, 1)$ but not uniformly on $[0, 1)$. Since
$$\int_0^1 |x^n - 0|\,dx = \frac{1}{n+1} \to 0$$
as $n \to \infty$, $f_n(x) \to 0$ in mean on $[0, 1)$.
(b) Consider the sequence $\{f_n(x)\}_{n\in\mathbb{N}}$ defined in Example 1.25(b). The sequence converges pointwise but not uniformly to $f(x) \equiv 0$ on $[0, 1]$. Since the area under the graph of $f_n(x)$ is $1/(2n)$ for each $n$, the sequence also converges in mean to $f(x)$ on $[0, 1]$. In this example, we can see the character of mean convergence. If $n$ is large, the function $f_n(x)$ is close to the limit function $f(x) \equiv 0$ (in fact identical to it) on most of the interval $[0, 1]$, specifically on $[1/n, 1]$, and far away from it on the rest of the interval, $[0, 1/n)$. However, on average, $f_n(x)$ is close to the limit function.
(c) The sequence $\{f_n(x)\}_{n\in\mathbb{N}}$ defined in Example 1.25(c) tells a different story. The sequence converges pointwise but not uniformly to $f(x) \equiv 0$ on $[0, 1]$, but since the area under the graph of $f_n(x)$ is always $1/2$ (a triangle of base $1/n$ and height $n$), $f_n(x)$ does not converge to $f(x)$ in mean. The width of the triangle under the graph of $f_n(x)$ decreases to zero, but the height increases to infinity in such a way that the area of the triangle does not go to zero. The above examples show that sometimes pointwise convergence and mean convergence go together and sometimes they do not. The proof of the following theorem is left as an exercise (Exercise 1.47).
Theorem 1.34. If $f_n(x) \to f(x)$ in $L^\infty$ on a finite interval $I$, then $f_n(x) \to f(x)$ in $L^1$ on $I$.
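To put Theorem 1.34 next to Example 1.33, the sketch below approximates the $L^1$ norms of the three sequences with a midpoint rule (an approximation, not an exact integral). All three have sup norms that do not tend to zero ($1$ for $x^n$ on $[0,1)$, $1$ and $n$ for the tents), yet the first two have $L^1$ norms $1/(n+1)$ and $1/(2n)$ tending to zero, so mean convergence does not force uniform convergence. The third keeps $L^1$ norm $1/2$. The helper `tent` is a hypothetical reparametrization of Examples 1.25(b) and (c): $\text{height}\cdot(1 - |2nx - 1|)$ on $[0, 1/n)$.

```python
def l1_norm(f, lo, hi, m=100_000):
    """Midpoint-rule approximation of the integral of |f| over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(abs(f(lo + (i + 0.5) * h)) for i in range(m))

def tent(n, height, x):
    """Tent on [0, 1/n] with peak `height` at x = 1/(2n) (Examples 1.25(b), (c))."""
    if 0 <= x < 1 / n:
        return height * (1 - abs(2 * n * x - 1))
    return 0.0

for n in (10, 100):
    power = l1_norm(lambda x: x**n, 0.0, 1.0)             # Example 1.33(a)
    tent_b = l1_norm(lambda x: tent(n, 1, x), 0.0, 1.0)   # Example 1.33(b)
    tent_c = l1_norm(lambda x: tent(n, n, x), 0.0, 1.0)   # Example 1.33(c)
    print(n, power, 1 / (n + 1), tent_b, 1 / (2 * n), tent_c)
```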
Remark 1.35. (a) The conclusion of Theorem 1.34 is false if the interval $I$ is infinite. Consider for example the sequence $f_n(x) = (1/n)\,\chi_{[0,n]}(x)$. Then $f_n(x) \to 0$ in $L^\infty$ on $\mathbb{R}$ but $\int_{-\infty}^{\infty}|f_n(x) - 0|\,dx = 1$ for all $n$, so that $f_n(x)$ does not converge to zero in $L^1$. (b) The converse of Theorem 1.34 is also false, as can be seen by considering Example 1.33(b). In this example, $f_n(x)$ converges to 0 in $L^1$ on $[0, 1]$ but does not converge to 0 in $L^\infty$ on $[0, 1]$. (c) In all of the examples of mean convergence considered so far, the sequences have also converged pointwise. Must this always be the case? The answer turns out to be "no," as is illustrated by the following example.
Example 1.36. Define the interval $I_{j,k}$ by $I_{j,k} = [2^{-j}k, 2^{-j}(k+1))$, for $j \in \mathbb{Z}^+$ and $0 \le k \le 2^j - 1$. Let us make some elementary observations about the intervals $I_{j,k}$.
(a) Each $I_{j,k}$ is a subinterval of $[0, 1)$.
(b) The length of $I_{j,k}$ is $2^{-j}$; that is, $|I_{j,k}| = 2^{-j}$.
(c) Each natural number $n$ corresponds to a unique pair $(j, k)$, $j \in \mathbb{Z}^+$ and $0 \le k \le 2^j - 1$, such that $n = 2^j + k$. For each $n \in \mathbb{N}$, call this pair $(j_n, k_n)$. As $n \to \infty$, $j_n \to \infty$ also.
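The rest of Example 1.36 is not part of this excerpt. The sketch below assumes the standard construction that this indexing is set up for, namely $f_n(x) = \chi_{I_{j_n,k_n}}(x)$ (an assumption here, labeled hypothetical in the code). It checks that $(j_n, k_n)$ is recovered from the binary length of $n$, that $\|f_n\|_1 = 2^{-j_n} \to 0$, and that at a fixed $x_0 \in [0,1)$ the value $f_n(x_0)$ equals $1$ exactly once in each block $2^j \le n < 2^{j+1}$, so $f_n(x_0)$ has no limit.

```python
def pair(n):
    """Return (j_n, k_n) with n = 2**j + k and 0 <= k <= 2**j - 1, for n >= 1."""
    j = n.bit_length() - 1
    return j, n - 2**j

def f(n, x):
    """Hypothetical f_n: the indicator of I_{j_n,k_n} = [k 2^-j, (k+1) 2^-j)."""
    j, k = pair(n)
    return 1.0 if k * 2.0**-j <= x < (k + 1) * 2.0**-j else 0.0

def l1_norm(n):
    """Exact L^1 norm of f_n: the length 2^(-j_n) of its interval."""
    return 2.0 ** -pair(n)[0]

x0 = 0.3
J = 12
hits = [n for n in range(1, 2**J) if f(n, x0) == 1.0]
print(hits)                                     # one index n per block
print([l1_norm(n) for n in (1, 10, 100, 1000)])  # tends to 0
```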
0, $f_n(x) \to f(x)$ in $L^\infty$ or in $L^2$ on $[-R, R]$. That is, for each $R > 0$,
If $f(x)$ is $L^1$ on an interval $I$ and if there is a function $g(x)$, $L^1$ on $I$, such that for all $x \in I$ and all $n \in \mathbb{N}$, $|f_n(x)| \le g(x)$, then
Proof: If $I$ is a finite interval, then there is nothing to do by Theorem 1.40(b) and (c), so we may assume that $I$ is infinite, and for convenience we will take $I = \mathbb{R}$. By Theorem 1.40(a), it will be sufficient to prove that $f_n(x) \to f(x)$ in $L^1$ on $\mathbb{R}$. Let $\epsilon > 0$. Since $f(x)$ and $g(x)$ are $L^1$ on $\mathbb{R}$, by Theorem 1.5, there is a number $R > 0$ such that
$$\int_{|x|>R}|f(x)|\,dx < \epsilon/3 \quad\text{and}\quad \int_{|x|>R}|g(x)|\,dx < \epsilon/3.$$
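Therefore, using the triangle inequality for the $L^1$-norm (Exercise 1.18(c)) and $|f_n(x)| \le g(x)$,
$$\begin{aligned}\int_{-\infty}^{\infty}|f_n(x) - f(x)|\,dx &\le \int_{-R}^{R}|f_n(x) - f(x)|\,dx + \int_{|x|>R}|f_n(x)|\,dx + \int_{|x|>R}|f(x)|\,dx \\ &\le \int_{-R}^{R}|f_n(x) - f(x)|\,dx + \int_{|x|>R}|g(x)|\,dx + \int_{|x|>R}|f(x)|\,dx \\ &< \int_{-R}^{R}|f_n(x) - f(x)|\,dx + \frac{2\epsilon}{3}. \end{aligned} \tag{1.8}$$
By Theorem 1.34 and Theorem 1.38(b), if $f_n(x) \to f(x)$ in $L^\infty$ or $L^2$ on $[-R, R]$, then it also converges in $L^1$ on $[-R, R]$. That is,
$$\lim_{n\to\infty}\int_{-R}^{R}|f_n(x) - f(x)|\,dx = 0.$$
Hence, there is an $N$ such that if $n > N$, then
$$\int_{-R}^{R}|f_n(x) - f(x)|\,dx < \epsilon/3.$$
Therefore, if $n > N$, then
$$\int_{-\infty}^{\infty}|f_n(x) - f(x)|\,dx < \epsilon,$$
and (1.7) follows.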
Next we present a variant of Theorem 1.41.
Theorem 1.42. Suppose that for every $R > 0$, $f_n(x) \to f(x)$ in $L^\infty$ or in $L^2$ on $[-R, R]$. If for every $\epsilon > 0$, there is an $R > 0$ and an $N \in \mathbb{N}$ such that for all $n \ge N$,
$$\int_{|x|>R}|f_n(x)|\,dx < \epsilon,$$
then
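Proof: The proof is the same as that of Theorem 1.41, except that we choose $R > 0$ and $N \in \mathbb{N}$ such that
$$\int_{|x|>R}|f(x)|\,dx < \epsilon/3 \quad\text{and}\quad \int_{|x|>R}|f_n(x)|\,dx < \epsilon/3$$
for all $n \ge N$. Then (1.8) becomes
$$\int_{-\infty}^{\infty}|f_n(x) - f(x)|\,dx \le \int_{-R}^{R}|f_n(x) - f(x)|\,dx + \int_{|x|>R}|f_n(x)|\,dx + \int_{|x|>R}|f(x)|\,dx < \int_{-R}^{R}|f_n(x) - f(x)|\,dx + \frac{2\epsilon}{3},$$
from which (1.9) follows.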
Exercises
Exercise 1.43. Prove each of the statements made in Example 1.25.
Exercise 1.44. Prove Theorem 1.28.
Exercise 1.45. Prove Theorem 1.29.
Exercise 1.46. Prove each of the claims made in Example 1.31.
Exercise 1.47. Prove Theorem 1.34.
Exercise 1.48. Prove Theorem 1.38.
Exercise 1.49. Prove that if $f_n(x)$ is defined as in Example 1.33(b), then $f_n(x) \to 0$ in $L^2$ on $[0, 1]$.
Exercise 1.50. (a) A sequence of functions $\{f_n(x)\}_{n\in\mathbb{N}}$ defined on an interval $I$ is said to be uniformly Cauchy on $I$ if for every $\epsilon > 0$, there is an $N > 0$ such that if $n, m \ge N$, then $\|f_n - f_m\|_\infty < \epsilon$. Prove that any sequence that converges in $L^\infty$ on $I$ is uniformly Cauchy on $I$.
(b) A sequence of functions $\{f_n(x)\}_{n\in\mathbb{N}}$ defined on an interval $I$ is said to be $L^1$ Cauchy on $I$ if for every $\epsilon > 0$, there is an $N > 0$ such that if $n, m \ge N$, then $\|f_n - f_m\|_1 < \epsilon$. Prove that any sequence that converges in $L^1$ on $I$ is $L^1$ Cauchy on $I$.
(c) A sequence of functions $\{f_n(x)\}_{n\in\mathbb{N}}$ defined on an interval $I$ is said to be $L^2$ Cauchy on $I$ if for every $\epsilon > 0$, there is an $N > 0$ such that if $n, m \ge N$, then $\|f_n - f_m\|_2 < \epsilon$. Prove that any sequence that converges in $L^2$ on $I$ is $L^2$ Cauchy on $I$.
Chapter 2 Fourier Series
2.1 Trigonometric Series
2.1.1 Periodic Functions
Definition 2.1. A function $f(x)$ defined on $\mathbb{R}$ has period $p > 0$ if $f(x + p) = f(x)$ for all $x \in \mathbb{R}$. Such a function is said to be periodic.
Remark 2.2. (a) The functions $\sin(x)$ and $\cos(x)$ have period $2\pi$. The functions $\sin(ax)$ and $\cos(ax)$, $a > 0$, have period $2\pi/a$. (b) If $f(x)$ has period $p > 0$, it also has period $kp$, for $k \in \mathbb{N}$. Hence a periodic function can have many periods. Typically the smallest period of $f(x)$ is referred to as the period of $f(x)$.
Definition 2.3. Given a function $f(x)$ on $\mathbb{R}$ and a number $p > 0$, the $p$-periodization of $f(x)$ is defined as the function
$$f_p(x) = \sum_{n\in\mathbb{Z}} f(x + np), \tag{2.1}$$
provided that the sum makes sense. See Figure 2.1.
Remark 2.4. (a) It is easy to verify that in fact the function $f_p(x)$ has period $p$ by making a change of summation index in the sum on the right side of (2.1). Specifically,
$$f_p(x + p) = \sum_{n\in\mathbb{Z}} f(x + (n+1)p) = \sum_{n\in\mathbb{Z}} f(x + np) = f_p(x),$$
where we have made the change of summation index $n \mapsto n + 1$.
(b) If $f(x)$ is compactly supported, then the sum in (2.1) will converge pointwise on $\mathbb{R}$. This is because for each $x$ the sum will have only finitely many nonzero terms. (c) If $f(x)$ is supported in an interval $I$ of length $p$, then $f_p(x)$ is referred to as the period $p$ extension of $f(x)$. This is because for $x \in I$, $f_p(x) = \sum_{n\in\mathbb{Z}} f(x + np) = f(x)$ since all terms in the sum besides the $n = 0$ term are zero. (Why?) Another way of thinking of this is that we are taking infinitely many copies of the function $f(x)$ and placing them side by side on the real line.
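Remark 2.4(b) and (c) translate directly into code: for a compactly supported $f$, only the indices $n$ with $x + np$ in the support contribute to (2.1). The sketch below uses a hypothetical tent-shaped $f$ supported on $[-0.3, 0.3]$ (not from the text). With $p = 1$ the support fits inside one period and $f_p = f$ there (the period 1 extension); with $p = 0.4$ neighboring copies overlap and add up, yet $f_p$ is still $p$-periodic, as in Remark 2.4(a).

```python
import math

SUPPORT = (-0.3, 0.3)

def bump(x):
    """Hypothetical compactly supported f: a tent of height 1 on [-0.3, 0.3]."""
    return max(0.0, 1.0 - abs(x) / 0.3)

def periodize(f, p, x, support=SUPPORT):
    """p-periodization f_p(x) = sum over n in Z of f(x + n p).

    Only the finitely many n with x + n p in [lo, hi] can contribute,
    so the sum is truncated to exactly those indices.
    """
    lo, hi = support
    n_min = math.ceil((lo - x) / p)
    n_max = math.floor((hi - x) / p)
    return sum(f(x + n * p) for n in range(n_min, n_max + 1))

# Support length 0.6 <= p = 1: the period-1 extension reproduces f on [-0.3, 0.3].
print(periodize(bump, 1.0, 0.1), bump(0.1))
# p = 0.4 < 0.6: neighboring copies overlap and their values add.
print(periodize(bump, 0.4, 0.2))
```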
FIGURE 2.1. Top Left: Graph of $f(x)$. Top Right: Graphs of $f(x + np)$ for $-2 \le n \le 2$ and $p = 1$. Bottom: Graph of the 1-periodization of $f(x)$.
0, the collection of functions
$$\{e^{2\pi i n x/a}\}_{n\in\mathbb{Z}} \tag{2.2}$$
is called the (period $a$) trigonometric system.
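Remark 2.6. (a) Recall Euler's formula: $e^{ix} = \cos(x) + i\sin(x)$. This formula can be proved by expanding both sides of the equation in a Taylor series (Exercise 2.20). Therefore, since $e^{2\pi i n} = \cos(2\pi n) + i\sin(2\pi n) = 1$ for $n \in \mathbb{Z}$,
$$e^{2\pi i n (x + a)/a} = e^{2\pi i n x/a}\,e^{2\pi i n} = e^{2\pi i n x/a},$$
and it follows from this that each element in the trigonometric system has period $a$.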
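(b) The period $a$ trigonometric system is sometimes given in the form
$$\{1\} \cup \{\cos(2\pi n x/a)\}_{n\in\mathbb{N}} \cup \{\sin(2\pi n x/a)\}_{n\in\mathbb{N}}. \tag{2.3}$$
Systems (2.2) and (2.3) can be obtained from each other by forming simple linear combinations. Specifically, for $n \in \mathbb{Z}$,
$$e^{2\pi i n x/a} = \cos(2\pi n x/a) + i\sin(2\pi n x/a),$$
and for $n \in \mathbb{N}$,
$$\cos(2\pi n x/a) = \frac{e^{2\pi i n x/a} + e^{-2\pi i n x/a}}{2}$$
and
$$\sin(2\pi n x/a) = \frac{e^{2\pi i n x/a} - e^{-2\pi i n x/a}}{2i}.$$
(c) A function that can be written as a finite linear combination of elements of the (period $a$) trigonometric system is called a (period $a$) trigonometric polynomial. That is, a trigonometric polynomial has the form
$$p(x) = \sum_{n=M}^{N} c(n)\,e^{2\pi i n x/a}$$
for some $M, N \in \mathbb{Z}$ and some coefficients $c(n)$.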
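Theorem 2.7. The period $a$ trigonometric system (2.2) satisfies the following orthogonality relations:
$$\int_0^a e^{2\pi i n x/a}\,\overline{e^{2\pi i m x/a}}\,dx = \begin{cases} a & \text{if } n = m, \\ 0 & \text{if } n \ne m. \end{cases} \tag{2.4}$$
Proof: Exercise 2.22.
Remark 2.8. Note that since the functions $e^{2\pi i n x/a}$ all have period $a$, the integral in (2.4) can be taken over any interval of length $a$. For example,
$$\int_{-a/2}^{a/2} e^{2\pi i n x/a}\,\overline{e^{2\pi i m x/a}}\,dx = \begin{cases} a & \text{if } n = m, \\ 0 & \text{if } n \ne m. \end{cases}$$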
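The orthogonality relations (2.4) and the shifted version in Remark 2.8 can be checked numerically. The sketch below uses a midpoint rule with `steps` equally spaced nodes; for integrands $e^{2\pi i d x/a}$ with $0 < |d| <$ `steps`, the nodes sum a full set of roots of unity, so the rule reproduces the exact values $a$ and $0$ up to rounding. This is a sanity check, not a proof.

```python
import cmath
import math

def inner(n, m, a, lo=0.0, steps=4096):
    """Midpoint-rule value of the integral over [lo, lo + a] of
    e^{2 pi i n x/a} * conj(e^{2 pi i m x/a}) = e^{2 pi i (n - m) x/a}."""
    h = a / steps
    total = 0j
    for s in range(steps):
        x = lo + (s + 0.5) * h
        total += cmath.exp(2j * math.pi * (n - m) * x / a)
    return total * h

a = 2.0
print(inner(3, 3, a), inner(3, -2, a))                        # about a and 0
print(inner(3, 3, a, lo=-a / 2), inner(1, 4, a, lo=-a / 2))   # Remark 2.8
```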
A fundamental problem in Fourier series is the following: Given a function $f(x)$ with period $a > 0$, can we write
$$f(x) = \sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a} \tag{2.5}$$
for some choice of coefficients $\{c(n)\}_{n\in\mathbb{Z}}$? This problem leads to three related questions that will be answered in the following subsections:
(a) In order for (2.5) to hold, what must the coefficients $c(n)$ be?
(b) Assuming we know the answer to question (a), in what sense does the series on the right side of (2.5) converge?
(c) Assuming we know the answers to questions (a) and (b), does the series on the right of (2.5) converge to $f(x)$, or to some other function?
2.1.3 The Fourier Coefficients
Let us begin by answering question (a) above.
Definition 2.9. Given a function $f(x)$ with period $a$, the Fourier coefficients of $f(x)$ are defined by
$$c(n) = \frac{1}{a}\int_0^a f(x)\,e^{-2\pi i n x/a}\,dx, \qquad n \in \mathbb{Z}, \tag{2.6}$$
provided that those integrals make sense. For example, if $f(x)$ is $L^1$ on $[0, a]$, then the integral in (2.6) converges for each $n$.
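Formula (2.6) can be approximated by any quadrature rule. The sketch below is one minimal choice (a midpoint rule; the name `fourier_coefficient` and the `lo` parameter are illustrative, with `lo = 0` matching (2.6) as written). It is tested on the hypothetical trigonometric polynomial $f(x) = 3 + \cos(2\pi x/a)$, whose coefficients are $c(0) = 3$, $c(\pm 1) = 1/2$, and $0$ otherwise by Remark 2.6(b).

```python
import cmath
import math

def fourier_coefficient(f, a, n, lo=0.0, steps=8000):
    """Midpoint-rule approximation of
    c(n) = (1/a) * integral over [lo, lo + a] of f(x) e^{-2 pi i n x / a} dx,
    for a function f of period a (lo = 0 gives (2.6) exactly as written)."""
    h = a / steps
    total = 0j
    for s in range(steps):
        x = lo + (s + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x / a)
    return total * h / a

a = 2.0

def f(x):
    """A trigonometric polynomial with period a."""
    return 3.0 + math.cos(2 * math.pi * x / a)

print([round(abs(fourier_coefficient(f, a, n)), 6) for n in range(-2, 3)])
# expected: approximately [0.0, 0.5, 3.0, 0.5, 0.0]
```

By Remark 2.8, shifting the integration window (for example `lo=-a/2`) does not change the result for a periodic `f`.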
Remark 2.10. The definition of the Fourier coefficients of a function $f(x)$ is by no means arbitrary. In fact we are essentially forced to define them that way by the following argument. Suppose that in fact $f(x) = \sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a}$. Then in light of Theorem 2.7, for $m \in \mathbb{Z}$ fixed,
$$\int_0^a f(x)\,e^{-2\pi i m x/a}\,dx = \sum_{n\in\mathbb{Z}} c(n)\int_0^a e^{2\pi i n x/a}\,e^{-2\pi i m x/a}\,dx = a\,c(m),$$
since by (2.4), the only nonzero term in the sum is the $n = m$ term. Note that the above argument is not a rigorous proof since we interchanged an integral and an infinite sum without having any idea how or even if the sum converged. However, the argument is sufficient motivation for defining the Fourier coefficients as in Definition 2.9.
Definition 2.11. Given a function $f(x)$ with period $a$, $L^1$ on $[0, a]$, the Fourier series associated with $f(x)$ is defined as the formal series
$$\sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a}, \tag{2.7}$$
where the $c(n)$ are defined by (2.6). We refer to (2.7) as a "formal series" since we do not yet know how or if the series converges. We write
$$f(x) \sim \sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a}.$$
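Remark 2.12. It is possible to rewrite the Fourier series of a function in terms of the real trigonometric system defined by (2.3). To see this, note that
$$\sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a} = c(0) + \sum_{n=1}^{\infty}\Big[(c(n) + c(-n))\cos(2\pi n x/a) + i\,(c(n) - c(-n))\sin(2\pi n x/a)\Big].$$
Conversely, a series of the form
$$a_0 + \sum_{n=1}^{\infty}\Big[a_n\cos(2\pi n x/a) + b_n\sin(2\pi n x/a)\Big]$$
can be rewritten as
$$\sum_{n\in\mathbb{Z}} c(n)\,e^{2\pi i n x/a},$$
where
$$c(0) = a_0, \qquad c(n) = \frac{a_n - i\,b_n}{2}, \qquad c(-n) = \frac{a_n + i\,b_n}{2}, \qquad n \in \mathbb{N}.$$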
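The conversion formulas of Remark 2.12 can be tested on a finite trigonometric polynomial, where no convergence question arises. The sketch below uses hypothetical coefficients $c(n)$, $|n| \le 3$, and checks that the complex and real forms agree at a point and that the two maps are inverse to each other. The helper names are illustrative only.

```python
import cmath
import math

def complex_to_real(c, N):
    """Map c(n), |n| <= N, to a_0, a_n = c(n) + c(-n), b_n = i (c(n) - c(-n))."""
    a0 = c[0]
    a = {n: c[n] + c[-n] for n in range(1, N + 1)}
    b = {n: 1j * (c[n] - c[-n]) for n in range(1, N + 1)}
    return a0, a, b

def real_to_complex(a0, a, b):
    """Inverse map: c(0) = a_0, c(n) = (a_n - i b_n)/2, c(-n) = (a_n + i b_n)/2."""
    c = {0: a0}
    for n in a:
        c[n] = (a[n] - 1j * b[n]) / 2
        c[-n] = (a[n] + 1j * b[n]) / 2
    return c

def eval_complex(c, x, period):
    """Evaluate sum_n c(n) e^{2 pi i n x / period}."""
    return sum(cn * cmath.exp(2j * math.pi * n * x / period) for n, cn in c.items())

def eval_real(a0, a, b, x, period):
    """Evaluate a_0 + sum_n [a_n cos(2 pi n x/period) + b_n sin(2 pi n x/period)]."""
    return a0 + sum(a[n] * math.cos(2 * math.pi * n * x / period)
                    + b[n] * math.sin(2 * math.pi * n * x / period) for n in a)

# Hypothetical coefficients of a trigonometric polynomial of degree N = 3.
N, period = 3, 2.0
c = {n: complex(1 / (1 + n * n), 0.1 * n) for n in range(-N, N + 1)}
a0, a, b = complex_to_real(c, N)
x = 0.37
print(eval_complex(c, x, period), eval_real(a0, a, b, x, period))
```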
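Example 2.13. (a) Let $f(x)$ be the period 2 extension of the function $\chi_{[-1/2,1/2]}(x)$. The Fourier coefficients of $f(x)$ are
$$c(n) = \frac{1}{2}\int_{-1/2}^{1/2} e^{-\pi i n x}\,dx = \begin{cases} 0 & \text{if } n \text{ is even}, n \ne 0, \\ \dfrac{(-1)^{(n-1)/2}}{\pi n} & \text{if } n \text{ is odd}, \\ \dfrac{1}{2} & \text{if } n = 0. \end{cases}$$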
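The Fourier series associated to $f(x)$ is
$$f(x) \sim \frac{1}{2} + \sum_{n \text{ odd}}\frac{(-1)^{(n-1)/2}}{\pi n}\,e^{\pi i n x} = \frac{1}{2} + \frac{2}{\pi}\sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1}\cos((2k+1)\pi x).$$
See Figures 2.2 and 2.3.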
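(b) Let $f(x)$ be the period $\pi$ extension of the function $x\,\chi_{[0,\pi)}(x)$. Then
$$c(0) = \frac{1}{\pi}\int_0^{\pi} x\,dx = \frac{\pi}{2},$$
and for $n \ne 0$,
$$c(n) = \frac{1}{\pi}\int_0^{\pi} x\,e^{-2inx}\,dx = \frac{i}{2n}.$$
Therefore,
$$f(x) \sim \frac{\pi}{2} + \sum_{n\ne 0}\frac{i}{2n}\,e^{2inx} = \frac{\pi}{2} - \sum_{n\in\mathbb{N}}\frac{\sin(2nx)}{n}.$$
See Figure 2.4.
(c) Let $f(x)$ be the period $\pi$ extension of the function $x\,\chi_{[-\pi/2,\pi/2)}(x)$. Then $c(0) = 0$, and for $n \ne 0$, $c(n) = (-1)^n\,i/(2n)$, so that
$$f(x) \sim \sum_{n\ne 0}\frac{(-1)^n\,i}{2n}\,e^{2inx} = \sum_{n\in\mathbb{N}}\frac{(-1)^{n+1}}{n}\sin(2nx).$$
(d) Let $f(x)$ be the period $2\pi$ extension of the function $|x|\,\chi_{[-\pi,\pi)}(x)$. Then $c(0) = \pi/2$, and for $n \ne 0$, $c(n) = ((-1)^n - 1)/(\pi n^2)$, so that
$$f(x) \sim \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{\cos((2k+1)x)}{(2k+1)^2}.$$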
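The closed-form coefficients in Example 2.13(a), (b), and (d) can be cross-checked against direct numerical integration of (2.6) over one period (using Remark 2.8 to choose a convenient window). The sketch below uses a midpoint rule; window endpoints and the jump or kink points fall on cell boundaries, so the discrepancies stay well below $10^{-6}$ for $|n| \le 5$. Part (c) could be checked the same way.

```python
import cmath
import math

def fourier_coefficient(f, a, n, lo=0.0, steps=8000):
    """Midpoint-rule approximation of (1/a) * integral over [lo, lo + a]
    of f(x) e^{-2 pi i n x/a} dx."""
    h = a / steps
    total = 0j
    for s in range(steps):
        x = lo + (s + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x / a)
    return total * h / a

def f_a(x):
    """Example 2.13(a) on one period [-1, 1): the indicator of [-1/2, 1/2]."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def c_a(n):
    # sin(pi n/2)/(pi n) reproduces the three cases listed in (a).
    return 0.5 if n == 0 else math.sin(math.pi * n / 2) / (math.pi * n)

def c_b(n):
    return math.pi / 2 if n == 0 else 1j / (2 * n)

def c_d(n):
    return math.pi / 2 if n == 0 else ((-1) ** n - 1) / (math.pi * n * n)

cases = [
    ("(a)", f_a, 2.0, -1.0, c_a),
    ("(b)", lambda x: x, math.pi, 0.0, c_b),      # x on [0, pi)
    ("(d)", abs, 2 * math.pi, -math.pi, c_d),     # |x| on [-pi, pi)
]
for name, g, a, lo, closed in cases:
    err = max(abs(fourier_coefficient(g, a, n, lo) - closed(n)) for n in range(-5, 6))
    print(name, f"max error for |n| <= 5: {err:.1e}")
```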
2.1.4 Convergence of Fourier Series
Definition 2.14. A function $f(x)$ on a finite interval $I$ is piecewise differentiable on $I$ if (a) $f(x)$ is piecewise continuous on $I$ with only jump discontinuities (if any), (b) $f'(x)$ exists at all but finitely many points in $I$, and (c) $f'(x)$ is piecewise continuous on $I$ with only jump discontinuities (if any). A function $f(x)$ is piecewise differentiable on an infinite interval $I$ if it is piecewise differentiable on every finite subinterval of $I$.
FIGURE 2.2. Top left: Graph of $f(x)$ from Example 2.13(a). Top right: Graph of Fourier coefficients of $f(x)$. Bottom left: Graph of $f(x)$ from Example 2.13(b). Bottom right: Graph of absolute value of Fourier coefficients of $f(x)$.
Example 2.15. (a) Any function $C^1$ on $I$ is also piecewise differentiable on $I$.
(b) If $I$ is any finite interval, then the function $\chi_I(x)$ is piecewise differentiable on any interval $J$ with $I \subseteq J$.
(c) The tent function $B_1(x)$ is piecewise differentiable on $\mathbb{R}$ because it is linear on the intervals $(-\infty, -1)$, $(-1, 0)$, $(0, 1)$, and $(1, \infty)$.
(d) Any piecewise polynomial function is piecewise differentiable on $\mathbb{R}$.
The following convergence result is due to Dirichlet.¹
Theorem 2.16. (Dirichlet) Suppose that $f(x)$ has period $a > 0$ and is piecewise differentiable on $\mathbb{R}$. Then the sequence of partial sums of the Fourier series of $f(x)$, $\{S_N(x)\}_{N\in\mathbb{N}}$, where
$$S_N(x) = \sum_{n=-N}^{N} c(n)\,e^{2\pi i n x/a}, \tag{2.8}$$
converges pointwise to the function $\tilde f(x)$, where
$$\tilde f(x) = \frac{1}{2}\Big(\lim_{t\to x^+} f(t) + \lim_{t\to x^-} f(t)\Big).$$
¹The proof of Theorem 2.16 will not be given here but can be found for example in Walker, Fourier Analysis, Oxford University Press (1988), p. 19 (Theorem 4.5) and p. 48ff.
FIGURE 2.3. Partial sums $S_N(x)$ of the Fourier series of $f(x)$ from Example 2.13(a). Top left: $N = 10$, top right: $N = 20$, bottom: $N = 60$.
Note that $\tilde f(a) = f(a)$ if $f(x)$ is continuous at $x = a$ and that $\tilde f(a)$ is the average value of the left- and right-hand limits of $f(x)$ at $x = a$ when $f(x)$ has a jump discontinuity. If we assume that $f(x)$ has no discontinuities, then we can make a stronger statement as in the following theorem.²
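For the function of Example 2.13(a), Theorem 2.16 predicts $S_N(0) \to 1$, $S_N(1) \to 0$, and convergence to the average $\tilde f(1/2) = 1/2$ at the jump. The sketch evaluates the partial sums (2.8) directly from the coefficients of Example 2.13(a) with $a = 2$. At $x = 1/2$ the odd terms are purely imaginary, so $S_N(1/2) = 1/2$ for every $N$ (up to rounding), while at $x = 0$ and $x = 1$ the error is governed by an alternating series and decays like $1/N$.

```python
import cmath
import math

def c(n):
    """Fourier coefficients of Example 2.13(a) (period a = 2)."""
    return 0.5 if n == 0 else math.sin(math.pi * n / 2) / (math.pi * n)

def S(N, x):
    """Partial sum (2.8) with a = 2: sum over |n| <= N of c(n) e^{pi i n x}."""
    return sum(c(n) * cmath.exp(1j * math.pi * n * x) for n in range(-N, N + 1)).real

# f = 1 at x = 0, f = 0 at x = 1, and f jumps from 1 to 0 at x = 1/2.
for N in (11, 101, 1001):
    print(N, S(N, 0.0), S(N, 1.0), S(N, 0.5))
```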
Theorem 2.17. Suppose that $f(x)$ has period $a > 0$ and is continuous and piecewise differentiable on $\mathbb{R}$. Then the sequence of partial sums $S_N(x)$ given by (2.8) converges to $f(x)$ in $L^\infty$ on $\mathbb{R}$.
²The proof of Theorem 2.17 can be found in Walker, Fourier Analysis, Theorem 4.4, p. 59.
FIGURE 2.4. Partial sums $S_N(x)$ of the Fourier series of $f(x)$ from Example 2.13(b). Top left: $N = 10$, top right: $N = 20$, bottom: $N = 60$.
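The function of Example 2.13(d) is continuous and piecewise differentiable, so Theorem 2.17 predicts uniform convergence. The sketch below estimates $\sup_x |S_N(x) - f(x)|$ by a maximum over a finite grid on $[-\pi, \pi]$ (a grid maximum only approximates the true supremum). Since $c(-n) = c(n)$ is real here, each pair of terms combines into $2c(n)\cos(nx)$. The largest error sits at $x = 0$ and $x = \pm\pi$ and is roughly $2/(\pi N)$.

```python
import math

def c(n):
    """Fourier coefficients of Example 2.13(d): |x| extended with period 2 pi."""
    return math.pi / 2 if n == 0 else ((-1) ** n - 1) / (math.pi * n * n)

def S(N, x):
    """Partial sum (2.8); c(-n) = c(n) is real, so pairs combine into cosines."""
    return c(0) + sum(2 * c(n) * math.cos(n * x) for n in range(1, N + 1))

grid = [-math.pi + 2 * math.pi * k / 400 for k in range(401)]

def sup_error(N):
    """Largest |S_N(x) - f(x)| over the grid, where f(x) = |x| on [-pi, pi]."""
    return max(abs(S(N, x) - abs(x)) for x in grid)

for N in (10, 40, 160):
    print(N, sup_error(N))
```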
What if the function $f(x)$ is continuous but not piecewise differentiable? What can be said about the convergence of the Fourier series of such a function? It is by no means obvious that such functions exist, but they do. The most famous example is due to Weierstrass, who constructed a function continuous on $\mathbb{R}$ but not differentiable at any point of $\mathbb{R}$. This function is defined by $f(x) = \sum_{n\in\mathbb{N}} 3^{-n}\cos(3^n x)$. The Weierstrass M-test can be used to show that this function is continuous, but the proof that it is nowhere differentiable is hard.³ Since this function is given by a uniformly convergent trigonometric series, that series is its Fourier series, and so by the Weierstrass M-test the Fourier series of the Weierstrass function converges uniformly on $\mathbb{R}$. However, this is not the case for all periodic functions continuous on $\mathbb{R}$. The following theorem is due to Du Bois-Reymond.⁴
³An example of a continuous, nowhere differentiable function similar to the Weierstrass function, together with a very readable proof, can be found in Körner, Fourier Analysis, Cambridge University Press (1988), Chapter 11.
⁴Excellent expositions and proofs of this theorem can be found in Körner, Fourier Analysis, Chapter 18, and also in Walker, Fourier Analysis, Appendix A.
Theorem 2.18. (Du Bois-Reymond) There exists a function $f(x)$ continuous on $\mathbb{R}$ and with period $2\pi$ such that the Fourier series of $f(x)$ diverges at $x = 0$. That is, $\lim_{N\to\infty} S_N(0)$ does not exist, where $S_N(x)$ is given by (2.8).
In fact, it is possible to find a continuous, period $2\pi$ function whose Fourier series diverges at every rational multiple of $2\pi$.⁵ Therefore, it is impossible to make the statement that the Fourier series of every continuous function converges pointwise to that function. The next theorem, Theorem 2.19, is due to Fejér and makes a general statement about the convergence of the Fourier series of a continuous function. The idea behind Fejér's Theorem is the following. Instead of looking at the partial sums (2.8), consider the arithmetic means of those partial sums; that is, consider the sequence
$$\sigma_N(x) = \frac{1}{N}\sum_{k=0}^{N-1} S_k(x). \tag{2.9}$$
It is often the case that when the convergence of a sequence fails due to oscillation in the terms of the sequence, the arithmetic means of the sequence will have better convergence behavior. Take the simple example of the sequence $\{a(n)\}_{n\in\mathbb{N}}$, where $a(n) = (-1)^n$. Clearly $\lim_{n\to\infty} a(n)$ does not exist because the terms simply oscillate back and forth between $1$ and $-1$. However, if we consider the sequence of arithmetic means, $\{\sigma(n)\}_{n\in\mathbb{N}}$, given by
$$\sigma(n) = \frac{1}{n}\sum_{k=1}^{n} a(k), \tag{2.10}$$
then
$$\sigma(n) = \begin{cases} 0 & \text{if } n \text{ is even}, \\ -1/n & \text{if } n \text{ is odd}, \end{cases}$$
so that $\lim_{n\to\infty}\sigma(n) = 0$ (Exercise 2.25). If the original sequence $\{a(n)\}_{n\in\mathbb{N}}$ already converges, taking the arithmetic means will not affect the convergence; that is, if $\lim_{n\to\infty} a(n) = a$, then also $\lim_{n\to\infty}\sigma(n) = a$ (Exercise 2.26).
Theorem 2.19. (Fejér's Theorem) Let $f(x)$ be a function with period $a > 0$, continuous on $\mathbb{R}$, and define for each $N \in \mathbb{N}$ the function $\sigma_N(x)$ by (2.9), where $S_k(x)$ is given by (2.8). Then $\sigma_N(x)$ converges uniformly to $f(x)$ on $\mathbb{R}$ as $N \to \infty$.
⁵Walker, Fourier Analysis, Appendix A.
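The arithmetic means (2.10) are simple to compute directly. The sketch below reproduces the formula for $a(n) = (-1)^n$ exactly using rational arithmetic (Exercise 2.25), and illustrates Exercise 2.26 with the hypothetical convergent sequence $a(k) = 1 + 1/k$, whose means also tend to $1$, though more slowly. The helper `cesaro_means` is illustrative, not notation from the text.

```python
from fractions import Fraction

def cesaro_means(a, n_max):
    """Return [sigma(1), ..., sigma(n_max)] with sigma(n) = (1/n) sum_{k=1}^n a(k), as in (2.10).

    The result is exact when a(k) returns Fractions, and floating point otherwise.
    """
    means, total = [], 0
    for n in range(1, n_max + 1):
        total += a(n)
        means.append(total / n)
    return means

sigma = cesaro_means(lambda k: Fraction((-1) ** k), 12)
print([str(s) for s in sigma])   # ['-1', '0', '-1/3', '0', '-1/5', '0', ...]

# Exercise 2.26 in action: a(k) = 1 + 1/k -> 1, and so do its means (slowly).
sigma2 = cesaro_means(lambda k: 1 + 1 / k, 10_000)
print(sigma2[-1])
```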
Exercises
Exercise 2.20. Prove Euler's formula: For every $x \in \mathbb{R}$, $e^{ix} = \cos(x) + i\sin(x)$.
Exercise 2.21. Prove that for every real number $a$,
Exercise 2.22. Prove Theorem 2.7.
Exercise 2.23. Prove each of the statements made in Remark 2.12.
Exercise 2.24. Prove each of the statements made in Example 2.13.
Exercise 2.25. Show that if $a(n) = (-1)^n$, $n \in \mathbb{N}$, then
$$\sigma(n) = \frac{1}{n}\sum_{k=1}^{n} a(k) = \begin{cases} 0 & \text{if } n \text{ is even}, \\ -1/n & \text{if } n \text{ is odd}. \end{cases}$$
Exercise 2.26. Show that if $\lim_{n\to\infty} a(n) = a$, then $\lim_{n\to\infty}\sigma(n) = a$, where $\sigma(n)$ is given by (2.10).
2.2 Approximate Identities
The notion of an approximate identity or summability kernel is used extensively in all branches of analysis. The idea is to make precise the notion of a "delta function" that is well known and widely used by physicists, engineers, and mathematicians. The delta function, $\delta(x)$, has the property that for any continuous function $f(x)$,
$$\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt = f(0), \tag{2.11}$$
or more generally,
$$\int_{-\infty}^{\infty} f(t)\,\delta(x - t)\,dt = f(x)$$
for every $x \in \mathbb{R}$. From some elementary considerations (the reader may fill in the details), any function $\delta(t)$ satisfying (2.11) must satisfy
$$\delta(t) = 0, \quad t \ne 0, \qquad\text{and}\qquad \int_{-\infty}^{\infty}\delta(t)\,dt = 1.$$
It is impossible for any ordinary function to satisfy these conditions since the Riemann integral of a function $f(x)$ vanishing at every $x \ne 0$ must be zero. This must be true even under more general definitions of the integral (such as the Lebesgue integral). Therefore, $\delta(t)$ is not an ordinary function. So the question remains: How are we to make sense of this concept? There are two ways to do this.
1. Extend the definition of function. This has been done by L. Schwartz, who defined the notion of a distribution or generalized function.⁶
2. Approximate the delta by ordinary functions in some sense. This more elementary approach has its natural completion in the theory of distributions alluded to above, but can be understood without any advanced concepts. The idea is to replace the single "function" $\delta(t)$ by a collection of ordinary functions $\{K_\tau(t)\}_{\tau>0}$ such that for every continuous function $f(x)$,
$$\lim_{\tau\to 0^+}\int_{-\infty}^{\infty} f(t)\,K_\tau(t)\,dt = f(0),$$
and more generally,
$$\lim_{\tau\to 0^+}\int_{-\infty}^{\infty} f(t)\,K_\tau(x - t)\,dt = f(x),$$
where the limit is interpreted in one of the senses described in Section 1.2.
The purpose of this section is to explain the theory of approximate identities.
2.2.1 Motivation from Fourier Series
In order to further motivate the notion of an approximate identity, let us consider how one might prove Theorems 2.16 and 2.19.
Definition 2.27. For each $k \in \mathbb{N}$, and $a > 0$, define the Dirichlet kernel $D_k(x)$ by
$$D_k(x) = \sum_{m=-k}^{k} e^{2\pi i m x/a}. \tag{2.12}$$
See Figure 2.5.
⁶Good expositions of this theory can be found in Horváth, An introduction to distributions, The American Mathematical Monthly, vol. 77 (1970), 227-240, and Benedetto, Harmonic Analysis and Applications, CRC Press (1997).
Theorem 2.28. For each $k \in \mathbb{N}$, and $a > 0$, the Dirichlet kernel $D_k(x)$ can be written as
$$D_k(x) = \frac{\sin(\pi(2k+1)x/a)}{\sin(\pi x/a)}, \tag{2.13}$$
and for any period $a$ function $f(x)$,
$$S_k(x) = \frac{1}{a}\int_0^a f(x - t)\,D_k(t)\,dt. \tag{2.14}$$
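Proof: Equation (2.13) is an exercise (Exercise 2.38) and requires only the formula for summing a geometric series. As for equation (2.14),
$$\begin{aligned} S_k(x) &= \sum_{n=-k}^{k} c(n)\,e^{2\pi i n x/a} = \sum_{n=-k}^{k}\left(\frac{1}{a}\int_0^a f(t)\,e^{-2\pi i n t/a}\,dt\right)e^{2\pi i n x/a} \\ &= \frac{1}{a}\int_0^a f(t)\sum_{n=-k}^{k} e^{2\pi i n (x - t)/a}\,dt = \frac{1}{a}\int_0^a f(t)\,D_k(x - t)\,dt. \end{aligned}$$
The result follows by making the change of variables $t \mapsto x - t$ in the above integral and remembering that both $D_k(x)$ and $f(x)$ have period $a$.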
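A quick numerical check of (2.13), valid away from multiples of $a$ where the quotient is undefined. The sketch also confirms $D_k(0) = 2k + 1$ and that $D_k$ has average value $1$ over a period, $\frac{1}{a}\int_0^a D_k(t)\,dt = 1$, since only the $m = 0$ term survives integration by (2.4). The midpoint rule is exact here up to rounding because every frequency is smaller than the number of nodes.

```python
import cmath
import math

def dirichlet_sum(k, x, a=1.0):
    """D_k(x) from its definition (2.12)."""
    return sum(cmath.exp(2j * math.pi * m * x / a) for m in range(-k, k + 1)).real

def dirichlet_closed(k, x, a=1.0):
    """D_k(x) from the closed form (2.13); valid when x is not a multiple of a."""
    return math.sin(math.pi * (2 * k + 1) * x / a) / math.sin(math.pi * x / a)

def mean_over_period(k, a=1.0, steps=1000):
    """(1/a) * integral_0^a D_k(t) dt by the midpoint rule (exact for steps > k)."""
    h = a / steps
    return sum(dirichlet_sum(k, (s + 0.5) * h, a) for s in range(steps)) * h / a

for k in (1, 5, 20):
    x = 0.137
    print(k, dirichlet_sum(k, x), dirichlet_closed(k, x), dirichlet_sum(k, 0.0))
print(mean_over_period(5))   # D_k has average value 1 over a period
```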
Definition 2.29. For each $n \in \mathbb{N}$, and $a > 0$, define the Fejér kernel $F_n(x)$ by
$$F_n(x) = \frac{1}{n}\sum_{k=0}^{n-1} D_k(x). \tag{2.15}$$
See Figure 2.6.
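Theorem 2.30. For each $n \in \mathbb{N}$, and $a > 0$, the Fejér kernel $F_n(x)$ can be written as
$$F_n(x) = \frac{1}{n}\,\frac{\sin^2(\pi n x/a)}{\sin^2(\pi x/a)}, \tag{2.16}$$
and for any period $a$ function $f(x)$,
$$\sigma_n(x) = \frac{1}{a}\int_0^a f(x - t)\,F_n(t)\,dt. \tag{2.17}$$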
Proof: Equation (2.16) is an exercise (Exercise 2.39) and requires only the formula for summing a geometric series and some manipulation. Equation (2.17) is also an exercise (Exercise 2.40), and the derivation is similar to that of (2.14).
FIGURE 2.5. The Dirichlet kernel $D_k(x)$ (2.12) for $a = 1$.
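The closed form (2.16) can be compared with the definition (2.15) as an average of Dirichlet kernels. The sketch checks agreement at a few points, the value $F_n(0) = n$, and the fact, visible from (2.16), that the Fejér kernel is never negative (up to rounding), unlike the Dirichlet kernel. Nonnegativity is one reason the Fejér kernel behaves better as a summability kernel.

```python
import cmath
import math

def dirichlet(k, x, a=1.0):
    """D_k(x) from (2.12)."""
    return sum(cmath.exp(2j * math.pi * m * x / a) for m in range(-k, k + 1)).real

def fejer_sum(n, x, a=1.0):
    """F_n(x) from its definition (2.15) as an average of Dirichlet kernels."""
    return sum(dirichlet(k, x, a) for k in range(n)) / n

def fejer_closed(n, x, a=1.0):
    """F_n(x) from the closed form (2.16); valid when x is not a multiple of a."""
    return (math.sin(math.pi * n * x / a) / math.sin(math.pi * x / a)) ** 2 / n

for n in (1, 4, 16):
    print(n, fejer_sum(n, 0.21), fejer_closed(n, 0.21), fejer_sum(n, 0.0))

# Unlike D_k, the Fejer kernel is never negative (up to rounding).
grid = [(s + 0.5) / 500 for s in range(500)]
print(min(fejer_sum(16, x) for x in grid) >= -1e-12)
```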
From Theorems 2.28 and 2.30, we see that the proofs of Theorems 2.16 (Dirichlet) and 2.19 (Fejér) amount to showing that
$$\lim_{k\to\infty}\frac{1}{a}\int_0^a f(x - t)\,D_k(t)\,dt = \tilde f(x)$$
pointwise for every period $a$ function $f(x)$, piecewise differentiable on $\mathbb{R}$, and
$$\lim_{n\to\infty}\frac{1}{a}\int_0^a f(x - t)\,F_n(t)\,dt = f(x)$$
in $L^\infty$ on $\mathbb{R}$ for every period $a$ function $f(x)$ continuous on $\mathbb{R}$. Such convergence results depend on properties of the sequences $\{D_k(x)\}_{k\in\mathbb{N}}$ and $\{F_n(x)\}_{n\in\mathbb{N}}$. Consideration of the required properties of these sequences leads to the notion of an approximate identity or summability kernel.
2.2.2 Definition and Examples
Definition 2.31. A collection of functions $\{K_\tau(x)\}_{\tau>0}$ on an interval $I = (-a, a)$ ($a = \infty$ is permitted) is an approximate identity or a summability kernel on $I$ if the following conditions hold.
(a) For all $\tau > 0$,
$$\int_{-a}^{a} K_\tau(x)\,dx = 1.$$
(b) There exists $M > 0$ such that for all $\tau > 0$,
$$\int_{-a}^{a} |K_\tau(x)|\,dx \le M.$$
(c) For every $0 < \delta < a$,
$$\lim_{\tau\to 0^+}\int_{\delta\le|x|<a} |K_\tau(x)|\,dx = 0.$$
FIGURE 2.6. The Fejér kernel $F_n(x)$ (2.15) for $a = 1$.
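Corollary 2.37. (a) Let $f(x)$ be $L^1$ on $\mathbb{R}$, and let $\epsilon > 0$. Then there is a function $g(x)$, $C_c^0$ on $\mathbb{R}$, such that $\|f - g\|_1 < \epsilon$.
(b) Let $f(x)$ be $L^2$ on $\mathbb{R}$, and let $\epsilon > 0$. Then there is a function $g(x)$, $C_c^0$ on $\mathbb{R}$, such that $\|f - g\|_2 < \epsilon$.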
Proof: (a) By Theorem 1.5, there is a compactly supported function $h(x)$, $L^1$ on $\mathbb{R}$, such that $\|f - h\|_1 < \epsilon/2$. Now, for $\tau > 0$, let $K_\tau(x)$ be the kernel of Example 2.32(b). Then $\{K_\tau(x)\}_{\tau>0}$ is an approximate identity on $\mathbb{R}$. By Theorem 2.36(a),
$$\lim_{\tau\to0^+} h_\tau(x) = \lim_{\tau\to0^+} \int_{\mathbb{R}} h(t)\,K_\tau(x - t)\,dt = h(x)$$
2.3. Generalized Fourier Series
in $L^1$ on $\mathbb{R}$. Hence there is a $\tau_0 > 0$ such that $\|h_{\tau_0} - h\|_1 < \epsilon/2$. Let $g(x) = h_{\tau_0}(x)$. Then $\|f - g\|_1 \le \|f - h\|_1 + \|h - g\|_1 < \epsilon$. That $g(x)$ is compactly supported follows from Exercise 3.25, and that $g(x)$ is $C^0$ on $\mathbb{R}$ follows from Theorem 3.18. The proof of (b) is similar (Exercise 2.44).
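The approximation step in this proof can be watched numerically. The sketch below is a discretized stand-in, not the construction of Example 2.32(b): it smooths $h(x) = \chi_{[-1,1]}(x)$ with the box kernel $K_\tau(x) = \tau^{-1}\chi_{[-\tau/2,\tau/2]}(x)$, replaces the integral by a Riemann sum on a grid of spacing `dx` (an assumed value), and reports an approximation of $\|h * K_\tau - h\|_1$. For this $h$ the error is roughly $\tau/2$ (each jump contributes about $\tau/4$), so it tends to 0 as $\tau \to 0^+$.

```python
import numpy as np

def smoothing_l1_error(tau, dx=1e-3):
    # Grid on [-3, 3); h = indicator of [-1, 1] is L^1, compactly supported, discontinuous
    x = np.arange(-3.0, 3.0, dx)
    h = (np.abs(x) <= 1.0).astype(float)
    # Box kernel (1/tau) * indicator of [-tau/2, tau/2], sampled with an odd number of points
    m = int(round(tau / (2 * dx)))
    k = np.full(2 * m + 1, 1.0)
    k /= k.sum() * dx  # discrete integral of the kernel is exactly 1 (condition (a))
    # Riemann-sum approximation of h * K_tau(x) = \int h(t) K_tau(x - t) dt
    h_tau = np.convolve(h, k, mode="same") * dx
    return np.sum(np.abs(h_tau - h)) * dx  # approximates ||h * K_tau - h||_1

for tau in (0.4, 0.1, 0.02):
    print(tau, smoothing_l1_error(tau))
```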
Exercises

Exercise 2.38. Prove equation (2.13). (Hint: Recall that for any number $r \ne 1$, $\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}$.)
Exercise 2.39.
Prove equation (2.16).
Exercise 2.40.
Prove equation (2.17).
Exercise 2.41.
Prove each of the statements made in Example 2.32.
Exercise 2.42. Prove that if $f(x)$ is continuous at $x = a$, then there is a number $b > 0$ and a number $M > 0$ such that $|f(x)| \le M$ for all $x \in [a - b, a + b]$.
Exercise 2.43. (a) Prove that if $f(x)$ is $C_c^0$ on $\mathbb{R}$, then $f(x)$ is uniformly continuous on $\mathbb{R}$. (b) Prove that if $f(x)$ is $C^0$ on $\mathbb{R}$ and $\lim_{|x|\to\infty} f(x) = 0$, then $f(x)$ is uniformly continuous on $\mathbb{R}$.
Exercise 2.44. Prove Corollary 2.37(b).
2.3 Generalized Fourier Series

2.3.1 Orthogonality
Definition 2.45. A collection of functions $\{g_n(x)\}_{n\in\mathbb{N}}$, $L^2$ on an interval $I$, is a (general) orthogonal system on $I$ provided that
(a) $\int_I g_n(x)\,\overline{g_m(x)}\,dx = 0$ if $n \ne m$, and
(b) $\int_I |g_n(x)|^2\,dx > 0$ for all $n$.
Part (b) says in particular that none of the $g_n(x)$ can be identically zero. The collection $\{g_n(x)\}_{n\in\mathbb{N}}$ is a (general) orthonormal system on $I$ provided that it is an orthogonal system on $I$ and
$$\int_I |g_n(x)|^2\,dx = 1 \quad \text{for all } n.$$
It is not necessary that the set $\{g_n(x)\}$ be indexed by $\mathbb{N}$, and in fact we have seen an example (the trigonometric system) that is indexed by $\mathbb{Z}$. In all future examples, the index set will either be specified or will be clear from the context. Whenever a generic system of functions is considered, the index set will be assumed to be $\mathbb{N}$.
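Remark 2.46.
(a) Any orthogonal system can be normalized so that it becomes an orthonormal system. That is, if $\{g_n(x)\}$ is an orthogonal system, then we may define the functions
$$\tilde g_n(x) = \frac{g_n(x)}{\left(\int_I |g_n(t)|^2\,dt\right)^{1/2}}.$$
Then the system $\{\tilde g_n(x)\}$ is an orthonormal system.
(b) The Cauchy-Schwarz inequality guarantees that each of the integrals in Definition 2.45 exists as a finite number. That is, since $f(x)$ and $g(x)$ are $L^2$ on $I$,
$$\left|\int_I f(x)\,\overline{g(x)}\,dx\right| \le \left(\int_I |f(x)|^2\,dx\right)^{1/2}\left(\int_I |g(x)|^2\,dx\right)^{1/2} < \infty.$$
(c) Throughout the book, we will use inner product notation to represent the integrals in Definition 2.45. That is, we write for any functions $f(x)$ and $g(x)$, $L^2$ on $I$,
$$\langle f, g\rangle = \int_I f(x)\,\overline{g(x)}\,dx.$$
This means in particular that
$$\langle f, f\rangle = \int_I |f(x)|^2\,dx = \|f\|_2^2.$$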
Example 2.47.
(a) Given any $a > 0$, the collection
$$\{e^{\pi i n x/a}\}_{n\in\mathbb{Z}}$$
is an orthogonal system over $[-a, a]$. It is also orthogonal over $[0, 2a]$ and in fact over any interval $I$ of length $2a$. The collection
$$\{(2a)^{-1/2}\, e^{\pi i n x/a}\}_{n\in\mathbb{Z}}$$
is an orthonormal system over $[-a, a]$. It is also orthonormal over $[0, 2a]$ and in fact over any interval $I$ of length $2a$.
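(b) Given any $a > 0$, the collections
$$\{\sin(\pi n x/a)\}_{n\in\mathbb{N}} \quad\text{and}\quad \{\cos(\pi n x/a)\}_{n\in\mathbb{N}}$$
are each orthogonal systems over $[-a, a]$. The collections
$$\{a^{-1/2}\sin(\pi n x/a)\}_{n\in\mathbb{N}} \quad\text{and}\quad \{a^{-1/2}\cos(\pi n x/a)\}_{n\in\mathbb{N}}$$
are each orthonormal systems over $[-a, a]$.
(c) Given $a > 0$, the collection
$$\{e^{2\pi i n x/a}\}_{n\in\mathbb{Z}}$$
is an orthogonal system over $[0, a]$, and in fact over any interval $I$ of length $a$. The collection
$$\{a^{-1/2}\, e^{2\pi i n x/a}\}_{n\in\mathbb{Z}}$$
is an orthonormal system over $[0, a]$, and in fact over any interval $I$ of length $a$.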
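For Example 2.47(c), orthonormality can be checked by computing the Gram matrix $(\langle g_n, g_m\rangle)$ numerically. The sketch assumes a midpoint rule with equally spaced samples (the helper name `gram_matrix` is illustrative); for these exponentials over a full period that rule is exact up to rounding, provided the number of samples exceeds the largest frequency difference.

```python
import numpy as np

def gram_matrix(a, n_values, start=0.0, samples=4096):
    # Midpoint-rule approximation of <g_n, g_m> = \int_I g_n(x) conj(g_m(x)) dx
    # on I = [start, start + a], with g_n(x) = a^{-1/2} e^{2 pi i n x / a}.
    dx = a / samples
    x = start + (np.arange(samples) + 0.5) * dx
    G = np.array([np.exp(2j * np.pi * n * x / a) / np.sqrt(a) for n in n_values])
    return (G @ G.conj().T) * dx

# Any interval of length a works, e.g. [0, 3] and [-1.7, 1.3]
print(np.round(gram_matrix(3.0, range(-2, 3)).real, 12))
print(np.allclose(gram_matrix(3.0, range(-5, 6), start=-1.7), np.eye(11)))
```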
2.3.2 Generalized Fourier Series

Definition 2.48. Given a function $f(x)$, $L^2$ on an interval $I$, and an orthonormal system $\{g_n(x)\}$ on $I$, the (generalized) Fourier coefficients $\{c(n)\}$ of $f(x)$ with respect to $\{g_n(x)\}$ are defined by
$$c(n) = \int_I f(x)\,\overline{g_n(x)}\,dx = \langle f, g_n\rangle.$$
The (generalized) Fourier series of $f(x)$ with respect to $\{g_n(x)\}$ is
$$f(x) \sim \sum_{n} c(n)\, g_n(x).$$
The fundamental problem is to determine under what circumstances the "$\sim$" in the above definition becomes a "$=$" and, if so, in what sense the infinite series on the right side of the equality converges. It turns out that the most convenient form of convergence in this case is $L^2$ convergence on $I$.

Theorem 2.49 (Bessel's inequality). Let $f(x)$ be $L^2$ on an interval $I$, and let $\{g_n(x)\}$ be an orthonormal system on $I$. Then
$$\sum_{n} |\langle f, g_n\rangle|^2 \le \int_I |f(x)|^2\,dx. \qquad (2.27)$$
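The proof of Bessel's inequality will require the following lemma.

Lemma 2.50. Let $\{g_n(x)\}$ be an orthonormal system on an interval $I$. Then for every $f(x)$, $L^2$ on $I$, and every $N \in \mathbb{N}$,
$$\int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle\, g_n(x)\Big|^2 dx = \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2. \qquad (2.28)$$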
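Proof: The proof is just a calculation making use of the orthonormality of $\{g_n(x)\}$:
$$\begin{aligned}
\int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx
&= \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} \overline{\langle f, g_n\rangle} \int_I f(x)\,\overline{g_n(x)}\,dx - \sum_{n=1}^{N} \langle f, g_n\rangle \int_I g_n(x)\,\overline{f(x)}\,dx \\
&\quad + \sum_{n=1}^{N}\sum_{m=1}^{N} \langle f, g_n\rangle\,\overline{\langle f, g_m\rangle} \int_I g_n(x)\,\overline{g_m(x)}\,dx \\
&= \int_I |f(x)|^2\,dx - 2\sum_{n=1}^{N} |\langle f, g_n\rangle|^2 + \sum_{n=1}^{N} |\langle f, g_n\rangle|^2 \\
&= \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2,
\end{aligned}$$
which is (2.28).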
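Proof of Theorem 2.49: Let $f(x)$ be given, and let $\{g_n(x)\}$ be an orthonormal system. Then by Lemma 2.50, for each fixed $N \in \mathbb{N}$,
$$0 \le \int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx = \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2.$$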
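Therefore, for all $N \in \mathbb{N}$,
$$\sum_{n=1}^{N} |\langle f, g_n\rangle|^2 \le \int_I |f(x)|^2\,dx.$$
Since $|\langle f, g_n\rangle|^2 \ge 0$ for all $n$, the partial sums of the series $\sum_{n\in\mathbb{N}} |\langle f, g_n\rangle|^2$ form an increasing sequence bounded above by $\int_I |f(x)|^2\,dx$. Thus the series $\sum_{n\in\mathbb{N}} |\langle f, g_n\rangle|^2$ converges, so that we can allow $N$ to go to infinity. Thus,
$$\sum_{n\in\mathbb{N}} |\langle f, g_n\rangle|^2 \le \int_I |f(x)|^2\,dx,$$
which is (2.27). Closely related to Lemma 2.50 is another important inequality that will be useful in the next subsection.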
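Lemma 2.51. Let $\{g_n(x)\}$ be an orthonormal system on $I$. Then for every $f(x)$, $L^2$ on $I$, and every finite sequence of numbers $\{a(n)\}_{n=1}^{N}$,
$$\int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx \le \int_I \Big|f(x) - \sum_{n=1}^{N} a(n)\, g_n(x)\Big|^2 dx.$$
Proof: Let $f(x)$ be given, and let $\{g_n(x)\}$ be an orthonormal system. Then
$$\begin{aligned}
\int_I \Big|f(x) - \sum_{n=1}^{N} a(n) g_n(x)\Big|^2 dx
&= \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} \overline{a(n)}\,\langle f, g_n\rangle - \sum_{n=1}^{N} a(n)\,\overline{\langle f, g_n\rangle} + \sum_{n=1}^{N} |a(n)|^2 \\
&= \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2 + \sum_{n=1}^{N} |\langle f, g_n\rangle - a(n)|^2 \\
&\ge \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2 \\
&= \int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx
\end{aligned}$$
by Lemma 2.50.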
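Lemmas 2.50 and 2.51 can be illustrated with $f(x) = x$ on $I = [0, 1]$ and the finite orthonormal family $g_n(x) = e^{2\pi i n x}$, $-4 \le n \le 4$ (indexed by a symmetric range rather than $1, \dots, N$). In this sketch the integrals are replaced by a midpoint rule, which is an assumption of the example; because that discrete inner product keeps these exponentials exactly orthonormal, both statements hold to rounding error. The perturbed coefficients `a` are random, hypothetical values.

```python
import numpy as np

M = 20000                     # midpoint-rule samples on I = [0, 1]
dx = 1.0 / M
x = (np.arange(M) + 0.5) * dx

def inner(u, v):
    # <u, v> = \int_I u(x) conj(v(x)) dx, approximated by the midpoint rule
    return np.sum(u * np.conj(v)) * dx

f = x.copy()                  # f(x) = x, L^2 on [0, 1]
ns = range(-4, 5)             # finite orthonormal family g_n(x) = e^{2 pi i n x}
g = {n: np.exp(2j * np.pi * n * x) for n in ns}
c = {n: inner(f, g[n]) for n in ns}   # Fourier coefficients <f, g_n>

# Lemma 2.50: ||f - sum c(n) g_n||^2 = ||f||^2 - sum |c(n)|^2
partial = sum(c[n] * g[n] for n in ns)
lhs = inner(f - partial, f - partial).real
rhs = inner(f, f).real - sum(abs(c[n]) ** 2 for n in ns)

# Lemma 2.51: any other coefficients a(n) give an error at least as large
rng = np.random.default_rng(0)
a = {n: c[n] + 0.05 * (rng.standard_normal() + 1j * rng.standard_normal()) for n in ns}
other = sum(a[n] * g[n] for n in ns)
err_other = inner(f - other, f - other).real
print(lhs, rhs, err_other)
```

The proof of Lemma 2.51 predicts that `err_other - lhs` equals $\sum_n |\langle f, g_n\rangle - a(n)|^2$, which the sketch also reproduces.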
We are now in a position to answer the fundamental question about Fourier series, namely: When is an arbitrary function equal to its Fourier series, and in what sense does the Fourier series converge? The answer lies in the notion of a complete orthonormal system.
Definition 2.52. Given a collection of functions $\{g_n(x)\}$, $L^2$ on an interval $I$, the span of $\{g_n(x)\}$, denoted $\operatorname{span}\{g_n(x)\}$, is the collection of all finite linear combinations of the elements of $\{g_n(x)\}$. In other words, $f(x) \in \operatorname{span}\{g_n(x)\}$ if and only if
$$f(x) = \sum_{n=1}^{N} a(n)\, g_n(x)$$
for some finite sequence $\{a(n)\}_{n=1}^{N}$. Note that $N$ is always finite but may be arbitrarily large.
Example 2.53. (a) Let $P_I$ denote the set of all polynomials on the interval $I$. Then $P_I = \operatorname{span}\{x^n\}_{n=0}^{\infty}$. (b) $\operatorname{span}\{e^{2\pi i n x}\}_{n\in\mathbb{Z}}$ is the set of all period 1 trigonometric polynomials. (c) Let $\varphi(x) = (1 - |x|)\,\chi_{[-1,1]}(x)$. Then $\operatorname{span}\{\varphi(x - n)\}_{n\in\mathbb{Z}}$ is the set of all functions that are (i) continuous on $\mathbb{R}$, (ii) linear on intervals of the form $[n, n+1)$, $n \in \mathbb{Z}$, and (iii) compactly supported.
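Example 2.53(c) says that a finite combination $\sum_n a(n)\,\varphi(x - n)$ is exactly the piecewise-linear function through the points $(n, a(n))$. The sketch below checks this against NumPy's linear interpolation `np.interp`; the coefficient values and the helper names are arbitrary choices for illustration.

```python
import numpy as np

def hat(x):
    # phi(x) = (1 - |x|) * indicator of [-1, 1]
    return np.maximum(1.0 - np.abs(x), 0.0)

def span_element(coeffs, x):
    # sum_n a(n) phi(x - n) for a finite dictionary {n: a(n)}
    return sum(a * hat(x - n) for n, a in coeffs.items())

coeffs = {-2: 0.5, -1: 2.0, 0: -1.0, 1: 3.0, 2: 0.0}
x = np.linspace(-4.0, 4.0, 801)
y = span_element(coeffs, x)

# Piecewise-linear interpolation through (n, a(n)), zero outside [-3, 3]
knots = np.arange(-3, 4)
vals = np.array([coeffs.get(int(n), 0.0) for n in knots])
y_interp = np.interp(x, knots, vals, left=0.0, right=0.0)
print(np.max(np.abs(y - y_interp)))
```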
Remark 2.54. (a) For any collection of functions $\{g_n(x)\}$, $\operatorname{span}\{g_n(x)\}$ is a linear space; that is, it is closed under the formation of linear combinations. Specifically, if $\{f_n(x)\}_{n=1}^{N} \subseteq \operatorname{span}\{g_n(x)\}$, then for any finite sequence $\{a(n)\}_{n=1}^{N}$, the function $f(x) = \sum_{n=1}^{N} a(n) f_n(x)$ is in $\operatorname{span}\{g_n(x)\}$ (Exercise 2.61). (b) The definition of span involves only finite sums. Without additional assumptions on the collection $\{g_n(x)\}$, there is no guarantee that any sum of the form $\sum_{n\in\mathbb{N}} a(n)\, g_n(x)$ will converge in any sense. For example, if $g_n(x) = x^{n-1}$ for $n \in \mathbb{N}$, then the series $\sum_{n=0}^{\infty} n!\,x^n$ does not converge except at $x = 0$, and the series $\sum_{n=0}^{\infty} 2^{-n} x^n$ does not converge if $|x| \ge 2$. See also Theorem 2.55 below.
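(c) Related to the notion of $\operatorname{span}\{g_n(x)\}$ is the notion of the mean-square (or $L^2$) closure of $\operatorname{span}\{g_n(x)\}$, denoted $\overline{\operatorname{span}}\{g_n(x)\}$, which is defined as follows. A function $f(x) \in \overline{\operatorname{span}}\{g_n(x)\}$ if for every $\epsilon > 0$, there is a function $g(x) \in \operatorname{span}\{g_n(x)\}$ such that $\|f - g\|_2 < \epsilon$.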
As a partial answer to the question of when finite sums can be replaced by infinite sums, we have the following theorem.

Theorem 2.55. Let $\{g_n(x)\}$ be an orthonormal system on an interval $I$. Then a function $f(x)$, $L^2$ on $I$, is in $\overline{\operatorname{span}}\{g_n(x)\}$ if and only if
$$f(x) = \sum_{n\in\mathbb{N}} \langle f, g_n\rangle\, g_n(x) \quad \text{in } L^2 \text{ on } I. \qquad (2.29)$$
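Proof: ($\Leftarrow$) (2.29) is equivalent to the statement that
$$\lim_{N\to\infty} \Big\|f - \sum_{n=1}^{N} \langle f, g_n\rangle g_n\Big\|_2 = 0.$$
Therefore, given $\epsilon > 0$, there is an $N > 0$ such that
$$\Big\|f - \sum_{n=1}^{N} \langle f, g_n\rangle g_n\Big\|_2 < \epsilon,$$
and, since $\sum_{n=1}^{N} \langle f, g_n\rangle g_n(x) \in \operatorname{span}\{g_n(x)\}$, $f(x) \in \overline{\operatorname{span}}\{g_n(x)\}$.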
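($\Rightarrow$) Suppose that $f(x) \in \overline{\operatorname{span}}\{g_n(x)\}$, and let $\epsilon > 0$. Then by definition there is a finite sequence $\{a(n)\}_{n=1}^{N_0}$, for some $N_0 \in \mathbb{N}$, such that
$$\Big\|f - \sum_{n=1}^{N_0} a(n)\, g_n\Big\|_2 < \epsilon.$$
Since
$$\Big\{\Big\|f - \sum_{n=1}^{N} \langle f, g_n\rangle g_n\Big\|_2\Big\}_{N\in\mathbb{N}}$$
is a decreasing sequence (Exercise 2.63), it follows, using Lemma 2.51 for the middle inequality, that for every $N \ge N_0$,
$$\Big\|f - \sum_{n=1}^{N} \langle f, g_n\rangle g_n\Big\|_2 \le \Big\|f - \sum_{n=1}^{N_0} \langle f, g_n\rangle g_n\Big\|_2 \le \Big\|f - \sum_{n=1}^{N_0} a(n)\, g_n\Big\|_2 < \epsilon,$$
and (2.29) follows. If every function $L^2$ on $I$ has a representation like (2.29), then we say that the collection $\{g_n(x)\}$ is complete on $I$. This means that every function $L^2$ on $I$ is equal to its Fourier series in $L^2$ on $I$.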
Definition 2.56. Let $\{g_n(x)\}$ be an orthonormal system on $I$. Then $\{g_n(x)\}$ is complete on $I$ provided that every function $f(x)$, $L^2$ on $I$, is in $\overline{\operatorname{span}}\{g_n(x)\}$. A complete orthonormal system is called an orthonormal basis. The next theorem gives several equivalent criteria for an orthonormal system to be complete.
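Theorem 2.57. Let $\{g_n(x)\}$ be an orthonormal system on $I$. Then the following are equivalent.
(a) $\{g_n(x)\}$ is complete on $I$.
(b) For every $f(x)$, $L^2$ on $I$,
$$f(x) = \sum_{n\in\mathbb{N}} \langle f, g_n\rangle\, g_n(x) \quad \text{in } L^2 \text{ on } I.$$
(c) Every function $f(x)$, $C_c^0$ on $I$, is in $\overline{\operatorname{span}}\{g_n(x)\}$.
(d) For every function $f(x)$, $C_c^0$ on $I$,
$$\sum_{n\in\mathbb{N}} |\langle f, g_n\rangle|^2 = \int_I |f(x)|^2\,dx. \qquad (2.30)$$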
Remark 2.58. (a) Note that Theorem 2.57(c) is precisely the definition of completeness but with $C_c^0$ functions replacing more general $L^2$ functions. It is often easier to work with continuous compactly supported functions, and the theorem states that this is sufficient.
(b) Theorem 2.57(d) says that Bessel's inequality is an equality for complete orthonormal systems. This equality is referred to as Plancherel's Formula.
Proof of Theorem 2.57: (a) $\Leftrightarrow$ (b). This follows exactly as in the proof of Theorem 2.55.
(a) $\Rightarrow$ (c). This follows immediately from the fact that every function $C_c^0$ on $I$ is also $L^2$ on $I$.
(c) $\Rightarrow$ (a). Let $f(x)$ be $L^2$ on $I$, and let $\epsilon > 0$. Then by Corollary 2.37 there exists a function $g(x)$, $C_c^0$ on $I$, such that $\|f - g\|_2 < \epsilon/2$. By (c) and Theorem 2.55,
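there exists $N \in \mathbb{N}$ such that
$$\Big\|g - \sum_{n=1}^{N} \langle g, g_n\rangle g_n\Big\|_2 < \epsilon/2.$$
Applying Minkowski's inequality, we obtain
$$\Big\|f - \sum_{n=1}^{N} \langle g, g_n\rangle g_n\Big\|_2 \le \|f - g\|_2 + \Big\|g - \sum_{n=1}^{N} \langle g, g_n\rangle g_n\Big\|_2 < \epsilon.$$
Therefore, since $\sum_{n=1}^{N} \langle g, g_n\rangle g_n(x) \in \operatorname{span}\{g_n(x)\}$, we have $f(x) \in \overline{\operatorname{span}}\{g_n(x)\}$, and (a) follows.
(c) $\Leftrightarrow$ (d). By Theorem 2.55, (c) holds if and only if
$$\lim_{N\to\infty} \int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx = 0$$
for all functions $f(x)$, $C_c^0$ on $I$. But by Lemma 2.50,
$$\int_I \Big|f(x) - \sum_{n=1}^{N} \langle f, g_n\rangle g_n(x)\Big|^2 dx = \int_I |f(x)|^2\,dx - \sum_{n=1}^{N} |\langle f, g_n\rangle|^2.$$
Therefore, (c) is equivalent to the statement that
$$\lim_{N\to\infty} \sum_{n=1}^{N} |\langle f, g_n\rangle|^2 = \int_I |f(x)|^2\,dx$$
for all functions $f(x)$, $C_c^0$ on $I$, and (d) follows. To illustrate an application of this theorem, we will prove the following result about trigonometric Fourier series.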
Theorem 2.59. The trigonometric system $\{e^{2\pi i n x}\}_{n\in\mathbb{Z}}$ is complete on $[0, 1]$.
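Proof: We will use Theorem 2.57(c). To that end, let $f(x)$ be continuous on $[0, 1]$ (note that it is also compactly supported), and let $\epsilon > 0$. By Exercise 2.65, we can find a function $\tilde f(x)$ that has period 1, is $C^0$ on $\mathbb{R}$, and such that $\|f - \tilde f\|_2 < \epsilon/2$. By Fejér's Theorem (Theorem 2.19), $\tilde\sigma_N(x)$ converges in $L^\infty$ on $\mathbb{R}$ to $\tilde f(x)$ as $N \to \infty$, where
$$\tilde\sigma_N(x) = \sum_{n=-N}^{N} \Big(1 - \frac{|n|}{N}\Big)\, \tilde c(n)\, e^{2\pi i n x}$$
and
$$\tilde c(n) = \int_0^1 \tilde f(x)\, e^{-2\pi i n x}\,dx.$$
By Theorem 1.38(a), $\tilde\sigma_N(x)$ also converges to $\tilde f(x)$ in $L^2$ on $[0, 1]$. Note that (2.31) holds (Exercise 2.66). Therefore, for $N$ large enough,
$$\|\tilde f - \tilde\sigma_N\|_2 < \epsilon/2,$$
and by the triangle inequality,
$$\|f - \tilde\sigma_N\|_2 \le \|f - \tilde f\|_2 + \|\tilde f - \tilde\sigma_N\|_2 < \epsilon.$$
But the function $\tilde\sigma_N(x)$ is in $\operatorname{span}\{e^{2\pi i n x}\}_{n\in\mathbb{Z}}$. Hence, $f(x)$ is in $\overline{\operatorname{span}}\{e^{2\pi i n x}\}_{n\in\mathbb{Z}}$, and by Theorem 2.57(c), the trigonometric system is complete on $[0, 1]$.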
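Both ingredients of this proof can be watched on the continuous period 1 function $f(x) = |x - 1/2|$. Its Fourier coefficients were computed by hand for this example ($c(0) = 1/4$, $c(n) = 1/(\pi^2 n^2)$ for odd $n$, $c(n) = 0$ for even $n \ne 0$), so they are an input to the sketch rather than part of the text. The sketch measures $\sup |\tilde\sigma_N - f|$ on a grid (the $n = \pm N$ terms vanish, so summing over $|n| < N$ is the same) and compares $\sum |c(n)|^2$ with $\int_0^1 |f|^2\,dx = 1/12$, as Theorem 2.57(d) predicts.

```python
import numpy as np

def coeff(n):
    # Hand-computed Fourier coefficients of f(x) = |x - 1/2| on [0, 1]:
    # c(0) = 1/4, c(n) = 1/(pi^2 n^2) for odd n, c(n) = 0 for even n != 0, and c(-n) = c(n)
    n = abs(n)
    if n == 0:
        return 0.25
    return 1.0 / (np.pi**2 * n**2) if n % 2 == 1 else 0.0

def fejer_mean(N, x):
    # sigma_N f(x) = sum_{|n| < N} (1 - |n|/N) c(n) e^{2 pi i n x}; the +n and -n terms
    # combine into 2 (1 - n/N) c(n) cos(2 pi n x) because c is real and even
    s = np.full_like(x, coeff(0))
    for n in range(1, N):
        s += 2 * (1 - n / N) * coeff(n) * np.cos(2 * np.pi * n * x)
    return s

x = np.linspace(0.0, 1.0, 2001)
f = np.abs(x - 0.5)
sup_errors = [np.max(np.abs(fejer_mean(N, x) - f)) for N in (4, 16, 64)]
# Plancherel's Formula (2.30): sum |c(n)|^2 should approach \int_0^1 |f|^2 dx = 1/12
partial = coeff(0) ** 2 + 2 * sum(coeff(n) ** 2 for n in range(1, 200))
print(sup_errors, partial, 1 / 12)
```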
Exercises

Exercise 2.60. Prove that if $\{g_n(x)\}$ is an orthonormal system on an interval $I$ and if $\{a(n)\}_{n=1}^{N}$ is any finite sequence of numbers, then
Exercise 2.61. Prove that if $\{g_n(x)\}$ is any system of $L^2$ functions, then $\operatorname{span}\{g_n(x)\}$ is a linear space (that is, it is closed under the formation of linear combinations; see Remark 2.54(a)).

Exercise 2.62. Prove that if $\{g_n(x)\}$ is any system of $L^2$ functions, then $f(x) \in \overline{\operatorname{span}}\{g_n(x)\}$ if and only if there is a sequence of functions $\{f_k(x)\}_{k\in\mathbb{N}}$ such that $f_k(x) \in \operatorname{span}\{g_n(x)\}$ and such that $\lim_{k\to\infty} \|f - f_k\|_2 = 0$. (Hint: For the "only if" direction, choose $f_k(x) \in \operatorname{span}\{g_n(x)\}$ such that $\|f - f_k\|_2 < 1/k$.)
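Exercise 2.63. Prove that if $\{g_n(x)\}$ is an orthonormal system on an interval $I$, then for any $f(x)$, $L^2$ on $I$, the sequence
$$\Big\{\Big\|f - \sum_{n=1}^{N} \langle f, g_n\rangle g_n\Big\|_2\Big\}_{N\in\mathbb{N}}$$
is a decreasing sequence. (Hint: Use Lemma 2.50 or 2.51.)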
Exercise 2.64. Show that if $\{g_n(x)\}$ is a complete orthonormal system on an interval $I$, then (2.30) holds for every $f(x)$, $L^2$ on $I$.

Exercise 2.65. Let $\epsilon > 0$, and let $f(x)$ be $C^0$ on $[0, 1]$. Then there is a function $\tilde f(x)$ that has period 1, is $C^0$ on $\mathbb{R}$, and such that $\left(\int_0^1 |f(x) - \tilde f(x)|^2\,dx\right)^{1/2} < \epsilon$. (Hint: For $\delta > 0$ sufficiently small, you can construct $\tilde f(x)$ by modifying $f(x)$ only on the interval $[1 - \delta, 1]$, and then extending periodically.)
Exercise 2.66.
Prove equation (2.31).
Exercise 2.67. Let $\{g_n(x)\}$ be a complete orthonormal system on an interval $I$. Show that the Fourier series of any function $f(x)$, $L^2$ on $I$, can be integrated term-by-term in the following sense. For any numbers $a < b$ such that $[a, b] \subseteq I$,
$$\int_a^b f(x)\,dx = \sum_{n} \langle f, g_n\rangle \int_a^b g_n(x)\,dx.$$
(Hint: If the sum converges in $L^2$ on $I$, it converges in $L^2$ on $[a, b]$. Then use Theorem 1.40(c).)
Chapter 3

The Fourier Transform

3.1 Motivation and Definition

We have seen that if $f(x)$ is a function supported on an interval $[-L, L]$ for some $L > 0$, then $f(x)$ can be represented by a Fourier series as
$$f(x) = \sum_{n\in\mathbb{Z}} c(n)\, e^{2\pi i x (n/2L)},$$
where
$$c(n) = \frac{1}{2L}\int_{-L}^{L} f(t)\, e^{-2\pi i t (n/2L)}\,dt. \qquad (3.1)$$
Of course the Fourier series actually equals the $2L$-periodization of $f(x)$ (Figure 3.1). What happens to this representation if we let $L \to \infty$? In order to answer this question, define for each $L > 0$ and each integer $n$ the number
$$\hat f(n/2L) = \int_{-L}^{L} f(t)\, e^{-2\pi i t (n/2L)}\,dt,$$
so that $\hat f(n/2L) = (2L)\, c(n)$. If we were to plot the numbers $\{\hat f(n/2L)\}_{n\in\mathbb{Z}}$ for very large values of $L$, then the resulting graph would begin to resemble a function of a continuous variable on $\mathbb{R}$ (Figure 3.2). This function would naturally be defined by
$$\hat f(\gamma) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i t\gamma}\,dt.$$
In addition, we could also write
$$f(x) = \sum_{n\in\mathbb{Z}} \hat f(n/2L)\, e^{2\pi i x (n/2L)}\, \frac{1}{2L} \approx \int_{\mathbb{R}} \hat f(\gamma)\, e^{2\pi i x\gamma}\,d\gamma$$
for large $L$, since the last sum is a Riemann sum for the last integral. Therefore, we have formally established the duality
$$\hat f(\gamma) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i t\gamma}\,dt, \qquad f(x) \sim \int_{\mathbb{R}} \hat f(\gamma)\, e^{2\pi i x\gamma}\,d\gamma. \qquad (3.3)$$
The discussion in the remainder of this chapter will focus on two general questions: In what way are the properties of $f(x)$, for example, continuity, differentiability, integrability, or square-integrability, reflected in the corresponding properties of $\hat f(\gamma)$? And what properties must $f(x)$ and $\hat f(\gamma)$ satisfy in order for the "$\sim$" in (3.3) to be replaced by "$=$"? Let us first make a definition.

FIGURE 3.1. $2L$-periodizations of a function $f(x)$ assumed to be supported in $[-L, L]$. Top left: $L = 1$; top right: $L = 2$; bottom: $L = 4$.
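Definition 3.1. The Fourier transform of a function $f(x)$, $L^1$ on $\mathbb{R}$, is also a function on $\mathbb{R}$, denoted $\hat f(\gamma)$, defined by
$$\hat f(\gamma) = \int_{\mathbb{R}} f(x)\, e^{-2\pi i \gamma x}\,dx. \qquad (3.4)$$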
Remark 3.2. The assumption that $f(x)$ is $L^1$ on $\mathbb{R}$ is made in order to ensure that the integral in (3.4) converges for each number $\gamma$. This convergence holds by virtue of the fact that for each $\gamma \in \mathbb{R}$, we can establish a Cauchy condition on the numbers
$$\int_0^{a} f(x)\, e^{-2\pi i \gamma x}\,dx \quad\text{and}\quad \int_{-a}^{0} f(x)\, e^{-2\pi i \gamma x}\,dx, \qquad a > 0,$$
as follows. If $b > a > 0$, then
$$\left|\int_a^b f(x)\, e^{-2\pi i \gamma x}\,dx\right| \le \int_a^b |f(x)|\,dx,$$
and the right-hand side tends to zero as $a \to \infty$ because $f(x)$ is $L^1$ on $\mathbb{R}$.

FIGURE 3.2. Fourier coefficients for the functions graphed in Figure 3.1. Note how the graphs of the sequences begin to resemble the graph of a continuously defined function.
Exercise 3.6. If $f(x) = e^{-a|x|}$, $a > 0$, then $\hat f(\gamma) = \dfrac{2a}{a^2 + (2\pi\gamma)^2}$.

Exercise 3.7. If $f(x) = e^{-ax^2}$, $a > 0$, then $\hat f(\gamma) = \sqrt{\pi/a}\; e^{-\pi^2\gamma^2/a}$. (Hint: See, for example, Kammler, A First Course in Fourier Analysis, Prentice-Hall (2000), pp. 132-133 for the $a = \pi$ case.)
3.2 Basic Properties of the Fourier Transform

In this section, we will present two basic properties of the Fourier transform of an $L^1$ function.
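Theorem 3.8. If $f(x)$ is $L^1$ on $\mathbb{R}$, then $\hat f(\gamma)$ is uniformly continuous on $\mathbb{R}$.
Proof: Given $\gamma_1, \gamma_2 \in \mathbb{R}$,
$$|\hat f(\gamma_1) - \hat f(\gamma_2)| = \left|\int_{\mathbb{R}} f(x)\,\big(e^{-2\pi i \gamma_1 x} - e^{-2\pi i \gamma_2 x}\big)\,dx\right| \le \int_{\mathbb{R}} |f(x)|\,\big|e^{-2\pi i (\gamma_1 - \gamma_2) x} - 1\big|\,dx.$$
Note that the last term depends only on the difference $\gamma_1 - \gamma_2$ and not on the particular values of $\gamma_1$ and $\gamma_2$. Hence to show uniform continuity on $\mathbb{R}$, it is enough to show that
$$\lim_{\alpha\to0} \int_{\mathbb{R}} |f(x)|\,\big|e^{-2\pi i \alpha x} - 1\big|\,dx = 0.$$
We will use Theorem 1.41 to do this. Since $|e^{-2\pi i \alpha x} - 1| \le 2$,
$$|f(x)|\,\big|e^{-2\pi i \alpha x} - 1\big| \le 2\,|f(x)|,$$
and $2|f(x)|$ is $L^1$ on $\mathbb{R}$ since $f(x)$ is. By Taylor's Theorem, given any $A > 0$,
$$\big|e^{-2\pi i \alpha x} - 1\big| \le 2\pi |\alpha|\,|x| \le 2\pi A\,|\alpha|$$
for all $x \in [-A, A]$. Therefore,
$$|f(x)|\,\big|e^{-2\pi i \alpha x} - 1\big| \le 2\pi A\,|\alpha|\,|f(x)|$$
for all $x \in [-A, A]$, and
$$\lim_{\alpha\to0} \int_{-A}^{A} |f(x)|\,\big|e^{-2\pi i \alpha x} - 1\big|\,dx \le \lim_{\alpha\to0} 2\pi A\,|\alpha|\,\|f\|_1 = 0.$$
Thus, for every $A > 0$, $|f(x)|\,|e^{-2\pi i \alpha x} - 1| \to 0$ in $L^1$ on $[-A, A]$ as $\alpha \to 0$, so that by Theorem 1.41,
$$\lim_{\alpha\to0} \int_{\mathbb{R}} |f(x)|\,\big|e^{-2\pi i \alpha x} - 1\big|\,dx = 0.$$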
The next theorem is known as the Riemann-Lebesgue Lemma and describes the decay at infinity of $\hat f(\gamma)$.

Theorem 3.9 (Riemann-Lebesgue Lemma). If $f(x)$ is $L^1$ on $\mathbb{R}$, then
$$\lim_{|\gamma|\to\infty} \hat f(\gamma) = 0. \qquad (3.6)$$
Proof: We will present an outline of the proof. The details are left to the reader in Exercise 3.10.
Step 1. Show that if $f(x) = \chi_{[a,b]}(x)$, then (3.6) holds. This can be done by direct calculation.
Step 2. Show that if $f(x)$ is a step function of the form
$$f(x) = \sum_{n=1}^{N} c(n)\, \chi_{[a_n, b_n]}(x) \qquad (3.7)$$
for some coefficients $c(n)$ and intervals $[a_n, b_n]$, then (3.6) holds.
Step 3. Show that if $f(x)$ is $C_c^0$ on $\mathbb{R}$, then given $\epsilon > 0$, there is a step function $g(x)$ of the form (3.7) such that $\|f - g\|_1 < \epsilon$. Then show that this implies that (3.6) holds for $f(x)$ (cf. Exercise 5.26).
Step 4. Show that (3.6) holds for any function $f(x)$, $L^1$ on $\mathbb{R}$.
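Steps 1 and 2 rest on an explicit formula: for $\gamma \ne 0$, $\int_a^b e^{-2\pi i \gamma x}\,dx = (e^{-2\pi i \gamma a} - e^{-2\pi i \gamma b})/(2\pi i \gamma)$, whose modulus is at most $1/(\pi|\gamma|)$. The sketch below evaluates this for a step function of the form (3.7); the coefficients and intervals are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
import numpy as np

def indicator_ft(a, b, gamma):
    # Step 1: Fourier transform of chi_[a, b] at gamma != 0, by direct calculation
    return (np.exp(-2j * np.pi * gamma * a) - np.exp(-2j * np.pi * gamma * b)) / (2j * np.pi * gamma)

def step_ft(pieces, gamma):
    # Step 2: by linearity, sum_n c(n) chi_[a_n, b_n] has transform sum_n c(n) * indicator_ft(a_n, b_n, gamma)
    return sum(c * indicator_ft(a, b, gamma) for c, a, b in pieces)

pieces = [(2.0, -1.0, 0.5), (-0.7, 0.2, 3.0)]     # hypothetical (c(n), a_n, b_n) triples
bound = sum(abs(c) for c, _, _ in pieces) / np.pi  # |transform| <= bound / |gamma|
for gamma in (1.3, 10.7, 123.4, 5000.1):
    print(gamma, abs(step_ft(pieces, gamma)), bound / gamma)
```

With $a = -b$ the formula reduces to $\sin(2\pi b\gamma)/(\pi\gamma)$, the transform quoted from Example 3.3(a) later in the chapter.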
Exercises

Exercise 3.10. Complete the proof of Theorem 3.9. (Hint: For Steps 3 and 4, use the estimate $|\hat f(\gamma)| \le |\hat g(\gamma)| + \|f - g\|_1$.)
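3.3 Fourier Inversion

The purpose of this section is to investigate the conditions under which equality holds in (3.3). From the definition of the Fourier transform, we can write
$$\int_{\mathbb{R}} \hat f(\gamma)\, e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}}\int_{\mathbb{R}} f(t)\, e^{-2\pi i \gamma t}\,dt\; e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}} f(t) \int_{\mathbb{R}} e^{2\pi i \gamma (x - t)}\,d\gamma\,dt, \qquad (3.8)$$
where we have exchanged the order of integration in the double integral. This formal calculation is not valid, strictly speaking, because the integral
$$\int_{\mathbb{R}} e^{2\pi i \gamma (x - t)}\,d\gamma \qquad (3.9)$$
does not converge for any particular value of $x$ or $t$. Nevertheless, this calculation provides a starting point for investigating (3.3). The idea will be to place a "convergence factor" in (3.9) so that it converges for each value of $x$ and $t$; that is, we write instead of (3.9),
$$\int_{\mathbb{R}} \hat K(\gamma)\, e^{2\pi i \gamma (x - t)}\,d\gamma, \qquad (3.10)$$
for some function $K(x)$ chosen so that its Fourier transform, $\hat K(\gamma)$, forces the integral in (3.10) to converge and so that equality holds in (3.3) for $K(x)$; in that case (3.10) equals $K(x - t)$. We now obtain
$$\int_{\mathbb{R}} \hat f(\gamma)\, \hat K(\gamma)\, e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}} f(t) \int_{\mathbb{R}} \hat K(\gamma)\, e^{2\pi i \gamma (x - t)}\,d\gamma\,dt = \int_{\mathbb{R}} f(t)\, K(x - t)\,dt.$$
If $K(t)$ is some element $K_\tau(t)$ of an approximate identity, then
$$\int_{\mathbb{R}} \hat f(\gamma)\, \hat K_\tau(\gamma)\, e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}} f(t)\, K_\tau(x - t)\,dt \approx f(x),$$
which gives us a valid approximate inversion formula for the Fourier transform. It only remains to choose an approximate identity satisfying the required conditions. There are many valid choices, but a very convenient one is to let
$$K_\tau(x) = \tau^{-1} e^{-\pi x^2/\tau^2}, \qquad \tau > 0.$$
In this case $\hat K_\tau(\gamma) = e^{-\pi\tau^2\gamma^2}$ (see Exercise 3.7), and the same calculation shows that equality holds in (3.3) in this case, that is, that
$$\int_{\mathbb{R}} e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = \tau^{-1} e^{-\pi x^2/\tau^2}.$$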
It is also easy to see that $K_\tau(x)$ is $L^1$ on $\mathbb{R}$ for each $\tau > 0$ and that $\{K_\tau(x)\}_{\tau>0}$ is an approximate identity on $\mathbb{R}$ (Example 2.32(c)). Now we are in a position to prove the following theorem.
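Theorem 3.11. If $f(x)$ is $C^0$ and $L^1$ on $\mathbb{R}$, then for each $x \in \mathbb{R}$,
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = f(x). \qquad (3.13)$$
Proof: Repeating the calculation in (3.8), we obtain
$$\int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}} f(t)\, K_\tau(x - t)\,dt.$$
This time the exchange of the order of integration is legitimate, because $\int_{\mathbb{R}}\int_{\mathbb{R}} |f(t)|\, e^{-\pi\tau^2\gamma^2}\,d\gamma\,dt = \tau^{-1}\|f\|_1 < \infty$. But since $\{K_\tau(x)\}_{\tau>0}$ is an approximate identity on $\mathbb{R}$, Theorem 2.33 says that
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} f(t)\, K_\tau(x - t)\,dt = f(x)$$
for each $x \in \mathbb{R}$.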
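The proof of Theorem 3.11 identifies the regularized inversion integral with the Gaussian average $f * K_\tau(x)$. The sketch below checks both that identity and the limit for $f(x) = e^{-|x|}$, whose transform $\hat f(\gamma) = 2/(1 + (2\pi\gamma)^2)$ comes from Exercise 3.6 with $a = 1$. The truncation ranges, grid sizes, and the evaluation point $x_0 = 0.5$ are assumptions of this numerical sketch.

```python
import numpy as np

def regularized_inverse(f_hat, x, tau, half_width=60.0, samples=120001):
    # Riemann sum for \int_R \hat f(gamma) e^{-pi tau^2 gamma^2} e^{2 pi i gamma x} dgamma (truncated)
    g = np.linspace(-half_width, half_width, samples)
    dg = g[1] - g[0]
    integrand = f_hat(g) * np.exp(-np.pi * tau**2 * g**2) * np.exp(2j * np.pi * g * x)
    return (np.sum(integrand) * dg).real  # imaginary part vanishes by symmetry

def gaussian_smoothing(f, x, tau, half_width=30.0, samples=120001):
    # Riemann sum for f * K_tau(x) with K_tau(t) = tau^{-1} e^{-pi t^2 / tau^2}
    t = np.linspace(-half_width, half_width, samples)
    dt = t[1] - t[0]
    return np.sum(f(t) * np.exp(-np.pi * (x - t) ** 2 / tau**2) / tau) * dt

def f(t):
    return np.exp(-np.abs(t))

def f_hat(g):
    return 2.0 / (1.0 + (2 * np.pi * g) ** 2)   # Exercise 3.6 with a = 1

x0 = 0.5
for tau in (1.0, 0.3, 0.1):
    lhs = regularized_inverse(f_hat, x0, tau)
    rhs = gaussian_smoothing(f, x0, tau)
    print(tau, lhs, rhs, abs(lhs - f(x0)))
```

The first two printed values agree for each $\tau$, and the last column shrinks as $\tau \to 0^+$, as (3.13) asserts.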
With an additional assumption on $\hat f(\gamma)$, we can get equality in (3.3) in a pointwise sense.
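Corollary 3.12. If $f(x)$ is $C^0$ and $L^1$ on $\mathbb{R}$, and if $\hat f(\gamma)$ is $L^1$ on $\mathbb{R}$, then for each $x \in \mathbb{R}$,
$$\int_{\mathbb{R}} \hat f(\gamma)\, e^{2\pi i \gamma x}\,d\gamma = f(x). \qquad (3.14)$$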
Proof: By Theorem 3.11, it will be enough to show that
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = \int_{\mathbb{R}} \hat f(\gamma)\, e^{2\pi i \gamma x}\,d\gamma. \qquad (3.15)$$
But since $\lim_{\tau\to0^+} e^{-\pi\tau^2\gamma^2} = 1$, the proof amounts to justifying the interchange of the limit and the integral in (3.15). This is accomplished using Theorem 1.41 in a similar way to the proof of Theorem 3.8. We leave the details as an exercise (Exercise 3.14).
Corollary 3.12 does not cover all the cases that will be of interest to us in this book. For example, in Example 3.3(a), we saw that if $f(x) = \chi_{[-a,a]}(x)$, then $\hat f(\gamma) = \sin(2\pi a\gamma)/(\pi\gamma)$. In this case, $f(x)$ is $L^1$ but is not continuous, and $\hat f(\gamma)$ is not $L^1$, though it is $L^2$ (Exercise 3.15). Therefore, neither Theorem 3.11 nor Corollary 3.12 applies. The answer is to replace pointwise convergence of the limit in (3.13) with $L^2$ convergence. In this case, we have the following theorem.
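Theorem 3.13. If $f(x)$ is $L^1$ and $L^2$ on $\mathbb{R}$, and if $\hat f(\gamma)$ is $L^2$ on $\mathbb{R}$, then
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = f(x) \qquad (3.16)$$
in $L^2$ on $\mathbb{R}$.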
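Proof: Since $f(x)$ is $L^1$, Theorem 3.18 implies that the function
$$f_\tau(x) = \int_{\mathbb{R}} f(t)\, K_\tau(x - t)\,dt = f * K_\tau(x)$$
is continuous on $\mathbb{R}$, and Theorem 3.20(a) implies that $f_\tau(x)$ is $L^1$ for each $\tau > 0$. Since $f(x)$ is $L^2$, Theorem 2.36(b) says that $f_\tau(x) \to f(x)$ in $L^2$ on $\mathbb{R}$ as $\tau \to 0^+$. Therefore,
$$\lim_{\tau\to0^+} \|f_\tau - f\|_2 = 0.$$
Since $f_\tau(x)$ satisfies all of the hypotheses of Theorem 3.11, the calculation in its proof applies, and for each $x \in \mathbb{R}$,
$$f_\tau(x) = \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma,$$
and (3.16) follows.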
Exercises

Exercise 3.14. Complete the proof of Corollary 3.12.

Exercise 3.15. Prove that the function $f(x) = \sin(\pi x)/(\pi x)$ is $L^2$ on $\mathbb{R}$ but not $L^1$ on $\mathbb{R}$.
3.4 Convolution

Definition 3.16. Given functions $f(x)$ and $g(x)$, the convolution of $f(x)$ and $g(x)$, denoted $h(x) = f * g(x)$, is defined by
$$f * g(x) = \int_{\mathbb{R}} f(t)\, g(x - t)\,dt, \qquad (3.17)$$
whenever the integral makes sense.
Remark 3.17. (a) We have encountered integrals like (3.17) before, namely in the definition of approximate identity. There it was shown that under specific hypotheses on $f(x)$, the integral
$$f * K_\tau(x) = \int_{\mathbb{R}} f(t)\, K_\tau(x - t)\,dt$$
is a good approximation to $f(x)$ as long as $\{K_\tau(x)\}_{\tau>0}$ is an approximate identity.
(b) The above observation can provide good insight into the action of convolution. Take, for example, the approximate identity defined by
$$K_\tau(x) = \tau^{-1}\,\chi_{[-\tau/2, \tau/2]}(x), \qquad \tau > 0$$
(Example 2.32(a)). In this case, we can see that for any function $f(x)$, the value of $f * K_\tau(x_0)$ is just the average value of $f(x)$ on an interval of length $\tau$ centered at $x_0$. If $f(x)$ is continuous, then these averages are good approximations to the actual point values of $f(x)$. If we consider $K_\tau(x) = (1/\tau)(1 - |x|/\tau)\,\chi_{[-\tau,\tau]}(x)$, $\tau > 0$, then $f * K_\tau(x_0)$ can be interpreted as a "weighted average" of $f(x)$ around the point $x_0$, where points close to $x_0$ are given more "weight" than are points further from $x_0$. Thus, the convolution $f * g(x)$ can be interpreted as a "moving weighted average" of $f(x)$, where the "weighting" is determined by the function $g(x)$. See Figure 3.4. By changing variables, it can be shown that convolution is commutative, that is, that $f * g(x) = g * f(x)$ (Exercise 3.22). Then $f * g(x)$ can also be interpreted as a moving weighted average of $g(x)$, where the weighting is determined by the function $f(x)$.
(c) If the function $f(x)$ has large variations, sharp peaks, or discontinuities, then averaging about each point $x$ will tend to decrease the variations, lower the peaks, and smooth out the discontinuities. In this sense, convolution is often referred to as a smoothing operation. A more precise statement of this idea is contained in Theorems 3.18 and 3.19.
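Remark 3.17(b) can be made concrete on a grid. In the sketch below (grid spacing, test function, and $\tau = 0.2$ are arbitrary choices), the Riemann-sum approximation of $f * K_\tau$ with the box kernel at a sample point is exactly the average of the samples of $f$ in a window of length about $\tau$ centered there, and the jump of $f$ at $0$ is spread into a ramp, illustrating the smoothing described in (c).

```python
import numpy as np

dx = 1e-3
x = np.arange(-2.0, 2.0, dx)
f = np.sign(x) + 0.3 * np.sin(12 * x)        # jump at x = 0 plus a smooth oscillation

def box_kernel(tau):
    # Samples of K_tau = (1/tau) * chi_[-tau/2, tau/2]; odd length keeps the kernel centered
    m = int(round(tau / (2 * dx)))
    return np.full(2 * m + 1, 1.0 / ((2 * m + 1) * dx))

k = box_kernel(0.2)
smoothed = np.convolve(f, k, mode="same") * dx   # Riemann sum for f * K_tau on the grid x
m = len(k) // 2
i0 = 2500                                          # grid index with x[i0] close to 0.5
local_average = f[i0 - m : i0 + m + 1].mean()      # average of f over [x0 - tau/2, x0 + tau/2]
print(smoothed[i0], local_average)

# Largest jump between neighboring samples, before and after smoothing
print(np.max(np.abs(np.diff(f))), np.max(np.abs(np.diff(smoothed))))

# Commutativity f * g = g * f (Exercise 3.22) also holds for discrete convolution
g = np.exp(-x**2)
print(np.allclose(np.convolve(f, g), np.convolve(g, f)))
```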
FIGURE 3.4. Illustration of convolution. Top left: Graph of $f(x)$. Top right: Graph of $g(x)$. Bottom: The integral of the product of the solid and dashed functions is $f * g(1)$.
Theorem 3.18. If $f(x)$ is $L^\infty$ on $\mathbb{R}$, and if $g(x)$ is $L^1$ on $\mathbb{R}$, then the convolution $f * g(x)$ is continuous on $\mathbb{R}$.
Proof: Given $x, y \in \mathbb{R}$,
$$\begin{aligned}
|f * g(x) - f * g(y)| &= \left|\int_{\mathbb{R}} f(t)\,\big(g(x - t) - g(y - t)\big)\,dt\right| \\
&\le \int_{\mathbb{R}} |f(t)|\,|g(x - t) - g(y - t)|\,dt \\
&\le \|f\|_\infty \int_{\mathbb{R}} |g(t - (x - y)) - g(t)|\,dt.
\end{aligned}$$
By Lemma 2.35(a) (continuity of translation for $L^1$ functions),
$$\lim_{x\to y} |f * g(x) - f * g(y)| \le \|f\|_\infty \lim_{x\to y} \int_{\mathbb{R}} |g(t - (x - y)) - g(t)|\,dt = 0,$$
and the result follows.
Theorem 3.19. If $f(x)$ and $g(x)$ are both $L^2$ on $\mathbb{R}$, then the convolution $f * g(x)$ is continuous on $\mathbb{R}$.
Proof: Given $x, y \in \mathbb{R}$, we calculate as above, but this time using the Cauchy-Schwarz inequality,
$$|f * g(x) - f * g(y)| \le \left(\int_{\mathbb{R}} |f(t)|^2\,dt\right)^{1/2} \left(\int_{\mathbb{R}} |g(t - (x - y)) - g(t)|^2\,dt\right)^{1/2}.$$
By Lemma 2.35(b) (continuity of translation for $L^2$ functions),
$$\lim_{x\to y} |f * g(x) - f * g(y)| \le \|f\|_2 \lim_{x\to y} \left(\int_{\mathbb{R}} |g(t - (x - y)) - g(t)|^2\,dt\right)^{1/2} = 0,$$
and the result follows. We have seen that the convolution of a bounded function with an integrable function and the convolution of two $L^2$ functions produce continuous functions. The next theorem addresses the issue of the decay at infinity of a convolution.
Theorem 3.20. (a) If $f(x)$ and $g(x)$ are both $L^1$ on $\mathbb{R}$, then the convolution $f * g(x)$ is also $L^1$ on $\mathbb{R}$, and
$$\|f * g\|_1 \le \|f\|_1\,\|g\|_1. \qquad (3.18)$$
(b) If $f(x)$ is $L^1$ on $\mathbb{R}$, and $g(x)$ is $L^2$ on $\mathbb{R}$, then the convolution $f * g(x)$ is $L^2$ on $\mathbb{R}$, and
$$\|f * g\|_2 \le \|f\|_1\,\|g\|_2. \qquad (3.19)$$
(c) If $f(x)$ and $g(x)$ are both $L^2$ on $\mathbb{R}$, then the convolution $f * g(x)$ is $L^\infty$ on $\mathbb{R}$, and
$$\|f * g\|_\infty \le \|f\|_2\,\|g\|_2. \qquad (3.20)$$
(d) If $f(x)$ is $L^\infty$ on $\mathbb{R}$, and $g(x)$ is $L^1$ on $\mathbb{R}$, then the convolution $f * g(x)$ is $L^\infty$ on $\mathbb{R}$, and
$$\|f * g\|_\infty \le \|f\|_\infty\,\|g\|_1. \qquad (3.21)$$
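Proof: We will prove (a) and (b) and leave (c) and (d) as exercises (Exercise 3.24).
(a) Let $f(x)$ and $g(x)$ be $L^1$ on $\mathbb{R}$. Then
$$\|f * g\|_1 = \int_{\mathbb{R}} \left|\int_{\mathbb{R}} f(t)\, g(x - t)\,dt\right| dx \le \int_{\mathbb{R}} |f(t)| \int_{\mathbb{R}} |g(x - t)|\,dx\,dt = \|f\|_1\,\|g\|_1,$$
and (3.18) follows.
(b) Let $f(x)$ be $L^1$ on $\mathbb{R}$, and $g(x)$ be $L^2$ on $\mathbb{R}$. By the Cauchy-Schwarz inequality, writing $|f(t)| = |f(t)|^{1/2}\,|f(t)|^{1/2}$,
$$|f * g(x)|^2 \le \left(\int_{\mathbb{R}} |f(t)|^{1/2}\,|f(t)|^{1/2}\,|g(x - t)|\,dt\right)^2 \le \|f\|_1 \int_{\mathbb{R}} |f(t)|\,|g(x - t)|^2\,dt.$$
Therefore,
$$\|f * g\|_2^2 \le \|f\|_1 \int_{\mathbb{R}} |f(t)| \int_{\mathbb{R}} |g(x - t)|^2\,dx\,dt = \|f\|_1^2\,\|g\|_2^2,$$
and (3.19) follows.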
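Theorem 3.21 (The Convolution Theorem). If $f(x)$ and $g(x)$ are $L^1$ on $\mathbb{R}$, then
$$\widehat{f * g}(\gamma) = \hat f(\gamma)\,\hat g(\gamma). \qquad (3.22)$$
Proof: Let $f(x)$ and $g(x)$ be $L^1$ on $\mathbb{R}$. Then
$$\begin{aligned}
\widehat{f * g}(\gamma) &= \int_{\mathbb{R}}\int_{\mathbb{R}} f(t)\, g(x - t)\, e^{-2\pi i \gamma x}\,dt\,dx \\
&= \int_{\mathbb{R}} f(t)\, e^{-2\pi i \gamma t} \int_{\mathbb{R}} g(x - t)\, e^{-2\pi i \gamma (x - t)}\,dx\,dt \\
&= \hat f(\gamma)\,\hat g(\gamma).
\end{aligned}$$
The exchange of the order of integration is justified because $\int_{\mathbb{R}}\int_{\mathbb{R}} |f(t)|\,|g(x - t)|\,dt\,dx = \|f\|_1\,\|g\|_1 < \infty$, as in the proof of Theorem 3.20(a).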
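The Convolution Theorem has an exact discrete analogue that makes a convenient check. In the sketch below (the grid and the sample frequencies are arbitrary choices), $\chi_{[-1/2,1/2]} * \chi_{[-1/2,1/2]}$ is approximated by a Riemann sum; it reproduces the triangle $(1 - |x|)\chi_{[-1,1]}(x)$ of Remark 3.17(b) with $\tau = 1$ up to grid error, and its Riemann-sum transform equals the square of the box's Riemann-sum transform to rounding error, mirroring (3.22). The last line illustrates (3.18), which is an equality here because both functions are nonnegative.

```python
import numpy as np

dx = 1e-3
x = np.arange(-3000, 3001) * dx                # symmetric grid on [-3, 3] that contains 0
box = (np.abs(x) <= 0.5).astype(float)         # chi_[-1/2, 1/2]
tri = np.convolve(box, box, mode="same") * dx  # Riemann sum for box * box on the same grid

def ft(values, gamma):
    # Riemann-sum approximation of (3.4) on the grid x
    return np.sum(values * np.exp(-2j * np.pi * gamma * x)) * dx

print(np.max(np.abs(tri - np.maximum(1 - np.abs(x), 0.0))))  # grid error, about dx
for gamma in (0.0, 0.25, 0.7, 1.5):
    print(gamma, abs(ft(tri, gamma) - ft(box, gamma) ** 2))    # rounding error only
print(np.sum(tri) * dx, (np.sum(box) * dx) ** 2)                # ||f * g||_1 and ||f||_1 ||g||_1
```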
Exercises

Exercise 3.22. Show that if $f(x)$ and $g(x)$ are $L^1$ on $\mathbb{R}$, then $f * g(x) = g * f(x)$.

Exercise 3.23. Show that under the hypotheses of Theorems 3.18 and 3.19, $f * g(x)$ is actually uniformly continuous on $\mathbb{R}$.

Exercise 3.24. Prove Theorem 3.20(c) and (d).

Exercise 3.25. (a) If $f(x)$ and $g(x)$ are compactly supported and $L^1$ on $\mathbb{R}$, prove that $f * g(x)$ is also. (b) If $f(x)$ and $g(x)$ are compactly supported and $L^2$ on $\mathbb{R}$, prove that $f * g(x)$ is also.
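3.5 Plancherel's Formula

Theorem 3.26 (Plancherel's Formula). If $f(x)$ is $L^1$ and $L^2$ on $\mathbb{R}$, then $\hat f(\gamma)$ is also $L^2$ on $\mathbb{R}$ and
$$\int_{\mathbb{R}} |f(x)|^2\,dx = \int_{\mathbb{R}} |\hat f(\gamma)|^2\,d\gamma. \qquad (3.23)$$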
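Proof: Define $\tilde f(x) = \overline{f(-x)}$. Then
$$\hat{\tilde f}(\gamma) = \int_{\mathbb{R}} \overline{f(-x)}\, e^{-2\pi i \gamma x}\,dx = \overline{\int_{\mathbb{R}} f(-x)\, e^{2\pi i \gamma x}\,dx} = \overline{\int_{\mathbb{R}} f(x)\, e^{-2\pi i \gamma x}\,dx} = \overline{\hat f(\gamma)},$$
where we have made the change of variable $x \mapsto -x$ in the last step. Since $f(x)$ is $L^1$ and $L^2$ on $\mathbb{R}$, so is $\tilde f(x)$. By the Convolution Theorem (Theorem 3.21),
$$\widehat{f * \tilde f}(\gamma) = \hat f(\gamma)\,\overline{\hat f(\gamma)} = |\hat f(\gamma)|^2.$$
Since $f(x)$ and $\tilde f(x)$ are both $L^1$ on $\mathbb{R}$, Theorem 3.20(a) implies that $f * \tilde f(x)$ is also $L^1$ on $\mathbb{R}$, and since $f(x)$ and $\tilde f(x)$ are both $L^2$ on $\mathbb{R}$, Theorem 3.19 implies that $f * \tilde f(x)$ is continuous on $\mathbb{R}$. Therefore, we can apply the Fourier inversion formula (3.13) and conclude that for each $x \in \mathbb{R}$,
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = f * \tilde f(x) = \int_{\mathbb{R}} f(t)\,\overline{f(t - x)}\,dt.$$
Evaluating the above equality at $x = 0$ gives
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\,d\gamma = \int_{\mathbb{R}} |f(t)|^2\,dt. \qquad (3.24)$$
It remains only to show that in fact,
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\,d\gamma = \int_{\mathbb{R}} |\hat f(\gamma)|^2\,d\gamma.$$
We will do this in two steps.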
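Step 1. We will show that $\hat f(\gamma)$ is $L^2$ on $\mathbb{R}$ by showing that if
$$\int_{\mathbb{R}} |\hat f(\gamma)|^2\,d\gamma = \infty, \qquad (3.25)$$
then
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\,d\gamma = \infty, \qquad (3.26)$$
contradicting (3.24) in light of the assumption that $f(x)$ is $L^2$. If (3.25) holds, then given any number $M > 0$, there exists a number $A > 0$ such that
$$\int_{-A}^{A} |\hat f(\gamma)|^2\,d\gamma > 2M.$$
But
$$\int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\,d\gamma \ge e^{-\pi\tau^2A^2} \int_{-A}^{A} |\hat f(\gamma)|^2\,d\gamma > M$$
whenever $\tau > 0$ is small enough (specifically, if $0 < \tau < \sqrt{\ln 2/(\pi A^2)}$, so that $e^{-\pi\tau^2A^2} > 1/2$), and this is exactly the meaning of (3.26). Therefore, $\hat f(\gamma)$ is $L^2$ on $\mathbb{R}$.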
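Step 2. Since $\hat f(\gamma)$ is $L^2$ on $\mathbb{R}$, $|\hat f(\gamma)|^2$ is $L^1$ on $\mathbb{R}$. We leave it as an exercise (Exercise 3.29) to prove that
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} |\hat f(\gamma)|^2\, e^{-\pi\tau^2\gamma^2}\,d\gamma = \int_{\mathbb{R}} |\hat f(\gamma)|^2\,d\gamma.$$
From this, (3.23) follows.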
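For $f(x) = e^{-|x|}$ both sides of (3.23) can be computed: $\int_{\mathbb{R}} e^{-2|x|}\,dx = 1$ by hand, and $\hat f(\gamma) = 2/(1 + (2\pi\gamma)^2)$ by Exercise 3.6. The sketch below approximates $\int |\hat f|^2$ and the regularized integrals of the proof by Riemann sums on $[-200, 200]$, a truncation assumed to be harmless because $|\hat f(\gamma)|^2$ decays like $\gamma^{-4}$.

```python
import numpy as np

g = np.linspace(-200.0, 200.0, 400001)
dg = g[1] - g[0]
# |\hat f(gamma)|^2 for f(x) = e^{-|x|}, using Exercise 3.6 with a = 1
f_hat_sq = (2.0 / (1.0 + (2 * np.pi * g) ** 2)) ** 2

energy_time = 1.0                        # \int_R e^{-2|x|} dx = 1, computed by hand
energy_freq = np.sum(f_hat_sq) * dg      # Riemann sum for \int_R |\hat f(gamma)|^2 dgamma

# The regularized integrals from the proof increase toward the full integral as tau -> 0+
regularized = [np.sum(f_hat_sq * np.exp(-np.pi * tau**2 * g**2)) * dg for tau in (0.5, 0.1, 0.01)]
print(energy_time, energy_freq, regularized)
```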
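A related result is the following formula.

Theorem 3.27 (Parseval's Formula). If $f(x)$ and $g(x)$ are both $L^1$ and $L^2$ on $\mathbb{R}$, then
$$\int_{\mathbb{R}} f(x)\,\overline{g(x)}\,dx = \int_{\mathbb{R}} \hat f(\gamma)\,\overline{\hat g(\gamma)}\,d\gamma.$$
Proof: Exercise 3.30.
One easy consequence of Theorem 3.26 is to simplify the statement of the Fourier inversion formula (3.16). Specifically, we no longer need to state explicitly the hypothesis that $\hat f(\gamma)$ is $L^2$ on $\mathbb{R}$, since by Theorem 3.26 this is automatic given the assumption that $f(x)$ is $L^1$ and $L^2$ on $\mathbb{R}$.

Theorem 3.28 (Theorem 3.13). If $f(x)$ is $L^1$ and $L^2$ on $\mathbb{R}$, then
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = f(x)$$
in $L^2$ on $\mathbb{R}$.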
3.6. The Fourier Transform for $L^2$ Functions
Exercises

Exercise 3.29. Complete the proof of Theorem 3.26. (Hint: Use Theorem 1.41 and Corollary 3.12.)

Exercise 3.30. Prove Parseval's Formula (Theorem 3.27). (Hint: Consider the function $\tilde g(x) = \overline{g(-x)}$, and repeat the argument in the proof of Theorem 3.26 with appropriate modifications.)
Exercise 3.31. Prove that
$$\int_{\mathbb{R}} \frac{\sin(t)}{t}\,dt = \int_{\mathbb{R}} \frac{\sin^2(t)}{t^2}\,dt = \pi,$$
where the first integral is interpreted as
$$\int_{\mathbb{R}} \frac{\sin(t)}{t}\,dt = 2\lim_{A\to\infty} \int_0^{A} \frac{\sin(t)}{t}\,dt,$$
since $\sin(t)/t$ is not an $L^1$ function. (Hint: Prove the first equality by integrating the second integral by parts, and prove the second equality using Plancherel's Formula and Example 3.3(a). See Benedetto, Harmonic Analysis and Applications, p. 25.)
3.6 The Fourier Transform for $L^2$ Functions

Until now, we have been making the assumption that a function $f(x)$ must be $L^1$ on $\mathbb{R}$ in order for its Fourier transform to be defined. This assumption was made in order to guarantee that the integral in (3.4) converges absolutely for each $\gamma$. However, we have seen examples that suggest that we need to expand the definition to a larger class of functions. Specifically, if $f(x) = \chi_{[-a,a]}(x)$, then $f(x)$ is $L^1$ on $\mathbb{R}$, but $\hat f(\gamma) = \sin(2\pi a\gamma)/(\pi\gamma)$ is not, and in order for equality to hold in both parts of (3.3), we would like to be able to make the statement that if $g(x) = \sin(2\pi a x)/(\pi x)$, then $\hat g(\gamma) = \chi_{[-a,a]}(\gamma)$, that is, that
$$\int_{\mathbb{R}} \frac{\sin(2\pi a x)}{\pi x}\, e^{-2\pi i \gamma x}\,dx = \chi_{[-a,a]}(\gamma). \qquad (3.29)$$
The question is: How do we interpret the integral in (3.29), since it does not converge absolutely?
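We have seen the answer already in Theorem 3.28, which asserts in this case that
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \frac{\sin(2\pi a\gamma)}{\pi\gamma}\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = \chi_{[-a,a]}(x)$$
in $L^2$ on $\mathbb{R}$. That is, we interpret the nonconvergent integral (3.29) as a limit (in the $L^2$ sense) of convergent integrals. The remaining question is: Can we do this with any $L^2$ function? The answer is "Yes," but the proof of this assertion is beyond the scope of this book and involves knowledge of the theory of Lebesgue measure and the Lebesgue integral. We state the relevant theorem for completeness.

Theorem 3.32. Given any function $f(x)$, $L^2$ on $\mathbb{R}$, there exists a function $\hat f(\gamma)$, $L^2$ on $\mathbb{R}$ (in the sense of Lebesgue), such that
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} f(x)\, e^{-\pi\tau^2x^2}\, e^{-2\pi i \gamma x}\,dx = \hat f(\gamma) \qquad (3.31)$$
in $L^2$ on $\mathbb{R}$. In this case, Plancherel's formula holds; that is,
$$\int_{\mathbb{R}} |\hat f(\gamma)|^2\,d\gamma = \int_{\mathbb{R}} |f(x)|^2\,dx,$$
and the Fourier inversion holds in the sense of Theorem 3.28; that is,
$$\lim_{\tau\to0^+} \int_{\mathbb{R}} \hat f(\gamma)\, e^{-\pi\tau^2\gamma^2}\, e^{2\pi i \gamma x}\,d\gamma = f(x)$$
in $L^2$ on $\mathbb{R}$.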
3.7 Smoothness versus Decay

One of the basic principles of Fourier transform theory can be loosely stated as follows: The smoother $f(x)$ is, the more rapidly $\hat f(\gamma)$ will decay at infinity, and conversely, the more rapidly $f(x)$ decays at infinity, the smoother $\hat f(\gamma)$ will be. There are many ways to measure the smoothness of a given function $f(x)$, but for the purposes of this book, we will measure smoothness of $f(x)$ by counting the number of continuous derivatives it has. We have already seen an illustration of this principle in Theorem 3.8, which asserts that if $f(x)$ is $L^1$ on $\mathbb{R}$ (a statement about its decay at infinity), then $\hat f(\gamma)$ is uniformly continuous on $\mathbb{R}$ (a statement about its smoothness). In light of the Fourier inversion formula (Corollary 3.12), we can assert that if an $L^1$ function $f(x)$ has an $L^1$ Fourier transform (decay of $\hat f(\gamma)$ at infinity), then $f(x)$ is also uniformly continuous on $\mathbb{R}$ (smoothness of $f(x)$). A more precise statement of this duality starts with the following theorem.

Theorem 3.33 (Differentiation Theorem). If $f(x)$ and $x f(x)$ are $L^1$ on $\mathbb{R}$, then $\hat f(\gamma)$ is continuously differentiable on $\mathbb{R}$, and
$$\frac{d}{d\gamma}\hat f(\gamma) = \int_{\mathbb{R}} (-2\pi i x)\, f(x)\, e^{-2\pi i \gamma x}\,dx. \qquad (3.34)$$
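Proof: We wish to show that for each $\gamma$,
$$\lim_{h\to0} \frac{\hat f(\gamma + h) - \hat f(\gamma)}{h} = \int_{\mathbb{R}} (-2\pi i x)\, f(x)\, e^{-2\pi i \gamma x}\,dx.$$
First, form the difference quotient for $\hat f(\gamma)$ and calculate:
$$\frac{\hat f(\gamma + h) - \hat f(\gamma)}{h} = \int_{\mathbb{R}} f(x)\, e^{-2\pi i \gamma x}\, \frac{e^{-2\pi i h x} - 1}{h}\,dx.$$
Since
$$\lim_{h\to0} \frac{e^{-2\pi i h x} - 1}{h} = -2\pi i x,$$
the proof reduces to justifying the interchange of a limit and an integral. Specifically, we must prove that
$$\lim_{h\to0} \int_{\mathbb{R}} f(x)\, e^{-2\pi i \gamma x}\, \frac{e^{-2\pi i h x} - 1}{h}\,dx = \int_{\mathbb{R}} f(x)\, e^{-2\pi i \gamma x} \lim_{h\to0} \frac{e^{-2\pi i h x} - 1}{h}\,dx.$$
We will make two estimates on the quantity $(1/h)(e^{-2\pi i h x} - 1)$. First, we expand the function $g(h) = e^{-2\pi i h x}$ about $h = 0$ in a Taylor series and use Taylor's formula (keeping only one term in the expansion) to obtain the estimate
$$\left|\frac{e^{-2\pi i h x} - 1}{h}\right| \le \max_{s \text{ between } 0 \text{ and } h} \left|\frac{d}{ds} e^{-2\pi i s x}\right| = 2\pi |x|. \qquad (3.36)$$
Taking now two terms in the expansion, we obtain the estimate
$$\left|\frac{e^{-2\pi i h x} - 1}{h} + 2\pi i x\right| \le \frac{|h|}{2} \max_{s \text{ between } 0 \text{ and } h} \left|\frac{d^2}{ds^2} e^{-2\pi i s x}\right| = 2\pi^2 |h|\, x^2. \qquad (3.37)$$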
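Using (3.36), we estimate
$$\left|f(x)\, e^{-2\pi i \gamma x}\, \frac{e^{-2\pi i h x} - 1}{h}\right| \le 2\pi\, |x|\, |f(x)|.$$
By hypothesis, $2\pi |x|\, |f(x)|$ is $L^1$ on $\mathbb{R}$. Using (3.37), we note that for any $R > 0$,
$$\int_{-R}^{R} |f(x)| \left|\frac{e^{-2\pi i h x} - 1}{h} + 2\pi i x\right| dx \le 2\pi^2 |h|\, R^2 \int_{-R}^{R} |f(x)|\,dx \to 0$$
as $h \to 0$. Therefore, by Theorem 1.41, the interchange of limit and integral is justified and (3.34) follows. Moreover, since $(-2\pi i x) f(x)$ is $L^1$ on $\mathbb{R}$, Theorem 3.8 shows that the right-hand side of (3.34) is continuous, so $\hat f(\gamma)$ is continuously differentiable. The following corollary to Theorem 3.33 can be proved by induction (Exercise 3.37).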
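Formula (3.34) can be checked for $f(x) = e^{-\pi x^2}$, whose transform is $\hat f(\gamma) = e^{-\pi\gamma^2}$ (Exercise 3.7 with $a = \pi$), so that $\hat f'(\gamma) = -2\pi\gamma\, e^{-\pi\gamma^2}$. The sketch approximates the right side of (3.34) by a Riemann sum on $[-12, 12]$, a truncation assumed to be harmless because the integrand is negligible outside it, and also compares with a centered difference quotient of the computed transform (step `h` is an arbitrary choice).

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)                 # f(x) and x f(x) are both L^1 on R

def ft(values, gamma):
    # Riemann-sum approximation of \int_R values(x) e^{-2 pi i gamma x} dx
    return np.sum(values * np.exp(-2j * np.pi * gamma * x)) * dx

h = 1e-4
for gamma in (-0.8, 0.0, 0.35, 1.2):
    via_theorem = ft(-2j * np.pi * x * f, gamma)               # right-hand side of (3.34)
    difference_quotient = (ft(f, gamma + h) - ft(f, gamma - h)) / (2 * h)
    exact = -2 * np.pi * gamma * np.exp(-np.pi * gamma**2)     # derivative of e^{-pi gamma^2}
    print(gamma, via_theorem.real, difference_quotient.real, exact)
```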
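Corollary 3.34. If $f(x)$ and $x^N f(x)$ are $L^1$ on $\mathbb{R}$ for some $N \in \mathbb{N}$, then $\hat f(\gamma)$ is $C^N$ on $\mathbb{R}$, and for $0 \le j \le N$,
$$\hat f^{(j)}(\gamma) = \int_{\mathbb{R}} (-2\pi i x)^j\, f(x)\, e^{-2\pi i \gamma x}\,dx.$$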
We can state a partial converse of Theorem 3.33 relating smoothness of the Fourier transform of a function to the decay at infinity of the function itself.

Theorem 3.35. Suppose that $f(x)$ is $L^1$ on $\mathbb{R}$, and that $\hat f(\gamma)$ satisfies the following hypotheses.
(a) For some $N \in \mathbb{N}$, $\hat f(\gamma)$ is $C^N$ on $\mathbb{R}$.
(b) Both $\hat f(\gamma)$ and $\hat f^{(N)}(\gamma)$ are $L^1$ on $\mathbb{R}$.
(c) For $0 \le j \le N$, $\lim_{|\gamma|\to\infty} \hat f^{(j)}(\gamma) = 0$.
Then
$$\lim_{|x|\to\infty} x^N f(x) = 0. \qquad (3.39)$$
Proof: Consider the function $F(x)$ defined by the integral
$$F(x) = \int_{\mathbb{R}} \hat f^{(N)}(\gamma)\, e^{2\pi i \gamma x}\,d\gamma.$$
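Integrating by parts $N$ times and using (a) and (c), we obtain
$$F(x) = (-1)^N\int_{\mathbb{R}}\hat{f}(\gamma)\,\frac{d^N}{d\gamma^N}e^{2\pi i x\gamma}\,d\gamma = (-2\pi i x)^N\int_{\mathbb{R}}\hat{f}(\gamma)\, e^{2\pi i x\gamma}\,d\gamma.$$
Using (b) and the Fourier inversion formula,
$$\int_{\mathbb{R}}\hat{f}(\gamma)\, e^{2\pi i x\gamma}\,d\gamma = f(x).$$
Hence $F(x) = (-2\pi i x)^N f(x)$. By (b) and the Riemann-Lebesgue Lemma (Theorem 3.9), $\lim_{|x|\to\infty} F(x) = 0$, and (3.39) follows.

Finally, we present a theorem relating decay at infinity of the Fourier transform of a function to smoothness of the function.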
Theorem 3.36. If $f(x)$ and $\hat{f}(\gamma)$ are $L^1$ on $\mathbb{R}$ and if $\gamma^N\hat{f}(\gamma)$ is $L^1$ on $\mathbb{R}$ for some $N \in \mathbb{N}$, then $f(x)$ is $C^N$ on $\mathbb{R}$, and for $0 \le j$ … $> 0$ such that $\hat{f}(\gamma)$ is supported in the interval $[-\Omega/2, \Omega/2]$. In this case, the function $f(x)$ is said to have bandlimit $\Omega > 0$. The function $f(x)$ has bandwidth $B > 0$ if there is an interval $I$ such that $|I| = B$ and such that $\hat{f}(\gamma)$ is supported in $I$.
Remark 3.48. (a) A function $f(x)$ with bandlimit $\Omega > 0$ also has bandwidth $\Omega > 0$. However, in general the numbers are not the same. For example, let $f(x)$ be the function whose Fourier transform $\hat{f}(\gamma)$ equals $\chi_{[0,1)}(\gamma)$. (What is $f(x)$ in this case?) Then the bandlimit of $f(x)$ is 2, whereas the bandwidth is 1.
(b) The bandlimit and bandwidth of a function $f(x)$ are not unique numbers. For example, if $f(x)$ has bandlimit $\Omega > 0$, then $f(x)$ also has bandlimit $\Omega' > 0$ for any number $\Omega' > \Omega$. Similarly, if $f(x)$ has bandwidth $B > 0$, then $f(x)$ also has bandwidth $B' > 0$ for any number $B' > B$.

(c) Intuitively, if $f(x)$ is bandlimited, then $f(x)$ does not contain arbitrarily high-frequency components. The Fourier inversion formula for a function with bandlimit $\Omega$ looks like
$$f(x) = \int_{-\Omega/2}^{\Omega/2}\hat{f}(\gamma)\, e^{2\pi i x\gamma}\,d\gamma.$$
That is, $f(x)$ consists only of "frequencies" $e^{2\pi i\gamma x}$ of period $2/\Omega$ or greater. Thus, one might expect that a bandlimited function would be slowly oscillating and not have any sharp jumps or discontinuities. In fact, the following theorem holds.
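Theorem 3.49. Let $f(x)$ be a bandlimited function with bandlimit $\Omega$. Then:
(a) The Fourier inversion formula holds for $f(x)$; that is, for each $x \in \mathbb{R}$,
$$f(x) = \int_{-\Omega/2}^{\Omega/2}\hat{f}(\gamma)\, e^{2\pi i x\gamma}\,d\gamma. \qquad (3.44)$$
(b) $f(x)$ is $C^\infty$ on $\mathbb{R}$.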
Proof: We will prove (b) first, given the assumption that (a) holds. We would like to use Theorem 3.36, since $f(x)$ being bandlimited implies that $\hat{f}(\gamma)$ is $L^1$ and that $\gamma^N\hat{f}(\gamma)$ is $L^1$ for every $N \in \mathbb{N}$. However, since we have only assumed that $f(x)$ is $L^2$ on $\mathbb{R}$ and not necessarily $L^1$, we cannot use the theorem directly. If we examine the proof of Theorem 3.36, though, we see that all that is required is that the Fourier inversion formula hold; then the argument in the proof of Theorem 3.33 may be applied. The required inversion formula is exactly (3.44).

In proving (a), we again run into the difficulty that $f(x)$ has been assumed only to be $L^2$ and not $L^1$ on $\mathbb{R}$. This is certainly not an insurmountable obstacle, but it does require some rather subtle argumentation. According to Theorem 3.32, the Fourier inversion formula holds in the $L^2$ sense for $f(x)$; that is, the approximating integrals of $\hat{f}(\gamma)$ given there converge to $f(x)$ in $L^2$ on $\mathbb{R}$ as $r \to 0^+$. By Plancherel's formula for the $L^2$ Fourier transform (3.32), we know that $\hat{f}(\gamma)$ is $L^2$ on $\mathbb{R}$. Since $\hat{f}(\gamma)$ is also compactly supported, Theorem 1.9 says that $\hat{f}(\gamma)$ is also $L^1$ on $\mathbb{R}$. Therefore, we can prove (see Exercise 3.51) that in fact the same approximating integrals also converge in $L^\infty$ on $\mathbb{R}$, that is, uniformly; this is the statement (3.46). Let us call this uniform limit function $g(x)$; that is,
$$g(x) = \int_{-\Omega/2}^{\Omega/2}\hat{f}(\gamma)\, e^{2\pi i x\gamma}\,d\gamma.$$
Thus, we have an $L^2$ limit function $f(x)$ and an $L^\infty$ limit function $g(x)$ for the same sequence of functions. So we must show that in fact, these limit functions are the same.
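In order to do this, let $f_r(x)$ denote the approximating integrals above, so that $f_r \to f$ in $L^2$ and $f_r \to g$ in $L^\infty$ on $\mathbb{R}$ as $r \to 0^+$, and fix a number $A > 0$. Then by Minkowski's inequality,
$$\left(\int_{-A}^{A}|f(x)-g(x)|^2\,dx\right)^{1/2} \le \left(\int_{-A}^{A}|f(x)-f_r(x)|^2\,dx\right)^{1/2} + \left(\int_{-A}^{A}|f_r(x)-g(x)|^2\,dx\right)^{1/2} \le \|f-f_r\|_2 + \sqrt{2A}\,\|f_r-g\|_\infty.$$
The left side of the inequality is independent of $r$, and the right side can be made as small as desired by choosing $r > 0$ small enough. Therefore, for every $A > 0$,
$$\int_{-A}^{A}|f(x)-g(x)|^2\,dx = 0,$$
which implies that
$$\int_{\mathbb{R}}|f(x)-g(x)|^2\,dx = 0.$$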
Since $f(x)$ is piecewise continuous by assumption, and $g(x)$ is in fact continuous by the argument used in the proof of Theorem 3.8, $f(x) = g(x)$, except possibly at the discontinuities of $f(x)$. Since there is no problem redefining $f(x)$ at these points, we can conclude that $f(x) = g(x)$ for every $x \in \mathbb{R}$. But this is (3.44).

One of the fundamental results in Fourier analysis is the Shannon sampling theorem. The theorem asserts that a bandlimited function can be recovered from its samples on a regularly spaced set of points in $\mathbb{R}$, provided that the distance between adjacent points in the set is small enough. The formula is also very important in digital signal processing applications.

Theorem 3.50. (The Shannon Sampling Theorem)¹ If $f(x)$ is bandlimited with bandlimit $\Omega$, then $f(x)$ can be written as
$$f(x) = \sum_{n\in\mathbb{Z}} f(n/\Omega)\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)}, \qquad (3.47)$$
where the sum converges in $L^2$ and $L^\infty$ on $\mathbb{R}$.

¹This theorem has a long and interesting history that is recounted beautifully in the article by Higgins, "Five short stories about the cardinal series," Bulletin of the AMS, vol. 12 (1985), 45-89.
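Proof: Since $\hat{f}(\gamma)$ is supported in the interval $[-\Omega/2, \Omega/2]$, we can expand $\hat{f}(\gamma)$ in a Fourier series and obtain
$$\hat{f}(\gamma) = \sum_{n\in\mathbb{Z}} c(n)\, e^{2\pi i n\gamma/\Omega}$$
for $\gamma \in [-\Omega/2, \Omega/2]$, where
$$c(n) = \frac{1}{\Omega}\int_{-\Omega/2}^{\Omega/2}\hat{f}(\gamma)\, e^{-2\pi i n\gamma/\Omega}\,d\gamma.$$
But by (3.44), it follows that
$$c(n) = \frac{1}{\Omega}\, f(-n/\Omega).$$
Making the change of summation index $n \to -n$ leads to
$$\hat{f}(\gamma) = \frac{1}{\Omega}\sum_{n\in\mathbb{Z}} f(n/\Omega)\, e^{-2\pi i n\gamma/\Omega}.$$
Again applying (3.44), we obtain
$$f(x) = \int_{-\Omega/2}^{\Omega/2}\frac{1}{\Omega}\sum_{n\in\mathbb{Z}} f(n/\Omega)\, e^{-2\pi i n\gamma/\Omega}\, e^{2\pi i x\gamma}\,d\gamma = \sum_{n\in\mathbb{Z}} f(n/\Omega)\,\frac{1}{\Omega}\int_{-\Omega/2}^{\Omega/2} e^{2\pi i\gamma(x - n/\Omega)}\,d\gamma = \sum_{n\in\mathbb{Z}} f(n/\Omega)\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)},$$
where we have used the fact that Fourier series can be integrated term-by-term (Exercise 2.67) and that for any numbers $a > 0$ and $b \ne 0$,
$$\int_{-a/2}^{a/2} e^{2\pi i b\gamma}\,d\gamma = \frac{\sin(\pi a b)}{\pi b}$$
(Exercise 3.4). To see that the convergence of (3.47) is uniform on $\mathbb{R}$, let $N, M \in \mathbb{Z}$ be fixed.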
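Then, for every $x \in \mathbb{R}$,
$$\left|f(x) - \sum_{n=-N}^{M} f(n/\Omega)\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)}\right| = \left|\int_{-\Omega/2}^{\Omega/2}\Big(\hat{f}(\gamma) - \sum_{n=-M}^{N} c(n)\, e^{2\pi i n\gamma/\Omega}\Big)e^{2\pi i x\gamma}\,d\gamma\right| \le \Omega^{1/2}\left(\int_{-\Omega/2}^{\Omega/2}\Big|\hat{f}(\gamma) - \sum_{n=-M}^{N} c(n)\, e^{2\pi i n\gamma/\Omega}\Big|^2\,d\gamma\right)^{1/2},$$
where we have used the Cauchy-Schwarz inequality and where the $c(n)$ are the Fourier coefficients of $\hat{f}(\gamma)$. But since the Fourier series of $\hat{f}(\gamma)$ converges to $\hat{f}(\gamma)$ in $L^2$ on $[-\Omega/2, \Omega/2]$, the right side tends to zero as $N, M \to \infty$, independently of $x$; that is,
$$\lim_{N,M\to\infty}\ \sup_{x\in\mathbb{R}}\left|f(x) - \sum_{n=-N}^{M} f(n/\Omega)\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)}\right| = 0.$$
The $L^2$ convergence of (3.47) follows from the fact that the collection
$$\left\{\Omega^{1/2}\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)}\right\}_{n\in\mathbb{Z}}$$
is an orthonormal system on $\mathbb{R}$ (Exercise 3.52).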
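The sampling formula (3.47) is easy to try numerically. The sketch below assumes the test function $f(x) = \big(\sin(\pi x)/(\pi x)\big)^2$, whose Fourier transform is the triangle $\max(0, 1 - |\gamma|)$, so that $f$ has bandlimit $\Omega = 2$; it truncates the series to $|n| \le 2000$ and compares with $f$ at a few points. Function names and the truncation length are illustrative.

```python
import math


def sinc(t):
    """sin(pi t) / (pi t), with the removable singularity at t = 0 filled in."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def f(x):
    # Fourier transform is max(0, 1 - |gamma|), supported in [-1, 1] = [-Omega/2, Omega/2].
    return sinc(x) ** 2


def sampling_series(samples, omega, x, n_max):
    """Partial sum of (3.47) over |n| <= n_max, using the samples f(n / omega)."""
    return sum(samples(n / omega) * sinc(omega * x - n) for n in range(-n_max, n_max + 1))


omega = 2.0
points = (0.3, 1.25, -2.7)
errors = [abs(f(x) - sampling_series(f, omega, x, n_max=2000)) for x in points]
print(errors)
```

Sampling more slowly than $\Omega$ samples per unit length (for example, replacing `omega` by `1.5` while keeping the same `f`) breaks the hypothesis of Theorem 3.50, and the reconstruction error no longer becomes small.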
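Exercises

Exercise 3.51. Prove equation (3.46).

Exercise 3.52. Prove that the collection
$$\left\{\Omega^{1/2}\,\frac{\sin(\pi(\Omega x - n))}{\pi(\Omega x - n)}\right\}_{n\in\mathbb{Z}}$$
is an orthonormal system on $\mathbb{R}$. (Hint: Use Parseval's formula.)

Exercise 3.53. Show that if $f(x)$ has bandlimit $\Omega > 0$, then for every $x \in \mathbb{R}$, …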
Exercise 3.54. Show that if $f(x)$ has bandwidth $B > 0$, then $f(x)$ can be completely recovered from the samples $\{f(n/B)\}_{n\in\mathbb{Z}}$.

Exercise 3.55. Let $f(x)$ be $L^2$ on $\mathbb{R}$, and let $\epsilon > 0$ be given. Prove that there exists a bandlimited function $g(x)$ such that $\hat{g}(\gamma)$ is $C^0$ on $\mathbb{R}$ and $\|f - g\|_2 < \epsilon$. (Hint: Use Corollary 2.37(b) in the Fourier transform domain.)
Chapter 4

Signals and Systems

In the previous chapter, we considered piecewise continuous functions with period 1 and showed that it is possible to represent such functions as an infinite superposition of exponentials $e_n(t) = e^{2\pi i n t}$, $n \in \mathbb{Z}$. Each such exponential has period $1/n$ and hence completes $n$ cycles per unit length (which we can interpret as measuring time). If the exponentials are interpreted as "pure tones" of $n$ cycles per second, then each $f(t)$ has a "frequency representation" of the form
$$f(t) = \sum_{n\in\mathbb{Z}}\hat{f}(n)\, e^{2\pi i n t}, \qquad (4.1)$$
where
$$\hat{f}(n) = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt. \qquad (4.2)$$
We also know that
$$\sum_{n\in\mathbb{Z}}|\hat{f}(n)|^2 = \int_0^1 |f(t)|^2\,dt.$$
Conceptually, there is nothing stopping us from changing our perspective and regarding the sequence $\{\hat{f}(n)\}_{n\in\mathbb{Z}}$ as the object to be given a "frequency representation." In this case, (4.2) is such a representation, in which we consider $\hat{f}(n)$ to be a continuous superposition of "pure tones" on $\mathbb{Z}$, $e_t(n) = e^{-2\pi i n t}$, which complete about one cycle every $1/t$ time steps (indexed by $n$). Equation (4.1) now gives a formula for the coefficients in this continuous frequency representation.

This new perspective is very well suited for digital signal processing (DSP) applications, in which data are necessarily in the form of arrays of numbers. These arrays are of course always finite but can be of arbitrary length. Hence, it is convenient to regard these objects (to which we will give the name signals) as being infinite sequences. A related perspective regards signals of length $N$ as periodic sequences with period $N$. This chapter provides a discussion of some of the terminology and basic results of the mathematical theory of DSP from both perspectives.
4.1 Signals

Definition 4.1. A signal is a sequence of numbers $\{x(n)\}_{n\in\mathbb{Z}}$ satisfying
$$\sum_{n\in\mathbb{Z}}|x(n)| < \infty.$$

Remark 4.2. By basic results on convergent series, any signal must be bounded; that is, there is a number $M > 0$ such that $|x(n)| < M$ for all $n \in \mathbb{Z}$. It is also true that any signal satisfies $\sum_n |x(n)|^2 < \infty$. Such sequences are said to be $\ell^2$ sequences or sometimes to have finite energy (Exercise 4.6).

Definition 4.3. The frequency domain representation of a signal $x(n)$ is the function
$$\hat{x}(\omega) = \sum_{n\in\mathbb{Z}} x(n)\, e^{-2\pi i n\omega}.$$
Remark 4.4. (a) Since $\sum_n |x(n)| < \infty$, the sum defining $\hat{x}(\omega)$ converges uniformly to a continuous function with period 1.

(b) Recall that the set $\{e^{2\pi i\omega}\}_{\omega\in[0,1)}$ is the unit circle in the complex plane. This is because if $z = e^{2\pi i\omega}$, then $|z| = 1$. Hence the function $\hat{x}(\omega) = X(e^{2\pi i\omega})$ can be thought of as the restriction to the unit circle of some function $X(z)$ defined on some portion of the complex plane containing the unit circle. Specifically, we can define
$$X(z) = \sum_{n\in\mathbb{Z}} x(n)\, z^{-n}$$
wherever the sum makes sense. The function $X(z)$ is referred to as the z-transform of $x(n)$.
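For a signal with finitely many nonzero entries, both $\hat{x}(\omega)$ and the z-transform $X(z)$ are finite sums, so the relation $\hat{x}(\omega) = X(e^{2\pi i\omega})$ and the period-1 behavior in Remark 4.4 can be checked directly. Storing the signal as a dictionary from $n$ to $x(n)$ is an assumption of this sketch, as are the specific values.

```python
import cmath
import math


def frequency_representation(x, omega):
    """x_hat(omega): sum of x(n) exp(-2 pi i n omega) over the (finitely many) nonzero entries."""
    return sum(value * cmath.exp(-2j * math.pi * n * omega) for n, value in x.items())


def z_transform(x, z):
    """X(z): sum of x(n) z^(-n); for a finitely supported signal this makes sense for every z != 0."""
    return sum(value * z ** (-n) for n, value in x.items())


# Hypothetical signal with nonzero entries x(-1) = 0.5, x(0) = 1, x(2) = -0.25.
x = {-1: 0.5, 0: 1.0, 2: -0.25}
omega = 0.3
on_circle = z_transform(x, cmath.exp(2j * math.pi * omega))
print(frequency_representation(x, omega), on_circle)
```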
Example 4.5. (a) Let $x(n) = 1$ if $0$ … $0$, take $\delta = \epsilon/C$.)

Exercise 4.28. (a) Prove Lemma 4.11 by showing that for each $n \in \mathbb{Z}$,
$$\lim_{N\to\infty}\sum_{k=-N}^{N} x(k)\,\delta(n-k) = x(n).$$
(b) Prove that in fact (4.4) holds in the following strong sense:
$$\lim_{N\to\infty}\sum_{n\in\mathbb{Z}}\left|x(n) - \sum_{k=-N}^{N} x(k)\,\delta(n-k)\right| = 0.$$
In the notation of Remark 4.9(b), we are being asked to prove that $\lim_{N\to\infty}\|x - x_N\|_{\ell^1} = 0$, where
$$x_N(n) = \sum_{k=-N}^{N} x(k)\,\delta(n-k).$$
Exercise 4.29. (a) We say that $x(n)$ is a finite signal if there exist numbers $N, M \in \mathbb{Z}$ such that $x(n) = 0$ if $n < M$ or $n > N$. In other words, $x(n)$ is a finite signal if it has only finitely many nonzero entries. Prove that, given a signal $x(n)$, there is a sequence of finite signals $\{x_k(n)\}_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty}\|x - x_k\|_{\ell^1} = 0$.

(b) Note that all calculations in the proof of Theorem 4.12 are legitimate for finite signals $x(n)$. Show that the stability of $T$ implies that in fact, $Tx(n) = (x * h)(n)$ for all signals $x(n)$ (finite or not).
Exercise 4.30. Prove that if $h(n)$ is a real-valued signal, then $\hat{h}(-\omega) = \overline{\hat{h}(\omega)}$.

Exercise 4.31. Show that if $H(z) = z^{-n_0}$ for some $n_0 > 0$, then the corresponding system $T$ is given by $Tx(n) = x(n - n_0)$.
4.3 Periodic Signals and the Discrete Fourier Transform

A different model for thinking mathematically about finite signals is to consider the finite signal to be infinite in length but periodic. In other words, given a finite data set of length $N$, $\{x(0), x(1), \ldots, x(N-1)\}$, define a corresponding infinite sequence $\tilde{x}(n)$, $n \in \mathbb{Z}$, by $\tilde{x}(n) = x(n \bmod N)$. In this case, $\tilde{x}(n) = x(n)$ whenever $0 \le n < N$, so that $\tilde{x}(n)$ is considered an extension of $x(n)$.
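The periodic extension $\tilde{x}(n) = x(n \bmod N)$ is a one-liner in most languages; the only subtlety is that $n \bmod N$ must lie in $\{0, \ldots, N-1\}$ even for negative $n$. A minimal sketch, with illustrative data values:

```python
def periodic_extension(data):
    """Return the function n -> data[n mod N] for a finite data set of length N."""
    N = len(data)
    # Python's % already returns a value in range(N) for negative n,
    # matching the convention for n mod N used above.
    return lambda n: data[n % N]


x = [3, 1, 4, 1, 5]  # a length-5 signal (values chosen for illustration)
x_tilde = periodic_extension(x)
print([x_tilde(n) for n in range(-5, 10)])
```

In languages whose remainder operator can return negative values (C, Java, Go), one would instead compute `((n % N) + N) % N`.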
FIGURE 4.4. Left: A signal $x(n)$ of length 5. Right: its period-5 extension $\tilde{x}(n)$.
Definition 4.32. Given $N \in \mathbb{N}$, a sequence $\{x(n)\}_{n\in\mathbb{Z}}$ is a period $N$ signal if $x(n+N) = x(n)$ for all $n \in \mathbb{Z}$. In this case $x(n)$ is said to be periodic.

Remark 4.33. (a) It is clear that a period $N$ signal, unless it is identically zero, can never be a signal in the sense of Definition 4.1, since the absolute values of its entries will always sum to infinity. However, a periodic signal is bounded in the sense that there is a number $M$ such that $|x(n)| \le M$ for all $n \in \mathbb{Z}$.
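… $j'$, and let $k, k' \in \mathbb{Z}$. Then by Lemma 5.2, there are three possibilities.

(1) $I_{j',k'} \cap I_{j,k} = \emptyset$. In this case, $h_{j,k}(x)\,h_{j',k'}(x) = 0$ for all $x$ and $\langle h_{j,k}, h_{j',k'}\rangle = 0$.

(2) $I_{j,k} \subseteq I^l_{j',k'}$. In this case, $h_{j',k'}(x)$ is identically $2^{j'/2}$ on $I^l_{j',k'}$. Since $I_{j,k} \subseteq I^l_{j',k'}$, it is also identically $2^{j'/2}$ on $I_{j,k}$. Since $h_{j,k}(x)$ is supported on $I_{j,k}$,
$$\langle h_{j,k}, h_{j',k'}\rangle = 2^{j'/2}\int_{I_{j,k}} h_{j,k}(x)\,dx = 0.$$

(3) $I_{j,k} \subseteq I^r_{j',k'}$. In this case, $h_{j',k'}(x)$ is identically $-2^{j'/2}$ on $I^r_{j',k'}$, and hence on $I_{j,k}$. Thus,
$$\langle h_{j,k}, h_{j',k'}\rangle = -2^{j'/2}\int_{I_{j,k}} h_{j,k}(x)\,dx = 0.$$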
Theorem 5.14. Given any $j \in \mathbb{Z}$, the collection of scale $j$ Haar scaling functions is an orthonormal system on $\mathbb{R}$.
Proof: Exercise 5.19. Although it is by no means true that the collection of all Haar scaling functions is an orthonormal system on $\mathbb{R}$, the following theorem holds.
Theorem 5.15. Given $J \in \mathbb{Z}$, the following hold.
(a) The collection $\{p_{J,k}(x),\ h_{j,k}(x) : j \ge J,\ k \in \mathbb{Z}\}$ is an orthonormal system on $\mathbb{R}$.
(b) The collection $\{p_{J,k}(x),\ h_{J,k}(x) : k \in \mathbb{Z}\}$ is an orthonormal system on $\mathbb{R}$.
Proof: Exercise 5.20.
5.2.3 The Splitting Lemma
Lemma 5.16. (The Splitting Lemma) Let $j \in \mathbb{Z}$, and let $g_j(x)$ be a scale $j$ dyadic step function. Then $g_j(x)$ can be written as $g_j(x) = r_{j-1}(x) + g_{j-1}(x)$, where $r_{j-1}(x)$ has the form
$$r_{j-1}(x) = \sum_{k\in\mathbb{Z}} a_{j-1}(k)\, h_{j-1,k}(x) \qquad (5.4)$$
for some coefficients $\{a_{j-1}(k)\}_{k\in\mathbb{Z}}$, and $g_{j-1}(x)$ is a scale $j-1$ dyadic step function.
Proof: Since $g_j(x)$ is a scale $j$ dyadic step function, it is constant on the intervals $I_{j,k}$. Assume that $g_j(x)$ has the value $c_j(k)$ on the interval $I_{j,k}$. For each interval $I_{j-1,k}$, define the scale $j-1$ step function $g_{j-1}(x)$ on $I_{j-1,k}$ by
$$g_{j-1}(x) = \frac{1}{2}\big(c_j(2k) + c_j(2k+1)\big), \qquad x \in I_{j-1,k}.$$
In other words, on $I_{j-1,k}$, $g_{j-1}(x)$ takes the average of the values of $g_j(x)$ on the left and right halves of $I_{j-1,k}$ (see Figure 5.2(a)). Let $r_{j-1}(x) = g_j(x) - g_{j-1}(x)$. By Remark 5.5(a), $g_{j-1}(x)$ is a scale $j$ dyadic step function, and by Remark 5.5(c), so is $r_{j-1}(x)$. Fixing a dyadic interval $I_{j-1,k}$, recall that $|I_{j-1,k}| = 2^{-(j-1)}$. Then
$$\int_{I_{j-1,k}} r_{j-1}(x)\,dx = \frac{2^{-(j-1)}}{2}\big(c_j(2k) + c_j(2k+1)\big) - 2^{-(j-1)}\cdot\frac{1}{2}\big(c_j(2k) + c_j(2k+1)\big) = 0.$$
Therefore, since $r_{j-1}(x)$ is constant on each half of $I_{j-1,k}$ and has integral zero over $I_{j-1,k}$, on $I_{j-1,k}$ it must be a multiple of the Haar function $h_{j-1,k}(x)$, and $r_{j-1}(x)$ must have the form (5.4) (see Figure 5.2(c)).
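The averaging step in this proof is short enough to write out. The sketch below stores a scale $j$ dyadic step function by its list of values $c_j(k)$ (a representation assumed here for illustration), computes the values of $g_{j-1}(x)$ and the coefficients $a_{j-1}(k)$ in (5.4), using the fact that $h_{j-1,k}(x) = \pm 2^{(j-1)/2}$ on the left and right halves of $I_{j-1,k}$, and then rebuilds $g_j(x)$.

```python
def split(c, j):
    """One application of the Splitting Lemma.

    c[k] is the value c_j(k) of a scale j dyadic step function g_j on I_{j,k}.
    Returns (avg, a): avg[k] is the value of g_{j-1} on I_{j-1,k}, and a[k] is
    the coefficient a_{j-1}(k) of h_{j-1,k} in r_{j-1} = g_j - g_{j-1}.
    """
    if len(c) % 2:
        raise ValueError("need an even number of scale j values")
    height = 2 ** ((j - 1) / 2)  # |h_{j-1,k}| on I_{j-1,k}
    avg = [(c[2 * k] + c[2 * k + 1]) / 2 for k in range(len(c) // 2)]
    # On the left half, r_{j-1} = c_j(2k) - avg = (c_j(2k) - c_j(2k+1)) / 2 = a * height.
    a = [(c[2 * k] - c[2 * k + 1]) / (2 * height) for k in range(len(c) // 2)]
    return avg, a


def merge(avg, a, j):
    """Rebuild the values of g_j = g_{j-1} + r_{j-1} on the scale j intervals."""
    height = 2 ** ((j - 1) / 2)
    c = []
    for value, coeff in zip(avg, a):
        c.append(value + coeff * height)  # left half of I_{j-1,k}: h_{j-1,k} = +height
        c.append(value - coeff * height)  # right half of I_{j-1,k}: h_{j-1,k} = -height
    return c


# A scale 4 dyadic step function on [0, 1): 16 values (chosen arbitrarily).
c4 = [1.0, 1.5, 2.0, 2.0, 3.5, 3.0, 2.0, 1.0, 0.0, 0.5, 1.0, 1.0, 4.0, 4.5, 4.0, 3.0]
g3, a3 = split(c4, j=4)
print(g3)
print(a3)
```

Applying `split` repeatedly to the averages is exactly the iteration used in the proof of Theorem 5.24.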
FIGURE 5.2. Illustration of one step in the Splitting Lemma. Top left: Solid: scale 4 dyadic step function $g_4(x)$. Dashed: the scale 3 dyadic step function $g_3(x)$ constructed as in the Lemma. Top right: Graph of $g_3(x)$. Bottom: Graph of the residual $r_3(x)$.

Exercises
Exercise 5.17. Prove the statements made in Remarks 5.10(a) and 5.12(a).

Exercise 5.18. Prove that
$$p_{0,0}(x) = 2^{-1/2}\,p_{1,0}(x) + 2^{-1/2}\,p_{1,1}(x)$$
and
$$h_{0,0}(x) = 2^{-1/2}\,p_{1,0}(x) - 2^{-1/2}\,p_{1,1}(x).$$

Exercise 5.19. Prove Theorem 5.14.

Exercise 5.20. Prove Theorem 5.15.
5.3 Haar Bases on [0, 1]

Definition 5.21. For any integer $J \ge 0$, the scale $J$ Haar system on $[0,1]$ is the collection
$$\{p_{J,k}(x) : 0 \le k \le 2^J - 1\} \cup \{h_{j,k}(x) : j \ge J,\ 0 \le k \le 2^j - 1\}.$$
When $J = 0$, this collection will be referred to simply as the Haar system on $[0,1]$. See Figure 5.3.
Remark 5.22. (a) The Haar system on $[0,1]$ consists of precisely those Haar functions $h_{j,k}(x)$ corresponding to dyadic intervals $I_{j,k}$ that are subsets of $[0,1]$, together with the single scaling function $p_{0,0}(x)$.

(b) For $J > 0$, the scale $J$ Haar system on $[0,1]$ consists of precisely those Haar functions $h_{j,k}(x)$ corresponding to dyadic intervals $I_{j,k}$ for which $j \ge J$ and that are subsets of $[0,1]$, together with those scale $J$ Haar scaling functions that are supported in $[0,1]$.
Lemma 5.23. Given $f(x)$ continuous on $[0,1]$ and $\epsilon > 0$, there is a $J \in \mathbb{Z}$ and a scale $J$ dyadic step function $g(x)$ supported in $[0,1]$ such that $|f(x) - g(x)| < \epsilon$ for all $x \in [0,1]$; that is, $\|f - g\|_\infty < \epsilon$.
Proof: Exercise 5.26. See also Figure 5.5.

Theorem 5.24. For each integer $J \ge 0$, the scale $J$ Haar system on $[0,1]$ is a complete orthonormal system on $[0,1]$.

Proof: That the scale $J$ Haar system on $[0,1]$ is an orthonormal system on $[0,1]$ follows from the fact that it is a subset of the collection $\{p_{J,k}(x),\ h_{j,k}(x) : j \ge J,\ k \in \mathbb{Z}\}$, which is an orthonormal system on $\mathbb{R}$ by Theorem 5.15(a). For completeness, it is sufficient, by Theorem 2.57(c), to show that if $f(x)$ is $C^0$ on $[0,1]$, then for every $\epsilon > 0$ there is a function $g(x)$ in the span of the scale $J$ Haar system on $[0,1]$ with $\|f - g\|_2 < \epsilon$.

Let $\epsilon > 0$, and let $f(x)$ be $C^0$ on $[0,1]$. By Lemma 5.23 there exist $j$ and a scale $j$ dyadic step function $g_j(x)$ on $[0,1]$ such that $\|f - g_j\|_\infty < \epsilon$, and hence, since $[0,1]$ has length 1, $\|f - g_j\|_2 < \epsilon$. Since any scale $j$ dyadic step function is also a dyadic step function at every higher scale, we can assume that $j \ge J$.
FIGURE 5.3. Some of the Haar functions $h_{j,k}(x)$ on $[0,1)$.
By the Splitting Lemma (Lemma 5.16), $g_j(x)$ may be written $g_j(x) = r_{j-1}(x) + g_{j-1}(x)$, where $r_{j-1}(x)$ has the form (5.4) and is supported in $[0,1]$, and $g_{j-1}(x)$ is a scale $j-1$ dyadic step function. Repeating this process $j - J$ times, we conclude that
$$g_j(x) = r_{j-1}(x) + r_{j-2}(x) + \cdots + r_J(x) + g_J(x),$$
where each $r_\ell(x)$ can be written
$$r_\ell(x) = \sum_{k=0}^{2^\ell - 1} a_\ell(k)\, h_{\ell,k}(x)$$
for some constants $a_\ell(k)$, and where $g_J(x)$ is a scale $J$ dyadic step function (Figure 5.6). But this just means that $g_J(x)$ is a finite linear combination of the collection $\{p_{J,k}(x)\}_{k=0}^{2^J-1}$. Thus $g_j(x)$ is in the span of the scale $J$ Haar system on $[0,1]$ and $\|f - g_j\|_2 < \epsilon$. □
FIGURE 5.4. Some of the Haar scaling functions $p_{j,k}(x)$ on $[0,1)$.
Example 5.25. (a) Let $f(x) = \chi_{[0,3/4)}(x)$. Taking $J = 0$, we see that $\langle f, p_{0,0}\rangle = 3/4$, and that $\langle f, h_{j,k}\rangle = 0$ whenever $I_{j,k} \subseteq [0, 3/4)$ or $I_{j,k} \subseteq [3/4, 1)$. This is true for every $j \ge 2$ and all $0 \le k \le 2^j - 1$. Thus, the only nonzero Haar coefficients are $\langle f, h_{0,0}\rangle = 1/4$ and $\langle f, h_{1,1}\rangle = 2^{-3/2}$. Notice that the $p_{0,0}(x)$ term is simply the average value of the function on $[0,1)$, and that the only nonzero Haar coefficients correspond to the Haar functions that "straddle" the discontinuity of $f(x)$.

(b) Let $f(x) = \chi_{[0,11/16)}(x)$. Again, assume that $J = 0$. Then $\langle f, p_{0,0}\rangle = 11/16$, which is the average value of $f(x)$ on $[0,1]$, and $\langle f, h_{j,k}\rangle = 0$ whenever $I_{j,k} \subseteq [0, 11/16)$ or $I_{j,k} \subseteq [11/16, 1)$. This is true for every $j \ge 4$ and all $0 \le k \le 2^j - 1$. The only nonzero Haar coefficients are $\langle f, h_{0,0}\rangle = 5 \cdot 2^{-4}$, $\langle f, h_{1,1}\rangle = 3 \cdot 2^{-7/2}$, $\langle f, h_{2,2}\rangle = 2^{-3}$, and $\langle f, h_{3,5}\rangle = 2^{-5/2}$. See Figure 5.7.
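The coefficients in Example 5.25(b) can be confirmed with exact rational arithmetic. The sketch below assumes the normalization $h_{j,k}(x) = 2^{j/2}h(2^j x - k)$ with $h = \chi_{[0,1/2)} - \chi_{[1/2,1)}$, so that $\langle \chi_{[0,t)}, h_{j,k}\rangle$ is $2^{j/2}$ times the difference of the lengths of $[0,t)$ intersected with the two halves of $I_{j,k}$. The scan stops at scale 7, which is enough here since the coefficients vanish for $j \ge 4$.

```python
from fractions import Fraction
import math


def overlap(a, b, c, d):
    """Length of the intersection of [a, b) and [c, d)."""
    return max(Fraction(0), min(b, d) - max(a, c))


def haar_coeff_indicator(t, j, k):
    """Inner product of chi_[0,t) with h_{j,k}(x) = 2^{j/2} h(2^j x - k)."""
    start = Fraction(k, 2 ** j)
    mid = Fraction(2 * k + 1, 2 ** (j + 1))
    end = Fraction(k + 1, 2 ** j)
    signed_length = overlap(0, t, start, mid) - overlap(0, t, mid, end)
    return 2 ** (j / 2) * float(signed_length)


t = Fraction(11, 16)  # f = chi_[0, 11/16), as in Example 5.25(b)
print("average <f, p_00> =", float(overlap(0, t, 0, 1)))
nonzero = {}
for j in range(8):
    for k in range(2 ** j):
        value = haar_coeff_indicator(t, j, k)
        if value != 0:
            nonzero[(j, k)] = value
print(nonzero)
```

Changing `t` to `Fraction(3, 4)` reproduces part (a): only $(0,0)$ and $(1,1)$ survive.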
so that
Therefore, we conclude that the fraction of possibly nonzero Haar coefficients for a function vanishing outside an interval is approximately proportional to the length of that interval (see also Figure 5.10).
5.4.2 Behavior of Haar Coefficients Near Jump Discontinuities

Suppose that $f(x)$ is a function defined on $[0,1]$, with a jump discontinuity at $x_0 \in (0,1)$ and continuous at all other points in $[0,1]$. The fact that the Haar functions $h_{j,k}(x)$ have good localization in time leads us to ask the question: Do the Haar coefficients $\langle f, h_{j,k}\rangle$ such that $x_0 \in I_{j,k}$ behave differently than do the Haar coefficients such that $x_0 \notin I_{j,k}$? In particular, can we find the location of a jump discontinuity just by examining the absolute value of the Haar coefficients? We will see that in fact we can do this.

For simplicity, let us assume that the given function $f(x)$ is $C^2$ on the intervals $[0, x_0]$ and $[x_0, 1]$. This means that both $f'(x)$ and $f''(x)$ exist, are continuous functions, and hence are bounded on each of these intervals. Fix integers $j \ge 0$ and $0 \le k \le 2^j - 1$, and let $x_{j,k}$ be the midpoint of the interval $I_{j,k}$; that is, $x_{j,k} = 2^{-j}(k + 1/2)$. There are now two possibilities: either $x_0 \in I_{j,k}$ or $x_0 \notin I_{j,k}$.

Case 1: $x_0 \notin I_{j,k}$. If $x_0 \notin I_{j,k}$, then expanding $f(x)$ about $x_{j,k}$ by Taylor's formula, it follows that for all $x \in I_{j,k}$,
$$f(x) = f(x_{j,k}) + f'(x_{j,k})(x - x_{j,k}) + \frac{1}{2}f''(\xi_{j,k})(x - x_{j,k})^2,$$
where $\xi_{j,k}$ is some point in $I_{j,k}$ (depending on $x$). Now, using the fact that
$$\int_{I_{j,k}} h_{j,k}(x)\,dx = 0,$$
we obtain
$$\langle f, h_{j,k}\rangle = f'(x_{j,k})\int_{I_{j,k}}(x - x_{j,k})\,h_{j,k}(x)\,dx + \frac{1}{2}\int_{I_{j,k}} f''(\xi_{j,k})(x - x_{j,k})^2\,h_{j,k}(x)\,dx = -\frac{2^{-3j/2}}{4}\,f'(x_{j,k}) + E_{j,k},$$
where $|E_{j,k}| \le \dfrac{2^{-5j/2}}{24}\max_{x\in I_{j,k}}|f''(x)|$.
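If $j$ is large, then $2^{-5j/2}$ will be very small compared with $2^{-3j/2}$; so we conclude that for large $j$,
$$|\langle f, h_{j,k}\rangle| \approx \frac{1}{4}\,2^{-3j/2}\,|f'(x_{j,k})|. \qquad (5.6)$$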
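Case 2: $x_0 \in I_{j,k}$. If $x_0 \in I_{j,k}$, then either it is in $I^l_{j,k}$ or it is in $I^r_{j,k}$. Let us assume that $x_0 \in I^l_{j,k}$. The other case is similar (Exercise 5.30). Expanding $f(x)$ in a Taylor series about $x_0$, we have
$$f(x) = \begin{cases} f(x_0^-) + f'(\eta_x)(x - x_0), & x < x_0,\\ f(x_0^+) + f'(\eta_x)(x - x_0), & x > x_0,\end{cases}$$
where $\eta_x$ is some point between $x$ and $x_0$. Therefore,
$$\langle f, h_{j,k}\rangle = 2^{j/2}\big(x_0 - 2^{-j}k\big)\big(f(x_0^-) - f(x_0^+)\big) + E_{j,k},$$
where
$$E_{j,k} = \int_{I_{j,k}} f'(\eta_x)(x - x_0)\,h_{j,k}(x)\,dx.$$
Thus, if $|f'(x)| \le M_1$ on $[0, x_0]$ and on $[x_0, 1]$ (with one-sided derivatives at $x_0$),
$$|E_{j,k}| \le 2^{j/2}M_1\int_{I_{j,k}}|x - x_0|\,dx \le M_1\,2^{-3j/2}.$$
If $j$ is large, then $2^{-3j/2}$ will be very small compared with $2^{-j/2}$; so we conclude that for large $j$,
$$|\langle f, h_{j,k}\rangle| \approx 2^{j/2}\,\big|x_0 - 2^{-j}k\big|\,\big|f(x_0^-) - f(x_0^+)\big|.$$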
The quantity $|x_0 - 2^{-j}k|$ can in principle be small if $x_0$ is close to the left endpoint of $I^l_{j,k}$, and can even be zero. However, we can expect that in most cases, $x_0$ will be in the middle of $I^l_{j,k}$, so that $|x_0 - 2^{-j}k| \approx (1/4)\,2^{-j}$. Thus, for large $j$,
$$|\langle f, h_{j,k}\rangle| \approx \frac{1}{4}\,\big|f(x_0^-) - f(x_0^+)\big|\,2^{-j/2}. \qquad (5.8)$$
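The contrast between the decay rates (5.6) and (5.8) can be observed with exact arithmetic. The sketch below assumes the test function $f(x) = x + \chi_{[1/3,1)}(x)$ on $[0,1]$, which has $f' = 1$ away from the jump at $x_0 = 1/3$ and $f(x_0^+) - f(x_0^-) = 1$. For this $f$ the coefficient on an interval away from $x_0$ is exactly $-2^{-3j/2}/4$, while on the interval containing $x_0$ the coefficient, rescaled by $2^{j/2}$, stays near $1/3$ rather than decaying. (Here $x_0$ always sits at distance $2^{-j}/3$ from the nearer outer endpoint of $I_{j,k}$, instead of the typical $2^{-j}/4$ assumed in (5.8).)

```python
from fractions import Fraction
import math

X0 = Fraction(1, 3)  # location of the jump (assumed; deliberately not a dyadic point)


def integral_f(a, b):
    """Exact integral over [a, b] of f(x) = x + chi_[X0, 1)(x), for 0 <= a <= b <= 1."""
    return (b * b - a * a) / 2 + max(Fraction(0), b - max(a, X0))


def haar_coeff(j, k):
    """<f, h_{j,k}>, where h_{j,k} = 2^{j/2} on the left half of I_{j,k} and -2^{j/2} on the right."""
    left = Fraction(2 * k, 2 ** (j + 1))
    mid = Fraction(2 * k + 1, 2 ** (j + 1))
    right = Fraction(2 * k + 2, 2 ** (j + 1))
    return 2 ** (j / 2) * float(integral_f(left, mid) - integral_f(mid, right))


rows = []
for j in range(2, 14, 3):
    k_jump = math.floor(X0 * 2 ** j)  # index of the dyadic interval containing x0
    far = abs(haar_coeff(j, 0)) * 2 ** (1.5 * j)        # (5.6) predicts 1/4 here, since f' = 1
    near = abs(haar_coeff(j, k_jump)) * 2 ** (0.5 * j)  # stays bounded away from 0: slow decay
    rows.append((j, far, near))
    print(j, far, near)
```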
Comparing (5.8) with (5.6), we see that the decay of $|\langle f, h_{j,k}\rangle|$ for large $j$ is considerably slower if $x_0 \in I_{j,k}$ than if $x_0 \notin I_{j,k}$. That is, large coefficients in the Haar expansion of a function $f(x)$ that persist for all scales suggest the presence of a jump discontinuity in the intervals $I_{j,k}$ corresponding to the large coefficients.

5.4.3 Haar Coefficients and Global Smoothness

We know that the global smoothness of a function $f(x)$ defined on $[0,1]$ is reflected in the decay of its Fourier coefficients. Specifically, if $f(x)$ is periodic and $C^K$ on $\mathbb{R}$, then there exists a constant $A$ depending on $f(x)$ such that for all $n \in \mathbb{Z}$, $|c_n| \le A\,|n|^{-K}$, where $c_n$ are the Fourier coefficients of $f(x)$ (Exercise 3.38). This can be regarded as a statement about the frequency content of smooth functions, namely that smoother functions tend to have smaller high-frequency components than do functions that are not smooth. However, no such estimate holds for the Haar series. To see this, simply note that the function $f(x) = $ … has period 1 and is $C^\infty$ on $\mathbb{R}$ with all of its derivatives bounded by 1 (Show this). But by Exercise 5.32,
… $j \ge 0$ and $0 \le k \le 2^j - 1$, let
$$a_j^n(k) = \int_0^1 h_{j,k}(x)\, e^{-2\pi i n x}\,dx.$$
(a) Show that, for $n \ne 0$,
$$a_j^n(k) = i\,\frac{2^{j/2+1}}{\pi n}\,\sin^2\!\big(\pi n\,2^{-j-1}\big)\, e^{-2\pi i n x_{j,k}},$$
where $x_{j,k} = 2^{-j}(k + 1/2)$ is the midpoint of $I_{j,k}$.
(b) Show that $|a_j^n(k)|$ is maximized when $|n| = (3/4)\,2^j$, and that the first positive zero of $|a_j^n(k)|$ occurs when $n = 2^{j+1}$ (see Exercise 5.31). Hence it is reasonable to say that $h_{j,k}(x)$ is localized at the $2^j$ frequencies satisfying $2^{j-1} \le |n|$ …
pJ.k) =
(f.l ) J . k )
(6.4)
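(Exercise 6.5). Suppose that we are given a finite sequence of data of length $2^N$ for some $N \in \mathbb{N}$, $\{c_0(k)\}_{k=0}^{2^N-1}$. We assume that for some underlying function $f(x)$, $c_0(k) = \langle f, p_{0,k}\rangle$. Fix $J \in \mathbb{N}$, $J < N$, and for each $1 \le j \le J$, define
$$c_j(k) = \langle f, p_{-j,k}\rangle \qquad\text{and}\qquad d_j(k) = \langle f, h_{-j,k}\rangle.$$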
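It turns out that there is a convenient recursive algorithm that can be used to compute the coefficients $c_j(k)$ and $d_j(k)$ from $c_{j-1}(k)$. This algorithm uses the fact, proved in Exercise 5.18, that for each $\ell, k \in \mathbb{Z}$,
$$p_{\ell,k}(x) = \frac{1}{\sqrt{2}}\big(p_{\ell+1,2k}(x) + p_{\ell+1,2k+1}(x)\big) \qquad (6.5)$$
and
$$h_{\ell,k}(x) = \frac{1}{\sqrt{2}}\big(p_{\ell+1,2k}(x) - p_{\ell+1,2k+1}(x)\big). \qquad (6.6)$$
Hence, by (6.5),
$$c_j(k) = \langle f, p_{-j,k}\rangle = \frac{1}{\sqrt{2}}\langle f, p_{-j+1,2k}\rangle + \frac{1}{\sqrt{2}}\langle f, p_{-j+1,2k+1}\rangle = \frac{1}{\sqrt{2}}\big(c_{j-1}(2k) + c_{j-1}(2k+1)\big), \qquad (6.7)$$
and by (6.6),
$$d_j(k) = \frac{1}{\sqrt{2}}\big(c_{j-1}(2k) - c_{j-1}(2k+1)\big). \qquad (6.8)$$
By writing (6.7) and (6.8) in matrix form, it is easy to see that the calculation is completely reversible. In fact,
$$c_{j-1}(2k) = \frac{1}{\sqrt{2}}\big(c_j(k) + d_j(k)\big), \qquad c_{j-1}(2k+1) = \frac{1}{\sqrt{2}}\big(c_j(k) - d_j(k)\big).$$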
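6.1.1 The Discrete Haar Transform (DHT)

Therefore, we make the following definition.

Definition 6.1. Given $J, N \in \mathbb{N}$ with $J < N$ and a finite sequence $c_0 = \{c_0(k)\}_{k=0}^{2^N-1}$, the discrete Haar transform (DHT) of $c_0$ is defined by
$$\{c_J(k) : 0 \le k \le 2^{N-J} - 1\} \cup \{d_j(k) : 1 \le j \le J,\ 0 \le k \le 2^{N-j} - 1\},$$
where
$$c_j(k) = \frac{1}{\sqrt{2}}\big(c_{j-1}(2k) + c_{j-1}(2k+1)\big), \qquad d_j(k) = \frac{1}{\sqrt{2}}\big(c_{j-1}(2k) - c_{j-1}(2k+1)\big).$$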
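The inverse DHT is given by the formula
$$c_{j-1}(2k) = \frac{1}{\sqrt{2}}\big(c_j(k) + d_j(k)\big), \qquad c_{j-1}(2k+1) = \frac{1}{\sqrt{2}}\big(c_j(k) - d_j(k)\big), \qquad j = J, J-1, \ldots, 1.$$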
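The recursions of Definition 6.1 and the inverse formula translate directly into code. The sketch below (function names are illustrative) computes $c_J$ and $d_1, \ldots, d_J$ from a list $c_0$ of length $2^N$ and then inverts the transform.

```python
import math

SQRT2 = math.sqrt(2.0)


def dht(c0, J):
    """Discrete Haar transform: returns (c_J, [d_1, ..., d_J]) via the recursions for c_j and d_j."""
    if len(c0) % (2 ** J):
        raise ValueError("len(c0) must be divisible by 2**J")
    c = list(c0)
    details = []
    for _ in range(J):
        half = len(c) // 2
        d = [(c[2 * k] - c[2 * k + 1]) / SQRT2 for k in range(half)]
        c = [(c[2 * k] + c[2 * k + 1]) / SQRT2 for k in range(half)]
        details.append(d)
    return c, details


def inverse_dht(cJ, details):
    """Undo dht using c_{j-1}(2k) = (c_j(k) + d_j(k))/sqrt2 and c_{j-1}(2k+1) = (c_j(k) - d_j(k))/sqrt2."""
    c = list(cJ)
    for d in reversed(details):
        previous = []
        for ck, dk in zip(c, d):
            previous.extend([(ck + dk) / SQRT2, (ck - dk) / SQRT2])
        c = previous
    return c


c0 = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # N = 3, so 2^N = 8 samples
cJ, details = dht(c0, J=2)
print(cJ, details)
```

Because each step is an orthogonal change of basis, the sum of squares of all outputs equals that of the input (here 446).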
As with the DFT, the DHT can be thought of as a linear transformation on a finite-dimensional space and as such can be written as multiplication by a matrix.
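Definition 6.2. Given $L \in \mathbb{N}$ even, define the $(L/2) \times L$ matrices $H_L$ and $G_L$ by
$$H_L = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 0 & 1 & 1 & \cdots & 0 & 0\\ \vdots & & & & \ddots & & \vdots\\ 0 & 0 & 0 & 0 & \cdots & 1 & 1\end{pmatrix}, \qquad G_L = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 0 & 1 & -1 & \cdots & 0 & 0\\ \vdots & & & & \ddots & & \vdots\\ 0 & 0 & 0 & 0 & \cdots & 1 & -1\end{pmatrix}.$$
Define the $L \times L$ matrix $W_L$ by
$$W_L = \begin{pmatrix} H_L \\ G_L \end{pmatrix}.$$
The matrix $H_L$ is referred to as the approximation matrix, the matrix $G_L$ as the detail matrix, and the matrix $W_L$ as the wavelet matrix.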
From here on, we will suppress the subscript $L$ in the matrices $H_L$, $G_L$, and $W_L$ in order to clarify the presentation. The value of $L$ will either be clear from context or will be indicated separately. It is easy to verify that for each $L$, $W$ is an orthogonal matrix (Exercise 6.6). Hence its adjoint is its inverse; that is,
$$W^*W = \begin{pmatrix} H^* & G^* \end{pmatrix}\begin{pmatrix} H \\ G \end{pmatrix} = H^*H + G^*G = I,$$
where $I$ is the $L \times L$ identity matrix.
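The orthogonality of the wavelet matrix can be checked directly for a small $L$. The sketch below builds $H_L$ and $G_L$ as in Definition 6.2 (as plain lists of rows, a representation chosen for self-containment), stacks them into $W_L$, and forms both $W^*W$ and $WW^*$.

```python
import math


def haar_matrices(L):
    """Return (H_L, G_L) from Definition 6.2 as lists of rows."""
    if L % 2:
        raise ValueError("L must be even")
    s = 1 / math.sqrt(2)
    H = [[0.0] * L for _ in range(L // 2)]
    G = [[0.0] * L for _ in range(L // 2)]
    for r in range(L // 2):
        H[r][2 * r], H[r][2 * r + 1] = s, s
        G[r][2 * r], G[r][2 * r + 1] = s, -s
    return H, G


def matmul(A, B):
    """Product of two real matrices stored as lists of rows."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


H, G = haar_matrices(8)
W = H + G  # list concatenation stacks H on top of G, giving the wavelet matrix W_L
WtW = matmul(transpose(W), W)  # W* W (W is real, so W* is the transpose)
WWt = matmul(W, transpose(W))  # W W*
print(max(abs(WtW[i][k] - (i == k)) for i in range(8) for k in range(8)))
```

The identity $WW^* = I$ packages the three relations $HH^* = I$, $GG^* = I$, and $HG^* = 0$.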
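If we consider our initial sequence of data, $c_0$, to be a vector of length $L = 2^N$, then the DHT algorithm reduces to matrix multiplication. Specifically, let $c_0 = (c_0(0)\ c_0(1)\ \cdots\ c_0(2^N - 1))^T$, and for $1 \le j \le J$, let
$$c_j = (c_j(0)\ c_j(1)\ \cdots\ c_j(2^{N-j} - 1))^T$$
and
$$d_j = (d_j(0)\ d_j(1)\ \cdots\ d_j(2^{N-j} - 1))^T.$$
Then the DHT of $c_0$ is given by
$$c_j = H c_{j-1}, \qquad d_j = G c_{j-1}, \qquad 1 \le j \le J,$$
where $H$ and $G$ are $2^{N-j} \times 2^{N-j+1}$ matrices. The inverse DHT is given by
$$c_{j-1} = H^* c_j + G^* d_j, \qquad j = J, J-1, \ldots, 1.$$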
The above can be summarized as follows.

Theorem 6.3. Given $J, N \in \mathbb{N}$ with $J <$ … of the image. Consider for example the $2^{N-1} \times 2^{N-1}$ matrix $d_1^{(1)}$ derived from a $2^N \times 2^N$ image $c_0 = \{c_0(n,m)\}_{n,m=0}^{2^N-1}$ by (6.15). Fix $0 \le n, m \le 2^{N-1} - 1$. Since $d_1^{(1)} = G c_0 H^*$, we have by (6.12) and (6.14) that
$$d_1^{(1)}(n,m) = \frac{1}{\sqrt{2}}\left(\frac{c_0(2n,2m) - c_0(2n+1,2m)}{\sqrt{2}} + \frac{c_0(2n,2m+1) - c_0(2n+1,2m+1)}{\sqrt{2}}\right) = \frac{1}{2}\Big(c_0(2n,2m) - c_0(2n+1,2m) + c_0(2n,2m+1) - c_0(2n+1,2m+1)\Big).$$
If $(2n,2m)$ is a horizontal edge point of the image $c_0$, then the differences $c_0(2n,2m) - c_0(2n+1,2m)$ and $c_0(2n,2m+1) - c_0(2n+1,2m+1)$ will tend to be large due to the large variation in pixel values in the vertical direction. If $(2n,2m)$ is a vertical edge point, then these same differences will tend to be close to zero. If $(2n,2m)$ is a diagonal edge point, then the pixel values will tend to be similar in one of the diagonal directions; that is, at least one of $c_0(2n,2m) - c_0(2n+1,2m+1)$ or $c_0(2n,2m+1) - c_0(2n+1,2m)$ will be close to zero. Hence, if $(2n,2m)$ is a horizontal edge point, then $d_1^{(1)}(n,m)$ will tend to be larger than if $(2n,2m)$ is either a vertical or a diagonal edge. The same argument can be made if the edge point is at $(2n,2m+1)$, $(2n+1,2m)$, or $(2n+1,2m+1)$. Similarly, $d_1^{(2)}(n,m)$ will be largest if any of $(2n,2m)$, $(2n,2m+1)$, $(2n+1,2m)$, or $(2n+1,2m+1)$ is a vertical edge, and $d_1^{(3)}(n,m)$ will be largest if any is a diagonal edge. See Figure 6.4. Since the matrix $c_j$ can be thought of as containing the features of the original image that are of size $2^j$ or larger, the matrices $d_j^{(1)}$, $d_j^{(2)}$, and $d_j^{(3)}$ are interpreted as identifying, respectively, the horizontal, vertical, and diagonal edges at scale $2^j$ (see Figures 9.3 and 6.7-6.9).
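The edge-detecting behavior described above can be seen on a tiny synthetic image. The sketch below computes $c_1$ and the three scale-1 detail matrices directly from $2 \times 2$ blocks. The formula for $d_1^{(1)}$ matches $\tfrac12\big(c_0(2n,2m) - c_0(2n+1,2m) + c_0(2n,2m+1) - c_0(2n+1,2m+1)\big)$; the formulas used for $d_1^{(2)}$ (differences across columns) and $d_1^{(3)}$ (diagonal differences) are assumptions of this sketch, chosen to match the description of vertical and diagonal edges rather than copied from (6.15).

```python
def scale_one_decomposition(c0):
    """Split a 2^N x 2^N image (list of rows) into c1, d1^(1), d1^(2), d1^(3)."""
    size = len(c0) // 2
    c1 = [[0.0] * size for _ in range(size)]
    d1 = [[0.0] * size for _ in range(size)]
    d2 = [[0.0] * size for _ in range(size)]
    d3 = [[0.0] * size for _ in range(size)]
    for n in range(size):
        for m in range(size):
            a, b = c0[2 * n][2 * m], c0[2 * n][2 * m + 1]          # top row of the 2 x 2 block
            c, d = c0[2 * n + 1][2 * m], c0[2 * n + 1][2 * m + 1]  # bottom row of the block
            c1[n][m] = (a + b + c + d) / 2  # local average (smooth part)
            d1[n][m] = (a - c + b - d) / 2  # vertical variation: horizontal edges
            d2[n][m] = (a - b + c - d) / 2  # horizontal variation: vertical edges
            d3[n][m] = (a - b - c + d) / 2  # diagonal variation: diagonal edges
    return c1, d1, d2, d3


# A 4 x 4 image whose only feature is a horizontal edge between rows 0 and 1.
image = [[0.0] * 4] + [[10.0] * 4 for _ in range(3)]
c1, d1, d2, d3 = scale_one_decomposition(image)
print(d1, d2, d3)
```

Transposing the image turns the horizontal edge into a vertical one, and the response moves from $d_1^{(1)}$ to $d_1^{(2)}$.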
6.3.3 "Naive" Image Compression
The key to good image compression is to find a representation of the image with as few numbers as possible. In the language of orthogonal decompositions, this means finding an orthonormal basis in which most of the coefficients of the original image are zero or at least very close to zero. In principle, the small coefficients can be set to zero without significantly affecting the quality of the image. The purpose of this section is to illustrate some of the principles that make wavelets effective for image compression. The central idea has been alluded to before, namely that in decomposing a matrix $c_0$ into the four matrices $c_1$, $d_1^{(1)}$, $d_1^{(2)}$, and $d_1^{(3)}$, we have separated the smooth (or slowly varying on a scale of two pixels) parts of the image from the nonsmooth (or rapidly varying on a scale of two pixels) parts of the image. These latter parts are usually interpreted as edge points. If the image consists of large areas of constant intensity separated by edges (which is true of many images), the detail matrices will contain many elements that are nearly zero. The same is true when we decompose the matrix $c_1$ into $c_2$, $d_2^{(1)}$, $d_2^{(2)}$, and $d_2^{(3)}$. This principle is illustrated in Figure 6.10. Here we have taken an image and have computed its DHT with $J = 3$. We choose various thresholds, that is, fixed numbers below which the DHT coefficients are set to zero, and compute reconstructed images. We see that if 80% of the smallest coefficients are set to zero, the image is virtually unchanged. If 90% of the smallest coefficients are set to zero, most important features of the image are still visible. If 97% are set to zero, there is significant distortion, but gross features of the image are still recognizable.
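A one-dimensional analogue of this thresholding experiment fits in a few lines. The sketch below uses a hypothetical piecewise-constant signal of length 64, computes its DHT with $J = 3$, keeps only the largest fraction of coefficients in absolute value, and reconstructs. Because the DHT is orthogonal, the squared reconstruction error equals the sum of squares of the discarded coefficients, so the error can only grow as fewer coefficients are kept. The signal and the percentages are illustrative, not the data behind Figure 6.10.

```python
import math

SQRT2 = math.sqrt(2.0)


def dht(c0, J):
    """Return (c_J, [d_1, ..., d_J]) using the averaging/differencing recursions."""
    c, details = list(c0), []
    for _ in range(J):
        half = len(c) // 2
        details.append([(c[2 * k] - c[2 * k + 1]) / SQRT2 for k in range(half)])
        c = [(c[2 * k] + c[2 * k + 1]) / SQRT2 for k in range(half)]
    return c, details


def inverse_dht(cJ, details):
    c = list(cJ)
    for d in reversed(details):
        c = [v for ck, dk in zip(c, d) for v in ((ck + dk) / SQRT2, (ck - dk) / SQRT2)]
    return c


def compress(signal, J, keep_fraction):
    """Zero all but the largest keep_fraction of the DHT coefficients, then invert the DHT."""
    cJ, details = dht(signal, J)
    flat = cJ + [v for d in details for v in d]
    n_keep = round(keep_fraction * len(flat))
    by_size = sorted(range(len(flat)), key=lambda i: abs(flat[i]), reverse=True)
    kept = set(by_size[:n_keep])
    flat = [v if i in kept else 0.0 for i, v in enumerate(flat)]
    # Cut the flat list back into c_J and d_1, ..., d_J.
    new_cJ, pos, new_details = flat[:len(cJ)], len(cJ), []
    for d in details:
        new_details.append(flat[pos:pos + len(d)])
        pos += len(d)
    return inverse_dht(new_cJ, new_details)


def rms_error(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical 1-D "image row": flat regions separated by jumps, then a gentle ramp.
signal = [10.0 if k < 20 else 3.0 if k < 45 else 7.0 + 0.05 * k for k in range(64)]
for fraction in (1.0, 0.2, 0.1, 0.03):
    error = rms_error(signal, compress(signal, J=3, keep_fraction=fraction))
    print(f"keep {fraction:.0%} of coefficients: rms error {error:.4f}")
```

Ranking coefficients by magnitude is one simple thresholding rule; a fixed numerical threshold, as described in the text, gives the same kind of behavior.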
FIGURE 6.6. Original image (top left). Reconstruction using only the $c_1$ coefficients (top right), $c_2$ coefficients (bottom left), and $c_3$ coefficients (bottom right). Reconstructions are increasingly blurred and blocky.
FIGURE 6.7. Left: Horizontal edges at scale 1. Right: Horizontal edges at scale 2.
FIGURE 6.8. Left: Vertical edges at scale 1. Right: Vertical edges at scale 2.
FIGURE 6.9. Left: Diagonal edges at scale 1. Right: Diagonal edges at scale 2.
FIGURE 6.10. Original image (top left). Compressed image with smallest 80% of DHT coefficients set to zero (top right). Compressed image with smallest 90% (bottom left) and 97% (bottom right) of DHT coefficients set to zero.
Part III

Orthonormal Wavelet Bases
Chapter 7

Multiresolution Analysis

In Section 5.5, we saw that if $h(x) = \chi_{[0,1/2)}(x) - \chi_{[1/2,1)}(x)$, then the collection
$$\{h_{j,k}(x)\}_{j,k\in\mathbb{Z}} = \{2^{j/2}h(2^j x - k)\}_{j,k\in\mathbb{Z}}$$
forms an orthonormal basis on $\mathbb{R}$. In this chapter, we will see how this construction can be generalized. In particular, we will present a general framework for constructing functions $\psi(x)$, $L^2$ on $\mathbb{R}$, such that the collection
$$\{\psi_{j,k}(x)\}_{j,k\in\mathbb{Z}} = \{2^{j/2}\psi(2^j x - k)\}_{j,k\in\mathbb{Z}}$$
is an orthonormal basis on $\mathbb{R}$. Such a function $\psi(x)$ is called a wavelet and the collection $\{\psi_{j,k}(x)\}_{j,k\in\mathbb{Z}}$ a wavelet orthonormal basis on $\mathbb{R}$. This framework for constructing wavelets involves the concept of a multiresolution analysis or MRA. Before giving the definition of an MRA, we need to study some properties of collections of functions of the form
$$\{T_n g(x)\}_{n\in\mathbb{Z}} = \{g(x - n)\}_{n\in\mathbb{Z}},$$
where $g(x)$ is some fixed $L^2$ function. In Section 7.1, we address the following questions: (1) When is the collection $\{T_n g(x)\}$ an orthonormal system? and (2) When does the subspace $\overline{\mathrm{span}}\{T_n g(x)\}$ admit an orthonormal basis of the form $\{T_n h(x)\}$ for some possibly different function $h(x)$? In Section 7.2, we define the notion of an MRA and derive some of its basic properties, and in Section 7.3 we present some examples of MRAs. In Section 7.4, we give the very simple recipe for constructing a wavelet orthonormal basis from an MRA and present some examples of wavelet bases. In Section 7.5, we present a proof that this recipe works, and in Section 7.6, we gather some necessary properties of the scaling and wavelet functions that follow from the definition of MRA and the construction of the wavelet. These properties will be useful in later chapters when we explore more examples and generalizations of wavelet orthonormal bases. Finally, in Section 7.7, we discuss the Battle-Lemarié construction of spline wavelet orthonormal bases.
7.1 Orthonormal Systems of Translates

In our study of multiresolution analyses and their associated wavelet bases, we will frequently encounter orthonormal systems that are integer translates of a single function. In addition to sharing the general properties of orthonormal systems presented in Section 2.3, systems of this form also have special properties that will be valuable in the construction of wavelet bases. In this section, we present some of these properties.
Definition 7.1. An orthonormal system on $\mathbb{R}$ of the form $\{T_n g(x)\}_{n\in\mathbb{Z}}$, where $g(x)$ is $L^2$ on $\mathbb{R}$, is called an orthonormal system of translates.
Example 7.2. (a) The collection of scale 0 Haar scaling functions $\{p_{0,k}(x) : k \in \mathbb{Z}\}$ (Definition 5.9) is an orthonormal system of translates by Theorem 5.14. (b) The collection of scale 0 Haar functions $\{h_{0,k}(x) : k \in \mathbb{Z}\}$ (Definition 5.11) is an orthonormal system of translates by Theorem 5.13.
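Remark 7.3. By Theorem 2.55, if $\{T_n g(x)\}$ is an orthonormal system of translates, then it is by definition an orthonormal basis for the subspace $\overline{\mathrm{span}}\{T_n g(x)\}$. In other words, $f(x) \in \overline{\mathrm{span}}\{T_n g(x)\}$ if and only if
$$f(x) = \sum_{n\in\mathbb{Z}}\langle f, T_n g\rangle\, T_n g(x)$$
in $L^2$ on $\mathbb{R}$.

Lemma 7.4. The collection $\{T_n g(x)\}$ is an orthonormal system of translates if and only if for all $\gamma \in \mathbb{R}$,
$$\sum_{n\in\mathbb{Z}}|\hat{g}(\gamma + n)|^2 = 1. \qquad (7.1)$$

Proof: Note first that $\langle T_k g, T_\ell g\rangle = \langle g, T_{\ell-k} g\rangle$, so that $\langle T_k g, T_\ell g\rangle = \delta(k - \ell)$ for all $k, \ell$ if and only if $\langle g, T_k g\rangle = \delta(k)$ for all $k$. By Parseval's formula,
$$\langle g, T_k g\rangle = \int_{\mathbb{R}}|\hat{g}(\gamma)|^2\, e^{2\pi i k\gamma}\,d\gamma = \sum_{n\in\mathbb{Z}}\int_0^1|\hat{g}(\gamma + n)|^2\, e^{2\pi i k\gamma}\,d\gamma = \int_0^1\sum_{n\in\mathbb{Z}}|\hat{g}(\gamma + n)|^2\, e^{2\pi i k\gamma}\,d\gamma$$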
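(see Exercise 7.10 for a justification of the interchange of the sum and integral in the last step). By the uniqueness of Fourier series,
$$\int_0^1\sum_{n\in\mathbb{Z}}|\hat{g}(\gamma + n)|^2\, e^{2\pi i k\gamma}\,d\gamma = \delta(k) \quad\text{for all } k \in \mathbb{Z}$$
if and only if $\sum_n |\hat{g}(\gamma + n)|^2 = 1$ for all $\gamma \in \mathbb{R}$.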
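Condition (7.1) can be checked numerically for a concrete $g$. The sketch below assumes $g = \chi_{[0,1)}$ (the system of Example 7.2(a)), for which $|\hat{g}(\gamma)|^2 = \big(\sin(\pi\gamma)/(\pi\gamma)\big)^2$, and truncates the periodized sum to $|n| \le 20000$. For contrast, it also evaluates the same sum for the hat function $\max(0, 1 - |x|)$, whose integer translates are not orthonormal: there $|\hat{g}(\gamma)|^2 = \big(\sin(\pi\gamma)/(\pi\gamma)\big)^4$ and the sum equals $(1 + 2\cos^2(\pi\gamma))/3$, which is not identically 1.

```python
import math


def sinc(t):
    """sin(pi t) / (pi t), extended by 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def periodized(abs_sq, gamma, n_max=20000):
    """Truncated version of the sum over n of |g_hat(gamma + n)|^2 from Lemma 7.4."""
    return sum(abs_sq(gamma + n) for n in range(-n_max, n_max + 1))


def box_abs_sq(gamma):
    # g = chi_[0,1): |g_hat(gamma)|^2 = sinc(gamma)^2
    return sinc(gamma) ** 2


def hat_abs_sq(gamma):
    # g(x) = max(0, 1 - |x|): g_hat(gamma) = sinc(gamma)^2, so |g_hat|^2 = sinc(gamma)^4
    return sinc(gamma) ** 4


for gamma in (0.1, 0.37, 0.5):
    print(gamma, periodized(box_abs_sq, gamma), periodized(hat_abs_sq, gamma))
```

The truncation error for the box function is of order $1/n_{\max}$, which is why the tolerance used for it is looser than for the faster-decaying hat function.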
Lemma 7.5. Let $\{T_n g(x)\}$ be an orthonormal system of translates. Then the function $f(x) \in \overline{\mathrm{span}}\{T_n g(x)\}$ if and only if there is an $\ell^2$ sequence $\{c(n)\}$ such that
$$\hat{f}(\gamma) = \Big(\sum_{n} c(n)\, e^{-2\pi i n\gamma}\Big)\hat{g}(\gamma). \qquad (7.2)$$
Remark 7.6. The only assumption being made about the coefficieiits { ~ ( n ) is ) that C , Ic(n)I2 < CCI. Therefore, we cannot conclude necessarily that the Fourier series Enc ( n )e p Z T i nis~ a piecewise continuous function. The most we can conclude, however, is that this Fourier series represents a function L2 on [0,11 in tlie sense of Lebesgue. We know this to be true by the Riesz-Fischer Theorem (Theorem 4.48). This particular technicality does not enter seriously into the proof of Lemma 7.5; so we will not mention it further. In light of (7.3), we know that the coefficients c ( n ) are given by ( f ,T,g) so that by making appropriate assumptions on the functions f ( x ) and g(x), we can say more about the coeficierlLs ~ ( 7 1 )and llle corresponding Fourier series.
Proof:
(=-=+) Suppose that f ( x ) E ~ { T n g ( x ) )By . Theorem 2.55,
in L2 on R.Taking the Fourier transform of both sides and using Theorem 3.40(b),
166
Chapter 7. Multiresolution Analysis
By Bessel's inequality,
so that
sat sifies (7.2). (t=) Suppose that (7.2) holds. Then by the Riesz-Fischer Theorem (Theorem 4.48),
Letting
C
f ~ ( =4
c ( n )Tng(:c),
InllN
it follows that f N ( x ) E span{~,g(x)} and that
Therefore, by Plancherel's formula (Theorem 2.57(d)), Lemma 7.4, and c(n) tip2""?, the periodicity of
and hence f (x) E @ZE{T,g(z)).
7.1. Orthonormal Systems of Translates
167
In some of the examples of multiresolution analyses that follow, we will encounter collections of the form { T , ~ ( X ) )that , ~ ~are not orthonormal systems of translates but that satisfy a weaker version of (7.1), namely that there exist constarlts A, B > 0 such that for all y E R,
For such a system, we wish to consider the subspace ~ { T n g ( x )and ) show that in fact there is an L~ function g(x) such that {T,,g(x)) is an ortllonormal basis for @Zii{T,g(x)). The construction of g ( x ) is referred to as an orthogonalixation of the collection { ~ , , g ( x ) ) . The following lemma shows how to orthogonalize a collection {T,g(x)) satisfying (7.4). To avoid certain technicalities in the proof, we assume that g(x) has compact support. Lemma 7.7. Suppose that g ( x ) is L' o n R with compact support. If the system { T , g ( x ) ) satisfies (7.41, then there is a function ;(XI, L~ o n R, such that: (a) {T,g(z)) is a n orthonormal system of translates and
Proof: Sirice g(x) is compactly supported, Exercise 7.11 and (7.4) implies that the function
is a trigonometric polynomial that never equals zero. Define
Then @(y) is C0 (in fact, Cm) on R and car1 be expanded in a Fourier series as
where the Fourier coefficients satisfy Define the function g ( x ) by
C , Ic(n)l2 < a.
Taking the inverse Fourier transform of both sides, it follows that
168
Chapter 7. Multiresolution Analysis
Since g(x) has compact support, the sum on the right side is finite if x is restricted to any closed finite interval. Hence, on every such interval, g(x) is piecewise continuous and so is piecewise continuous on R. By (7.4), @(y) is L" on R so that ?(?) and F(z)are L2 on R. Since @(y)has period 1,
for each k E Z, and hence
By Lemma 7.4, {T,g(z)) is an orthonormal system of translates and (a) is proved. To see that (b) holds, note that by (7.5) and Lemma 7.5,
By Exercise 7.9, Tkg(x) E span{T,g(x)) for each k E Z and by Exercise 2.61, span{TTLg(z)} is closed under the formation of linear combinations. Therefore span{T,,g(x)) C ~pan{T,g(x)}. Let f (x) E S@Zifi{~,g(x)). This means that given t > 0, there is a function h ( z ) E span{~,ij(x)) such that (1 f - hlla < c/2. Since span{T,g(x)) C: ~p""{T,g(x)), there is a function r ( x ) E span{T,g(x)) such that 11 h-rl2 < €12. Therefore, by Minkowski's inequality,
Hence f (x) E span{T,g(x)) and
A
Since g^(y) = y(?) @-l(-y) the same argument with a l ( y ) replacing @(y) shows that span{T,g(x)) C spnn{T,g(x)} and (b) follows.
Exercises Exercise 7.8. Given any function f (x), L~ on R with J J f 1J2 center of mass of f (x) is defined to be the number
=
I, the
7.2. Definition of MRA
169
provided that the integral exists. Prove that if the center of mass of a function $(x), L2 on R with )($112 = 1, is m, then the center of mass of 4,,k(x)is 2 - j ( r n k ) .
+
Exercise 7.9. Consider the collection of functions (orthonormal or not) {Tng(x)) where g(x) is L~ on R. Show that w { T n g ( x ) ) is invariant under integer translatio~l.T l l a ~is, slluw that if f (x) t span{TT,g(z)), then Tkf(x) E spal?{Tng(x)) fur each k E Z. Exercise 7.10. Justify the interchange of the sum and int,egral in the proof of Lemma 7.4 by showing that if g ( x ) is L2 on R, then the suln
converges in L1 on [0, I].
Exercise 7.11.
(a) Show that if g(x) is L2 on R, then
(b) Conclude that if g(z) is L2 on R and has compact support, then
En
+ n) l2
is a trigonometric polynomial.
7.2 Definition of Multiresolut ion Analysis Definition 7.12. A multiresolution analysis o n R i s a sequence of subspaces { V , ) , E ~of functions L~ o n R satisfying the following properties. ( a ) For all j f Z , T/,
C V,+1.
( b ) I f f ( z ) i s Cz o n R, then f (z) E Sj5iE(V,),Ez. That is, given i s a j E Z and a function g ( x ) E V, such that I1 f - 9112 < E .
E
> 0, there
( d ) A function f (x)E Vo i f and only if D02.1 f (x) E VJ. (F)
There exists a function cp(z),L~ o n R, called the scaling function such that the collection {T,cp(z)) i s a n orthonormal system of tran,slates and
170
Chapter 7. Multiresolution Analysis
Remark 7.13. (a) Typically, an MRA is defined by first identifying the subspace Vo, defining V, by letting
so that Definition 7.12(d) is satisfied, and then proving that Definition 7.12(a), (b), (c), and (e) hold. Vo can be defined by first identifying a function cp(x) such that {T,p(x)) is an orthorlormal system of translates, and then defining
(b) Note that if f (x) E Vo, then by Definition 7.12(e),
By the orthonormality of {T,,cp(x)),
7.21 Some Basic Properties of MRAs In this subsection,
{VJ)
always denotes an MRA with scaling function p(x).
The Approxirrlation and Detail Operators
Definition 7.14.
For each j , k
E
Z define cpj,k(x)b y
For each j E Z, define the approximation operator Pj o n functions f (x),L' o n R by
For each j
E
Z,
define
th,e detail operator Q, o n functions f (x),L2 o n
R
by
Lemma 7.15. For each j E Z, {pj,k(x))j,kEz i s a n orthornormal basis for
7.2. Definition of MRA
171
Proof: Note t h a , t since y O , k ( ~ E) VOfor all k, Definition 7.12(d) implies that D 2 3 ~ 0 , k= ( ~(Pj,k(x) ) E for a11 k . Also, since {cpo,k(x)}is an orthonormal system of translates, Theorem 3.42 (c) implies that,
Hence { c p j , k ( ~ ) ) k E z is an orthornormal system on R. Given f (x) E DZ-/ f (x) E Vo SO that by Definition 7.12(e) and Theorem 3.42(c),
4,
Applying D2J to both sides of Lhe above equation, we obtain
and the result follows by Theorem 2.55.
Lemma 7.16. (a) lim (IP,f
For all f (x),C: o n R: -
3'00
flln =
0, and
Proof: To see (a), let E > 0. By Defirlition 7.12(b), there exists J E Z and g(x) E VJ such that 11 f - g1I2 < ~ / 2 By . Definition 7.12(a), g(x) E and Pig(x) = g (x) for all j 2 J. Thus,
VJ
IIf
-
Pjf 112
Ilf 5 Ilf
g I PJg - P J f l l z
=
9/12 + IIPJ(f- g)112 2llf - 9112
172
Chapter 7. Multiresolution Analysis
To see (b), suppose that f (x)is supported in [-A, A] and let c > 0. By the orthonorrnality of { p j , k ( z ) ) k E z , and applying the Cauchy-Schwarz and Minkowski inequalities,
To do this, let c > O and choose K so large that
Therefore, if 2,'A
< 112, tiler]
Since for each k E Z, limj+-,
.2'A-k J-2JA-k
lq(x)l2dx = 0 ,
7.2. Definition of MRA
Since
E
173
> 0 was arbitrary, (b) follows.
The Two-scale Dilation Equation Lemma 7.17.
There exists a n t2 sequence of coeficients { h ( k ) } such that
in L' o n R. Moreover, we m a y write
where
Proof: Since cp E VoG Vl, and since by Lemma 7.15(a), { q l , k ( ~ ) )isk E Z an orthonorrrlal basis for Vl,
Thus, (7.7) holds with h ( k ) = (cp, c p l , k ) , which is e2 by Bessel's inequality. Equation (7.8) follows by taking the Fourier transform of both sides of (7.7).
Definition 7.18. LeL y3 ( z ) be the scaling function associated with a n MRA {V,}. T h e sequence { h ( k ) ) satisfying (7.7) is called the scaling filter associated with y ( x ) . T h e .function r r ~ ~ defined ( ~ ) by (7.9) is called the auxiliary function associn.ted with ~ ( 2 ) .
Remark 7.19. To call h ( n ) a filter is slightly misleading. According to Definition 4.13, a .filter must satisfy C , ( h ( n 1) < oo.This does not necessarily follow from the definition of h(n)given in Lemma 7.17. It is convenient t o makc this assumption, and we will do so in what fulluws. In fact the sca,ling filter will satisfy I h(n)1 < CX; in every example in this book but one (the Bandlimited MRA) .
En
Chapter 7. Multiresolution Analysis
174
7.3 Examples of Mult iresolut ion Analysis 3
.
The Haar M R A
Let Vo consist of all step functions f (x) such that (1) f (x) is L2 on R and (2) f (x) is constarlt on l l ~ einle~vals for k: E Z. 111other words, Vo is the collection of all scale 0 dyadic step functions, L2 on R. By Exercise 7.26,
where p(x) = XIO,L)(x). Since by Theorem 5.14, {T*(z)) is an orthonormal system of translates, this proves that Definition 7.12(e) is satisfied. For each j E Z, define 4 by Definition 7.12(d); that is, f (x) C T$ if and only if D2-.,f (x)E Vo. By Exercise 7.27, 4 consists of all step functions f (x) such that (1) f (z) is ~ % n R and (2) f (x) is constant on the intervals Ij,k, for k E Z. In other words, V, is the collection of all scale j dyadic step functions, L2 011 R. It remains only to verify Definition 7.12(a)-(c). To see tha,t Definition 7.12(a,) holds, we must prove that if f ( x ) E Ii,, then f ( x ) E y+l for ariy j E Z. Recall that by Definition 5.3, IJ,k = Ij+1,2kU Ij+1,2k+l for all j , k E Z. This means that if f (x) is corlstarlt or1 Ij,k for all k E Z, it is also constant on Ij+l,efor all 4!E Z. Thus, if f (x) is a scale j dyadic step function, it is also a scale j 1 dyadic step function, and Defiriitiorl 7.12(a) is verified. That Definition 7.12(b) holds is a direct consequence of Lemma 5.37(a). To see that Dcfirlition 7.12(c) holds, note that to say that f (x) E ng-, is to say that (1) f (x) is L2 on R and (2) f (z) is constant on thc intcrvals [o,CCI) and (-oo,0). But the only such function is the function identically zero.
+
v.
7.32 The Piecewise Linear MRA Let Vo consist of all functions f ( x ) , L~ and C0 on R., a n d linear on the intervals for k E Z. For each j E Z, define V, by Definition 7.12(d); that is, f (x) E if and only if D2-) f (x) E Vo.By Exercise 7.28, consists of all functions f (x), h2 and C0 on R, and linear on the intervals IJ k , for k c Z. It remains to verify Definition 7.12(a)-(c) and (e). To see that Definition 7.12(a) holds, we must prove that if f (x) E V J 1 then f ( x ) E V,+l for any j E Z. Since IJ,k = IJ+1,2kU IJ+1,21c+l for all for all k E Z, is also lincar on IJ+l,e j, k E Z , any function linear on for all .t E Z. Thus, if f (x) is L2 and C0 on R and linear on IJ,k for all k E Z, it is also L2 and C0 on R and linear on IJ+l,efor all !E Z. Thus, Definition 7.12(a) is verified. In order to prove Definition 7.12(b), let E > 0 and let f(x) be C: on R and supported in the interval [-A, A]. Since f (x) is continuous and has
5
compact support, it is uniformly continuous; so for j large enough, we know that given xo E I j , k , If ( x ) - f ( xo)l < for all x E IJ,kand k E Z. Now, let f j ( x ) be defined as follows. For each k E Z, let 13,k= [a,b) and let b-x x-a f (a') -f ( h ) , f,(4= - a b-a for x E Ij,+.Since b-x x-a = 1, b-a b-a
~/m
,
+
+-
.ir~i-.:~w =
b-x x 7 f
(
b -a b-x
x-a xb -) -
b-x b-u
x-a
b-a x
-
a
for all x E Ij,k.Thus,
This proves Defintion 7.12(b). To see that Definition 7.12(c)holds, note that to say that f ( x ) E ng-,V, is to say that ( 1 ) f ( x ) is L~ and C O on R arid (2) f ( x ) is linear on the intervals [O, m) and ( - O O , ~ ) . But the only such function is the function identically zero. To see that Defintion 7.12(e) holds, we will use Lemma 7.7. Before applying the Lernrna, we will need to establish the following facts. Let
Lemma 7.20.
I f f (x) is then f (x) can be written
CO
o n R and linear o n the intervals Io,k for k E Z ,
where the s u m converges pointwise.
Proof: Let k E Z be fixed, and consider (7.11) for x E 10,k.For any such x , the sum on the right side of (7.11) consists of exactly two terms. Hence the sum converges pointwise and we must verify that in fact f ( x ) = f ( k ) Tkcp(x)
+ f ( k + 1)T k + l v ( ~ ) .
(7.12)
176
Chapter 7. Multiresolution Analysis
Since T,cp(x) is linear on I o , k for all n t Z , it follows that the right side of Since Tkcp(k) = cp(0) = 1, and Tk+lcp(k) = p ( - 1 ) = (7.12) is linear on 0 , equation (7.12) is satisfied when x = k. Since Tkcp(k 1 ) = p ( 1 ) = 0 , and T k + l p ( k + 1) = p ( 0 ) = 1 , equation (7.12) is satisfied when x = k + 1. Thus, the right side of (7.12) is a linear function on I o , k that agrees with f ( x ) at the endpoints. Since f (x) is also linear on it must agree with f ( x ) on the wholc interval. Sincc this holds for any k E Z , (7.11) holds for every x E R.
+
Since we are interested in L~ convergence of (7.11) and since pointwise convergence does not necessarily imply L 2 convergence, we must prove L~ convergence separately.
Lemma 7.21. Suppose that f (z) i s linear o n the i.r~ter.uul [n,rL + 1) Jos some n E Z. Then
Proof: Since f ( x ) is linear on [n,n+ 1 ) , f ( x ) = f ( n )+ ( f (n+1 ) - f ( n ) (x ) n) for x E [n, n 1 ) . Therefore,
+
Because of the inequality 2ab 5 a 2
Therefore.
Also
+ b2, for any real numbers a and b,
7 . 3 . Examples of MRA
177
a'nd (7.13) follows.
Lemma 7.22. Suppose that f ( z ) is L~ and C O o n R and is linear o n the intervals I o , for ~ k E Z. T h e n (7.11) holds in L~ o n R. Proof: Since f ( x ) is L2 on R, (7.13) implies that
In particular limlni,, 1 f (n)I2= 0. Let M , N E N , and consider the partial sum cause (7.l l ) holds pnintwise,
x
f(-M)(x I M+1)
N
f ( n ) T n ~ ( x=)
n=-hl
f (4
f ( N ) ( N +1 - x ) 0
xr=N=_M f (n)T,,q(x). Beif x E [-A(- 1 , - M ) , if x E [-A[, N ) , if x E [ N , N I ) , otherwise.
+
Therefore.
+
lim
+
lim
If
(x>12 dx
I f (412dx
Chapter 7. Multiresolution Analysis
178
Lemma 7.23. Vo = span{T,p(x)). Proof: Since f (z) E 6 is L~ and C0 on R and linear on 1 0 , k for k E Z. Lernma 7.22 says that (7.11) holds in L2. Since every partial sum of the f ( n )Tnv(x) is in span{Tnp(x)}, Lemma 7.22 implies that form f ).( E s p a n { T T L c p ( . ~ ) ) . Conversely, suppose that f (x) E SjT%i{Tncp(x)}. By Exercise 2.62, we can find a sequence of functions { fk(z)}such that f k ( x ) E span{Tn cp(z)) and lirnk,, 11 f - f k 11 2 = 0. We need to show that f (z) is linear on the intervals loe for e E Z, that f ( x ) is C0 on R and that f (x) is L~ on R. Fix & E Z. Since each f k (x) is linear on In,! and since f k (x) converges to f (x) in L~ 011 Exercise 7.30(a) implies that f (x) is also linear on Io,g. Since by Exercise 7.30(b), the convergence is also uniform, f (x) is C0 on R. Finally, since each f k ( x ) is L2 011 R arid since f k ( x ) converges to f (x) in L~ OII R, f (z) is also L~ on R. Hence, f (x) is in Vo.
c:=-~
We are now in a position to prove that Definition 7.12(e) holds.
Theorem 7.24.
There is a functi07~@(LC), L~ o n R, S,UCJZ that:
( a ) ( T , , y ( x ) }i s a n orthonormal system 0.f translates, and
(b) Vo = 3FGi{Tr,F(x)).
Proof: We will prove this theorem by applying Lemma 7.7. To do this, we must first show that cp(.z.) = (I - 1x1) Xr-l,ll (x) satisfies (7.4). In order to do this, note that (Tncp,T,p)
=
213 if n = m , 116 if In - ml = 1, 0 otherwise.
(7.14)
By Exercise 7.11, it follows that
Hence.
for all y E R and (7.4) is satsified. The theorem follows by Lernma 7.7(a) and Lemma 7.23. See Figure 7.1.
7 . 3 . Examples of MRA
7.1. Top left: ~ ( z ) . Top IG(? + n) 12)-'I2. Bottom right:
FIGURE
(C,
7.3.3
right:
F(?).
Bottom
179
left:
A
G(?).
The Bandlimited MRA
Let Ifo consist of all functions f (x) bandlimited with baridlirriit 1 (Definition 3.47). In other words, Vo consists of all functions f (z),L~ on R, such that is supported in the interval [-1/2, 1/21. For each j t Z, define I.; by Definition 7.12(b). That is, f (x) E V, if and only if D z P Jf (x) E Vo By Exercise 7.31, 5 consists of a11 filnctiorls f (x) bandlimited wit21 bandlirriit 2J. To see that Definition 7.12(e) holds, let
y(?)
Then {T,cp(x)} is an orthonormal system of translates by Exercise 3.52. It remains to show that
To see this, note that if f (x) E Vo, then by the Shannon Sampling Theorem (Theorem 3-50), f (x) E s p a n { T , c p ( ~ ) ) , ~ ~For . the opposite inclusion, note that if f (x) E span{T,,,cp(~)}, then by Exercise 2.62 there is a sequence of functions {fk(x)}k6Nsuch that for each k E N, f k ( x ) E span{T,cp(x)),
180
Chapter 7. Multiresolution Analysis
and such that limk,, 11 f - f k I 2 with bandlimit 1. Therefore,
= 0. For each
=
Ilf
k
E
N, Jk(x)is ba~ldlii~iited
2
- fklla
-+ m , we conclude that h71,,,z 1 f ( ? ) I 2 dy = 0 and hence that Y(7) = 0 if 171 > 1/2. Thus, f ( x ) E Voand (7.15) follows. A
Letting k:
. . ,
,
4
Clearly the subspaces are nested so Definition 7.12(a) is satisfied. To verify Definition 7.12(b), let f (x) be C: on R. Since f (z) is also L2 on R, Plancherel's forniula says that f^(y) i d 2 on R By Corollary 2.37(b), there is a function g(y), C: on R such that 11 f - ? I z < E . Since ?(y) is L2 orr R, so is g ( x ) (it's inverse Fourier transform) by Pla,ncherel's fnrrn~lla,.Also by ,. Plancherel's formula, 11 f - gl12 = 11 f - 5112 < E . Since g(y) is supported in an interval of the form [-A, A] for some A > 0, then y (x) E VJ as long as 2 ~ - I> A. Definition 7.12(b) follows. To verify Definition 7.12(c), let f (z) E nJEzVi. Then f (7) is supported in [ - 2 j 1 , 2 i 1 ] for all j E Z. Letting j i m , it follows that f^(?) varrislies everywhere except possibly at y = 0. But since f (7) is L2, this rrrearis that .f (y) must be identically zero so that f (x) is ideritically zero as well. A
A
A
A
7.4
The Meyer M R A
Tliis example of an MRA is due to Y, hleyer, and the correspondiilg wavelet basis is historically the first example of a smooth orthonormal wavelet basis. The idea behind the Meyer MRA is to create a "smoothed" version of the bandlimited MRA by replacing the sharp frequency cutoff function X[-1/2,1/2)(y)by a smoother bell-shaped cutoff function in the frequency domain. The result will be a wavelet with better decay in the time domain (see Section 3.7). To this end, we define below the specific properties required of our smooth cutoff function.
Definition 7.25. Given k E N ( o r k = oo), a ,function b ( z ) is a C" bell function over [- 1/2,1/2] provided that b(z) is C" o n R and satisfies the follouling conditions: ( a ) b ( z )= 1 zf 1x1 5 l / 3 , (b) b ( z ) = 0 if 1x1
> 2/3,
7 . 3 . Examples of MRA
181
(c) 0 5 b ( x ) 5 1 for all x E R, and (d) x l b ( x + n ) 1 2
-- 1.
One way to construct such a function is as follows (see Figure 7.2).
c"'
(1) Define a "bump" function P(n:), (or Cw) on R and supported in the interval [-I, 11. If k is finite, this can be done with the followirlg construction (cf. Example 1.14(e) and Section 7.7.1). Define P o ( 4 = X[-l/(k+t),~/(k+l)](~). (7.16) and let Pn(x) = PO* PrL-l(x), for n E N. By this definition, (Exercise 7.32). Finally, let
(7.17)
PI; is Ck-I on R and is supported in [
1,11
where ck is chosen to guarantee that
(Exercise 7.33). For the k = m case, see Exercise 7.34. (2) Define a "sigmoid" furlctiorl B ( s ) 1,y taking an antiderivative of /?(n)
Then H(x) is C y o r C") on R arid satisfies: (a)
Q(2)=
O i f x 5 -1,
(b) Q(x)= 1 if n:
(c) 0 5 H(x)
2 1, and
< 1 for all n: E R.
(3) Define
(
)
=1
(
'7i
( 6 ) )
and
c ( r ) = cos
(i
~ ( 6 s ) .)
Then s(x) and c(z) are each C v o r CCO)on R and satisfy: (a) s(x) = 0 and c(x) = 1 if x 5 -1/6, (b) s(z) = 1 and c(x) = 0 if n:
2 1/6,
(7.21)
182
Chapter 7. Multiresolution Analysis
< c(x) 5 1 for all x E R, and
(c) O 5 s ( x ) 5 I and 0
+
(d) s 2 ( x ) c2(z) = 1, for all x
E
R.
(4) Define b(z) by
Then b(x) is a Ck bell function over [-1/2: 1/21. The Meyer MRA is defined as follows. Given k E N (or k = oo), let p(x) be such that +(y) is n C k bell function over [-1/2,1/2]. Define
and for j E Z, T(, by f (x) E T(, if and only if D2-If ( x ) E Vb. Thus, Definition 7.12(d) is satisfied. Since +(7) satisfies Definition 7.25(d),
{T,cp(x)) is an orthonormal system of translates by Lemrna 7.4, and Defirliliori 7.12(e) is satsified. In order to verify that Definition 7.12(a) holds, recall that by Lemnla 7.5, f (x) E Vo if and only if there is an t2 sequence of coefficients {a(n)) such that i ( 7 ) = ( x a i n ) e-2x.n-)
pi-) = a:-:
~(7).
(7.22)
Tl
We first show that Vo Vl. Let f (x) E Vb. In order to show that also f (z) E Vl , we must show that Dll2f (x) E VO. By (7.22),
Since @(2y) is supported on the interval [-1/3,1/3] and @(y) = 1 on [-1/3,1/31, 8 2 7 ) = F(27)
@(?I.
Therefore, A
Dl12f (7) = A a ( 2 y ) +(2y) Q(y). Define a(y) to be the period I extension of the function a(2y) @(2(~)). Since 6;(2(~))is continuous on R and periodic, it is bounded. Hence a(?) is L~ on [0, 1) and therefore has t2 Fourier coefficients {Z(n)}. Hence,
7.3. Examples of MRA
183
D l 1 2 f ( z ) E Vl. If for any j E Z, f ( z ) E 4, then D2-Jf ( x ) E V". By the above argument, D2-Jf ( x ) E Vl and hence D2-3-l f ( x ) E VO.Thus, and Definition 7.12(a)holds. f(x) E In order to prove Defiiiition 7.12(b),let E > 0 and let f (x) be C: oil R. By Exercise 3.55, we can find a function g^(y),C: on R and supported in an interval of the form [-A, A] such that 11 f - glla < E . In order to see that g E U J F Z y ,clloose J SO large that A < (1/3)2'. In this case @ ( 2 ~ ~=? 1) 011 [-A, A] SO that ?(?) = c ( y )F(2Ps7y). Let a(?) be the period 2.' extension of G(y).Since 2 > A, G(y) = a ( y )@(2-J y ) , and so Ij(zJ y ) = a(2J y ) $ ( y ) . Since F ( 2 J y ) is continuous on R, c ~ ( 2 ~has y ) t2 Fourier coefficients. Thus, g ( 2 - J x ) E Vo and g ( x ) E V J . To prove Definition 7.12(c),let f ( z )E n j E z V , . This means that for every j t Z, D 2 ,f ( x ) E Vo or that f^(?) has the form T ( y ) = a ( y )@(2-jy).But sirlcc +(2-.jy) = 0 if (2/3)2-1,lctting j + C O , wc scc that f (?) = 0 for every y 1 > 0, which implies that T ( y )isidentically zero and hence that f ( z )is identically zero.
lyl >
A
Exercises Exercise 7.26.
Show that
where p(:x) = Xlo,l)(z), where Vo is given in Section 7.3.1 Exercise 7.27. Prove that a step function g ( x ) is constant on the interfor k t Z if arid only if D2-.,g ( x ) is a step function coristant on vals the intervals for k E Z. Exercise 7.28. Prove that a function g ( x ) is Go on R and linear on the intervals Ij,k, for k E Z if and only if D Z p g 7 ( x ) is C" on R and linear on the intervals for k E Z. Exercise 7.29. Prove equation (7.14). Exercise 7.30. (a) Let {fn(z)),EN be a sequence of linear functions converging to a function f ( x ) in L2 on a finite interval I . Show that f ( x ) must also be linear on I. (Hint: Let f n ( x ) = anx b,. Use the fact that { f n ( z ) ) , is~ L2 ~ Cauchy on I-see Exercise 1.50(c)-to show that {a,,) ,EN and { b n j n E N are Cauchy and hence convergent sequences of numbers. Prove that if lim,,, a, = a and limn,, b, = b, then f ( z ) = ax b on I . )
+
+
(b) Prove that under the assumptions in part (a), f n ( x ) converges uniformly to f ( x ) on I .
184
Chapter 7. Multiresolution Analysis
FIGURE 7.2. From top left: P(x) with k = 2, the sigmoid function O(x), s ( x ) , and c ( x ) . The c2bell function b ( x ) (bottom).
Exercise 7.31. Prove that for any j E Z , f (z) is bandlirnited with bandlimit 23 if and only if D2-3f ( z ) is bandlimited with bandlimit I. (Hint: Use Theorem 3.40(a).) Exercise 7.32. Prove that for each n E N, the function /?,(z) defined by (7.16) and (7.17) is on R and is supported in the interval [-I, 11.
c"'
Exercise 7.33. For each k, find c(k) in (7.18) such that (7.19) is satisfied. (Hint: Consider the Fourier transform.)
7.4. Construction and Examples
185
Exercise 7.34. Show that the function e("-l)/("+1)
Q(x) =
0 1
if < 1, if x < -1, i f n : ? 1,
is a C" sigmoid function. That is, Q(x) is Cw on R, and satisfies (a) 8(x) = 0 if n: -1, (b) O(x) = 1 if z 2 1, and (c) 0 5 8(2) 1 for all x E R. Use tliis fullcliorl t o construct a CCObell functioii on [-I, 11.
k.
Therefore by Lemma 7.16,
and { Y ~ . ' J , / ~ ( x )is) ~complete. .~~Z
7 . 5 2 Proof of Theorem 7.35 Let h ( n ) be the scaling filter. Define the wavelet filter g(n) by (7.23) and $ J ( x )E Vl by (7.24). We will show that { $ J , k ( . x ) ) j , k E zis a complete orthonormal system on K. by showing that Lemma 7.48(a)-(c) are satisfied. To prove (a), note that since {T,cp(x)) is an orthonormal system of translates,
200
Chapter 7. Pvlultiresolution Analysis
where we have split the second sum into its even and odd terms, and used the periodicity of m o ( r ) . By (7.26): Iml (?/2)12 lml (?/2 1/2)12 = 1 also and using an argument similar t o the above
+
+
and (a) follows. To prove (b), note that by (7.26), and the orthonormality of {Tnp(x)),
7.5. Proof of Theorem 7.35
201
for all n, !E Z. To prove (c), let f(x) be C,O on R. Since Qof(z)= Plf(z)- Pof(z),and by Lemma 7.15(a), 1
=
f
l
k
l
k
and
f i f ( 2 )=
k
C(f. Po,?+)P " . ~ ( Z ) . k
(7.39) Taking the Fourier transform of both sides of (7.39), we have that
and
Since by Bessel's inequality
the Riesz-Fischer Theorem implies that a,(?) a n d b(y) are functions L2 on [O, 11 in the sense of Lebesgue. By Lemma 7.5, it is enough to show that there is an f2sequence { c ( n ) ) such that (7.42) ( 7 )= ~ Y ) ? ? ( Y ) = m,l(Y/2) $(7/2). If (7.42) did hold, then in light of the definition of Qo, (7.40), and (7.41), E(y) would satisfy
Kf
BY)
Thus, (7.42) would follow if we could find E(7) such that
Chapter 7. Multiresolution Analysis
202
Replacing y by y + 1 in (7.43) and remembering that a(?), b(y), and ?(y) each have period 1, we obtain
Combining (7.43) with (7.44), we obtain the system
(
ml (712) m1(y/2 112)
+
mo;;Pf!/2)
) ( Eh) ) - ( b(y)
a(y/2) a(yI2 + l / 2 )
1.
(7.45)
Since ml(?) =
e-Z7rz(y+l/Z)
mob + 112)
and lmo(y)I2 + Imo(7 + 1/2)12 = 1, the matrix
is unitary for all y
E
R; that is,
Applying this fact to (7.45) gives
so that
It can be verified directly that ?(-(y) has period 1, and since m l ( y ) l 5 1 for all y,that ?(?) is L~ 011 [O, 11 ill the sense of Lebesgue arld Ilence has t 2Fourier coefficients. Therefore, (7.42) holds, (c) is proved, and the result follows from Lemma 7.48.
Remark 7.49. In the course of the preceeding proof, we showed the following facts. (a)
(b) If j
(z))kEz is an ortl~onorrnalbasis for WJ
# j'
then W j IWjl.
(c) For each j E Z,
VjIW j .
7.6. Necessary Properties of the Scaling Function
203
(d) For each j E Z, q+l= V, @ W j . This means that every f ( x ) E y+, can be "split" as f ( z ) = fi (z) f 2 (z), where f (z) E V, and f 2 (x) E Wj . By (c), (f1 , f 2 ) = 0. This fact is t o be compared with the Splitting Lemma (Lemma 5.16) for the Haar system.
+
(e) Every f (z), L2 on R, can be written as a sum
where fj (x) E written as
W'
and by (Is),
(fj,fj1)
=
O if j
# j'.
This is usually
Exercises Exercise 7.50. Prove the second part of Theorem 7.35.
7.6 Necessary Properties of the Scaling Function In this ~ e c t i o nwe , ~ derive sorne propcrtics that thc scaling function, c p ( ~ ) , and the wavelet function, $(+), for a given MRA must satisfy. Throughout the section, it will be assumed that the scaling function is both L' and L2 on R and that the wavelet function defined by (7.24) is also L1 on R.
Theorem 7.51.
Proof: Let f (z) be given so that
I f 11
A
= 1, f (7) is continuous and sup-
ported in the interval [-R, R] for some R
> 0. By Theorem
3.40,
-
2 ~ h material e in this section is adapted from Daubechies, T e n Lectures on Wavelets, Society for Industrial and Applied Mathematics (1992) p. 144.
204
Chapter 7. Multiresolution Analysis
By Parseval's formula,
Since (2 j I 2 e 2 T i k 2 p " y ) k , z is a complete orthonormal system on the interval [-2j-', 2jP1], then as long as 2j-' > R, the above sum is the sun1 of the squares of the Fourier coefficients of the period 23' extension of the function f^(o)@(2-iy). Therefore, by thc Planchcrcl formula for Fourier series.
Sincc cp(z)is L1 on R, @(?) is continuous on R by the Riemann-Lebesgue Lemma (Theorem 3.9). It follows that
un,iforrn,ly on [-R, R]. Therefore, by Theorem 1.40(b),we can take the limit under tlie integral sign and conclude that
Ilf 1 ;
=
lim IPjf l l 2
3-rn
Hence I@(O) I = 1, and since p(z) is L1,
7.6. Necessary Properties of the Scaling Function
205
Corollary 7.52.
Proof: Since @(y)= mo(y/2) @(y/2),where 7 ~ (y) ~ 0is defined by (7.8),and since by (7.46), @(0)# 0, mo(0) = 1. Since by (7.25)
and since by the ortl-lonormality of {Tkcp(z)),
$(o)
mo(l/2) = 0, and hence on R.
= 0. Therefore,
(7.47) holds since $(z) is L'
Corollary 7.53. G(n)= 0 for. all irrtegers n # 0.
Proof: Since {T,,p(z)) is an orthornorrnal system of translates, Leinma 7.4 says that 7L
for all y E R. Letting y
= 0,
By Theorem 7.51, Ig(0) l2
=
this gives
1 so that
and (7.48) follows.
Corollary 7.54.
+
Proof: Note first that the function C , p(z n) is L1 on [0,1) and has period 1. By Corollary 7.53, @(0) = 1 and @ ( k )= 0 for all integers k # 0.
206
Chapter 7. Multiresolution Analysis
Therefore for each k E Z,
The only function with period 1 and Fourier coefficents equal t o b ( k ) is the function that is identically 1 on [0, 1). Therefore, (7.49) follows.
7.7 General Spline Wavelets In Section 7.3.2, we studied an MRA in which the spaces VJ consisted of continuous piecewise linear functions. The wavelet $(z) associated with this MRA is also piecewise linear and continuous, but is not compactly supported. However, since it has rapid decay a t infinity, it is very small outside a relatively small interval. As a result, the piecewise linear wavelet expansion has most of the advantages of the Haar expansion and does a better job of representing smooth functions. In particular, any partial sum of the piecewise linear wavelet series of a smooth function is continuous. We would like to do even better. In this section, we will construct wavelets that are smooth and piecewise polynomial. Specifically for each n E N, we will construct a wavelet that is C7'-I on R and that is a piecewise polynomial of degree n. To do this, we will require some preliminary properties of piecewise polyrlomial functions and specifically of spline functions.
7.7.1 Basic Properties of Spline Functions Definition 7.55.
Let B O ( x )= X [ 1 / 2 , 1 / 2 ~( x ) , and for n E N , define x+1/2
B r L ( x )= BTL-1* B o ( x ) =
S
B , l ( t ) dt.
(7.50)
x- 1 / 2
Thc function B,(x) zs callcd thc B-splinc (or spline) of ordcr n. For n E Z', define ( x ) by (7.51) & ( x ) = B,(x - ( n 1 ) / 2 ) .
g,
+
7.7. General Spline Wavelets
(k) Bl (s)= Bo * Bo(:) = ( 1
Example 7.56.
-
207
1x1)X [ - l , l j ( x )
x
€
[-3/2, -112)
otherwise
1 i 2+,Ix 1 4
+
B3 (4 = B2 * B o ( z )= =
x
[ 3 / 21 / 2 )
otherwise,
2-1/2
Exercise 7.64.
Bob) =
-
B
)
-
(4,
X[0>1]
{
2
z 0
zE[O,l),
: E , 2 ] ,
otherwise.
-
B3( z ) = Exercise 7.64.
Lemma 7.57.
( a ) B,(z) i s supported in [-(n
[O,n+11.
-
T h e functions B, (x)and B,(x) satsify the following properties.
+ 1)/2, ( n + 1)/2], and
&(x)
is supported in
( b ) B, ( x ) and
-
En(z)are cnP1 o n R.
(c) B,(x) is equal t o a degree n polynomial o n intervals of the f o r m [ k ,k k E Z.
gn(y)=
(y)
sin(~y)
n+l
-
A
, and B,(-y)
=
e-Ti'nli'
(?)
sin(7i.y)
+ 11,
n+ 1
.
Proof: (a) Exercise 7.65.
(b) The proof is by induction on n. Clearly B l ( x ) is continuous on R. By (7.50), B,,(x) = B, - (t) d t . By the Fundamental Theorem of Calculus, BA,(x) = B n P l ( x 112) - B,,-l(z - 112).
J~!T~;:
+
By the induction hypothesis, B,,-l (x) is CnP2on R. Therefore, B, (z) is CIL-1 on R. (c) The proof -is by induction on n. Clearly the result holds for B ~ ( X ) . Now assume that B,,(z) = pk(x) on [k, k + 11, k E Z, where pk (x) is a degree n polynomial. Fix k, and let x E [k,k I]. By Exercise 7.63,
+
Since the indefinite integral of a degree n polynomial is a degree n polynomial, we are done.
+1
(d) Exercise 7.66.
7.72 Spline Multiresolution Analyses Given n E N, define the degree n s p l i n e m u l t i r e s o l u t i o n a n a l y s i s by
and for j E Z, define
Note that any function f (z) E Vois Cn-' on R and is a degree n polynomial on each interval k E Z. Any function f (z) E V, is Cn-' on R and is a degree n polynomial on each interval I j , k , I% E Z.
7.7. General Spline Wavelets
209
We need to verify that { V , ) j c z is an MRA. To verify Definition 7.12(a), we need the following lemma.
Lemma 7.58. For each n E N ,
B,(z) satisfies n
E n
where m o ( y ) = 2-'"'
(1
Proof: If n = 0, then
(7) = mo (712) En (74121,
+ e-2"")nf
(7.54)
l.
Bo(z)= Xlo,ll
(x), and by Exercise 5.18,
Taking Fourier transforms gives
A
B,,
A
Since (?) = (go ("r))"+' (Exercise 7.63 and the Convolution Theorem), raising both sides of ( 7 . 5 5 ) to the n 1 power gives (7.54).
+
Since
and Definition 7.12 (a) follows. The verification of Definition 7.12(e) is contained in the following lemma. 1
Lemma 7.59. thonormal basis for
Thcrc cxists a function F(x) such that {T,@(X)}
is a n or-
Vi.
Proof: In light, of T,emrna 7.7, it, will he sufficient t,o show that there exist constants A, B > 0 such that for all y E R,
To see this, note that
210
Chapter 7. Multiresolution Analysis
A
Since CI:(En(?+6)l2 has period 1, it is enough t o show that it is boundcd above and away froni zero on the interval [-1/2,1/2]. For y E [-1/2,1/2],
and
To verify Definition 7.12(b), we require the following lemma. Fix n E N , let f (z) be Da.r~dlirruiLedwilh b a ~ ~ d l i n z R i t > 0, (Definition 3.4'7) and suppose that f (7) is C' on R. 'L'hen
Lemma 7.60.
A
( f , D~~ &) D~~TxEn (x)= f (x)
lim J'S
Proof: Applying Parseval's formula, wc obtain
(f,D~~T, B,)
=
/f
(x) ~
2
)
dx
R
=
~ ( y~) ~ ( 2 -2 j j /~2 e) 2 r i ( 2 - " y ) k d,Y
7.7. General Spline Wavelets
211
Recall that {2-312 e-2TL2-'7 is an orthonormal basis on [ - 2 k 1 , 2,1-']. This means that as long as 2J-I > R, ( f , D2J is the kth Fburier A
fi;) g,,(2-j;).
coefficient of the period 2i extension of the fur~ction
C f^(?+ 2%) 5,(2-j1. + k ) .
Hence
..
-
2i/2
(7.57)
k
In light of (7.57), taking the Fourier transform of the left side of (7.56) gives
The term in the sllrri for m = 0 is
A
and in fact, this term converges to f ( 7 )as follows.
-
A
since limj+, that
As long as 2"' that
11- B , ( 2 - 9 ) l2
=
0 uniforrr~lyon [-R, R].It remains to show
> R, the supports of each term in thc sum arc disjoint so
Chapter 7. Multiresolution Analysis
212
-
A
y(f^(~)
Since iscontinuous and compactly supported, it and also Bn(?) are L" on R. Therefore,
But since
1
5 jj
03
c
m=l
( T
1 (~ 1/2))2n+2
.2-.1
R
J-2-,n
s i ~ l ~ (" ~+ y~dy. )
Since { 2 j f 1 R XI-2.1R,2-J n]( x ) ) , is~ an ~ ~approximate identity arid si~lce ( ~ y is) a, continuous function that vanishes at y = 0: Theorem 2.33 says that ,in1 2~ l 2 - I R sin2"+ 2 ( ~ yd )y
,i+"o
Thus,
2
3R
= 0.
7.7. General Spline Wavelets
213
asj-m. To complete the verification of Definition 7.12(b), let f (z) be C: 011 R. Let F > 0. By Corollary 2.37(b), and Plancherel's formula, there is a bandlimited function g(x) such that i?j(y) is C0 on R and such that 11 f - g1I2 < ~ / 2 By . Lemma 7.60, we can find J > 0 such that
Therefore,
T~B,,
Sillce Ek(y. D ~TJk E n )0 2 . 7 (z) t I/,, Definition 7.12(11) holds. 'l'hc verification of Definition 7.12(c) and (d) is left as a n exercise (Exercise 7.68).
Exercises Exercise 7.61.
Verify the formula for B3(z)given in Exan~plc7.56(a).
Exercise 7.62. Verify directly that
E,(r)
is CrL-'on R for
IL =
1, 2, 3.
Exercise 7.63. Prove that for each n E N,
Exercise 7.64. Calculate explicit formulas for B3(x) and Example 7.56. Exercise 7.65.
Prove Lemma 7.57(a).
~
~
( as2 in)
214
Chapter 7. Multiresolution Analysis
Exercise 7.66. Prove Lemrna 7.57(d). (Hint: Use the Convolution Theorem. Theorem 3.21 .) Exercise 7.67. Prove that for each n E N, B,(x) satisfies E n
( 7 ) = m n ( ? / 2 ) En(?/2),
where mu(y) = cosn+l(2ny). Exercise 7.68. Verify Definition 7.12(c) and (d) for the degree n spline
MRA.
Chapter 8
The Discrete Wavelet Transform 8.1 Motivation: From MRA to a Discrete Transform The MRA structure allows for the convenient, fast, and exact calculstiorl of the wavelet coefficients of an L2 function by providing a recursion relation between the scaling coefficients at a given scale and the scaling and wavelet coefficients at t,he next coarser scale. In order to specify this relation, let {V,) be an MRA with scaling furictiorl cp(z). Then by Lernma 7.17: ~ ( 3 : satisfies a two--scale dilation equation (7.7)
The corresponding wavelet $(z) is defined by (7.24)
where g ( n ) = (-l)?'h ( l - n) (7.23). Suppose that we are given s signal or scqucrlcc of data {cO(k))kEZ. We lllake the assurrlption that c o ( k ) is the ktli scaling coefficient for some underlying furlctiorl f (z); that is,
fur. each km E Z. This assunlption allows the recursive algorithrrl to work, but
it is important t o understand that this interpretation ofco(k) 0,s th,e sca,ling coe.ficient of some function f (x) is dzfferent from the usual interpretation of data in signal processing as the samples of some underlying function f (z).' We will show that all scalirig and wavelet coefficients o f f (x) for all negative scales car1 be calculated using a very convenient recursive algorithm. lIn spite of the interpretation of data as scaling coefficients and not samples, the samples of a function f (z), { f ( k ) J k E z will , often be treated as input to the Discrete Wavelet Transform. That is, it is assumed that f ( k ) zz ((f, which need not be tile case. Strang referrs to this assumption as a "wavelet crime." Strang strongly suggests preprocessing sampled data by taking c o ( k ) = f ( n )p ( n - k ) . See Strang and Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press (1996) p. 232-233.
En
)
216
Chapter 8. The Discrete Wavelet Transform
Since p o , o ( x ) = En h ( n ) yl,,,(x), it follows that for any j. k E Z ,
Similarly,
&,k(x) = C g ( n- 2k) ~ j + l , n ( x ) .
(8.2)
For every j E N, define c j ( k ) and d j ( k ) by
for k E Z . Then b y (8.1))
n
In order t o see t h a t the calculation of cj+,(k) and d j + l ( k ) is completely reversible, recall thal by Definition 7.14,
and that by (7.37),
Also, by Definition 7.14, for any j E Z ,
Writing this out in terms of (8.5) and (8.6) gives
By matching coefficients, we conclude that
cJ ( k ) =
+
cj+l ( n )h(k - 2 ~ ' )
dj+l ( n )g ( k
-
2n).
(8-7)
We sumnlarize these results in the following theorem. Let {V,) be an M R A with associated scaling function p(x) and scaling filter h ( k ) . Define the wavelet filter g ( k ) b y (7.23) and the wavelet function $ ( z ) by (7.24). Given a function f ( z ) ,L~ o n R, define for k € Z,
Theorem 8.1.
and for every j E N and k E Z ,
and CJ
( k )=
C C J + I (h~( )k
-
2n)
+ C d j + l ( n )g ( k - 2n).
218
Chapter 8. The Discrete Wavelet Transform
8.2 The Quadrature Mirror Filter Conditions Theorem 8.1 suggests that the key object in calculating ( f ,pj,k)and ( f , $,,k) is the scaling filter h ( k ) and not the scaling function cp(x). It also suggests that as long as (8.7) holds, (8.3) and (8.4) define an exactly invertible transform for signals. The question is: What conditions must the scaling filter h ( k ) satisfy i n order for the transform defined b y (8.3) and (8.4) to be in,?~ertihEeb y (8.7)? These properties will be referred to as the Quadrature Mirror Filter ( Q M F ) conditions and will be used in the next section to define the Discrete Wavelet Transform. In this section, we will motivate the QMF conditions, refornlulate them in the language of certain filtering operations on signals called the approximation and detail operators, and finally give a very simple characterization of the QMF conditions that will be used in the design of wavelet and scaling filters.
8.2.1 Motzuation from MRA In this subsection, we will derive some properties of the scaling and wavelet filters I L ( ~ z ) and g ( n ) that follow directly from propcrtics of MRA. Ultimately this will motivate our definition of the QMF conditions.
( I ) By Theorem 7.51, Jk ~ ( xdx)
=
# 0 so that
h ( n )2-1/2 n
L
~ I ( z d) s .
Cancelling the nonzero factor JR p ( s ) d z from both sides, it follows that
By Corollary 7.52, JR $ ( s )d x = 0 so that
8.2. Thc QMF Conditions
219
Hence,
This is equivalent to the statement that
(Exercise 8.15). (2) Since {YO,, (x)) and {cpl,,(x)}are ortllorlorrnal systerrls on R,
Hence.
Since {$o,,(z):n E Z ) is also an orthorlormal system on R, the same argument gives y(k) y ( k - 271,)= 6 ( n )
C
Since ( $ o , ~ ~PO.^^) , = 0 for all n, m E Z , the same argument gives
for all n E Z.
(3) Since for any signal co (n),
Chapter 8. T h e Discrete Wavelet Transform
220
where co (m) h(m - 2k)
cl ( k ) = m
and dl ( k )=
co(m) y ( m - 2k) ; m
it follows that
Hence we must have
We surrlmarize these results in the followirlg theorem. Theorem 8.2.
Let {V,) be a r ~MRA with scaling filter h ( k ) and wavelet .filter g ( k ) g i ~ ~ e hy n , (7.23). Then
h ( k ) h ( k - 2 n ) = x g ( k ) y(k
(c) k
(d)
-
2n) = 6(n),
k
1
g ( k ) h ( k - 2n) = 0 for all n t Z , and
h(m- 2k) h ( n - 2k)
(e) k
+x g ( m
-
2k)g(n- 2 k ) = 6 ( n - m)
k
Remark 8.3. (a) Condition (a) is referred to as a normalization condition. The value fi arises from the fact that we have chosen to write the two-scale dilation equation as p(z) = Enh ( n )21/2 p(2x - n) . In some of the literature on wavelets and especially on two-scale dilation equations,
8.2. The QR4F Conditions
221
the equation is written p(x) = C h ( n ) cp(2x - n). This leads to the normalization C , h ( n ) = 2. The choice of normalization is just a convention and has no real impact on any of the results that follow.
(b) Conditions (c) and (d) are referred to as orthogonality conditions since they are immediate consequences of the orthogonality of the scaling functions at a given scale, the orthogoilality of the wavelet functions at a given scale, and t h e fa,c,t,t,hat the wavelet functions are orthogonal t o all scaling functions at a given scale. (c) Condition (e) is referred to as the perfect reconstruction condition since it follows from the reconstrliction formula for orthonormal wavelet bases.
8.2.2
The Approzimation and Detail Operators and Thei~, Adjoints
The goal of this subsection is to reformulate Theorem 8.2(c)-(e) in terms of certain filtering operations on signals referred to as the approximation and detail operators. These operators will also play an important role in the defi~itionof l l ~ eDiscrete Wavelet Transform.
Definition 8.4. Let c ( n ) be a signal. ( a ) Given m E Z, the shift operator r, is defined by
( b ) The downsampling operator ./. ,is defined by
(Note: (.j,c)(n) is formed by removing every odd term in c ( n ) . ) ( c ) T h e upsampling operator
( (Note:
(T c ) ( n ) is formed
is
)
=
defin,ed h p
{
c ( n / 2 ) i f n is even, a if n is odd.
by inserting a zero between adjacent entries of
c ( n ).) See Fzgurc 8.1.
Definition 8.5. Civcn a signal ~ ( nand ) a filter h ( k ) , define g ( k ) by (7.23). T h e n the approximation operator H and detail operator G correspondtng to h ( k ) are defined by
222
Chapter 8. The Discrete Wavelet Transform
FIGURE 8.1. Top left: A signal c ( n ) , top right: ( J c ) ( n ) bottom: , (Tc)(n) (right). The approxiniation adjoirit H* a n d detail adjoirlt G* are defined b y
Remark 8.6. ( a ) The operators H and G can be thought of as convolution with the filters h(n) = h ( - n ) and g ( n ) = y ( - n ) followed by downsampling. That is, ( H c ) ( n )=J ( c * b)( n ) and
( G c )( n )=J ( c * g)( n ) .
( b ) H * and G* can be thought of as upsampling followed by convolution with h and g ( x ) . That is, ( H * c ) ( n )= ( T c ) * h ( n ) and
( G A c ) ( n= ) (?c)*g(n).
( c ) The operators H * and G* are the formal adjoints of H and G . That is, for all signals c ( n ) and d ( n ) , ( ~ cd ), =
x
C ( H C ) (=~ )e ( k ) ( ~ * d ) ( =t )(c. ~ * d ) k
k
8.2. The QMF Conditions
223
and
(Exercise 8.16). Taliing the above remarks into consideration, we car1 reformulate the conditions of of Theorem 8.2(c)-(e) as follows.
Theorem 8.7. Given a filter h ( k ) , de6ne g ( k ) by (7.23) and let I denote the identity operator. Then:
if and onlg i,f H H * = GG* = I , where I is the identitg operator o n sequences,
for all n E Z i f a,nd only if HG* = GH* = 0 , and
k
k
i f and only if
H*H
+ G*G = I ,
where I is the identity operator.
Proof: Exercise 8.17
8.2.3
The Quadrature Mirror Filter (QMF) Conditions
In Theorem 8.2, we set forth conditions on the scaling filter h ( k ) that are consequences of the fact that h ( k ) is the scaling filter for an MRA. In Theorem 8.7, we saw that sorrle of these conditions can be characterized in terms of the approximation and detail operators and their adjoints defined in Definition 8.5. In this section, we will show that all of the conditions in Theorem 8.2 can be written as a single condition (Theorem 8.11(a)) on the auxiliary function mo(7)= 1/1/2 Enh ( n )e-2Xin' plus the normalization condition mo(0)= 1. These two conditions will be referred t o as the Quadrature Mirror Filter ( Q M F ) conditions.
224
Chapter 8. The Discrete Wavelet Transform
We will need the following lemmas.
Lemma 8.8.
Given a signal c ( r ~ )the , followir~g hold.
( a ) For every m E Z
(y)
( T ~ ~ C ) =~ e-2x"%(y)
See Figure 8.2.
Proof: (a) Exercise 8.18. (b) To prove (8.21), we compute tlie Fourier coefbcier~tsof the right-hand side. Let n E Z be fixed.
and (8.21) follows by the Uniqueness of Fourier series.
(4
8.2. The QMF Conditions
225
FIGURE 8.2. Top left: The Fourier series, ?(?) of the signal c ( n ) of Figure 8.1. Top right: ( J c ) ~ ( Bottom: ~). (?C)~(?).
Lemma 8.9. ml
Given a jiltcr h ( k ) , define g ( k ) b y ( 7 . 2 3 ) , 7 7 ~ " ( ~ )2lrj (7.9), and
( y ) b y (7.26). T h e n for any signal c ( n ) ,
Proof: To prove (8.23), note that by defining h(n)= h(-n) that
and recall that for any c, H c
=I( c * h).Taking the Fourier transform,
226
Chapter 8. The Discrete Wavelet Transform
The other part of (8.23) follows similarIy. To prove (8.24),note that
and recall that for any c, H*c = h
* (Tc).
Talking t h e Fourier transform,
The other part of (8.24) follows similarly. In Theorem 8.7(c), the equivalence of the conditioris X I ,g ( k ) Fb(k - 2 n ) = 0 for all n E Z (Theorem 8 . 2 ( d ) ) and HG* = G H * = 0 was demonstrated. The next lemma shows that Theorem 8 . 2 ( d ) is a consequence only of the way in which the wavelet filter y(k) was defined arid is not related t o any other property of the scaling filter h ( k ) . Lemma 8.10.
Given a filter h ( k ), define the jilter g ( k ) b y (7.23). Then
and
HG* = G H * = 0.
Proof. To see (8.25), note that by the definition of g ( n ) ,
Then,
+
+
mo(7)r r ~ o ( + y 112) ml(7)m l ( y 1/21 = mo( y )mo( y + 112) - e - 2 " i ~ m o( y + 1/2)e2"" mmo (7) = mo(r)m,o(y + 112) - m o ( + ~ 1/21 ~ o ( Y )
To see (8.26), note that given any signal (8.23),
~ ( Y L ) ,a i d
applying (8.24) and
8.2. The Q M F Conditions
227
U
We can now prove the following theorem.
Theorem 8.11.
Given a filter h ( k ) , define g(k) b y (7.23), m o ( y ) b y (7.9), rnl ( y ) b y ( 7 . 2 6 ) , and the operators H , G, H * , n,n,d G* hg (8.1 8 ) an,$ (8.19 ) . Then. the following are equiua81ent.
( a ) 1mn(y)l2
+ + m o ( y+ 1/2)12
1.
h ( n )h ( n - 2 k ) = d ( k )
(b) n
( c ) H* H
+ G*G = I .
( d ) H H * = GG* = I .
Proof: (a)
* (b). Applying Parseval's formula to h ( n )and ~
=
Jo
+
e2""'y ( I r n 0 ( ~ / 2 ) 1 ~Irn0(7/2
Therefore, (h) is equivalent t o the statement that
~ ~ hgives ( n )
+ 1/2)12) dy.
228
Chapter 8. The Discrete Wavelet Transform
But this is true if and only if Imo(y/2)12
+ mo(y/2 + 1/2)12 = 1
for all y E [0, I ) , which is (a). (c). Given a signal c(n), (a)
Similarly,
Therefore,
by (8.25). Therefore (c) holds if and only if 4 7 ) (lmo(7)12+ lm1(Y)l2)= ?(YL
A
for every signal c ( n ) , which is true if and only if
+
I m ~ ( r l 2I2 ) + l m o ( ~ l 2 1/2)12 = 1 for all y E [O,l). This is (a). (d). C ivcn a signal c(n), (a)
8.2. T h e Q M F Conditions
229
Similarly,
Therefore,
H H * c = GG*c = c, for every signal c ( n ) if and only if
which is (a).
Definition 8.12.
Given a filter h ( k ) , define m.o(y) h g (7.9) Then h ( k ) is a
QhfF provided that: ( a ) m o ( 0 ) = 1 and
+
( b ) ( r n 0 ( ~ / 2 ) (+~m o ( y / 2+ 1/2)12 = 1 for all y F R. W e refer t o ( a ) and ( b ) as the quadrature mirror filter ( Q M F ) conditions.
Theorem 8.13. Suppose that h ( k ) is a QMP. Define g ( k ) by (7.23). Then: (a)
h ( n )=
d?,
( e ) x g ( k )h ( k - 2 n ) = 0 for nil n E Z . k
Proof: (a) By the definition of mo(y),
and (a) follows.
230
Chapter 8. The Discrete Wavelet Transform
(b) Since mo(0) = 1 and l r r ~ ~ ( +~ I) 7 7 ~ mu(1/2) = 0. But by the definition of g(k),
l2
+~ (1/2)12 ~ =
1, it follows that
so that
and (b) follows. (c) By Exercise 8.15, (b) is equivalent to (c). (d) By Theorem 8.7(a), (d) is equivalent t o the staterrleilt that H H * = GG* - I , which by Theorem 8.11 is equivalent to Definition 8.12(b). (e) By Theoreni 8.7(b), (e) is equivalent to the staternerit that G H * = HG* = 0, which is Lemma 8.10.
(f) By Theorem 8.7(c), (f) is equivalent t o the statement that H* H+G*G I , which by Theorem 8.11 is equivalent to Definition 8.12(b).
=
Remark 8.14. It follows frorn the first part of Theorem 8.13(d) that an FIR filter h,(n) that satisfies the QMF conditions car1 be supported on only an even number of points. That is, if h(n) = 0 for n < hf a r ~ dn > N , h ( M ) # 0, and h ( N ) # 0, then N - M 1 is even (Exercise 8.19).
+
Exercises Exercise 8.15. (7.23), then
Prove that if h,(k) is any filter and if g ( k ) is given by
Exercise 8.16. Verify the statement made in Reinark 8.6(5) Exercise 8.17. Prove Theorem 8.7. Exercise 8.18.
Prove Lemma 8.8(a).
Exercise 8.19.
Prove the statement made in Remark 8.14.
Exercise 8.20. The purpose of this exercise is to show that the formula (7.23) for the wavelet filter g ( k ) is not arbitrary. Prove that if h(k) is a
8.3. The Discrete Wavelel Trafislol.nl
231
real-valued FIR QMF and if g(k) is any real-valued FIR filter such that Theorem 8.13(a)-(f) are satisfied, then g(k) must be of the form
for surrle odd integer equivalent to
71.
(Hint: (1) Show that the QMF conditions are
H(z) H(zpl)+G(z) ~ ( z - l )= 2
and
H(-z) ~ ( z - ' ) + G ( - z ) ~ ( z - l )= 0,
where H ( z ) and G(z) are the z-transforms of h(n) arid g(n). (2) Show that these identities imply that ~ ( z - l )= G(-a) cwzN and G(zpl) = - H ( - 2 ) a z N for some N E Z and a E C . (3) Sllow that u2 = ( - I ) ~ + ' , and rewrite the identities from (2) in terrns of h ( n ) and g ( n ) . )
8.3 The Discrete Wavelet Transform (DWT) 8.3.1
The D WT for. Signals
Summarizing some of the considerations giver1 in the prcviolis section, we can now make a formal definition of the discrete wavelet transform.
Definition 8.21.
Let h ( k ) be a Q M F , define g ( k ) by (7.23), and let H , G , H * , and G* be giz~enby (8.18) and (8.19). Fix J t N . The D W T of a signal c o ( n ) , is the collection of sequences { d , ( k ) :1 5 j _< J ; k E Z ) U { c ~ ( k k) :E Z ) , where C,+I ( n )= ( H c 3 ) ( n , )
and
d,+l(n) = ( G c , ) ( n ) .
(8.30)
The inverse transform is defined by the fomw~~la
If J = m, then the D W T of co is the collection of sequences {d,(k): j E N ; k E Z ) .
8.3.2
The D WT for Finite Signals
In practice, we never deal with infinite signals and this raises the question of how to take the DWT of a finite signal. There are essentially two ways to do this. (1) Zero Padding. This approach is to treat the finite signal as an infinite signal padded with zeros. Then apply the DWT as in Definition 8.21.
232
Chapter 8. The Discrete Wavelet Transform
The main difficulty with this approach is that the representations we obtain are not as efficient as possible. For example, suppose that our signal has length 2N. That is, suppose that co(n) satisfies co(n) = 0 if n < 0 or n > 2N - 1. Suppose also that the scaling filter h ( n ) and the wavelet filter ( 1 1 ) have length L > 2, with L even. In this case, the sequences cl = Heo and dl = Gco would each have length (2N L - 2)/2. Similarly, cj and dj would have length at least 2Np3 (1 - 2-j)(L - 2) (Exercise 8.24). This means that the total length of the DWT for co would be at least
+
+
where J E N indicates the depth chosen for the DWT. Thus, the representation of a length 2N signal (which may be thought of as an 2N-vector) is achieved with at least 2N + J ( L - 2) coefficients. This may be acceptable for certain applications, especially if J and L are srnall compared to 2N, but is not the most efficient representation possible. (2) Periodization. A rriore efficient representation is achieved if the finite signal is viewed as a periodic signal. The following lernnia shows that the DWT defined in Definition 8.21 can be applied to periodic signals. In this case, for a period 2N sequence, c j ( n ) and dj(n) will have period 2N-.j so that if the depth of the DWT is J N, then the DWT of the sequence has exactly 2N coefficients (see Exercise 8.25(b)).
0. By Lernma 8.22, (Hc)(n,) has and car1 period p/2. Thus, H is a linear trarisfornlation from R" to be represented by a p/2 x p matrix. We will call this matrix HI, or sirriply H when its size is clear frorri context. Sirriilarly, the detail operator G car1 be represented by a p/2 x p matrix G,, (or G).
Example 8.23. Let h(k) be a real-valued scaling filter of length four2; that is, h,(k) = 0 if k < 0 or k 4. Define g(k) = ( - I ) ~h(3 - k ) , so that also g(k) = 0 if k < O or k 4, and let p = 8. Then
>
>
and
The approxirnation and detail adjoints H* and G* are represented by the adjoints of the matrices H and G (Exercise 8.27). 'For examples of such filters, see Exercise 8.26
234
Chapter 8. The Discrete Wavelet Transform
Since
=
I,,
where I, is the p x p identity matrix, W, is an orthogonal matrix. Therefore, the first step in the DWT of an A{-vector co is given by
the second step by
and in general, the j t h step by
The A4 x DI matrix W representing the DWT taken to level J is therefore the product of J such matrices. Since each matrix in this product is orthogonal, so is W (Exercise 8.28). Basis Vectors for the Finite DWT Since the DWT d of an M-vcctor co is realized as the product of co with an I\/I x M orthogonal matrix W, it follows that each number in the vector d is the inner product of co with the corresponding row of W. Taken as a set of vectors in R", the rows of W form an orthonormal basis for R"',
8.3. The Discrete Wavelet Transform
235
which is referred to as a discrete wavelet basis for R M .These vectors can be calculated and plotted simply by taking the inverse DWT (8.31)of the canonical basis vector ei = [0 . . . 0 1 0 . . . 0] in Rn*,where 1 is in the i t h position. A plot of the discrete wavelet basis for R16 based on the Daubechies lengtli-four scaling filter is shown in Figure 8.3. This lor firldirlg and displaying the wavelet basis vectors is actually the same as the cascade algorithm described in Section 8.4.2. The only difference is that here we consider our sequences t o be periodic, and in Section 8.4.2, the sequences are assumed to be zero padded.
FIGURE 8.3. Discrete wavelet basis for R~~based on the Daubechies length-four scaling filter.
236
Chapter 8. The Discrete Wavelet Transform
Exercises Exercise 8.24. Let co(n) be a finite signal that satisfies co(n) = 0 if n < 0 or n > 2N - 1 for some N E N. Also suppose that the scaling and wavelet filters h and g ( x ) have length L > 2. Prove that if cj and dj are given by Definition 8.21, then cj and d j are finite sequences with length equal to the smallest integer greater than 2N-j (I - 2-j) (L - 2).
+
Exercise 8.25. (a) Prove Lemma 8.22. (Hint: Use the fact that a periodic sequence must be bounded.) (b) Show that if the depth of the DWT of a sequence with period 2N is J 5 N, then the DWT has exactly 2 N coefficients.
Exercise 8.26. Prove that all four-coefficient scaling filters (that is, QMFs h(n) such that h(n) = 0 for n < 0 and n > 3) can be parametrized by
ho =
JZ -+4
JZ
cosa , hl = 2 4
sin a
2 '
h2=
JZ 4
cosa 2 '
h3=
JZ
sina
4
2
(Hint: The QMF conditions reduce to:
(b) h i
+ hf + h; + h i = 1, and
+
+
+
(1). Show that (ho h2)2 (hl h ~ - )1. ~ (2). Show that ho h2 = hl h3 = &/2.
+
+
Jz + t , hl = $ + s , (3). Letting ho = 4 s, t E R, show that s2 + t2 = 114.)
h2 =
JZ - t , and h3 = a Jz - s , for
Exercise 8.27. Show that when applied to period p signals, the approximation and detail adjoint operators H* arid G* are linear transformations from Rp t o R2" and can be represented by the matrices Hzp and Gzp respectively. Exercise 8.28. onal.
Show that the product of orthogonal matrices is orthog-
8.4 Scaling Functions from Scaling Sequences We have seen how the scaling function, ip(x), associated with an MRA gives rise to a sealing filter, h(k) , namely h(k) = ( c p , cpllk). We have also seen that any scaling filter associated with the scaling function of an MRA
8.4. Scaling Functions from Scaling Sequences
237
must satisfy the QMF conditions (Theorem 8.13). The question we address in this section is: G i v e n a QMF, can we find a scaling function associated with it that gives rise t o a n MRA?
8 . 1
The Infinite Product Formula
(5)with scaling function cp(x),we know by Lemma 7.17
Given an MRA that p(x) satisfies
8 7 ) = mo(y12) @(7/2),
where mo(y) is given by (7.9). Therefore, we may write
Letting n 4 co,it follows that
provided that the infinite product makes sense. In order t o deal with infinite products of functions such as in (8.32), wc will require a few definitions and theorems. I~lfi~lite Products of Numbers
Definition 8.29. Let {z,},~N be a sequence of complex numbers. Then
provided that the limit exists.
Remark 8.30. (a) If z, (b) Let p~ =
n,-,z,, N
= 0 for any
with z ,
n E N, then
lim
r ) =~ z
Then p ~ / p = ~ z- ~~Since . lim p~
= 0.
# 0 for all n , and suppose that
N+w
N--too
nr=,z,,
=
lim
N+oo
p ~ + l=
z,
238
Chapter 8. The Discrete Wavelet Transform
limN+,m z~ = 1. In other words, if a n infinite product of numbers convcrgcs, then t h e limit of t h e terms must be 1. In what follows, we will always assume that the sequence { z , ) , , ~ satisfies z, 0 for all n and tha,t lim,,, z,, = 1.
+
Let {Z,),~N be a sequence of complex numbers. Let log(z) denote the ~ , I . Z ~ L C~Z*~II.LUL~ofI Cthe / ~ logarithm; that is, i f z = Izl e z e , with 0 8 < 27r, then log(z) = In lzl + i 0 . If log(z,) converges, then so does z,.
Theorem 8.31.
0 , t h e infinite p m d u c t
converges absolutely a n d zn Lm o n [-R, A].
Proof: Since mu(0) = 1,
Lct C = En Ih(n)(In(.Then
since for all x, ( sin(x)l 5 1x1. Thus,
and so given R
> 0, for all
171 I R ,
Therefore, by Theorem 8.33 for every R absolutely and in LDO on [-R, R].
> 0,
njc,mo(y/2i) converges 17
Theorem 8.34 asserts that the infinite product formula converges uniformly on intervals [-R, R] to some liniit function. Anticipating that this liinit function will be the Fourier transform of our scaling function, let us write M
8(r)=
n
j=1
m"(r/2').
240
Chapter 8. The Discrete Wavelet Transform
Note that since mo(0) = 1, G(0) = 1. It remains t o prove that in fact @(?) is L2 on R so that by Plancherel's formula we may corlclude that our scaling function cp(x) is also L2 on R. Since mo(y) has period 1, the partial e product n j = , rno(7/2j) has period 2'. 'l'herefore, it does not make sense to say the the partial products converge in L2 on R since no periodic function can be close to an L2 function in the sense of the L~ norm. We therefore restrict our attention to one period of each partial product and define
Note that Kt(?) and hence also p e ( ~is) L2 on R and that p e ( ~is) a bandlimited approximation t o the scaling function p(x). The next theorem asserts that in fact k ( y ) converges in L2 to G(y).
Theorem 8.35.
L e t h ( k ) be a finite QMP', a n d let m o ( y ) be defined by (7.9). S.n,,ppose t h a t there is a n u m b e r c > 0 s u c h t h a t
For l E N , define G ( y ) by (8.34) a n d ?(y) b y (8.33). T h e n : (a) ,!%(y) -+ (P(y) in L'
o n R, a n d
Proof: The proof will use Theorern 1.41. Specifically, we will show that: (1) for each R > 0, Fl(7)+ $(Y) in LO" on [-R, R],
(2) there is a constant co and y E R and
(3)
> 0 such that lF!(y)(I co (@(y)(for all
.e E N
JR l a o I 2 d 7 < m.
Once (1)-(3) have been established, we apply Theorem 1.41 as follows. Consider the sequence of functions {Ik(r) - @(y)12)eEN, and note that by (I), for each R > 0, I & ( ? ) - F(7)l2 + 0 in LDO on [-R, R]. Since II.e(7)
-
@( ? ) I 2
1 2(lGh(7)l2+ I ( P ( Y ) I ~ )
(Exercise 8.41), it follows from (2) that IKp(7)
-
G(7)I2 I 2 ( 1 +
4)lG(7)12,
which is L1 on R by (3). Therefore, Theorem 1.41 applies and r
241
8.4. Scaling Functions frorn Scaling Sequences
Proof of (1). Lct R > 0 be given. By Theorem 8.34, we know that
e n,=, mO(y/2j) for all
in L" on [-R, R]. As long as 2'-' > R, = y E [-R, R]. Thus, k ( y ) t p(y) in Lm on [-R, R].
Proof of (2). Since h ( k ) is finite, mo(y) is continuous and hence so is
n:=,
ny=,
mo( r / 2 j ) for ea,ch I t N. Since @(y)= lirn,,, mo(r/2j)uniformly on every interval [-R, R], @(y)is continuous on R. Since @(O) = I, there is an E > 0 such that if Iyl < E, then I@(y)l2 1/2. Since
we may choose J so large that < E for all lyl then for 1 5 j 5 J, /2-jyl 114, and by (8.35),
0
Proof: - (1)- Suppose that $(x) is supported in the interval I, which has the = [0,a] for some a > 0. It follows that the function Q j l k ( x )= form I = 23/'$(21x - k ) is supported in the interval = [2-3k, 2-1(k a ) ] ,and that the length of denoted is 2 ~ j Denote a the center of the interval Ij,k by Ej,k, and note that % , k = 2-(3+')a 2-jk. As a consequence of (9.6), given any polynomial p ( x ) of degree no greater than N - 1, and any j , k E Z ,
Gtn
IG,~~,
+
+
( 2 ) Since f ( z ) is C N on R, for each j , k t Z , f ( x ) can be expanded in a Taylor expansion about the point E j c k . That is,
where
.-
for some number J between Zj,k and x. If x E
Ij,k,
then we have the estimate
9.1. Vanishing Moments
255
(3) Applying (9.8) to (9.9),we compute
Applying the estimate (9.10) and the Cauchy-Schwarz inequality,
Note that with C = ( l / N ! )a3j222-N 11 f ( " ) 1 1, ,
(9.7) is satisfied.
We can actually go a little bit further by observing that since f (")(z)is C0 on R and since the lengths 1 i 0 as j i m, f (N)( E ) will be very close to f(N)(Zi,k) for 5 E I;,a.Therefore, for large j ,
Hence
256
Chapter 9. Smooth, Compactly Supported Wavelets
Hence we can make the qualitative statement that as j gets large,
'l'he value of (9.11) is that it identifies the decay of the wavelet coefficients of a smooth function as a local phenomenon. That is, suppose that f (x) is not CN on all of R, but does have N continuous derivatives at some point zo.This means that f (N)(x)is in fact defined on some small interval I' containing the point xo. Since $(x) is supported in the interval I as described above, computing the coefficient ( f ,Qj,k) requires knowing the values of f (z) only on the interval ?;,k. From this, it follows that the estimate (9.11) holds for every J and k such that &,r L 1'. The above paragraph suggests the general principle that the wavelet coefficients corresponding to the smooth parts of a function will be very small compared with the wavelet coefficients corresponding to the nonsmooth parts of a function. This observation has implications for the compression of images using wavelets since marly classes of images consist of large areas of constant intensity (for example, background areas) that can be interpreted as the smooth parts of the image, separated by edges, that can be interpreted as the nonsmooth parts of thc imagc. Therefore, the wavelet transform of such an image will have a few large coefficients, corresponding to the edges, and a lot of small coefficients, corresponding t o the regions of near-constant intensity. This is just what is required in most image compression algorithms.
Example 9.6.
Consider the linear spline function f (x) defined by
It is clear that f(x) is linear on the intervals (-co,-I), (-1: O), (0, I ) , and (I,m), and hence continuously differentiable infinitely many times there. At the points -1, 0, and 1: f (x) is continuous but has a discontiniuty in
9.1. Vanishing Moments
257
its first derivative. Hence f (x)is only C0 on R, but in fact f (x)is C" at all but a few points. Now suppose that $(x) is a real-valued wavelet function with two vanishing moments; that is,
and that $(x) is supported in the interval [O, 31. We will see in Section 9.2 that such a wavelet exists. If j = 0, then by considering the support of the fullctions $ J ~ , ~ ( Xit) , follows that ( f ,Q O , k ) = 0 for k 1 and for k -4, so that there will be at niost four nonzero wavelet coefficients of f (x) at this scale. The same holds when j is any negative integer as well. = If j = 1, then support considerations lead to the conclusion that (f, O for k 2 2 and for k I -5 leading to at most six nonzero wavelet coefficients at this scale. For j = 2, something different happens. Support considerations lead to the conclusion that (f, ?,b2$) = 0 for k 2 4 and k I -7, but we also observe that if k = 0, then
>
0 such that
for all y E R . T h e n { T n P ( z ) }is linearly independent.
Proof: Wc will find a function q(x) such that {T,$(x)) is biorthogonal to {Tnp(z)). The resull will lollow by Lerrlma 10.3. Define g(x) by
Ry (10.1), the denominator is never zero so that this division is defined for all y.
294
Chapter 10. Biorthogonal Wavelets
Note that
By Lernrna 10.8, {Tn@(x))is biorthogonal to {Tnv(x)) and the result follows by Lemma 10.3. Lemma 10.10. Suppose that p ( ~ )satisfies (10.1). T h e n for any
1
~ I 5I c2: 1 ~ z ( r ) ~ ~ d x
(10.3)
Enc ( n )Tnp(z)
and ?(T) is
1
I
5
where { ~ ( n )is} a finite sequence such that f (z) i t s Fourier transform.
=
Proof: Since f (z) = C , c ( n )Tncp(x) by Plancherel's Formula,
If (10.1) holds, then
which is (10.3). Lemma 10.11. A compactly supported function p ( x ) , L' o n R, s a t i ~ f i e s (10.1) if and only zf there exist constants A, B > 0 such that for all f (x) E
10.3. Riesz Bases of Translates
295
Proof: Note first that by Plancherel's formula,
(===+)Suppose that (10.1) holds, and let f (x)E span(', is a finite sequence {c(n)) such that
~(x)).Then there
Therefore, for each m E Z,
Therefore (f,T,p) is the -mth Fourier coefficient of the period 1 function ClcI @ ( 7 k) E(7) and by Planchcrcl's formula for Fourier series,
+ l2
so that
Chapter 10. Biorthogonal Wavelets
296
But by (10.3),
so that c Z 1f
ll;
5
C l(f,Tnv)125
(.:~;l
llf 1:.
and (10.5) follows. ( k The )
proof of the converse is somewhat inore complicated and is not given here. The outline of a proof is given in Exercise 10.15.
Remark 10.12. (a) If y ( x ) is compactly supported and satisfies (10.1), then by Exercise 7.1 1 , E nI?(y+n,) 1' is a trigonometric polynomial bounded away from zero. Therefore, ( E n[@(y+ n ) I 2 ) ' can be written as an L2 Fourier series. If
then the function F(x) given by (10.2) satisfies
Taking the inverse Fourier transform of both sides, we obtain
in L2 on R. Therefore, F(x) E s p a n { ~ , , y ( ~ ) } . (b) In fact, @(x) given by (10.2) is the unique function in span{Tnp(z)} such that {T, Fix)) is biorthogonal to {T, cp(x)).However, there exist other functions @(x) n o t in span{Tncp(x))such that {T,,F(x)) is biorthogonal to {Tncp(x)).This fact will be exploited in the construction of Riesz bases of wavelets that have compact support. Theorem 10.13. Let p ( z ) be L~ on R and compactly supported and let { T , p ( x ) } be a Riesz basis for s p a n { T n p ( x ) ) . If there exists a fwr~ct.Lo,r~ G(5) such that {T,$(x)} is biorthogonal t o {T,p(x)}, then:
( a ) for every f (x)E Sf%Zi{T,p(x))
where the s u m converges i n L~ on R and
10.3. Riesz Bases of Tkanslates ( b ) there exist constants A, B
> 0 such that for
297
all f (x) E ~ { T n P ( x ) ) ,
Proof: Wc will first prove (a) and (b) lor f (x) E span(Tncp(x)) and then generalize to f (x) E -{T, cp(x)). To see (a), let f (x) E span{Tnp(x)). Then there is a finite sequence {c(n)) such that (10.9) f (+) = c(n) TnP
x n
By the biorthogonality of {TnF(x)),
and (a) follows. To see (b), recall that by (10.3), there are constants el, c;?> 0 such that for all f (x) E span(T,cp(x)),
where E(7) is the Fourier transform of the sequence { c ( n ) ) of (10.9). By the Plancherel formula for Fourier series and the fact that c(n) = ( f , T,g):
and (b) follows. Generalizing the previous results to span{Tncp(x)), we will prove (b) first. By Exercise 2.62, given f (x) E @Zii{T,,cp(x)},there is a sequence { f m ( x ) } m Gsuch ~ that fm(x) E span{Tn9(x)) for each rn and
Also, by the Cauchy-Schwarz inequality,
for every n E Z. For every N E N, since (10.8) holds for each f,,(x), we have that
Chapter 10. Biorthogonal Wavelets
298
Since the right side of t,he ineqixality has nothing to do with N, we may let N + cc and conclude that
Therefore, we have established the upper bound in (10.8). To see that the lower bound in (10.8) holds, note that by the CauchySchwarz inequality for t2 sequences, we have that for each m E Z ,
Since (10.8) holds for each fm( x ) ,and since the upper bound of (10.8) holds for each f m ( x ) - P ( 4 ,
Letting m
-+co,we have that since I f r n ;
i
(1f 1 ;
which is the lower bound in (10.8). To prove (10.7) for f (x)E span{Tncp(x)l-7 let
E
> 0 and consider the partial sum
and since I fm- f
1 ; + 0,
10.3. Riesz Bases of Translates
299
for some N, A1 E N. Let g ( x ) t s p a n { T , y ( x ) } be such that 11 f - 9 / 1 2 < 6. Since (10.7) holds for g ( x ) . we know that for all N , A f t N large enough.
'l'herefore,
By the Cauchy-Schwarz inequality,
by (10.3), there is a constant
c2
such that
and by (10.8)
Therefore, for all n, ~$1t N large enough,
Since
t
> 0 was arbitrary,
(10.7) follows.
Exercises Exercise 10.14.
Prove Lemma 10.8.
300
Chapter 10. Biorthogonal Wavelets
Exercise 10.15. The purpose of this exercise is to prove the "only if" part of Lemma 10.11. (1) Since cp(x) has compact support, Exercise 7.11 implies that the function + k)I2 is a period 1 trigonometric polynomial and therefore bounded on [0, 1).Therefore it remains only to prove the existence of the lower bound of (10.5).
(2) Equations (10.4) and (10.6) hold regardless of whether (10.1) holds. (3) Let { F , ( x ) ) , be ~ ~the Fejkr kernel defined by Definition 2.29, and fix [ O , l ) . There is a trigonometric polynomial ZN(?)such that lZM(y)l2 = F N ( y - y o ) (Hint: Use an argument similar to that of Theorem 9.21 on spectral factorization of the Daubechies polynomials.)
70 E
(4) Let ZN (7) = C I cN L (n) e-2Tiny, and let f N ( x ) = Use (10.6) to show that
C,,cN (n,)y ( x
-
n).
(5) Use (10.4) to show that
(6) Show that if the lower bound in (10.1) fails t o exist, then for every > 0, we can find a function f (x) (which will be f N ( x ) for some N and snme yo) si~chthat E
10.4 Generalized Multiresolution Analysis (GMRA) In order to construct Riesz bases of wavelets, we require a generalized notion of Mulliresoluliorl Analysis. The defirlition below is exactly the same as Definition 7.12 except that (e) no longer requires orthonormality for the collection {T,,y (x)) of shifts of the scaling function. Definition 10.16.
A generalized multiresolution analysis (GMRA) o n R i s a sequence of subspaces {V,),tz of functions L~ o n R satisfying the following properties. (a) For. all j E Z, T/j C T/5+1
Chapter 10. Biorthogonal Wavelets
302
Lemma 10.18. Suppose that {V,},Ez is a GMRA with scaling function p(x). T h e n there ezists a n e2 sequence {h(n)},€z called the scaling sequence or scaling filter
l;h.n.t
S'LLC~,
p(x) =
h ( n )2'"
y (2x - n)
(10.10)
and a period 1 function mo(y) called the auxilliary function such that
Proof: By Lemma 10.17, {cpl,,(x)).,,z is a Riesz basis for Vl. Since cp(x) E Vo Vl, (10.7) says that there is an t2sequence {h(n)),Ez such that ~ ( 1 := )
C h(n) y ~ , , ( x ) C h ( n )2'1' =
~ ( 2 s n), -
which is (10.10). Taking the Fourier transform of both sides of (10.10) gives (10.11) with
10.4.2 Dual GMRA and Riesx Bases of Wavelets Dual GMRA Definition 10.19. A pui,,. of G M R A ' s
{&}lE~ with scaling function p(x) and with scaling function $(z) are dual t o each other if {TnP(z)}is biorthogonal t o {T,@(x)).
{e}3E~
Remark 10.20. (a) Since there may be more than one function F(x) such that (Tny(x)) is biorthogonal t o {TnF(x)), there may be more than one GMRA {&}jEZ dual to {ll,Ijtz. (b) Since {Tncp(x)) is a Riesz basis for Vo = span{T,cp(x)), it is always possible t o define @(x) by (10.2). In this case, the GMRA generated by @(x) will be dual t o the one generated by cp(x). However, if @(z)is defined by (10.2) then {T,F(x)) is also a Riesz basis for Vo.From this, it follows that = V, for all j t Z.
&
for dual GMRAs. Definition 10.21. Let y(x) and $(x) be scaling functions For each j E Z , define the approximation operators P,, P,, and the detail operafunctions f (z) b y tors Q j and I), on
10.4. Generalized Mult,iresolution Analysis
Lemma 10.22.
&, Q,
The operators P,,
303
and Q,satisfy the following prop-
erties. (a,) P, f ( z )= f ( x ) i f and only i f f ( x ) E f ( x ) E V,.
( b ) &, f ( x ) = 0 for all f(z)E
4, and
f ( z )= f (z) if and only if
-
11/7 and &x,f(z) = 0 fi)r a11 f ( x ) E I/li.
( c ) For all f ( x ) , C: on R ,
lirn((P,f-fl(2-0
and 0-'3
lim JJP3fJ12 =0.
h,,)
Proof: ( a ) Pj f (x)= f ( x ) i f and only if f ( x ) = C , ( f , pjln(x). Since { ~ j( ~ , ~7 . 'is) a,) Riesz ~ ~ ~basis for T/, and since { @ J , , L ( x ) ) l L EisZ biorthogonal to {vj,n( x ) ) n , z , Theorem 10.13 says that f ( x ) = C , ( f , @ j , n )pj.n (x) if and only if f ( x ) E ~ ( c p j 3 , ( x ) ) , G z= V j . A sirnilar argument works for Pf ( x ) . so that by (a), P j f ( x ) = P j + 1 f ( x )= (b) If f ( x ) t q,then f(x) E f ( x ) . Hence Q3f ( x ) = P,+l f ( x ) - Pj f ( x ) = f ( x ) - f ( x ) = 0. A similar argument works for G j f ( x ) . ( c )The proof of (c) is only a slight modification of the proof of Lemrna 7.16. The details are left as an exercise (Exercise 10.27).
0
The Wavelet $(z) and the Dual Wavelet '4;;(x)
Definition 10.23. Let p ( z ) and @ ( x ) be scaling functions for dual GMRA's, and let h ( n ) and x ( n ) be the scaling filters corresponding to p ( z ) and @ ( x ) (Lemma 10.18). De,fine the dual .filters g ( n ) and g(n) by
-
g(n)=(-l)nh(l-n)
and
-
g(n)=(-l)nh(l-n).
(10.12)
Define the wavelet $ ( z ) and the dual wavelet J ( x ) b y
$(.)
=
g ( n ) 21/2y ( 2 x - n )
and
-
$(x) =
?(n,) 21'2 G(2x - n)
304
Chapter 10. Biorthogonal Wavelets
The followi~lglemma contains some basic properties of the wavelet and its dual.
Lemma 10.24. Let $ ( x ) and
&(z) be the wavelet and dual wavelet corre-
sponding to the G M R A ' s {&) with scaling function p ( z ) and function @ ( x ) . Then the following hold. ( a ) (/, ( z ) t
(b)
{&,n
{c)
with scaling
K and $(z) E GI.
(x)} is biorthogonal to { Q o , , ( z ) ) .
( c ) { $ ~ , n ( x ) 2s ) a Riesz basis foi-sl,air{$o,,,(z)} and { , & ~ , ~ ( is x ) a} Ricsz basas for span{Go,, ( x ) } .
-
( d ) For all n, m E Z, ( $ o , n , ~ o , m= > ($)o,n,po,m)= 0 .
( e ) For any f ( z ) ,C: on R, &of ( x ) E sp8n{&,n (z))
-
and
Qo f
(z)
~ { & o ,( z, j ) .
Proof: (a) This follows from the definition of $(x) and $(x). (b) Taking the Fourier transform of both sides of (10.13) gives
where -2ni(y+1/2)
So(r+ 1/21
(10.15)
-27r2(-y+1/2)
mo(r + 112).
(10.16)
m1(r) = e and %1(y) = e
Since {po,,(z)) is biortl~ogonalt o {@o,,,(x)),Lemma 10.8 says that
Combining (10.15), (10.16), and (10.17) gives
10.4. Generalized hlultiresol~ltionAnalysis
305
Repeatirlg the argument giving (10.17) gives
Therefore, by Lemma 10.8. {$",,, (x)} is biorthogonal to
{qo,,(x)}.
(c) By Lemma 10.11, it is enough to show that for some constants e l . 0,
c2
>
,-.
and similarly for &(?). Sincc {po,n(x)}is n Riesz basis for span{~o,,,(x)), Lemma 10.11 implies that there are constants A, B > 0 such that
Therefore.
so that
A B < lmo(r12)12+ l h 0 ( r / 2 + 1/2)12 < A. B -
-
(10.21)
306
Chapter 10. Biorthogonal Wavelets
also. Finally,
and similarly,
A2
0 and all y E R, t h e n (10.25) and (10.26) hold.
Condition (10.30) is sa,t,isfied by all examples we will consider in this book.
Exercises Exercise 10.27. Prove Lemma 10.22(c).
10.5 Riesz Bases Orthogonal Across Scales In this section, we will construct2 Riesz bases of wavelets that satisfy a partial orthogonality condition; specifically. they are orthogonal across scales. 'This construction is due t o Chui and Wang, A cardinal splzne approach t o wavelets, Proceedings of the American Mathematical Soceity. vol. 113 (1991) p. 785-793.
312
Chapter 10. Biorthogonal Wavelets
That is, we will construct Riesz bases of the form { I ) ~ , ~with ) ~the . ~ ~ ~ property that ($j,k, $ j / , k / } = 0 whenever j .f j'. An advantage to this construction is that the dual GMRA's are the same; that is, 5 = This means that finite approximalions to a function f (x) have similar properties. For example, if we start with the piecewise linear MRA of Section 7.3.2, then the partial sums
&.
are both in VJ and hence are both piecewise linear approximations t o f (x). This example will be explored in detail in Section 10.5.1. - A drawback t o this construction is that the wavelet $(x) and its dual $(x) cannot both be compactly supported. This is a problem especially for numerical algorithms involving these bases. This difficulty can be overcome by allowing the dual GMRA's t o be different (Section 10.7). Let be a GILIRA with compactly supported scaling function p(z). Let a(?)= 1x7 + k)12.
(4)
C Ic
Then a(?)is a period 1 trigonometric polynomial bounded above and away from zero on [ O , 1 ) (since ( p o , n ( x ) }is a Riesz basis for Vo).Define as in (10.2) by
@(XI
$(7)= a(?)-' @(?)-
By Lemma 10.9, {cpo,,(x)) and { @ o , n ( ~ )are ) biorthogonal. Since is COon R, we can write it as an L~ Fourier series as
(10.31)
a(?)-'
Taking inverse Fourier transforms of both sides of (10.31), we have that
Thus, @(x) E Vo and it follows eventually that
4
and that T/, = for all j E Z. Now, in order to define the wavelets $(x) and &(x) in this case, note that by (10.11), there is an L~ Fourier series mo(y)such that
10.5. Riesz Bases Orthogonal Across Scales
313
where
% ~ (= r )@(a?>-'@(r) ~o(Y). Reinembering that
a(?) is
(10.32)
real-valued and has period 1, we define by
(10.15),
arid ?)(l,-;
= e-2.rri(r+1/2)
mo(r t 1/2).
(10.34)
Then the wavelets $(z) and &(z) are given by
and
10.5.1 Example: The Piecewise Linear GMRA Recall the MRA defined in Section 7.3.2 in which Vo consisted of all functions f (x), C0 on R and linear on the intervals lo,k for k E Z. We showed in Section 7.3.2 that the MRA (4) satisfies Defir~iliorl10.16(a)-(d).It re) that {yo,,) is a Riesz mains to show that there is a function v ( ~ such ha.sis for Vo.However, letting p(x) = (1 - 1x1)X , - l , , ~ (x), then Exercisc 7.11 implies that
so that
1 -
1 be given, and assume tliat the theorem holds for all rn < n. If rl is even, then n = 2m with m < n and
Hence, n = 0+2c0 + 4 ~ l + arid since
m" (z)=
C l r ( k ) 2'12
+2jcj-i, ~ ~ " ' ( 21
k),
arid the result follows for n,.A siniilar argument gives the result if n is odd (Exercise 11.8). Some examples of wn(?)are calculated and plotted in Figure 11.4 for various orthogorial MRA (the absolute values are plotted). Note tliat l G ( ? ) 1 is symmetric and appears to have a single domirlarlt peak in [O, GO) as well as several smaller peaks. The location of this dorriinant peak is referred to as the nominal frequencg of wr' (x). A
The Nominal Frequericy of w"(x) In order to identify the nominal frequency of w n ( x ) rnore precisely, consider the wavelet packet functiorls associated with the bandlirnited MRA (Section 7.3.3). We will identify these wavelet packet functions as ll)gL(x). Recall that the bandlimited MRA had "perfect frequericy localization" with = ( X I - 1 / 2 ] (7) + X[1/2,11(~)) e-z"7. F(7)= XI-1/2;1/2,(7) and Using Theorem 11.3, we can calculate explicitly the Fourier transform of the w g , ( x ) . Some of these (in absolute value) are plotted in Figures 11.511.7. Note that for each n E Z+, there is an interval Ao,, [0,GO) of the form A",, = [k(n)/2,( k ( n ) 1)/2) for some k(n,) E Zt such that IwgL(7)I = XAo,,,(x) XAo,n(-x). We therefore define the nominal frequency of any wavelet packet furictivr~, w n ( x ) as the rlurrlber k('rz)/2. Sorrle useful properties of the intervals Ao,, and the numbers k ( n ) (the number Ic(r2) is referred to as the Greycode of n ) are summarized in Theorem 11.6. A
+
+
11.2. Localization of Wavelet Packets
341
~z(~)l
FIGURE 11.4. for n = 0, 1, . . . , 7 corresponding to the Daubechies twelve-coefficient filter.
Definition 11.4.
Suppose that a nonnegatzve integer n E Z+ has the ,urz.ig.ue
representatzon n = (0
+ 2cl
+4c2
+ ...
where c, = 0 or 1. T h e n n has cvcrl sequcrlcy zf if CJn=, t n 2.7 odd.
Definition 11.5.
1 zJcJ,
xi,=,,is E,,
ellen and odd sequcrlcy
Given j E Z and n E Z + , we define the interval A,.,, b y
where A"., and k ( n ) are defined above.
342
Chapter 11. Wavelet Packets
1
08
06 04
02
or-
-
-
-
I
-
FIGURE 11.5. Top left t o bottom right: Imo(r/2)1, Imo(r/4)1, Imo(r/8)l, and their product,
1t~'g~,(~)l.
Theorem 11.6. For each j E Z and n
E
z':
( a ) If n has even sequency, then k ( n ) = 2 k((n/2)) and i f n has odd sequency k(n) = 2 k((n/2))+ 1. Here (x) denotes the greatest integer less than or equal to z. ( b ) A,,, = A3-1.2n U AJ-l,an+l, where the union is disjoint. If n has even sequency, A , - 1 ) 2 , is h e lefL / ~ u l f oAf J , n , and 2f 12 has odd sequency, A , - 1 . 2 , is the right half of A,,,,
Proof: (a) The proof is by induction on n. If n = 0 , then n has even sequency and Ao.o = [0, 112). Thus k(0) = 0 = 2 k((012)) = 2 k(0). If n = 1, then n has odd sequency and A"," = [1/2,1). Thus k(1) = 1 = 2 k((1/2)) = 2 k(0) + 1 and the theorem holds for n = 0, 1. Suppose that n > 1 has even sequency and that m = (7212) < n also has even sequency. By Theorem 11.3, it must be true that (why?)
with
11.2. Localizat.ion of Wavelet Packets
343
FIGURE 11.6. Top left t o bottom right: ( m l ( y / 2 1), ( m o ( y / 4 )1, ( r n 0 ( ~ / 8 ) 1 , and their product, J w h L( 7 )I.
+
and 2Ao,, = [k(m),k(m) 1) with k(m) even by the induction hypothesis. Since mo(712) = 1 if y E [& - 1/2, Q 1/2) for even integers !and 0 elsewhere, it follows that
with Ao,, = Ik(m), k(m)
+
+ 112) = [k(n)/2,(k(n) + 1)/2).
Thus, k(n) = 2 k(m) = 2 k((n/2)). The other three cases are similar: (1) n > 1 has even sequency and m = (n/2) < n has odd sequency, (2) n 2 1 has odd sequency and m = (n/2) < n has even sequency, (3) n > 1 and m = (n/2) < n have even sequency. The details are left as an exercise (Exercise 11.10).
(b) Since Aj,, = 2 AjPl,,, it is enough t o show that for all n E Z + , 2 A",, = U A0.2n+l By Definition 11.5,
If n has even sequency, then 2n has even sequency and
212
+ 1 has odd
344
Chapter 11. Wavelet Packets
FIGURE 11.7. Top left t o bottom right: lmo(y/2)1, lml (r/4)1,lrrlo(r/8)1, and their product, Iw;, (y)l. A
sequcncy. By (a), k ( 2 n ) = 2 k ( n ) and k ( 2 n
+ 1 ) = 2 k ( n ) + 1. Since
(b) follows for n,.Tho argurr~entis similar when n has odd sequerlcy (Exercise 11.11). Figure 11.8 lists the first 16 wavelet packet indices and their nominal frequencies. By cornparing these frequencies with the graphs given in Figures 11.2 and 11.3, we see that the greater the nominal frequency, the larger the r~urriberof zero-crossings of 21in(:c) per unit length. The reader is asked to explore this relationship in Exercise 11.14.
Exercises Exercise 11.7. Prove Theorem 11.2.
11.2. Localization of Wavelet Packets
n
sequency
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
even odd odd even odd even even odd ocld even even odd cvcrl odd odd even
k(n)
345
Ao.??
FIGURE 11.8. The first 16 wavelet-packet indices and their nominal frequencies.
Exercise 11.8. Conlplete the proof of Theorern 11.3 by considering tlie case wlien n is odd. 7nG(r). Exercise 11.9. Verify tlie fornlul;~sgive11 above for v~"(a), ~rlyx)). and 7u7(.c) .
Exercise 11.10. Complete the proof of Tlleoreal 1I.G(a) by consitlering the remaining three cases. Exercise 11.11. Cornplete the proof of Theorem 11.6(t)) by considering the case in which n has odd seqliency. Exercise 11.12. Write a h2ATLAB program that cornputes k(r2) and one that computes k-l (n). Exercise 11.13. Complete the proof of Theorern 11.6 by considering the case in which n has odd sequency. Exercise 11.14. (a) For several wavelet packets with conlpact support, compare the nominal frequency of wn(x) with the number of zero-crossings per unit lengtli (that is, for each w n ( z ) , count the number of times its graph
346
Chapter 11. Wavelet Packets
crosses the x-axis and divide this riuinber by the length of its supporti~ig interval).
(b) Conjecture a relationship between the ill~rnberof zcro-crossings per unit length and the nomina.1 frequency of a wavelet packet. (c) Write a RIATLAB prograrn that takes as input a finite scaling filter h ( k ) a d a11 integer n and returns tlie number of zero-crossings per uiiit length of the wavelet packet uln(x)corresponding to that scaling filter. (d) Check whether your corijecture i11 part (b) persists for large scalirlg functions with more and more vanishing rnoments.
71, arid
for
11.3 Orthogonality and Completeness Properties of Wavelet Packets Since U I ' ( Z ) = p(x) and llll(.r) = 1/1(.7:), a simple restateirierit of tlie sccond part of Theoreni 7.35 with J = 0 is that the collectiorl
is an ort,tlonormal basis on R. It is also true tllal { t ~ ( ; ' , ~is )an~ orthonornial basis on R (Theorem 11.19). Thcre may bc other collectiorls of scalcd and shifted wavelet packets that form orthonorrnal lxises on R. Tlie goal of this section is to dcterlrliiie exactly wlic.11 this takes place. The solut,ioxl is closely related to the properties of the intervals {A,,>,,) defirietl in Definition 11.5. To each sllcli iritcrval is associated a sut~space Wl.,, as follows.
Definition 11.15. Gzven j E Z , and n E Z' , define
E Z, the silbspax:e WJ,l= Wj, where WJ is the wavelet subspace defined in Definition 7.47. Note also that WoSo = Vo (see Defiriitioli 7.12(e)). As rnentiorierl above, the collection
Remark 11.16. (a) For eacli j
is an orthonormal basis on R.Note further that thc, collection of intervals { A j , l .Ao.o),lEz+.k,z is a disjoint partition of [0, m).
(b) By Theorem 11.19, tlie collectio~i{ u ) $ , ~E Z) +~. k E Z is a11 orthonorinal basis on R. Note that { A o ~ n ) n G zis + a disjoint partition of [ O , m). Also, by Corollary 11.20, for each fixed j E Z, {wZk. ( z ) ) ~ ~ is ~ an + orthonorrnal , ~ ~ z basis on R and {Aj*,),Ez+ is a disjoint partition of [0, cm).
~ ~ ~ + ~
11.3. Orthogonality and Completeness
347
In light of Remark 11.16, we can formulate at this point a correspondence between disjoint partitions of [0, m ) by intervals of the form A,,, and orthonormal bases on R consisting of the functions {wzk(x)}j,,tZ+ ,k tZ. The remainder of this section is devoted to making this correspondence explicit.
11 3 1
Wavelet Packet Bases with a Fixed Scale
Theorem 11.17. o n R.
The collection { w : ~( x ) ) , ~ ,k,z ~ + is a n orthonor7nal system
Proof: Since
{w;~, ' u I ~ ~ @= ) {w$,~-@, ~1;;'~) for any n,, rn E Z f , k , !E Z, it will be enough to show that for all k t Z and n,, m E Z+, ( 11.3) ( U I ? . ~ ,wCo) = d(m - n,)6 ( k ) .
The proof of (11.3)is by induction on n and m. If n = 0, then ( i r ~ : , ~t , r ~ = ~ ~ (q~~,~:, p) = h ( k ) by t,he ort,l~onorrnalityof the scaling function. Given 7n > 0, assurrie that {w;,~, w : . ~ )= (Sit') SIX:) for all k E Z, 0 ':O < m. If m is even and m > 0, then m / 2 > 0 arid by the induction hypothesis,
If m is odd and m > 1, then (m- 1 ) / 2 > 0 and by the irldlictiori liypothesis.
Similarly, if m
=
1, then (m - 1 ) / 2 = 0 and by (8.13),
~ }
348
Chapter 11. Wavelet Packets
Thus, (wZ;L,, wi>,) = 6 ( m )S ( k ) . Let n t 'N be given, arid assume that for all rn t Z + , k t Z, (tuck, w e ~ ,=~ ~ ) 6 ( t - m) 6 ( k ) for all 0 5 /. < n. If m and n are even and m > n, then m / 2 > n / 2 and b y the induction hypothesis, (w:d2, w:,f) = 0 for every !E Z and so
If r , and n are both odd and if m tlic ~ e s u l follows t similarly sirice
,,,,
> n, then (m - 1 ) / 2 > ( r ~ 1 ) / 2 and -
( I 2 :md (u10 . a, ( 1 1 1 ) / 2) = 0 by the irldriction hypothesis. If nL is even and n is odd a.nd i f r n n,, then m / 2 > (n 1 ) / 2 and b y the iriduction liypothesis,
,
-
Finally, if r n is o d d and 72, is even with rrl > n , then either (711 - 1 ) / 2 > n./2, in which case, ( u ~ ; 'a,;,,) ~ , = 0 as above; or ( m - 1 ) / 2 = n / 2 . in wllicll case,
Corollary 11.18. For each fixed j E Z , the collection a71
O T ~ L ~ L O T L O ~ ~ I I LsCyLs~ t e m on
{
W
~
~
(
X
)
)
is~
R.
Proof: Exercise 11.27.
We now prove completeness of the systems defined in Theorem 11.17 and Corollary 11.18.
~
~
,
~
~
~
+
11.3. Orthogonality and Completeness
Theorem 11.19.
The collection
{ W ~ , ~ ( X ) ) ~ ~ ~is, a , n~ orthonormal ~ +
349 basis
o n R.
Proof: Since orthogonality was proved in Theorem 11.17, it remairis only to show cor-npleterless. We will do this by proving that for each J t Z f ,
(see Exercise 11.28). Since 1
{'1CI~,k)kcz= { w , J , ~ ) ~ E z is a11 orthonormal basis for W J (Remark 7.49(a)), it is enough to show that for each m E Z,
The proof of (11.5) is by induction on J . If J = 1, then n = 1 and clearly
Suppose that (11.5) holds for J - 1. By the induction hypothesis and the orthogonality of { w $ , ~ ) ~ ~ ~ , ~ ~ ~ Z + ,
so that
Note also that by definit,io11,for any 1E Z , ~ ~ ~ ( 2 ) = ~ h ( ~ - 2 C ) w and ; ~ ( x w;",+'(z:) ) =~g(p-2e)ur;lP(x) P
'2
Therefore, by the QMF condition (8.14), for any k E Z,
350
Chapter 11. Wavelet Packets
Thus, for each k E Z , (
x
)
t
span{w$(r), wo2n+1 ( L . ) : e E Z , 2J-2 j n < 2J-1}
=
~ p a n { w F ~ ( x )E : k~ , 2 ~ - n ' < ZJ).
2, the11 l l ~ eseyuerlce d;L(k) will have length at least 2 N p j (1 - 2 - 9 (L - 2).
>
+
(2) Periodization. Here we assume that co(n) is a period M = 2N sequence. Then the DWPT is defined as in Definition 11.33. In this case, each d y ( k ) is a period
11.4. Discrete Wavelet Packets
355
2-JM = 2N-i sequence so that it is only necessary t o store dy(k) for k = 0, 1, . . . , 2 - j M - 1. Also note tha,t, t,hs depth of the wavelet packet tree can be at most log2( M ) = N. Therefore a total of M log,(M) = N 2N wavelet packeL cueficierlts will be kept for a length &Isignal. The DWPT as a Linear Transformation We can think of a period M = 2N sequence co(n) as an M-vector
as in Section 8.3.3. Since each sequence dy(,k) has period 2-JM we can think of d y ( k ) as a 2-W-vector
=
2N-J,
Since for every j and n,
where the matrices W, are defined in Section 8.3.2. Since W2-1nlis an orthogonal 2-3 M x 2 - W matrix, it follows that for each 1 5 j 5 N and 0 1, note that any dyadic partition of [0, 112) will either be {Ao,o) or else will be the union of a dyadic partition of [0, 114) and one of [1/4,1/2). Since there are P ( N - 1) dyadic partitions of [O, 1/4) and of [1/4,1/2) with intervals of length not less than 2-N-1 , we have the recursion formula
P1
In fact, for $1 = 2 N , there are inore than 2"/"iscrete wavelet packet bases (Exercise 11.41). Figure 11.11 shows the rapid increase of P ( N ) with N.
FIGURE 11.11. The number of wavelet packet bases of R".
11.5.2
The Idea of th,e Best Basis
Exercise 11.41 says that there are more than 2"12 discrete wavelet packet bases for R". The goal of this subsection is to consider the problem of finding the discrete wavelet packet basis that "best fits" or is "best adapted to" a given vector c o . We need to be more precise about what this means. Intuitively, we would like to say that an orthonormal basis is well adapted to a vector if the vector can be accurately represented by just a few of its coefficients in that basis. For definiteness, let us assume that our vector c o is normalized so that llcoI = 1. The best possible fit of an orthonormal basis to c o will occur when c o is one of the basis vectors. In this case, exactly one of the coefficients of c o in this basis will be 1 and all the rest will be 0. Now consider the case when c o sits in a subspace of R" spanned by, say, three of the vectors in an orthonormal basis, call them v l , v2, and vg. Then Cg = Q1 V1
+
Q2 V2 r ' Qy V y
11.5. The Best-Basis Algorithm
361
with cuf + a; +a: = 1. This is still a very efficient representation of co, but we would like to be able to find some way to say that the first representation, with only one nonzero coefficient, is "better" than the second, with three nonzero coefficients. In order to do this, we define a cost functional M tliat can be thought of as a way to measure the "distance" from a vector to an orthonormal system in Rhf. The way this works is as follows. M is a function that maps a vector c and an orthonorrnal system B = {bj) to a nonnega,t,ivereal number. Typically, M (c,B)will be small if the vector c is well represented by just a few of its coefficients in the basis B. For the purposes of the bestbasis algorithm, we will ask that tlie cost f~nct~iorlal M satisfy a mildly restrictive but very powerful additivity condition. ( a ) A function M is a n additive cost functional if there is a nonnegative functzon f ( t ) o n R such that for all vectors c E R" and orthonomnal systems B = {b,} C R",
Definition 11.39.
(b) G i v e n a vector c E Rn',a n addztzve cost functional M , and a finite collection, B, of orthonormal systems in R", a best basis relative to M for' c i s a system B E B for which M ( c , B ) is minimized. Although it i s n o t required by the definition, for the purposes of the best-basis algorithm, w e will alwr~ysm,ake t h e assumpkion that all of the system,^ in B have the s a m e span. I n other u!ord.s, each B E B i s a n orthonormal basis for the same subspace of R"' ( o r for all of R ~ ' ) .
Sorne exanlples of the type of cost functionals we will corlsider are given below.
(1) Shannon Entropy We define the Shannon entropy functional by
Entropy is a well-known quantity in information theory and is used a s a rrleasllre of the amo~xritof 11ricerta.intyin a probability distribution, or equivalently of the arnount of infornlatiorl obtained from one sample from the probability space. If the probability of the it11 outco~llei n a probability space corlsisting of P outcomes is pi, then the entropy of the probability distribution is
362
Chapter 11. Wavelet Packets
If, for example, pl = 1 and p, = 0 for i # 1, then the entropy of this distribution is zero. This is often interpreted as the statement that there is no urlcerLairlLy in llle outcome, or that no inforination is obtained from a single outcome. A probability distribution in which all outcomes are equally probable will result in high entropy, which is interpreted as high uncertainty of each outcome and that a large amount of information is obtained from each outcome. For our pixrposes, it suffices to note that if x is close to 0 or to 1, then the quantity x logx will be close to zero. Therefore, assuming that c is a unit vector in span(B), the entropy M ( c , B) will tend to be small if the coeficier~ts{ ( c ,bJ)}corlsisl of a few large coefficients (close to 1) and many small ones (close to 0). Note that there is no generality lost by assuming that c is a unit vector in span(B) because if not, just define PCto be the projection of c onto span(B) (which we assume will be the same regardless of which B E B is being considered; see Definition 11.39 above). Then
1 M / P c / l l P c , B) =
-M ( c , B)
IlPcll
+ log Ilpsl12
so that minimizing $M(Pc/\|Pc\|, B)$ over $\mathcal{B}$ is equivalent to minimizing $M(c, B)$ over $\mathcal{B}$, since $\|Pc\|$ does not depend on $B$. It is certainly possible that $Pc = 0$, in which case any basis from $\mathcal{B}$ will be a best basis.
(2) Number Above Threshold. Here, for a given threshold value $\lambda > 0$, we define $M$ by
$$M(c, \{b_j\}) = \#\{j : |\langle c, b_j\rangle| > \lambda\}.$$
In the context of signal or image processing, $M$ measures how many coefficients are "negligible" (that is, below threshold) in a transformed signal or image and how many are "important." The more negligible coefficients, the lower the cost.
(3) Sum of $p$th Powers. Fix some $p > 0$, and define
$$M(c, \{b_j\}) = \sum_j |\langle c, b_j\rangle|^p.$$
If $p = 2$, then for any vector $c$ and orthonormal system $\{b_j\}$,
$$\sum_j |\langle c, b_j\rangle|^2 = \|Pc\|^2.$$
11.5. The Best-Basis Algorithm
Hence this measure is of no value in best-basis selection if $p = 2$, since $\|Pc\|^2$ is always the same no matter which system $B \in \mathcal{B}$ is chosen. If $p < 2$, then $|\langle c, b_j\rangle|^p$ will be much larger than $|\langle c, b_j\rangle|^2$ when $\langle c, b_j\rangle$ is close to zero, so many small coefficients are penalized, and hence $M(c, \{b_j\})$ will tend to be small if the coefficients $\{\langle c, b_j\rangle\}$ consist of a few large coefficients (close to 1) and many small ones (close to 0).
(4) Signal-to-Noise Ratio (SNR). This cost functional is a combination of (2) and (3) when $p = 2$. For a given threshold value $\lambda$, define
$$M(c, \{b_j\}) = \sum_{|\langle c, b_j\rangle| < \lambda} |\langle c, b_j\rangle|^2.$$
This is a direct measure of the mean-square error encountered when the small (meaning below threshold) coefficients are discarded and the signal or image is reconstructed using only the large (above threshold) coefficients. Typically, SNR is measured in decibels (dB) and is sometimes given by
$$\mathrm{SNR} = -10 \log_{10}\!\left(M\!\left(\frac{c}{\|c\|}, \{b_j\}\right)\right)\ \mathrm{dB}.$$
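The four cost functionals above can be compared on small coefficient vectors. The following is a minimal NumPy sketch that works directly on the list of coefficients $\langle c, b_j\rangle$. It assumes that the entropy functional has the form $M(c,B) = -\sum_j |\langle c,b_j\rangle|^2 \log |\langle c,b_j\rangle|^2$ with $0 \log 0 = 0$, which is the form consistent with the normalization identity above. All function names are illustrative. The example also shows which range of $p$ makes the sum of $p$th powers favor a few large coefficients.

```python
import numpy as np

def entropy_cost(coeffs):
    """-sum |c_j|^2 log |c_j|^2, with the convention 0 log 0 = 0."""
    e = np.abs(np.asarray(coeffs, dtype=float)) ** 2
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

def number_above_threshold(coeffs, lam):
    """Cost (2): number of coefficients with |c_j| > lam."""
    return int(np.sum(np.abs(np.asarray(coeffs, dtype=float)) > lam))

def sum_of_pth_powers(coeffs, p):
    """Cost (3): sum of |c_j|^p."""
    return float(np.sum(np.abs(np.asarray(coeffs, dtype=float)) ** p))

def snr_cost(coeffs, lam):
    """Cost (4): energy in the coefficients below the threshold lam."""
    c = np.abs(np.asarray(coeffs, dtype=float))
    return float(np.sum(c[c < lam] ** 2))

def snr_db(coeffs, lam):
    """-10 log10 of the SNR cost of the unit-normalized coefficients."""
    c = np.asarray(coeffs, dtype=float)
    return float(-10.0 * np.log10(snr_cost(c / np.linalg.norm(c), lam)))

sparse = np.array([1.0, 0.0, 0.0, 0.0])   # one large coefficient
spread = np.array([0.5, 0.5, 0.5, 0.5])   # same energy, spread out

print(entropy_cost(sparse), entropy_cost(spread))                   # 0 versus log 4
print(sum_of_pth_powers(sparse, 1), sum_of_pth_powers(spread, 1))   # p < 2 prefers sparse
print(sum_of_pth_powers(sparse, 4), sum_of_pth_powers(spread, 4))   # p > 2 prefers spread

# Normalization identity: M(Pc/||Pc||, B) = M(c, B)/||Pc||^2 + log ||Pc||^2
c = np.array([3.0, 4.0, 0.0])
n2 = float(np.dot(c, c))
print(np.isclose(entropy_cost(c / np.sqrt(n2)), entropy_cost(c) / n2 + np.log(n2)))
```

With $p = 4$ the spread-out vector has the lower cost, which is the opposite of what a best-basis search needs. With $p = 1$ the sparse vector wins.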
.5 so that $m_{2,2} = .5$ and $B_{2,2} = \{v_{2,2}\}$. Similarly, since $.33 + .33 > .47$, $m_{2,3} = .47$ and $B_{2,3} = \{v_{2,3}\}$. The updated entropy values and the updated best basis are shown in Figure 11.14.
(4) Fix $j = 1$. For $n = 0$, since $7.4 + 2.8 < 17.3$, let $m_{1,0} = m_{2,0} + m_{2,1} = 7.4 + 2.8$ and $B_{1,0} = B_{2,0} \cup B_{2,1}$. For $n = 1$, since $.5 + .47 < 1.02$, let $m_{1,1} = m_{2,2} + m_{2,3} = .5 + .47$ and $B_{1,1} = B_{2,2} \cup B_{2,3}$. The updated entropy values and the updated best basis (which actually has not changed) are shown in Figure 11.15.
(5) Fix $j = 0$. Since $10.2 + .97 < 28.5$, let $m_{0,0} = m_{1,0} + m_{1,1} = 10.2 + .97$ and $B_{0,0} = B_{1,0} \cup B_{1,1}$. This basis is the best basis and its entropy is equal to $m_{0,0}$. The final entropy value and best basis are shown in Figure 11.16.
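The bottom-up comparison used in steps (4) and (5) can be written compactly. The following is a minimal Python sketch. It assumes an additive cost and a binary tree indexed as in the example, where node $(j, n)$ has children $(j+1, 2n)$ and $(j+1, 2n+1)$. The function name `best_basis` is illustrative. The usage example uses the level-2, level-1, and level-0 values quoted above and treats the (already updated) level-2 values as leaves.

```python
def best_basis(cost, depth):
    """Bottom-up best-basis search on a binary tree of additive costs.

    cost[(j, n)] is the cost of node (j, n), 0 <= j <= depth, 0 <= n < 2**j.
    Returns (best cost, list of nodes forming the best basis).
    """
    m = {}      # m[(j, n)]: smallest cost achievable inside the subtree at (j, n)
    basis = {}  # basis[(j, n)]: the nodes that achieve it
    for n in range(2 ** depth):
        m[(depth, n)] = cost[(depth, n)]
        basis[(depth, n)] = [(depth, n)]
    for j in range(depth - 1, -1, -1):          # move up one level at a time
        for n in range(2 ** j):
            left, right = (j + 1, 2 * n), (j + 1, 2 * n + 1)
            children = m[left] + m[right]
            if children < cost[(j, n)]:         # children are cheaper: keep the split
                m[(j, n)] = children
                basis[(j, n)] = basis[left] + basis[right]
            else:                               # parent is at least as cheap: keep it
                m[(j, n)] = cost[(j, n)]
                basis[(j, n)] = [(j, n)]
    return m[(0, 0)], basis[(0, 0)]

# Values from the chirp example (level 2 treated as leaves).
chirp = {(0, 0): 28.5,
         (1, 0): 17.3, (1, 1): 1.02,
         (2, 0): 7.4, (2, 1): 2.8, (2, 2): 0.5, (2, 3): 0.47}
total, nodes = best_basis(chirp, depth=2)
print(round(total, 2), nodes)   # 11.17 [(2, 0), (2, 1), (2, 2), (2, 3)]
```

Only one pass over the tree is needed, because each node's decision depends only on the already-computed best costs of its two children.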
Exercises
Exercise 11.41. Prove that for $N > 1$, $P(N) > 2^{2^{N-1}}$, where $P(1) = 2$ and $P(N) = P(N-1)^2 + 1$.
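Python's arbitrary-precision integers make it easy to run a quick numerical sanity check of the inequality in Exercise 11.41 for small $N$. This is not a proof. The helper `P` below simply implements the stated recursion.

```python
def P(N):
    """P(1) = 2 and P(N) = P(N - 1)**2 + 1."""
    value = 2
    for _ in range(N - 1):
        value = value * value + 1
    return value

for N in range(2, 8):
    print(N, P(N), P(N) > 2 ** (2 ** (N - 1)))
```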
FIGURE 11.14. Updated entropy values and best basis at level j = 2 for the linear chirp.
FIGURE 11.15. Updated entropy values and best basis at level j = 1 for the linear chirp.
Exercise 11.42. Complete the proof of Theorem 11.38.
FIGURE 11.16. Final updated entropy values and best basis for the linear chirp.
Part V
Applications
Chapter 12 Image Compression
The purpose of this chapter is to present some of the basic concepts behind image coding with the wavelet transform. There are many excellent expositions of the theory and practice of image and signal compression using wavelets, and the reader is encouraged to consult those references for more information. The goal here is to give the reader enough information to design a model wavelet-transform image coder.
A typical black-and-white image is an $M \times M$ array of integers chosen from some specified range, say, 0 through $L - 1$. Each element of this array is referred to as a picture element or pixel, and the value of each pixel is referred to as a grayscale value and represents the shade of gray of the given pixel. Usually a pixel value of 0 is colored black, and $L - 1$ is colored white. In this chapter, we will assume for simplicity that $M$ is some power of 2, usually 256 or 512. If $M = 256$ (hence 65536 pixels) and $L = 256$ (hence 8 bits per pixel), then the storage requirement for an image would be $256 \times 256 \times 8 = 524288$ bits. The goal of image compression is to take advantage of hidden structure in the image to reduce these storage requirements.
Any transform coding scheme consists of three steps: (1) the Transform Step, (2) the Quantization Step, and (3) the Coding Step.
(1) The Transform Step. In this step, the image data are acted on by some invertible transform $T$ whose purpose is to decorrelate the data as much as possible. This means to remove redundancy or hidden structure in the image. Such a transform usually amounts to computing the coefficients of the image in some orthonormal or nonorthogonal basis. Because any such transform is exactly invertible, the transform step is referred to as lossless. See the can
(2) The Quantization Step. The coefficients calculated in the transform step will in general be real numbers, or at least high-precision floating-point numbers, even if the original data consisted of only integer values. As such, the number of bits required to store each coefficient can be quite high. Quantization is the process of replacing these real numbers with approximations that require fewer bits to store. This "rounding off" process is necessarily lossy, meaning that the exact values of the coefficients cannot be recovered from their quantized versions. In a typical transform coding algorithm, all error occurs at this stage.
(3) The Coding Step. Typically, most of the coefficients computed in the transform step will be close to zero, and in the quantization step will actually be set to zero. Hence the output of Steps (1) and (2) will be a sequence of bits containing long stretches of zeros. It is known that bit sequences with that kind of structure can be very efficiently compressed. This is what takes place at this step.
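The three steps can be illustrated end to end on a one-dimensional signal. This is a minimal toy sketch, not a real image coder. It uses the orthonormal Haar transform for step (1), plain rounding as a stand-in quantizer for step (2), and a simple run-length description of zero runs as a stand-in for step (3). The function names are illustrative.

```python
import numpy as np

def haar_dwt(x):
    """Full-depth orthonormal Haar DWT of a length-2^N signal (coarsest first)."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2))  # wavelet coefficients
        x = (x[0::2] + x[1::2]) / np.sqrt(2)              # scaling coefficients
    return np.concatenate([x] + details[::-1])

def zero_runs(symbols):
    """(zeros_before, value) pairs; a trailing run of zeros becomes (run, 0)."""
    pairs, run = [], 0
    for s in symbols:
        if s == 0:
            run += 1
        else:
            pairs.append((run, int(s)))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

signal = np.array([10.0] * 8 + [20.0] * 8)   # piecewise-constant "image row"
coeffs = haar_dwt(signal)                     # step (1): invertible, lossless
quantized = np.round(coeffs).astype(int)      # step (2): lossy rounding
print(quantized.tolist())
print(zero_runs(quantized))                   # step (3): the long zero run is cheap to describe
```

Sixteen pixel values reduce to two nonzero coefficients followed by a single run of fourteen zeros.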
12.1 The Transform Step
12.1.1
Wavelets or Wavelet Packets?
We have seen that wavelet bases are very good at efficiently representing functions that are smooth except for a small set of discontinuities. Any image that has large regions of constant grayscale (for example, a white or black background) can therefore be well represented in a wavelet basis. Hence a wavelet basis with sufficient vanishing moments can be used effectively in the transform step. It is also possible to find the best wavelet packet basis for an image and use the expansion in that basis as the transform. The advantage of this approach is that the resulting coefficients will be optimized relative to some appropriate measure of efficiency. For example, maximizing the number of coefficients below a given threshold is precisely what is called for in a transform coding scheme as described here. A clear disadvantage is that the best basis will depend on the image, so a description of which basis is used must be included in the overhead. Since for an $M \times M$ image there are more than $2^{M^2/2}$ wavelet packet bases, at least $M^2/2$ bits are required to specify the transform being used. This amounts to at least .5 bits per pixel in overhead costs.
One solution to this problem, especially effective when a large number of images with similar characteristics are being compressed, is to compute a single basis well suited to the collection. This is done as follows. First a representative subset $\{f_i\}$ of the images to be compressed is chosen. Then for a given cost functional $M$, the basis $B$ is chosen that minimizes
$$\sum_i M(f_i, B).$$
The basis $B$ is the ensemble best basis for the subset and is used to specify the transform to be used for compression. The best-basis algorithm is still applicable in this case, so this calculation is efficient. One situation in which an ensemble best basis is used is the compression of fingerprint images. The ridges on a typical fingerprint translate to rapid oscillations in pixel values, so it is not surprising that a standard wavelet basis does not give the optimal representation.
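Because each cost is additive, the ensemble cost of a node is just the sum of the individual costs, so the same bottom-up search applies. The following is a minimal one-dimensional sketch. It assumes a Haar wavelet packet split, unit-normalized signals, and the entropy-type cost $-\sum_j |c_j|^2 \log |c_j|^2$. The names and the toy ensemble are hypothetical. For this ensemble of constant signals, the search returns a near-zero cost and the ordinary Haar wavelet basis, with nodes $(3,0), (3,1), (2,1), (1,1)$.

```python
import numpy as np

def packet_tree(x, depth):
    """Haar wavelet-packet coefficients: node (j, n) -> coefficient array."""
    tree = {(0, 0): np.asarray(x, dtype=float)}
    for j in range(depth):
        for n in range(2 ** j):
            v = tree[(j, n)]
            tree[(j + 1, 2 * n)] = (v[0::2] + v[1::2]) / np.sqrt(2)      # lowpass half
            tree[(j + 1, 2 * n + 1)] = (v[0::2] - v[1::2]) / np.sqrt(2)  # highpass half
    return tree

def entropy_cost(v):
    e = v ** 2
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

def ensemble_best_basis(signals, depth):
    """Best basis for the summed (ensemble) cost over all signals."""
    trees = [packet_tree(s / np.linalg.norm(s), depth) for s in signals]
    cost = {node: sum(entropy_cost(t[node]) for t in trees) for node in trees[0]}
    m = {node: cost[node] for node in cost if node[0] == depth}
    basis = {node: [node] for node in m}
    for j in range(depth - 1, -1, -1):
        for n in range(2 ** j):
            kids = [(j + 1, 2 * n), (j + 1, 2 * n + 1)]
            split = m[kids[0]] + m[kids[1]]
            if split < cost[(j, n)]:
                m[(j, n)], basis[(j, n)] = split, basis[kids[0]] + basis[kids[1]]
            else:
                m[(j, n)], basis[(j, n)] = cost[(j, n)], [(j, n)]
    return m[(0, 0)], basis[(0, 0)]

ensemble = [np.ones(8), 2 * np.ones(8)]
print(ensemble_best_basis(ensemble, depth=3))
```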
12.1.2
Choosing a Filter
Another question to be raised in choosing the transform is which scaling and wavelet filters to use. There are several things to consider.
(1) Symmetry. Symmetric filters are preferred for the reasons outlined in Section 10.7.3, namely that large coefficients resulting from false edges due to periodization can be avoided. Since orthogonal filters (except the Haar filter) cannot be symmetric, biorthogonal filters are almost always chosen for image compression applications.
(2) Vanishing moments. Since we are interested in efficient representation, we require filters with a large number of vanishing moments. This way, the smooth parts of an image will produce very small wavelet coefficients. Since, because of symmetry considerations, we are only interested in biorthogonal wavelets, it is possible to have a different number of vanishing moments on the analysis filters than on the reconstruction filters. Vanishing moments on the analysis filter are desirable as they will result in small coefficients in the transform, whereas vanishing moments on the reconstruction filter are desirable as they will result in fewer blocking artifacts in the compressed image. Hence sufficient vanishing moments on both filters are desirable.
(3) Size of the filters. Long analysis filters mean greater computation time for the wavelet or wavelet packet transform. Long reconstruction filters can produce unpleasant artifacts in the compressed image for the following reason. Since the reconstructed image is made up of the superposition of only a few scaled and shifted reconstruction filters, features of the reconstruction filters, such as oscillations or lack of smoothness, can be visible in the reconstructed image. Smoothness can be guaranteed by requiring a large number of vanishing moments in the reconstruction filter, but such filters tend to be oscillatory. Therefore, we seek both analysis and reconstruction filters that are as short as possible. The more vanishing moments a filter has, the longer that filter must be. Therefore there is a tradeoff between having many vanishing moments and having short filters. The 9/7 filter pair turns out to be a good compromise and is in fact the filter used for fingerprint compression.
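The vanishing moments and symmetry of a discrete highpass filter $g$ can be checked directly. The filter has $K$ vanishing moments when $\sum_n n^k g[n] = 0$ for $k = 0, \dots, K-1$. The sketch below compares three highpass filters: the Haar filter; the Daubechies 4-tap filter, which is orthogonal but not symmetric (using one common sign convention); and the 3-tap analysis highpass of the LeGall 5/3 biorthogonal pair, which is symmetric. The 5/3 pair is used here only as a short, well-known symmetric example. It is not the 9/7 pair mentioned above, whose coefficients are not reproduced here.

```python
import numpy as np

def vanishing_moments(g, tol=1e-12):
    """Number of leading discrete moments sum_n n^k g[n] that vanish."""
    g = np.asarray(g, dtype=float)
    n = np.arange(len(g), dtype=float)
    k = 0
    while k < len(g) and abs(np.sum(n ** k * g)) < tol:
        k += 1
    return k

def is_symmetric(g):
    """True if g is symmetric or antisymmetric about its midpoint."""
    g = np.asarray(g, dtype=float)
    return bool(np.allclose(g, g[::-1]) or np.allclose(g, -g[::-1]))

r3 = np.sqrt(3.0)
haar = np.array([1.0, -1.0]) / np.sqrt(2)
h_d4 = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))  # D4 lowpass
g_d4 = np.array([(-1) ** k * h_d4[3 - k] for k in range(4)])        # D4 highpass
g_53 = np.array([-0.5, 1.0, -0.5])                                   # LeGall 5/3 analysis highpass

for name, g in [("Haar", haar), ("D4", g_d4), ("5/3", g_53)]:
    print(name, "length", len(g), "moments", vanishing_moments(g), "symmetric", is_symmetric(g))
```

The D4 and 5/3 highpass filters both have two vanishing moments, but only the biorthogonal 5/3 filter is symmetric.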
12.2 The Quantization Step
After the image has been transformed, we are left with an $M \times M$ array of coefficients that can be high-precision floating-point numbers. These values must be quantized or rounded in such a way that they take only a relatively small number of values. Quantization is achieved by means of a quantization map $Q$, an integer-valued step function. A simple quantization scheme called uniform scalar quantization is defined as follows.
(1) Suppose that all of the coefficients in the array fall in the range $[-A, A]$, and that the number of quantization levels, an integer $q$ (usually even), is specified. The interval $[-A, A]$ is partitioned into $q$ equal subintervals $[x_0, x_1), [x_1, x_2), \dots, [x_{q-1}, x_q)$, where $x_0 = -A$ and $x_{i+1} - x_i = 2A/q$.
(2) We define a quantization map $Q(x)$ as shown in Figure 12.1 (left). Note that the range of $Q$ is the set of $q - 1$ integers $\{-(q-2)/2, \dots, (q-2)/2\}$.
(3) A dequantizing function, $Q^{-1}$, is specified as shown in Figure 12.1 (right). Note that each integer value in the range of $Q$ is mapped to the center of the corresponding interval in the partition, with the exception that $Q^{-1}(0) = 0$.
There are other types of quantization, such as vector quantization and predictive quantization. More complete discussions of the theory of image quantization can be found in the texts listed in the appendix. The goal is to minimize the quantization error or distortion in the transformed signal.
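Steps (1) to (3) can be sketched in code. Figure 12.1 is not reproduced here, so the map below is one reading of the text, not something taken from the figure. With $\Delta = 2A/q$, the two subintervals on either side of 0 map to 0, and every other subinterval maps to its signed position counted outward from that central pair. $Q^{-1}$ returns interval centers, with $Q^{-1}(0) = 0$. The helper name `uniform_quantizer` is hypothetical.

```python
import numpy as np

def uniform_quantizer(A, q):
    """Return (Q, Q_inv) for uniform scalar quantization of [-A, A] with q (even) cells."""
    delta = 2.0 * A / q

    def Q(x):
        i = np.floor((np.asarray(x, dtype=float) + A) / delta).astype(int)
        i = np.clip(i, 0, q - 1)                  # index of the cell [x_i, x_{i+1})
        # cells at or right of 0 give 0, 1, ...; cells left of 0 give 0, -1, ...
        return np.where(i >= q // 2, i - q // 2, i - q // 2 + 1)

    def Q_inv(k):
        k = np.asarray(k)
        return np.sign(k) * (np.abs(k) + 0.5) * delta   # cell centers, Q_inv(0) = 0

    return Q, Q_inv

Q, Q_inv = uniform_quantizer(A=1.0, q=8)         # delta = 0.25, levels -3..3
x = np.array([0.1, -0.1, 0.3, -0.3, 0.99, -1.0])
k = Q(x)
print(k.tolist())                                # [0, 0, 1, -1, 3, -3]
print(Q_inv(k).tolist())                         # [0.0, 0.0, 0.375, -0.375, 0.875, -0.875]
```

Under this reading the zero cell has width $2\Delta$, which matches the statement below that all coefficients smaller than $2A/q$ in absolute value are quantized to zero.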
FIGURE 12.1. Left: $Q(x)$, right: $Q^{-1}(x)$.
A hallmark of an effective transform for image coding is that most of the coefficients of a given image are small and hence are quantized to zero. If the quantization map $Q(x)$ shown in Figure 12.1 (left) is used, then all coefficients less than $2A/q$ in absolute value are quantized to zero. It is often desirable to specify an independent parameter or threshold $\lambda > 0$ such that all coefficients less than $\lambda$ in absolute value are quantized to zero. There are two types of thresholding, hard and soft thresholding. The difference between them is related to how the coefficients larger than $\lambda$ in absolute value are handled. In hard thresholding, these values are left alone, and in soft thresholding, these values are decreased by $\lambda$ if positive and increased by $\lambda$ if negative. Specifically, we define a pair of thresholding functions as follows:
$$T_{\mathrm{hard}}(x) = \begin{cases} x & \text{if } |x| > \lambda, \\ 0 & \text{if } |x| \le \lambda, \end{cases} \qquad T_{\mathrm{soft}}(x) = \begin{cases} x - \lambda & \text{if } x > \lambda, \\ 0 & \text{if } |x| \le \lambda, \\ x + \lambda & \text{if } x < -\lambda. \end{cases}$$
Hard and soft thresholding functions are shown in Figure 12.2. If thresholding is used, then the quantization map has the form $Q \circ T(x)$, where $T$ is either a hard or soft thresholding function.
FIGURE 12.2. Left: $T_{\mathrm{hard}}(x)$, right: $T_{\mathrm{soft}}(x)$.
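The two thresholding functions, and the composite map $Q \circ T$, are short one-liners in NumPy. This is a minimal sketch that follows the hard and soft rules described in the text. In the composite, plain rounding with step `delta` stands in for $Q$. That stand-in and all names are illustrative assumptions.

```python
import numpy as np

def t_hard(x, lam):
    """Keep coefficients with |x| > lam; set the rest to zero."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > lam, x, 0.0)

def t_soft(x, lam):
    """Set coefficients with |x| <= lam to zero; shrink the rest toward 0 by lam."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def quantize_after_threshold(x, lam, delta, soft=False):
    """Q(T(x)) with a simple rounding quantizer of step delta (a stand-in for Q)."""
    t = t_soft(x, lam) if soft else t_hard(x, lam)
    return np.round(t / delta).astype(int)

x = np.array([-3.0, -1.0, 0.5, 2.0])
print(t_hard(x, 1.0).tolist())   # [-3.0, 0.0, 0.0, 2.0]
print(t_soft(x, 1.0).tolist())   # [-2.0, -0.0, 0.0, 1.0]
```

Hard thresholding is discontinuous at $\pm\lambda$, while soft thresholding is continuous but biases every surviving coefficient toward zero by $\lambda$.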
12.3 The Coding Step
Suppose that the transformed $M \times M$ image has been quantized in such a way that the data to be compressed consist of a string of $M^2$ integers between 0 and $r - 1$, for some positive integer $r$. The idea behind coding this string of numbers is to exploit redundancy in order to reduce the number of bits required to store the string. A simple example of this idea is the following. Suppose that $r = 4$, $M^2 = 16$, and the data to be compressed were written as
AABCDAAABBADAAAA
(we have substituted the letters A, B, C, D for the integers 0, 1, 2, 3 for simplicity in what follows). Since there are a total of four distinct symbols in the data, it is possible to code each symbol with 2 bits, or binary digits. We could do this as follows:
In this case, our data would read as
a total of 32 bits. On the other hand, observing that the symbol A appears far more often in the data than does any other symbol (A appears 10 times, B 3 times, C once, and D twice), we can compress the data by representing A with fewer bits and using more bits for the other symbols. For example, we could use the following code:
Then the data would read as
a total of 25 bits and a savings of about 22%.
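The bit counts above are easy to check. The fixed 2-bit assignment and the variable-length code below are assumptions; any assignment with the same codeword lengths gives the same counts. The variable-length code is one prefix code consistent with the stated 25-bit total: A uses 1 bit, B uses 2 bits, and C and D use 3 bits each. Because no codeword is a prefix of another, the bit string can be decoded left to right without separators.

```python
data = "AABCDAAABBADAAAA"

fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "10", "C": "110", "D": "111"}   # assumed prefix code

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Greedy left-to-right decoding; valid because `code` is a prefix code."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

n_fixed = len(encode(data, fixed))
n_var = len(encode(data, variable))
print(n_fixed, n_var, 1 - n_var / n_fixed)   # 32 25 0.21875
```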
In the remainder of this subsection, we will present some basic concepts of information and coding theory and introduce the concept of entropy of a symbol source.
12.3.1
Sources and Codes
Definition 12.1. A symbol source is a finite set $S = \{s_1, s_2, \dots, s_q\}$ together
0 as an improper Riemann integral, the expression (13.15) will in general not exist for arbitrary functions $f(x)$.
(b) Since by symmetry we can write
this suggests that the expression (13.15) can make sense for some functions $f(x)$ provided that we approach the singularity at 0 symmetrically. This suggests the definition (13.14). This way of approaching a singularity is referred to as the principal value integral.
13.1. Examples of Integral Operators
(c) As a function of two variables, $K(x - t) = (x - t)^{-1}$ is $C^\infty$ on the sets $\{(x,t) : x < t\}$ and $\{(x,t) : x > t\}$ but has an infinite discontinuity on the diagonal $\{(x,t) : x = t\}$. The Hilbert transform is a simple example of a singular integral operator.
In order to investigate some basic properties of the Hilbert transform, it is important to establish the class of functions on which it is defined, that is, for which the limit exists. To this end, we consider for a given $f(x)$ and $\epsilon > 0$ the integral
$$\int_{|t|>\epsilon} \frac{f(x-t)}{t}\,dt = \int_{|t|\ge 1} \frac{f(x-t)}{t}\,dt + \int_{\epsilon<|t|<1} \frac{f(x-t)}{t}\,dt.$$
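If $f(x)$ is $L^1$ on $\mathbb{R}$, then the first integral on the right side is well defined for each $x \in \mathbb{R}$ as an improper Riemann integral. This follows from the fact that $|f(x-t)/t| \le |f(x-t)|$ if $|t| \ge 1$, so that
$$\int_{|t|\ge 1} \left|\frac{f(x-t)}{t}\right| dt \le \int_{\mathbb{R}} |f(u)|\,du < \infty.$$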
By using the Cauchy-Schwarz inequality, it would also be sufficient to assume that $f(x)$ is $L^2$ on $\mathbb{R}$. As for the second integral, assume that $f(x)$ is $C^1$ on $\mathbb{R}$. Then by Taylor's formula, $f(x - t) = f(x) - f'(x)\,t + R(x,t)$, where $\lim_{t\to 0} R(x,t)/t = 0$, and for each $x$, $R(x,t)$ is $C^1$ on $\mathbb{R}$ as a function of $t$. Then
L 5 1 t l 0, define
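Then $\lim_{M\to\infty} g_{\epsilon,M}(x) = g_\epsilon(x)$ in $L^2$ on $\mathbb{R}$. By Plancherel's formula, $\lim_{M\to\infty} \hat g_{\epsilon,M}(\gamma) = \hat g_\epsilon(\gamma)$ in $L^2$ on $\mathbb{R}$. We calculate $\hat g_{\epsilon,M}(\gamma)$.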
L . M ( ~=)
=
1
e-2xi,x
$> 0$ is an attenuation coefficient that is related to the physical properties of the object. In general, the density and attenuation coefficient of the object will vary with position. Suppose that they are given by $\rho(x)$ and $\alpha(x)$. Then in the usual way, we can consider an area element with width $ds$ in a two-dimensional cross section of the object, centered at $x$. The attenuation of the beam as it passes through this element is then given by the factor $e^{-\alpha(x)\rho(x)\,ds}$. Integrating over each of these area elements yields the integral
$$I = I_0\, e^{-\int_\ell \alpha(x)\rho(x)\,ds},$$
where $\ell$ is the line joining the entry and exit points of the beam. Assuming that the attenuation coefficient is constant throughout the object (so by normalization equal to 1), we arrive at
$$-\ln\frac{I}{I_0} = \int_\ell \rho(x)\,ds = R\rho(\theta, s),$$
where $\ell = \{x : x \cdot \theta = s\}$. Therefore, the problem of tomographic imaging becomes the problem of inverting the Radon transform. A reasonable image of the cross section of the object can be produced once the density function is known.
(e) It is geometrically obvious from Figure 13.2 that the line corresponding to the angle $\theta$ and the directed distance $s$ is identical to the line corresponding to $\theta + \pi$ and $-s$ as well as $\theta - \pi$ and $-s$. Therefore, we conclude that $Rf(\theta, s) = Rf(\theta + \pi, -s) = Rf(\theta - \pi, -s)$.
Inversion of the Radon Transform
We will now present a formula for the inversion of the Radon transform. Part of this inversion formula involves the computation of an integral operator with a singular kernel.
Theorem 13.9. If $f(x_1,x_2)$ is $L^1$ on $\mathbb{R}^2$, then $Rf(\theta,s)$ is $L^1$ on $[0,2\pi) \times \mathbb{R}$ (that is, $\int_0^{2\pi}\int_{\mathbb{R}} |Rf(\theta,s)|\,ds\,d\theta$ is finite). If in addition $f(x_1,x_2)$ is $C^0$ on $\mathbb{R}^2$ and has compact support, then $Rf(\theta,s)$ is $C^0$ on $[0,2\pi) \times \mathbb{R}$.
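Proof: To see that $Rf(\theta,s)$ is $L^1$ on $[0,2\pi) \times \mathbb{R}$, note that for each fixed $\theta$,
$$\int_{\mathbb{R}} |Rf(\theta,s)|\,ds \le \int_{\mathbb{R}}\int_{\mathbb{R}} |f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)|\,dt\,ds = \int_{\mathbb{R}}\int_{\mathbb{R}} |f(u,v)|\,du\,dv,$$
where we have made the change of variables $u = s\cos\theta + t\sin\theta$, $v = s\sin\theta - t\cos\theta$ and noted that the Jacobian of the transformation has absolute value 1. Therefore,
$$\int_0^{2\pi}\int_{\mathbb{R}} |Rf(\theta,s)|\,ds\,d\theta \le 2\pi \int_{\mathbb{R}}\int_{\mathbb{R}} |f(u,v)|\,du\,dv < \infty.$$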
As for continuity, suppose that $f(x_1,x_2)$ vanishes outside a ball of radius $A > 0$ about the origin. Then for every $\theta$ and $s$,
$$Rf(\theta,s) = \int_{-A}^{A} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt.$$
Since $f(x_1,x_2)$ is continuous and compactly supported, it is uniformly continuous on $\mathbb{R}^2$. This implies that
$$\lim_{(\theta',s')\to(\theta,s)} f(s'\cos\theta' + t\sin\theta',\ s'\sin\theta' - t\cos\theta') = f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)$$
in $L^\infty$ on $\mathbb{R}$ (as functions of $t$). Therefore,
$$\lim_{(\theta',s')\to(\theta,s)} Rf(\theta',s') = \lim_{(\theta',s')\to(\theta,s)} \int_{-A}^{A} f(s'\cos\theta' + t\sin\theta',\ s'\sin\theta' - t\cos\theta')\,dt = \int_{-A}^{A} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt = Rf(\theta,s).$$
Hence $Rf(\theta,s)$ is $C^0$ on $[0,2\pi) \times \mathbb{R}$.
If $f(x_1,x_2)$ is $L^1$ on $\mathbb{R}^2$, then for each $\theta$, $R_\theta f(s)$ is $L^1$ on $\mathbb{R}$. Therefore, we can compute its Fourier transform. The following theorem relates the one-dimensional Fourier transform of $R_\theta f(s)$ to the two-dimensional Fourier transform of $f(x_1,x_2)$. It is referred to as the Fourier slice theorem because $\widehat{R_\theta f}(\gamma)$ is a "slice" of $\hat f(\gamma_1,\gamma_2)$ along a line through the origin making an angle $\theta$ with the positive $x$-axis. This observation will be used to derive an inversion formula for the Radon transform.
Theorem 13.10. (Fourier Slice Theorem) Suppose that $f(x_1,x_2)$ is $L^1$ on $\mathbb{R}^2$. Then
$$\widehat{R_\theta f}(\gamma) = \hat f(\gamma\cos\theta,\ \gamma\sin\theta).$$
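Proof:
$$\widehat{R_\theta f}(\gamma) = \int_{\mathbb{R}}\int_{\mathbb{R}} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt\; e^{-2\pi i\gamma s}\,ds = \int_{\mathbb{R}}\int_{\mathbb{R}} f(u,v)\, e^{-2\pi i\gamma(u\cos\theta + v\sin\theta)}\,du\,dv = \hat f(\gamma\cos\theta,\ \gamma\sin\theta),$$
where we have made the change of variables $u = s\cos\theta + t\sin\theta$, $v = s\sin\theta - t\cos\theta$ and noted that $s = u\cos\theta + v\sin\theta$.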
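The Fourier slice theorem can be checked numerically. The sketch below assumes a test function, the anisotropic Gaussian $f(x_1,x_2) = e^{-\pi(x_1^2 + 4x_2^2)}$. Under the $e^{-2\pi i x\cdot\gamma}$ convention used in the proof, its transform is $\hat f(\gamma_1,\gamma_2) = \tfrac12 e^{-\pi\gamma_1^2} e^{-\pi\gamma_2^2/4}$. The code approximates $R_\theta f(s)$ and its one-dimensional Fourier transform by Riemann sums on a truncated grid. The grid spacing and the angles and frequencies tested are arbitrary choices.

```python
import numpy as np

def f(x1, x2):
    """Assumed test function: an anisotropic Gaussian."""
    return np.exp(-np.pi * (x1 ** 2 + 4 * x2 ** 2))

def f_hat(g1, g2):
    """Its 2-D Fourier transform (kernel e^{-2 pi i x . gamma})."""
    return np.exp(-np.pi * g1 ** 2) * 0.5 * np.exp(-np.pi * g2 ** 2 / 4)

h = 0.02
grid = np.arange(-6.0, 6.0 + h / 2, h)      # used for both s and t

def radon_slice(theta):
    """R_theta f(s) on `grid`: Riemann sum in t along the line x . theta = s."""
    c, sn = np.cos(theta), np.sin(theta)
    s, t = grid[:, None], grid[None, :]
    return f(s * c + t * sn, s * sn - t * c).sum(axis=1) * h

def ft_1d(values, gamma):
    """1-D Fourier transform at one frequency, by a Riemann sum in s."""
    return np.sum(values * np.exp(-2j * np.pi * gamma * grid)) * h

theta, gamma = 0.7, 0.8
lhs = ft_1d(radon_slice(theta), gamma)
rhs = f_hat(gamma * np.cos(theta), gamma * np.sin(theta))
print(abs(lhs - rhs))   # expected to be tiny: both sides agree to near machine precision
```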
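Theorem 13.11. (Radon Inversion Formula) Suppose that both $f(x_1,x_2)$ and $\hat f(\gamma_1,\gamma_2)$ are $L^1$ on $\mathbb{R}^2$. Then
$$f(x_1,x_2) = \frac{1}{2}\int_0^{2\pi}\int_{-\infty}^{\infty} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,|r|\,dr\,d\theta. \qquad (13.21)$$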
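Proof: Writing the Fourier inversion formula for $f(x_1,x_2)$ in polar coordinates and using Theorem 13.10 gives
$$f(x_1,x_2) = \int_0^{2\pi}\int_0^{\infty} \hat f(r\cos\theta, r\sin\theta)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,r\,dr\,d\theta = \int_0^{2\pi}\int_0^{\infty} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,r\,dr\,d\theta.$$
Since $\widehat{R_\theta f}(r) = \widehat{R_{\theta-\pi} f}(-r)$ (Exercise 13.20), we can write
$$\begin{aligned} f(x_1,x_2) &= \int_0^{2\pi}\int_0^{\infty} \widehat{R_{\theta-\pi} f}(-r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,r\,dr\,d\theta \\ &= \int_{-\pi}^{\pi}\int_0^{\infty} \widehat{R_{\theta} f}(-r)\, e^{2\pi i r(x_1\cos(\theta+\pi) + x_2\sin(\theta+\pi))}\,r\,dr\,d\theta \\ &= \int_0^{2\pi}\int_0^{\infty} \widehat{R_{\theta} f}(-r)\, e^{-2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,r\,dr\,d\theta \\ &= \int_0^{2\pi}\int_{-\infty}^{0} \widehat{R_{\theta} f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,|r|\,dr\,d\theta. \end{aligned}$$
Combining the two calculations,
$$f(x_1,x_2) = \frac{1}{2}\int_0^{2\pi}\int_{-\infty}^{\infty} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,|r|\,dr\,d\theta.$$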
Equation (13.21) can be "unpacked" by looking at the outer and inner integrals separately. The outer integral is referred to as backprojection, and the inner as ramp filtering, which we will see corresponds to an integral operator with a singular kernel. We will describe each operator below. Given any function $h(\theta, s)$ defined on $[0, 2\pi) \times \mathbb{R}$, define the backprojection operator, $R^\#$, applied to $h(\theta, s)$ as follows.
Definition 13.12.
$$R^\# h(x_1,x_2) = \int_0^{2\pi} h(\theta,\ x_1\cos\theta + x_2\sin\theta)\,d\theta.$$
Note that if $h(\theta - \pi, -s) = h(\theta, s)$, then
$$R^\# h(x_1,x_2) = 2\int_0^{\pi} h(\theta,\ x_1\cos\theta + x_2\sin\theta)\,d\theta$$
(Exercise 13.21).
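Definition 13.13. Suppose that $h(x)$, $L^1$ on $\mathbb{R}$, has the property that $|\gamma|\,\hat h(\gamma)$ is also $L^1$ on $\mathbb{R}$. Define the ramp-filtering operator $Q$ on such $h(x)$ as follows:
$$Qh(x) = \int_{\mathbb{R}} \hat h(\gamma)\,|\gamma|\,e^{2\pi i\gamma x}\,d\gamma.$$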
Note that by Fourier inversion, we have that $\widehat{Qh}(\gamma) = |\gamma|\,\hat h(\gamma)$.
Remark 13.14. (a) It is clear from (13.21) and Definitions 13.12-13.13 that the Radon inversion formula can be written as
$$f(x_1,x_2) = \frac{1}{2}\, R^\#(Q R_\theta f)(x_1,x_2).$$
(b) The ramp-filtering operator is related to the Hilbert transform. By (13.19), we have that for all $L^2$ functions $f(x)$, $\widehat{Hf}(\gamma) = -i\,\mathrm{sgn}(\gamma)\,\hat f(\gamma)$. We know by the Differentiation Theorem (Theorem 3.33) that differentiation corresponds to multiplication of the Fourier transform by $2\pi i\gamma$. Since $(2\pi i\gamma)(-i\,\mathrm{sgn}(\gamma)) = 2\pi|\gamma|$, we can conclude at least formally that
$$Qf(x) = \frac{1}{2\pi}\, H(f')(x).$$
Of course the interchange of limiting processes in the above calculation must be justified.
(c) Just as the Hilbert transform corresponds to "convolution" with the function $(\pi x)^{-1}$, we can in a similar way interpret $Qf(x)$ as a convolution operator $Qf(x) = f * w(x)$, where $\hat w(\gamma) = |\gamma|$. Evidently $w$ cannot be a function as we have defined them (but is in fact a generalized function or distribution). Nevertheless we can now write the Radon inversion formula as
$$f(x_1,x_2) = \frac{1}{2}\, R^\#(R_\theta f * w)(x_1,x_2). \qquad (13.22)$$
This explains why $Q$ is referred to as ramp filtering: $R_\theta f$ is filtered via convolution with something whose Fourier transform is a "ramp" in the frequency domain.
(d) What is often done in practice is to replace $w$ in the above formula with a function $w_0(x)$ that approximates $w$ in some sense. Usually, $w_0(x)$ is defined by writing $\hat w_0(\gamma) = |\gamma|\,\hat g(\gamma)$ for some function $g$ whose Fourier transform $\hat g(\gamma)$ decays rapidly at infinity. In this case, (13.22) is used to define an approximation to $f(x_1,x_2)$ as
(e) The relationship between $r(x_1,x_2)$ and $f(x_1,x_2)$ can be determined via the filtered backprojection formula (Exercise 13.22):
$$R^\#(g_\theta * R_\theta f)(x_1,x_2) = (R^\# g * f)(x_1,x_2),$$
where $g(\theta, s) = g_\theta(s)$ is any function $L^\infty$ on $[0, 2\pi) \times \mathbb{R}$ and $f(x_1,x_2)$ is $L^1$ on $\mathbb{R}^2$. The convolution on the left is in one dimension and that on the right is in two dimensions. Applying filtered backprojection to (13.23) yields
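where $R^\# w_0(x_1,x_2) = W_0(x_1,x_2)$. That is, once we know the smoothing function $g(t)$, we can determine the two-dimensional convolution kernel $W_0(x_1,x_2)$.
(f) We would also like to go in the other direction, specifying the function $W_0(x_1,x_2)$ and determining the smoothing function $g(t)$. Here we will typically allow the smoothing function to depend on $\theta$, so that we are really determining a collection of functions $\{g_\theta(t)\}_{\theta\in[0,2\pi)}$. Since
$$W_0(x_1,x_2) = \frac{1}{2}\, R^\#(Q R_\theta W_0)(x_1,x_2),$$
this suggests the relation $w_\theta(s) = \frac{1}{2}\, Q R_\theta W_0(s)$. When taken on the transform side, this becomes
$$\hat w_\theta(\gamma) = \frac{1}{2}\,|\gamma|\,\widehat{R_\theta W_0}(\gamma) = \frac{1}{2}\,|\gamma|\,\hat W_0(\gamma\cos\theta,\ \gamma\sin\theta).$$
Since $\hat w_\theta(\gamma) = |\gamma|\,\hat g_\theta(\gamma)$, we arrive finally at
$$\hat g_\theta(\gamma) = \frac{1}{2}\,\hat W_0(\gamma\cos\theta,\ \gamma\sin\theta).$$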
(g) One way to use wavelets in the inversion of the Radon transform is to require that the kernel functions $W_0(x_1,x_2)$ be the elements of a two-dimensional wavelet basis. It turns out that the fact that wavelets have vanishing moments is advantageous for inverting the Radon transform efficiently and locally. Local inversion means that a good approximation to the image on a small region of interest can be obtained from processing the Radon transform data corresponding to lines that pass close to that region.
7 See, for example, Rashid-Farrokhi, Liu, Berenstein, and Walnut, Wavelet-based multiresolution local tomography, IEEE Transactions on Image Processing, vol. 6 (October 1997), pp. 1412-1430, and the references cited there.
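Reading (13.21) as one half of the backprojection of the ramp-filtered projections, $f = \tfrac12 R^\#(Q R_\theta f)$, suggests a direct numerical check. The following is a minimal filtered-backprojection sketch. It assumes the test function $f(x) = e^{-\pi|x|^2}$, for which $R_\theta f(s) = e^{-\pi s^2}$ for every $\theta$. The code applies $Q$ by multiplying the discrete Fourier transform by $|\gamma|$, which is a periodic approximation of the ramp filter on a truncated grid. It then approximates $R^\#$ by a Riemann sum in $\theta$, with linear interpolation in $s$. Grid sizes and names are illustrative. Practical codes would use a smoothed kernel $w_0$, as described in (d).

```python
import numpy as np

L, N = 16.0, 4096                      # s-grid on [-L, L), treated as periodic by the FFT
ds = 2 * L / N
s = -L + ds * np.arange(N)

def ramp_filter(p):
    """Approximate Qp: multiply the DFT of p by |gamma| and transform back."""
    gamma = np.fft.fftfreq(N, d=ds)
    return np.real(np.fft.ifft(np.abs(gamma) * np.fft.fft(p)))

def half_backproject(filtered, thetas, x1, x2):
    """(1/2) R^# of the filtered projections at (x1, x2), Riemann sum in theta."""
    total = 0.0
    for theta, q in zip(thetas, filtered):
        total += np.interp(x1 * np.cos(theta) + x2 * np.sin(theta), s, q)
    return 0.5 * total * (2 * np.pi / len(thetas))

thetas = np.linspace(0.0, 2 * np.pi, 180, endpoint=False)
projections = [np.exp(-np.pi * s ** 2) for _ in thetas]   # R_theta f for the Gaussian
filtered = [ramp_filter(p) for p in projections]

for x1, x2 in [(0.0, 0.0), (0.3, 0.4)]:
    approx = half_backproject(filtered, thetas, x1, x2)
    exact = np.exp(-np.pi * (x1 ** 2 + x2 ** 2))
    print((x1, x2), round(approx, 4), round(exact, 4))
```

The main error in this sketch comes from truncating and periodizing the slowly decaying ramp-filtered projection. Enlarging `L` reduces that error.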
In light of Remark 13.14(b), we can make the formal calculation
$$Qf(x) = \frac{1}{2\pi}\, H\!\left(\frac{df}{dx}\right)(x) = \frac{1}{2\pi^2}\lim_{\epsilon\to 0}\int_{|t|>\epsilon} \frac{f'(x-t)}{t}\,dt.$$
Of course, this calculation involves an exchange of limiting processes that must be justified. Leaving that aside for the moment, we integrate the right side by parts and obtain (Exercise 13.23)
$$\int_{|t|>\epsilon} \frac{f'(x-t)}{t}\,dt = \frac{f(x-\epsilon) + f(x+\epsilon)}{\epsilon} - \int_{|t|\ge\epsilon} \frac{f(x-t)}{t^2}\,dt.$$
In any case, the following theorem can be rigorously proved.
Theorem 13.15.
Suppose that f ( x ) is
( a ) lim
t--to E
and $C^1$ on $\mathbb{R}$. Then
1
E
(b) If we define for
LI
exists for every $x \in \mathbb{R}$.
> 0,
then for each $y \in \mathbb{R}$,
This shows that the ramp-filtering operator Q involves an integral operator with a singular kernel.
Exercises
Exercise 13.16. Show that if $u_0(x)$ and $u_1(x)$ are linearly independent solutions to the homogeneous equation $[p(x)\,y']' + q(x)\,y = 0$, then the Wronskian $W(u_0, u_1)(x)$ is a constant multiple of $1/p(x)$. (Hint: Show that the derivative of the function $p(x)\,W(u_0,u_1)(x)$ is zero.)
Exercise 13.17. Find the Green's functions for the following boundary value problems. Verify that each function has discontinuous first derivatives.
(a) $y'' - 4y' - 12y = f(x)$, $y(0) = y(1) = 0$.
(b) $(1 + x)\,y'' + y' = f(x)$, $y(0) + y'(0) = 0$, $y(1) + y'(1) = 1$. (Hint: $y(x) = 1$ is one solution to the homogeneous problem.)
Chapter 13. Integral Operators
Exercise 13.18. Show that a homogeneous second-order linear differential equation with constant coefficients $a\,y'' + b\,y' + c\,y = 0$, $a > 0$, is equivalent to an equation of the form $[p(x)\,y']' + q(x)\,y = 0$ for some continuous functions $p(x)$ and $q(x)$ that never vanish on $\mathbb{R}$. (Hint: Take $p(x) = A\,e^{\alpha x}$ and $q(x) = C\,e^{\beta x}$, and determine appropriate values of the constants.)
Exercise 13.19. Show that the Hilbert transform commutes with translations and dilations. That is, show that if $a > 0$ and $b \in \mathbb{R}$, then
and $H(T_b f)(x) = T_b(Hf)(x)$. (Hint: Use (13.19).)
Exercise 13.20. Show that for any $r$, $\widehat{R_\theta f}(r) = \widehat{R_{\theta-\pi} f}(-r)$. (Hint: Use the fact that $R_\theta f(s) = R_{\theta-\pi} f(-s)$.)
Exercise 13.21. Prove that if $h(\theta - \pi, -s) = h(\theta, s)$, then
$$R^\# h(x_1,x_2) = 2\int_0^{\pi} h(\theta,\ x_1\cos\theta + x_2\sin\theta)\,d\theta,$$
where $R^\#$ denotes the backprojection operator defined in Definition 13.12.
Exercise 13.22. Prove the filtered backprojection formula. That is, show that
$$R^\#(g_\theta * R_\theta f)(x_1,x_2) = (R^\# g * f)(x_1,x_2),$$
where $g(\theta, s) = g_\theta(s)$ is any function $L^\infty$ on $[0, 2\pi) \times \mathbb{R}$ and $f(x_1,x_2)$ is $L^1$ on $\mathbb{R}^2$. The convolution on the left is in one dimension and that on the right is in two dimensions.
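Exercise 13.23. Prove that if $f(x)$ is $L^1$ and $C^1$ on $\mathbb{R}$, then for any $\epsilon > 0$,
$$\int_{|t|>\epsilon} \frac{f'(x-t)}{t}\,dt = \frac{f(x-\epsilon) + f(x+\epsilon)}{\epsilon} - \int_{|t|\ge\epsilon} \frac{f(x-t)}{t^2}\,dt.$$
(Hint: Integrate by parts.)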
13.2 The BCR Algorithm
In this section, we describe the BCR algorithm. Suppose that we wish to approximate the integral operator $T$ given by
$$Tf(x) = \int K(x,y)\,f(y)\,dy.$$
We do not specify any limits of integration, but they should be clear once we specify the integral operator we are interested in. Suppose that we are given a scaling function $\varphi(x)$ and a wavelet function $\psi(x)$, which we assume for simplicity are orthonormal. The changes required for the case when they are not orthonormal are straightforward and left to the reader.
13.2.1
The Scale j Approximation to T
A simple way to discretize the operator $T$ is to assume that we can write down an expansion of the kernel $K(x,y)$ in terms of the scaling function as follows:
$$K(x,y) = \sum_k\sum_\ell c_0(k,\ell)\,\varphi_{0,k}(x)\,\varphi_{0,\ell}(y).$$
Of course there is no reason to expect that equality will actually hold in the above expansion, as this would assume that the kernel is a function in the two-dimensional scaling space $V_0 \times V_0$. However, the above assumption corresponds to our usual procedure for approximating a continuously defined function by discrete data in such a way that we can conveniently apply the wavelet transform. From now on, we will assume that the only knowledge we have of the kernel $K(x,y)$ is the coefficients $\{c_0(k,\ell)\}$. We also note that in any practical setting, we will only have finitely many coefficients to work with, so we assume in addition that $0 \le k, \ell < M$, where $M = 2^N$ for some $N \in \mathbb{N}$. Inserting this expansion of $K(x,y)$ into the definition of $T$, we obtain
$$Tf(x) \approx \sum_k\sum_\ell c_0(k,\ell)\,s_0(\ell)\,\varphi_{0,k}(x),$$
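where $s_0(\ell) = \langle f, \varphi_{0,\ell}\rangle$. By the orthonormality of the scaling function,
$$s_0'(k) = \langle Tf, \varphi_{0,k}\rangle = \sum_\ell c_0(k,\ell)\,s_0(\ell).$$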
The function $Tf(x)$ is then approximated by the expansion
$$Tf(x) \approx \sum_k s_0'(k)\,\varphi_{0,k}(x),$$
with equality holding if and only if $Tf(x)$ is in the scale space $V_0$. Summarizing these calculations, we can write this approximation to $T$ as the following $M \times M$ matrix multiplication:
$$s_0' = C_0\, s_0, \qquad (13.24)$$
where $C_0 = [c_0(k,\ell)]$, and $s_0 = [s_0(\ell)]$ and $s_0' = [s_0'(k)]$ are column vectors. We can call this the scale 0 approximation to $T$. In fact, we could have presented the efficient evaluation of the matrix multiplication (13.24) at the start as the problem to be solved and ignored the connection with integral operators. From this point of view, the BCR algorithm is simply a way to do fast matrix multiplication when the matrix has an efficient representation in a wavelet basis. Looking at the scale 0 approximation to $T$, we realize that there is nothing stopping us from forming a scale 1 approximation to $T$ in a similar way. Once we have done it, we will see that it was a good idea. Applying one step of the two-dimensional DWT to $K(x,y)$, we obtain
$$K(x,y) = \sum_k\sum_\ell \big(\alpha_1(k,\ell)\,\psi_{-1,k}(x)\,\psi_{-1,\ell}(y) + \beta_1(k,\ell)\,\psi_{-1,k}(x)\,\varphi_{-1,\ell}(y) + \gamma_1(k,\ell)\,\varphi_{-1,k}(x)\,\psi_{-1,\ell}(y) + c_1(k,\ell)\,\varphi_{-1,k}(x)\,\varphi_{-1,\ell}(y)\big),$$
so that
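where $s_1(\ell) = \langle f, \varphi_{-1,\ell}\rangle$ and $d_1(\ell) = \langle f, \psi_{-1,\ell}\rangle$. By the orthonormality of the scaling and wavelet functions,
$$s_1'(k) = \langle Tf, \varphi_{-1,k}\rangle = \sum_\ell \big(c_1(k,\ell)\,s_1(\ell) + \gamma_1(k,\ell)\,d_1(\ell)\big)$$
and
$$d_1'(k) = \langle Tf, \psi_{-1,k}\rangle = \sum_\ell \big(\beta_1(k,\ell)\,s_1(\ell) + \alpha_1(k,\ell)\,d_1(\ell)\big).$$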
The function T f (x) is then approximated by the expansion
again with equality holding if and only if $Tf(x)$ is in $V_0$. We can write the scale 1 approximation to $T$ as the following $M \times M$ matrix multiplication:
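where $\Gamma_1 = [\gamma_1(k,\ell)]$, $B_1 = [\beta_1(k,\ell)]$, $A_1 = [\alpha_1(k,\ell)]$, and $C_1 = [c_1(k,\ell)]$ are each $M/2 \times M/2$ matrices. Applying the next step in the DWT to $K(x,y)$, we can write
$$\begin{aligned} K(x,y) = {} & \sum_k\sum_\ell \big(\alpha_1(k,\ell)\,\psi_{-1,k}(x)\,\psi_{-1,\ell}(y) + \beta_1(k,\ell)\,\psi_{-1,k}(x)\,\varphi_{-1,\ell}(y) + \gamma_1(k,\ell)\,\varphi_{-1,k}(x)\,\psi_{-1,\ell}(y)\big) \\ & + \sum_k\sum_\ell \big(\alpha_2(k,\ell)\,\psi_{-2,k}(x)\,\psi_{-2,\ell}(y) + \beta_2(k,\ell)\,\psi_{-2,k}(x)\,\varphi_{-2,\ell}(y) + \gamma_2(k,\ell)\,\varphi_{-2,k}(x)\,\psi_{-2,\ell}(y) + c_2(k,\ell)\,\varphi_{-2,k}(x)\,\varphi_{-2,\ell}(y)\big), \end{aligned}$$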
so that
where $s_2(\ell) = \langle f, \varphi_{-2,\ell}\rangle$ and $d_2(\ell) = \langle f, \psi_{-2,\ell}\rangle$. By the orthonormality of the scaling and wavelet functions,
The function $Tf(x)$ is then approximated by the expansion
with equality holding if and only if $Tf(x)$ is in $V_0$. We can write the scale 2 approximation to $T$ as the following $3M/2 \times 3M/2$ matrix multiplication:
where $\Gamma_1$, $B_1$, and $A_1$ are $M/2 \times M/2$ matrices and $\Gamma_2 = [\gamma_2(k,\ell)]$, $B_2 = [\beta_2(k,\ell)]$, $A_2 = [\alpha_2(k,\ell)]$, and $C_2 = [c_2(k,\ell)]$ are $M/4 \times M/4$ matrices. Continuing in this fashion up to $N$ times, we can form the scale $N$ approximation to $T$ as the matrix product
(13.27)
where for each $1 \le j \le N$, $\Gamma_j = (\gamma_j(k,\ell))$, $B_j = (\beta_j(k,\ell))$, $A_j = (\alpha_j(k,\ell))$, and $C_j = (c_j(k,\ell))$ are $2^{-j}M \times 2^{-j}M$ matrices, so that (13.27) is a $(2M - 2) \times (2M - 2)$ system, that is, roughly $2M \times 2M$.
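The blocks $\Gamma_j$, $B_j$, $A_j$, $C_j$ are useful because, for kernels that are smooth away from the diagonal, most entries of the kernel matrix expressed in a wavelet basis are tiny. The following is a simplified illustration of that effect, not the BCR algorithm itself, and it rests on several assumptions. It uses a discretized Hilbert-type kernel $K[i,k] = 1/(i-k)$, set to 0 on the diagonal. It uses the full orthonormal Haar DWT matrix $W$ (one vanishing moment) to form $W K W^T$, instead of the block layout (13.27). It zeroes entries below an arbitrarily chosen relative threshold and measures the resulting error on a random vector. With more vanishing moments the matrix typically becomes sparser.

```python
import numpy as np

def haar_matrix(M):
    """Orthonormal full-depth Haar DWT matrix for M a power of 2 (coarse rows first)."""
    W = np.array([[1.0]])
    while W.shape[0] < M:
        n = W.shape[0]
        coarse = np.kron(W, [1.0, 1.0]) / np.sqrt(2)          # previous basis, stretched
        fine = np.kron(np.eye(n), [1.0, -1.0]) / np.sqrt(2)    # finest-scale Haar wavelets
        W = np.vstack([coarse, fine])
    return W

M = 256
idx = np.arange(M, dtype=float)
diff = np.subtract.outer(idx, idx)
np.fill_diagonal(diff, np.inf)             # 1/inf = 0 on the diagonal
K = 1.0 / diff                              # smooth off the diagonal, singular on it

W = haar_matrix(M)
S = W @ K @ W.T                             # kernel matrix in the wavelet basis

threshold = 1e-3 * np.abs(S).max()          # illustrative relative threshold
S_thr = np.where(np.abs(S) >= threshold, S, 0.0)

rng = np.random.default_rng(0)
f = rng.standard_normal(M)
exact = K @ f
approx = W.T @ (S_thr @ (W @ f))            # transform, sparse multiply, transform back
print("kept fraction:", np.count_nonzero(S_thr) / S.size)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

Because $W$ is orthogonal, the error of the thresholded product is bounded by the size of the discarded entries times $\|f\|$, which is the property the BCR thresholding step relies on.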
13.2.2
Description of the Algorithm
The scale J approximation to the integral operator T really consists of the following steps:
(1) Approximate the kernel function $K(x,y)$ by its projection onto the subspace $V_0 \times V_0$. This is written as the expansion
$$K(x,y) = \sum_k\sum_\ell c_0(k,\ell)\,\varphi_{0,k}(x)\,\varphi_{0,\ell}(y).$$
(2) Approximate the function $f(x)$ by its projection onto the subspace $V_0$. This is accomplished by calculating the coefficients $d_j(k) = \langle f, \psi_{-j,k}\rangle$ and $s_j(k) = \langle f, \varphi_{-j,k}\rangle$ for all $k$ and $1 \le j \le J$. Of course, not all of these coefficients are required in order to fully represent $f(x)$. This can be accomplished by the expansion
(3) Approximate the function $Tf(x)$ by calculating its projection onto the subspace $V_0$. This is the expansion
The BCR algorithm consists of one further approximation that is based on the following observation. If the kernel $K(x,y)$ is smooth apart from singularities on the diagonal, then each of the submatrices $A_j$, $B_j$, and $\Gamma_j$ will have large entries near the diagonal and small entries away from the diagonal. The smoothness of the kernel and the number of vanishing moments of the wavelet chosen will help determine exactly how small the off-diagonal entries are. In many cases, these off-diagonal entries are so small that establishing a threshold value $\lambda$, usually some small fraction of the largest value in the matrix, and setting to zero all entries whose absolute value is less than $\lambda$ turns each of the submatrices into matrices whose nonzero entries are in a narrow band (say $r$ entries wide, where
0 there exists a $\delta > 0$ such that if $x \in I$ and $|x - x_0| < \delta$ then (a)J< -hl.
If
linear combination. A linear combination of a collection of functions $\{f_j(x)\}_{j=1}^N$ defined on an interval $I$ is a function of the form $h(x) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_N f_N(x) = \sum_{j=1}^N a_j f_j(x)$ for some constants $\{a_j\}_{j=1}^N$. A linear combination of a collection of vectors $\{v_j\}_{j=1}^N$ is a vector of the form $x = \sum_{j=1}^N b_j v_j$ for some constants $\{b_j\}_{j=1}^N$.
linear transformation. A function $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a linear transformation if for every pair of vectors $x$ and $y$ in $\mathbb{R}^n$ and every pair of real numbers $a$ and $b$, $T(ax + by) = a\,T(x) + b\,T(y)$. (See matrix representation of a linear transformation.)
lower bound. A number $A$ is a lower bound for a set of real numbers $S$ if $A \le x$ for every $x \in S$. (See least upper bound, upper bound, greatest lower bound, supremum, infimum.)
matrix. An $m \times n$ matrix is an array of numbers arranged in $m$ rows and $n$ columns. We write $A = \{a(i,j)\}_{1 \le i \le m,\ 1 \le j \le n}$.
matrix multiplication. The product of an $m \times n$ matrix $A = \{a(i,j)\}$ and an $n \times p$ matrix $B = \{b(i,j)\}$ is the $m \times p$ matrix $AB = C = \{c(i,j)\}$, where $c(i,j) = \sum_{k=1}^n a(i,k)\, b(k,j)$.
matrix representation of a linear transformation. It is always possible to represent a linear transformation from $\mathbb{R}^n$ into $\mathbb{R}^m$ as an $m \times n$ matrix with respect to a given pair of orthonormal bases. Specifically, if $\{v_i\}_{i=1}^n$ is an orthonormal basis for $\mathbb{R}^n$ and $\{w_j\}_{j=1}^m$ is an orthonormal basis for $\mathbb{R}^m$, then we say that $T$ is represented by the matrix $T = \{\langle T(v_i), w_j \rangle\}$, whose $(j, i)$ entry is $\langle T(v_i), w_j \rangle$. In this case, let $V$ be the $n \times n$ matrix whose columns are the vectors $v_i$, and let $W$ be the $m \times m$ matrix whose columns are the vectors $w_j$. If $x$ is a given vector in $\mathbb{R}^n$, then $T(x) = W T V^T x$. (See transpose of a matrix, adjoint of a matrix, vector, matrix multiplication, orthonormal basis, linear transformation.)
monomial. A monomial is a function of the form $x^n$ for some $n \in \mathbb{Z}^+$. (See polynomial.)
n-times continuously differentiable on an interval. A function $f(x)$ is n-times continuously differentiable on an interval $I$ if the $n$th derivative $f^{(n)}(x)$, defined recursively by
$$f^{(n)}(x) = \frac{d}{dx} f^{(n-1)}(x),$$
where $f^{(0)}(x) = f(x)$, is continuous on $I$. In this case, $f(x)$ is said to be $C^n$ on $I$. $C^0$ on $I$ means that $f(x)$ is continuous on $I$. A function $f(x)$ is $C^\infty$ on $I$ if it is $C^n$ on $I$ for every $n \in \mathbb{N}$.
open interval. An open interval is an interval of the form $(a, b) = \{x : a < x < b\}$ for some real numbers $a < b$.
orthogonal matrix. An $n \times n$ matrix is orthogonal if its rows form an orthonormal system. In this case, its columns will also form an orthonormal system. (See inner product of vectors, orthonormal system.)
orthogonal projector. Given a subspace $M$ of $\mathbb{R}^n$ or $\mathbb{C}^n$ and an orthonormal basis $\{w_i\}_{i=1}^d$ for $M$, the orthogonal projector onto $M$ is the linear transformation $P_M$ defined by $P_M(x) = \sum_{i=1}^d \langle x, w_i \rangle w_i$. (See subspace, linear transformation, orthonormal system.)
orthogonal vectors. A pair of vectors $v$ and $w$ are orthogonal if $\langle v, w \rangle = 0$. (See inner product of vectors.)
orthonormal basis of vectors. An orthonormal system of $n$ vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$ is an orthonormal basis for $\mathbb{R}^n$ or $\mathbb{C}^n$. If $\{v_i\}_{i=1}^n$ is an orthonormal basis for $\mathbb{R}^n$ (or $\mathbb{C}^n$), then any vector $x$ can be written uniquely as $x = \sum_{i=1}^n \langle x, v_i \rangle v_i$.
orthonormal system of vectors. A collection of vectors $\{v_i\}_{i=1}^n$ is an orthonormal system if $\langle v_i, v_j \rangle = \delta(i - j)$.
oscillatory discontinuity. A function $f(x)$ has an oscillatory discontinuity at a point $x_0$ if $f(x)$ is not continuous at $x_0$ and has neither a jump nor an infinite discontinuity at $x_0$. See Figure A.2. (See jump discontinuity, infinite discontinuity.)
piecewise continuous. A function $f(x)$ is piecewise continuous on a finite interval $I$ if $f(x)$ is continuous at each point of $I$ except for at most finitely
many points. A function $f(x)$ is piecewise continuous on an infinite interval $I$ if it is piecewise continuous on every finite subinterval of $I$. (See infinite interval, finite interval, subinterval.)
piecewise polynomial. A function $f(x)$ defined on $\mathbb{R}$ is a piecewise polynomial function if there is a collection of disjoint intervals $\{I_n\}_{n \in \mathbb{N}}$ and a collection of polynomials $\{p_n(x)\}_{n \in \mathbb{N}}$ such that $f(x)$ can be written in the form
$$f(x) = \sum_{n=1}^{\infty} p_n(x)\, \chi_{I_n}(x).$$
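Two earlier entries can be checked numerically: the formula $T(x) = W T V^T x$ from the matrix representation entry, and the orthogonal projector $P_M$. This is a minimal sketch; the following are hypothetical illustrative choices:

- the map $T: \mathbb{R}^2 \to \mathbb{R}^3$, given by a standard-basis matrix `S`;
- the rotated basis `vs` and the basis `ws`;
- all function names.

Vectors are plain Python lists.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def transpose(a):
    return [list(row) for row in zip(*a)]


def matvec(a, x):
    return [dot(row, x) for row in a]


def columns(vectors):
    """Matrix whose columns are the given vectors."""
    return transpose(vectors)


def representation(T, vs, ws):
    """m x n matrix whose (j, i) entry is <T(v_i), w_j>."""
    return [[dot(T(v), w) for v in vs] for w in ws]


def projector(ws):
    """P_M(x) = sum_i <x, w_i> w_i for an orthonormal list ws spanning M."""
    def P(x):
        coeffs = [dot(x, w) for w in ws]
        return [sum(c * w[k] for c, w in zip(coeffs, ws)) for k in range(len(x))]
    return P


# Hypothetical example: T: R^2 -> R^3 given by a standard-basis matrix S.
S = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]


def T(x):
    return matvec(S, x)


th = 0.3
vs = [[math.cos(th), math.sin(th)], [-math.sin(th), math.cos(th)]]
r = 1 / math.sqrt(2)
ws = [[r, r, 0.0], [r, -r, 0.0], [0.0, 0.0, 1.0]]

T_mat = representation(T, vs, ws)          # 3 x 2
V, W = columns(vs), columns(ws)
x = [0.7, -1.2]
via_bases = matvec(W, matvec(T_mat, matvec(transpose(V), x)))   # W T V^T x
```

Here $V^T x$ lists the coefficients $\langle x, v_i \rangle$, `T_mat` turns them into the coefficients $\langle T(x), w_j \rangle$, and $W$ reassembles $T(x)$. The result should match `T(x)` up to rounding.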
polynomial. A polynomial is a function of the form $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{i=0}^N a_i x^i$ for some constants $\{a_i\}_{i=0}^N$.
Riemann integral. The Riemann integral of a function $f(x)$ continuous on a finite closed interval $I = [a, b]$, denoted $\int_a^b f(x)\,dx$ or $\int_I f(x)\,dx$, is the number $v$ with the following property. For every $\epsilon > 0$ there is a $\delta > 0$ such that, for every choice of numbers $\{x_i\}_{i=0}^N$ with $a = x_0 < x_1 < \cdots < x_N = b$ and $x_{i+1} - x_i < \delta$ for $0 \le i \le N - 1$, and for every choice of numbers $x_i^* \in [x_i, x_{i+1}]$,
$$\left| \sum_{i=0}^{N-1} f(x_i^*)\,(x_{i+1} - x_i) - v \right| < \epsilon.$$
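The Riemann sum in this definition can be computed directly. This is a minimal sketch under these assumptions:

- The partition is supplied as a list $a = x_0 < \dots < x_N = b$, with one tag $x_i^*$ per subinterval.
- Equal spacing is used only for convenience.
- The helper names are hypothetical.

With $f(x) = x^2$ on $[0,1]$, where $v = 1/3$, the sums approach $v$ as the mesh shrinks, whichever tags are chosen.

```python
def uniform_partition(a, b, n):
    """a = x_0 < x_1 < ... < x_n = b with equal spacing (an assumption;
    the definition allows any partition with mesh < delta)."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def riemann_sum(f, partition, tags):
    """sum_i f(x_i^*) (x_{i+1} - x_i), with tags[i] = x_i^* in [x_i, x_{i+1}]."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for i, t in enumerate(tags):
        left, right = partition[i], partition[i + 1]
        if not left <= t <= right:
            raise ValueError(f"tag {t} lies outside [{left}, {right}]")
        total += f(t) * (right - left)
    return total


if __name__ == "__main__":
    f = lambda x: x * x          # exact integral over [0, 1] is 1/3
    for n in (10, 100, 1000):
        p = uniform_partition(0.0, 1.0, n)
        left = riemann_sum(f, p, p[:-1])
        mid = riemann_sum(f, p, [(p[i] + p[i + 1]) / 2 for i in range(n)])
        print(n, left, mid)
```

For this increasing $f$, left-endpoint and right-endpoint tags bracket $1/3$. Midpoint tags converge noticeably faster.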
sequence. A sequence of numbers is a collection of numbers indexed by some index set $\mathcal{I}$. Typically, $\mathcal{I}$ will be the integers $\mathbb{Z}$, the natural numbers $\mathbb{N}$, or the nonnegative integers $\mathbb{Z}^+$. Such sequences will be denoted by $\{c_n\}_{n \in \mathbb{Z}}$, $\{c_n\}_{n \in \mathbb{N}}$, or $\{c_n\}_{n \in \mathbb{Z}^+}$, respectively. A sequence of functions is a collection of functions indexed by $\mathcal{I}$ and denoted $\{g_n(x)\}_{n \in \mathcal{I}}$.
span. The span of a collection of vectors is the set of all finite linear combinations of vectors in that collection. (See linear combination.)
step function. A function $f(x)$ defined on $\mathbb{R}$ is a step function if there is a collection of disjoint intervals $\{I_n\}_{n \in \mathbb{N}}$ such that $f(x)$ can be written in the form $f(x) = \sum_{n=1}^{\infty} a_n \chi_{I_n}(x)$ for some constants $\{a_n\}_{n \in \mathbb{N}}$. A step function is also referred to as a piecewise constant function.
subinterval. An interval $I$ is a subinterval of an interval $J$ if $I \subseteq J$.
subspace. A subset $M$ of the vector space $\mathbb{R}^n$ (or $\mathbb{C}^n$) is a subspace if it is closed under the formation of linear combinations. That is, if $x_1$ and $x_2$ are in $M$, then so is $a x_1 + b x_2$ for all real (or complex) numbers $a$ and $b$. There will always exist an orthonormal system of vectors $\{w_i\}_{i=1}^d$, where $d \le n$, such that $M = \operatorname{span}\{w_i\}$. The number $d$ is the dimension of $M$. The collection $\{w_i\}_{i=1}^d$ is said to be an orthonormal basis for $M$.
supported on an interval. A function $f(x)$ defined on $\mathbb{R}$ is supported on the interval $I$ if $f(x) = 0$ for all $x \notin I$.
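The subspace entry asserts that every subspace has an orthonormal basis. One standard way to construct one is the modified Gram-Schmidt process, which this appendix does not itself describe; it is sketched below for real vectors. The function name and the absolute tolerance used to discard dependent vectors are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Modified Gram-Schmidt for real vectors.
    Returns an orthonormal list {w_i}_{i=1}^d with span{w_i} equal to the span
    of `vectors`; vectors whose residual norm is <= tol (numerically dependent
    on earlier ones) are skipped, so d = len(result) is the dimension."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for e in basis:
            c = dot(w, e)                      # coefficient against the running residual
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


if __name__ == "__main__":
    # Hypothetical spanning set for a plane M in R^3 (second vector is redundant).
    M = [[1, 1, 0], [2, 2, 0], [0, 1, 1]]
    for w in gram_schmidt(M):
        print(w)
```

Projecting against the running residual, rather than the original vector, is the "modified" variant. It gives the same basis in exact arithmetic but is usually better behaved in floating point.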
supremum. The supremum of a set of real numbers $S$ is a real number $A$ such that $x \le A$ for all $x \in S$ and such that for every number $B < A$ there exists $x \in S$ with $B < x$. The supremum of $S$ is also called the least upper bound of $S$, and is denoted $\sup S$. (See upper bound, lower bound, infimum.)
transpose of a matrix. The transpose of an $m \times n$ matrix $A = \{a(i,j)\}$ is the $n \times m$ matrix $A^T = \{a(j,i)\}$. (See matrix, adjoint of a matrix.)
uniformly continuous on an interval. A function $f(x)$ defined on an interval $I$ is uniformly continuous on $I$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that if $x, y \in I$ satisfy $|x - y| < \delta$, then $|f(x) - f(y)| < \epsilon$.
upper bound. A number $A$ is an upper bound for a set of real numbers $S$ if $x \le A$ for every $x \in S$. (See least upper bound, lower bound, greatest lower bound, supremum, infimum.)
vector. An $n$-tuple of real or complex numbers is referred to as a vector. In this book we make no distinction between row vectors ($1 \times n$ matrices) and column vectors ($n \times 1$ matrices). If $v$ is a vector and $A$ an $m \times n$ matrix, then the product $Av$ is defined as though $v$ were written as a column vector. (See matrix, matrix multiplication.)
0 such that $|f(x)| \le M$ for all $x \in I$.