BEYOND WAVELETS
Grant V. WELLAND, University of Missouri - St. Louis, Department of Mathematics and Computer Science, St. Louis, USA
ACADEMIC PRESS An imprint of Elsevier Science 2003 Amsterdam - Boston - Heidelberg - London - New York - Oxford - Paris San Diego - San Francisco - Singapore - Sydney - Tokyo
STUDIES IN COMPUTATIONAL MATHEMATICS 10
Editors: C.K. CHUI, Stanford University, Stanford, CA, USA; P. MONK, University of Delaware, Newark, DE, USA; L. WUYTACK, University of Antwerp, Antwerp, Belgium
ELSEVIER SCIENCE Inc., 360 Park Avenue South, New York, NY 10010-1710, USA
© 2003 Elsevier Science Inc. All rights reserved.
First edition 2003
Library of Congress Cataloging in Publication Data: a catalog record from the Library of Congress has been applied for.
British Library Cataloguing in Publication Data: a catalogue record from the British Library has been applied for.
Academic Press, An Elsevier Science Imprint, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.academicpress.com
ISBN: 0-12-743273-6
ISSN: 1570-579X (Series)
Typeset by Alden Press, Oxford. Printed in Great Britain by MPG Books Ltd, Bodmin, Cornwall.
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
PREFACE
The themes of classical wavelets include terms such as compression and efficient representation. Important features which play a role in the analysis of functions of two variables are dilation, translation, spatial and frequency localization, and singularity orientation. Singularities of functions of more than one variable vary in dimensionality. Important singularities in one dimension are simply points. In two dimensions, zero- and one-dimensional singularities are important. A smooth singularity in two dimensions may be a one-dimensional smooth manifold. Smooth singularities in two-dimensional images often occur as boundaries of physical objects. Efficient representation in two dimensions is a hard problem and is addressed in the first six chapters. The next two chapters return to problems of one dimension, where important new results are given. The final two chapters represent a transition from harmonic analysis to statistical methods and filtering theory, but the goals remain consistent with those of earlier chapters. We have chosen the title "Beyond Wavelets". We could have used the title "Pursuing the Promise of Wavelets".

We briefly describe each chapter. The lead chapter, "Digital Ridgelet Transform based on True Ridge Functions" by David Donoho and Georgina Flesia, addresses the problem of analyzing the structure of a function of two real variables. It extends work of Donoho and an associated group of co-workers. Special credit is due to Emmanuel Candes. Donoho and Candes have constructed a system called curvelets which gives high-quality asymptotic approximation of singularities. Passage from their continuum study to one appropriate for applications requires the development of digital algorithms that implement the concepts of the continuum study faithfully. A less obvious proposal than a standard tensor product basis was made earlier by Donoho, emphasizing "wide-sense" ridgelets with localization properties in the radial and angular frequency domains. Wide-sense ridgelets are no longer of strict ridge form but allow the possibility of an orthonormal set of elements. The theory is related to that of the Radon transform and to rotation and scaling of images. At the continuum level these are natural, but for digital data they are problematic. In this chapter a definition of a digital ridgelet transform is given. The digital transform has structural relationships strongly analogous to those of the continuum case. The transform takes an n-by-n array of data in Cartesian coordinates and expands it by a factor of 4 in creating a coefficient array. This leaves room for further improvements.
Chapter 2 is a companion chapter to Chapter 1 and continues the study of digital implementation of ridgelets with ridgelet packets. The two principal approaches given are the frequency-domain approach and the Radon approach. In the first approach a recursive dyadic partition of the polar Fourier domain produces a collection of rectangular tiles, followed by a tensor basis of windowed sinusoids in the angular and radial variables for each tile. In the Radon approach, transformation to the Radon domain is followed by the use of wavelets in the angular variable and wavelet packets in the second Radon variable. The Radon isometry is important in this case. The notion of the pseudopolar Fast Fourier Transform and a pseudo Radon isometry called the normalized Slant Stack are discussed and used. In both cases analysis of image data relies on directionally oriented waveforms. The wavelet packet and the local sinusoidal packet bases are generalizations of the original wavelet systems of elements. Ridgelet packets, which follow in the spirit of these systems, are highly orientation selective and bear much the same relationship to ridgelets as do wavelet packets to wavelets.

In Chapter 3, François Meyer and Raphy Coifman create brushlets to address the problem of describing an image with a library of steerable wavelet packets. By careful design of the window of the local Fourier basis, brushlets with very fast decay are obtained. They note that other directionally oriented filter banks have been constructed which have a redundancy factor of 2 or 4. This presents a major hurdle to computing a sparse image representation. By use of a construction in the Fourier domain they create wavelet packets which are complex-valued functions with a phase. A key ingredient of the construction is a window used for local Fourier analysis. The window is required to have very fast decay.

Do and Vetterli study image representation in Chapter 4.
An observation that the curvelet transform is defined in the frequency domain leads to the question: "Is there a spatial-domain scheme for refinement which, at each generation, doubles the spatial resolution as well as the angular resolution?" They propose a filter bank construction that deals effectively with piecewise smooth images with smooth contours. The resulting image expansion is a frame composed of contour segments, which are named contourlets. Their work leads to an effective method to implement the discrete curvelet transform.

Chan and Zhou open the discussion of the ENO-wavelet construction in Chapter 5 by discussing oscillations which emulate the classical Gibbs' phenomenon. It has been discovered that the wavelet Gibbs' phenomenon is generated by using difference filters across boundaries of discontinuity. ENO is the acronym for the phrase essentially non-oscillatory, which represents an approach for the suppression of unwanted oscillations encountered at discontinuities. Rigorous approximation error bounds are found to depend on the smoothness of the function away from discontinuities when the ENO approach is used. Several applications of the ENO method are given, which include function approximation, image compression and signal denoising.

An explicit model for Bayesian reconstruction of tomographic data is given by S. Zhao and H. Cai in Chapter 6. Their approach to image analysis is based on an interesting analogy to classical mechanics. The intensity of each pixel of an image is modelled by a transverse motion of a "pixtron". The energy for Bayesian tomographic reconstruction is interpreted as the total kinetic energy of the collection of pixtrons, and the log-likelihood is interpreted as potential energy restricting the motion of pixtrons. Finally, the use of the minimization of a log-posterior is analogous to the principle of least action of classical mechanics. The analogy allows them to show that a Gaussian Markov random field prior can be viewed as the kinetic energy of free motion of pixtrons. The analogy leads to a novel image prior for Bayesian tomographic reconstruction based on level-set evolution of an image driven by the mean curvature motion. Their methods are accompanied by applications to brain slice images which demonstrate algorithms produced by the model.

Chui and Stöckler give an extensive description of recent developments of spline wavelets and frames in Chapter 7. Splines have many of the natural features required in the original design of I. Daubechies for wavelets, which result in beautiful formulas. Vanishing moments reflect smoothness. Design of wavelet frames with vanishing moments requires a series of new ideas. The authors explain why early design approaches fail to create wavelets with higher orders of vanishing moments and then provide steps to recover vanishing moments. The method involves the notion of vanishing moment recovery functions. The theory is extended in the direction of tight spline-wavelet frames with arbitrary knot sequences that allow stacked knots. Knot stacking provides local increase in smoothness and can be applied at the boundaries of bounded intervals and half-line segments. This gives greater flexibility, overcoming standard rigid design features of classical wavelets in which supports are closely tied to the dilation factor of wavelet families. Multi-wavelets represent a special case of this more general construction.
Chapter 8, "Affine, Quasi-affine and Co-affine Wavelets", by the Washington University group of researchers, is devoted to fully understanding results of Ron and Shen. Dilation and translation are the two characteristic operators used to define the wavelet pyramid. The question studied asks whether the order in which dilation and translation are applied is important. A subset of the affine group, used in the wavelet definition, is the set of translations followed by dilations. A second subset of the affine group is the set for which dilation is applied first, followed by translation. The effects are dramatically different. Ron and Shen found that reversing the order of these operators at a 'half-way' point in the wavelet pyramid results in a different set of functions, and yet these are sufficient to solve the representation problem. This chapter is devoted to understanding this phenomenon, and it is discovered that the choice of Ron and Shen is essentially optimal.

Benichou and Saito search for relations between the related criteria of sparsity and statistical independence in Chapter 9. Two studies motivate them. Olshausen and Field pioneered an approach to imaging which investigates the representation of natural images with an emphasis on sparsity, using a large library of photographs of natural images and computer experiments to derive a set of basis elements for efficient representation. Bell and Sejnowski conducted similar studies in which statistical independence was the major criterion. The pair of studies suggests that both the basis derived for sparse representation and the basis derived under the independence criterion produce elements efficient for the capture of edges, orientation and location, all features prominently studied by image researchers. Their study is based on a modest goal
that begins with an artificial stochastic process, the spike process, from which they obtain theorems which give precise conditions for the sparsity and statistical independence criteria to select the same basis for the spike process.

S. Akkarakaran and P.P. Vaidyanathan provide a new direction from previous work in Chapter 10. Standard filter banks fall under the theory and design of uniform filter banks. A nonuniform filter bank is one whose channel decimation rates need not all be equal. Most nonuniform filter bank designs result in approximate or near-perfect reconstruction, which leaves open theoretical issues for nonuniform filter banks. Their study is restricted to filter banks with integer decimation rates. A set, S, of integers satisfies maximal decimation if the reciprocals of the integers sum to unity. Their study searches for necessary and sufficient conditions on S for the existence of a perfect-reconstruction filter bank belonging to some class which uses S as its set of decimators. They present examples with conditions which are either sufficient or necessary but unfortunately different. They focus on rational filter banks and strengthen known necessary conditions, providing an important step toward solving the problem. However, the basic problem remains unresolved: necessary and sufficient conditions remain unknown. Thus they open an important problem and provide insight toward solving it.

This volume is a product which was conceived during a conference funded by the National Science Foundation and the Conference Board of Mathematical Sciences, at which David Donoho was the principal speaker, in May of 2000 at the University of Missouri - St. Louis. The title "Beyond Wavelets" is due to David Donoho. I thank the NSF and the University of Missouri - St. Louis and the support staff of the Mathematics Department there. A very special thanks is extended to David Donoho for his continued support and understanding. Many contributed to the success of that conference and to the original idea to develop "Beyond Wavelets". I give thanks to Charles Chui, Raphy Coifman, Ingrid Daubechies, Joachim Stöckler and Shiying Zhao. I thank the contributors to the volume both for their efforts and understanding. I take responsibility for the delays encountered and beg your forgiveness. Many more deserve to be mentioned, to whom I extend my thanks anonymously.

Grant Welland
St. Louis, MO
February, 2003.
CONTENTS

Preface

1 Digital Ridgelet Transform based on True Ridge Functions
D.L. Donoho and A.G. Flesia
1.1 Introduction
1.1.1 Ridgelets on the Continuum
1.1.2 Discretization of Ridgelets
1.2 Digital Ridgelets
1.3 Relation to Fast Slant Stack
1.4 Structural Analogies
1.4.1 Two Continuum Radon Transforms
1.4.2 Analogies between Polar and Pseudopolar Fourier Domains
1.4.3 Analogies between Radon Isometries
1.4.4 Analogies between Ortho-Ridgelet Analyses
1.4.5 Analogies Between Frequency-Domain Tilings
1.5 Example: HalfDome
1.6 Sparsity of the Frame Kernel
1.6.1 Analysis of a Coarse-scale ridgelet
1.6.2 Remarks on Decay
1.6.3 Edge Effects
1.7 Comparisons
1.7.1 Comparison with Zp-ridgelets
1.7.2 Comparison with earlier ridgelets
1.8 Discussion
References

2 Digital Implementation of Ridgelet Packets
A.G. Flesia, H. Hel-Or, A. Averbuch, E.J. Candes, R.R. Coifman and D.L. Donoho
2.1 Introduction
2.2 Fourier Preliminaries
2.3 Radon Preliminaries
2.4 The Ridgelet Construction, and its Properties
2.5 Ridgelet Packet Construction
2.5.1 General Procedure
2.5.2 Bases of Ridgelet Packets
2.5.3 Radon Approach: Wavelets in both Ridge and Angular Directions
2.5.4 Radon Approach: Wavelet Packets in the Ridge Direction
2.5.5 Polar Fourier Approach: Wavelet ⊗ Cosine Packet
2.6 Implementation on Digital Data
2.6.1 Fast Slant Stack
2.6.2 Pseudopolar FFT
2.6.3 Digital Radon Domain
2.6.4 Strategy for Digital Implementation
2.6.5 Digital Ridgelet Packets
2.6.6 Digital Implementation
2.6.7 Examples of Digital Implementation
2.6.8 Synthesis from Tiles
2.6.9 Analysis
2.7 Adaptation
2.7.1 Background on Best Basis
2.7.2 Application to Ridgelet Packets
2.8 Discussion
2.8.1 Improvements in the Digital Implementation
2.8.2 Limitations on the Ridgelet Packet Scheme
References

3 Brushlets: Steerable Wavelet Packets
Francois G. Meyer and Ronald R. Coifman
3.1 Introduction
3.2 Biorthogonal windowed Fourier bases
3.2.1 Implementation by folding
3.3 Choice of the bell function
3.3.1 The orthonormal bell of Wickerhauser
3.3.2 Optimized bell of Matviyenko
3.3.3 Modulated Lapped Biorthogonal Transform (MLBT)
3.4 Biorthogonal brushlet bases
3.4.1 One dimensional case
3.4.2 Discrete implementation of the brushlet expansion
3.4.3 Two-dimensional case
3.5 Conclusion
References

4 Contourlets
M. N. Do and M. Vetterli
4.1 Introduction and Motivation
4.2 Representing 2-D Piecewise Smooth Functions
4.2.1 Curvelet construction
4.2.2 Non-linear approximation behaviors
4.2.3 A filter bank approach for sparse image expansions
4.3 Pyramidal Directional Filter Bank
4.3.1 Multiscale decomposition
4.3.2 Directional decomposition
4.3.3 Multiscale and directional decomposition
4.3.4 PDFB for curvelets
4.4 Multiresolution Analysis
4.4.1 Multiscale
4.4.2 Multiple Directions
4.4.3 Multiscale and multidirection
4.5 Numerical Experiments
4.6 Conclusion
References

5 ENO-wavelet Transforms and Some Applications
Tony F. Chan and Hao-Min Zhou
5.1 Introduction
5.2 The ENO-Wavelet Algorithm
5.2.1 ENO-wavelet at Discontinuities
5.2.2 Locating the Discontinuities
5.2.3 A Simple Example
5.3 Theory: Error Bound and Stability
5.4 Application
5.4.1 Function Approximation
5.4.2 Image Compression
5.4.3 Signal Denoising
References

6 A Mechanical Image Model for Bayesian Tomographic Reconstruction
Shiying Zhao and Haiyan Cai
6.1 Introduction and Background
6.1.1 Introduction
6.1.2 Positron Emission Tomography
6.1.3 Bayesian Tomographic Reconstruction Method
6.2 Materials and Methods
6.2.1 A Mechanical Image Model
6.2.2 Kinetic Energy Induced from Level-Set Evolution
6.2.3 Numerical Implementations
6.3 Results and Discussion
6.3.1 Simulation Results
6.3.2 Discussion
6.3.3 Conclusion
References

7 Recent Development of Spline Wavelet Frames with Compact Support
Charles Chui and Joachim Stöckler
7.1 Introduction
7.2 Characterization of Wavelet Spline Frames
7.2.1 Tight frames with dilation factor 2
7.2.2 Non-tight sibling frames with dilation factor 2
7.2.3 Frames with integer dilation factor
7.3 Wavelet Frames of Splines with Multiple Knots
7.4 The Common Link: Approximate Duals
7.4.1 Background on univariate B-splines
7.4.2 A particular polynomial
7.4.3 Explicit form of an approximate dual
7.5 Tight Spline Frames with Non-uniform Knots
7.5.1 Piecewise linear tight frames
7.5.2 Piecewise cubic tight frames with equidistant simple knots
7.5.3 Tight frames of cubic splines with equidistant knots of multiplicity 2
References

8 Affine, Quasi-Affine and Co-Affine Wavelets
Philip Gressman, Demetrio Labate, Guido Weiss and Edward N. Wilson
8.1 Introduction
8.2 Frames and the three systems $X(\psi)$, $X^*(\psi)$, and $\tilde{X}(\psi)$
References

9 Sparsity vs. Statistical Independence in Adaptive Signal Representations: A Case Study of the Spike Process
Bertrand Benichou and Naoki Saito
9.1 Introduction
9.2 Notation and Terminology
9.3 Sparsity vs. Statistical Independence
9.3.1 Sparsity
9.3.2 Statistical Independence
9.4 Two-Dimensional Counterexample
9.5 The Spike Process
9.5.1 The Karhunen-Loeve Basis
9.5.2 The Best Sparsifying Basis
9.5.3 Statistical Dependence and Entropy of the Spike Process
9.5.4 The LSDB among O(n)
9.5.5 The LSDB among GL(n,R)
9.6 Proofs of Propositions and Theorems
9.6.1 Proof of Proposition
9.6.2 Proof of Theorem 9.5.1
9.6.3 Coordinate-wise Entropy of the Spike Process
9.6.4 Proof of Theorem 9.5.3
9.6.5 Proof of Theorem 9.5.4
9.6.6 Proof of Proposition 9.5.2
9.6.7 Proof of Corollary 9.5.5
9.7 Discussion
9.8 Appendices
9.8.1 Appendix A: Proof of Lemma 9.6.1
9.8.2 Appendix B: Proof of Lemma 9.6.3
9.8.3 Appendix C: Proof of Lemma 9.6.5
References

10 Nonuniform Filter Banks: New Results and Open Problems
Sony Akkarakaran and P.P. Vaidyanathan
10.1 Introduction
10.1.1 Relevant earlier work
10.1.2 Outline
10.1.3 Notations, definitions and assumptions
10.2 Background: Equivalent Uniform FBs; PR Equations
10.2.1 PR for uniform FBs, and the nonuniform to uniform transform
10.2.2 The general PR conditions for nonuniform FBs
10.2.3 Relation between the nonuniform and uniform PR designs
10.3 Problem Statement, and Unconstrained FBs
10.3.1 Problem statement
10.3.2 FBs with unconstrained complex and real coefficient filters
10.4 Tree Structures
10.4.1 Basics and terminology
10.4.2 Uniform-trees: An incomplete PR theory for nonuniform FBs
10.4.3 Using trees to improve PR conditions on the decimators
10.5 Delay-chains
10.5.1 PR condition on the set of decimators
10.5.2 Testing the PR condition
10.5.3 Delay-chains vs. uniform-trees
10.6 The Class of Rational FBs
10.6.1 Previously known necessary conditions on decimators
10.6.2 The pairwise gcd test
10.6.3 Tree version of strong compatibility
10.6.4 The AC-matrix test
10.7 Conditions Based on Reductions to Tree Structures
10.8 Summary and Comparison of Necessary Conditions
10.9 Concluding Remarks
10.10 Appendices
10.10.1 Appendix A: Proof of Nonuniform Biorthogonality Condition (2.3)
10.10.2 Appendix B: Derivability of Decimator-sets from a Uniform-tree
10.10.3 Appendix C: Proof of Fact 3
10.10.4 Appendix D: Proof of Fact 4
10.10.5 Appendix E: Testing Tree Version of Strong Compatibility
10.10.6 Appendix F: Algorithm for the AC Matrix Test
10.10.7 Appendix G: Proofs of Theorems 6, 7
10.10.8 Appendix H: Proof of Theorem 8
References

Index
Beyond Wavelets, G. V. Welland (Editor). © 2003 Elsevier Science (USA). All rights reserved.
DIGITAL RIDGELET TRANSFORM BASED ON TRUE RIDGE FUNCTIONS D.L. DONOHO AND A.G. FLESIA Department of Statistics, Stanford University Sequoia Hall, 390 Serra Mall, Stanford, CA 94305-4065
Abstract

We study a notion of ridgelet transform for arrays of digital data in which the analysis operator uses true ridge functions, as does the synthesis operator. There are fast algorithms for analysis, for synthesis, and for partial reconstruction. Associated with this is a transform which is a digital analog of the orthonormal ridgelet transform (but not orthonormal for finite $n$). In either approach, we get an overcomplete frame; the result of ridgelet transforming an $n \times n$ array is a $2n \times 2n$ array. The analysis operator is invertible on its range; the appropriately preconditioned operator has a tightly controlled spread of singular values. There is a near-Parseval relationship. Our construction exploits the recent development by Averbuch et al. (2001) of the Fast Slant Stack, a Radon transform for digital image data; it may be viewed as following a Fast Slant Stack with a fast 2-d wavelet transform. A consequence of this construction is that it offers discrete objects (discrete ridgelets, discrete Radon transform, discrete Pseudopolar Fourier domain) which obey inter-relationships paralleling those in the continuum ridgelet theory (between ridgelets, Radon transform, and polar Fourier domain). We make comparisons with other notions of ridgelet transform, and we investigate what we view as the key issue: the summability of the kernel underlying the constructed frame. The sparsity observed in our current implementation is not nearly as good as the sparsity of the underlying continuum theory, so there is room for substantial progress in future implementations.
1.1 INTRODUCTION

1.1.1 Ridgelets on the Continuum
Recently, several theoretical papers have called attention to the potential benefits of analyzing continuum objects $f(x, y)$ with $(x, y) \in \mathbf{R}^2$ using new bases/frames called ridgelets [3], [4] and [12]. A ridge function is a function of the form $\rho(x, y) = r(ax + by)$; that is to say, it is a function of two variables which is obtained as a scalar function $r(t)$ of a synthetic scalar variable $t = ax + by$ [20]. Geometrically, the level sets of such a function are lines $ax + by = t$, and so the graph of such a function, viewed as a topographic surface, exhibits ridges. The function $r(t)$ is the profile of the ridge function as one traverses the ridge orthogonally to its level sets.

In Candes' thesis [3], a ridgelet is a function $\rho_{a,b,\theta}(x, y) = \psi((\cos(\theta) x + \sin(\theta) y - b)/a)/a^{1/2}$, where $\psi(t)$ is a wavelet, that is, an oscillatory function obeying certain moment conditions and smoothness conditions. The continuous ridgelet transform $R_f(a, b, \theta) = \langle f, \rho_{a,b,\theta} \rangle$ is defined on functions $f$ in $L^1$ and extends by density to $L^2$. This transform obeys a Parseval relation and an exact reconstruction formula. Candes also showed that discrete decompositions were possible, so that for $L^2$ spaces of compactly supported functions one could develop a frame of ridgelets, a discrete family $(\rho_{a_n, b_n, \theta_n}(x))$ serving the role of an approximating system.

The "classic ridgelets" of Candes are not in $L^2(\mathbf{R}^2)$, being constant on lines $t = x_1 \cos\theta + x_2 \sin\theta$ in the plane. This fact seems responsible for certain technical difficulties in the deployment and interpretation of discrete systems based on Candes' notion of ridgelet. In [12] Donoho proposed to broaden the concept of ridgelet somewhat, allowing "wide-sense" ridgelets to be functions obeying certain localization properties in a radial frequency $\times$ angular frequency domain. Under this broader conception, ridgelets are no longer of the strict ridge form $\rho_{a,b,\theta}(x)$, so the elegant simplicity of formulation is lost. However, in exchange, it becomes possible to have an orthonormal set of "wide-sense" ridgelets. These orthonormal ridgelets are believed to be appropriate $L^2$-substitutes for ridge functions, and to fulfill the goal of a constructive and stable system which, although not based on true ridge functions, is believed to play operationally the same role as ridge functions; compare [12, 13].

For either classic ridgelets or orthonormal ridgelets, the central issue is that such systems should behave very well at representing functions with linear singularities. As a prototype, consider the mutilated Gaussian:

$$g(x_1, x_2) = 1_{\{x_2 > 0\}} \, e^{-x_1^2 - x_2^2}, \qquad x \in \mathbf{R}^2. \tag{1.1.1}$$
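As a small numerical illustration of the two objects just defined, the following Python sketch builds a ridge function from a profile and evaluates the mutilated Gaussian of (1.1.1). The helper names `ridge` and `mutilated_gaussian`, the cosine profile and the angle are illustrative assumptions, not part of the original construction.

```python
import math

def ridge(profile, a, b):
    """Return the ridge function (x, y) -> profile(a*x + b*y)."""
    return lambda x, y: profile(a * x + b * y)

def mutilated_gaussian(x1, x2):
    """g(x1, x2) = 1_{x2 > 0} * exp(-x1^2 - x2^2), cf. (1.1.1)."""
    return math.exp(-x1 * x1 - x2 * x2) if x2 > 0 else 0.0

theta = math.pi / 6
# Assumed profile r(t) = cos(t); any scalar function would do.
rho = ridge(math.cos, math.cos(theta), math.sin(theta))

# Stepping along (-sin(theta), cos(theta)) keeps a*x + b*y fixed,
# so both points lie on the same level line of rho.
p = (0.3, -0.2)
q = (p[0] - 2.0 * math.sin(theta), p[1] + 2.0 * math.cos(theta))
same_level = abs(rho(*p) - rho(*q)) < 1e-12

# g jumps across the line x2 = 0 and is smooth elsewhere.
jump_at_origin = mutilated_gaussian(0.0, 1e-9) - mutilated_gaussian(0.0, -1e-9)
```

Here `same_level` checks that the ridge function is constant along a level line, and `jump_at_origin` measures the size of the discontinuity of $g$ across $x_2 = 0$ at the origin.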
See Figure 1.1. The function $g$ is discontinuous along the line $x_2 = 0$ and smooth away from that line. Due to the singularity along the line, this function has coefficients of relatively slow decay in both wavelet and Fourier domains, so it requires large numbers of wavelets or sinusoids to represent accurately. The rate of convergence of best $N$-term superpositions of wavelets or sinusoids cannot be faster than $O(N^{-1})$. On the other hand, $g$ can be represented by relatively few ridgelets: the rate of convergence of appropriate $N$-term superpositions of ridgelets or ortho-ridgelets can be faster than $O(N^{-m})$ for any $m > 0$. And the situation is the same for any rotation or translation of $g$, so that the line $x_2 = 0$ becomes a line $\cos(\theta) x + \sin(\theta) y = t$.

Figure 1.1. 'Half Dome': a Mutilated Gaussian

While perfectly straight singularities are rare, many two-dimensional objects concern imagery with edges, which may be regarded as curved singularities. While ridgelets per se do not provide the right tool for such curved singularities, Candes and Donoho have used ridgelets to construct a system called curvelets which gives high-quality asymptotic approximations to such singularities. Curvelets are ridgelets that have been dilated and translated and subjected to a special space/frequency localization explained in [6]. The rate of convergence of appropriate $N$-term superpositions of curvelets is nearly $O(N^{-2})$ in squared error, whereas the comparable behavior for classical systems would be $O(N^{-1})$ or worse.

1.1.2 Discretization of Ridgelets
The conceptual attractiveness of this theoretical work drives us to consider the problem of translating it (if possible) from continuum concepts, useful in theoretical discussion, to algorithmic concepts capable of widespread application. It is initially by no means obvious how to do this or whether it can really be done. The theory of ridgelets is closely related to the theories of Radon transformation, and of rotation and scaling of images, all of which seem natural and simple on the continuum, and for which it is widely believed that there is no simple, inevitable definition for digital data. A number of prior attempts at defining a digital ridgelet transform have been made; these will be discussed in detail further below.

In this paper, we propose a definition of digital ridgelet transform with several desirable properties. We believe that this definition is based on a clear understanding of the fundamental opportunities and limitations posed by data on a Cartesian grid, and has clear superiority over some other notions of discrete ridgelet transform which are, in our view, false starts. Our definition offers:

Analysis and synthesis by true ridge functions. The underlying analysis and synthesis functions depend on $(u, v)$ as $\rho(u + bv)$ or $\rho(v + bu)$. This means that the transform is geometrically faithful, and avoids wrap-around artifacts.

Exact reconstruction formula. There is an iterative algorithm which in the limit gives exact reconstruction from the ridgelet transform.

Near-Parseval relationship. There is a variant of the DRT, which we call the (pseudo-) Ortho-Ridgelet Transform, in which the energy in coefficient space is equal to the energy in original space, to within a few percent.

Fast algorithm. There is a fast algorithm requiring only $O(N \log(N))$ flops for data sampled on an $n$ by $n$ grid, where $N = n^2$ is the total number of data.

Continuum analogies. The transform and related objects have structural relationships bearing a strong analogy with all the principal relationships that exist in the continuum case, between ridgelet transform, Radon transform, and Polar Fourier transform.

Cartesian data structures. The transform takes data on a Cartesian grid and creates a rectangular coefficient array indexed according to a semi-direct product of simple integer indices measuring scale, location, and orientation.

Overcompleteness. The transform takes an $n$-by-$n$ array and expands it by a factor of 4 in creating the coefficient array.

We also compare properties of this DRT with its continuum counterpart, and with other discrete counterparts, particularly as regards sparse representation of objects with discontinuities along lines. We point out certain conceptual and practical advantages of the new transform over, for example, the $Z_p$ transform proposed by Do and Vetterli [8], and certain advantages over straightforward discretizations of the Fourier plane proposed by Donoho [9] and Starck et al. [22].
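The first and last properties in the list above can be made concrete with a short Python sketch. It samples a true ridge function $\rho(u + bv)$ on an $n$-by-$n$ Cartesian grid and counts coefficients for the factor-of-4 expansion. The function name `digital_ridge`, the `tanh` profile and the slope $b = 1/2$ are assumptions chosen for illustration; this is not the transform itself.

```python
import math

def digital_ridge(n, b, profile, swap=False):
    """Sample profile(u + b*v), or profile(v + b*u) if swap, on -n/2 <= u, v < n/2."""
    coords = range(-(n // 2), n // 2)
    if swap:
        return {(u, v): profile(v + b * u) for u in coords for v in coords}
    return {(u, v): profile(u + b * v) for u in coords for v in coords}

n = 8
b = 0.5  # assumed slope p/q with p = 1, q = 2
samples = digital_ridge(n, b, math.tanh)

# u + b*v is unchanged by the lattice step (u, v) -> (u - 1, v + 2),
# so samples agree wherever both points fall inside the grid (no wrap-around).
constant_on_lines = all(
    samples[(u, v)] == samples[(u - 1, v + 2)]
    for (u, v) in samples
    if (u - 1, v + 2) in samples
)

# Overcompleteness: an n-by-n input yields a 2n-by-2n coefficient array.
n_data = n * n
n_coeffs = (2 * n) * (2 * n)
expansion = n_coeffs // n_data
```

The check `constant_on_lines` only compares grid points that both lie inside the square, which mirrors the point that a true ridge function is not wrapped around the edges of the array.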
Our current implementation provides a frame whose kernel does not have, in our view, sufficient sparsity to provide in the digital setting all the quantitative advantages offered by the continuum theory, leaving ample room for further improvements.

1.2 DIGITAL RIDGELETS

Let $\psi_{j,k}(t) = \psi_{j,k}(t; m)$ be the periodic discrete Meyer wavelet for the $m$-point discrete circle $-m/2 \le t < m/2$ with indices $j_0 \le j < \log_2(m)$ and $0 \le k < 2^j$; this is studied in, for example, Kolaczyk's thesis [18]. This is actually defined as the discrete inverse Fourier transform
\[ \psi_{j,k}(t) = \sum_{w=-m/2}^{m/2-1} c_w^{j,k} \exp((i 2\pi/m)\, t w) \]
of a certain complex sequence $(c_w^{j,k})$ which can be derived, e.g. using arguments in [1]. Since the formula makes sense for all $t$ and not only for integers in the range
$-m/2 \le t < m/2$, the periodic discrete Meyer wavelet is unambiguously defined not just at integral $t$, but in fact for all real $t$. Figure 1.2 displays a Meyer wavelet of degree 2. We will also have use for fractionally-differentiated Meyer wavelets, defined as follows. For a certain sequence $(\delta_w)$, with
\[ \delta_w = \sqrt{2|w|/m}, \qquad w \neq 0, \]
and a separate value assigned at $w = 0$, we apply this as a multiplier to the Fourier coefficients of $\psi_{j,k}$, getting
\[ \tilde\psi_{j,k}(t) = \sum_{w=-m/2}^{m/2-1} \delta_w\, c_w^{j,k} \exp((i 2\pi/m)\, t w). \]
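A minimal numerical sketch of the multiplier construction, in Python with NumPy. The coefficient sequence `c` is a random stand-in for $(c_w^{j,k})$, and `DELTA_AT_ZERO` is a hypothetical placeholder for the value of the multiplier at $w = 0$; neither is taken from the text. The sketch evaluates the trigonometric polynomial at arbitrary real $t$, and checks at integer $t$ that multiplying the Fourier coefficients by $\delta_w$ agrees with an $m$-point circular convolution against the inverse DFT of $(\delta_w)$, normalized here by $1/m$.

```python
import numpy as np

def eval_trig_poly(coeffs, t):
    """Evaluate sum_{w=-m/2}^{m/2-1} coeffs[w] * exp(i*2*pi*t*w/m) at real t.

    coeffs is ordered by frequency w = -m/2, ..., m/2 - 1 (m even).
    """
    m = len(coeffs)
    w = np.arange(-(m // 2), m // 2)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(2j * np.pi * np.outer(t, w) / m) @ coeffs

m = 16
rng = np.random.default_rng(0)
w = np.arange(-(m // 2), m // 2)

# Stand-in for the Meyer coefficient sequence (c_w^{j,k}); NOT the real one.
c = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# Fractional-differentiation multiplier; the w = 0 value is a placeholder.
DELTA_AT_ZERO = 1.0  # hypothetical
delta = np.sqrt(2.0 * np.abs(w) / m)
delta[w == 0] = DELTA_AT_ZERO

t_int = np.arange(-(m // 2), m // 2)          # the m-point discrete circle
psi = eval_trig_poly(c, t_int)                # wavelet samples
psi_tilde = eval_trig_poly(delta * c, t_int)  # multiplier applied in frequency

# Same result via m-point circular convolution with Delta = inverse DFT of delta.
Delta = eval_trig_poly(delta, t_int) / m
idx = (t_int[:, None] - t_int[None, :] + m // 2) % m   # index of (t - s) mod m
conv = (Delta[None, :] * psi[idx]).sum(axis=1)

# The same formula also evaluates the wavelet between grid points.
psi_between = eval_trig_poly(delta * c, [0.25, 3.5])
```

Evaluating the sum directly costs $O(m)$ per point; at integer $t$ an FFT would give all $m$ samples at once, but the direct form makes the "defined for all real $t$" property explicit.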
(Equivalently, we could define $\tilde\psi_{j,k} = \Delta \star \psi_{j,k}$, where $\star$ denotes $m$-point circular convolution and $\Delta$ is the inverse discrete Fourier transform of $(\delta_w)$.) This is equally well viewed as a trigonometric polynomial defined at all $t$. Figure 1.2 displays a fractionally-differentiated Meyer wavelet. For reasons that will be clear later, we also call the $\tilde\psi_{j,k}$ normalized wavelets.

Figure 1.2. Left side: Meyer wavelet of degree 2. Right side: fractionally differentiated Meyer wavelet of degree 2.

In this paper we consider images as $n$ by $n$ arrays indexed by coordinates $(u, v)$ ranging in the square $-n/2 \le u, v < n/2$ centered at $(0,0)$. Let $\theta_\ell^1$ be defined so that $\tan(\theta_\ell^1) = 2\ell/n$,
-n/2 ,], (4.3) this says that an orthonormal ridgelet is isometric to a wavelet in Radon space which has been antipodally symmetrized.
THE RIDGELET CONSTRUCTION, AND ITS PROPERTIES
The ortho-ridgelet construction may be viewed as transferring a basis from Radon space to real space via an isometry $R$. If we reflect on the details of the above construction, we notice that the basis we used was not completely arbitrary: it had to consist of elements both in the domain of $R$ and the range of $R$. Now the range and domain both consist of functions in $A$, and the easiest elements in both range and domain to describe are functions which are bandpass in $t$, i.e. functions with support in the frequency domain contained in a compact set separated from the origin. These ideas imposed the following restrictions on the construction.

The basis on Radon space was a basis for $A$ rather than $L^2(dt\,d\theta)$. This meant that its elements had to obey an antipodal symmetry requirement, or equivalently that an element $W$ of the basis had to obey the invariance $P_A W = W$.

In order to construct such a basis, we started with an orthonormal basis for $L^2(dt\,d\theta)$ and operated on it by $P_A$, creating a tight frame with antipodal symmetry. But as it turned out, the tight frame was actually an orthobasis, owing to two special closure properties of the families we used: closure under reflection about the origin in the ridge direction,
\[ \psi_{j,k}(-t) = \psi_{j,1-k}(t), \tag{4.4} \]
and closure under translation by half a cycle in the angular direction,
\[ w^{\varepsilon}_{i,\ell}(\theta + \pi) = w^{\varepsilon}_{i,\ell + 2^{i-1}}(\theta). \tag{4.5} \]
The closure property (4.4) would not hold for certain other prominent wavelet families, such as Daubechies' compactly supported wavelets. The significance of the closure properties was that for certain pairs $(W_\lambda, W_{\lambda'})$, $P_A W_\lambda = P_A W_{\lambda'}$, so that the induced frame consisted of many identical pairs. Systematically removing one element from each such pair, and rescaling the other element, we obtained an orthonormal basis.

An 'absence of low-frequencies' restriction was imposed: the basis in the ridge direction consisted entirely of bandpass elements, i.e. elements with frequency-domain support in an octave band disjoint from the origin.

If we weakened these conditions, the following would still be true. We can always start from an orthobasis for $L^2(R)$ and apply the projector $P_A$, getting a tight frame for $A$. We can then apply the isometry to this, getting a tight frame for real space. If, in addition, the original basis obeyed appropriate closure under ridge reflection and angular translation, the tight frame in real space can be decimated by a factor of two to form an orthobasis.

The condition that the low-frequency terms be absent from all basis elements is simply a regularity condition on the outcome of the procedure. If certain elements in the basis have support near the origin in frequency space, then the construction can still take place; however, some of the corresponding frame elements will have poor decay.
IMPLEMENTATION OF RIDGELET PACKETS
In other words, the construction is quite general, but it might lead to a redundant set with redundancy two and it might lead to a basis where certain elements do not exhibit good spatial decay.

2.5 RIDGELET PACKET CONSTRUCTION

We now propose a class of tight frames based on the remarks just given. In certain cases, these can be subsampled to form orthobases.

2.5.1 General Procedure

We begin with a general set of ingredients:

An orthonormal basis $(U_\mu(t))$ for $L^2(\mathbb{R})$ for the ridge direction. If the elements are bandlimited, we call this a bandlimited basis. If the elements obey a closure condition under reflection of the kind (4.4), the basis will be called a basis closed under reflection.

A basis $(V_\nu(\theta))$ for $L^2[0, 2\pi)$ in the angular direction. If the elements obey a closure condition under translation of the kind (4.5), the basis will be called a basis closed under translation.

A collection of antipodally-symmetric functions in $A$ will be constructed from the two families of bases. Letting $\lambda = (\mu, \nu)$ group the indices in each of the variables,
\[ W_\lambda(t, \theta) = P_A[U_\mu \otimes V_\nu]; \tag{5.1} \]
as the result of applying an orthogonal projector to the orthonormal basis, the $W_\lambda$ make a tight frame for $A$.

A collection of functions $\rho_\lambda$ will be induced by the isometry $R$:
\[ \rho_\lambda = R(W_\lambda) \qquad \forall \lambda. \]
As an isometry of a tight frame, the $\rho_\lambda$ make a tight frame for their span. In fact their span is all of $L^2(\mathbb{R}^2)$.

We note the following. First, there is a simple expression for the element $\rho_\lambda$:
\[ \hat\rho_\lambda(\xi) = |\xi|^{-1/2} \big( \hat U_\mu(|\xi|)\, V_\nu(\theta) + \hat U_\mu(-|\xi|)\, V_\nu(\theta + \pi) \big)/2, \]
valid for $\xi = (|\xi| \cos(\theta), |\xi| \sin(\theta))$. Second, if the elements $U_\mu$ are bandpass, with $C^\infty$ Fourier transforms, and if the elements $V_\nu$ are $C^\infty$, then the elements $\rho_\lambda$ are likewise bandpass with smooth Fourier transforms; it follows that they are $C^\infty$ with rapid spatial decay.
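The step "projector applied to an orthonormal basis gives a tight frame" can be checked in a finite-dimensional sketch. The discrete grid, the random orthonormal basis `Q`, and the function name `antipodal_symmetrize` are illustrative assumptions, not the paper's implementation. Antipodal symmetrization is modelled as $(P_A F)(t, \theta) = (F(t, \theta) + F(-t, \theta + \pi))/2$ on a periodic grid, which is the form consistent with the expression for $\hat\rho_\lambda$ above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_theta = 8, 8            # grid sizes; theta has period n_theta (even)
dim = n_t * n_theta

def antipodal_symmetrize(F):
    """Discrete model of P_A: average F(t, theta) with F(-t, theta + pi).

    Rows index t on a periodic grid (row r <-> -t is row -r mod n_t);
    columns index theta, and a shift by pi is half the angular period.
    """
    flipped = np.roll(F[::-1, :], 1, axis=0)               # t -> -t
    shifted = np.roll(flipped, -(n_theta // 2), axis=1)    # theta -> theta + pi
    return 0.5 * (F + shifted)

# A random orthonormal basis of the grid functions (columns of Q); a stand-in
# for the tensor-product basis U_mu (x) V_nu.
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

# Frame vectors W_k = P_A q_k.
frame = np.stack(
    [antipodal_symmetrize(Q[:, k].reshape(n_t, n_theta)).ravel() for k in range(dim)],
    axis=1,
)

# Any antipodally symmetric f satisfies sum_k <f, W_k>^2 = ||f||^2 (tight frame).
F0 = rng.standard_normal((n_t, n_theta))
f = antipodal_symmetrize(F0).ravel()
coeffs = frame.T @ f
energy_ratio = np.sum(coeffs ** 2) / np.sum(f ** 2)
```

The frame has `dim` vectors but spans only the symmetric subspace, which is the factor-two redundancy mentioned in the text.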
Third, the general procedure described above has been stated for tensor product bases $U_\mu \otimes V_\nu$. In general, there is no reason to restrict ourselves in this way. More generally, we may allow a semi-direct product
\[ W_\lambda = P_A[U_\mu \otimes V_{\nu|\mu}] \qquad \forall \lambda = (\mu, \nu), \tag{5.2} \]
where the basis $(V_{\nu|\mu})_\nu$ depends on $\mu$. The ortho-ridgelet basis defined in the Introduction in fact has this form, as can be seen from the constraint $i \ge j$. By and large, the freedom enabled by the rule (5.2) will only be exercised in a limited way, as exemplified by the way it is exercised in the ridgelet orthobasis; the coarsest scale of resolution may be adjusted to the properties of the corresponding ridge element.

2.5.2 Bases of Ridgelet Packets
While in principle any pair of bases may be used for the above construction, we are interested here in those bases deriving from applying certain principles of time-frequency localization [13, 33].

Definition 2.5.1 We call ridgelet packet basis a basis constructed by the above procedure, where the basis $U_\mu$ is chosen from a wavelet packets dictionary and the basis $V_\nu$ (or $V_{\nu|\mu}$ if rule (5.2) is used) is chosen from a wavelet packets dictionary or a cosine packets dictionary.

When the basis in the angular direction is chosen from the wavelet packets dictionary, we will sometimes speak of the Radon-domain approach to defining ridgelet packets, whereas when the basis in the angular direction is chosen from the cosine packets dictionary, we will speak of the polar Fourier-domain approach. This distinction reflects the structure of the underlying algorithms in the two situations, as we will discuss later. Admittedly, this is artificial to some extent, since the Radon and Fourier domains are related in 1-1 fashion, but we find the distinction helpful.

2.5.3 Radon Approach: Wavelets in both Ridge and Angular Directions
The orthonormal ridgelet basis is built using wavelets in both the ridge and angular directions. Other bases can be built within this framework, by simply varying the base resolution level of the angular wavelets as a function of the resolution of the ridge wavelets. In the ortho-ridgelet case, we start with ridge wavelets $\psi_{j,k}(t)$ for $j, k \in \mathbb{Z}$ and with angular wavelets $w^{\varepsilon}_{i,\ell}(\theta)$. The key decision is that we limit $i \ge j$, and we have $\varepsilon = 1$ for $i > j$, while $\varepsilon \in \{0, 1\}$ for $i = j$. To interpret these choices, focus on the situation where $i = j$ and $\varepsilon = 0$. Hence we are looking at a tensor product based on the male-gendered wavelet at scale $j$, $w^{0}_{j,\ell}(\theta) \cdot \psi_{j,k}(t)$. Note that for $\varepsilon = 0$, $w^{0}_{j,\ell}(\theta)$ is a "bump", integrating to $2^{-j/2}$; the tensor product is thus localized near $\theta = 2\pi\ell/2^j$, and has each constant-$\theta$, varying-$t$ profile proportional to the wavelet $\psi_{j,k}$.
This has an interpretation in terms of the tilings mentioned in the introduction. Indeed, the successive terms
\[ w^{0}_{j,\ell},\; w^{1}_{j,\ell},\; w^{1}_{j+1,2\ell},\; w^{1}_{j+1,2\ell+1},\; \dots \]
represent an orthogonal set of functions localized in the vicinity of the angular interval $[2\pi\ell/2^j, 2\pi(\ell+1)/2^j)$. Thinking now in the polar frequency domain, the orthogonal functions
\[ w^{0}_{j,\ell} \otimes \psi_{j,k},\; w^{1}_{j,\ell} \otimes \psi_{j,k},\; w^{1}_{j+1,2\ell} \otimes \psi_{j,k},\; w^{1}_{j+1,2\ell+1} \otimes \psi_{j,k},\; \dots \]
for $k \in \mathbb{Z}$ create an orthogonal set localized near the 'tile' $[2\pi\ell/2^j, 2\pi(\ell+1)/2^j) \times [2^j, 2^{j+1})$. In this way, the formula (1.1) for ridgelets implements the tiling shown in Figures 2.1 and 2.6.

This discussion suggests how we can derive a formula for basis elements which implement a quasi-FIO tiling. The idea is to use the same framework, only instead of taking the base resolution $i_0 = j + c$, we take $i_0 = j/2 + c$. The orthonormal functions
\[ w^{0}_{i_0,\ell} \otimes \psi_{j,k},\; w^{1}_{i_0,\ell} \otimes \psi_{j,k},\; w^{1}_{i_0+1,2\ell} \otimes \psi_{j,k},\; w^{1}_{i_0+1,2\ell+1} \otimes \psi_{j,k},\; \dots \]
form an orthogonal set of functions all localized near the 'tile' $[2\pi\ell/2^{i_0}, 2\pi(\ell+1)/2^{i_0}) \times [2^j, 2^{j+1})$. Hence, the angular subdivision is not nearly so fine, so that at frequency $2^j$ we have tiles of height $2^j$ and width $2\pi\, 2^{-j/2}$.

Many other possibilities could be considered. Perhaps the simplest is to pick the base angular resolution fixed, independent of $j$: $i_0 = 3$ (say). Then the corresponding functions form an orthonormal set, each one localized near a tile of fixed width $2\pi/8$ and height $2^j$.

All these constructions have the qualitative property that a given basis element generated from $w^{\varepsilon}_{i,\ell} \otimes \psi_{j,k}$ exhibits an orientation localized to directions near $\theta_{i,\ell} = 2\pi\ell/2^i$, and a scale normal to that direction of scale $2^{-j}$. To see this, note that the basis element is generated by $P_A[\psi_{j,k} \otimes w^{\varepsilon}_{i,\ell}]$, which can be written as
\[ \rho_\lambda(x) \propto \int \psi^{+}_{j,k}(x_1\cos\theta + x_2\sin\theta)\, w^{\varepsilon}_{i,\ell}(\theta)/2 \, d\theta + \int \psi^{+}_{j,k}(x_1\cos(\theta+\pi) + x_2\sin(\theta+\pi))\, w^{\varepsilon}_{i,\ell}(\theta+\pi)/2 \, d\theta, \]
where $\psi^{+}_{j,k} = \Delta^{+}\psi_{j,k}$. This shows that each basis element is an angular "average" of ridge functions $\psi^{+}_{j,k}(x_1\cos\theta + x_2\sin\theta)$ over $\theta$ in a $2^{-i}$ vicinity of $\theta_{i,\ell}$. The only "location-like" parameter here is $k$, which sets the position of the underlying ridge near $x_1\cos(\theta) + x_2\sin(\theta) = t_{j,k}$, where
$t_{j,k} = k/2^j$. It follows, in particular, that the system, while offering an orientation, a ridge, and a scale parameter, does not offer a traditional location parameter. In later sections, we will give illustrations of digital frame elements inspired by these constructions.

2.5.4 Radon Approach: Wavelet Packets in the Ridge Direction

By Wavelet Packets basis in the ridge direction, we mean the use of the principle of local cosine bases of Coifman and Meyer (1989) applied in the radial frequency variable $\lambda$. In our opinion, the best references for understanding this construction are the article of Auscher, Weiss and Wickerhauser [1] (in English) and the monograph of Yves Meyer (in French) [26]. We note that, to avoid confusion, our notation is nonstandard, since typically the term wavelet packets refers to bases constructed by applications of special filter banks, and the specific idea we discuss now cannot be implemented through finite-length filter banks.

One chooses a partition in the 1-dimensional frequency variable according to the general rules of symmetric recursive dyadic partitioning. One takes the initial sequence of breakpoints $\{2^j : j \ge 0\}$ and views this as referring to the partition
\[ \{ (-1, 0] \cup [0, 1),\; (-2, -1] \cup [1, 2),\; (-4, -2] \cup [2, 4),\; \dots \} \]
and one considers all partitions reachable from this one by repeatedly applying midpoint splits to a pair in the partition, producing a new pair. For example, we could split $[0, 1)$ into $[0, 1/2)$, $[1/2, 1)$ and also $(-1, 0]$ into $(-1/2, 0]$, $(-1, -1/2]$ and then replace $(-1, 0] \cup [0, 1)$ in the initial partition by the pair $(-1/2, 0] \cup [0, 1/2)$ and $(-1, -1/2] \cup [1/2, 1)$, producing a new partition of this sort. One may, if one likes, impose a balance condition on partitions, allowing only partitions in which adjacent intervals differ by a factor of two in length. Associated with any partition reachable in this way is an orthonormal basis, produced as follows.
To each interval $I$ in the partition we associate a window $w_I(\lambda)$ which is smooth and nonnegative, 1 near the center of the window, and vanishing outside a slight enlargement of the window. The squares of the windows together should form a partition of unity: $\sum_I w_I^2(\lambda) = 1$ for all $\lambda$. Then we define a collection of trigonometric functions $\phi_{I,k}(\lambda)$ associated with the window which make an orthonormal set for $L^2(I)$. If the interval does not abut 0, these functions are chosen from the DCT-IV system. If the window does abut 0, we view $I$ and $-I$ as a single interval and these functions are chosen from the DST-III system. The basis is then the collection
\[ \hat U_\mu(\lambda) = w_I(\lambda)\, \phi_{I,k}(\lambda), \]
where $\mu = (I, k)$ is an index pair unifying the indices $I$ and $k$. Some examples of this construction are quite familiar.

Meyer Wavelets. If we use breakpoints $\{2^j : -\infty < j < \infty\}$, we get a partition into intervals $I_j = (-2^{j+1}, -2^j] \cup [2^j, 2^{j+1})$. The basis element $U_\mu$ with index $\mu = (I_j, k)$ is then precisely an orthonormal Meyer wavelet $\psi_{j,k}$.
Wilson-like Basis. If we use breakpoints $\{1, 2, 3, \dots\}$, we get a partition into intervals
\[ I_j = (-j, -(j-1)] \cup [j-1, j), \tag{5.3} \]
and we obtain in this way elements familiar to those who understand [1] and who have studied the construction of the Wilson basis [14]. In effect, the basis elements are windowed sinusoids of frequency roughly $j$, exponentially localized near a position proportional to $k$ in the time domain.

Other examples of the construction may seem more exotic:

Intermediate Coherence Length. Suppose we use breakpoints
\[ \{1, 2, 4, 6, 8, 12, 16, 20, 24, 28, 32, \dots\}, \]
where in general the $2j$-th and $(2j+1)$-th initial intervals $[2^{2j}, 2^{2j+1})$ and $[2^{2j+1}, 2^{2j+2})$ are recursively subdivided $j$ times, yielding a family of $2^j$ subintervals each. Then we obtain a basis where the typical elements supported near high frequency $\omega$ have a frequency localized in a band of width about $\sqrt{\omega}$ and a time localization, according to the Heisenberg principle, to a correspondingly short interval of length about $1/\sqrt{\omega}$. This says that the time coherence of effects at frequency $\omega$ is not as short as in the wavelet system, where it is proportional to $1/\omega$, nor as long as in the Gabor system, where coherent effects last for about one unit of time.

Increasing Coherence Length. If we use breakpoints
\[ \{1, 2, 3, 4, 4\tfrac12, 5, 5\tfrac12, 6, 6\tfrac12, 7, 7\tfrac12, 8, 8\tfrac14, 8\tfrac12, 8\tfrac34, 9, \dots\}, \]
where in general the $j$-th initial dyadic interval $[2^j, 2^{j+1})$ is subdivided dyadically through $2j - 2$ complete generations, then we obtain a basis where the typical elements supported near high frequency $\omega$ have a frequency localized in a band of width about $1/\sqrt{\omega}$ and a time localization, according to the Heisenberg principle, to an interval of length about $\sqrt{\omega}$. This says that the time coherence of effects at frequency $\omega$ is not as short as in the Gabor system, where coherent effects last for about one unit of time, nor as long as in the Fourier system, where coherent effects last for infinite time.

With any of these choices, we can then subdivide the angular variable in a fashion subordinate to the ridge frequency variable, according to the same principle as in the ortho-ridgelet basis. Let $V_{\nu|\mu}$ be simply the periodized Meyer wavelet as in the ortho-ridgelet basis, under a low-frequency constraint to be determined below, and let $U_\mu$ be a wavelet packet basis based on a different partition than the dyadic wavelet partition. Consider for example the Wilson-like basis partition (5.3) based on integer breakpoints.
Choose the low-frequency constraint on $V_{\nu|\mu}$ so that $i \ge j$, i.e. so that the angular scale is finer than the ridge frequency. It results that for $j > 0$, the $\rho_\lambda$ are bandlimited and of rapid decay.
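The breakpoint sequences used in these examples can be generated mechanically from the dyadic breakpoints by equal midpoint splitting. The helper below is a hypothetical sketch (the name `dyadic_breakpoints` and its interface are assumptions); it takes the number of split generations as a function of the dyadic exponent $e$ of the interval $[2^e, 2^{e+1})$, and lists only the positive breakpoints, the negative ones following by symmetry.

```python
def dyadic_breakpoints(max_level, generations):
    """Positive breakpoints obtained by splitting each [2^e, 2^(e+1)), e < max_level,
    into 2**generations(e) equal pieces (that many rounds of midpoint splits)."""
    points = []
    for e in range(max_level):
        pieces = 2 ** generations(e)
        width = 2 ** e / pieces
        points.extend(2 ** e + i * width for i in range(pieces))
    points.append(2 ** max_level)
    return points

# No splitting: the dyadic (Meyer wavelet) partition.
meyer = dyadic_breakpoints(3, lambda e: 0)

# e generations on [2^e, 2^(e+1)): unit-width intervals (Wilson-like partition).
wilson = dyadic_breakpoints(3, lambda e: e)

# About e/2 generations: band width about sqrt(omega) near frequency omega.
intermediate = dyadic_breakpoints(5, lambda e: e // 2)
print(intermediate)
```

With `e // 2` generations the interval $[2^e, 2^{e+1})$ is cut into pieces of width about $2^{e/2}$, which is the $\sqrt{\omega}$ band width of the intermediate-coherence example.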
For each $\rho_\lambda$ we have from (4.3) the explicit formula
\[ \rho_\lambda(x) \propto \int U^{+}_\mu(x_1\cos\theta + x_2\sin\theta)\, w^{\varepsilon}_{i,\ell}(\theta)/2 \, d\theta + \int U^{+}_\mu(x_1\cos(\theta+\pi) + x_2\sin(\theta+\pi))\, w^{\varepsilon}_{i,\ell}(\theta+\pi)/2 \, d\theta. \]
Now roughly speaking, $U^{+}_\mu$, with $\mu = (j, k)$, is a sinusoid of frequency $j$, say localized to an interval of length $\approx 1$ situated near $t \approx k$. Hence, the ridge function $U^{+}_\mu(x_1\cos\theta + x_2\sin\theta)$ is localized near $x_1\cos\theta + x_2\sin\theta = k$. Similarly $w^{\varepsilon}_{i,\ell}(\theta)$ is localized near $\theta = \theta_{i,\ell} = 2\pi\ell/2^i$. It follows that the integrand is large for $x$ in a range where $x \approx (k\cos\theta_{i,\ell}, k\sin\theta_{i,\ell})$, so we may expect that for $\varepsilon = 0$ and $i = i_0(\mu)$, the function $\rho_\lambda$ concentrates near $x \approx (k\cos\theta_{i,\ell}, k\sin\theta_{i,\ell})$. For $\varepsilon = 1$ and $i > i_0$, one must argue by cancellation, which is more subtle.

2.5.5 Polar Fourier Approach: Wavelet $\otimes$ Cosine Packet

Let now $(U_\mu)$ be simply the standard Meyer wavelet basis for $\mathbb{R}$, just as in the ortho-ridgelet basis (1.1). Let $V_{\nu|\mu}$, however, be a cosine packet basis based on a recursive dyadic partition of the angle domain. Consider for example a partition based on dividing the angular domain into $2^m$ equal sectors. Use the cosine packets subordinate to this partitioning.

In the polar Fourier domain, things are very simple, because Meyer wavelets are the Fourier transforms of cosine packets in the frequency domain. Hence we have cosine packets in $\lambda$ times cosine packets in $\theta$. Hence, bivariate cosine packets are being used, subordinate to a recursive dyadic partition. For each $\rho_\lambda$ we have the explicit formula
\[ \rho_\lambda(x) \propto \int \psi^{+}_{j,k}(x_1\cos\theta + x_2\sin\theta)\, V_\nu(\theta)/2 \, d\theta + \int \psi^{+}_{j,k}(x_1\cos(\theta+\pi) + x_2\sin(\theta+\pi))\, V_\nu(\theta+\pi)/2 \, d\theta. \]
Now, roughly speaking, $\psi^{+}_{j,k}$ is a wavelet of scale $2^{-j}$, localized near $t \approx t_{j,k} = k/2^j$. Hence, the ridge function $\psi^{+}_{j,k}(x_1\cos\theta + x_2\sin\theta)$ is localized near $x_1\cos\theta + x_2\sin\theta = t_{j,k}$. Similarly $V_\nu$ is localized to an interval $J_{m,\ell}$. It follows that the integrand is large for $x$ in a range where $|x| \approx t_{j,k}$ and $\theta \in J_{m,\ell}$. We may expect that the function $\rho_\lambda$ is large in the neighborhood where $x \approx \pm(|t_{j,k}|\cos\theta_{m,\ell}, |t_{j,k}|\sin\theta_{m,\ell})$. Knowing the exact shape of the support requires additional insight.

Now we make the more detailed assumption that $V_\nu$ is a sinusoid in $\theta$ of frequency $2\pi k_1$ localized near the interval $J_{m,\ell}$. This allows us to study the details of $\rho_\lambda$ on its support. For large $|t_{j,k}|$ and large $m$, the integrand is approximately of the form
\[ \psi^{+}_{j,k}(x_1\cos\theta_{m,\ell} + x_2\sin\theta_{m,\ell})\, w_{J_{m,\ell}}(\theta)\, \phi_{k_1}(\theta). \]
Hence it has approximately the form of a wavelet function in the ridge direction and the form of a localized sinusoid in the transverse direction.
2.6 IMPLEMENTATION ON DIGITAL DATA

Ridgelet packet bases for digital data can be constructed based on an adaptation of a circle of ideas associated to digital implementation of the Radon transform, polar Fourier transform, and ridgelet transform [3, 17].

2.6.1 Fast Slant Stack

Averbuch et al. (2001) [3] describe a realization of the Radon transform suited for $n$-by-$n$ image data, called Fast Slant Stack, claiming that the transform is geometrically accurate and can be implemented by a fast algorithm. The geometric accuracy, for example, implies that the backprojection of a point in Radon space is a true ridge function, i.e. a true object of the form $\psi(x + sy)$, where $\psi(\cdot)$ is delta-like. This scheme has been deployed by Donoho and Flesia [17] to produce a discrete ridgelet transform based on true ridge functions. In our work for this paper, we have used the same scheme to provide a digital implementation of ridgelet packets.

2.6.2 Pseudopolar FFT

Underlying the Fast Slant Stack is a notion of digital polar Fourier transform called pseudopolar FFT in [3]. The key point is to view the digital Fourier domain not as a cartesian grid, but instead as a special pointset as shown in Figure 2.7. Then define the pseudopolar Fourier transform as the evaluation of the Fourier transform
\[ \hat I(\xi) = \sum_{x_1, x_2 = 0}^{n-1} I(x_1, x_2) \exp\{-i(x_1 \xi_1 + x_2 \xi_2)\} \]
at the $4n^2$ points of this pointset. The pointset can be viewed as a set of "concentric squares" stacked inside each other (like Chinese boxes), with equispaced points along the boundary of the box. The half-width of a side functions as a pseudo-radius, and the arclength along the perimeter of the box functions as a pseudo-angular variable. As shown in [3], the evaluation of the Fourier sum on this set of gridpoints can be performed in order $N \log(N)$ flops, where $N = n^2$ is the total number of pixels. The underlying ideas that allow rapid evaluation of these specific gridpoints date back to work of Pasciak [27], Edholm and Herman [18], and Lawton [24], working variously in Medical Imaging and in Synthetic Aperture Radar.

The resulting set of pseudopolar values may be viewed as a $2n$ by $2n$ array: $2n$ points on each line through the origin, and $2n$ lines through the origin, grouped in columns as different lines through the origin, in rows as different 'radii'. We define the pseudopolar FFT $P(I)$ to be the transform from $n$ by $n$ arrays to $2n$ by $2n$ arrays produced in this way.

Figure 2.7. Pseudopolar Fourier Grid

Note that the pseudopolar grid samples the region near the origin more finely than the region near the boundary. In fact the spacing between samples on line segments varies inversely with distance of the segment from the origin. Define the normalized pseudopolar FFT $\tilde P(I)$ to be the result of applying a simple rescaling of entries in $P(I)$ according to the square root of the local sample spacing in the pseudopolar grid at the corresponding grid point. Since $P(I)$ is a discrete analog of $F(r, \theta) = \hat f(r\cos\theta, r\sin\theta)$, sampled at specific points in $(r, \theta)$, the definition of $\tilde P(I)$ is very analogous to defining in the continuum case $\tilde F(r, \theta) = r^{1/2} \hat f(r\cos\theta, r\sin\theta)$. Recall that $f \mapsto \tilde F$ is an isometry from $L^2(dx\,dy)$ to $L^2(dr\,d\theta)$; we can't get quite so much in the digital case. Instead we have
\[ C_1 \|I\|_2 \le \|\tilde P(I)\|_2 \le C_2 \|I\|_2, \tag{6.1} \]
where empirically, $C_2/C_1 < 1.1$. Note that if we had $C_1 = C_2$ then, up to normalization, $\tilde P$ would be an $\ell^2$ isometry. In that sense, the mapping $I \mapsto \tilde P(I)$ is a digital analog of the polar Fourier isometry.
2.6.3 Digital Radon Domain

If we apply a 1-dimensional inverse FFT to each column of the 2-D pseudopolar FFT array, we create a new $2n$-by-$2n$ matrix. This matrix is a digital Radon transform of $I$; each column gives the sums of (an interpolant of) $I$ along a family of equispaced parallel lines, where the slope of the lines in that family is indexed by the column index (which provides a pseudo-angular variable) [3]. Call the overall mapping $S(I)$ the slant stack. If we apply instead a 1-dimensional inverse FFT to each column of the 2-D normalized pseudopolar FFT array, we create another new $2n$-by-$2n$ matrix. This matrix is a preconditioned digital Radon transform of $I$. Call the overall transform mapping $\tilde S(I)$ the normalized slant stack. Because of the near-isometry property of $\tilde P(I)$, we have $C_1 \|I\|_2$

0 such that, (2.12) The constants $A$ and $B$ are called the Riesz bounds. If $u_{n,k}$ is an orthonormal sequence, then $A = B = 1$. If $f$ has unit norm, then
\[ \frac{1}{B} \le \sum_{n,k} |f_{n,k}|^2 \le \frac{1}{A}. \tag{2.13} \]
If $A$ is much smaller than 1, then the coefficients $f_{n,k}$ in (2.13) can be very large. Conversely, if $B$ is very large, the coefficients $f_{n,k}$ can become extremely small. In order to obtain decompositions that are numerically stable, and coefficients that neither explode nor vanish, one would like to have Riesz bounds close to 1.
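The effect of the Riesz bounds on coefficient size can be illustrated in finite dimensions. The sketch below assumes the convention $A\|c\|^2 \le \|\sum_k c_k u_k\|^2 \le B\|c\|^2$, so that $A$ and $B$ are the squared extreme singular values of the matrix whose columns are the basis vectors; the matrix `U`, the perturbation size, and the helper name `riesz_bounds` are illustrative choices, not taken from the text.

```python
import numpy as np

def riesz_bounds(U):
    """Riesz bounds of the basis formed by the columns of U, under the assumed
    convention A*|c|^2 <= |U c|^2 <= B*|c|^2."""
    s = np.linalg.svd(U, compute_uv=False)
    return s.min() ** 2, s.max() ** 2

rng = np.random.default_rng(2)
d = 6

# A mildly non-orthogonal basis: identity plus a small random perturbation.
U = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A, B = riesz_bounds(U)

# Expand a unit-norm vector f = U c and measure the coefficient energy.
f = rng.standard_normal(d)
f /= np.linalg.norm(f)
c = np.linalg.solve(U, f)
energy = float(np.sum(c ** 2))   # lies between 1/B and 1/A

# For an orthonormal basis both bounds equal 1.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A_on, B_on = riesz_bounds(Q)
```

Under this convention a small $A$ allows the coefficient energy to reach $1/A$, which is the "coefficients explode" case, and a large $B$ allows it to fall to $1/B$.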
BIORTHOGONAL WINDOWED FOURIER BASES
Figure 3.7. Left: real part of $u_{n,k}$, with $a_n = 0$, $a_{n+1} = 512$, and $\delta = 128$. Right: real part of the dual basis $\tilde u_{n,k}$.
3.2.1 Implementation by folding

In practice, in order to expand a function $f$ into the basis $u_{n,k}$ we do not calculate the correlation between $f$ and the basis $\{u_{n,k}\}$. Instead we transform $f$ restricted to $[a_n - \delta, a_{n+1} + \delta]$ into a smooth periodic function on $[a_n, a_{n+1}]$, and expand it into the basis $\{E_{n,k}\}$. To do this we fold the overlapping parts of the window $b_n$ and of the hump function $h$ back into the interval, across the endpoints of the interval, with some folding and unfolding operators. The advantage of the procedure is that we can preprocess the data with the folding operators and then use a conventional FFT to calculate the expansion into the basis $\{E_{n,k}\}$. We will follow the construction of Wickerhauser in [1].

Unitary folding and unfolding. We define the unitary folding operator $U_{a_n}$ and its adjoint, the unfolding operator $U^*_{a_n}$, as follows:
’ )m-rc-
UUit)
I m,
an S < t < an, an < t < an -\- S, otherwise;
(2.14)
- t ), if a - (5 < t < a , - t), if a < t < a + (5, otherwise.
(2.15)
f ^ ) / ( 2 a - t ) , if ^ ) / ( 2 a - t ) , if
)/( = < / , T : . a ^ i ^ n , f c > = < Ta^ ,a^ + J , Er.,k >
(2.23)

To summarize, the inner product $\langle f, u_{n,k} \rangle$ is calculated as follows:

1 calculate $T_{a_n,a_{n+1}} f$ using the folding operator $T_{a_n,a_{n+1}}$;
2 expand $T_{a_n,a_{n+1}} f$ into $E_{n,k}$ using an FFT.

Conversely, the reconstruction of $f$ from $\langle f, u_{n,k} \rangle$ is done as follows:

1 recover $T_{a_n,a_{n+1}} f$ using an inverse FFT;
2 calculate the smooth orthogonal projection $P_{a_n,a_{n+1}} f = T^*_{a_n,a_{n+1}} T_{a_n,a_{n+1}} f$ using the unfolding operator $T^*_{a_n,a_{n+1}}$;
3 add successive $P_{a_n,a_{n+1}} f$ to recover the complete signal $f = \sum_{n \in \mathbb{Z}} P_{a_n,a_{n+1}} f$.

Figure 3.9. The bell $b$ lives over the interval $[-1/2, 3/2)$.

3.3 CHOICE OF THE BELL FUNCTION
The key to the success of the brushlet is a bell function $b_n$ that satisfies the conditions (2.2) and (2.3), and that has a Fourier transform with a fast decay. In this section we describe several "optimized" bells that were constructed recently [7, 8]. All bells $b_n$ are obtained from a prototype bell $b$ by translation by $(l_n + \delta)/2$ and by dilation by $l_n + \delta$. The prototype bell $b$ is defined on $[-1/2, 3/2]$ (see Figure 3.9).

3.3.1 The orthonormal bell of Wickerhauser

The simplest orthonormal bell $b$ is given by
\[ b(x) = \sin\tfrac{\pi}{2}(x + 1/2). \tag{3.2} \]
Unfortunately, this bell is not differentiable at $x = -1/2$ and $3/2$, and therefore will have a very slow decay in the frequency domain. Wickerhauser made $b(x)$ in (3.2) smoother by "flattening" it at both end points. For $x \in [-1/2, 1/2]$ and $s \in \mathbb{N}$, he defines
\[ b^s(x) = \sin\tfrac{\pi}{2}(x_s + 1/2) \tag{3.3} \]
with
\[ x_0 = x \quad\text{and}\quad x_j = \tfrac12 \sin(\pi x_{j-1}). \tag{3.4} \]
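The iterated-sine construction is short enough to check numerically. The sketch below assumes the recursion $x_0 = x$, $x_j = \tfrac12\sin(\pi x_{j-1})$ and $b^s(x) = \sin\tfrac{\pi}{2}(x_s + 1/2)$ on $[-1/2, 1/2]$, extended to $[1/2, 3/2]$ by the symmetry $b^s(x) = b^s(1 - x)$ and set to zero elsewhere; the function name `iterated_sine_bell` is a hypothetical choice.

```python
import numpy as np

def iterated_sine_bell(x, s):
    """Bell b^s on [-1/2, 3/2]: s rounds of x -> sin(pi*x)/2, then a sine ramp.

    s = 0 gives the plain bell sin(pi/2 * (x + 1/2)).
    """
    x = np.asarray(x, dtype=float)
    x = np.where(x > 0.5, 1.0 - x, x)      # symmetric extension about x = 1/2
    inside = x >= -0.5                      # outside [-1/2, 3/2] the bell is 0
    xs = np.clip(x, -0.5, 0.5)
    for _ in range(s):
        xs = 0.5 * np.sin(np.pi * xs)       # flattening step; stays in [-1/2, 1/2]
    return np.where(inside, np.sin(0.5 * np.pi * (xs + 0.5)), 0.0)

x = np.linspace(-0.5, 0.5, 201)
# b^s(x)^2 + b^s(-x)^2 on the overlap region, for s = 0, 1, 2, 3.
overlap_energy = [iterated_sine_bell(x, s) ** 2 + iterated_sine_bell(-x, s) ** 2
                  for s in range(4)]
```

Because each flattening step is an odd map of $[-1/2, 1/2]$ onto itself, $b^s(x)^2 + b^s(-x)^2 = 1$ holds for every $s$, which is the property that keeps the resulting local basis orthonormal.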
This bell is symmetric: if $x \in [1/2, 3/2]$ we define $b^s(x) = b^s(1 - x)$ (see Figure 3.10). One can show by induction that $b^s$ has $2^s - 1$ vanishing derivatives at $-1/2$ and $3/2$, and thus $b^s \in C^{2^s - 1}$. This bell gives rise to an orthonormal basis. The magnitude of the Fourier transform of $b^s$ (for an interval of $N = 512$ samples) is shown in Figure 3.10. As $s$ increases the main lobe becomes slightly wider, but the side lobes become much smaller.

3.3.2 Optimized bell of Matviyenko

Matviyenko constructed some optimized bells [8]. He considered the approximation of the constant $p = 1$ over the interval $[a_n, a_{n+1})$ generated from the first $K$ coefficients $p_{n,k}$, $k = 0, \dots, K - 1$:
STEERABLE WAVELET PACKETS
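\[ p \approx \sum_{k=0}^{K-1} p_{n,k}\, \psi_{n,k}, \qquad\text{with}\qquad p_{n,k} = \int p(x)\, \tilde\psi_{n,k}(x)\, dx. \tag{3.5} \]
The norm of the residual error is
\[ \Big\| p - \sum_{k=0}^{K-1} p_{n,k}\, \psi_{n,k} \Big\|_2. \tag{3.6} \]

Figure 3.10. Left: orthonormal bells $b^s$, $s = 1, 2, 3$. Right: magnitude of the Fourier transform of $b^s$.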
Matviyenko argues in [8] that this family of bells should yield a sparse representation of oscillatory signals of the form $c(x) = \cos(\omega x + \varphi)$. Instead of reproducing exactly $p$ with one coefficient, Matviyenko designed a family of bells that minimize the residual error (3.6). Matviyenko shows that minimizing the sum
\[ \sum_{k=K}^{\infty} |p_{n,k}|^2 \tag{3.7} \]
is related to minimizing the residual error (3.6). He then finds the bell $b$ that minimizes (3.7) under the constraint
\[ b(x) + b(-x) = 1 \quad\text{for all } x \in [0, 1/2]. \tag{3.8} \]
The solution of the optimization problem is a bell $b_K(x)$ given by
\[ b_K(x) = \begin{cases} \frac12\Big(1 + \sum_{k=0}^{K-1} g_k \sin\big((k + 1/2)\pi x\big)\Big) & \text{if } -0.5 \le x < 0.5, \\ \frac12\Big(1 + \sum_{k=0}^{K-1} (-1)^k g_k \cos\big((k + 1/2)\pi x\big)\Big) & \text{if } 0.5 \le x < 1.5, \\ 0 & \text{otherwise.} \end{cases} \tag{3.9} \]
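The structural properties of a bell of the form (3.9) can be checked without knowing the optimized coefficients. In the sketch below the coefficient vectors are hypothetical: `g_K1 = [sqrt(2)]` is simply the value that makes a $K = 1$ bell vanish at $x = -1/2$, and `g_K2` is arbitrary; the actual $g_k$ are the numerically computed values of [8], which are not reproduced here.

```python
import numpy as np

def matviyenko_bell(x, g):
    """Evaluate a bell of the form (3.9) for a coefficient vector g of length K."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    g = np.asarray(g, dtype=float)
    k = np.arange(len(g))
    phase = np.pi * np.outer(x, k + 0.5)                     # (k + 1/2) * pi * x
    left = 0.5 * (1.0 + np.sin(phase) @ g)                   # -0.5 <= x < 0.5
    right = 0.5 * (1.0 + np.cos(phase) @ ((-1.0) ** k * g))  #  0.5 <= x < 1.5
    return np.where((x >= -0.5) & (x < 0.5), left,
                    np.where((x >= 0.5) & (x < 1.5), right, 0.0))

g_K1 = [np.sqrt(2.0)]        # hypothetical: makes b(-1/2) = 0 when K = 1
g_K2 = [1.2, 0.1]            # hypothetical, for illustration only

x = np.linspace(0.0, 0.49, 50)
constraint_K1 = matviyenko_bell(x, g_K1) + matviyenko_bell(-x, g_K1)   # (3.8)
constraint_K2 = matviyenko_bell(x, g_K2) + matviyenko_bell(-x, g_K2)
```

The constraint (3.8) holds for any coefficients because the sine terms are odd in $x$, and the cosine branch is exactly the reflection $b_K(1 - x)$ of the sine branch, so only the shape of the bell depends on the optimized $g_k$.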
The $g_k$ are calculated numerically in [8]. $K$ influences the steepness of the bell. All bells $b_K$ are bounded by 1, and the dual bells $\tilde b_K$ are bounded by $(\sqrt{2} + 1)/2$. These bounds guarantee that the Riesz bounds will be $A = 1$ and $B = 2$ for all $K$. Figure 3.11 shows the bell $b_K$ and the dual bell $\tilde b_K$ for $K = 1$ and 3. The magnitude of the Fourier transform of the bells $b_K$ and $\tilde b_K$ (for an interval of $N = 512$ samples) is shown in Figure 3.12. As $K$ increases the side lobes become much smaller. This observation, and equation (3.7), seem to indicate that a large $K$ should provide a smaller error and a better frequency resolution. In practice, as shown in [9], small values of $K$ often provide better performance. One can choose to use either $b_K$ or $\tilde b_K$ to compute the coefficients. Because $b_K$ is optimized to minimize the residual error, one should use $b_K$ for the analysis and $\tilde b_K$ for the reconstruction.

Figure 3.11. Matviyenko's optimized bell $b_K$, and dual bell $\tilde b_K$, for $K = 1$ (left), $K = 3$ (right).

Figure 3.12. Left: Matviyenko's bell: magnitude of the Fourier transform of $b_K$. Right: magnitude of the Fourier transform of $\tilde b_K$.

3.3.3 Modulated Lapped Biorthogonal Transform (MLBT)
A simple way to smooth $\sin\tfrac{\pi}{2}(x + 1/2)$ at both end-points is to take the square of the bell. As a result the following bell is in $C^1(\mathbb{R})$:
\[ b(x) = \begin{cases} \sin^2\big[\tfrac{\pi}{2}(x + 1/2)\big] = \dfrac{1 - \cos(\pi(x + 1/2))}{2} & \text{if } x \in [-1/2, 1/2], \\ b(1 - x) & \text{if } x \in [1/2, 3/2]. \end{cases} \tag{3.10} \]
Malvar proposed in [7] the following dual bell:
\[ \tilde b(x) = \begin{cases} \dfrac{1 - \cos\big[\pi (x + 1/2)^{\alpha}\big] + \beta}{2 + \beta} & \text{if } x \in [-1/2, 1/2], \\ \tilde b(1 - x) & \text{if } x \in [1/2, 3/2]. \end{cases} \tag{3.11} \]
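A short numerical check of the two-parameter bell, assuming it has the form $(1 - \cos[\pi (x + 1/2)^{\alpha}] + \beta)/(2 + \beta)$ on $[-1/2, 1/2]$ with symmetric extension about $x = 1/2$; the function name `mlbt_dual_bell` and its defaults are hypothetical. The sketch confirms that $\alpha = 1$, $\beta = 0$ reduces to the squared sine bell.

```python
import numpy as np

def mlbt_dual_bell(x, alpha=1.0, beta=0.0):
    """Two-parameter bell on [-1/2, 3/2], zero elsewhere (assumed form, see lead-in)."""
    x = np.asarray(x, dtype=float)
    x = np.where(x > 0.5, 1.0 - x, x)          # symmetry about x = 1/2
    u = np.clip(x + 0.5, 0.0, 1.0)             # u runs from 0 to 1 on [-1/2, 1/2]
    val = (1.0 - np.cos(np.pi * u ** alpha) + beta) / (2.0 + beta)
    return np.where(x >= -0.5, val, 0.0)

x = np.linspace(-0.5, 0.5, 101)
squared_sine = np.sin(0.5 * np.pi * (x + 0.5)) ** 2
special_case = mlbt_dual_bell(x, alpha=1.0, beta=0.0)

# Parameter values quoted in the text for the experiments.
bell_085 = mlbt_dual_bell(np.linspace(-0.5, 1.5, 401), alpha=0.85, beta=0.0)
```

In this form the bell equals 1 at $x = 1/2$ for every $\alpha$ and $\beta$, and equals $\beta/(2 + \beta)$ at the end points, so $\beta = 0$ is what makes it vanish there.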
Figure 3.13. Left: MLBT's bell $b$ and the associated dual bell $\tilde b$. Right: magnitude of the Fourier transform of $b$ and $\tilde b$.

For $\alpha = 1$ and $\beta = 0$ we clearly find the square of $\sin\tfrac{\pi}{2}(x + 1/2)$. The bell is $C^1(\mathbb{R})$ if $\alpha > 1$. Because the bell is smoother, it will have a faster decay in the Fourier domain, and a better frequency selectivity of the associated basis functions [7]. The analysis bell $b$ is derived from the dual bell $\tilde b$ using (2.6). In this paper we use the following values of the parameters: $\alpha = 0.85$ and $\beta = 0$. Figure 3.13 shows the graphs of $b$ and $\tilde b$. The two bells are similar to the optimized bells of Matviyenko. The Riesz bounds are $A = 1$ and $B = 1.458$. The magnitude of the Fourier transform of the analysis bell and the dual bell (for an interval of $N = 512$ samples) is shown in Figure 3.13. The Fourier transform of $\tilde b$ has a wider main lobe than the Fourier transform of $b$ but a better stop-band attenuation (smaller side lobes).

3.4 BIORTHOGONAL BRUSHLET BASES

Inspired by the duality between local trigonometric bases and wavelet packets for a function of one variable [12], we propose to construct wavelet packets of two variables in the Fourier domain using trigonometric bases. We replace the local cosine bases by local Fourier bases.

3.4.1 One dimensional case

Let $f \in L^2(\mathbb{R})$, and let $\hat f$ be the Fourier transform of $f$. We define a cover of the frequency axis
\[ \bigcup_{n=-\infty}^{+\infty} [a_n, a_{n+1}); \tag{4.1} \]
$\omega_n$ is the center of each interval, of size $l_n$. (The center of the interval was called $c_n$ when the analysis was performed in the original time domain; since we work in the frequency domain we prefer to use $\omega_n$.) Let $u_{n,k}$ be the local Fourier basis associated with this cover. We expand $\hat f$ into the basis $u_{n,k}$:
\[ \hat f = \sum \hat f_{n,k}\, u_{n,k}. \tag{4.2} \]
We then take the inverse Fourier transform. Let $\psi_{n,k}$ be the inverse Fourier transform of $u_{n,k}$.
Since the Fourier transform is a unitary operator, we obtain a new pair of biorthogonal bases by applying the inverse Fourier transform on $u_{n,k}$ and $\tilde u_{n,k}$.

Lemma 3.4.1 $\{\tilde\psi_{m,j}, \psi_{n,k};\ j, k, m, n \in \mathbb{Z}\}$ are biorthogonal bases for $L^2(\mathbb{R})$.
We call $\{\psi_{n,k}\}$ and

1, wavelet approximation is suboptimal. It is important to note that the smoothness of the discontinuity curve is irrelevant to the performance of the wavelet approximation.

How can we improve the performance of the wavelet representation when the discontinuity curve is known to be smooth? Simply looking at the wavelet scheme in Figure 4.1(a) suggests that rather than treating each significant wavelet coefficient along the discontinuity curve independently, one should group the nearby coefficients since their locations are locally correlated. Recall that at the level $j$, the essential support of the wavelet basis functions has size $2^{-j}$. The curve scaling relation (2.1) suggests that we can group about $c\,2^{j/2}$ nearby wavelet basis functions into one basis function with a linear structure so
REPRESENTING 2-D PIECEWISE SMOOTH FUNCTIONS
that its width is proportional to its length squared (see Figure 4.1). This grouping operation reduces the number of significant coefficients at the level $j$ from $O(2^j)$ to $O(2^{j/2})$. Consequently, this new representation provides the same approximation error as wavelets in (2.4) with only $M' \sim \sum_{j=0}^{J} 2^{j/2} \sim O(2^{J/2})$ coefficients. In other words, the $M$-term non-linear approximation using this improved wavelet representation decays like

$$
\|f - \hat f_M^{(\text{improved-wavelet})}\|^2 \sim O(M^{-2}).
\tag{2.6}
$$
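The counting argument behind this rate can be checked with a few lines of arithmetic. The sketch below is illustrative only: it sets every $O(\cdot)$ constant to 1 and takes the approximation error after $J$ levels to be $2^{-J}$, which is the value implied by the coefficient counts and decay rates quoted in the text; the function names are hypothetical.

```python
import math

def coefficient_counts(J):
    """Coefficient counts for levels 0..J, with all O(.) constants set to 1."""
    m_wavelet = sum(2 ** j for j in range(J + 1))        # O(2^j) per level
    m_grouped = sum(2 ** (j / 2) for j in range(J + 1))  # O(2^(j/2)) per level
    error = 2.0 ** (-J)                                  # assumed error after J levels
    return m_wavelet, m_grouped, error

def decay_exponent(m0, e0, m1, e1):
    """Slope of log(error) against log(number of coefficients)."""
    return (math.log(e1) - math.log(e0)) / (math.log(m1) - math.log(m0))

w20, g20, e20 = coefficient_counts(20)
w30, g30, e30 = coefficient_counts(30)
wavelet_rate = decay_exponent(w20, e20, w30, e30)  # close to -1
grouped_rate = decay_exponent(g20, e20, g30, e30)  # close to -2
```

Under these assumptions the same error is reached with roughly the square root of the number of coefficients, which is the content of the $O(M^{-2})$ rate.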
Comparing with (2.5), we see that for $C^2$ discontinuity curves, the new representation is superior to wavelets and in fact achieves the optimal rate. The curvelet system achieves this optimality using a similar argument. In the original curvelet construction [5], the linear structure of the basis function comes from the ridgelet basis while the curve scaling relation is ensured by a suitable combination of subband filtering and windowing.

4.2.3 A filter bank approach for sparse image expansions

The original definition of the curvelet transform as described in Section 4.2.1 poses several problems when one translates it into the discrete world. First, since it is a block-based transform, either the approximated images have blocking effects or one has to use overlapping windows and thus increase the redundancy. Secondly, the use of the ridgelet transform, which is defined on a polar coordinate system, makes the implementation of the curvelet transform for discrete images on rectangular coordinates very challenging. In [8-10], different interpolation approaches were proposed to solve the polar versus rectangular coordinate transform problem, all of which required overcomplete systems. Consequently, the version of the discrete curvelet transform in [9], for example, has a redundancy factor equal to $16J + 1$, where $J$ is the number of multiscale levels.

Comparing the wavelet scheme with the curvelet scheme in Figure 4.1, we see that the improvement of curvelets can be loosely interpreted as a grouping of nearby wavelet coefficients, since their locations are locally correlated due to the smoothness of the discontinuity curve. Therefore, we can obtain a sparse image expansion by first applying a multiscale transform and then applying a local directional transform to gather the nearby basis functions at the same scale into linear structures.
In essence, we first use a wavelet-like transform for edge detection, and then a local directional transform for contour segment detection. Interestingly, this approach is similar to the popular Hough transform [11] for line detection in computer vision.

With this insight, we proposed a double filter bank approach for obtaining sparse expansions for typical images with smooth contours (Figure 4.2(b)). In our newly constructed pyramidal directional filter bank [12], the Laplacian pyramid [13] is first used to capture the point discontinuities, then followed by a directional filter bank [14] to link point discontinuities into linear structures. The overall result is an image expansion using elementary images like contour segments, and thus it is named the contourlet transform.

The contourlet transform offers a flexible multiresolution and directional decomposition for images, since it allows for a different number of directions at each scale. For the contourlet transform to satisfy the anisotropy scaling law as in the curvelet transform, we simply need to impose that the number of directions is doubled at every other finer scale of the pyramid [12].

The contourlet transform is almost critically sampled, with a small redundancy factor of up to 1.33. Comparing this with a much larger redundancy ratio of the discrete implementation of the curvelet transform [9] mentioned above, the contourlet transform
CONTOURLETS
Figure 4.2. Two approaches for dealing with images having smooth contours. (a) Curvelet transform: block ridgelet transforms are applied to subband images. (b) Contourlet transform: image is decomposed by a double filter-bank structure, where the first one captures the edge points and the second one links these edge points into contour segments. The gray areas in the boxes represent the support sizes of the filters.
is much more suitable for image compression. Furthermore, the contourlet transform can be designed to be a tight frame, which implies robustness against the noise due to quantization or thresholding. Finally, the contourlet transform is implemented efficiently via iterated filter banks with fast algorithms. In the next section we will describe such a filter bank in detail.
4.3 PYRAMIDAL DIRECTIONAL FILTER BANK

4.3.1 Multiscale decomposition

One way of achieving a multiscale decomposition is to use a Laplacian pyramid (LP) as introduced by Burt and Adelson [13]. The LP decomposition at each step generates a sampled lowpass version of the original and the difference between the original and the prediction, resulting in a bandpass image (see Figure 4.3(a)). The process can be iterated on the coarse version.
Figure 4.3. Laplacian pyramid scheme. (a) Analysis: the outputs are a coarse approximation c and a difference d between the original signal and the prediction. The process can be iterated by decomposing the coarse version repeatedly. (b) The proposed reconstruction scheme for the Laplacian pyramid.

A drawback of the LP is the implicit oversampling. However, in contrast to the critically sampled wavelet scheme, the LP has the distinguishing feature that each pyramid level generates only one bandpass image (even for multidimensional cases) which does not have "scrambled" frequencies. This frequency scrambling happens in the wavelet filter bank when a highpass channel, after downsampling, is folded back into the low frequency band, and thus its spectrum is reflected. In the LP, this effect is avoided by downsampling the lowpass channel only.

In [15], we study the LP using the theory of frames and oversampled filter banks. We show that the LP with orthogonal filters (that is, $h[n] = g[-n]$ and $g[n]$ is orthogonal to its translates with respect to the subsampling lattice) is a tight frame with frame bounds equal to 1. In this case, we suggest the use of the optimal linear reconstruction using the dual frame operator, which is symmetrical with the forward transform (see Figure 4.3(b)). Note that this new reconstruction is different from the usual reconstruction and is crucial for our contourlet expansion described later.
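A single LP level with orthogonal filters can be sketched in a few lines. The example below is an illustration under stated assumptions, not the filters used in the text: it uses a 2x2 block (Haar-like) lowpass filter, and the reconstruction formula $\hat x = G(c - Hd) + d$ is written out here from the tight-frame property (the transpose of the analysis operator) rather than quoted from the source. All function names are hypothetical.

```python
import random

def lowpass_down(x):
    """H: orthonormal 2x2 block filter followed by downsampling by two."""
    h = len(x) // 2
    return [[(x[2*i][2*j] + x[2*i][2*j+1] + x[2*i+1][2*j] + x[2*i+1][2*j+1]) / 2.0
             for j in range(h)] for i in range(h)]

def up_synth(c):
    """G = H^T: upsampling by two followed by the same block filter."""
    n = 2 * len(c)
    return [[c[i // 2][j // 2] / 2.0 for j in range(n)] for i in range(n)]

def lp_analysis(x):
    """Return the coarse image c = Hx and the difference d = x - Gc."""
    c = lowpass_down(x)
    p = up_synth(c)  # prediction of x from the coarse image
    n = len(x)
    d = [[x[i][j] - p[i][j] for j in range(n)] for i in range(n)]
    return c, d

def lp_synthesis(c, d):
    """Dual-frame reconstruction G(c - Hd) + d for the orthogonal case."""
    hd = lowpass_down(d)
    m = len(c)
    corrected = [[c[i][j] - hd[i][j] for j in range(m)] for i in range(m)]
    g = up_synth(corrected)
    n = len(d)
    return [[g[i][j] + d[i][j] for j in range(n)] for i in range(n)]

def energy(a):
    return sum(v * v for row in a for v in row)

random.seed(0)
x = [[random.random() for _ in range(8)] for _ in range(8)]
c, d = lp_analysis(x)
x_rec = lp_synthesis(c, d)
max_err = max(abs(x_rec[i][j] - x[i][j]) for i in range(8) for j in range(8))
redundancy = (len(c) ** 2 + len(d) ** 2) / len(x) ** 2  # 1.25 for one level
```

For this block filter the coefficient energy `energy(c) + energy(d)` equals `energy(x)`, which is the tight-frame property with bound 1, and iterating the level on `c` drives the redundancy toward $1 + 1/4 + 1/16 + \dots = 4/3$.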
4.3.2 Directional decomposition
In 1992, Bamberger and Smith [14] introduced a 2-D directional filter bank (DFB) that can be maximally decimated while achieving perfect reconstruction. The DFB is efficiently implemented via an $l$-level tree-structured decomposition that leads to $2^l$ subbands with wedge-shaped frequency partition as shown in Figure 4.4.

Figure 4.4. Directional filter bank frequency partitioning where $l = 3$ and there are $2^3 = 8$ real wedge-shaped frequency bands.
The original construction of the DFB in [14] involves modulating the input signal and using diamond-shaped filters. Furthermore, to obtain the desired frequency partition, an involved tree expanding rule has to be followed (see [16, 17] for details). As a result, the frequency regions for the resulting subbands do not follow a simple ordering as shown in Figure 4.4 based on the channel indices.

In [27], we propose a new formulation for the DFB that is based only on the QFBs with fan filters. The new DFB avoids the modulation of the input image and has a simpler rule for expanding the decomposition tree. Intuitively, the wedge-shaped frequency partition of the DFB is realized by an appropriate combination of directional frequency splitting by the fan QFBs and the "rotation" operations done by resampling, which are illustrated in Figure 4.5 and Figure 4.6, respectively.
Figure 4.5. Two-dimensional spectrum splitting using the quincunx filter banks with fan filters. The black regions represent the ideal frequency supports of each filter.
Using the multirate identities, we can transform an $l$-level tree-structured DFB into a parallel structure of $2^l$ channels with equivalent filters and overall sampling matrices. Denote these equivalent synthesis filters as $G_k^{(l)}$, $0 \le k < 2^l$, which correspond to the subbands indexed as in Figure 4.4. The overall sampling matrices have diagonal form as:
$$
S_k^{(l)} = \begin{cases}
\mathrm{diag}(2^{l-1}, 2) & \text{for } 0 \le k < 2^{l-1} \\
\mathrm{diag}(2, 2^{l-1}) & \text{for } 2^{l-1} \le k < 2^l
\end{cases}
$$

which correspond to the basically horizontal and basically vertical subbands, respectively. With this, it is easy to see that the family

$$
\left\{ g_k^{(l)}[n - S_k^{(l)} m] \right\}_{0 \le k < 2^l,\ m \in \mathbb{Z}^2}
\tag{3.2}
$$

obtained by translating the impulse responses of the synthesis filters $G_k^{(l)}$ over the sampling lattices $S_k^{(l)}$, is a basis for discrete signals in $l^2(\mathbb{Z}^2)$. This basis exhibits both directional and localization properties. Figure 4.7 demonstrates this fact by showing the impulse responses of equivalent filters from an example DFB. These basis functions have linear supports in space and span all directions. Therefore (3.2) resembles a local Radon transform and the basis functions are referred to as Radonlets.

Figure 4.6. Example of a resampling operation that is used effectively as a rotation operation for the DFB decomposition. (a) The "cameraman" image. (b) The "cameraman" image after being resampled.

4.3.3 Multiscale and directional decomposition

The directional filter bank (DFB) is designed to capture the high frequency components (representing directionality) of images. Therefore, low frequency components are handled poorly by the DFB. In fact, with the frequency partition shown in Figure 4.4, low frequencies would "leak" into several directional subbands, hence the DFB does not provide a sparse representation for images. To improve the situation, low frequencies should be removed before using the DFB. This provides another reason to combine the DFB with a multiresolution scheme.

Therefore, the LP permits further subband decomposition to be applied on its bandpass images. Those bandpass images can be fed into a DFB so that directional information can be captured efficiently. The scheme can be iterated repeatedly on the coarse image (see Figure 4.8). The end result is a double iterated filter bank structure, named pyramidal directional filter bank (PDFB), which decomposes images into directional subbands at multiple scales. The scheme is flexible since it allows for a different number of directions at each scale.

With perfect reconstruction LP and DFB, the PDFB is obviously perfect reconstruction, and thus it is a frame operator for 2-D signals. The PDFB has the same redundancy
Figure 4.7. Impulse responses of 32 equivalent filters for the first half channels of a 6-level DFB that uses the Haar filters. Black and gray squares correspond to +1 and -1, respectively. Because the basis functions resemble "local lines", we call them Radonlets.

Figure 4.8. Pyramidal directional filter bank. (a) Block diagram. First, a standard multiscale decomposition into octave bands is computed, where the lowpass channel is subsampled while the highpass is not. Then, a directional decomposition with a DFB is applied to each highpass channel. (b) Resulting frequency division, where the number of directions is increased with frequency.
as the LP: up to 33% when subsampling by two in each dimension. Combining the tight frame and orthogonal conditions for the LP and DFB, respectively, it is easy to obtain the following result for the PDFB [12].

Proposition 4.3.1 The PDFB is a tight frame with frame bounds equal to 1 when orthogonal filters are used in both the LP and the DFB.

Let us point out that there are other multiscale and directional decompositions such as the cortex transform [18] and the steerable pyramid [19]. Our PDFB differs from those
in that it allows a different number of directions at each scale while nearly achieving critical sampling. In addition, we make the link to the continuous-domain construction in Section 4.4.

4.3.4 PDFB for curvelets

Next we will demonstrate that a PDFB where the number of directions is doubled at every other finer scale in the pyramid satisfies the key properties of curvelets discussed in Section 4.2.1. That is, we apply a DFB with $\lfloor n_0 - j/2 \rfloor$ levels, or $2^{\lfloor n_0 - j/2 \rfloor}$ directions, to the bandpass image $b_j$ of the LP. Thus, the PDFB provides an efficient discrete implementation for the curvelet transform.
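The rule for the number of DFB levels can be tabulated to see the "doubling at every other scale" behaviour and the resulting anisotropy. The sketch below is illustrative: `pdfb_level` is a hypothetical helper, the value $n_0 = 5$ is an arbitrary choice, and the support sizes use the approximate values given in this section (LP support of side about $2^j$, DFB support of length about $2^{n_0 - j/2}$ and width about 1) with all constants set to 1.

```python
import math

def pdfb_level(j, n0):
    """Directions and approximate support of a PDFB basis function at level j."""
    dfb_levels = math.floor(n0 - j / 2)
    directions = 2 ** dfb_levels
    width = 2.0 ** j              # LP support: square of side about 2^j
    length = 2.0 ** (n0 + j / 2)  # 2^j times the DFB length 2^(n0 - j/2)
    return directions, width, length

n0 = 5  # arbitrary illustrative value
table = [pdfb_level(j, n0) for j in range(1, 5)]      # j = 1 is the finest level
directions = [row[0] for row in table]                # [16, 16, 8, 8]
ratios = [w / (l * l) for (_, w, l) in table]         # width / length^2 per level
```

The ratio `width / length**2` is the same at every level, which is the curve scaling relation width proportional to length squared.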
Figure 4.9. Resulting frequency division by a pyramidal directional filter bank for the curvelet transform. As the scale is refined from coarse to fine, the number of directions is doubled at every other octave band.

An LP, with downsampling by two in each direction, is taken at every level, providing an octave-band decomposition: the LP bandpass image $b_j$ at the level $j$ creates a subband with a corona support based on the interval $[\pi 2^{-j}, \pi 2^{-j+1})$, for $j = 1, 2, \dots, J$. Combining this with a directional decomposition by a DFB, we obtain the frequency tiling for curvelets as shown in Figure 4.9.

In terms of basis functions, a coefficient in the LP subband $b_j$ corresponds to a basis function that has local support in a square of size about $2^j$. Then, a basis function from a DFB with $\lfloor n_0 - j/2 \rfloor$ iterated levels has support in a rectangle of length about $2^{n_0 - j/2}$ and width about 1. Therefore, in the PDFB, a basis function at the pyramid level $j$ has support as:

$$
\text{width} \approx 2^j \quad \text{and} \quad \text{length} \approx 2^j\, 2^{n_0 - j/2} = 2^{n_0}\, 2^{j/2}
\tag{3.3}
$$

which clearly satisfies the anisotropy scaling relation (2.1) of curvelets.

Figure 4.10 graphically depicts this property of a PDFB implementing a curvelet transform. As can be seen from the two pyramidal levels shown below, the support size of the LP is reduced by four times while the number of directions of the DFB is doubled. With this, the support sizes of the PDFB basis images change from one level to the next in accordance with the curve scaling relation. Also note that in this representation, as the scale gets finer, there are more directions.

4.4 MULTIRESOLUTION ANALYSIS

As for the wavelet filter bank, the iterated PDFB can be associated with a continuous-domain system, which we call contourlet. This connection will be made precise by studying
Figure 4.10. Illustration of the contourlet basis images that satisfy the curve scaling relation. From the upper line to the lower line, the scale is reduced by four while the number of directions is doubled.
the embedded grids of approximation as in the multiresolution analysis for wavelets [20, 21]. The new elements are multiple directions and the combination with multiscale.

4.4.1 Multiscale

Suppose that the LP in the PDFB uses orthogonal filters and downsampling by two is taken in each dimension. Under certain conditions, the lowpass filter $G$ in the LP uniquely defines an orthogonal scaling function $\phi(t) \in L^2(\mathbb{R}^2)$ via the two-scale equation [22, 3]

$$
\phi(t) = 2 \sum_{n \in \mathbb{Z}^2} g[n]\, \phi(2t - n).
$$

Denote

$$
\phi_{j,n} = 2^{-j}\, \phi\!\left(\frac{t - 2^j n}{2^j}\right), \qquad j \in \mathbb{Z},\ n \in \mathbb{Z}^2.
\tag{4.1}
$$
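A concrete instance of the two-scale equation is easy to check numerically. The sketch below assumes the simplest orthogonal choice, which is not necessarily the filter used in the text: $\phi$ is the indicator function of the unit square and $g[n] = 1/2$ on $\{0,1\}^2$. The names `phi`, `G` and `two_scale_rhs` are hypothetical.

```python
def phi(t1, t2):
    """Indicator of the unit square [0, 1) x [0, 1): a 2-D Haar scaling function."""
    return 1.0 if (0.0 <= t1 < 1.0 and 0.0 <= t2 < 1.0) else 0.0

# Assumed lowpass filter: g[n] = 1/2 on {0, 1}^2, so that sum g[n]^2 = 1.
G = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}

def two_scale_rhs(t1, t2):
    """Right-hand side 2 * sum_n g[n] * phi(2t - n) of the two-scale equation."""
    return 2.0 * sum(g * phi(2.0 * t1 - n1, 2.0 * t2 - n2)
                     for (n1, n2), g in G.items())

# Compare both sides on a grid that extends slightly beyond the unit square.
points = [(-0.5 + k / 8.0, -0.5 + m / 8.0) for k in range(17) for m in range(17)]
mismatch = max(abs(phi(t1, t2) - two_scale_rhs(t1, t2)) for t1, t2 in points)
```

The factor 2 in front of the sum is the square root of the determinant of the dilation matrix $2I$ in two dimensions, which is why unit-norm filter coefficients of size $1/2$ reproduce the indicator exactly.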
Then the family $\{\phi_{j,n}\}_{n \in \mathbb{Z}^2}$ is an orthonormal basis of $V_j$ for all $j \in \mathbb{Z}$. The sequence of nested subspaces $\{V_j\}_{j \in \mathbb{Z}}$ satisfies the following invariance properties:
f{t) e Vj ^
Scale invariance:
f{t) G Vj 2
(4.11)
(4.12)
p i l W = />*!!(«-2^Sl"n) Proof: By direct substitution and a change of variable.
O
Consequently, the subspaces W^j^\ ^ satisfy the following shift invariant property:
/WeW^ilV,
^
/(t - 2^sj,m{t)
1=0 n
= E
\mGZ2
/
( E E ffl"i2"+’^’i/’i"’ - "11 ’^^’"(*)
1 is a given constant, then we treat the current stencil as a smooth stencil. Otherwise, we conclude that there are discontinuities contained in it. The choice of the constant r depends on the grid size $\Delta x$, and also on the intensity of the jumps. In fact, the ratio between a high frequency coefficient at the rough regions and that at the smooth regions is of the order of $|[f^{(m)}(x)]|\, O(\Delta x^{m-p})$. When $\Delta x$ becomes small, this ratio is large. We can choose r as any number such that $(1 + O(\Delta x))$ r$|\beta_{j,i-1}|$ for each stencil.
ENO-WAVELETS
The above described detection method may not be reliable if the function is polluted by noise, especially when the noise is "large". This is because the high frequency coefficients $\beta$ may not be able to measure the correct order of smoothness of the functions. Indeed, the high frequency coefficients have the order $|f^{(p)}(x) + a\, n^{(p)}(x)|\, O(\Delta x^p)$, where $n(x)$ is the random noise and $a$ a positive number indicating the noise level. In general, the derivatives of the noise $n^{(p)}(x)$ have large values. The noise term $a\, n^{(p)}(x)$ can dominate the function term $f^{(p)}(x)$ if the noise level $a$ is large, and thus the high frequency coefficients $\beta$ may not be able to detect certain discontinuities, e.g. if the jump is small or the discontinuity is in the higher derivatives. In this situation, we need to use heuristics to locate the exact position of the essential discontinuities [11].
5.2.3 A Simple Example

In the last part of this section, to better illustrate the idea of the construction of ENO-wavelet transforms, we give a simple example in the ENO-Haar case. We consider computing the transform coefficients of the following initial data containing two discontinuities, at [0,1] and [2,10] respectively:

(0 0 0 1 1 1 2 10 11 12).

The standard Haar produces the low and high frequency coefficients (with the high frequency coefficient of a pair taken as first sample minus second sample, divided by $\sqrt 2$):

$$
\alpha = \left(0,\ \tfrac{1}{\sqrt 2},\ \tfrac{2}{\sqrt 2},\ \tfrac{12}{\sqrt 2},\ \tfrac{23}{\sqrt 2}\right), \qquad
\beta = \left(0,\ -\tfrac{1}{\sqrt 2},\ 0,\ -\tfrac{8}{\sqrt 2},\ -\tfrac{1}{\sqrt 2}\right).
$$

We notice that, compared to their neighbors, there are two relatively large high frequency coefficients corresponding to the two jumps. The corresponding linear approximation obtained by setting $\beta = 0$ is

(0 0 0.5 0.5 1 1 6 6 11.5 11.5),

which does not recover the discontinuities correctly.

Using the ENO-Haar wavelet, we break the initial data sequence into three smooth pieces as shown in the following two rows:

$$
\begin{pmatrix}
  &   & y & 1 & 1 & 1 & 2 & w  &    &    \\
0 & 0 & 0 & x &   &   & z & 10 & 11 & 12
\end{pmatrix}
$$

where $x$, $y$, $z$ and $w$ are some smooth extensions of the corresponding pieces. In fact, we extend $x$ in a way such that the low frequency coefficient $\tilde\alpha_2$ (boxed in (2.13)) based on the stencil $(0, x)$ is the same as the previous $\alpha_1$, which is based on the stencil $(0, 0)$, giving $x = 0$. Similarly, we extend $y$ in a way such that the high frequency coefficient $\beta_2$ (boxed in (2.13)) is zero, giving $y = 1$. Therefore we compute the high frequency coefficient $\tilde\beta_2$ based on the stencil $(0, x)$ and the low frequency coefficient $\alpha_2$ based on the stencil $(y, 1)$ by using the corresponding standard filters, giving $\tilde\beta_2 = 0$ and $\alpha_2 = \frac{2}{\sqrt 2}$. Similarly, we determine $w = 0$ according to $\tilde\alpha_4 = \alpha_3$ (boxed in (2.13)), then compute $\tilde\beta_4 = \frac{2}{\sqrt 2}$, and $z = 10$ by $\beta_4 = 0$ and then $\alpha_4 = \frac{20}{\sqrt 2}$. Thus we have the coefficients:

$$
\begin{aligned}
\alpha &= \left(0,\ \boxed{0},\ \tfrac{2}{\sqrt 2},\ \tfrac{2}{\sqrt 2},\ \boxed{\tfrac{2}{\sqrt 2}},\ \tfrac{20}{\sqrt 2},\ \tfrac{23}{\sqrt 2}\right), \\
\beta &= \left(0,\ 0,\ \boxed{0},\ 0,\ \tfrac{2}{\sqrt 2},\ \boxed{0},\ -\tfrac{1}{\sqrt 2}\right).
\end{aligned}
\tag{2.13}
$$

Since we know how we extended $\tilde\alpha_2$, $\beta_2$, $\tilde\alpha_4$ and $\beta_4$, we do not need to store them. In fact, we just need to store the low and high frequency coefficients as:

$$
\alpha = \left(0,\ \tfrac{2}{\sqrt 2},\ \tfrac{2}{\sqrt 2},\ \tfrac{20}{\sqrt 2},\ \tfrac{23}{\sqrt 2}\right), \qquad
\beta = \left(0,\ 0,\ 0,\ \tfrac{2}{\sqrt 2},\ -\tfrac{1}{\sqrt 2}\right),
$$

which have the same storage scheme as the standard Haar wavelet transform. When we reconstruct the linear approximation, we can first recover $\tilde\alpha_2$, $\beta_2$, $\tilde\alpha_4$ and $\beta_4$ in the same way as in the forward transform, and then apply the standard inverse filters to the smooth data to build the approximation. In fact, in this case the linear approximation is

(0 0 0 1 1 1 1 10 11.5 11.5).

We notice that the first discontinuity is perfectly retained, and the second one is more accurate than that of the standard transform, although it is not exactly recovered. More importantly, this approximation preserves the discontinuities sharply, in contrast to the standard Haar wavelet which takes the average at the discontinuity.

We would like to close this section by making the following two remarks. (i) The ENO-wavelet transforms are just simple modifications of the standard wavelet transforms near discontinuities. The computational complexity of the algorithms remains $O(n)$ and they are relatively easy to implement. (ii) Like other wavelet transforms, 2-dimensional or even higher dimensional transforms can be formed by tensor products. In the numerical example section, we will give a 2-dimensional example.
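The example can be replayed in a short script. The sketch below is a minimal illustration of the ENO-Haar idea for this data set only: the positions of the pairs containing a jump are passed in by hand (no detection is attempted), jumps are assumed not to fall in the first pair or in two consecutive pairs, and the high frequency coefficient of a pair is taken as first sample minus second sample divided by $\sqrt 2$. Function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def pairs(data):
    return list(zip(data[0::2], data[1::2]))

def haar_linear_approx(data):
    """Standard Haar: keep the low frequency coefficients, set beta to zero."""
    out = []
    for u, v in pairs(data):
        alpha = (u + v) / SQRT2
        out += [alpha / SQRT2, alpha / SQRT2]
    return out

def eno_haar_forward(data, jump_pairs):
    """Stored (alpha, beta) of the ENO-Haar transform for known jump locations."""
    alpha, beta = [], []
    for i, (u, v) in enumerate(pairs(data)):
        if i in jump_pairs:
            x = SQRT2 * alpha[-1] - u       # extend left piece: repeat previous alpha
            beta.append((u - x) / SQRT2)    # beta-tilde, kept in the high frequency slot
            alpha.append(2.0 * v / SQRT2)   # extend right piece with y = v, so beta = 0
        else:
            alpha.append((u + v) / SQRT2)
            beta.append((u - v) / SQRT2)
    return alpha, beta

def eno_haar_linear_approx(alpha, jump_pairs):
    """Inverse transform with all stored high frequency coefficients set to zero."""
    out = []
    for i, a in enumerate(alpha):
        if i in jump_pairs:
            out += [alpha[i - 1] / SQRT2, a / SQRT2]  # left piece, right piece
        else:
            out += [a / SQRT2, a / SQRT2]
    return out

data = [0, 0, 0, 1, 1, 1, 2, 10, 11, 12]
jumps = {1, 3}  # zero-based indices of the pairs (0,1) and (2,10)
standard = haar_linear_approx(data)
alpha, beta = eno_haar_forward(data, jumps)
eno = eno_haar_linear_approx(alpha, jumps)
```

Up to rounding, `standard` is (0 0 0.5 0.5 1 1 6 6 11.5 11.5) and `eno` is (0 0 0 1 1 1 1 10 11.5 11.5), matching the two linear approximations of the example. Each pair is visited once, so the cost stays linear in the number of samples.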
5.3 THEORY: ERROR BOUND AND STABILITY

In this section, we present the ENO-wavelet approximation error bound for piecewise continuous functions and the stability of the algorithm. We do not give the proofs; they can be found in [11] and [46].

Given a function $f(x)$ in $L^2$, in standard wavelet theory [38] [21] [42], it can be linearly approximated by its projection $f_j(x)$ in $V_j$ as in (2.6) and (2.7). This linear approximation has a standard error estimate which we state in the following theorem, see also [42].

Theorem 5.3.1 Suppose the wavelet $\psi(x)$ generated by the scaling function $\phi(x)$ has $p$ vanishing moments, and $f_j(x)$ is the approximation of $f(x)$, which has bounded $p$-th order derivative, in $V_j$ with basis $\phi_{j,k}(x)$; then, (3.1)
\\S(x)-SA^)\\, (1.4)
INTRODUCTION AND BACKGROUND
where $Z$ is the normalizing constant, $U_\lambda$ is the energy function that depends on a hyperparameter $\lambda$ (or a set of hyperparameters, in some cases) and takes the form

$$
U_\lambda(f) = \lambda \sum_i U_i(f).
\tag{1.5}
$$

Here each $U_i(f)$ is a local energy function depending on the value of $f(i)$ and the values of $f(k)$ for $k \in \partial(i)$, where $\partial(i)$ is a neighborhood system of pixel $i$. A commonly used family of priors is the generalized Gaussian MRF (GGMRF) [2], which enjoys the nice property of convexity. The local energy function for GGMRF is

$$
U_i^{GG}(f) = \sum_{k \in \partial(i)} w_{i,k}\, |f(i) - f(k)|^p, \qquad 1 \le p \le 2,
\tag{1.6}
$$

where $w_{i,k} > 0$ are the weights that control the space resolution properties [17]. Some non-convex energy functions are also studied. An important example of such energies is the thin plate (TP) prior, which has the local energy function given by

$$
U_i^{TP}(f) = f_{hh}(i)^2 + 2 f_{hv}(i)^2 + f_{vv}(i)^2.
\tag{1.7}
$$

The TP energy function is a discrete approximation of the bending energy of a thin plate. Here and hereafter we denote by $f_h(i)$, $f_v(i)$, $f_{hh}(i)$, $f_{hv}(i)$ and $f_{vv}(i)$ the discretized versions of the first and second order partial derivatives of an image $f$ at pixel $i$ along the vertical or horizontal directions. Both of these models are based on the assumption that images are equally smooth in all directions and therefore they tend to smooth out local regions uniformly, causing loss of edge information. Other priors, such as the weak membrane (WM) and weak plate (WP) priors, extend the smoothing models by allowing for spatial discontinuities [18-21]. In these models, line processes [22] are introduced. For example, the local energy for WM can be written as

$$
U_i^{WM}(f) = (1 - l_v(i))\, f_h(i)^2 + (1 - l_h(i))\, f_v(i)^2 + \alpha\,(l_h(i) + l_v(i)).
\tag{1.8}
$$

In this function, the binary variables $l_h$ and $l_v$ form horizontal and vertical line processes, respectively. The last term in the energy function penalizes the creation of the discontinuities, charging an amount of $\alpha$ at each such site. A disadvantage of such a prior is that it dramatically increases the computational complexity due to the presence of binary variables.

When a prior distribution is combined with data $y$, one obtains the posterior distribution:

$$
P(f \mid y, \lambda) = \frac{P(y \mid f)\, P(f \mid \lambda)}{P(y \mid \lambda)}.
\tag{1.9}
$$

The MAP approach [22] in this framework is to estimate the parameter $f$ by maximizing the posterior probability, or equivalently to find

$$
f^* = \arg\min \left[-L(y|f) + U_\lambda(f)\right].
\tag{1.10}
$$

This minimization problem is commonly solved using iterative methods. Many algorithms have been developed in recent years. Most of them are successors of the original EM algorithm [14]. An alternative approach is the ICD method [11], in which the pixels are updated sequentially. Given the current estimate $f^{(k)}$, the update for the $i$th pixel is given by
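$$
f_i^{(k+1)} = \arg\min_{z \ge 0} \left[-L(y|f) + U_\lambda(f)\right]\Big|_{f = f_{i,z}^{(k)}},
\tag{1.11}
$$

where $f_i^{(k+1)} = f^{(k+1)}(i)$ and

$$
f_{i,z}^{(k)} = \left\{ f_1^{(k+1)}, \dots, f_{i-1}^{(k+1)}, z, f_{i+1}^{(k)}, \dots \right\}.
\tag{1.12}
$$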
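The sequential update can be illustrated on a toy problem where each one-pixel minimization has a closed form. The sketch below makes several simplifying assumptions that are not taken from the text: a 1-D signal, a Gaussian data term $-L(y|f) = \frac12 \sum_i (y_i - f_i)^2$ with identity projection, and a quadratic nearest-neighbor prior with unit weights. The function names are hypothetical.

```python
def objective(f, y, lam):
    """Toy objective: Gaussian data term plus quadratic nearest-neighbor prior."""
    data_term = 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    prior = lam * sum((f[i] - f[i + 1]) ** 2 for i in range(len(f) - 1))
    return data_term + prior

def icd_sweep(f, y, lam):
    """One pass of sequential pixel updates; f is modified in place."""
    n = len(f)
    for i in range(n):
        # Pixels before i already hold their new values, pixels after i the old ones.
        nbrs = [f[k] for k in (i - 1, i + 1) if 0 <= k < n]
        z = (y[i] + 2.0 * lam * sum(nbrs)) / (1.0 + 2.0 * lam * len(nbrs))
        f[i] = max(0.0, z)  # nonnegativity constraint z >= 0
    return f

y = [0.0, 0.0, 5.0, 5.0, -1.0, 4.0]
f = [0.0] * len(y)
history = [objective(f, y, 0.5)]
for _ in range(20):
    icd_sweep(f, y, 0.5)
    history.append(objective(f, y, 0.5))
```

Because each pixel update minimizes a one-dimensional convex quadratic under the constraint $z \ge 0$, the recorded objective values in `history` never increase from one sweep to the next.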
6.2 MATERIALS AND METHODS

6.2.1 A Mechanical Image Model

Since the choice of the prior energy function $U_\lambda(f)$ is crucial for reconstructing tomographic images, we try to understand the principle for the design of image priors by building a mechanical image model. We consider the transverse motion of a surface consisting of a collection of pixtrons. A single pixtron is an analogue of a particle in physics with unknown mass (or charge) which may vary with respect to the time $t$. The image intensity $f(x,t)$ at location $x \in \Omega \subset \mathbb{R}^2$ and time $t \in \mathbb{R}$ is modeled as the vertical displacement of a pixtron, or the average of the displacements of several pixtrons. For a fixed time $t$, we will often omit the variable $t$ and denote the image by $f(x)$ instead of $f(x,t)$.

A tomographic reconstruction from given tomographic data $y$ is interpreted, in this mechanical model, as a computer experiment in which the system of pixtrons is placed inside a field of potential energy given by $L(y|f)$. The image prior $U_\lambda(f)$ is then the kinetic energy of the system of pixtrons

$$
U_\lambda(f,t) = \frac{1}{2} \int_\Omega \lambda(x,t) \left|\frac{\partial f}{\partial t}(x,t)\right|^2 dx,
\tag{2.1}
$$

where the nonnegative function $\lambda(x,t)$ is the mass of the pixtron at location $x \in \Omega$ and time $t$.

If the tomographic data $y$ is deterministic and complete, each pixtron is trapped by the potential energy at its equilibrium position such that $Pf = y$. Under such a circumstance, the image prior $U_\lambda(f,t)$ is redundant, and we can consider that pixtrons are massless, $\lambda(x,t) = 0$, or equivalently, that the image $f(x,t)$ is independent of time, that is, $\partial f / \partial t = 0$. From either point of view, the total kinetic energy at any given time $t$ is zero.

If the tomographic data $y$ is random or incomplete, certain pixtrons in the system gain freedom of motion along the vertical direction. In this case, the least action principle for dynamical systems asserts that the system of pixtrons always tends toward its lowest energy configuration, and therefore, the optimal estimate $\hat f$ of the reconstruction is

$$
\hat f(x,t) = \arg\min_{f \ge 0} \int_0^T \left[-L(y|f) + U_\lambda(f,t)\right] dt,
\tag{2.2}
$$
for a given positive number $T \in \mathbb{R}$. For given boundary conditions of $f(x,t)$ at $t = 0$ and $t = T$, solving the minimization problem (2.2) with the total variation method leads to a wave equation for $f(x,t)$. The solution of this equation describes precisely the deformation process of the image from $f(x,0)$ to $f(x,T)$ during the time interval $[0,T]$.

In tomographic reconstruction, our goal is to find the final image $f(x,T) = f^*(x)$ given by the Bayesian principle (1.10) with an initial guess $f(x,0)$. Due to the fact that the boundary condition $f(x,T) = f^*(x)$ is indeterminate, the variational problem (2.2) itself is ill-posed. We thus weaken the least action principle in the following two aspects.
First, instead of using the total variation method to solve the dynamical problem, we impose an a priori conservation law for the Lagrangian formulation of the variational problem. The choice of such a prior can be made from our knowledge of image processing. For instance, from an axiomatic approach of the multiscale analysis for image processing [10], we can assume that the velocity field $\frac{\partial f}{\partial t}(x,t)$ is of the form:

$$
\frac{\partial f}{\partial t} = F(\nabla^2 f, \nabla f, t),
\tag{2.3}
$$

for a continuous function $F$. In the next subsection, we will consider the kinetic energy induced from a level-set evolution of the image $f(x,t)$.

Second, we assume that the time-dependent energy function $U_\lambda(f,t)$ with the velocity field given by (2.3) possesses the following growth condition:

$$
U_\lambda(f, t_1) \ge U_\lambda(f, t_2) \quad \text{whenever } t_1 < t_2,
\tag{2.4}
$$

for each fixed function $f(x)$ independent of $t$. We then apply the direct minimizing method to approximate the solution of (2.2) with the boundary conditions $f(x,0) = f^{(0)}(x)$, which is an initial guess, and $f(x,T) = f^*(x)$, which satisfies (1.10). We divide the interval $[0,T]$ into $K$ subintervals by introducing the points:

$$
0 = t^{(0)} < t^{(1)} < \dots < t^{(K-1)} < t^{(K)} = T,
\tag{2.5}
$$

and replace the function $f(\cdot, t)$ by the "polygonal line" with vertices

$$
(t^{(0)}, f^{(0)}),\ (t^{(1)}, f^{(1)}),\ \dots,\ (t^{(K-1)}, f^{(K-1)}),\ (t^{(K)}, f^{(K)}),
\tag{2.6}
$$

where $f^{(k)}(x) = f(x, t^{(k)})$. We claim that if $\{f^{(k)} : k = 0, \dots, K\}$ is the minimizer of the function

$$
J(f^{(0)}, \dots, f^{(K)}) = \sum_{k=0}^{K} \left[-L(y|f^{(k)}) + U_\lambda(f^{(k)}, t^{(k)})\right],
\tag{2.7}
$$
then it is necessary to satisfy the following monotonicity condition: [ - ^ y l / C ’ - " )+ {/,(/) + [/A(/"=\ -L{y\f^’^) + Ux{f^’\t^’’^),
(2.10)
where we have used the fact that L{y\f^^~^^) = L{y\f^^’^)since L{y\f) is independent of t. This shows that J ( / < ’ ’ \ . . . , / ’ ^ ’ ) > ^ ( / < ’ ’ \ . . . , / " ’ - ’ ’ , / " = ’ , / < ’ ’ + ’ * , , . . . , / ,< ’ ’ ’ ) which contradicts the assumption that {/^’^^: A: = 0,...,K}
(2.11)
is the minimizer of
In practice, because the desired boundary condition $f^*(x)$ is unknown, one often searches for a sequence $\{f^{(k)} : k = 0, \dots, K\}$ only satisfying the monotonicity condition (2.8) without requiring $f^{(K)}(x) = f^*(x)$. In particular, the sequence obtained from the ICD method (1.11) can be viewed as such a sequence.
6.2.2 Kinetic Energy Induced from Level-Set Evolution

In this subsection, we propose a novel image prior for Bayesian tomographic reconstruction. The prior is based on the mechanical image model discussed in the previous subsection and the level-set evolution driven by the mean curvature motion. For each fixed $c \in \mathbb{R}$, the $c$-level set $\Gamma_c$ of $f(x,t)$ at time $t = 0$ is defined by

$$
\Gamma_c = \{x \in \Omega : f(x, t = 0) = c\}.
\tag{2.12}
$$

We consider the evolution of $\Gamma_c$ due to the transverse motion of pixtrons in $\Omega$. To do this, we let $x(t)$ be a differentiable trajectory of a point $x$ on $\Gamma_c$ for which the following equation is satisfied:

$$
\frac{dx}{dt}(t) = \beta\, \frac{\nabla f(x(t),t)}{|\nabla f(x(t),t)|},
\tag{2.13}
$$

with a speed function $\beta$. Substituting the expression in the equation

$$
\frac{\partial f}{\partial t}(x(t),t) + \nabla f(x(t),t) \cdot \frac{dx}{dt}(t) = 0,
\tag{2.14}
$$

we obtain

$$
\frac{\partial f}{\partial t} = -\beta\, |\nabla f|.
\tag{2.15}
$$
An important case is $\beta = 1$, which means the curve $x(t)$ is moving along the normal direction at unit speed. We also assume that all the pixtrons have the same mass $2\lambda_G(t)$ at a given time $t$. In this case, the kinetic energy (2.1) is reduced to

$$
U_{\lambda_G}(f,t) = \frac{1}{2} \int_\Omega 2\lambda_G(t) \left|\frac{\partial f}{\partial t}(x,t)\right|^2 dx = \lambda_G(t) \int_\Omega |\nabla f(x,t)|^2\, dx.
\tag{2.16}
$$
In practice, the Gaussian MRF (GMRF) with the local energy function given by (1.6) with $p = 2$ can be considered as an implementation of (2.16). It is well known that the GMRF is not efficient in preserving edges (see [2]). In order to preserve edges in the image $f$ via MAP reconstruction, we consider the mean curvature evolution of level sets [7]. The mean curvature motion drives each level set $\Gamma_c$ of $f$ at a speed proportional to its normal mean curvature field $\kappa_{\Gamma_c}$. By a straightforward computation [23, §5.4.5], we have
$$\kappa_{\Gamma_c} = \frac{\nabla_\xi^2 f}{|\nabla f|}, \qquad (2.17)$$
where
$$\nabla_\xi^2 f = |\nabla f|\, \nabla \cdot \left( \frac{\nabla f}{|\nabla f|} \right) \qquad (2.18)$$
is the second directional derivative along the direction $\xi$ orthogonal to the gradient of $f$. It then follows from (2.15) with $\beta = \kappa_{\Gamma_c}$ that
$$\frac{\partial f}{\partial t} = -\nabla_\xi^2 f. \qquad (2.19)$$
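Assuming the spatially homogeneous mass distribution $2\lambda_E(t)$ of pixtrons, the kinetic energy induced from the level-set evolution driven by the mean curvature motion is given by
$$U_{\lambda_E}(f,t) = \frac{1}{2} \int_\Omega 2\lambda_E(t) \left| \frac{\partial f}{\partial t}(x,t) \right|^2 dx = \lambda_E(t) \int_\Omega \left[ \nabla_\xi^2 f(x,t) \right]^2 dx. \qquad (2.20)$$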
For MAP reconstruction, $U_{\lambda_E}(f,t)$ allows us to smooth the images selectively, so that it only encourages smoothing in the directions along edges while penalizing any attempt at blurring across edges. In particular, it favors piecewise regions where the image distribution $f$ is of the form $u_{a,b}(x) = u(a \cdot x + b)$ for some function $u$, constant vector $a$ and constant scalar $b$, due to the fact that $\nabla_\xi^2 u_{a,b} = 0$. For edge-preserving image denoising, one needs the joint effort of both $U_{\lambda_G}(f)$ and $U_{\lambda_E}(f)$; we thus propose the following image prior based on our image model:
$$U_\lambda(f) = U_{\lambda_G}(f) + U_{\lambda_E}(f) = \lambda \int_\Omega \left( \mu |\nabla f(x)|^2 + (1 - \mu) \left( \nabla_\xi^2 f(x) \right)^2 \right) dx, \qquad (2.21)$$
where $\lambda = \lambda_G + \lambda_E$ and $\mu = \lambda_G / (\lambda_G + \lambda_E)$.
6.2.3 Numerical Implementations

To compute the energy function $U_\lambda(f)$ numerically, we first discretize the second-order directional derivative using the 8-point neighborhood system at each pixel $x \in \Omega$. For an arbitrary fixed point $x = (x_1, x_2)$ and a direction vector $(a,b)$ with $a^2 + b^2 = 1$, the second-order directional derivative of $f$ along the direction $(a,b)$ is given by the second-order derivative of the function $g(t) = f(x_1 + at, x_2 + bt)$:
$$\frac{d^2 g}{dt^2}(0) = a^2 \frac{\partial^2 f}{\partial x_1^2}(x) + 2ab \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) + b^2 \frac{\partial^2 f}{\partial x_2^2}(x). \qquad (2.22)$$
We approximate this derivative using a 9-point difference formula:
$$\frac{d^2 g}{dt^2}(0) \approx \sum_{k=-1}^{1} \sum_{l=-1}^{1} \alpha_{k,l}\, f(x_1 + kh, x_2 + lh), \qquad (2.23)$$
where $h$ is the step size in both directions. To determine the values of $\alpha_{k,l}$, we substitute the fourth-order Taylor series of $f(x_1 + kh, x_2 + lh)$, $k, l = -1, 0, 1$, into (2.23) and compare the coefficients of the partial derivatives in this expression to those in (2.22). By solving the resulting system of linear equations we obtain
$$\alpha_{0,0} = -\frac{2c}{h^2}, \quad \alpha_{0,1} = \alpha_{0,-1} = \frac{c - a^2}{h^2}, \quad \alpha_{1,0} = \alpha_{-1,0} = \frac{c - b^2}{h^2},$$
$$\alpha_{1,1} = \alpha_{-1,-1} = \frac{1 + ab - c}{2h^2}, \quad \alpha_{1,-1} = \alpha_{-1,1} = \frac{1 - ab - c}{2h^2},$$
where $c$ is a free constant (we have 9 unknowns and 8 equations). The remainder of this approximation is $R(x')h^2$ for some point $x' = (x_1', x_2')$, where $R(x')$ is given by
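$$R(x') = \frac{1 - c}{2} \frac{\partial^4 f}{\partial x_1^2 \partial x_2^2}(x') + \frac{a^2}{12} \frac{\partial^4 f}{\partial x_1^4}(x') + \frac{ab}{3} \left[ \frac{\partial^4 f}{\partial x_1^3 \partial x_2}(x') + \frac{\partial^4 f}{\partial x_1 \partial x_2^3}(x') \right] + \frac{b^2}{12} \frac{\partial^4 f}{\partial x_2^4}(x').$$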
We then take $c = 1$ to eliminate the first term of $R(x')$. Suppose the image $f$ has the size $N = n \times n$. For each grid index $(i,j)$, $1 \le i,j \le n$, we let $k = in + j$. Then it follows from the above computations that the difference formula for the second-order directional derivative of pixel $k$ along the direction $(a,b)$ can be written as
$$D_{(a,b)}^2 f(k) = a^2 f_{hh}(k) + 2ab f_{hv}(k) + b^2 f_{vv}(k), \qquad (2.24)$$
where, with $f_{i,j} = f(i,j)$,
$$f_{hh}(k) = \frac{1}{h^2} \left( f_{i+1,j} - 2f_{i,j} + f_{i-1,j} \right); \qquad f_{vv}(k) = \frac{1}{h^2} \left( f_{i,j+1} - 2f_{i,j} + f_{i,j-1} \right);$$
$$f_{hv}(k) = \frac{1}{4h^2} \left( f_{i+1,j+1} - f_{i+1,j-1} - f_{i-1,j+1} + f_{i-1,j-1} \right).$$
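The difference formula (2.24) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the nested-list image layout `f[i][j]` and the default step size are assumptions made for the example, and `(a, b)` is assumed to be a unit vector.

```python
def second_directional_derivative(f, i, j, a, b, h=1.0):
    """Approximate the second derivative of f at pixel (i, j) along (a, b).

    Follows (2.24) with c = 1. The image is a nested list indexed as
    f[i][j]; (i, j) must be an interior pixel.
    """
    f_hh = (f[i + 1][j] - 2.0 * f[i][j] + f[i - 1][j]) / h**2
    f_vv = (f[i][j + 1] - 2.0 * f[i][j] + f[i][j - 1]) / h**2
    f_hv = (f[i + 1][j + 1] - f[i + 1][j - 1]
            - f[i - 1][j + 1] + f[i - 1][j - 1]) / (4.0 * h**2)
    return a * a * f_hh + 2.0 * a * b * f_hv + b * b * f_vv


# Quadratic test image 3x^2 + 2xy - y^2, for which the stencil is exact:
# f_xx = 6, f_xy = 2, f_yy = -2 at every pixel.
image = [[3.0 * x * x + 2.0 * x * y - y * y for y in range(5)] for x in range(5)]
value = second_directional_derivative(image, 2, 2, 0.6, 0.8)
```

For this quadratic image the result is $6a^2 + 4ab - 2b^2 = 2.8$ for $(a,b) = (0.6, 0.8)$, since central differences reproduce the second derivatives of a quadratic exactly.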
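We next take $(a,b)$ to be the direction vector $\xi$ perpendicular to the gradient of $f$ at $x = (x_1, x_2) \in \Omega$:
$$(a, b) = \frac{1}{\sqrt{\left( \frac{\partial f}{\partial x_1} \right)^2 + \left( \frac{\partial f}{\partial x_2} \right)^2}} \left( -\frac{\partial f}{\partial x_2},\ \frac{\partial f}{\partial x_1} \right). \qquad (2.25)$$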
To approximate the first-order partial derivatives with all 8 neighbors at each pixel $k$, we use the following difference formulas:
$$f_h(k) = \frac{1}{4h} \left[ f_{i+1,j} - f_{i-1,j} + \frac{1}{2} \left( f_{i+1,j+1} - f_{i-1,j-1} - f_{i-1,j+1} + f_{i+1,j-1} \right) \right],$$
$$f_v(k) = \frac{1}{4h} \left[ f_{i,j+1} - f_{i,j-1} + \frac{1}{2} \left( f_{i+1,j+1} - f_{i-1,j-1} + f_{i-1,j+1} - f_{i+1,j-1} \right) \right].$$
In our numerical computations, we let $\varepsilon$ be a small positive number close to the machine precision of double-precision floating point and use the following modified approximation of $\nabla_\xi^2 f$:
Therefore $f_{\xi\xi}(k)$ does not always agree with the value of $D_\xi^2 f(k)$ obtained from (2.24), in particular if $f_h(k)^2 + f_v(k)^2 = 0$.
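The following Python sketch combines (2.24), (2.25) and the 8-neighbor gradient formulas into a single edge-direction derivative. The exact form of the modified approximation is not legible in this text, so the sketch assumes the simplest choice, namely adding $\varepsilon$ to the denominator $f_h^2 + f_v^2$; the function name, the image layout `f[i][j]` and the default value of `eps` are likewise assumptions.

```python
def edge_direction_derivative(f, i, j, h=1.0, eps=1e-15):
    """Second derivative of f at (i, j) along the direction orthogonal to
    the gradient, with an assumed eps-regularized denominator."""
    # Second differences as in (2.24).
    f_hh = (f[i + 1][j] - 2.0 * f[i][j] + f[i - 1][j]) / h**2
    f_vv = (f[i][j + 1] - 2.0 * f[i][j] + f[i][j - 1]) / h**2
    f_hv = (f[i + 1][j + 1] - f[i + 1][j - 1]
            - f[i - 1][j + 1] + f[i - 1][j - 1]) / (4.0 * h**2)
    # First differences using all eight neighbors.
    diag_sum = f[i + 1][j + 1] - f[i - 1][j - 1]
    diag_diff = f[i + 1][j - 1] - f[i - 1][j + 1]
    f_h = (f[i + 1][j] - f[i - 1][j] + 0.5 * (diag_sum + diag_diff)) / (4.0 * h)
    f_v = (f[i][j + 1] - f[i][j - 1] + 0.5 * (diag_sum - diag_diff)) / (4.0 * h)
    # With (a, b) = (-f_v, f_h) / |grad f|, (2.24) becomes this quotient.
    numerator = f_v * f_v * f_hh - 2.0 * f_h * f_v * f_hv + f_h * f_h * f_vv
    return numerator / (f_h * f_h + f_v * f_v + eps)


# An image of the form u(a.x + b): the edge-direction derivative vanishes.
ridge = [[float((x + 2 * y) ** 2) for y in range(5)] for x in range(5)]
# A constant image: the gradient is zero and the regularized quotient is 0.
flat = [[1.0 for _ in range(5)] for _ in range(5)]
# x + y^2: gradient (1, 4) at the centre, edge-direction derivative 2/17.
mixed = [[float(x + y * y) for y in range(5)] for x in range(5)]
```

With this choice the function returns 0 wherever the discrete gradient vanishes, which is one way in which the regularized value can differ from the one obtained from (2.24).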
With the first term in (2.21) implemented as GMRF, which is given by (1.6) with $p = 2$, the discretization of (2.21) reads
$$U_\lambda(f) = \lambda \sum_{i=1}^{N} \left[ \mu \sum_{k \in \partial(i)} w_{i,k} \left[ f(i) - f(k) \right]^2 + (1 - \mu) \sum_{k \in \partial(i)} f_{\xi\xi}^2(k) \right], \qquad (2.27)$$
where $w_{i,k} = (4 + 2\sqrt{2})^{-1}$ for the nearest neighbors $k$ of $i$ and $w_{i,k} = (4 + 4\sqrt{2})^{-1}$ for diagonal neighbors $k$ of $i$ [2]. We note that the computation of the second inner sum of the energy function requires the values of 24 neighbors for each pixel.
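As a small illustration of the first inner sum in (2.27), the sketch below evaluates the weighted GMRF term at one pixel. The function and constant names are hypothetical; only the two weight values are taken from the text.

```python
import math

# Neighbor weights quoted in the text (from [2]).
W_NEAREST = 1.0 / (4.0 + 2.0 * math.sqrt(2.0))
W_DIAGONAL = 1.0 / (4.0 + 4.0 * math.sqrt(2.0))


def gmrf_local_energy(f, i, j):
    """Weighted sum of squared differences between pixel (i, j) and its
    eight neighbors; f is a nested list indexed as f[i][j]."""
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            weight = W_DIAGONAL if di != 0 and dj != 0 else W_NEAREST
            total += weight * (f[i][j] - f[i + di][j + dj]) ** 2
    return total


# A single bright pixel on a dark background.
spike = [[0.0] * 3 for _ in range(3)]
spike[1][1] = 1.0
```

The eight weights sum to one, so a unit spike on a flat background has local energy 1.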
Figure 6.1. An example of one-dimensional prior functions
In the implementation of our reconstruction algorithm, we have adapted the ICD approach (1.11), in which each pixel is updated in sequence by minimizing the corresponding one-dimensional log-posterior function. Since the prior is a nonconvex function, there could be many local minima and maxima in a given search interval. An example of one-dimensional prior functions is plotted in Figure 6.1, in which there are two minima and one maximum. Therefore in our current implementation, we apply Brent's method to search for a local minimum of the one-dimensional target function $-L(y|f) + U_\lambda(f)$. Brent's method only requires computation of the values of the target function. To further simplify the computations, we follow Bouman and Sauer [2] to approximate the log-likelihood function (1.2) by the quadratic function for the emission case:
$$L(y|f) \approx -\frac{1}{2} (y - Pf)^T D (y - Pf) + c(y), \qquad (2.28)$$
where $P$ is the forward projection matrix, $D = \mathrm{diag}\{y_i^{-1}\}$, and $c(y)$ is a function of the data independent of the parameter set $f$ and therefore can be ignored in later computations.

6.3 RESULTS AND DISCUSSION

6.3.1 Simulation Results

Our algorithm has been tested with the simulated emission data, which were posted on the web site http://dynamo.ecn.purdue.edu/~bouman/software/tomography/. This set of data was generated from a magnetic resonance imaging (MRI) reconstruction image. Figure 6.2(a) shows the original phantom of size 256 x 256 pixels. The cross-section was assumed to be of size 40.5 cm square with an average emission rate of 0.2mm~^ and a maximum emission rate of 0.7mm~^. The projection data were calculated at 128 evenly spaced angles, each with 256 parallel projections. The photon noise was simulated by Poisson random variables with the appropriate means.

We have performed three sets of tests. First of all, a good prior should be capable of preserving significant anatomic features of the original image, which is referred to as the morphological principle. Figure 6.2 shows the test results with noise-free projection
data and $\lambda = 1$. We observe that the MAP reconstruction with the Gaussian prior (2.16) destroys most of the significant edges after only 5 iterations, while the reconstruction with the proposed edge prior (2.20) keeps almost all sharp edges. After 100 iterations, the Gaussian image has almost become a single gray level, as shown in Figure 6.2(e), while the edge prior continues to preserve most of the significant features of the original image, as shown in Figure 6.2(f). The second test demonstrates the limitation of the static selection of the hyperparameters $\lambda$ and $\mu$. Figures 6.3(c)-(f) show the MAP reconstructions with 6 iterations, in which we have fixed $\lambda = 0.001$, and $\mu = 1.0$, 0.8, 0.5 and 0.2, respectively. The resulting images are either too fuzzy, as can be seen in Figures 6.3(c) and (d), because of a large amount of Gaussian smoothing, or too edgy, as shown in Figures 6.3(e) and (f), because the edge prior "enhances" every possible edge including the noise. One cause of this behavior is that Brent's method, as used in our current implementation, only searches for a "convenient" local minimum during the ICD updating.
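A single ICD pixel update under the quadratic approximation (2.28) can be sketched as follows. Writing $e = y - Pf$ and changing pixel $j$ by $\delta$ gives $-L \approx \mathrm{const} + \theta_1 \delta + \frac{1}{2} \theta_2 \delta^2$ with $\theta_1 = -P_j^T D e$ and $\theta_2 = P_j^T D P_j$, where $P_j$ is the $j$-th column of $P$. The sketch is not the authors' code: it uses a plain golden-section search as a simple derivative-free stand-in for Brent's method, and all function and parameter names are hypothetical. Like Brent's method, it only locates a local minimum inside the given interval.

```python
import math


def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Locate a local minimum of func on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


def icd_pixel_update(column, residual, counts, prior, f_j, lo, hi):
    """Update one pixel value f_j by minimizing the 1-D target function.

    column   : entries of the j-th column of P (hypothetical dense storage)
    residual : e = y - P f for the current image
    counts   : measured data y, so that D = diag(1 / y_i)
    prior    : callable returning the prior energy as a function of the pixel value
    """
    theta1 = -sum(p * e / y for p, e, y in zip(column, residual, counts))
    theta2 = sum(p * p / y for p, y in zip(column, counts))

    def target(value):
        delta = value - f_j
        return theta1 * delta + 0.5 * theta2 * delta * delta + prior(value)

    return golden_section_minimize(target, lo, hi)


# Toy data: theta1 = -1 and theta2 = 1.5, so without a prior delta = 2/3.
no_prior = icd_pixel_update([1.0, 2.0], [1.0, 1.0], [2.0, 4.0],
                            lambda v: 0.0, 1.0, 0.0, 5.0)
with_prior = icd_pixel_update([1.0, 2.0], [1.0, 1.0], [2.0, 4.0],
                              lambda v: 0.5 * (v - 1.0) ** 2, 1.0, 0.0, 5.0)
```

In a full reconstruction the residual would be updated after each pixel change, and the prior callable would evaluate the terms of (2.27) that depend on the pixel being updated.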
Table 6.1. Dynamic setting of hyperparameters for $U_\lambda(f)$

No. of Iterations    1      2      3       4        5        6
$\lambda$            0.2    0.02   0.002   0.0002   0.0002   0.0002
$\mu$                1.0    1.0    1.0     1.0      0.0      0.0

Finally, Figure 6.3(a) shows the filtered backprojection reconstruction without using any smoothing filter. The reconstruction contains strong noise by which significant details of anatomy are concealed. Figure 6.3(b) shows the MAP reconstruction after 6 iterations with the proposed prior and a dynamic setting of hyperparameters. The selection of the hyperparameters $\lambda$ and $\mu$ for this experiment is listed in Table 6.1. The strategy for the selection is to use the convexity of the Gaussian prior to obtain a fast descent of the reconstructed image to a neighborhood of a "global" minimum during the first few iterations, and then to take advantage of the edge prior to enhance edges of the image during the last 2 iterations. In comparison with published reconstructions from the same data set [24, 25, 5], the image in Figure 6.3(b) produced by the proposed algorithm contains richer and sharper edges as well as fewer artifacts. Further quantitative evaluation is still under way.

6.3.2 Discussion

Our purpose in introducing an explicit mechanical image model is to facilitate the adaptation of many efficient methods for image processing, using either nonlinear PDEs or total variation, in the context of Bayesian tomographic reconstruction. The two examples presented in this paper have shown that both old and new image priors can be derived from our mechanical image model. Other existing image prior models may be derived in a similar fashion (for example, the thin-plate spline model proposed in [3], see (1.7)). New image priors with different characters can also be derived based on this image model.
For instance, in order to further "enhance" edges during an image reconstruction, one can consider the "relative mass ratio" $\mu$ in (2.21) to be a function of $|\nabla f|$, which separates the behavior of pixtrons close to an edge from that of pixtrons in a smooth region of the image. Therefore, rather than using (2.21), we can investigate an image prior analogous to the anisotropic diffusion proposed for image processing by Alvarez, Lions and Morel [9]:
Figure 6.2. Tests with noise-free data (subfigures are labelled from top-left to bottom-right), (a) Original image, (b) Reconstruction with filtered-backprojection. (c)-(f) MAP reconstructions with (c) Gaussian prior, 5 iterations; (d) edge prior, 5 iterations; (e) Gaussian prior, 100 iterations; and (f) edge prior, 100 iterations
Figure 6.3. Tests with noisy data (subfigures are labelled from top-left to bottom-right). (a) Reconstruction with filtered-backprojection. (b) MAP reconstruction with the dynamic hyperparameter setting listed in Table 6.1. (c)-(f) MAP reconstructions with $\lambda = 0.001$, and (c) $\mu = 1.0$; (d) $\mu = 0.8$; (e) $\mu = 0.5$; and (f) $\mu = 0.2$
$$U_\lambda(f) = \lambda \int_\Omega \left( g(|G * \nabla f|)\, |\nabla f|^2 + \left[ 1 - g(|G * \nabla f|) \right] \left[ \nabla_\xi^2 f \right]^2 \right) dx, \qquad (3.1)$$
where the function $g(s) \ge 0$ is a nonincreasing function satisfying $g(0) = 1$, and $G$ is a convolution kernel (for example, a Gaussian function). The relative mass distribution $\mu = g(|G * \nabla f|)$ in (3.1) controls the kinetic energy of each pixtron: if $|\nabla f|$ has a small mean in a neighborhood of a pixel $x$, this pixel $x$ is considered an interior point of a smooth region of the image and the pixtron is therefore more actively moving towards the average position of its neighbors; if $|\nabla f|$ has a large mean in the neighborhood, $x$ is considered an edge pixel and the kinetic energy of the pixtron will be so low that it is more likely to be trapped by the potential field $L(y|f)$, since $g(s)$ is small for large $s$. More importantly, we believe that this mechanical image model may motivate a more systematic approach to Bayesian tomographic reconstruction, not only for developing new families of image priors to suit a variety of applications, but also for hyperparameter estimation, since the "physical" meanings of these priors are very clear in our image model.

6.3.3 Conclusion

In conclusion, we have proposed an explicit mechanical image model for Bayesian tomographic reconstruction. A new image prior based on the mean curvature motion has been derived from this image model and tested with simulated tomographic data. The performance of the new image prior meets the requirements of our design. Improving image priors and more quantified tests are the focus of our further work.

ACKNOWLEDGEMENTS

This work was supported in part by the UM Research Board (#8-3-40641), University of Missouri, USA.

REFERENCES

[1] R. Leahy and C. Byrne, "Recent developments in iterative image reconstruction for PET and SPECT," IEEE Trans. Med. Imag., vol. 19, no. 4, pp. 257-260, 2000.
[2] C. Bouman and K. Sauer, "A generalized Gaussian image model for edge-preserving MAP estimation," IEEE Trans. Med. Imag., vol. 2, no. 3, pp. 296-310, 1993.
[3] S. J. Lee, A. Rangarajan, and G.
Gindi, "Bayesian image reconstruction in SPECT using higher order mechanical models as priors," IEEE Trans. Med. Imag., vol. 4, no. 4, pp. 669-680, 1995.
[4] E. Jonsson, S. Huang, and T. Chan, "Total variation regularization in positron emission tomography," Reports 98-48, U.C.L.A. Computational and Applied Mathematics, November 1998.
[5] T. Frese, C. A. Bouman, and K. Sauer, "Adaptive wavelet graph model for Bayesian tomographic reconstruction," preprint, 2001.
[6] D. F. Yu and J. A. Fessler, "Edge-preserving tomographic reconstruction with nonlocal regularization," preprint, 2001.
[7] S. Osher and J. A. Sethian, "Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations," Journal of Computational Physics, vol. 79, pp. 21-49, 1988.
[8] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, 1992.
[9] L. Alvarez, P. L. Lions, and J. M. Morel, "Image selective smoothing and edge detection by nonlinear diffusion (II)," SIAM J. Num. Anal., vol. 29, pp. 845-866, 1992.
[10] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel, "Axioms and fundamental equations of image processing," Arch. for Rat. Mech., vol. 123, no. 3, pp. 199-257, 1993.
[11] C. Bouman and K. Sauer, "A unified approach to statistical tomography using coordinate descent optimization," IEEE Trans. Med. Imag., vol. 5, no. 3, pp. 480-492, 1996.
[12] R. Brent, Algorithms for Minimization Without Derivatives, Prentice-Hall, 1973.
[13] A. J. Rockmore and A. Macovski, "A maximum likelihood approach to emission image reconstruction from projection," IEEE Trans. Nucl. Sci., vol. 23, pp. 1428-1432, 1976.
[14] L. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. Med. Imag., vol. 1, pp. 113-122, 1982.
[15] Y. Vardi, L. A. Shepp, and L. Kaufman, "A statistical model for positron emission tomography," J. Am. Stat. Assoc., vol. 80, pp. 8-37, 1985.
[16] R. Leahy and J. Qi, "Statistical approaches in quantitative positron emission tomography," Statistics and Computing, vol. 10, no. 2, pp. 147-165, 2000.
[17] J. W. Stayman and J. A. Fessler, "Regularization for uniform spatial resolution properties in penalized-likelihood image reconstruction," IEEE Trans. Med. Imag., vol. 19, no. 6, pp. 601-615, 2000.
[18] V. E. Johnson, W. H. Wong, X. Hu, and C. T. Chen, "Image restoration using Gibbs priors: Boundary modeling, treatment of blurring, and selection of hyperparameter," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 413-425, 1991.
[19] P. J. Green, "Bayesian reconstructions from emission tomography data using a modified EM algorithm," IEEE Trans. Med. Imag., vol. 9, pp. 84-93, 1990.
[20] K.
Lange, "Convergence of EM image reconstruction algorithms with Gibbs smoothing," IEEE Trans. Med. Imag., vol. 9, pp. 439-446, 1990.
[21] A. Rangarajan, S. J. Lee, and G. Gindi, "Mechanical models as priors in Bayesian tomographic reconstruction," in Maximum Entropy and Bayesian Methods, K. M. Hanson and R. N. Silver, Eds., pp. 117-124. Kluwer Academic Publishers, Dordrecht, 1996.
[22] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Trans. Patt. Anal. Mach. Intell., vol. 6, pp. 721-741, 1984.
[23] Ming Jiang, "Mathematical models in computer vision and image processing," 1999, Lecture Notes, Department of Information Science, School of Mathematics, Peking University.
[24] T. Frese, C. A. Bouman, and K. Sauer, "Multiscale models for Bayesian inverse problems," in Proc. SPIE Conference on Wavelet Applications in Signal and Image Processing VII, M. A. Unser, A. Aldroubi, and A. F. Laine, Eds., 1999, vol. 3813, pp. 85-96.
[25] T. Frese, C. A. Bouman, G. D. Hutchins, N. C. Rouze, and K. Sauer, "Bayesian multiresolution algorithm for PET reconstruction," in IEEE International Conference on Image Processing, Vancouver, Canada, 2000, pp. 10-13.
Beyond Wavelets, G. V. Welland (Editor). © 2003 Elsevier Science (USA). All rights reserved.
RECENT DEVELOPMENT OF SPLINE WAVELET FRAMES WITH COMPACT SUPPORT

CHARLES CHUI AND JOACHIM STÖCKLER

Department of Mathematics and Computer Science, University of Missouri, St. Louis, St. Louis, MO 63121, and Department of Statistics, Stanford University, Stanford, CA 94305
cchui@stat.stanford.edu

Universität Dortmund, Institut für Angewandte Mathematik, Vogelpothsweg 81, 44221 Dortmund, Germany
Joachim.stoeckler@math.uni-dortmund.de
Abstract

The notion of orthonormal wavelets is extended to that of tight wavelet frames to allow more flexibility for wavelet construction and redundancy for certain applications. For cardinal splines, the flexibility indeed permits the existence and construction of compactly supported wavelets. However, while the "matrix extension" approach for the construction of orthonormal wavelets is a natural route for constructing such tight wavelet frames of cardinal splines of order greater than 1, by using two or more Laurent polynomials to extend a square matrix of dimension two to a rectangular matrix, it happens that at least one of these Laurent polynomials could not be divisible by $(1 - z)^2$. In other words, the spline-wavelet with this particular Laurent polynomial as its two-scale symbol has only one vanishing moment. To increase the number of vanishing moments, the notion of vanishing moment recovery (VMR) functions was introduced. A significant portion of this article is devoted to this relatively recent development. Of course, dilation by 2 can be extended to arbitrary integer dilations while preserving the VMR functionality. This is another topic of discussion in this survey paper. This extension, as well as extension to vector-valued (i.e. multi-) wavelets, can be considered as special cases of a more general consideration of tight spline-wavelet frames with arbitrary nested knot sequences that allow multiple (i.e. stacked) knots. In particular, when $m$ knots are stacked at $x = a$ for $m$th order spline functions, we have splines and spline-wavelet tight frames on a half interval $[a, \infty)$, and if, in addition, another $m$ knots are stacked at $x = b > a$, the theory applies to a bounded interval $[a, b]$. This study can be considered as the spline approach to construction of nonstationary wavelets on bounded intervals. It is a summary of our recent joint work with W. He. Most of the results presented in this paper are valid for bi-frames, and particularly, sibling frames (i.e. bi-frames associated with the same refinable function), when tightness is sacrificed to achieve certain additional desirable properties. We will only consider the properties of symmetry, shift-invariance, and inter-orthogonality.
7.1 INTRODUCTION

The first key ingredient in the construction of the Daubechies scaling functions and wavelets is construction of the two-scale Laurent polynomial symbols
$$P_D(z) = \left( \frac{1+z}{2} \right)^m S(z),$$
with $\deg S \le m - 1$ and $S(1) = 1$, to meet the orthogonality design criterion
$$|P_D(z)|^2 + |P_D(-z)|^2 = 1, \qquad |z| = 1. \qquad (1.1)$$
Hence, by considering the corresponding symbol
$$Q_D(z) = -z\, \overline{P_D(-z)}, \qquad (1.2)$$
we have the Daubechies wavelet $\psi_D$, with Fourier transform given by
$$\hat{\psi}_D(\omega) := Q_D(e^{-i\omega/2})\, \hat{\phi}_D\left( \frac{\omega}{2} \right), \qquad (1.3)$$
where $\phi_D$ is the Daubechies scaling function defined by
$$\hat{\phi}_D(\omega) := \prod_{k=1}^{\infty} P_D(e^{-i\omega/2^k}),$$
which obviously satisfies
$$\hat{\phi}_D(\omega) = P_D(e^{-i\omega/2})\, \hat{\phi}_D\left( \frac{\omega}{2} \right). \qquad (1.4)$$
Here and throughout this paper, the Fourier transform is defined by
$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-i\omega x}\, dx.$$
Other important ingredients in the Daubechies paper [22] include the proof of convergence of the above infinite product that defines $\hat{\phi}_D$, and the smoothness class to which $\phi_D$ belongs.
A major portion of this current paper is concerned with cardinal B-splines $N_m$, defined by $m$-fold convolution of the characteristic function of the unit interval $[0,1]$, $m \ge 2$, with two-scale polynomial symbol
$$P(z) := \left( \frac{1+z}{2} \right)^m. \qquad (1.5)$$
Hence, there is no need to consider convergence of infinite products or smoothness properties. On the other hand, the orthogonality design criterion (1.1) for $P_D$ is now replaced by the inequality
$$|P(z)|^2 + |P(-z)|^2 < 1, \qquad |z| = 1,\ z \ne \pm 1. \qquad (1.6)$$
As a consequence, for $m \ge 2$, there does not exist a Laurent polynomial $Q$ such that the matrix
$$M_{P,Q}(z) := \begin{bmatrix} P(z) & Q(z) \\ P(-z) & Q(-z) \end{bmatrix} \qquad (1.7)$$
is unitary for $|z| = 1$. Observe that the choice of $Q_D$ in (1.2) corresponding to $P_D$ in (1.1) in the construction of Daubechies wavelets is to achieve such a "unitary matrix extension" criterion, namely:
$$M_{P_D,Q_D}^*(z)\, M_{P_D,Q_D}(z) = I, \qquad |z| = 1, \qquad (1.8)$$
where the (1,1) entry of the matrix product on the left-hand side of (1.8) is precisely the orthogonality design criterion (1.1). Here and throughout, the asterisk notation in (1.8) denotes complex conjugation of the matrix transpose. By using $M_{P_D,Q_D}^*(z)$ as the right inverse of $M_{P_D,Q_D}(z)$, an equivalent formulation of (1.8) is given by
$$|P_D(z)|^2 + |Q_D(z)|^2 = 1; \qquad P_D(z) \overline{P_D(-z)} + Q_D(z) \overline{Q_D(-z)} = 0, \qquad |z| = 1. \qquad (1.9)$$
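The contrast between (1.9) and the B-spline symbol can be checked numerically. The Python sketch below uses the case $m = 2$ of the Daubechies symbol, $P_D(z) = ((1+z)/2)^2 \left( (1+\sqrt{3}) + (1-\sqrt{3}) z \right)/2$, which is the standard four-coefficient Daubechies filter and is not written out in the text; the function names are hypothetical, and $Q_D$ is taken as $-z\, \overline{P_D(-z)}$ on the unit circle.

```python
import cmath
import math


def p_daubechies(z):
    """Daubechies symbol for m = 2 (standard four-tap filter)."""
    s3 = math.sqrt(3.0)
    return ((1 + z) / 2) ** 2 * ((1 + s3) + (1 - s3) * z) / 2


def q_daubechies(z):
    """Wavelet symbol Q_D(z) = -z * conj(P_D(-z)), valid for |z| = 1."""
    return -z * p_daubechies(-z).conjugate()


def p_bspline(z, m=2):
    """Two-scale symbol of the cardinal B-spline N_m."""
    return ((1 + z) / 2) ** m


def unitarity_defects(p, q, z):
    """Absolute deviations from the two identities in (1.9)."""
    diagonal = abs(p(z)) ** 2 + abs(q(z)) ** 2 - 1.0
    off_diagonal = p(z) * p(-z).conjugate() + q(z) * q(-z).conjugate()
    return abs(diagonal), abs(off_diagonal)


def bspline_energy(omega, m=2):
    """|P(z)|^2 + |P(-z)|^2 for the B-spline symbol at z = exp(-i omega)."""
    z = cmath.exp(-1j * omega)
    return abs(p_bspline(z, m)) ** 2 + abs(p_bspline(-z, m)) ** 2


frequencies = [0.1 * k for k in range(1, 63)]
worst = max(max(unitarity_defects(p_daubechies, q_daubechies, cmath.exp(-1j * w)))
            for w in frequencies)
```

Up to rounding, `worst` vanishes for the Daubechies pair, while `bspline_energy(math.pi / 2)` equals 1/2 for $m = 2$, so no single $Q$ can complete the B-spline symbol to a unitary matrix.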
So, with $P(z)$ in (1.5) in place of $P_D(z)$, to compensate for being short of satisfying the orthogonality design criterion (1.1), as shown in (1.6), it is still feasible to design two or more Laurent polynomials $Q^1, \dots, Q^L$, $L \ge 2$, such that
$$|P(z)|^2 + \sum_{\ell=1}^{L} |Q^\ell(z)|^2 = 1; \qquad P(z) \overline{P(-z)} + \sum_{\ell=1}^{L} Q^\ell(z) \overline{Q^\ell(-z)} = 0, \qquad |z| = 1. \qquad (1.10)$$
This is called the unitary extension principle (UEP) by Ron and Shen in [50]. Indeed, for $Q^\ell$, $\ell = 1, \dots, L$, that satisfy (1.10), the spline-wavelets $\psi^\ell$, $\ell = 1, \dots, L$, defined by
$$\hat{\psi}^\ell(\omega) = Q^\ell(e^{-i\omega/2})\, \hat{N}_m\left( \frac{\omega}{2} \right) \qquad (1.11)$$
do generate a tight frame
$$\Psi := \{ \psi_{j,k}^\ell:\ j, k \in \mathbb{Z},\ \ell = 1, \dots, L \} \qquad (1.12)$$
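of $L^2 := L^2(\mathbb{R})$, with frame bound constant 1; that is,
$$\sum_{\ell=1}^{L} \sum_{j,k \in \mathbb{Z}} |\langle f, \psi_{j,k}^\ell \rangle|^2 = \|f\|^2, \qquad f \in L^2. \qquad (1.13)$$
Here and throughout, the standard notation
$$g_{j,k}(x) := 2^{j/2} g(2^j x - k) \qquad (1.14)$$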
is used. Of course, the notion of tight frames as defined in (1.13) is a natural generalization of orthonormal wavelets, where the only additional requirement is that the wavelets must have unit norm. In other words, frame redundancy is achieved when the norms of the wavelets are allowed to be less than 1. In Ron and Shen [52], $L = m$ Laurent polynomials $Q^1, \dots, Q^m$ were constructed to satisfy the UEP (1.10), for any $m \ge 2$. This number was later reduced to $L = 2$ for all $m \ge 2$ in Chui and He [13]. Hence, instead of one generator $\psi_D$ for the Daubechies orthonormal wavelets, we need two generators $\psi^1$ and $\psi^2$ of compactly supported tight frames of cardinal splines of order $m \ge 2$. Of course, additional redundancy can be achieved by applying the Second Oversampling Theorem of Chui and Shi [19]. Recall that for integer dilation $d \ge 2$, oversampling by $p > 1$ preserves tight frames, provided that $p$ is relatively prime to $d$.

Unfortunately, independent of the number $L$ of frame generators being used, and for all integer dilations $d \ge 2$, at least one of the cardinal spline tight frame generators $\psi^\ell$ has exactly one vanishing moment for all $m \ge 2$. This can be seen easily, for $d = 2$ say, from the UEP (1.10) itself, since
$$\sum_{\ell=1}^{L} |Q^\ell(z)|^2 = 1 - |P(z)|^2 = |1 - z|^2\, |R(z)|^2, \qquad |z| = 1, \qquad (1.15)$$
where $R$ is some Laurent polynomial with $R(1) \ne 0$. This is somewhat disappointing, since vanishing moments of higher order contribute to the great success of applications of wavelets to signal processing, particularly in areas that benefit from local extraction of multi-scale details.

In a recent joint work [14] with W. He, we introduced the notion of vanishing moment recovery (VMR) functions for the construction of compactly supported tight frames, in terms of the $m$th order B-spline $N_m$, that possess vanishing moments of order $m$. Again, it was shown that two frame generators $\psi^1$ and $\psi^2$ as in (1.11) suffice. In [14], it was shown that the VMR functions are necessarily quotients of two Laurent polynomials, but a Laurent polynomial $S(z)$ that satisfies the "positivity condition"
$$\frac{1}{S(z^2)} - \frac{|P(z)|^2}{S(z)} - \frac{|P(-z)|^2}{S(-z)} \ge 0, \qquad |z| = 1, \qquad (1.16)$$
already suffices. In other words, by choosing a Laurent polynomial $S(z)$ that satisfies (1.16), the "modified UEP"
$$S(z^2) |P(z)|^2 + \sum_{\ell=1}^{L} |Q^\ell(z)|^2 = S(z); \qquad S(z^2) P(z) \overline{P(-z)} + \sum_{\ell=1}^{L} Q^\ell(z) \overline{Q^\ell(-z)} = 0, \qquad |z| = 1, \qquad (1.17)$$
has Laurent polynomial solutions $Q^1, \dots, Q^L$, even for $L = 2$, and the compactly supported spline-wavelet tight frame generators $\psi^1$ and $\psi^2$, as defined in (1.11), do have vanishing moments of order $m$, provided that $S(z)$ satisfies the additional condition
$$S(z) = \frac{1}{\Phi_m(z)} + O(|1 - z|^{2m}), \qquad \text{near } z = 1, \qquad (1.18)$$
where
$$\Phi_m(z) := \sum_{k=-m+1}^{m-1} N_{2m}(m + k)\, z^k \qquad (1.19)$$
denotes the Euler-Frobenius polynomial associated with $N_m$. Details and related results will be discussed in this paper. Consideration of the modified UEP (1.17) in our work [14] was inspired by an earlier work of Ron and Shen [50], in which it is shown that the "fundamental function of multiresolution"
W-)P , = . . .. oo
L
(120)
j-1
^
^
with 2, multi-wavelet tight frames of spline functions of "multiplicity $r$" also result from a study of spline tight frames of $m$th order splines on arbitrary nested knot sequences. When $m$ knots are stacked at the two end-points of an interval, it also leads to tight frames of $m$th order splines on a bounded interval. This topic will constitute another major topic of this survey article, for which we will report on our joint work [16] with W. He.

The following survey is divided into four major sections. In Section 2, the general topic of wavelet frames of splines will be studied. Here, the notions of VMR functions and sibling frames are introduced and elaborated. Spline wavelet frames with multiple knots which are equally spaced on the real line are discussed in Section 3. Such wavelet frames are also called multi-wavelets in the literature. To extend the study to splines with nested sequences of knots, the Fourier approach no longer applies. The notion of approximate duals is therefore introduced in Section 4 to facilitate the transition from Fourier-domain to time-domain considerations. The results in this section are applied in Section 5 for constructing tight frames of spline functions with non-uniform knots.
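As a concrete instance of the UEP (1.10) and of the vanishing-moment limitation (1.15), the Python sketch below checks the familiar piecewise linear example ($m = 2$) with $Q^1(z) = -(1-z)^2/4$ and $Q^2(z) = -\frac{\sqrt{2}}{4}(1 - z^2)$. These two symbols are a standard example from the tight frame literature and are not given in the text above; the function names are hypothetical.

```python
import cmath
import math


def p(z):
    """Two-scale symbol of the linear B-spline N_2."""
    return ((1 + z) / 2) ** 2


def q1(z):
    """First frame symbol; divisible by (1 - z)^2."""
    return -((1 - z) ** 2) / 4


def q2(z):
    """Second frame symbol; divisible by (1 - z) only once."""
    return -(math.sqrt(2.0) / 4) * (1 - z * z)


def uep_defects(z):
    """Absolute deviations from the two identities in (1.10) with L = 2."""
    diagonal = abs(p(z)) ** 2 + abs(q1(z)) ** 2 + abs(q2(z)) ** 2 - 1.0
    off_diagonal = (p(z) * p(-z).conjugate()
                    + q1(z) * q1(-z).conjugate()
                    + q2(z) * q2(-z).conjugate())
    return abs(diagonal), abs(off_diagonal)


worst = max(max(uep_defects(cmath.exp(-1j * 0.1 * k))) for k in range(63))
```

Both identities hold to rounding error on the unit circle. Near $z = 1$, $Q^1$ behaves like $(1-z)^2$ while $Q^2$ behaves like $(1-z)$, so the second generator has only one vanishing moment, in line with (1.15).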
7.2 CHARACTERIZATION OF WAVELET SPLINE FRAMES

The space of all cardinal splines of order $m \in \mathbb{N}$ is defined by
$$V_0 = \mathrm{clos\ span}\{ N_m(\cdot - k);\ k \in \mathbb{Z} \}, \qquad (2.1)$$
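where the closure is taken in $L^2(\mathbb{R})$ and $N_m$ is the cardinal B-spline of order $m$ (degree $m - 1$) with knots $0, 1, \dots, m$. Its Fourier transform is given by
$$\hat{N}_m(\omega) = \left( \frac{1 - e^{-i\omega}}{i\omega} \right)^m. \qquad (2.2)$$
The integer shifts of $N_m$ are stable in the sense that
$$D_m \| \{c_k\}_{k \in \mathbb{Z}} \|_{\ell^2} \le \Big\| \sum_{k \in \mathbb{Z}} c_k N_m(\cdot - k) \Big\|_{L^2} \le \| \{c_k\}_{k \in \mathbb{Z}} \|_{\ell^2}, \qquad (2.3)$$
where $D_m > 0$ is a constant. For integer dilation factor $M \ge 2$, the relation $\hat{N}_m(M\omega) = P_{m,M}(e^{-i\omega})\, \hat{N}_m(\omega)$ holds with the Laurent polynomial symbol
$$P_{m,M}(z) = \left( \frac{1 + z + \cdots + z^{M-1}}{M} \right)^m. \qquad (2.4)$$
Note that (1.5) refers to the special case $M = 2$. The zero properties of $P_{m,M}$ at the $M$th roots of unity are obvious from the formula
$$P_{m,M}(z) = \frac{1}{M^m} \prod_{k=1}^{M-1} (1 - \omega_M^k z)^m, \qquad \omega_M = e^{2\pi i/M}. \qquad (2.5)$$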
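The refinement relation for dilation factor $M$ and the product form (2.5) are easy to confirm numerically. The short Python sketch below does so for $m = 3$ and $M = 3$; the function names are hypothetical, and the root of unity is taken as $e^{2\pi i/M}$ (the product is the same for either orientation).

```python
import cmath
import math


def bspline_hat(omega, m):
    """Fourier transform (2.2) of the cardinal B-spline N_m."""
    if omega == 0:
        return 1.0 + 0.0j
    return ((1 - cmath.exp(-1j * omega)) / (1j * omega)) ** m


def symbol(z, m, M):
    """P_{m,M}(z) written as a power of the averaged geometric sum."""
    return (sum(z ** k for k in range(M)) / M) ** m


def symbol_product(z, m, M):
    """P_{m,M}(z) written as a product over the nontrivial M-th roots of unity."""
    root = cmath.exp(2j * math.pi / M)
    product = 1.0 + 0.0j
    for k in range(1, M):
        product *= 1 - root ** k * z
    return (product / M) ** m


m, M, omega = 3, 3, 0.7
z = cmath.exp(-1j * omega)
refinement_gap = abs(bspline_hat(M * omega, m) - symbol(z, m, M) * bspline_hat(omega, m))
product_gap = abs(symbol(z, m, M) - symbol_product(z, m, M))
```

Both gaps are at rounding level, and `symbol` vanishes at every nontrivial $M$th root of unity, which is the zero property referred to in the text.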
It is a well-known fact that the scaled spaces
$$S_h = \{ f(\cdot/h):\ f \in V_0 \} \qquad (2.6)$$
provide L^-approximation order TTI; i.e. the error estimate \\f-nf\\L^ A > 0, such t h at
$$A \|f\|^2 \le \sum_{\ell=1}^{L} \sum_{j,k \in \mathbb{Z}} |\langle f, \psi_{j,k}^\ell \rangle|^2 \le B \|f\|^2 \qquad (2.17)$$
for all $f \in L^2(\mathbb{R})$. If both frame constants can be chosen to be equal, that is $A = B$, $\Psi$ is called a tight frame, and the tight frame is said to be normalized if $A = B = 1$.

(c) The families $\Psi$ and $\tilde{\Psi}$ are called sibling frames, if they are Bessel families and if the duality relation
$$\langle f, g \rangle = \sum_{\ell=1}^{L} \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k}^\ell \rangle \langle \tilde{\psi}_{j,k}^\ell, g \rangle \qquad (2.18)$$
is satisfied for all $f, g \in L^2(\mathbb{R})$.

The results in [19] imply that both families $\Psi$ and $\tilde{\Psi}$ are Bessel families, if we only assume that every $\psi^\ell$ and $\tilde{\psi}^\ell$ has at least one vanishing moment. We also note that both
families are indeed frames of $L^2(\mathbb{R})$, if the duality relation (2.18) is satisfied: if $B$ is the constant in (2.16) for one of the two families, then $A = 1/B$ is a lower frame constant in (2.17) for the other family, and vice versa.

Remark 7.2.1 We have chosen the terminology of sibling frames in [14], since both sets of generators $\{\psi^\ell\}$ and $\{\tilde{\psi}^\ell\}$ have the same "father", namely the B-spline $N_m$. More general families of so-called bi-frames, where the generators stem from two different multiresolution analyses, are given in [25, 24, 32, 51]. The concept of sibling frames, however, gives enough flexibility for the realization of important properties such as symmetry, small support, and a high order of vanishing moments. These can be achieved when using only $M$ generators for each of the two families $\Psi$ and $\tilde{\Psi}$.

The remainder of this section is split into two parts. First we present results that are restricted to dilation factor $M = 2$. This keeps the notational overhead at a minimum. Moreover, some special constructions, such as inter-orthogonal frames, have been derived for this case only. Furthermore, a slick approach to the factorization of positive semidefinite matrices of Laurent polynomials can be given in this case. In the second part we provide extensions to arbitrary integer dilation factors.
7.2.1 Tight frames with dilation factor 2

In the work by Weiss et al. [15, 19], Han [18], Ron and Shen [29, 30], a complete characterization of tight wavelet frames with integer dilation factor is obtained in a general setting, where no assumption about an underlying MRA is made. It was one goal of the papers [14, 25] to introduce a simpler way for characterizing tight wavelet frames whose generators $\psi^\ell$ are defined from a multiresolution analysis $\{V_j\}$. Since the integer shifts of the B-spline $N_m$ are stable, as explained in (2.3), the following characterization is obtained as a special case of [14, Theorem 1].

Theorem 7.2.2 Let $Q^\ell$, $1 \le \ell \le L$, be Laurent polynomials with real coefficients and vanishing at $z = 1$. Then the functions $\psi^\ell$ defined in (2.10) generate a normalized tight frame $\Psi$ in (2.13) of $L^2(\mathbb{R})$, if and only if there exists a Laurent polynomial $S$ with real coefficients and nonnegative (real) values for all $z \in \mathbb{T}$ (the unit circle), that satisfies $S(1) = 1$ and
$$S(z^2) |P(z)|^2 + \sum_{\ell=1}^{L} |Q^\ell(z)|^2 = S(z), \qquad (2.19)$$
$$S(z^2) P(z) P(-1/z) + \sum_{\ell=1}^{L} Q^\ell(z) Q^\ell(-1/z) = 0. \qquad (2.20)$$

The Laurent polynomial $S$ in Theorem 7.2.2 governs the order of vanishing moments of the frame generators $\psi^\ell$. We repeat the short argument in [15], which makes use of (2.19). Let $1 \le \mu \le m$, and assume that $Q^\ell(z) = (1 - z)^\mu q^\ell(z)$,
l 2, splines with multiple knots of equal multiplicity, and non-uniform B-splines. Table 7.1 gives the coefficients $s_k$ in (2.36) of $S$ for low order B-splines. Figure 1 shows the approximation of $1/\Phi_m(e^{-i\omega})$ by the trigonometric polynomial $S(e^{-i\omega})$ for $m = \mu = 2$, which is of fourth order at $\omega = 0$.
Table 7.1. Coefficients $s_k$ of the VMR function $S$ for the B-spline $N_m$

m    s_0    s_1    s_2      s_3        s_4
1    1
2    1      2/3
3    1      1      13/15
4    1      4/3    62/45    1244/945
5    1      5/3    2        134/63     2021/945
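The entries of Table 7.1 can be reproduced with exact rational arithmetic. The Python sketch below assumes, consistently with Example 7.2.1 and with condition (1.18), that $S(e^{-i\omega}) = \sum_k s_k u^k$ with $u = \sin^2(\omega/2)$ and that the $s_k$ are the leading Taylor coefficients of $1/\Phi_m$ in the variable $u$; since formula (2.36) is not reproduced in this text, that reading is an assumption, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def bspline_at_integer(n, j):
    """Value N_n(j) of the cardinal B-spline of order n at the integer j >= 1."""
    total = sum((-1) ** k * comb(n, k) * Fraction(j - k) ** (n - 1)
                for k in range(j))
    return total / factorial(n - 1)


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q, scale=1):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + scale * b for a, b in zip(p, q)]


def cos_multiples(kmax):
    """cos(k * omega), k = 0..kmax, as polynomials in u = sin^2(omega / 2)."""
    two_c = [Fraction(2), Fraction(-4)]          # 2 cos(omega) = 2 - 4u
    polys = [[Fraction(1)], [Fraction(1), Fraction(-2)]]
    for _ in range(2, kmax + 1):
        # Chebyshev recurrence: cos((k+1)w) = 2 cos(w) cos(kw) - cos((k-1)w)
        polys.append(poly_add(poly_mul(two_c, polys[-1]), polys[-2], scale=-1))
    return polys[:kmax + 1]


def euler_frobenius_in_u(m):
    """Phi_m(e^{-i omega}) from (1.19) as a polynomial in u."""
    cosines = cos_multiples(m - 1)
    phi = [bspline_at_integer(2 * m, m)]
    for k in range(1, m):
        weight = 2 * bspline_at_integer(2 * m, m + k)
        phi = poly_add(phi, [weight * a for a in cosines[k]])
    return phi


def vmr_coefficients(m, mu=None):
    """First mu Taylor coefficients of 1 / Phi_m in powers of u."""
    mu = m if mu is None else mu
    phi = euler_frobenius_in_u(m)
    s = []
    for n in range(mu):
        acc = Fraction(1) if n == 0 else Fraction(0)
        for i in range(1, min(n, len(phi) - 1) + 1):
            acc -= phi[i] * s[n - i]
        s.append(acc / phi[0])
    return s
```

For example, `vmr_coefficients(3)` returns the fractions 1, 1 and 13/15, the row $m = 3$ of the table, and `vmr_coefficients(2)` gives $S = 1 + \frac{2}{3} \sin^2(\omega/2) = (8 - z - z^{-1})/6$ as in Example 7.2.1.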
Remark 7.2.4 The result in Proposition 3.5 of [25] differs from the necessary and sufficient condition (2.33). In [25], the authors show that
$$A(z) := S(z) - S(z^2) \left( |P(z)|^2 + |P(-z)|^2 \right) \ge 0 \qquad (2.37)$$
for all $z \in \mathbb{T}$. The connection to $\det M(z)$ can be established as follows. If $S(-z) \ge S(z) \ge 0$, we have $\det M(z) \ge S(-z)A(z) \ge 0$, and if $S(z) \ge S(-z) \ge 0$, we have $\det M(z) \ge S(z)A(-z) \ge 0$. The result (2.37) is strictly stronger than the necessary and sufficient condition (2.33), unless $S(z) = S(-z)$ for all $z \in \mathbb{T}$, which only holds for $\mu = 1$.
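The comparison between $\det M(z)$ and $A(z)$ can be checked numerically for the piecewise linear case $m = \mu = 2$. The sketch assumes that $M(z)$ is the Hermitian $2\times 2$ matrix with diagonal entries $S(\pm z) - S(z^2)|P(\pm z)|^2$ and off-diagonal entry of modulus $S(z^2)|P(z)||P(-z)|$; its definition (2.31) lies outside this excerpt, so this form is an assumption, as are the helper names.

```python
import numpy as np

def S(z):
    """VMR Laurent polynomial for m = mu = 2: S(z) = (8 - z - 1/z)/6, real on |z| = 1."""
    return ((8 - z - 1 / z) / 6).real

def P_abs2(z):
    """|P(z)|^2 for the two-scale symbol P(z) = ((1 + z)/2)^2 of N_2."""
    return np.abs((1 + z) / 2) ** 4

def A(z):
    """The quantity in (2.37)."""
    return S(z) - S(z**2) * (P_abs2(z) + P_abs2(-z))

def det_M(z):
    """Determinant of the assumed Hermitian 2x2 matrix M(z) on |z| = 1."""
    m11 = S(z) - S(z**2) * P_abs2(z)
    m22 = S(-z) - S(z**2) * P_abs2(-z)
    off_abs2 = S(z**2) ** 2 * P_abs2(z) * P_abs2(-z)
    return m11 * m22 - off_abs2

omega = np.linspace(0.0, 2.0 * np.pi, 721)
z = np.exp(-1j * omega)

# Lower bound from the remark: S(-z)A(z) or S(z)A(-z), depending on which of S(z), S(-z) is larger.
lower_bound = np.where(S(-z) >= S(z), S(-z) * A(z), S(z) * A(-z))
gap = det_M(z) - lower_bound
closed_form_error = np.max(np.abs(det_M(z) - (2.0 / 3.0) * np.sin(omega) ** 4))
print(gap.min(), A(z).min(), closed_form_error)
```

Under these assumptions the determinant agrees with $\tfrac{2}{3}\sin^4\omega$ up to rounding, and the bound of the remark is attained where $S(z) = S(-z)$.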
Finally, we can conclude that there always exists a pair of generators $\psi^1, \psi^2 \in V_1$ with $\mu$ vanishing moments, $1 \le \mu \le m$, such that the family $\Psi$ is a tight frame of $L^2(\mathbb{R})$.

Theorem 7.2.5 Let $1 \le \mu \le m$, and $S$ be a VMR Laurent polynomial, which is nonnegative on $\mathbb{T}$ and satisfies (2.25) and (2.33). Then there exist two Laurent polynomials $q^1, q^2$, such that $Q^1(z) = (1 - z)^\mu q^1(z)$ and $Q^2(z) = (1 - z)^\mu q^2(z)$ define functions $\psi^1, \psi^2 \in V_1$ with $\mu$ vanishing moments, that constitute a normalized tight frame $\Psi$ of $L^2(\mathbb{R})$.

Two elementary steps are required prior to constructing the factorization of the matrix $M$ in (2.30). First, the factors $(1 - z)^\mu$ are cancelled, and then a conversion to polyphase form is applied. These operations do not affect the semi-definiteness of the matrix. The following example illustrates this procedure.

Example 7.2.1 Let $m = \mu = 2$ and $P(z) = ((1 + z)/2)^2$. The VMR Laurent polynomial $S$ from Table 7.1 and Figure 1 is $S(z) = (8 - z - z^{-1})/6$. The matrix $M$ is given by
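$$M(z) = \begin{bmatrix} (1 - z^{-1})^2 & 0 \\ 0 & (1 + z^{-1})^2 \end{bmatrix} M_0(z) \begin{bmatrix} (1 - z)^2 & 0 \\ 0 & (1 + z)^2 \end{bmatrix},$$

where

$$M_0(z) = \frac{1}{96}\begin{bmatrix} 24 + 8(z + z^{-1}) + z^2 + z^{-2} & -(8 - z^2 - z^{-2}) \\ -(8 - z^2 - z^{-2}) & 24 - 8(z + z^{-1}) + z^2 + z^{-2} \end{bmatrix}.$$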
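In accordance with Theorem 7.2.5, we find that $\det M(e^{-i\omega}) = \tfrac{2}{3}\sin^4\omega \ge 0$. Hence, the condition (2.33) is satisfied. Making use of the conversion to polyphase form gives

$$\frac12\begin{bmatrix} 1 & 1 \\ z & -z \end{bmatrix} M_0(z)\,\frac12\begin{bmatrix} 1 & z^{-1} \\ 1 & -z^{-1} \end{bmatrix} = \frac{1}{96}\begin{bmatrix} 8 + z^2 + z^{-2} & 4 + 4z^{-2} \\ 4 + 4z^2 & 16 \end{bmatrix}.$$

We denote the matrix on the right-hand side by $M_1(z)$. The factorization from [14] gives

$$M_1(z) = \frac{1}{16}\begin{bmatrix} 1 & \frac{1}{\sqrt6}(1 + z^{-2}) \\ 0 & \frac{4}{\sqrt6} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \frac{1}{\sqrt6}(1 + z^2) & \frac{4}{\sqrt6} \end{bmatrix}.$$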
Figure 7.2. Generators $\psi_1, \psi_2$ of tight frame with two vanishing moments
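Putting all factors together gives the factorization (2.30) with

$$Q(z) = \frac14\begin{bmatrix} (1 - z)^2 & 0 \\ 0 & (1 + z)^2 \end{bmatrix}\begin{bmatrix} 1 & \frac{1}{\sqrt6}(1 + 4z + z^2) \\ 1 & \frac{1}{\sqrt6}(1 - 4z + z^2) \end{bmatrix},$$

which, finally, yields the Laurent polynomials

$$Q^1(z) = \bigl((1 - z)/2\bigr)^2, \qquad Q^2(z) = \bigl((1 - z)/2\bigr)^2 (1 + 4z + z^2)/\sqrt6.$$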
The functions $\psi_1, \psi_2$ are piecewise linear splines in $V_1$ with 2 vanishing moments. Their graphs are shown in Figure 7.2. The family $\Psi$ is a tight frame of $L^2(\mathbb{R})$.
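The generators of Example 7.2.1 can be tested directly against the two identities (2.19) and (2.20). The following sketch evaluates both residuals on a grid of the unit circle; it uses $Q^1(z) = ((1-z)/2)^2$, $Q^2(z) = ((1-z)/2)^2(1+4z+z^2)/\sqrt6$, $P(z) = ((1+z)/2)^2$ and $S(z) = (8 - z - z^{-1})/6$ as stated in the example, and the function names are illustrative.

```python
import numpy as np

SQRT6 = np.sqrt(6.0)

def P(z):
    return ((1 + z) / 2) ** 2

def S(z):
    return (8 - z - 1 / z) / 6

def Q1(z):
    return ((1 - z) / 2) ** 2

def Q2(z):
    return ((1 - z) / 2) ** 2 * (1 + 4 * z + z ** 2) / SQRT6

def residuals(z):
    """Left-hand side minus right-hand side of (2.19) and of (2.20) at points z on |z| = 1."""
    r19 = S(z**2) * abs(P(z)) ** 2 + abs(Q1(z)) ** 2 + abs(Q2(z)) ** 2 - S(z)
    r20 = S(z**2) * P(z) * P(-1 / z) + Q1(z) * Q1(-1 / z) + Q2(z) * Q2(-1 / z)
    return r19, r20

z = np.exp(-1j * np.linspace(0.0, 2.0 * np.pi, 257))
r19, r20 = residuals(z)
print(np.max(np.abs(r19)), np.max(np.abs(r20)))
```

Both residuals vanish up to floating-point rounding, which confirms the two generators as a tight frame pair for this choice of $S$.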
Remark 7.2.5 In the literature, explicit constructions of the factorization (2.30) are often pursued by solving a system of quadratic equations for the unknown coefficients of $Q^1, \ldots, Q^L$, which can be elaborated with methods of computer algebra. In [14, 15], for the special case of Laurent polynomial matrices of dimension $2 \times 2$ and $L = 2$, we find a method to convert the matrix equation (2.30) into a linear system of equations for the coefficients of $Q^1$ and $Q^2$ (more precisely the coefficients of $q^1$ and $q^2$ after cancellation of the factor $(1 - z)^\mu$). We also show that there exists a pair $(Q^1, Q^2)$ of Laurent polynomials, where $Q^1$ has precisely $n_1 := 2\mu + m - 1$ nonzero coefficients and $Q^2$ has no more than $2\mu + m - 1 - \kappa$ nonzero coefficients; here we define $\kappa = 1$, if $\mu + m < 4$, and $\kappa = 2$, otherwise. Moreover, no solutions exist, where both Laurent polynomials have less than $n_1$ nonzero coefficients. In [15], an algorithm is presented for the computation of such solutions. For polynomial matrices of higher dimensions, we refer to [35], where a constructive proof of the existence of the factorization (2.30) is implemented.

Example 7.2.2 Examples of minimally supported tight frames of splines of order $m$ and vanishing moments of order $\mu \ge 2$, for $2 \le m \le 6$, are contained in [14, 25]. As there is no unique solution of the matrix factorization problem (2.30), different sets of generators $\{\psi^1, \ldots, \psi^L\}$ can be constructed for the same VMR function $S$ and the same number $L$. Pairs $\{\psi^1, \psi^2\}$ with minimal support were obtained in both papers. For $m = \mu = 4$ such an example is given by the functions in Figure 7.3, which are defined by
Figure 7.3. Generators $\psi_1, \psi_2$ of tight frame of cubic splines with four vanishing moments and filters of length 9 and 11.
[...] 1, 2, where $z = e^{-i\omega/2}$ and

$$q_1(z) = \bigl(0.130465 + 1.04372\,z + 3.54312\,z^2 + 6.42680\,z^3 + 4.11416\,z^4 + 1.26126\,z^5 + 0.157657\,z^6\bigr)/z^{[\ldots]},$$

$$q_2(z) = 0.074371 + 0.594967\,z + 3.70527\,z^2 + 1.23987\,z^3 + 0.154984\,z^4.$$

These functions are not symmetric or anti-symmetric. The filter length of $Q_1$ and $Q_2$ is 9 and 11, respectively. These numbers match the values of $n_1 - 2$ and $n_1$ mentioned in the previous remark.

Example 7.2.3 Other "ad hoc" constructions of pairs $(Q^1, Q^2)$ or triplets $(Q^1, Q^2, Q^3)$ based on "square roots" of the Laurent polynomials $S$ and $A$ from above are proposed in [25, 33]. These constructions circumvent the factorization of the Laurent polynomial matrix $M$ by establishing the identities (2.19)-(2.20) more directly. One example goes as follows. Let $\sigma, a$ be Laurent polynomials such that $\sigma(z)\sigma(1/z) = S(z)$, $a(z)a(1/z) = A(z)$. Then $Q^1(z) = z\,\sigma(z^2)P(-1/z)$, $Q^2(z) = a(z^2)P(z)$ define Laurent polynomials, that give a factorization (2.30), where the VMR function $S - A$ replaces $S$. (Note that $S - A$ contains only even powers of $z$ and has degree more than twice as large as $S$.) Therefore, these Laurent polynomials $Q^1, Q^2$ have more than $n_1$ coefficients, in general. This construction does not reveal frame generators of minimal support. A similar construction by Han and Mo [33] provides a triplet $(\psi_1, \psi_2, \psi_3)$ of symmetric/anti-symmetric frame generators (with maximum vanishing moments $\mu = m$). Their construction makes use of another VMR function

$$\theta(z) = \sum_{k=0}^{2m-2} d_k \Bigl(\frac{2 - z - z^{-1}}{4}\Bigr)^k$$

of degree twice as large as the degree of the VMR function $S$ in Theorem 7.2.4. $\theta$ is chosen to have complex zeros of multiplicity 2; a square root $\theta_1$ can be found, which is real-valued on
$\mathbb{T}$. The positivity of $A$ in (2.37), with $\theta$ substituted for $S$, is shown, and the three frame generators are defined by the Laurent polynomials

$$Q^1(z) = z\,\theta_1(z^2)P(-1/z), \qquad Q^{2,3}(z) = \tfrac12\bigl(a(z^2) \pm a(1/z^2)\bigr)P(z).$$
Other triplets $\{\psi^1, \psi^2, \psi^3\}$ of symmetric/anti-symmetric generators of a normalized tight frame are constructed in [25]. These generators have fewer nonzero coefficients than the ones found by the aforementioned constructions. We present, in Section 5, a new method for the construction of tight frames of splines which we developed in our recent joint work [16] with W. He. This method yields a triplet of symmetric generators for $m = \mu = 4$ whose filter lengths are 7, 9, and 11, respectively. The graphs are shown in Section 5, Figure 10.
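The two-generator construction in Example 7.2.3 can be verified in a few lines. The computation below is a sketch under stated assumptions: $|z| = 1$, all Laurent polynomials have real coefficients, $A$ is the function in (2.37), and the two square roots are written here as $\sigma$ and $a$ with $\sigma(z)\sigma(1/z) = S(z)$ and $a(z)a(1/z) = A(z)$ (these labels are chosen for illustration). It shows that $Q^1(z) = z\,\sigma(z^2)P(-1/z)$ and $Q^2(z) = a(z^2)P(z)$ satisfy (2.19) and (2.20) with $S - A$ in place of $S$.

```latex
% Assumptions: |z| = 1, real coefficients,
% \sigma(z)\sigma(1/z) = S(z), a(z)a(1/z) = A(z), A as in (2.37).
\begin{align*}
|Q^1(z)|^2 + |Q^2(z)|^2
  &= S(z^2)\,|P(-z)|^2 + A(z^2)\,|P(z)|^2, \\
(S - A)(z^2)\,|P(z)|^2 + |Q^1(z)|^2 + |Q^2(z)|^2
  &= S(z^2)\bigl(|P(z)|^2 + |P(-z)|^2\bigr) = S(z) - A(z), \\
Q^1(z)\,Q^1(-1/z)
  &= z\,\sigma(z^2)P(-1/z)\cdot(-z^{-1})\,\sigma(z^{-2})P(z)
   = -S(z^2)\,P(z)P(-1/z), \\
Q^2(z)\,Q^2(-1/z)
  &= a(z^2)\,a(z^{-2})\,P(z)P(-1/z) = A(z^2)\,P(z)P(-1/z), \\
(S - A)(z^2)\,P(z)P(-1/z) + Q^1(z)Q^1(-1/z) + Q^2(z)Q^2(-1/z) &= 0 .
\end{align*}
```

The second line is exactly (2.19) for the modified VMR function, since $S(z^2)(|P(z)|^2 + |P(-z)|^2) = S(z) - A(z)$ by the definition of $A$; the last line is (2.20).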
Finally, we wish to comment on the number $L$ of generators that are needed for the construction of a tight frame $\Psi$. If $M(z)$ has full rank at some $z \in \mathbb{T}$, then at least 2 Laurent polynomials $Q^1, Q^2$ are needed for the factorization (2.30). This means that at least two functions $\psi^1, \psi^2$ are needed in order to generate a tight frame $\Psi$ of $L^2(\mathbb{R})$. It was shown in [14, Theorem 9] and [25, Theorem 3.8] that the only case where one compactly supported spline function $\psi \in V_1$ generates a tight frame of $L^2(\mathbb{R})$ is the case $m = 1$; examples of such frames are the orthonormal Haar basis $\psi_H$ and dilates $\psi_H(\cdot/n)$ with odd $n$. These examples are known from the First Oversampling Theorem in [19]. For all other values of $m \ge 2$, however, no compactly supported spline function $\psi \in V_1$ exists, whose dilates and translates generate a tight frame of $L^2(\mathbb{R})$.

Remark 7.2.6 Most results discussed in this section remain valid for the general setting with the B-spline $N_m$ replaced by a compactly supported refinable function $\phi$ (w.r.t. dilation by 2), which is piecewise $\mathrm{Lip}\,\alpha$ for some $\alpha > 0$, has nonvanishing integral over $\mathbb{R}$, and whose integer shifts $\{\phi(\cdot - k);\ k \in \mathbb{Z}\}$ are a Riesz basis of the space $V_0$. In particular, there always exists a nonnegative VMR Laurent polynomial $S$ that satisfies the positivity condition (2.33) and defines a quasi-interpolation operator as in (2.28), which reproduces all polynomials in the span of the integer shifts of $\phi$. This result was shown in [14, Theorem 5] by a sophisticated analysis of the positivity condition (1.16), see also (2.33). The formulation (1.16) exhibits the relation of this condition and the "transfer operator" which is an operator on $L^2((0, 2\pi))$ and maps a certain space of trigonometric polynomials (that depends on the degree of the two-scale Laurent polynomial) into itself. The spectrum of this operator was analyzed in [14] in order to show the existence of a VMR Laurent polynomial $S$ which satisfies (1.16).
Therefore, pairs $\psi^1, \psi^2 \in V_1$ of generators of a tight frame of $L^2(\mathbb{R})$ can always be constructed. If the integer shifts of $\phi$ are not stable, then the characterization of tight wavelet frames in Theorem 7.2.2 remains valid, if we allow $S$ to be a quotient of two Laurent polynomials with real coefficients and real values for all $z \in \mathbb{T}$, and with no pole at $z = 1$. This case is further analyzed in [14].

7.2.2 Non-tight sibling frames with dilation factor 2

We begin this section with a characterization of compactly supported sibling frames of cardinal splines. Recall Definition 1(c) for the notation of sibling frames and duality. In analogy with Theorems 1 and 2, we obtain the following.
Theorem 7.2.6 Let $Q^\ell$, $\tilde Q^\ell$, $1 \le \ell \le L$, be Laurent polynomials with real coefficients vanishing at $z = 1$. The functions $\psi^\ell$, $\tilde\psi^\ell$, $1 \le \ell \le L$, defined in (2.10) generate sibling frames of $L^2(\mathbb{R})$, with respect to dilation by 2 and integer shifts, if and only if there exists a Laurent polynomial $S$ with real coefficients, which satisfies $S(1) = 1$ and

$$S(z^2)\,|P(z)|^2 + \sum_{\ell=1}^{L} Q^\ell(z)\,\tilde Q^\ell(1/z) = S(z), \qquad (2.38)$$

$$S(z^2)\,P(z)\,P(-1/z) + \sum_{\ell=1}^{L} Q^\ell(z)\,\tilde Q^\ell(-1/z) = 0. \qquad (2.39)$$

Moreover, if all of the functions $\psi^\ell$, $\tilde\psi^\ell$, $1 \le \ell \le L$, have $\mu$ vanishing moments, for some $1 \le \mu \le m$, then $S$ satisfies the approximation property (2.25). Conversely, if $S$ satisfies (2.25), where $1 \le \mu \le m$, then there exist four compactly supported functions $\psi^i, \tilde\psi^i \in V_1$, $i = 1, 2$, which have $\mu$ vanishing moments and generate sibling frames of $L^2(\mathbb{R})$.
For a proof of this result, we refer to Theorems 1 and 2 in [14], where more general MRAs are considered. The construction of sibling frames is performed in a manner similar to the one already described for tight frames. The identities (2.38) and (2.39) are reformulated as the matrix equation

$$M(z) = \tilde Q(1/z)\,Q(z)^T, \qquad (2.40)$$
where $M(z)$ is the matrix in (2.31) and $Q(z)$, $\tilde Q(z)$ are defined as in (2.32). Clearly, there is no constraint of positive definiteness of $M(z)$, in order that the factorization (2.40) exists. The rank of $M(z)$, $z \in \mathbb{T}$, is a lower bound for the number of columns of the matrices $Q(z)$ and $\tilde Q(z)$. As mentioned in the previous section, except for $m = \mu = 1$, there exists no Laurent polynomial $S$ which yields $\operatorname{rank} M(z) = 1$ for all $z \in \mathbb{T}$. Therefore, the minimal number of frame generators in Theorem 7.2.6 is $L = 1$ for $m = \mu = 1$ and $L = 2$ for $m \ge 2$ and any $1 \le \mu \le m$. The factorization (2.40) allows for much greater flexibility in finding the matrices $Q(z)$ and $\tilde Q(z)$. A simple factorization, if $S$ satisfies (2.25) and $\mu \le m$, can be obtained by mimicking the first two steps that appear in Example 7.2.1. This gives
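$$M(z) = \underbrace{\begin{bmatrix} (1 - z^{-1})^\mu & z^{-1}(1 - z^{-1})^\mu \\ (1 + z^{-1})^\mu & -z^{-1}(1 + z^{-1})^\mu \end{bmatrix} M_1(z)}_{=:\ \tilde Q(1/z)}\ \underbrace{\begin{bmatrix} (1 - z)^\mu & (1 + z)^\mu \\ z(1 - z)^\mu & -z(1 + z)^\mu \end{bmatrix}}_{=:\ Q(z)^T} \qquad (2.41)$$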
where the matrix $M_1(z)$ has Laurent polynomial entries of even powers of $z$. This type of factorization yields the Laurent polynomials

$$Q^1(z) = (1 - z)^\mu, \qquad Q^2(z) = z(1 - z)^\mu,$$

and, by simple calculations, we obtain, for $\ell = 1, 2$, that

$$\tilde Q^\ell(z) = z^{\ell-1}\left(\frac{S(z) - S(z^2)|P(z)|^2}{2(1 - z^{-1})^\mu} + (-1)^\ell\,\frac{S(z^2)\,P(z)\,P(-1/z)}{2(1 + z^{-1})^\mu}\right).$$
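A quick numerical test of this sibling pair is possible for $m = \mu = 2$. The sketch below assumes the closed form $\tilde Q^\ell(z) = z^{\ell-1}\bigl[(S(z) - S(z^2)P(z)P(1/z))/(2(1 - z^{-1})^\mu) + (-1)^\ell S(z^2)P(z)P(-1/z)/(2(1 + z^{-1})^\mu)\bigr]$, which is one form consistent with (2.38)-(2.41) and may differ in appearance from the printed expression; $S$ is the VMR polynomial $(8 - z - z^{-1})/6$ and all function names are illustrative. The grid avoids $z = \pm 1$, where the quotients are removable singularities.

```python
import numpy as np

MU = 2  # number of vanishing moments (and m = 2, piecewise linear splines)

def P(z):
    return ((1 + z) / 2) ** 2

def S(z):
    return (8 - z - 1 / z) / 6

def Q(l, z):
    """Simple factors Q^1(z) = (1 - z)^mu and Q^2(z) = z (1 - z)^mu."""
    return z ** (l - 1) * (1 - z) ** MU

def Q_dual(l, z):
    """Assumed closed form of the sibling symbols (see lead-in)."""
    first = (S(z) - S(z**2) * P(z) * P(1 / z)) / (2 * (1 - 1 / z) ** MU)
    second = S(z**2) * P(z) * P(-1 / z) / (2 * (1 + 1 / z) ** MU)
    return z ** (l - 1) * (first + (-1) ** l * second)

# 200 points on the unit circle; neither omega = 0 nor omega = pi is on the grid
omega = np.linspace(0.05, 2.0 * np.pi - 0.05, 200)
z = np.exp(-1j * omega)

r38 = S(z**2) * P(z) * P(1 / z) + sum(Q(l, z) * Q_dual(l, 1 / z) for l in (1, 2)) - S(z)
r39 = S(z**2) * P(z) * P(-1 / z) + sum(Q(l, z) * Q_dual(l, -1 / z) for l in (1, 2))
print(np.max(np.abs(r38)), np.max(np.abs(r39)))
```

Both residuals are at rounding level, so the pair satisfies (2.38) and (2.39) without any polynomial factorization beyond the trivial factors.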
Note that only the "trivial" factors (that are due to vanishing moments and transformation to polyphase form) are involved. Hence, no complicated polynomial factorization is required in order to define the Laurent polynomials $Q^1, Q^2$ and $\tilde Q^1, \tilde Q^2$ of the sibling frames. If $S$ is the VMR Laurent polynomial in Theorem 7.2.4, then the following properties can be easily verified:
$\psi^\ell$ has support $\tfrac{\ell-1}{2} + [0, (m + \mu)/2]$ and is even (odd) with respect to its center, if $\mu$ is even (odd). Since the B-splines in $V_1$ satisfy the variation diminishing property (see [4]), $\psi^1$ and $\psi^2$ are minimally supported splines in $V_1$ with $\mu$ vanishing moments. Moreover, $\psi^2$ is a shift by $1/2$ of $\psi^1$. The support of $\tilde\psi^\ell$ is contained in the interval $\tfrac{\ell-1}{2} + [-(2\mu + m)/2 + 1,\ (3\mu + m)/2 - 1]$. Both functions are even (odd) with respect to the center of this interval, if $\mu$ is even (odd) and $m - \mu$ is even. No factorization (2.40) has been found yet, where $m - \mu$ is odd and $\psi^{1,2}$ are the minimally supported splines in $V_1$ with $\mu$ vanishing moments.

Remark 7.2.7 It is worthwhile to mention that both generators $\psi^1, \psi^2$ have the same parity. This property is unavoidable, if we choose $\psi^1, \psi^2$ to be minimally supported splines in $V_1$ with the same order of vanishing moments. By distributing the order of vanishing moments unevenly (order $\mu$ for $\psi^1, \tilde\psi^1$ and $\mu - 1$ for $\psi^2, \tilde\psi^2$, for example) it is possible to create frames with two generators of different parity. Instead of having the shift-invariance property mentioned above, such frames may possibly give rise to better shift-invariance of the frame decomposition. This was analyzed experimentally by Kingsbury [40, 41] and Selesnick [54] for other types of frames. Preliminary investigations concerning spline frames are contained in [3].

Remark 7.2.8 One may ask if the upper bound $m$ for the order of vanishing moments in Theorem 7.2.6 can be relaxed, at least for one family of the sibling frames. The answer to this question is negative, as we show next, even if the order of approximation of the VMR function $S$ in (2.25) exceeds $2m$. More precisely, if the functions $\psi^\ell, \tilde\psi^\ell \in V_1$ generate sibling frames of $L^2(\mathbb{R})$, there always exist $i, i'$ [...] $\ge 2$. We will report in this section that most results from Sections 2.1 and 2.2 have a proper extension to this setting. Some of these results appeared in [17, 34].
However, the main theorem, where we extend the result of Theorem 7.2.5, is new and appears here for the first time. We fix $m, M \in \mathbb{N}$, $M \ge 2$, and recall from (2.4) that

$$P(z) := P_{m,M}(z) = \left(\frac{1 - z^M}{M(1 - z)}\right)^m = \left(\frac{1 + z + \cdots + z^{M-1}}{M}\right)^m$$

is the Laurent polynomial two-scale symbol of $N_m$ with respect to dilation by $M$. The functions $\psi^\ell \in V_1$ (which is the space of splines of order $m$ with simple knots in $(1/M)\mathbb{Z}$) are defined in (2.10) by Laurent polynomials $Q^\ell$, such that

$$\hat\psi^\ell(M\omega) = Q^\ell(e^{-i\omega})\,\hat N_m(\omega), \qquad 1 \le \ell \le L.$$

We let $w_M := e^{2\pi i/M}$. The following generalization of Theorems 1 and 2 is given in [17].
Theorem 7.2.9 The compactly supported functions $\psi^\ell \in V_1$, $1 \le \ell \le L$, generate a normalized tight frame of $L^2(\mathbb{R})$, if and only if there exists a Laurent polynomial $S(z)$ with real coefficients, such that $S(1) = 1$, $S$ is nonnegative on $\mathbb{T}$, and the identity

$$S(z^M)\,P(z)\,P(w_M^k z^{-1}) + \sum_{\ell=1}^{L} Q^\ell(z)\,Q^\ell(w_M^k z^{-1}) = \delta_{k0}\,S(z) \qquad (2.43)$$

holds for all $k = 0, \ldots, M - 1$.
A variant of the proof for the sufficiency in the above theorem is given in [17], which establishes an important identity for the inner products [...], where we let [...] and [...] is the approximate dual in (2.26). This identity, namely [...], holds for all $j \in \mathbb{Z}$. It is related to the characterization of tight frames in [50] and was first established, for the special case $S = 1$, in [13]. It is also the guiding identity for constructions of tight frames in subsequent sections. If vanishing moments of the functions $\psi^\ell$ are analyzed, we again make use of the representation

$$Q^\ell(z) = (1 - z)^\mu q^\ell(z), \qquad 1 \le \ell \le L$$

[...] $\ge 2$. The functions $\psi^1, \ldots, \psi^L$ in (2.10) have vanishing moments of order $\mu$ and generate a normalized tight frame $\Psi$ of $L^2(\mathbb{R})$, with dilation factor $M$, if and only if there exists a Laurent polynomial $S$ that satisfies (2.25) together with all conditions in Theorem 7.2.9. Moreover, no tight frame with generators $\psi^\ell \in V_1$ exists, where all functions $\psi^\ell$ have more than $m$ vanishing moments.

We already noticed that the auto-correlation symbol $\Phi$ does not depend on the dilation factor $M$, of course. Likewise, the approximation property (2.25) that relates $S$ and $\Phi$ does not depend on $M$. It is natural to ask if the same VMR Laurent polynomial $S$, that was chosen for the construction of tight spline frames with 2 generators for $M = 2$, can be utilized for the construction of dilation $M$ frames. In other words, is there a "universal" VMR function $S$, for a given B-spline $N_m$, such that the equations (2.43) admit Laurent polynomial solutions $Q^\ell$, $1 \le \ell \le L$, and can we choose $L = M$? In [17] it was observed that $S$ from Table 7.1 could be utilized for piecewise linear splines ($m = 2$) and dilation factors $M = 2, 3, 4$. Moreover, the equations (2.43) can be equivalently written as a matrix equation, where the matrix
$$M(z) := \begin{bmatrix} S(z) & & 0 \\ & \ddots & \\ 0 & & S(w_M^{M-1}z) \end{bmatrix} - S(z^M)\begin{bmatrix} P(1/z) \\ \vdots \\ P(w_M^{-(M-1)}/z) \end{bmatrix}\begin{bmatrix} P(z) & \cdots & P(w_M^{M-1}z) \end{bmatrix} \qquad (2.46)$$

is involved. The critical part of the result was still unsettled, if this matrix is positive semi-definite. A similar simplification of this problem as for $M = 2$ is obtained in [17, Theorem 4.1]: the matrix is positive semi-definite, if and only if its determinant
$$\prod_{k=0}^{M-1} S(w_M^k z) - S(z^M)\sum_{k=0}^{M-1} |P(w_M^k z)|^2 \prod_{j \ne k} S(w_M^j z)$$

[...] $\ge 2$. If we choose $S$ to be the VMR Laurent polynomial in (2.35), then the matrix $M(z)$ in (2.46) is positive semi-definite for all $z \in \mathbb{T}$. Moreover, there exist $M$ compactly supported functions $\psi^\ell \in V_1$, $1 \le \ell \le M$, that have $\mu$ vanishing moments and generate a tight frame of splines with dilation factor $M$.

The following example illustrates the previous results.

Example 7.2.7 For piecewise linear splines ($m = 2$), we consider dilation factors $M = 3$ and $M = 4$ separately. The two-scale symbols for these two cases are

$$P_{2,3}(z) = \tfrac19(1 + z + z^2)^2, \qquad P_{2,4}(z) = \tfrac1{16}(1 + z + z^2 + z^3)^2.$$

The VMR Laurent polynomial for two vanishing moments is, as before, $S(z) = 1 + \tfrac16(2 - z - z^{-1})$. The Laurent polynomials $Q^\ell$, $1 \le \ell \le M$, rounded to four decimals are given by
Figure 7.7. Generators $\psi^1, \psi^2, \psi^3$ of 3-dilation tight frame of piecewise linear splines with two vanishing moments
$$\begin{aligned}
Q^1(z) &= -(1 - z)^2(.0574 + .3059z + .0574z^2),\\
Q^2(z) &= (1 - z)^2(.0389 + .1555z + .3887z^2 + .5125z^3 + .1863z^4),\\
Q^3(z) &= (1 - z)^2(.0059 + .0236z + .0589z^2 + .1113z^3 + .1675z^4 + .3492z^5)
\end{aligned}$$

for the case $M = 3$, and

$$\begin{aligned}
Q^1(z) &= -(1 - z)^2(.0265 + .1061z + .3339z^2 + .0907z^3 + .0073z^4),\\
Q^2(z) &= (1 - z)^2(.0145 + .0581z + .1508z^2 + .3038z^3 - .0179z^4),\\
Q^3(z) &= (1 - z)^2(.0188 + .0751z + .1933z^2 + .3556z^3 + .3671z^4),\\
Q^4(z) &= -(1 - z)^2(.0209z^{-4} + .0835z^{-3} + .2087z^{-2} + .4173z^{-1} + .5942 + .6240z + .3120z^2 + .1248z^3 + .0312z^4)
\end{aligned}$$

for $M = 4$. The graphs for $M = 3$ are shown in Figure 7.7, and those for $M = 4$ are shown in Figure 7.8.

Finally, we present a general result about the existence of tight $M$-dilation frames with only $M - 1$ generators and end this section with a conjecture.

Theorem 7.2.12 Let $\{V_j\}_{j \in \mathbb{Z}}$ be an MRA generated by an $M$-dilation compactly supported scaling function $\phi$ with Laurent polynomial two-scale symbol $P(z)$. Then there exist compactly supported functions $\psi_1, \ldots, \psi_{M-1} \in V_1$ that are generators of a normalized tight frame, if and only if there exists a Laurent polynomial $B$ such that $B(1) = 1$, $B(z^M)/B(z)$ is a Laurent polynomial, and $B(z^M)P(z)/B(z)$ is an $M$-CQF, meaning that, in terms of a generic Laurent polynomial $H$,
$$\sum_{k=0}^{M-1} |H(w_M^k z)|^2 = 1, \qquad z \in \mathbb{T}.$$

Figure 7.8. Generators $\psi^1, \ldots, \psi^4$ of 4-dilation tight frame of piecewise linear splines with two vanishing moments
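Two numerical checks tie this material together. The sketch below first evaluates the residuals of the identities (2.43) for the three generators of Example 7.2.7 with $M = 3$, and then evaluates the $M$-CQF defect $\sum_k |H(w_M^k z)|^2 - 1$ for $H = P_{m,M}$ with $B = 1$. Two points are assumptions: the leading coefficient .0389 of $Q^2$ is a reconstruction of a digit group that is damaged in the source, and the root of unity is taken as $w_M = e^{2\pi i/M}$. Function names are illustrative.

```python
import numpy as np

M = 3
W = np.exp(2j * np.pi / M)   # primitive M-th root of unity (assumed orientation)

def P(z, m=2):
    """Two-scale symbol P_{m,M}(z) = ((1 + z + ... + z^(M-1)) / M)^m."""
    return (sum(z ** j for j in range(M)) / M) ** m

def S(z):
    """VMR Laurent polynomial for two vanishing moments."""
    return 1 + (2 - z - 1 / z) / 6

# (sign, coefficients of q in ascending powers) with Q(z) = sign * (1 - z)^2 * q(z);
# the entry 0.0389 is a reconstructed value, the others are as printed (four decimals).
Q_COEFFS = [
    (-1, [0.0574, 0.3059, 0.0574]),
    (+1, [0.0389, 0.1555, 0.3887, 0.5125, 0.1863]),
    (+1, [0.0059, 0.0236, 0.0589, 0.1113, 0.1675, 0.3492]),
]

def Q(l, z):
    sign, coeffs = Q_COEFFS[l]
    return sign * (1 - z) ** 2 * sum(c * z ** n for n, c in enumerate(coeffs))

def oep_residual(z, k):
    """Left-hand side minus right-hand side of (2.43) for shift index k."""
    v = W ** k / z
    lhs = S(z ** M) * P(z) * P(v) + sum(Q(l, z) * Q(l, v) for l in range(len(Q_COEFFS)))
    return lhs - (S(z) if k == 0 else 0)

def cqf_defect(z, m):
    """sum_k |P_{m,M}(w^k z)|^2 - 1, which vanishes iff P_{m,M} is an M-CQF."""
    return sum(abs(P(W ** k * z, m)) ** 2 for k in range(M)) - 1

z = np.exp(-1j * np.linspace(0.0, 2.0 * np.pi, 181))
worst = max(np.max(np.abs(oep_residual(z, k))) for k in range(M))
haar_defect = np.max(np.abs(cqf_defect(z, 1)))
linear_defect = np.max(np.abs(cqf_defect(z, 2)))
print(worst, haar_defect, linear_defect)
```

The residual `worst` is limited by the four-decimal rounding of the coefficients rather than by floating-point error. The Haar symbol ($m = 1$) is an $M$-CQF, while $P_{2,3}$ with $B = 1$ is not, which is in line with the situation addressed by the conjecture that follows.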
Until now it has not been shown if this result rules out the existence of tight frames with dilation $M$ and $M - 1$ generators in the spline space $V_1$ of splines of order $m \ge 2$. Only the case $M = 2$ is settled, with a negative result, as mentioned at the end of Section 2.1.

Conjecture 1. For any $m \ge 2$ and $M \ge 2$, there exists no family of $M - 1$ compactly supported splines $\psi^\ell \in V_1$ of order $m$, $1 \le \ell \le M - 1$, that generate an $M$-dilation tight frame of $L^2(\mathbb{R})$.

7.3 WAVELET FRAMES OF SPLINES WITH MULTIPLE KNOTS
The theoretical development of wavelet bases for spline spaces with multiple knots of equal multiplicity evolved under the terminology of "multiwavelets". Here, the underlying MRA is generated by finitely many compactly supported functions $\phi^1, \ldots, \phi^r \in L^2(\mathbb{R})$, which define the function spaces $V_j = \operatorname{clos}\operatorname{span}\{\phi^\ell_{j,k};\ k \in \mathbb{Z},$ [...]
l^(2w)J
QLAZ)--
QLA^)\
The matrix $Q := [Q_{mn}]$ is a Laurent polynomial matrix of dimension $L \times r$, where $L$ denotes the number of generators of the wavelet basis or frame. A general method for the construction of wavelets from an MRA of multiplicity $r$ can be found in [31]. A new method for the construction of non-tight frames with arbitrary vanishing moments was recently presented in [34]. The key step for finding an appropriate VMR function $S$ was tackled by making use of a linear transformation of the set of generators $\Phi$ of the MRA; this transformation resembles the approach of finding a "superfunction" in $V_0$ for the purpose of studying the approximation order of $V_j$, see [8]. In this section, we not only present a method for the construction of tight frames with maximum vanishing moments, but our result also leads to frame generators with much shorter supports than the existing constructions. The principal steps of our method are similar to the case of ordinary MRAs described in Section 2. The construction of the VMR function, however, depends heavily on our results developed jointly with W. He for the construction of tight
frames of splines with non-uniform knots. In the paper [16], we are able to avoid several difficult steps of the Fourier-domain approach, that involve factorization techniques for the Laurent polynomial matrix $P$ in connection with sum rules or the characterization of symmetry, see [34]. It turns out that reference to the shift-invariant structure of the spaces $V_j$ is of lesser importance for these constructions, and a time-domain approach leads to a simpler analysis of VMR functions. The full description of the time-domain approach is given in Sections 4 and 5. Here, we draw from those results and present them in a Fourier-domain framework.

We begin our discussion by agreeing on the notations for splines with knots of fixed multiplicity. Let $r \in \mathbb{N}$ and $m \ge r$. We consider $m$-th order B-splines with respect to $r$-fold integer knots

$$\cdots \le t_{-1} = -1 < t_0 = \cdots = t_{r-1} = 0$$

[...]
^0)_
Other more important differences concerning the construction of the VMR Laurent polynomial matrix $S$ and the frame generators are described in the remainder of this section.

Remark 7.3.3 Note that we only claim the sufficiency in the previous theorem. To our knowledge, no proof for the necessity of the existence of the VMR Laurent polynomial matrix $S$ has yet been published.

The explicit construction of tight frames of splines with multiple knots follows the procedure that was laid out in Section 2. The identity (3.10) is written as a matrix equation
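$$M(z) = \begin{bmatrix} Q(z)^* \\ \vdots \\ Q(w_M^{M-1}z)^* \end{bmatrix}\begin{bmatrix} Q(z) & \cdots & Q(w_M^{M-1}z) \end{bmatrix}, \qquad (3.16)$$

where

$$M(z) = \begin{bmatrix} S(z) & & 0 \\ & \ddots & \\ 0 & & S(w_M^{M-1}z) \end{bmatrix} - \begin{bmatrix} P(z)^* \\ \vdots \\ P(w_M^{M-1}z)^* \end{bmatrix} S(z^M)\begin{bmatrix} P(z) & \cdots & P(w_M^{M-1}z) \end{bmatrix}. \qquad (3.17)$$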
Hence, the VMR Laurent polynomial matrix $S(z)$ must be defined, such that (3.9) holds and that the matrix $M(z)$ in (3.17) is positive semi-definite. Moreover, in order for the functions $\psi^\ell$ to have at least $\mu \ge 1$ vanishing moments, the matrix $M(z)$ admits the factorization
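$$M(z) = \begin{bmatrix} D(z)^* & & 0 \\ & \ddots & \\ 0 & & D(w_M^{M-1}z)^* \end{bmatrix} M_0(z) \begin{bmatrix} D(z) & & 0 \\ & \ddots & \\ 0 & & D(w_M^{M-1}z) \end{bmatrix}, \qquad (3.18)$$

where $M_0$ is a Laurent polynomial matrix of dimension $rM \times rM$ and

$$D(z) := D_{m+1,r}(z) \cdots D_{m+\mu,r}(z). \qquad (3.19)$$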
The methods in [35] for the factorization of the positive semi-definite matrix $M_0(z)$ then allow us to find the Laurent polynomial matrix Q = DQ for the definition of $\psi^\ell$ in (3.4). The key observation drawn from the time-domain approach in Section 5 is that the VMR matrix $S$ should satisfy an approximation property similar to (2.25), which we brought into play in Section 2.1. This identity reads as
$$S(z) - \Phi(z)^{-1} = D(z)^* X(z) D(z), \qquad (3.20)$$
where $X$ is a hermitian matrix of Laurent series, which are continuous on $\mathbb{T}$ and whose coefficients decay exponentially. As in Section 2.1, we see that condition (3.20) does not depend on the scaling factor. Hence, the same VMR matrix can be employed for different integer dilation factors $M$. Let us consider the identity (3.9) in Theorem 7.3.1. Since $D(1)[\hat\phi^1(0), \ldots, \hat\phi^r(0)]^T = 0$, we observe that (3.20) implies [...]

For a bounded interval $[a, b]$ and the knot vector in (4.4) we replace the matrix $D_{t;r}$ with
"t;r,- m + l -^t;r,-m+2 ''t-^,.
(4.12)
Dt;r := "t;r,7V+m-r- l J -1
which is a matrix with N + 2m the abbreviation
r rows and N + 2m
r
1 columns. We also introduce
Et-m,u : = Dt;m ’ ’ ’ A . m +u - l ,
(413)
in order to write
d" ^t;m+u{x) = ^t;m(2:)Et;m,i / dx’’ The recursion for the L^-normalized splines reads as | , * e 1 . . ( x ) = ^U^) diag [dim ^^^^ E,^.. diag [d-l^l,,] ^^^^^^ . > ^ — • -^
(4.14)
(4.15)
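For intuition, the structure of the matrices $E_{t;m,\nu}$ can be made explicit in the simplest setting. The sketch below assumes uniform knots with unit spacing, so that all scaling factors equal 1 and each factor $D$ reduces to a bidiagonal difference matrix with one more row than columns; this special case and the function names are illustrative and not taken from the text. The product of $\nu$ such factors then carries the signed binomial coefficients of $(1 - z)^\nu$ on $\nu + 1$ diagonals.

```python
import numpy as np

def difference_matrix(n):
    """(n+1) x n bidiagonal matrix with 1 on the diagonal and -1 on the subdiagonal."""
    D = np.zeros((n + 1, n))
    idx = np.arange(n)
    D[idx, idx] = 1.0
    D[idx + 1, idx] = -1.0
    return D

def E(n, nu):
    """Product of nu difference matrices; result has n + nu rows and n columns."""
    out = np.eye(n)
    for step in range(nu):
        out = difference_matrix(n + step) @ out
    return out

def nonzero_diagonals(A):
    """Set of offsets (row - column) on which A has a nonzero entry."""
    rows, cols = np.nonzero(A)
    return sorted(set((rows - cols).tolist()))

E3 = E(5, 3)
print(E3[:, 0])                 # first column: signed binomial coefficients, then zeros
print(nonzero_diagonals(E3))    # offsets of the nonzero diagonals
```

Each column of `E3` holds the coefficients $1, -3, 3, -1$, shifted down by one row per column, so exactly $\nu + 1 = 4$ diagonals are populated.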
It is clear that $E_{t;m,\nu}$ and $E^B_{t;m,\nu}$ are banded matrices with precisely $\nu + 1$ nonzero diagonals. The identities (4.14) and (4.15) are particularly useful in order to express the order of vanishing moments of splines in $S_{t;m} \cap L^2(I)$. A spline $s \in S_{t;m} \cap L^2(I)$ has $\mu$ vanishing moments (and compact support or exponential decay, if it is defined on an unbounded interval), if and only if it is the $\mu$-th derivative of a spline $S$ of order $m + \mu$ with respect to the same knot vector $t$. The spline $S$ is hereby defined uniquely, if we require $S$ to have compact support or exponential decay as well, in the case of an unbounded interval, or to have zero values of derivatives $S^{(\nu)}(a)$ and $S^{(\nu)}(b)$, $0 \le \nu \le \mu - 1$, in the case of a bounded interval. Hence, the knots $a$ and/or $b$ of the B-splines of order $m + \mu$ that represent $S$ have multiplicity at most $m$ (and not $m + \mu$). We obtain the following result.

Lemma 7.4.1 Let

$$s = \Phi^B_{t;m}\,c = \sum_k c_k N^B_{t;m,k}$$

be given, where (a) the entries of $c$ decay exponentially or (b) $c_k = 0$ for all $k < i_1$ and/or $k > i_2$. Then $s$ has $\mu$ vanishing moments, if and only if there exists a column vector $d$ such that

$$c = E^B_{t;m,\mu}\,d, \qquad (4.16)$$

and the entries of $d$ decay exponentially in case (a) and satisfy $d_k = 0$ for all $k < i_1$ and/or $k > i_2 - \mu$ in case (b). For the case $I = [a, b]$, the same result holds when the superscript $B$ is dropped.

Finally, we describe the refinability of the B-spline basis. Consider two knot vectors

$$t_j \subset t_{j+1}, \qquad j \ge 0, \qquad (4.17)$$
that satisfy conditions (4.1)-(4.2). Note that the subset notation is used for ordered sets: new knots of multiplicity $\le m$ can be inserted into $t_j$, or the multiplicity $\mu_k < m$ of an existing knot $t_k^{(j)}$ in $t_j$ can be increased. (We use superscript $(j)$, in order to denote knots in $t_j$. We also drop $t$ in the index of B-splines and write $N_{j;m,k}$, etc.)

Remark 7.4.1 One type of knot refinement is defined by the insertion of a new knot of the same multiplicity into each knot interval of $t_j$; this is called "two-threaded" refinement in [26]. If new knots are placed halfway between old knots, we obtain quasi-uniform refinements, and if, in addition, $t_0 = \mathbb{Z}$, we are in the situation described in Section 2. In general, we do not assume any of these special types of refinements. The only additional restriction on the refinement is that the number of knots inserted between $t_k^{(j)}$ and $t_{k+1}^{(j)}$ is bounded by a constant $u_j$ that may vary with $j$.

The B-spline bases $\Phi_{j;m}$ and $\Phi_{j+1;m}$ satisfy the refinement equation

$$\Phi_{j;m} = \Phi_{j+1;m}\,P_{j;m} \qquad (4.18)$$
with a real m a t r ix Pj-m whose entries are nonnegative and whose row sums equal 1. More› over, t he m a t r ix is sparse in t he following sense. We define strictly increasing sequences i{k) and rjik) such t h at
{4^’.--*K }c{t ’q(k). In other words, only t he B-spfines of ^ i + i ; m, whose support is contained in t he support of Nj-m,k, have a nonzero coefficient in the refinement relation of Nj.rn,k- T he restriction on t he knot insertion in the previous remark guarantees, t h at at most muj +1 entries can be nonzero in every column of Pj;mIn t he special case, where tj+i \ tj = { r} is a singleton and r [^If >*fc-/i)> ^^^ matrix Pj,m has t he form
^Jk-Tn + 2
[...] $2\nu$, $F_\nu$ is a homogeneous polynomial of total degree $2\nu$, which is symmetric in its variables and is invariant under a shift of the arguments $(x_1, \ldots, x_r) \mapsto (x_1 - c, \ldots, x_r - c)$. Its coordinate degree in each of its variables is 2. These properties are enough in order to assure, that $F_\nu$ can be written in terms of the centered moments of its arguments
where x = (xi H [ Xr)/r. This result and representations up to i/ = 10 were worked out by our Summer Intern Tim Huegerich, an undergraduate student from Rice University in Houston. For 1 < i/ < 3, we have F i ( x i , . . . , X r) = r^cr2, F2(Xi,...,Xr)
= ^
Oi
^^
’- (74,
^f . ( r - 2 ) ( r - l ) r2 (r - 2)(r2 - 5r + 10)r2 F3(xi,...,a:r) = ^^ ^^ ’ ae’ o^oi(3r2 - 15r + 20)r2 2 (r - 2)(r2 - 7r + 15)r^ 3 3 era + -^ ^2. 7.4.3 Explicit form of an approximate dual We now approach the task of constructing approximate duals of the B-spline basis. The Gramian matrix of the L^-normalized B-sphnes is r = / ^ t 1 m ( x f ^ t l m ( x ) dx = \(dt,m,kdt.,m,e)-’^^{Nt,m,k.
Nt;m,i)\
( 4 . 2 7)
This defines an spd banded matrix, whose upper and lower operator bounds on $\ell^2$ are the Riesz bounds of $\Phi^B_{t;m}$. The matrix is totally positive, see [28]. Its inverse $\Gamma^{-1}$ is a full matrix (if $m \ge 2$), whose entries decay exponentially; namely, Demko's result [27] assures that [...] where $\kappa$ is the condition number of $\Gamma$ (in $\ell^2$) and $r$ its bandwidth. The dual Riesz basis of the spline space $S_{t;m} \cap L^2(I)$ is given by $\tilde\Phi = \Phi^B_{t;m}\Gamma^{-1}$. The dual basis functions have global support in $I$ for $m \ge 2$. The kernel

$$K(x, y) = \Phi^B_{t;m}(x)\,\Gamma^{-1}\,\Phi^B_{t;m}(y)^T \qquad (4.28)$$

defines the kernel of the orthoprojection

$$\Pi_{t;m} f := \int_I f(y)\,K(x, y)\,dy, \qquad (4.29)$$
which maps $L^2(I)$ into $S_{t;m} \cap L^2(I)$. The result of the recent proof of de Boor's conjecture by A. Shadrin [55] states that there exists a constant $C_m$ that neither depends on the knot vector nor depends on the interval $I$, such that this operator has operator bound $C_m$, if it is considered as an operator on any $L^p(I)$, for $1 \le p \le \infty$. Equivalently, the kernel $K$ in (4.28) satisfies $\sup \int_I |K(x, y)|\,dy$ [...]. This effect is due to the $L^2$-normalization of the B-splines $N^B_{t;m,k}$, which incorporates such scaling.

One of the achievements in our joint paper [16] with W. He is the exact computation of the matrix $S$ of minimal bandwidth such that $\Phi^B_{t;m}S$ is an approximate dual. The functions $F_\nu$ in (4.25) play an important role in this computation. Particular instances of this result appeared in Theorems 7.2.4 and 7.3.2 and were formulated in the Fourier domain. Here, we give a representation that has a similar form as in (2.35) and (3.21), but appears in the time-domain. In doing so, we must substitute the constants $u_k$ in (2.35) and the constant diagonal matrices $U_\nu$ in (3.21) by new diagonal matrices $U_\nu$ with positive diagonal entries.

Theorem 7.4.2 For every $1 \le \mu \le m$, there exists a unique spd matrix $S$ with bandwidth $\mu$ such that $\Phi^B_{t;m}S$ is an approximate dual of order $\mu$ for the spline basis $\Phi^B_{t;m}$. Moreover, $S$ has the form
$$S = I + E^B_{t;m,1}\,U_{t;m,1}\,(E^B_{t;m,1})^T + \cdots + E^B_{t;m,\mu-1}\,U_{t;m,\mu-1}\,(E^B_{t;m,\mu-1})^T, \qquad (4.38)$$
where the matrices E^^rn,u (^^ defined in (4-15) and Ut,m,u are diagonal matrices with diagonal entries (i,)
m\{m-
i ^ - 1)!
E^ /^
.
X
(m + i/)!(m -\-1/ - 1) ly. {m i/)\{m-\-1/
(4.39)
The corresponding kernel K^ ha^ the form
[...] $> 0$, where $t - c = \{t_k - c,\ k \in K\}$.
Proof: The functions $F_\nu$ in (4.25) satisfy

$$F_\nu(x_1 - c, \ldots, x_r - c) = F_\nu(x_1, \ldots, x_r), \qquad F_\nu(hx_1, \ldots, hx_r) = h^{2\nu}F_\nu(x_1, \ldots, x_r),$$

while $E^B_{t-c;m,\nu} = E^B_{t;m,\nu}$ [...]
Hence, each summand in (4.38) is invariant under the shift of the knots, and the powers of $h$ arising in $E^B_{ht;m,\nu} U_{ht;m,\nu} (E^B_{ht;m,\nu})^T$ cancel each other. This confirms the equation (4.41).
Remark 7.4.3 We explain in more detail how the matrix $S$ in the previous result is related to the VMR Laurent polynomial (matrix) $S(z)$ in (2.35) and (3.21). If $t = \mathbb{Z}$, the matrix $U_\nu$ in (4.38) is a bi-infinite diagonal Toeplitz matrix and $S$ is a bi-infinite banded Toeplitz matrix with bandwidth $\mu$. The symbol of $S$ is the VMR Laurent polynomial $S(z)$ in (2.35). Hence, the positivity of the coefficients $u_k^{(\nu)}$ is equivalent to the positivity of $u_k$ in (2.35). Likewise, if $t$ is the set of all integers repeated with multiplicity $r$, $S$ is a bi-infinite block Toeplitz matrix with $r \times r$ blocks, and its symbol is a slight modification of the VMR Laurent polynomial matrix $S(z)$ in (3.21), see also Examples 8 and 9(b). The only difference appears in the use of the $L^2$-normalized B-splines for the definition (4.38) versus the normalization of the B-splines used in (3.21). Once again, the positivity of the diagonal entries of $U_\nu(z)$ in (3.21) can be viewed as a consequence of Theorem 7.4.2.
The property (4.34) of boundedness of the kernel $K^S$ in (4.40) is a consequence of the following result that is proven in [16, Section 5.7].
Theorem 7.4.3 Let $t$ be a knot vector and $u_k^{(\nu)}$ be the numbers in (4.39), $0 \le \nu < m$. Then the kernel
satisfies
/ 0,
be two knot vectors on the interval $I$ as in (4.17). The knot refinement can be almost arbitrary, provided that all knots have multiplicity at most $m$ and the number of knots inserted between two adjacent knots of $t_j$ is bounded by a constant $n_j$. This condition is quite realistic. It is required by the method of proof in [16], where the refinement matrix $P^B_{j,m}$ in (4.18) is written as a finite product whose factors are block diagonal matrices, with blocks of the form (4.20). Under these very weak conditions, the following is proven in [16, Sections 5.5-5.6].
Theorem 7.4.4 Let $1 \le \mu \le m$. Let $S_j$ and $S_{j+1}$ denote the matrices in (4.38) for knot vectors $t_j \subset t_{j+1}$ that satisfy the aforementioned conditions. Then the matrix
$$S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T \tag{4.46}$$
is positive semi-definite and banded, and there exists a positive semi-definite and banded matrix $Z_j$, with row and column indices given by the column index set of $E^B_{j+1;m,\mu}$, such that
$$S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T = E^B_{j+1;m,\mu}\, Z_j\, (E^B_{j+1;m,\mu})^T. \tag{4.47}$$
Moreover, the bandwidth of $Z_j$ equals the bandwidth of the matrix on the left-hand side of (4.47) minus $\mu$.
We will show in the next section that the matrix $S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T$ serves the same purpose for the construction of tight frames as the Laurent polynomial matrix $M(z)$ in (2.46) and (3.17) does for the shift-invariant setting. The assertion of positive semi-definiteness generalizes the positivity condition (1.16), which was necessary for the construction of tight frame generators with knots at the half-integers. Moreover, the matrix $Z_j$ in the factorization (4.47) is the time-domain analogue of $M_0(z)$ in Example 7.2.1 and (3.18). The factor $E^B_{j+1;m,\mu}$ and its transpose are needed in order to construct spline functions with knots in $t_{j+1}$ which have $\mu$ vanishing moments, see Lemma 7.4.1. Any symmetric factorization can be employed in order to define a vector of spline functions
$$\Psi_j = [\psi_{j,k}]_k := \Phi^B_{j+1;m} Q_j, \tag{4.48}$$
where each $\psi_{j,k}$ has $\mu$ vanishing moments. This shows that there is a close relation between VMR functions and approximate duals. We will come back to this point in Section 5, where we discuss the construction of a tight frame of $L^2(I)$.
A more precise statement can be made about the sparsity of the matrix $Z := Z_j$ in (4.47). This is important for adaptive refinements of the knot vector, where the number of new knots that are inserted between two adjacent knots of $t_j$ varies (between 0 and $n_j$, say). Roughly speaking, $Z$ has zero rows and columns where the left-hand side of (4.47) has zero rows or columns. This situation occurs if there are large regions where no knots are inserted. Moreover, we can define the lower profile of a matrix $A = [a_{k,\ell}]_{k,\ell \in \mathbb{K}}$ by the sequence $\lambda_\ell(A) := \max(\{\ell - 1\} \cup \{k : a_{k,\ell} \ne 0\})$, $\ell \in \mathbb{K}$. If $A$ has bandwidth $\mu$, then $\lambda_\ell(A) \le \ell + \mu - 1$, of course. The factorization in (4.47) leads to a symmetric matrix $Z$, whose lower profile equals, up to an identification of appropriate columns of the two matrices, the lower profile of the matrix in (4.46) reduced by $\mu$. Instead of going into more technical details, we explain the structure of $Z$ by an example.
Example 7.4.4 We let $m = \mu = 4$ and $t_0 = [0,0,0,0,1,2,\ldots,19,20,20,20,20]$. The matrix $S_0$ in (4.38) has dimension $23 \times 23$ and bandwidth 4. If we insert simple knots at 3.5 and 4.5 only, the matrix $S_1$ has dimension 25. The matrix $S_1 - P^B_{0,4} S_0 (P^B_{0,4})^T$ is symmetric, positive semi-definite and has rank 9. The lower profile of its first 13 columns is given by the sequence of row indices (putting (1,1) as the upper left corner) [5,7,8,9,10,11,12,12,13,13,13,13,13]. All of the columns 14-25 are zero. Hence, the points of insertion do not affect this region. After performing the factorization in (4.47), we obtain the banded, symmetric and positive semi-definite matrix $Z$ of dimension $21 \times 21$, whose columns 10-21 vanish. The lower profile of the first 9 columns of $Z$ is given by [1,3,4,5,6,7,8,8,9], and the $9 \times 9$ block in the upper left corner has full rank. The Cholesky factorization $Z = R R^T$ can be computed, where the upper left block of $100 \cdot R$ is
3.93398 0.39340 6.24787 1.54318 5.78332 2.77857 4.77005 3.36881 4.97520 2.17523 6.29258 0.45862 5.86692
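The bookkeeping in Example 7.4.4 can be checked with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `lower_profile` and the all-ones banded test matrix `A` are assumptions made for illustration. It uses the standard fact that a knot vector with `len(t)` knots carries `len(t) - m` B-splines of order `m`, and applies the lower-profile definition with 1-based indices.

```python
import numpy as np

def lower_profile(A, tol=0.0):
    """Lower profile with 1-based indices:
    lambda_l(A) = max({l - 1} union {k : a[k, l] != 0}) for every column l."""
    A = np.asarray(A, dtype=float)
    profile = []
    for col in range(A.shape[1]):                 # col is 0-based, l = col + 1
        rows = np.flatnonzero(np.abs(A[:, col]) > tol)
        largest = int(rows[-1]) + 1 if rows.size else 0
        profile.append(max(col, largest))         # col equals l - 1
    return profile

m = mu = 4
t0 = [0] * m + list(range(1, 20)) + [20] * m      # knot vector t_0 of Example 7.4.4
dim_S0 = len(t0) - m                              # number of B-splines of order m on t_0
dim_S1 = dim_S0 + 2                               # two simple knots inserted

# Hypothetical banded test matrix: entries vanish for |k - l| >= mu.
idx = np.arange(dim_S0)
A = (np.abs(idx[:, None] - idx[None, :]) < mu).astype(float)
profile_A = lower_profile(A)

# Profiles quoted in the example: reducing the first one by mu gives the second.
profile_lhs = [5, 7, 8, 9, 10, 11, 12, 12, 13, 13, 13, 13, 13]
profile_Z = [p - mu for p in profile_lhs[:9]]
```

Here `profile_A` respects the bound $\lambda_\ell(A) \le \ell + \mu - 1$, and `profile_Z` reproduces the profile of $Z$ quoted in the example from that of the matrix in (4.46).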
7.5 TIGHT SPLINE FRAMES WITH NON-UNIFORM KNOTS
We show in this section that approximate duals of B-splines appear naturally in the characterization and construction of tight spline frames. In particular, the result of Theorem 7.4.4 is an important tool for this construction. Moreover, a characterization of all tight spline frames is given, and three comprehensive examples are provided. First, we must define the spline MRA of $L^2(I)$ in the setting of splines with nonuniform knots. A nested sequence of knot vectors
$$t_0 \subset t_1 \subset t_2 \subset \cdots \tag{5.1}$$
is given, where each knot vector satisfies (4.1) and, if $I$ is unbounded, (4.2) and (4.31) as well. If $I$ has a finite endpoint, then $t_0$, and therefore all $t_j$'s, are supposed to have an $m$-fold knot there. Moreover, there is a bound $n \in \mathbb{N}$ for the maximum number of knots inserted between two knots, namely
$$n := \max_j \max_k \#\big([t^{(j)}_k, t^{(j)}_{k+1}] \cap t_{j+1}\big) < \infty,$$
and the knot vectors become dense in $I$, which means that
$$\lim_{j \to \infty} \sup_k \big(t^{(j)}_{k+1} - t^{(j)}_k\big) = 0. \tag{5.2}$$
Then it follows from standard arguments in spline approximation that the spaces
$$V_j := S(t_j; m) \cap L^2(I), \quad j \ge 0,$$
are dense in $L^2(I)$. We define function families
$$\Psi_j := [\psi_{j,k};\ k \in \mathbb{M}_j] = \Phi^B_{j+1;m} Q_j, \quad j \ge 0, \tag{5.3}$$
where $Q_j$ is a real matrix with row indices in $\mathbb{K}_{j+1}$ and column indices in a set denoted by $\mathbb{M}_j$. The localization properties of this family are defined as follows.
Definition 7.5.1 The sequence of families $\Psi_j$, $j \ge 0$, is called locally supported (with respect to the B-spline bases $\Phi^B_j$, $j \ge 1$), if there exist integers $n_1, n_2$ (not depending on $j$) such that each $\psi_{j,k} \in \Psi_j$ is a linear combination of at most $n_1$ consecutive B-splines, and at every point $x \in I$ at most $n_2$ functions $\psi_{j,k} \in \Psi_j$ do not vanish.
In other words, each family $\Psi_j$ is locally finite, and the supports of the functions $\psi_{j,k}$ shrink at the same rate as the supports of the B-splines when $j$ tends to infinity. Recall from (4.38) the special definition of the approximate dual of the B-spline basis $\Phi^B_{j;m} := \Phi^B_{t_j;m}$, whose matrix $S_j := S(t_j)$ has bandwidth $\mu$. We define the quadratic form
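$$T_j f := \sum_{k,\ell \in \mathbb{K}_j} \langle f, N^B_{j;m,k} \rangle\, s^{(j)}_{k,\ell}\, \langle N^B_{j;m,\ell}, f \rangle, \qquad f \in L^2(I), \tag{5.4}$$
where $S_j = [s^{(j)}_{k,\ell}]_{k,\ell \in \mathbb{K}_j}$.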
It follows from (5.2) that for every $\varepsilon > 0$, there exists $j_0 \in \mathbb{N}$ such that
$$K^{S_j}(x,y) = 0 \quad \text{for all } |x - y| > \varepsilon \text{ and } j \ge j_0. \tag{5.5}$$
Together with (4.42) and the uniform bounds (4.43), we obtain
$$\lim_{j \to \infty} T_j f = \|f\|^2 \quad \text{for all } f \in L^2(I). \tag{5.6}$$
This is one of the key identities that we employ for the proof of the following result.
Theorem 7.5.2 Let $m, \mu \in \mathbb{N}$ with $1 \le \mu \le m$ and let $\{t_j\}_{j \ge 0}$ be a sequence of knot vectors as described above. Assume that banded matrices $R_j$ are defined, such that
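Then the families
$$\Psi_j = [\psi_{j,k}]_{k \in \mathbb{M}_j} := \Phi^B_{j+1;m} Q_j, \quad \text{with } Q_j := E^B_{j+1;m,\mu} R_j \text{ and } j \ge 0, \tag{5.8}$$
are locally supported and constitute a tight frame of $L^2(I)$, in the sense that
$$\|f\|^2 = T_0 f + \sum_{j=0}^{\infty} \sum_{k \in \mathbb{M}_j} |\langle f, \psi_{j,k} \rangle|^2 \quad \text{for all } f \in L^2(I). \tag{5.9}$$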
Moreover, all the wavelets $\psi_{j,k}$ in (5.8) have (at least) $\mu$ vanishing moments.
The method of proof in [16] is an adaptation of the telescoping argument (2.44). More precisely, the factorization (5.7) allows us to conclude that
$$T_{j+1} f - T_j f = \sum_{k \in \mathbb{M}_j} |\langle f, \psi_{j,k} \rangle|^2, \qquad j \ge 0,\ f \in L^2(I). \tag{5.10}$$
The identity (5.9) follows by summing (5.10) over $0 \le j < J$, which telescopes to $T_J f - T_0 f$, and letting $J \to \infty$ in view of (5.6). Moreover, all the functions $\psi_{j,k}$ have $\mu$ vanishing moments by Lemma 7.4.1.
Theorem 7.5.2 shows that the construction of tight frames of splines can be reduced to a simple problem of linear algebra, by means of the approximate duals of B-splines. For an arbitrary sequence of knot vectors $t_j$, $j \ge 0$, of at most polynomial growth, we only have to find the factorization (5.7), for each $j \ge 0$, in order to define a tight frame with $\mu$ vanishing moments. The theoretical basis for this factorization was already given in Theorem 7.4.4, where we showed that the matrices $S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T$ are positive semi-definite and admit the factorization with a positive semi-definite and banded matrix $Z_j$. Therefore, the construction of the matrix $Q_j$ hinges on the existence of the factorization
$$Z_j = R_j R_j^T. \tag{5.11}$$
For finite matrices, this is trivially achieved by the Cholesky factorization of $Z_j$. For bi-infinite banded Toeplitz matrices, a factorization of Cholesky type, with a banded Toeplitz matrix $R_j$, is obtained as an application of the Riesz-Fejér Theorem [49, pp. 117-118]. The analogue for bi-infinite banded block Toeplitz matrices was recently obtained in [35]. For other types of infinite matrices, results for the factorization (5.11) with banded $R_j$ exist under the additional assumption that $Z_j$ is strictly positive definite, see [20]. We refer to the survey article of van der Mee et al. [42] for ongoing research in this direction.
Remark 7.5.1 The "coarse scale component" $T_0 f$ in the identity (5.9) is indispensable for the following reason. We only discuss the case where $I$ is a bounded interval. If all the wavelets $\psi_{j,k}$ in (5.9) have $\mu$ vanishing moments, then $T_0 p = \|p\|^2$ must hold for all polynomials $p$ of degree $\mu - 1$, since all other summands in (5.9) vanish. Consequently, the kernel $K^{S_0}$ reproduces all polynomials of degree $\mu - 1$. We mentioned in Remark 14(a) that this is equivalent to the fact that $S_0$ defines an approximate dual of order $\mu$. Hence, the tight frame condition (5.9) is the natural condition to ask when we define a tight frame based on a spline MRA which starts with a coarsest level $V_0$.
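Returning to the finite-matrix case of the factorization (5.11), the Cholesky step can be sketched numerically. This is only an illustration under stated assumptions: the helper `banded_spd` builds a hypothetical strictly positive definite banded test matrix (it is not one of the matrices $Z_j$ of the text), and the factorization is computed with NumPy's dense `numpy.linalg.cholesky` rather than a dedicated banded routine.

```python
import numpy as np

def banded_spd(n, mu, seed=0):
    """Hypothetical test matrix Z = L L^T, where L is lower triangular,
    L[k, l] = 0 for k - l >= mu, and L has a positive diagonal."""
    rng = np.random.default_rng(seed)
    k, l = np.indices((n, n))
    L = np.where((k >= l) & (k - l < mu), rng.standard_normal((n, n)), 0.0)
    L[np.diag_indices(n)] = 1.0 + np.abs(np.diag(L))
    return L @ L.T

n, mu = 8, 3
Z = banded_spd(n, mu)
R = np.linalg.cholesky(Z)            # lower triangular factor with Z = R R^T

k, l = np.indices((n, n))
outside_band = np.abs(k - l) >= mu
residual = np.max(np.abs(R @ R.T - Z))       # how well R R^T reproduces Z
fill_in = np.max(np.abs(R[outside_band]))    # entries of R outside the band of Z
```

In this sketch `fill_in` stays at rounding level, so the factor $R$ inherits the band structure of $Z$. For a merely semi-definite $Z_j$ with vanishing rows and columns, as in Example 7.4.4, one would presumably apply the same step to the nonzero block only.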
It is interesting to ask how the property of local support can be made more precise for the wavelets $\psi_{j,k}$. In order to describe the sparsity of the Cholesky decomposition of the matrix $S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T$, we need to describe the lower profile of this matrix. For this purpose, we recall the definition of the index sequence $\eta(k)$, $k \in \mathbb{K}_j$, from (4.19) in Section 4.1. This sequence describes the lower profile of $P^B_{j,m}$. A second index sequence $\zeta(i)$, $i \in \mathbb{K}_{j+1}$, is defined that denotes the lower profile of the transpose of $P^B_{j,m}$. In other words, $\eta(k)$ is the largest row index of the nonzero entries in the $k$th column of $P^B_{j,m}$, and $\zeta(i)$ is the largest column index of the nonzero entries of the $i$th row of this matrix. It follows by elementary combinatorial arguments that the lower profile of the matrix $S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T$ is given by
$$\ell(i) := \eta(\zeta(i) + \mu - 1).$$
The matrix $Q_j$ in the Cholesky factorization has the same lower profile. Note that no fill-in of nonzero elements occurs, since $\ell$ is an increasing sequence. If we define $\nu(k)$ to be the number of new knots in $t_{j+1}$ that lie in the open interval $(t^{(j)}_k, t^{(j)}_{k+m+\mu-1})$, we also obtain that
$$i \le \ell(i) \le i + \nu(\zeta(i)) + \mu - 1.$$
These considerations lead to the following result.
Proposition 7.5.1 Let $S_j$ be the spd matrix in Theorem 7.4.2, and assume that the Cholesky factorization of $Z_j$ in (4.47) exists. Then there exists a factorization
$$Q_j Q_j^T = S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T,$$
where $Q_j$ defines the wavelets $\psi_{j,i}$ of a tight frame with $\mu$ vanishing moments, and each $\psi_{j,i}$ is a linear combination of at most $\nu(\zeta(i)) + \mu$ consecutive B-splines of the basis $\Phi^B_{j+1;m}$, starting with $N^B_{j+1;m,i}$. In particular, the wavelet $\psi_{j,i}$ is a spline in $V_{j+1}$ whose support is contained in $[t^{(j+1)}_i, t^{(j+1)}_{i+\nu(\zeta(i))+\mu+m-1}]$.
We can compare the previous result with Remark 5, in the case where precisely one knot is inserted between two adjacent knots of $t_j$. In this particular case, $\nu(k)$ is constant and equal to $m + \mu - 1$. Hence, the number $\nu(\zeta(i)) + \mu = 2\mu + m - 1$ is the same number that we denoted by $n_1$ in Remark 5 of Section 2.1. This is the number of nonzero B-splines in the representation of one function of a pair $(\psi^{(1)}, \psi^{(2)})$ of minimally supported tight frame generators. This shows that there is essentially no difference between the support of wavelets for the shift-invariant setting and for nonuniform knot sequences.
The next theorem shows that approximate duals are essential for the characterization of tight frames even in much more generality. A similar result is given for non-spline frames in [16].
Theorem 7.5.3 Let $1 \le \mu \le m$, let $S_0$ be an spd banded matrix such that $\Phi^B_{0;m} S_0$ is an approximate dual of order $\mu$, and let locally supported families $\Psi_j := \Phi^B_{j+1;m} Q_j$, $j \ge 0$, be defined, where $Q_j = E^B_{j+1;m,\mu} R_j$ with some banded matrix $R_j$. Then the functions $\psi_{j,k}$ define a tight frame of $L^2(I)$, in the sense of (5.9), if and only if there exist spd banded matrices $S_j$, $j \ge 1$, such that the following statements hold:
(i) $\Phi^B_{j;m} S_j$ is an approximate dual of order $\mu$ of the B-spline basis $\Phi^B_{j;m}$;
(ii) $\lim_{j \to \infty} T_j f = \|f\|^2$ for all $f \in L^2(I)$;
(iii) $S_{j+1} - P^B_{j,m} S_j (P^B_{j,m})^T = Q_j Q_j^T$.
We end our discussion by giving several examples that explain the general approach sketched in the results of this section.
7.5.1 Piecewise linear tight frames
We discuss the construction of the wavelets $\psi_{0,k}$, where $t_0 \subset t_1$ are nested knot vectors. The families $\Psi_j$, $j \ge 1$, are constructed analogously. Here we consider piecewise linear B-splines and 2 vanishing moments, hence $m = \mu = 2$. The matrices $S_0$ and $S_1$ in (4.38) are tridiagonal of dimensions $N_0 + 2$ and $2N_0 + 3$, respectively. The diagonal matrices $U_0$ and $U_1$ in (4.39) have diagonal entries u^^^ = l
A’’uV^ =
V*’fc-|-2
and
for $j = 0, 1$. We present an explicit construction for the case of a bounded interval $[a, b]$, where all interior knots are simple and one "new" knot of $t_1 \setminus t_0$ is placed between two adjacent knots of $t_0$; in other words, we assume that /(I)
f(l)
(1) (1) (1) (1) ^^\ in the case the shifts extend over $\mathbb{Z}$. If we restrict our consideration to the interval $[0, N+1]$, as in Section 5.2, the knot vector for cubic splines ($m = 4$) has the form $t_0 = \{0,0,0,0,1,1,2,2,\ldots,N,N,N+1,N+1,N+1,N+1\}$.
The refined knot vectors $t_j$, $j \ge 1$, are obtained by inserting double knots at the midpoints of each knot interval. For $t_1$, for example, we insert double knots at $1/2 + k$, $0 \le k \le N$. The dimension of the spline space $V_0$ is $2N + 4$, that of $V_1$ is $4N + 6$. This setting is comparable to that of Section 3.
Remark 7.5.3 As mentioned in Section 3, no analogue for the construction of tight frames from this type of MRA has yet been developed based on the Fourier-domain approach. Our time-domain construction, however, makes no assumptions concerning the number of generators of the MRA. In fact, the absence of techniques based on any sort of shift-invariance makes our new technique versatile for much more general settings, where multiplicities of knots can vary and adaptive refinements are allowed. Therefore, our new results provide, at least for spline spaces, a unified treatment of several types of MRA. The current example shall serve as a simple illustration.
For the construction of a tight frame for the given MRA, we proceed as sketched out before. The approximate dual $S_0$ of order $\mu = 4$ was already computed in Example 9(b) and Example 11(b). The matrix $Z_0$ in (4.47) has dimension $4N + 2$, is positive definite and has bandwidth 8. Instead of its Cholesky factorization, we now wish to find another factorization that defines interior wavelets that have four vanishing moments, are symmetric or anti-symmetric and are translates of only a small number of generators $\psi^{(i)} \in V_1$ (it turns out 5 generators are enough). At both endpoints of the interval we require several boundary wavelets which also have 4 vanishing moments.
Table 7.5. Coefficients (×1000) of the generators $\psi^{(i)}$ as in expansion (5.12)
i
_g^^^l^^’
^0
^1^’
i’’
4"
4*’
2.526977
0.505395
0.126349
1 0.092642 0.370569 1.852847 0.989527 --0.989527 -1.852847 -0.370569 -0.092642 2 0.126349 0.505395 2.526977 3.156191 3.156191 i
x(») ^3
Ai) ^4
Ai) ^5
^^^’
4"
4’’
3 0.526730 1.601752 0.086252 -0.086252 -1.601752 -0.526730 4 0.580480 2.180883 1.757771 1.757771
2.180883
5
0.869741
0.869741 3.478964 3.478964
0.580480
Figure 7.12. Boundary wavelets of tight frame of cubic splines with double knots and 4 vanishing moments
Similar to the simple knot case, three symmetric reductions
$$Z_1 = (I - K_3)(I - K_2)(I - K_1)\, Z_0\, (I - K_1^T)(I - K_2^T)(I - K_3^T)$$
Table 7.6. Coefficients (×1000) of the 7 boundary wavelets as in expansion (5.12)
I
Ai)
^(0
Ai)
Ai)
M)
M)
M)
M)
1 1.030983 1.417601 0.644364 0.096655 2
1.964342 1.523281 0.719836 0.300617 0.060123 0.015031
3
2.170762 1.104518 0.574380 0.134319 0.038137 0.001519
4
0.909528 3.566099 2.804337 1.352000 0.523422 0.061807
5
0.987016 3.948064 3.102908 1.320567 0.181613
6
0.100948 0.403790 2.018952 1.126572 0.207278
7
2.193554 0.731185
(with tridiagonal matrices $I - K_i$) lead to a matrix $Z_1$ with bandwidth 4. The factorization of $Z_1$ leads to the definition of 7 boundary wavelets at both endpoints and 5 generators $\psi^{(i)} \in V_1$ of the interior wavelets, such that i^^^i’-k),
l 1. When no confusion is likely, we keep the notation we introduced when we defined these various discrete affine systems. Thus, for example, t/;j,fc(x) now denotes the functio n a~^^’^’4){a~^x k), and tpj^ki^) denote s the functio n a’^^"^rp{a~^{x Associated with these discrete systems are the continuous wavelets produced by the groups G and G*, associated with the dilation group D, where Z) = { a ^ : j G Z Z }C GL(1,IR). Then, the system corresponding to the one defined by (1.5) is the collection of functions V’i,6(x) = a-^/2 ’^{a-’x - 6), j ZZ, 6 6 IR. The reproducing property (1.6) is, then.
111 /1 for all /
,c77. ’IR
Z/^(]R). If we use the group G* in this case, then the formula (1.10) reduces to
In this case, D is an abelian group (isomorphic to (ZZ, -h)), and \i is the counting measure. G and G*, however, are not unimodular and dA*(a,6) = a~^ dfj,{j)db. Equality (1.7) has the form (2.3) Y, ma-^)\’’= 1 a.e. i2Z
The factor a~^ in (2.2), that arises from the form of the left Haar measure A*, could be incorporated in the definition of the co-affine system thus giving us a re-normaUzatio n of
k)).
the elements V’^.A:- In fact, the quasi-affine system X{7p)does this for "half" the system: for j > 0, we let i^j^k = a’^^^ T-k D^-j iP, while t/i^.^ = ipj,k if j < 0. In order to clarify the situation, we are now going to study ^he discrete affine systems ^W {’^j,k ’ j^k ^ 2Z}, the discrete quasi-affine systems X(V’) = {i^j,k : j^k 7Z,}, and the discrete co-affine systems X*{tp) = {’’Pj,k’ h^ G ZZ}, where the dilations are integral powers of a > 1 and il) G L^(]R). The following observations will present further evidence for the discovery of Ron and Shen to be of importance. We showed, at the end of the first section, that X*{XIJ) cannot be an orthonormal basis for Z/^(R) when this is the case for X{IIJ). In view of the "equivalence" between the systems X{’4))and X{7p)^and the fact that X(’0) consists of a specific renormalization of "half" the system X*{tp), it is reasonable to inquire if there are renormahzation s of X*(ip) that provide a frame (or even a Bessel system). More precisely, does there exist a real sequence {c^}, j G ZZ, such that {cjt/j*^}^ j^k e 71, is a. frame for L^(R)? That is, are there constants A, B such that 0 < y4 < i? < oo, for which
<EEK/.ciV’;,oi’l^ = w{x). je'Zke'z
We claim that
/ w{x)dx= / 1/(01’ E I’^il’"’ \^i 0, TV G INL More precisely, ^^^^ = a~^^^ T-k D^-j tp if j > TV, and Tpj^k = i^j.k otherwise. Then X^{7p) can be a normalized tight frame for appropriate tp; in this case, X{tp) is such a frame as well, there are, however, ip such that X{7p)is a normalized tight frame and, yet, this fails to be the case for X^{ip). A precise result when N = I and a = 2 that explains this situation is the following fact: Theore m 8.2.2 X^{ip) is a normalized tight frame if and only if
(a )
Y V(2^0V’(2^($4-g)) = 0
a.e. when q is odd
j>0
(Hi) ^ J>o
V(2^0 V^(2^(^ + 7^9)) = 0
a.e. when q is odd.
2
ACKNOWLEDGEMENT
The authors are grateful to Eugenio Hernandez, Hrvoje Sikic and Fernando Soria for several stimulating discussions about these matters.
REFERENCES
[1] M. Bownik, On characterizations of multiwavelets in $L^2(\mathbb{R}^n)$, Proc. Am. Math. Soc. 129 (2001), 3265-3274.
[2] E. Hernandez and G. Weiss, A First Course on Wavelets, CRC Press, Boca Raton, FL, 1996.
[3] R. S. Laugesen, Completeness of orthonormal wavelet systems, for arbitrary real dilation, Appl. Comp. Harmonic Anal., to appear (2001).
[4] A. Ron and Z. Shen, Affine systems in $L_2(\mathbb{R}^d)$: the analysis of the analysis operator, J. Functional Anal. App., 148 (1997), 408-447.
[5] Z. Rzeszotnik, Calderon's condition and wavelets, Collect. Math., to appear (2001).
[6] G. Weiss and E. N. Wilson, The mathematical theory of wavelets, Proceedings of the NATO-AST meeting "Harmonic Analysis 2000 - A Celebration", Kluwer, 2001.
Beyond Wavelets G. V. Welland (Editor) © 2003 Elsevier Science (USA) All rights reserved
SPARSITY VS. STATISTICAL INDEPENDENCE IN ADAPTIVE SIGNAL REPRESENTATIONS: A CASE STUDY OF THE SPIKE PROCESS
BERTRAND BENICHOU AND NAOKI SAITO
Ecole Nationale Superieure des Telecommunications, 46, rue Barrault, 75634 Paris cedex 13, France
benichou@email.enst.fr
Department of Mathematics, University of California, Davis, One Shields Avenue, Davis, CA 95616
saito@math.ucdavis.edu
Abstract
Finding a basis/coordinate system that can efficiently represent an input data stream by viewing them as realizations of a stochastic process is of tremendous importance in many fields including data compression and computational neuroscience. Two popular measures of such efficiency of a basis are sparsity (measured by the expected $\ell^p$ norm, $0 < p \le 1$) and statistical independence (measured by the mutual information). Gaining deeper understanding of their intricate relationship, however, remains elusive. Therefore, we choose to study a simple synthetic stochastic process called the "spike process", which puts a unit impulse at a random location in an otherwise zero vector of length $n$ in each realization. For this process, we prove the following results: 1) The standard basis is the best in terms of sparsity for all $n \ge 2$ among all possible orthonormal bases in $\mathbb{R}^n$ or all possible invertible linear transformations in $\mathbb{R}^n$ with a fixed determinant value; 2) The standard basis is again the best in terms of statistical independence if $n \ge 5$ and the search of basis is restricted within all possible orthonormal bases in $\mathbb{R}^n$; if $2 \le n \le 4$, then the standard basis is not the best orthonormal basis in statistical independence; 3) If we extend our basis search to all possible linear invertible transformations in $\mathbb{R}^n$, then the best basis in statistical independence is not the standard basis for any $n \ge 2$; 4) The best basis in statistical independence is not unique in general, and there even exist those which turn input spikes into completely dense vectors; 5) There is no linear invertible transformation that achieves the true statistical independence for $n > 2$.
9.1 INTRODUCTION
What is a good coordinate system/basis to efficiently represent a given set of images? We view images as realizations of a certain complicated stochastic process whose probability density function (pdf) is not known a priori. Sparsity is important here since this is a measure of how well one can compress the data. A coordinate system producing a few large coefficients and many small coefficients has high sparsity for that data. The sparsity of images relative to a coordinate system is often measured by the expected $\ell^p$ norm of the coefficients where $0 < p \le 1$. Statistical independence is also important since statistically independent coordinates do not interfere with each other (no crosstalk, no error propagation among them). The amount of statistical dependence of input images relative to a coordinate system is often measured by the so-called mutual information, which is a statistical distance between the true pdf and the product of the one-dimensional marginal pdfs.
Neuroscientists have become interested in efficient representations of images, in particular, images of natural scenes such as trees, rivers, mountains, etc., since mammalian visual systems effortlessly reduce the amount of visual input data without losing the essential information contained in them. Therefore, if we can find what type of basis functions are sparsifying the input images or are providing us with the statistically independent representation of the inputs, then that may shed light on the mechanisms of our visual system. Olshausen and Field [18], [19] pioneered such studies using computational experiments emphasizing the sparsity. Immediately after their experiments, Bell and Sejnowski [1] and van Hateren and van der Schaaf [24] conducted similar studies using the statistical independence criterion.
Surprisingly, these results suggest that both sparsity and independence criteria tend to produce basis functions that efficiently capture and represent edges of various scales, orientations, and positions, and that are similar to the receptive field profiles of the neurons in our primary visual cortex. (Note the criticism raised by Donoho and Flesia [9] about the trend of referring to these functions as "Gabor"-like functions; therefore, we just call them "edge-detecting" basis functions in this paper.) However, the relationship between these two criteria has not been understood completely.
These experiments and observations inspired our study in this paper. Our goal here, however, is more modest in that we only study the "spike" process, a simple synthetic stochastic process which puts a unit impulse at a random location in an otherwise zero vector of length $n$ in each realization. It is important to use a simple stochastic process first since we can gain insights and make precise statements in terms of theorems. By these theorems, we now understand the precise conditions under which the sparsity and statistical independence criteria select the same basis for the spike process. In fact, we prove the following facts.
The standard basis is the best in terms of sparsity for all $n \ge 2$ among all possible orthonormal bases in $\mathbb{R}^n$ or all possible invertible linear transformations in $\mathbb{R}^n$ with a fixed determinant value;
The standard basis is again the best in terms of statistical independence if $n \ge 5$ and the search of basis is restricted within all possible orthonormal bases in $\mathbb{R}^n$; if $2 \le n \le 4$, then the standard basis is not the best orthonormal basis in statistical independence;
If we extend our basis search to all possible linear invertible transformations in $\mathbb{R}^n$, then the best basis in statistical independence is not the standard basis for any $n \ge 2$;
The best basis in statistical independence is not unique in general, and there even exist those which turn input spikes into completely dense vectors;
There is no linear invertible transformation that achieves the true statistical independence for $n > 2$.
These results and observations hopefully lead to deeper understanding of the efficient representations of more complicated stochastic processes such as natural scene images. Additionally, a very important by-product of this paper is that this simple process can be used for validating any independent component analysis (ICA) software that uses mutual information as a measure of statistical dependence, and any sparse component analysis (SCA) software that uses the $\ell^p$ norm ($0 < p \le 1$) as a measure of sparsity. Actual outputs of the software can be compared with the true solutions obtained by our theorems. For example, the ICA software using mutual information of the inputs should not converge for the spike process unless there is some constraint on the basis search (e.g., search within all possible orthonormal bases). Considering the recent popularity of such software ([14], [2], [17]), it is a good thing to have such a simple example that can be generated and tested easily on computers.
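To make the remark about generating the example on a computer concrete, the spike process can be simulated in a few lines. The following is a minimal sketch; the function name `spike_process`, its parameters, and the choice of a uniformly distributed spike location are assumptions made for illustration and are not taken from the authors' software.

```python
import numpy as np

def spike_process(n, num_samples, seed=0):
    """Return an array of shape (num_samples, n); each row is one realization:
    a zero vector of length n with a unit impulse at a random location."""
    rng = np.random.default_rng(seed)
    positions = rng.integers(0, n, size=num_samples)   # assumed uniform on {0, ..., n-1}
    X = np.zeros((num_samples, n))
    X[np.arange(num_samples), positions] = 1.0
    return X

X = spike_process(n=6, num_samples=10)
```

Each row of `X` can then be fed to an ICA or SCA routine as one training sample.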
Our investigations of other stochastic processes in terms of sparsity and statistical independence, such as the "generalized spike process" (which puts an impulse whose amplitude is sampled randomly from the standard normal distribution $N(0,1)$ in each realization) and the "ramp" process (another simple yet important stochastic process), can be found in Saito [22] and Saito et al. [23], respectively. The latter also contains our numerical experiments on natural scene images.
The organization of this paper is as follows. The next section specifies notation and terminology. Section 3 defines how to quantitatively measure the sparsity and statistical dependence of a stochastic process relative to a given basis. Using a very simple example, Section 4 demonstrates that sparsity and statistical independence are two clearly different concepts. Section 5 presents our main results. We prove these theorems in Section 6 and the Appendices. Finally, we discuss the implications of our results and further research directions in Section 7.
9.2 NOTATION AND TERMINOLOGY
Let us first set our notation and the terminology of basis dictionaries and best bases. Let $X \in \mathbb{R}^n$ be a random vector with some unknown pdf $f_X$. Let us assume that the available data $T = \{x_1, \ldots, x_N\}$ were independently generated from this probability model. The set $T$ is often called the training dataset. Let $B = (w_1, \ldots, w_n) \in O(n)$ (the group of orthonormal transformations in $\mathbb{R}^n$) or $SL^{\pm}(n, \mathbb{R})$ (the group of invertible volume-preserving transformations in $\mathbb{R}^n$, i.e., their determinants are $\pm 1$). The best-basis paradigm [4], [26], [20] is to find a basis $B$ or a subset of basis vectors such that the features (expansion coefficients) $Y = B^{-1} X$ are useful for the problem at hand (e.g., compression, modeling, discrimination, regression, segmentation) in a computationally fast manner. Let $C(B \mid T)$ be a numerical measure of deficiency or cost of the basis $B$ given the training dataset $T$ for the problem at hand.
For very high-dimensional
problems, we often restrict our search within the basis dictionary $D \subset SL^{\pm}(n, \mathbb{R})$, such as the orthonormal or biorthogonal wavelet packet dictionaries or local cosine or Fourier dictionaries, where we never need to compute the full matrix-vector product or the matrix inverse for analysis and synthesis. Under this setting, $B^\star = \arg\min_{B \in D} C(B \mid T)$ is called the best basis relative to the cost $C$ and the training dataset $T$. We also note that log in this paper means $\log_2$, unless stated otherwise. The $n \times n$ identity matrix is denoted by $I$, and the $n \times 1$ column vector whose entries are all ones, i.e., $(1, 1, \ldots, 1)^T$, is denoted by $\mathbf{1}_n$.
9.3 SPARSITY VS. STATISTICAL INDEPENDENCE
The concept of sparsity and that of statistical independence are intrinsically different. Sparsity emphasizes the issue of compression directly, whereas statistical independence concerns the relationship among the coordinates. Yet, for certain stochastic processes, these two are intimately related and are often confused. For example, Olshausen and Field [18], [19] emphasized the sparsity as the basis selection criterion, but they also assumed the statistical independence of the coordinates. Bell and Sejnowski [1] used the statistical independence criterion and obtained basis functions similar to those of Olshausen and Field. They claimed that they did not impose the sparsity explicitly and that such sparsity emerged by minimizing the statistical dependence among the coordinates. These observations motivated us to study these two criteria. First let us define the measure of sparsity and that of statistical independence in our context.
9.3.1 Sparsity
Sparsity is a key property for compression. The true sparsity measure for a given vector $x \in \mathbb{R}^n$ is the so-called $\ell^0$ quasi-norm, which is defined as $\|x\|_0 = \#\{i \in [1, n] : x_i \ne 0\}$, i.e., the number of nonzero components in $x$. This measure is, however, very unstable for even small geometric perturbations of the components in a vector.
Therefore, a better measure is the $\ell^p$ norm:
$$\|x\|_p = \Bigl( \sum_{i=1}^{n} |x_i|^p \Bigr)^{1/p}, \qquad p > 0.$$
In fact, this is a quasi-norm for $0 < p < 1$ since it does not satisfy the triangle inequality, but only the weaker conditions $\|x + y\|_p \le 2^{-1/p'} (\|x\|_p + \|y\|_p)$, where $p' = p/(p-1)$ is the conjugate exponent of $p$, and $\|x + y\|_p^p \le \|x\|_p^p + \|y\|_p^p$. It is easy to show that $\lim_{p \downarrow 0} \|x\|_p^p = \|x\|_0$. See [6], [7], [8] for the details of the $\ell^p$ norm properties. Thus, we use the expected $\ell^p$ norm minimization as a criterion to find the best basis for a given stochastic process in terms of sparsity:
$$C_p(B \mid X) = E \|B^{-1} X\|_p^p. \tag{3.1}$$
The sample estimate of this cost given the training dataset T is
$$C_p(B \mid T) = \frac{1}{N} \sum_{k=1}^{N} \|y_k\|_p^p = \frac{1}{N} \sum_{k=1}^{N} \sum_{i=1}^{n} |y_{i,k}|^p, \tag{3.2}$$
where $y_k = (y_{1,k}, \ldots, y_{n,k})^T = B^{-1} x_k$ and $x_k$ is the $k$th sample (or realization) in $T$. We propose to minimize this cost in order to select the best sparsifying basis (BSB):
$$B_p = B_p(T, \mathcal{D}) = \arg\min_{B \in \mathcal{D}} C_p(B \mid T).$$
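The sample cost (3.2) is simple enough to sketch in a few lines. The following is only an illustration, not the authors' implementation: the function name `lp_cost` and the toy coefficient vectors are hypothetical, and the coefficients $y_k = B^{-1} x_k$ are assumed to have been computed beforehand by whatever fast transform the dictionary provides.

```python
def lp_cost(coeffs, p):
    """Sample estimate (3.2): average over k of sum_i |y_{i,k}|^p.

    coeffs: list of coefficient vectors y_k = B^{-1} x_k (hypothetical input,
            assumed to be computed by the caller).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    total = 0.0
    for y in coeffs:
        total += sum(abs(v) ** p for v in y)
    return total / len(coeffs)


# Toy data (illustrative only): two signals with the same l^2 energy,
# expanded in two hypothetical orthonormal bases.
sparse_coeffs = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, -2.0, 0.0]]
spread_coeffs = [[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 1.0, -1.0]]

cost_sparse = lp_cost(sparse_coeffs, p=1.0)
cost_spread = lp_cost(spread_coeffs, p=1.0)
```

Under this criterion the basis producing `sparse_coeffs` would be preferred, since its cost is lower even though both expansions carry the same energy.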
Remark 9.3.1 It should be noted that the minimization of the $\ell^p$ norm can also be carried out for each realization. Without taking the average over $k$ in (3.2), one can select the BSB $B_p = B_p(\{x_k\}, \mathcal{D})$ for each realization $x_k \in T$. We can guarantee that $\min C_p(B \mid \{x_k\}) \le \min C_p(B \mid T) \le \max C_p(B \mid \{x_k\})$. For highly variable or erratic stochastic processes, however, $B_p(\{x_k\}, \mathcal{D})$ may change significantly for each $k$. Thus, if we adopt this strategy to compress an entire training dataset consisting of $N$ realizations, we need to store additional information in order to describe a set of $N$ bases. Whether we should adapt a basis per realization or on the average is still an open issue. See Saito et al. [23] for more details.
9.3.2 Statistical Independence
The statistical independence of the coordinates of $Y \in \mathbb{R}^n$ means
$$f_Y(y) = f_{Y_1}(y_1) f_{Y_2}(y_2) \cdots f_{Y_n}(y_n),$$
where $f_{Y_k}(y_k)$ is a one-dimensional marginal pdf of $f_Y$. Statistical independence is a key property for compressing and modeling a stochastic process because: 1) an $n$-dimensional stochastic process of interest can be modeled as a set of one-dimensional processes; and 2) damage to one coordinate does not propagate to the others. Of course, in general, it is difficult to find a truly statistically independent coordinate system for a given stochastic process. Such a coordinate system may not even exist for a given stochastic process. Therefore, the next best thing we can do is to find the least statistically-dependent coordinate system within a basis dictionary. Naturally, then, we need to measure the "closeness" of a coordinate system (or random variables) $Y_1, \ldots, Y_n$ to statistical independence. This can be measured by the mutual information, or relative entropy, between the true pdf $f_Y$ and the product of its marginal pdfs:
$$I(Y) = \int f_Y(y) \log \frac{f_Y(y)}{\prod_{i=1}^{n} f_{Y_i}(y_i)} \, dy = -H(Y) + \sum_{i=1}^{n} H(Y_i),$$
where $H(Y)$ and $H(Y_i)$ are the differential entropies of $Y$ and $Y_i$, respectively:
$$H(Y) = -\int f_Y(y) \log f_Y(y) \, dy, \qquad H(Y_i) = -\int f_{Y_i}(y_i) \log f_{Y_i}(y_i) \, dy_i.$$
We note that $I(Y) \ge 0$, and $I(Y) = 0$ if and only if the components of $Y$ are mutually independent. See [5] for more details of the mutual information. Suppose $Y = B^{-1} X$ and $B \in \mathrm{GL}(n, \mathbb{R})$ with $\det(B) = \pm 1$. We denote this set of matrices by $\mathrm{SL}^{\pm}(n, \mathbb{R})$. Note that the usual $\mathrm{SL}(n, \mathbb{R})$ is a subset of $\mathrm{SL}^{\pm}(n, \mathbb{R})$. Then, we have
$$I(Y) = -H(Y) + \sum_{i=1}^{n} H(Y_i) = -H(X) + \sum_{i=1}^{n} H(Y_i),$$
since the differential entropy is invariant under such an invertible volume-preserving linear transformation, i.e.,
$$H(B^{-1} X) = H(X) + \log |\det(B^{-1})| = H(X),$$
because $|\det(B^{-1})| = 1$. Based on this fact, we proposed the minimization of the following cost function as the criterion to select the so-called least statistically-dependent basis (LSDB) in [21]:
$$C_H(B \mid X) = \sum_{i=1}^{n} H\bigl( (B^{-1} X)_i \bigr) = \sum_{i=1}^{n} H(Y_i). \tag{3.3}$$
The sample estimate of this cost given the training dataset $T$ is
$$C_H(B \mid T) = -\frac{1}{N} \sum_{i=1}^{n} \sum_{k=1}^{N} \log \hat{f}_{Y_i}(y_{i,k}),$$
where $\hat{f}_{Y_i}(y_{i,k})$ is an empirical pdf of the coordinate $Y_i$, which must be estimated by an algorithm such as the histogram-based estimator with optimal bin-width search of Hall and Morton [11]. Now, we can define the LSDB as
$$B_{\mathrm{LSDB}} = B_{\mathrm{LSDB}}(T, \mathcal{D}) = \arg\min_{B \in \mathcal{D}} C_H(B \mid T). \tag{3.4}$$
We note that the differences between this strategy and the standard independent component analysis (ICA) algorithms are: 1) restriction of the search to the basis dictionary $\mathcal{D}$; and 2) approximation of the coordinate-wise entropy. For more details, we refer the reader to [21] for the former and [3] for the latter. We now demonstrate, using a simple example, that sparsity and statistical independence are two intrinsically different concepts.
9.4 TWO-DIMENSIONAL COUNTEREXAMPLE
Let us consider a simple process $X = (X_1, X_2)^T$, where $X_1$ and $X_2$ are independently and identically distributed as the uniform random variable on the interval $[-1, 1]$. Thus, the realizations of this process are distributed as in the right-hand side of Figure 9.1. Let us consider all possible rotations around the origin as a basis dictionary, i.e., $\mathcal{D} = \mathrm{SO}(2, \mathbb{R}) \subset \mathrm{O}(2)$. Then, the sparsity and independence criteria select completely different bases, as shown in Figure 9.1. Note that the data points under the BSB coordinates (45 degree rotation) concentrate more around the origin than under the LSDB coordinates (no rotation), and this rotation makes the data representation sparser. This example clearly demonstrates that the BSB and the LSDB are different in general. One can also generalize this example to higher dimensions.
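The sparsity half of this counterexample can be checked numerically. The sketch below is an illustration only: it assumes $p = 1$, replaces the expectation by a midpoint-rule average over a grid on $[-1, 1]^2$, and uses the hypothetical helper name `rotated_lp_cost`. For $p = 1$ the unrotated cost is $E|X_1| + E|X_2| = 1$, while a short calculation gives $2\sqrt{2}/3 \approx 0.943$ after a 45 degree rotation.

```python
import math


def rotated_lp_cost(theta, p=1.0, m=200):
    """Approximate E|Y1|^p + E|Y2|^p for Y = R(theta)^T X, with X uniform on
    [-1, 1]^2, using an m-by-m midpoint grid (an assumption of this sketch)."""
    c, s = math.cos(theta), math.sin(theta)
    h = 2.0 / m
    total = 0.0
    for a in range(m):
        x1 = -1.0 + (a + 0.5) * h
        for b in range(m):
            x2 = -1.0 + (b + 0.5) * h
            y1 = c * x1 + s * x2      # first rotated coordinate
            y2 = -s * x1 + c * x2     # second rotated coordinate
            total += abs(y1) ** p + abs(y2) ** p
    return total / (m * m)


cost_0 = rotated_lp_cost(0.0)             # coordinates preferred by independence
cost_45 = rotated_lp_cost(math.pi / 4)    # coordinates preferred by sparsity
```

The 45 degree coordinates give the smaller $\ell^1$ cost, even though the unrotated coordinates are the statistically independent ones.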
9.5 THE SPIKE PROCESS
An $n$-dimensional spike process simply generates the standard basis vectors $\{e_j\}_{j=1}^{n} \subset \mathbb{R}^n$ in a random order, where $e_j$ has one at the $j$th entry and all the other entries are zero. One can view this process as a unit impulse located at a random position between 1 and $n$, as shown in Figure 9.2.
[Figure 9.1: two scatter plots of the realizations, with horizontal axis $X_1$ ranging from -1.5 to 1.5. Left panel: "Preferred by Sparsity". Right panel: "Preferred by Independence".]
Figure 9.1. Sparsity and statistical independence prefer different coordinates
Figure 9.2. Ten realizations of the spike process (n = 256)
9.5.1 The Karhunen-Loeve Basis
Let us first consider the Karhunen-Loeve basis of this process, from which we can learn a few things.
Proposition 9.5.1 The Karhunen-Loeve basis for the spike process is any orthonormal basis in $\mathbb{R}^n$ containing the "DC" vector $\mathbf{1}_n = (1, 1, \ldots, 1)^T$.
This means that the KLB is not useful for this process. This is because the spike process is highly non-Gaussian.
9.5.2 The Best Sparsifying Basis
It seems obvious that the standard basis is the BSB among $\mathrm{O}(n)$ by construction; an expansion of a realization of this process into any other basis simply increases the number of nonzero coefficients. However, we still need to verify this. In fact, we have the following theorem.
Theorem 9.5.1 The BSB for the spike process is the standard basis if $\mathcal{D} = \mathrm{O}(n)$ or $\mathrm{SL}^{\pm}(n, \mathbb{R})$.
Remark 9.5.1 It is not meaningful to consider the group $\mathrm{GL}(n, \mathbb{R})$ as a basis dictionary to find the BSB, since one can always find an invertible matrix $B$ whose inverse $B^{-1}$ consists of infinitesimally small entries so that the cost $C_p(B \mid X) = E \|B^{-1} X\|_p^p$ is close to zero. However, we can consider the subset $\mathrm{GL}_a(n, \mathbb{R}) \subset \mathrm{GL}(n, \mathbb{R})$, which consists of all invertible matrices whose determinant is $a > 0$. (Note that this set $\mathrm{GL}_a(n, \mathbb{R})$ is generally not a group since it does not contain the inverse matrices of its members. Of course, $\mathrm{GL}_1(n, \mathbb{R}) = \mathrm{SL}(n, \mathbb{R})$ is a special case.) The following corollary is a minor modification of Theorem 9.5.1, obtained by recognizing that $\mathrm{GL}_a(n, \mathbb{R}) = a^{1/n}\, \mathrm{SL}(n, \mathbb{R})$:
Corollary 9.5.2 If $\mathcal{D} = \mathrm{GL}_a(n, \mathbb{R})$ with $a > 0$, then the BSB must be the scalar multiple of the identity matrix, $a^{1/n} I_n$.
Remark 9.5.2 Note that when we say the basis is a matrix such as $a^{1/n} I_n$, we really mean that the column vectors of that matrix form the basis. This also means that any permuted and/or sign-flipped (i.e., multiplied by $-1$) versions of those column vectors also form the basis. Therefore, when we say the basis is a matrix $A$, we mean not only $A$ but also its permuted and sign-flipped versions. This remark also applies to all the propositions, lemmas, and theorems below, unless stated otherwise.
9.5.3 Statistical Dependence and Entropy of the Spike Process
Before considering the LSDB of this process, let us note a few specifics about the spike process. First, although the standard basis is the BSB for this process, it clearly does not provide statistically independent coordinates. The existence of a single spike at one location prohibits spike generation at other locations. This implies that these coordinates are highly statistically dependent. Second, unlike for other, more complicated stochastic processes, we can compute the true entropy $H(X)$ for the spike process. Since the spike process selects one possible vector from the standard basis of $\mathbb{R}^n$ with uniform probability $1/n$, the true entropy $H(X)$ is clearly $\log n$. This is one of the rare cases where we know the true high-dimensional entropy of the process.
9.5.4 The LSDB among O(n)
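Let us now consider $\mathrm{O}(n)$, the set of all possible orthonormal bases in $\mathbb{R}^n$, as our basis dictionary. Then, we have the following theorem.
Theorem 9.5.3 The LSDB among $\mathrm{O}(n)$ is the following:
for $n \ge 5$, either the standard basis or the basis whose matrix representation is
$$B_{\mathrm{HR}(n)} = \frac{1}{n} \begin{bmatrix} n-2 & -2 & \cdots & -2 \\ -2 & n-2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -2 \\ -2 & \cdots & -2 & n-2 \end{bmatrix}; \tag{5.1}$$
for $n = 4$, the Walsh basis, i.e.,
$$\frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix};$$
for $n = 3$,
$$\begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \end{bmatrix};$$
and for $n = 2$,
$$\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},$$
and this is the only case where the true independence is achieved.
Remark 9.5.3 There is an important geometric interpretation of (5.1). This matrix can also be written as
$$B_{\mathrm{HR}(n)} = I_n - 2\, \frac{\mathbf{1}_n}{\sqrt{n}} \frac{\mathbf{1}_n^T}{\sqrt{n}}.$$
In other words, this matrix represents the Householder reflection with respect to the hyperplane $\{ y \in \mathbb{R}^n \mid \sum_{i=1}^{n} y_i = 0 \}$, whose unit normal vector is $\mathbf{1}_n / \sqrt{n}$.
Below, we use the notation $B_{\mathrm{O}(n)}$ for the LSDB among $\mathrm{O}(n)$ to distinguish it from the LSDB among $\mathrm{GL}(n, \mathbb{R})$, which is denoted by $B_{\mathrm{GL}(n)}$. So, for example, for $n \ge 5$, $B_{\mathrm{O}(n)} = I_n$ or $B_{\mathrm{HR}(n)}$.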
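The Householder form of (5.1) is easy to verify for a concrete size. The sketch below is illustrative only: the helper names `householder_basis` and `gram` and the choice $n = 6$ are assumptions. It builds $I_n - 2\,\mathbf{1}_n \mathbf{1}_n^T / n$, checks orthonormality, checks that $\mathbf{1}_n$ is mapped to $-\mathbf{1}_n$, and counts the distinct values in each row.

```python
from collections import Counter


def householder_basis(n):
    """B_HR(n) = I_n - 2 * (1_n 1_n^T) / n, i.e. the matrix in (5.1)."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


def gram(B):
    """Return B^T B as a list of lists."""
    n = len(B)
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 6  # illustrative size
B = householder_basis(n)
G = gram(B)
is_orthonormal = all(abs(G[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                     for i in range(n) for j in range(n))

# A reflection across the hyperplane sum(y) = 0 sends 1_n to -1_n.
image_of_ones = [sum(row) for row in B]

# Each row has exactly two distinct values, occurring 1 and n - 1 times.
row_value_counts = [sorted(Counter(row).values()) for row in B]
```

The last check anticipates why this basis ties with the standard basis in coordinate-wise entropy: every row has one entry equal to $(n-2)/n$ and $n-1$ entries equal to $-2/n$.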
9.5.5 The LSDB among GL(n,R)
Before discussing the LSDB among a larger dictionary of bases, let us remark on an important property of discrete stochastic processes. Let $X$ be a random vector obeying a discrete stochastic process with a probability mass function (pmf) $f_X$. This means that there are only a finite number of possible values
(or states) $X$ can take. Clearly the spike process is a discrete process since the only possible values are $\{e_1, \ldots, e_n\}$, the standard basis vectors. Then, for any invertible transformation $B \in \mathrm{GL}(n, \mathbb{R})$ with $Y = B^{-1} X$, be it orthonormal or not, the total entropy of the process before and after the transformation is exactly the same. Indeed, in the definition of the discrete Shannon entropy, $-\sum_j p_j \log p_j$, the values that the random variable takes are of no importance; only the number of possible values the random variable can take and its pmf matter. In our case, it is clear that the events $\{X = a_i\}$ and $\{Y = b_i\}$, where $b_i = B^{-1} a_i$, are equivalent; otherwise the transformation would not be invertible. This implies that the corresponding probabilities are equal: $\Pr\{X = a_i\} = \Pr\{Y = b_i\}$. Therefore, considering the expression of the discrete Shannon entropy, this proves that
$$H(Y) = H(X),$$
as long as the transformation matrix belongs to $\mathrm{GL}(n, \mathbb{R})$. Note that for the continuous case, this is only true if $B \in \mathrm{SL}^{\pm}(n, \mathbb{R})$. Therefore, for a discrete stochastic process like the spike process, the LSDB among $\mathrm{GL}(n, \mathbb{R})$ can be selected by just minimizing the sum of the coordinate-wise entropy as in (3.4), as if $\mathcal{D} = \mathrm{SL}^{\pm}(n, \mathbb{R})$. In other words, there is no important distinction between the LSDB selection from $\mathrm{GL}(n, \mathbb{R})$ and that from $\mathrm{SL}^{\pm}(n, \mathbb{R})$ for discrete stochastic processes. Therefore, we do not have to treat these two cases separately. Note that the case of the BSB is a different story, as we already mentioned in Remark 9.5.1. Now we have the following theorem:
Theorem 9.5.4 The LSDB among $\mathrm{GL}(n, \mathbb{R})$ with $n > 2$ is the following basis pair (for analysis and synthesis respectively):
$$B_{\mathrm{GL}(n)}^{-1} = \begin{bmatrix} a & a & \cdots & \cdots & a \\ b_2 & c_2 & b_2 & \cdots & b_2 \\ b_3 & b_3 & c_3 & \cdots & b_3 \\ \vdots & & \ddots & \ddots & \vdots \\ b_{n-1} & \cdots & b_{n-1} & c_{n-1} & b_{n-1} \\ b_n & \cdots & \cdots & b_n & c_n \end{bmatrix}, \tag{5.2}$$
[...] as $n \to \infty$. More precisely, we have the following proposition.
Proposition 9.5.2
$$\lim_{n \to \infty} C_p(B_{\mathrm{HR}(n)} \mid X) = \infty \quad \text{if } 0 \le p < 1; \ [\ldots]$$
[...] $n$ for $0 \le p \le 1$, and the equality holds if and only if $p = 0$. Yet this is still the LSDB.
Finally, from Theorems 9.5.3 and 9.5.4, we can prove the following corollary:
Corollary 9.5.5 There is no invertible linear transformation providing the statistically independent coordinates for the spike process for $n > 2$. In fact, the mutual information $I(B_{\mathrm{O}(n)}^{-1} X)$ and $I(B_{\mathrm{GL}(n)}^{-1} X)$ are monotonically increasing as a function of $n$, and both approach $\log e \approx 1.4427$ as $n \to \infty$.
Remark 9.5.5 Although the spike process is very simple, we have the following interpretation. Consider a stochastic process generating, one at a time, a basis vector randomly selected from some orthonormal basis. Then, that basis itself is both the BSB and the LSDB among $\mathrm{O}(n)$. Theorem 9.5.3 claims that once we transform the data to the spikes, one cannot do any better than that, both in sparsity and independence, within $\mathrm{O}(n)$ with $n \ge 5$. Of course, if one extends the search to nonlinear transformations, then it becomes a different story. We refer the reader to our recent articles, Lin et al. [15], [16], for the details of a nonlinear algorithm.
9.6 PROOFS OF PROPOSITIONS AND THEOREMS
9.6.1 Proof of Proposition 9.5.1
Let $X = (X_1, X_2, \ldots, X_n)^T$ be a random vector generated by this process. For each of its realizations, a randomly chosen coordinate among these $n$ positions takes the value 1, while the others take the value 0. Hence each $X_i$, $i = 1, \ldots, n$, takes the value 1 with probability $1/n$ and the value 0 with probability $1 - 1/n$. Let us calculate the covariance of these variables. First, we have:
$$E(X_i) = \frac{1}{n} \times 1 + \Bigl( 1 - \frac{1}{n} \Bigr) \times 0 = \frac{1}{n} \quad \text{for } i = 1, \ldots, n,$$
$$E(X_i X_j) = \begin{cases} E(X_i^2) = E(X_i) & \text{if } i = j; \\ 0 & \text{if } i \neq j, \end{cases}$$
since one of these two variables will always take the value 0. Let $R = (R_{ij})$ be the covariance matrix of this process. Then, we have:
$$R_{ij} = E(X_i X_j) - E(X_i) E(X_j) = \frac{1}{n} \delta_{ij} - \frac{1}{n^2}.$$
We know that a basis is a Karhunen-Loeve basis if and only if it is orthonormal and diagonalizes the covariance matrix. Thus, we will now calculate the eigenvalue decomposition of the covariance matrix $R = \frac{1}{n} I_n - \frac{1}{n^2} \mathbf{1}_n \mathbf{1}_n^T$. We now need to calculate the determinant:
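$$P_R(\lambda) = \det(\lambda I_n - R) = \begin{vmatrix} \lambda - \frac{1}{n} + \frac{1}{n^2} & \frac{1}{n^2} & \cdots & \frac{1}{n^2} \\ \frac{1}{n^2} & \lambda - \frac{1}{n} + \frac{1}{n^2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \frac{1}{n^2} \\ \frac{1}{n^2} & \cdots & \frac{1}{n^2} & \lambda - \frac{1}{n} + \frac{1}{n^2} \end{vmatrix},$$
which is of the generic form
$$\Delta(a, b) = \begin{vmatrix} a + b & b & \cdots & b \\ b & a + b & \ddots & \vdots \\ \vdots & \ddots & \ddots & b \\ b & \cdots & b & a + b \end{vmatrix}$$
with the values $a = \lambda - 1/n$ and $b = 1/n^2$. We can easily evaluate this determinant by subtracting the last row from all the others, followed by adding the first $n - 1$ columns to the last column:
$$\Delta(a, b) = \begin{vmatrix} a & 0 & \cdots & 0 & 0 \\ 0 & a & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & 0 \\ 0 & \cdots & 0 & a & 0 \\ b & \cdots & \cdots & b & a + nb \end{vmatrix} = a^{n-1} (a + nb). \tag{6.1}$$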
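Putting $a = \lambda - 1/n$ and $b = 1/n^2$, we have the characteristic polynomial $P_R$ of $R$ as $P_R(\lambda) = \lambda (\lambda - 1/n)^{n-1}$. Hence, the eigenvalues of $R$ are $\lambda = 0$ or $1/n$. It is now obvious that the vector $\mathbf{1}_n$ is an eigenvector for $R$ associated with the eigenvalue 0, i.e., $\mathbf{1}_n \in \ker R$. Indeed, we have
$$R \mathbf{1}_n = \Bigl( \frac{1}{n} I_n - \frac{1}{n^2} \mathbf{1}_n \mathbf{1}_n^T \Bigr) \mathbf{1}_n = \frac{1}{n} \mathbf{1}_n - \frac{1}{n^2}\, n\, \mathbf{1}_n = 0.$$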
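The covariance structure used in this proof can be confirmed by direct enumeration. The following sketch is illustrative: the helper names and the size $n = 8$ are assumptions. It averages over the $n$ equally likely realizations $e_1, \ldots, e_n$ and then checks the two eigenvalue relations.

```python
def spike_covariance(n):
    """Exact covariance of the spike process, obtained by averaging over its
    n equally likely realizations e_1, ..., e_n."""
    mean = [1.0 / n] * n
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):                                   # realization e_k
        e = [1.0 if j == k else 0.0 for j in range(n)]
        for i in range(n):
            for j in range(n):
                R[i][j] += (e[i] - mean[i]) * (e[j] - mean[j]) / n
    return R


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


n = 8  # illustrative size
R = spike_covariance(n)

ones = [1.0] * n
R_ones = matvec(R, ones)               # eigenvalue 0: expected to vanish

v = [1.0, -1.0] + [0.0] * (n - 2)      # a vector orthogonal to 1_n
R_v = matvec(R, v)                     # eigenvalue 1/n: expected to equal v / n
```

Any vector whose entries sum to zero could replace `v` here; this is why the Karhunen-Loeve basis is determined only up to the choice of an orthonormal basis of that hyperplane.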
Since $\dim \ker R = 1$, $\ker R$ is the one-dimensional subspace spanned by $\mathbf{1}_n$. Considering that $R$ is symmetric and only has two distinct eigenvalues, we know that the eigenspace associated to the eigenvalue $1/n$ is the orthogonal complement of $\ker R$, namely the hyperplane $\{ y \in \mathbb{R}^n \mid \sum_{i=1}^{n} y_i = 0 \}$. Therefore, the orthogonal bases that diagonalize $R$ are the bases formed by the adjunction of $\mathbf{1}_n$ to any orthogonal basis of $(\ker R)^{\perp}$. The Walsh basis, which consists of oscillating square waves, is such a basis, although it is just one among many. $\Box$
9.6.2 Proof of Theorem 9.5.1
We first prove the case $\mathcal{D} = \mathrm{SL}^{\pm}(n, \mathbb{R})$. Then, the case of $\mathcal{D} = \mathrm{O}(n)$ is automatic since this is just a special case of $\mathrm{SL}^{\pm}(n, \mathbb{R})$. Let $B$ be any matrix in $\mathrm{SL}^{\pm}(n, \mathbb{R})$, and let $b_j$ be its $j$th column vector. Let us first write the cost function (3.1) for the spike process in terms of the matrix elements of $B$:
$$C_p(B \mid X) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} |b_{ij}|^p = \frac{1}{n} \sum_{j=1}^{n} \|b_j\|_p^p. \tag{6.2}$$
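It is a well-known fact [13, p. 112] that for any nonsingular real-valued matrix $B$ (i.e., $B \in \mathrm{GL}(n, \mathbb{R})$), there exists a unique QR factorization
$$B = QR, \tag{6.3}$$
where $Q \in \mathrm{O}(n)$ and $R$ is an $n \times n$ upper triangular matrix
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & r_{nn} \end{bmatrix} \tag{6.4}$$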
with $r_{jj} > 0$, $j = 1, \ldots, n$. Furthermore, since $\det(B) = \pm 1$, we can assume $\prod_{j=1}^{n} r_{jj} = 1$. Let $q_j$ be the $j$th column vector of $Q$. Then, from (6.3) and (6.4), we have
$$b_j = r_{1j} q_1 + \cdots + r_{jj} q_j, \qquad j = 1, \ldots, n.$$
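Now, the cost function (6.2) can be written as:
$$n\, C_p(B \mid X) = \sum_{j=1}^{n} \|b_j\|_p^p = \|r_{11} q_1\|_p^p + \|r_{12} q_1 + r_{22} q_2\|_p^p + \cdots + \|r_{1n} q_1 + \cdots + r_{nn} q_n\|_p^p. \tag{6.5}$$
This is bounded from below as
$$n\, C_p(B \mid X) \ge \|r_{11} q_1\|_p^p + (r_{12}^2 + r_{22}^2)^{p/2} + \cdots + \|r_{1n} q_1 + \cdots + r_{nn} q_n\|_p^p, \tag{6.6}$$
since the vector $r_{12} q_1 + r_{22} q_2$ has the $\ell^2$ length $(r_{12}^2 + r_{22}^2)^{1/2}$, and among all vectors of that length in $\mathbb{R}^n$, the minimum $\ell^p$ norm is attained if $r_{12} q_1 + r_{22} q_2 = \pm (r_{12}^2 + r_{22}^2)^{1/2} e_k$ for some $k \in \{1, \ldots, n\}$, i.e., if it is aligned along one of the standard basis vectors. We can repeat this argument, and finally we have:
$$n\, C_p(B \mid X) \ge r_{11}^p + (r_{12}^2 + r_{22}^2)^{p/2} + \cdots + (r_{1n}^2 + \cdots + r_{nn}^2)^{p/2}.$$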
Let $g(R)$ denote the right-hand side of this inequality. Since all the diagonal elements of $R$ are positive, this is further bounded from below by
$$g(R) \ge r_{11}^p + r_{22}^p + \cdots + r_{nn}^p,$$
by setting all the nondiagonal elements of $R$ to zero. This is again bounded from below by $n\, r_{kk}^p$, where $r_{kk} = \min(r_{11}, \ldots, r_{nn})$. Combining this with the fact that $\prod_{j=1}^{n} r_{jj} = 1$ and $r_{jj} \in \mathbb{R}$, we must have $r_{jj} = 1$ for $j = 1, \ldots, n$, i.e., $R = I_n$ as the minimizer of the function $g(R)$. That is, $\min g(R) = g(I_n) = n$. Coming back to the matrix $B = QR$, the minimizer of $C_p(B \mid X)$ must satisfy $B = Q$, and furthermore
$$n\, C_p(Q \mid X) = \|q_1\|_p^p + \cdots + \|q_n\|_p^p \ge n,$$
where the equality holds if and only if $Q$ is a permutation matrix or the sign-flipped version of such a matrix, by the same argument on the minimization of the $\ell^p$ norm of an $\ell^2$-unit vector. This implies that $B$ must be the identity matrix modulo permutations and sign flips. $\Box$
9.6.3 Coordinate-wise Entropy of the Spike Process
Before proceeding to the proof of Theorems 9.5.3 and 9.5.4, let us consider the coordinate-wise entropy of the spike process and define some convenient quantities for characterizing a basis in $\mathrm{O}(n)$ or $\mathrm{GL}(n, \mathbb{R})$. Let us consider an invertible matrix $U = (u_{ij})_{i,j=1,\ldots,n} = B^{-1} \in \mathrm{GL}(n, \mathbb{R})$ and the vector $Y = UX$. Let us consider the $i$th coordinate of $Y$, $Y_i = \sum_{j=1}^{n} u_{ij} X_j$. For each realization of the spike process $X$, $Y_i$ takes one of the values $\{u_{ij}, j = 1, \ldots, n\}$. More precisely, we have $\Pr\{X_j = 1\} = 1/n$ and $\Pr\{X_j = 0\} = 1 - 1/n$ for $j = 1, \ldots, n$. Thus, if all $\{u_{ij}, j = 1, \ldots, n\}$ were distinct, $Y_i$ would take these values with a uniform pmf. But there is no particular reason to think that $\{u_{ij}, j = 1, \ldots, n\}$ are mutually distinct. Therefore, we shall group these values in "classes" of equality. Let us introduce, for each $i \in \{1, \ldots, n\}$, an integer $k(i)$ equal to the number of distinct values in the $i$th row vector $\{u_{ij}, j = 1, \ldots, n\}$, and the vector $c(i) = (\alpha_1(i), \ldots, \alpha_{k(i)}(i)) \in \mathbb{N}^{k(i)}$, where each component counts the number of occurrences of each distinct value in the $i$th row vector. We will call $k(i)$ the class of the $i$th row and $c(i)$ the index of that row. Clearly, we have $1 \le k(i) \le n$ and
$$\sum_{\ell=1}^{k(i)} \alpha_\ell(i) = n.$$
For example, with $n = 3$, if we had
$$Y_1 = X_1 + X_2 + X_3, \qquad Y_2 = 5 X_1 + 2 X_2 + 2 X_3, \qquad Y_3 = -X_1 + X_2,$$
then we would get
$$k(1) = 1,\ c(1) = (3); \qquad k(2) = 2,\ c(2) = (2, 1); \qquad k(3) = 3,\ c(3) = (1, 1, 1),$$
since $\{u_{1j}\} = \{1, 1, 1\}$, in which we find three 1's; $\{u_{2j}\} = \{5, 2, 2\}$, in which we find two 2's and one 5; and $\{u_{3j}\} = \{-1, 1, 0\}$, in which we find one $-1$, one 1, and one 0. Let us now examine the coordinate-wise entropy in terms of the quantities we have just defined. Suppose the value $u$ appears $\alpha_\ell(i)$ times in $\{u_{ij}, j = 1, \ldots, n\}$. Then the probability of the event $\{Y_i = u\}$ is $\alpha_\ell(i)/n$. Therefore, we have
$$H(Y_i) = -\sum_{\ell=1}^{k(i)} \frac{\alpha_\ell(i)}{n} \log \frac{\alpha_\ell(i)}{n}.$$
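The class, the index, and the coordinate-wise entropy of a row can be computed mechanically by counting repeated entries. The sketch below is illustrative (the function names are hypothetical) and reuses the $n = 3$ example above; it assumes the row entries can be compared exactly, which holds for the integer entries used here but would need rounding for general floating-point rows.

```python
import math
from collections import Counter


def row_class_and_index(row):
    """Return (k, c): the number of distinct values in the row and the
    multiplicities of those values, sorted in decreasing order."""
    counts = sorted(Counter(row).values(), reverse=True)
    return len(counts), tuple(counts)


def coordinate_entropy(row):
    """H(Y_i) in bits for the spike process: each distinct value u of the
    row occurs with probability (number of occurrences of u) / n."""
    n = len(row)
    return -sum((a / n) * math.log2(a / n) for a in Counter(row).values())


# Rows of U for Y1 = X1 + X2 + X3, Y2 = 5 X1 + 2 X2 + 2 X3, Y3 = -X1 + X2.
U = [[1, 1, 1], [5, 2, 2], [-1, 1, 0]]
classes = [row_class_and_index(r) for r in U]
entropies = [coordinate_entropy(r) for r in U]
```

Only the multiplicities matter: replacing the row $(5, 2, 2)$ by any other row with one value occurring once and another occurring twice leaves the entropy unchanged.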
We shall now describe the different values that this coordinate-wise entropy takes as the number of distinct values and their occurrences vary. Because the entropy is a measure of uncertainty, we can intuitively guess that a coordinate with a small class number generates small entropy.
$k(i) = 1$: This necessarily means that $c(i) = (n)$, i.e., all the $\{u_{ij}, j = 1, \ldots, n\}$ are identical. Since there is no uncertainty about this coordinate, its entropy should be 0. Indeed, $H(Y_i) = -\frac{n}{n} \log \frac{n}{n} = 0$.
$k(i) = 2$: Let us consider the link between the uncertainty and the index $c(i)$. $k(i) = 2$ means that $Y_i$ can take only two distinct values. The least scattered distribution of these two values corresponds to the case $c(i) = (1, n-1)$. This is also the distribution closest to the certain case $k(i) = 1$ and $c(i) = (n)$. We now show that the case $c(i) = (1, n-1)$ generates the smallest entropy. Suppose that $Y_i$ can take two distinct values with index $(\alpha_1, \alpha_2)$, $\alpha_1 + \alpha_2 = n$. In other words, $Y_i$ takes these two values with probability $\alpha_1/n$ and $\alpha_2/n = 1 - \alpha_1/n$, respectively. Without loss of generality, we can assume $\alpha_1 \le \alpha_2$. Then, the entropy of the coordinate $Y_i$ is
$$H(Y_i) = -\Bigl( \frac{\alpha_1}{n} \log \frac{\alpha_1}{n} + \frac{\alpha_2}{n} \log \frac{\alpha_2}{n} \Bigr) = -\Bigl( \frac{\alpha_1}{n} \log \frac{\alpha_1}{n} + \Bigl( 1 - \frac{\alpha_1}{n} \Bigr) \log \Bigl( 1 - \frac{\alpha_1}{n} \Bigr) \Bigr) = f\Bigl( \frac{\alpha_1}{n} \Bigr),$$
where the function $f$ is defined as
$$f(x) = -[x \log x + (1 - x) \log(1 - x)], \qquad 0 \le x \le 1, \tag{6.7}$$
which is displayed in Figure 9.3. The following properties of this function $f$ are basic and will be used repeatedly in this paper:
For all $x \in [0, 1]$, $f(x) \ge 0$, and $f(x) = 0$ if and only if $x = 0$ or $x = 1$;
For all $x \in [0, 1]$, $f(x) = f(1 - x)$;
$f$ is increasing on $[0, 1/2]$ and decreasing on $[1/2, 1]$;
$f$ is concave on $[0, 1]$.
Since $\alpha_1 \le \alpha_2$, it suffices to consider $\alpha_1$ with $1 \le \alpha_1 \le n/2$. So, we have $1/n \le \alpha_1/n \le 1/2$, and in this interval, $f(\alpha_1/n)$ is strictly increasing. In other words,
$$f\Bigl( \frac{\alpha_1}{n} \Bigr) \ge f\Bigl( \frac{1}{n} \Bigr).$$
Therefore, the entropy is minimal when $\alpha_1 = 1$ and $\alpha_2 = n - 1$. For $\alpha_1 \ge 2$, we clearly have $H(Y_i) \ge f(2/n)$.
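Figure 9.3. A plot of $f : x \mapsto -[x \log x + (1 - x) \log(1 - x)]$
$k(i) \ge 3$: To find a lower bound of $H(Y_i) = -\sum_{\ell=1}^{k(i)} \frac{\alpha_\ell(i)}{n} \log \frac{\alpha_\ell(i)}{n}$, we need the following lemma:
Lemma 9.6.1 Let $k \ge 3$ be an integer, and let $(\alpha_1, \ldots, \alpha_k)$ be a set of strictly positive integers with $\sum_{j=1}^{k} \alpha_j = n$. Then,
$$-\sum_{j=1}^{k} \frac{\alpha_j}{n} \log \frac{\alpha_j}{n} \ge \Bigl( 1 + \frac{k-1}{n} \Bigr) f\Bigl( \frac{1}{n} \Bigr).$$
See 9.8.1 for the proof of this lemma. Lemma 9.6.1 implies that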
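$$H(Y_i) \ge \Bigl( 1 + \frac{k(i) - 1}{n} \Bigr) f\Bigl( \frac{1}{n} \Bigr) \ge \Bigl( 1 + \frac{2}{n} \Bigr) f\Bigl( \frac{1}{n} \Bigr).$$
We can now summarize these results as the following lemma:
Lemma 9.6.2 The coordinate-wise entropy of the spike process after being transformed by a basis in $\mathrm{GL}(n, \mathbb{R})$ can be computed or bounded as follows:
$$\text{if } k(i) = 1, \text{ then } H(Y_i) = 0; \tag{6.8}$$
$$\text{if } k(i) = 2, \text{ then } H(Y_i) \begin{cases} = f(1/n) & \text{if } \alpha_1(i) = 1; \\ \ge f(2/n) & \text{if } 2 \le \alpha_1(i) \le n/2; \end{cases} \tag{6.9}$$
$$\text{if } k(i) \ge 3, \text{ then } H(Y_i) \ge \Bigl( 1 + \frac{2}{n} \Bigr) f\Bigl( \frac{1}{n} \Bigr). \tag{6.10}$$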
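The bounds (6.9) and (6.10) can be spot-checked by brute force for small $n$. The sketch below is only a numerical sanity check, not a proof: it enumerates every possible index (every partition of $n$ into $k \ge 2$ positive parts) for $3 \le n \le 20$, a range chosen arbitrarily for illustration, and the helper names are hypothetical.

```python
import math


def f(x):
    """f(x) = -[x log2 x + (1 - x) log2(1 - x)], with f(0) = f(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))


def partitions(n, k, max_part=None):
    """Yield all nonincreasing k-tuples of positive integers summing to n."""
    if max_part is None:
        max_part = n
    if k == 1:
        if 1 <= n <= max_part:
            yield (n,)
        return
    for first in range(min(n - k + 1, max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def index_entropy(index):
    """Coordinate-wise entropy (in bits) of a row with the given index."""
    n = sum(index)
    return -sum((a / n) * math.log2(a / n) for a in index)


def check_bounds(n):
    """True if every index with k >= 2 respects (6.9) or (6.10)."""
    ok = True
    for k in range(2, n + 1):
        for idx in partitions(n, k):
            h = index_entropy(idx)
            if k == 2:
                bound = f(1 / n) if min(idx) == 1 else f(2 / n)
            else:
                bound = (1 + 2 / n) * f(1 / n)
            ok = ok and (h >= bound - 1e-12)
    return ok


all_ok = all(check_bounds(n) for n in range(3, 21))
```

The small tolerance `1e-12` only absorbs floating-point rounding in the equality case $c(i) = (1, n-1)$.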
Let us now come back to our invertible transformation $U$; we are searching for the LSDB among $\mathrm{O}(n)$ or $\mathrm{GL}(n, \mathbb{R})$. This means that the cost of the LSDB, i.e., the sum of the coordinate-wise entropy of the LSDB coordinates, cannot be larger than that of the standard basis. Therefore we will always keep the standard basis in mind as a reference
basis with which we shall compare the performance of all other bases. The standard basis corresponds to $U = I_n$. Every row of the standard basis has class $k(i) = 2$ and index $c(i) = (1, n-1)$. Hence the entropy cost of the standard basis is
$$C_H(I_n \mid X) = n \times f(1/n) = n \log n - (n - 1) \log(n - 1). \tag{6.11}$$
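Since $H(X) = \log n$ for the spike process, (6.11) also gives the mutual information of the standard-basis coordinates as $n f(1/n) - \log n$. The sketch below (with hypothetical function names and arbitrarily chosen sample sizes) evaluates this quantity and illustrates its growth toward $\log e \approx 1.4427$, the limit stated in Corollary 9.5.5.

```python
import math


def f(x):
    """f(x) = -[x log2 x + (1 - x) log2(1 - x)], with f(0) = f(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))


def standard_basis_cost(n):
    """Entropy cost (6.11) of the standard basis: n * f(1/n), in bits."""
    return n * f(1.0 / n)


def standard_basis_mutual_information(n):
    """I = -H(X) + sum_i H(Y_i), using H(X) = log2(n) for the spike process."""
    return standard_basis_cost(n) - math.log2(n)


# Illustrative sizes only.
mi_values = [standard_basis_mutual_information(n) for n in (5, 50, 500, 5000)]
```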
We saw that, assuming $k(i) > 1$, $H(Y_i) \ge f(1/n)$, with equality if and only if $k(i) = 2$ and $c(i) = (1, n-1)$. Therefore a basis with $k(i) > 1$ for every $i \in \{1, \ldots, n\}$ has no chance to win over the standard basis, and the best thing one can do with such a basis is to match the entropy of the standard basis, i.e., to have $k(i) = 2$ and $c(i) = (1, n-1)$ for every $i$. So, the only chance to beat the standard basis is to have some "class 1" rows (i.e., $k(i) = 1$) in a basis. However, we will never find an invertible matrix with multiple class 1 rows. Indeed, a class 1 row is necessarily proportional to $\mathbf{1}_n^T = (1, 1, \ldots, 1)$, and it is evident that no more than one class 1 row can exist in any invertible matrix.
9.6.4 Proof of Theorem 9.5.3
Let us start with a simple remark. If we assume that $B$ is an orthonormal basis, then $U = B^{-1} = B^T$. Hence the rows of $U$ are in fact the basis vectors of this basis. In the case of an orthonormal matrix, the presence of one row of class 1 imposes a constraint on the other rows, since these rows must form an orthonormal basis. The following lemma describes one of these constraints.
Lemma 9.6.3 If $k(1) = 1$, then it is impossible to have two class 2 rows with index $(1, n-1)$ in a matrix $U \in \mathrm{O}(n)$. In other words, if $k(1) = 1$, then there do not exist $i_1, i_2 \in \{1, \ldots, n\}$ such that $i_1 \neq i_2$ and $c(i_1) = c(i_2) = (1, n-1)$.
The proof of this lemma can be found in 9.8.2. Hence, assuming that $k(1) = 1$, we can have at most one row of class 2 with index $(1, n-1)$. All the other rows will be of either class $k(i) > 2$ or class $k(i) = 2$ with index $(\alpha_1, n - \alpha_1)$, $1 < \alpha_1 \le n/2$. Considering the minimization of the sum of the coordinate-wise entropy, we must have one row of class 1 and one row of class 2 with index $(1, n-1)$. All the other cases always increase the entropy, i.e., dependency.
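From (6.9) and (6.10), the entropy of a row with either $k(i) > 2$ or $k(i) = 2$ with index $(\alpha_1, n - \alpha_1)$, $1 < \alpha_1 \le n/2$, is bounded from below as
$$H(Y_i) \ge \min\Bigl( \Bigl( 1 + \frac{2}{n} \Bigr) f\Bigl( \frac{1}{n} \Bigr),\ f\Bigl( \frac{2}{n} \Bigr) \Bigr) = f\Bigl( \frac{1}{n} \Bigr) + \min\Bigl( \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr),\ f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr) \Bigr).$$
Therefore, combining this with (6.8) for $k(1) = 1$ and (6.9) for $\alpha_1 = 1$, we have
$$\sum_{i=1}^{n} H(Y_i) \ge 0 + f\Bigl( \frac{1}{n} \Bigr) + (n - 2) \Bigl[ f\Bigl( \frac{1}{n} \Bigr) + \min\Bigl( \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr),\ f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr) \Bigr) \Bigr]. \tag{6.12}$$
We now use the following lemma: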
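Lemma 9.6.4 For $n \ge 6$,
$$\min\Bigl( \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr),\ f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr) \Bigr) = \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr).$$
Proof Let us define a function
$$r(x) = x \Bigl[ \frac{2}{x} f\Bigl( \frac{1}{x} \Bigr) - \Bigl( f\Bigl( \frac{2}{x} \Bigr) - f\Bigl( \frac{1}{x} \Bigr) \Bigr) \Bigr] \quad \text{for } x \ge 2,$$
where $f$ is defined in (6.7). This is a continuous and monotonically decreasing function for $x \ge 2$, since
$$r'(x) = -\frac{2}{x^2} \log(x - 1) + \log \frac{x - 2}{x - 1} < 0 \quad \text{for } x > 2.$$
Moreover, we have $r(5) \approx 0.199$ and $r(6) \approx -0.310$, and we can find a zero of $r(x)$ numerically, i.e., $r(x^*) = 0$ where $x^* \approx 5.3623$. These prove that this function is negative if $x > x^*$. Therefore, for each integer $n \ge 6$, $r(n) < 0$, i.e.,
$$\frac{2}{n} f\Bigl( \frac{1}{n} \Bigr) < f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr).$$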
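The numerical values quoted in this proof can be reproduced with a few lines of code. The sketch below is illustrative: it evaluates $r$ directly from its definition and locates the zero by plain bisection, which is one possible root finder and not necessarily the one the authors used; the helper names are hypothetical.

```python
import math


def f(x):
    """f(x) = -[x log2 x + (1 - x) log2(1 - x)], with f(0) = f(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))


def r(x):
    """r(x) = x * [ (2/x) f(1/x) - ( f(2/x) - f(1/x) ) ], for x >= 2."""
    return x * ((2.0 / x) * f(1.0 / x) - (f(2.0 / x) - f(1.0 / x)))


def bisect_root(g, lo, hi, tol=1e-10):
    """Root of g on [lo, hi], assuming g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r5, r6 = r(5.0), r(6.0)
x_star = bisect_root(r, 5.0, 6.0)
```

Because $r$ changes sign between 5 and 6, the lemma applies from $n = 6$ onward and the cases $n \le 5$ have to be treated separately.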
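Using this lemma for $n \ge 6$, (6.12) can be written as
$$\sum_{i=1}^{n} H(Y_i) \ge f\Bigl( \frac{1}{n} \Bigr) + (n - 2) \Bigl[ f\Bigl( \frac{1}{n} \Bigr) + \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr) \Bigr] = \Bigl( \frac{2(n-2)}{n} + n - 1 \Bigr) f\Bigl( \frac{1}{n} \Bigr).$$
Therefore, if we compare the mutual information of the new coordinates to that of the standard basis, we have
$$I(Y) - I(X) \ge \Bigl( \frac{2(n-2)}{n} + n - 1 \Bigr) f\Bigl( \frac{1}{n} \Bigr) - n f\Bigl( \frac{1}{n} \Bigr) = \Bigl( \frac{2(n-2)}{n} - 1 \Bigr) f\Bigl( \frac{1}{n} \Bigr).$$
That is,
$$I(Y) - I(X) \ge \frac{n - 4}{n} f\Bigl( \frac{1}{n} \Bigr) > 0.$$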
Thus, $B = U^{-1} = U^T$ is not the LSDB. We have therefore proved that any orthonormal basis containing a class 1 row yields a larger mutual information than the standard basis for the spike process for $n \ge 6$. We can summarize our results so far. For $n \ge 6$, the standard basis is the LSDB among $\mathrm{O}(n)$. Any basis that yields the same mutual information as the standard basis necessarily consists of only class 2 rows with index $(1, n-1)$. Now the question is whether there is any other basis except the standard basis satisfying this condition. The following lemma concludes the proof of Theorem 9.5.3 for $n \ge 6$.
Lemma 9.6.5 For $n \ge 2$, an orthonormal basis consisting of class 2 rows with index $(1, n-1)$ other than the standard basis is uniquely (modulo permutations and sign flips as described in Remark 9.5.2) determined as (5.1), i.e.,
$$B_{\mathrm{HR}(n)} = \frac{1}{n} \begin{bmatrix} n-2 & -2 & \cdots & -2 \\ -2 & n-2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -2 \\ -2 & \cdots & -2 & n-2 \end{bmatrix}.$$
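The proof of this lemma can be found in 9.8.3. Note that this matrix becomes a permuted and sign-flipped version of $I_2$ when $n = 2$, and approaches the identity matrix as $n \to \infty$.
We now prove the particular cases $n = 2, 3, 4, 5$ in Theorem 9.5.3. For these small values of $n$, we cannot use Lemma 9.6.4 anymore, since we have
$$\min\Bigl( \frac{2}{n} f\Bigl( \frac{1}{n} \Bigr),\ f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr) \Bigr) = f\Bigl( \frac{2}{n} \Bigr) - f\Bigl( \frac{1}{n} \Bigr).$$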
Therefore, we prove these cases by examining exhaustively all possible indexes and the coordinate-wise entropy they generate.
$n = 2$: The only possible classes of rows in this case are class 1 with index $(2)$ and class 2 with index $(1, 1)$, which generate the following entropy values (in bits):
$$(2): H(Y_i) = 0; \qquad (1, 1): H(Y_i) = 2 \times \Bigl( -\frac{1}{2} \log \frac{1}{2} \Bigr) = \log 2 = 1.$$
The rows of the standard basis are of class 2 with index $(1, 1)$. Therefore, a basis with one class 1 row and one class 2 row generates lower entropy than the standard basis. Because of the orthonormality condition, it is easy to show that it must be
$$U^T = B = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
or its permuted and sign-flipped versions. In this case, the total coordinate-wise entropy is $0 + 1 = 1$ bit, whereas the true joint entropy $H(X)$ is also $\log 2 = 1$. Therefore, the mutual information is 0, i.e., this basis provides the true statistically independent coordinates. The fact that this is the only case when the statistical independence is achieved if the basis search is restricted to $\mathrm{O}(n)$ will become evident when one goes through the cases of $n = 3, 4, 5$ below.
$n = 3$: The possible indexes are $(3)$, $(1, 2)$ and $(1, 1, 1)$, which generate the following entropy values (in bits):
$$(3): H(Y_i) = 0;$$
$$(1, 2): H(Y_i) = f\Bigl( \frac{1}{3} \Bigr) = -\frac{1}{3} \log \frac{1}{3} - \frac{2}{3} \log \frac{2}{3} = \log 3 - \frac{2}{3};$$
$$(1, 1, 1): H(Y_i) = 3 \times \Bigl( -\frac{1}{3} \log \frac{1}{3} \Bigr) = \log 3.$$
Once again, the only possibility for a basis to generate lower entropy than the standard basis is to include a class 1 row with index $(3)$. But here we still cannot have two class 2 rows of index $(1, 2)$ on top of the class 1 row since Lemma 9.6.3 still holds for $n = 3$.
Therefore, the best combination is to have one row for each possible class, which leads to the following global coordinate-wise entropy:
$$0 + \log 3 - \frac{2}{3} + \log 3 \approx 2.50 < 3 \log 3 - 2 \log 2 \approx 2.75,$$
that is, this best possible basis is better than the standard basis. Therefore, the LSDB is a basis including a vector of each class. Considering the orthonormality of the basis, we can only have the following basis or its permuted or sign-flipped versions for $n = 3$:
$$U^T = B = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \end{bmatrix}.$$
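The $n = 3$ comparison can be reproduced numerically. The sketch below is illustrative, with hypothetical helper names: it takes the matrix $B$ above, forms $U = B^T$, checks that the rows of $U$ are orthonormal, and sums the coordinate-wise entropies. Rows are rounded before counting distinct values, which is an assumption made here to guard against floating-point noise.

```python
import math
from collections import Counter


def coordinate_entropy(row, ndigits=9):
    """Entropy (bits) of one coordinate of the spike process; entries are
    rounded so that equal values are grouped despite rounding errors."""
    n = len(row)
    counts = Counter(round(v, ndigits) for v in row)
    return -sum((a / n) * math.log2(a / n) for a in counts.values())


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
B3 = [[1 / s3, 1 / s6, 1 / s2],
      [1 / s3, 1 / s6, -1 / s2],
      [1 / s3, -2 / s6, 0.0]]
U3 = [list(col) for col in zip(*B3)]      # U = B^T: rows are the basis vectors

rows_orthonormal = all(
    abs(sum(a * b for a, b in zip(U3[i], U3[j])) - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(3) for j in range(3))

identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
total_lsdb = sum(coordinate_entropy(r) for r in U3)
total_standard = sum(coordinate_entropy(r) for r in identity3)
```

The total for $U$ is about 2.50 bits against about 2.75 bits for the standard basis, while the joint entropy is only $\log 3 \approx 1.585$ bits, so neither basis achieves independence for $n = 3$.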
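$n = 4$: The possible indexes are $(4)$, $(1, 3)$, $(2, 2)$, $(1, 1, 2)$, and $(1, 1, 1, 1)$, which generate the following entropy values (in bits):
$$(4): H(Y_i) = 0;$$
$$(1, 3): H(Y_i) = f\Bigl( \frac{1}{4} \Bigr) = -\frac{1}{4} \log \frac{1}{4} - \frac{3}{4} \log \frac{3}{4} \approx 0.811;$$
$$(2, 2): H(Y_i) = f\Bigl( \frac{1}{2} \Bigr) = 1;$$
$$(1, 1, 2): H(Y_i) = -\frac{1}{4} \log \frac{1}{4} - \frac{1}{4} \log \frac{1}{4} - \frac{1}{2} \log \frac{1}{2} = 1.5;$$
$$(1, 1, 1, 1): H(Y_i) = 4 \times \Bigl( -\frac{1}{4} \log \frac{1}{4} \Bigr) = 2.$$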
The total coordinate-wise entropy of the standard basis is $4 \log 4 - 3 \log 3 \approx 3.245$ bits. Note that all the rows of the standard basis are of class 2 with index $(1, 3)$. Let $U$ be an orthonormal basis, and let $\{b_i^T, i = 1, \ldots, 4\}$ be its rows. If $U$ generates smaller entropy than the standard basis, it necessarily includes one class 1 row. There is no other choice. Without loss of generality, let us assume that $b_1^T$ is of class 1, i.e., $c(1) = (4)$. We now prove that we cannot have a class 2 row with index $(1, 3)$ in such a $U$ if the total coordinate-wise entropy of $U$ is smaller than that of the standard basis. Suppose that $b_2^T$ is of class 2 with index $(1, 3)$, i.e., $c(2) = (1, 3)$. If so, we cannot have any more class 2 rows with index $(1, 3)$ in $U$ by Lemma 9.6.3. Now, $U$ cannot include a class 4 row vector of index $(1, 1, 1, 1)$. If it did, these three rows (i.e., rows of class 1, 2, and 4) would generate the entropy $0 + 0.811 + 2 = 2.811$ bits. Hence, as we can easily see from the bit counts of the class indexes above, any admissible choice for the remaining row would generate larger total coordinate-wise entropy than the standard basis does. Therefore we can discard these combinations immediately, and the indexes of $b_3^T$ and $b_4^T$ must be chosen from $(2, 2)$ and $(1, 1, 2)$. Since $b_2^T$ is of the form $(a, a, a, b)$, its orthogonality with $b_1^T$ implies that $b_2^T$ is proportional to the vector $(1, 1, 1, -3)$. If $b_3^T$ were of index $(2, 2)$, it would be of the form $(c, c, d, d)$, and its orthogonality with $b_1^T$ implies that $b_3^T$ is of the form $(c, c, -c, -c)$. On the other hand, the orthogonality with $b_2^T$ implies $c + c - c + 3c = 0$, i.e., $c = 0$, which is impossible. Therefore the only possibility for $b_3^T$ and $b_4^T$ would be class 3 rows with index $(1, 1, 2)$. Such a row generates the coordinate-wise entropy of 1.5 bits.
The total coordinate-wise entropy generated by such a basis $U$ is therefore at least $0 + 0.811 + 2 \times 1.5 = 3.811$ bits, which is larger than that of the standard basis, 3.245 bits. Hence we have proved that $U$ containing a class 1 row cannot have any class 2 row with index (1,3). Therefore, the best choice must be one class 1 row and three class 2 rows with index (2,2). If this configuration is possible, then the total coordinate-wise entropy is $0 + 3 \times 1 = 3$ bits, and surely this basis beats the standard basis. Now we prove that this configuration is possible and that it gives rise to the Walsh basis. We can assume $b_2^T$ is of the form $(a,a,b,b)$. Its orthogonality with $b_1^T$ gives us $a + b = 0$, i.e., $b_2^T$ is proportional to $(1,1,-1,-1)$. Similarly, thanks to the orthogonality and the linear independence, we can easily show that $b_3^T$ and $b_4^T$ are proportional to $(1,-1,1,-1)$ and $(1,-1,-1,1)$. This implies that the LSDB among $O(4)$ must be the Walsh basis matrix (modulo permutations and sign flips).

$n = 5$: In this case, we prove that the LSDB is the standard basis or the basis of the Householder reflection (5.1), both of which consist of class 2 rows with index (1,4) only. Indeed, using an argument similar to the one above, any basis generating smaller entropy than these two bases must have a class 1 row and a class 2 row with index (1,4). In this case, the other three rows must be either of class 2 with different indexes or of class 3 or higher. The smallest entropy of a class 2 row whose index is other than (1,4), i.e., (2,3) in this case, is $f(2/5) \approx 0.9710$ by (6.9), which is smaller than the smallest entropy of a row of class 3 or higher, $(1 + 2/5) f(1/5) \approx 1.011$, by (6.10). Therefore, this basis must have one class 1 row, one class 2 row with index (1,4), and three class 2 rows with index (2,3). The total entropy of such a basis is larger than that of the standard basis or the Householder reflection basis:
$$\sum_{i=1}^{5} H(Y_i) \geq 0 + f\left(\frac{1}{5}\right) + 3 \times f\left(\frac{2}{5}\right) \approx 3.635 > 5 \times f\left(\frac{1}{5}\right) \approx 3.610.$$
This concludes the proof of Theorem 9.5.3.
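The bit counts used in the cases $n = 4$ and $n = 5$ can be reproduced with a few lines of Python. This is only an illustrative sketch, not part of the original proof. It assumes that the spike process is a standard unit vector $e_j$ with $j$ uniform on $\{1, \ldots, n\}$, so that the coordinate $Y_i$ takes the value $U_{ij}$ with probability $1/n$, and it assumes the Householder reflection (5.1) is $I - (2/n)\mathbf{1}\mathbf{1}^T$, the matrix derived in Appendix B. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import log2


def row_entropy(row):
    """Shannon entropy (bits) of one coordinate Y_i = (U x)_i.

    Assumption: x is a standard unit vector e_j with j uniform, so Y_i
    takes the value row[j] with probability 1/n.
    """
    n = len(row)
    counts = Counter(row)  # multiplicities of the distinct values in the row
    return -sum(c / n * log2(c / n) for c in counts.values())


def total_entropy(U):
    """Total coordinate-wise entropy: the sum of the row entropies."""
    return sum(row_entropy(row) for row in U)


def standard_basis(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def householder(n):
    # I - (2/n) 1 1^T, kept in exact rational arithmetic
    return [[Fraction(n - 2, n) if i == j else Fraction(-2, n)
             for j in range(n)] for i in range(n)]


# Unnormalized 4 x 4 Walsh matrix; scaling a row does not change its entropy.
walsh4 = [[1, 1, 1, 1],
          [1, 1, -1, -1],
          [1, -1, 1, -1],
          [1, -1, -1, 1]]

print(round(total_entropy(standard_basis(4)), 3))  # 3.245
print(round(total_entropy(walsh4), 3))             # 3.0
print(round(total_entropy(standard_basis(5)), 3))  # 3.61
print(round(total_entropy(householder(5)), 3))     # 3.61
```

The entropy of a row depends only on how many times each distinct value occurs in it, which is exactly the class index used in the proof.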
9.6.5 Proof of Theorem 9.5.4

In the case of $\mathcal{D} = GL(n,\mathbb{R})$, the constraint imposed by Lemma 9.6.3 is lifted since the rows of $U = B^{-1}$ do not have to form an orthonormal basis anymore. Hence we can have as many rows of class 2 with index $(1, n-1)$ as we wish, even if the first row of $U$ is of class 1. Clearly, we still cannot have two class 1 rows because this violates the invertibility of $U$. Therefore, considering all these remarks and the classification of indexes established in the previous subsections, it is immediate to conclude that the combination of classes of rows leading to the smallest sum of coordinate-wise entropy is one row of class 1 and $n-1$ rows of class 2 with index $(1, n-1)$. This matrix reaches the lower bound for the total coordinate-wise entropy, $(n-1) f(1/n)$. Considering the invertibility of the matrix with $n-1$ rows of class 2, the most general form of the admissible matrices is the following (modulo permutations and sign-flips):
$$U_{GL(n)} = B^{-1}_{GL(n)} = \begin{bmatrix} a & a & \cdots & \cdots & a \\ b_2 & c_2 & b_2 & \cdots & b_2 \\ b_3 & b_3 & c_3 & \ddots & b_3 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ b_n & b_n & \cdots & b_n & c_n \end{bmatrix},$$
where $a$, $b_k$, $c_k$, $k = 2, \ldots, n$, must be chosen so that $U_{GL(n)} \in GL(n,\mathbb{R})$. We can easily compute the determinant of this matrix in a manner similar to the derivation of (6.1):
$$\det(U_{GL(n)}) = a \prod_{k=2}^{n} (c_k - b_k).$$
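The determinant formula can be checked in exact arithmetic on a small instance. The sketch below is illustrative only: the values of $a$, $b_k$, $c_k$ are arbitrary test values, and `build_u` and `det` are hypothetical helper names, not anything defined in the chapter.

```python
from fractions import Fraction
from math import prod


def build_u(a, b, c):
    """First row (a, ..., a); row k has b_k everywhere except c_k in column k.

    b and c are lists holding b_2..b_n and c_2..c_n.
    """
    n = len(b) + 1
    rows = [[Fraction(a)] * n]
    for k in range(1, n):
        row = [Fraction(b[k - 1])] * n
        row[k] = Fraction(c[k - 1])
        rows.append(row)
    return rows


def det(m):
    """Determinant by exact Gaussian elimination with row swaps."""
    m = [row[:] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n):
                m[r][j] -= factor * m[col][j]
    return sign * result


a, b, c = 2, [1, -3, 5], [4, 2, 7]  # arbitrary test values, n = 4
U = build_u(a, b, c)
closed_form = a * prod(ck - bk for bk, ck in zip(b, c))
print(det(U), closed_form)  # 60 60
```

Subtracting $b_k/a$ times the first row from row $k$ leaves a triangular matrix, which is why the product form appears.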
Therefore, we must have $a \neq 0$ and $b_k \neq c_k$ for $k = 2, \ldots, n$ for $U_{GL(n)}$ to be in $GL(n,\mathbb{R})$. Note that if we want to restrict the dictionary to $SL^{\pm}(n,\mathbb{R})$, then we must have $\det(U_{SL^{\pm}(n)}) = \pm 1$, i.e., $a$ must satisfy $a = \pm\prod_{k=2}^{n} (c_k - b_k)^{-1}$. The corresponding inverse matrix (5.3) can be computed easily by elementary linear algebra, i.e., the Gauss-Jordan method. This concludes the proof of Theorem 9.5.4. □

9.6.6 Proof of Proposition 9.5.2
If we transform the spike process $X$ by the Householder reflector $B_{HR(n)}$ (5.1), the number of nonzero components of $Y = B_{HR(n)}^{-1} X$ can be easily computed as
$$C_0(B_{HR(n)} \mid X) = E\|Y\|_0 = n.$$
Next, let us consider the case $0 < p \leq 1$. Since $n > 2$, we have
$$C_p(B_{HR(n)} \mid X) = E\|Y\|_p^p = \left(1 - \frac{2}{n}\right)^p + (n-1)\left(\frac{2}{n}\right)^p.$$
Let us now define the following function:
$$s_p(x) \triangleq (1-x)^p + \left(\frac{2}{x} - 1\right) x^p = (1-x)^p - x^p + \frac{2}{x^{1-p}},$$
where $0 < x = 2/n < 1$. Taking the derivative with respect to $x$, we have
$$s_p'(x) = -p(1-x)^{p-1} - p x^{p-1} - 2(1-p) x^{p-2} < 0$$
for $0 < x < 1$ and $0 < p \leq 1$. Therefore, in this interval, $s_p(x)$ is monotonically decreasing, and the decisive term for the sparsity measure $C_p$ is $2/x^{1-p}$. Therefore, we have
$$\lim_{n \to \infty} C_p(B_{HR(n)} \mid X) = \lim_{x \downarrow 0} s_p(x) = \infty \quad \text{for } 0 < p < 1.$$
If $p = 1$, then $s_1(x) = (1 - x) - x + 2 = 3 - 2x$. Hence, we have
$$\lim_{n \to \infty} C_1(B_{HR(n)} \mid X) = \lim_{x \downarrow 0} s_1(x) = 3.$$
This completes the proof.

9.6.7 Proof of Corollary 9.5.5
We now consider the mutual information of the spike process under the LSDB pair (5.2) and (5.3) in Theorem 9.5.4. Using this analysis LSDB, the mutual information of $Y$ is
$$I(Y) = -H(X) + \sum_{i=1}^{n} H(Y_i)$$
$$= -\log n + (n-1) f\left(\frac{1}{n}\right) = -\log n + (n-1)\left[\log n - \frac{n-1}{n}\log(n-1)\right]$$
$$= (n-2)\log n - \frac{(n-1)^2}{n}\log(n-1). \quad (6.13)$$
Let $h(n)$ denote the last expression in (6.13). Note that $h(2) = 0$, i.e., we can achieve true independence for $n = 2$. If $n > 2$, this function is strictly positive and monotonically increasing. By expanding the natural logarithm version of $h(x)$, we have
$$\ln 2 \times h(x) = (x-2)\ln x - \frac{(x-1)^2}{x}\ln(x-1)$$
$$= (x-2)\ln x - \left(x - 2 + \frac{1}{x}\right)\left(\ln x + \ln\left(1 - \frac{1}{x}\right)\right)$$
$$= (x-2)\ln x - \left(x - 2 + \frac{1}{x}\right)\left(\ln x - \frac{1}{x} - \frac{1}{2x^2} + O\left(\frac{1}{x^3}\right)\right)$$
$$= 1 - \frac{\ln x}{x} - \frac{3}{2x} + O\left(\frac{1}{x^2}\right).$$
In other words, we have established
$$h(n) = \log e \left(1 - \frac{\ln n}{n} - \frac{3}{2n} + O\left(\frac{1}{n^2}\right)\right).$$
Hence we have
$$\lim_{n \to \infty} I\left(B^{-1}_{GL(n)} X\right) = \frac{1}{\ln 2} = \log e \approx 1.4427.$$
Therefore, for $n > 2$, there is no invertible linear transformation that gives truly independent coordinates for the spike process. As for the orthonormal case, using (6.11), we have, for $n \geq 5$,
$$I\left(B^{-1}_{O(n)} X\right) = -\log n + n f\left(\frac{1}{n}\right) = (n-1)\log\frac{n}{n-1}.$$
Now, it is easy to see that $\lim_{n \to \infty} I\left(B^{-1}_{O(n)} X\right) = \log e$. This completes the proof of Corollary 9.5.5. □
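The behavior of the mutual information can be tabulated numerically. The sketch below is illustrative and makes two assumptions: that $f$ in (6.7) is the binary entropy function in bits (consistent with the values $f(1/4) \approx 0.811$ and $f(2/5) \approx 0.9710$ quoted in the proofs), and that the orthonormal case is evaluated as $-\log n + n f(1/n)$. The function names are hypothetical.

```python
from math import e, log2


def f(x):
    """Binary entropy in bits (assumed form of f in (6.7))."""
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)


def mi_gl(n):
    """I(Y) = -log n + (n - 1) f(1/n), the first form in (6.13)."""
    return -log2(n) + (n - 1) * f(1 / n)


def h(n):
    """Closed form: the last expression in (6.13)."""
    return (n - 2) * log2(n) - (n - 1) ** 2 / n * log2(n - 1)


def mi_on(n):
    """Orthonormal case (standard or Householder basis): -log n + n f(1/n)."""
    return -log2(n) + n * f(1 / n)


for n in (2, 5, 100, 10**6):
    print(n, round(mi_gl(n), 4), round(h(n), 4), round(mi_on(n), 4))
print(round(log2(e), 4))  # 1.4427
```

Both sequences approach $\log e$ from below; the table also shows that the two forms of (6.13) agree for each $n$.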
9.7 DISCUSSION

In general, sparsity and statistical independence are two completely different concepts as an adaptive basis selection criterion, as demonstrated by the rotations of the 2D uniform distribution in Section 9.4. For the spike process, however, we showed that the BSB and the LSDB can coincide (i.e., the standard basis) if we restrict our basis search to $O(n)$ with $n \geq 5$. However, we also showed that the standard basis is not the only LSDB in this case. To our surprise, there exists another orthonormal basis (5.1) representing the Householder reflector, which attains exactly the same level of statistical dependence as the standard basis, if the statistical dependence is quantified by the mutual information or equivalently by the total coordinate-wise entropy $C_H$ defined in (3.3). Yet this LSDB does not sparsify the process at all if we measure the sparsity by the expected $\ell^p$ norm $C_p$ defined in (3.1) where $0 < p < 1$. It is also interesting to note that this Householder reflector approaches the standard basis as $n \to \infty$. Furthermore, if we extend our basis search to $SL^{\pm}(n,\mathbb{R})$ or $GL(n,\mathbb{R})$, then the LSDB and the BSB cannot coincide.

What do these results and the effort to prove these theorems suggest? First, it is clear that proving theorems on the LSDB and computing it for more complicated stochastic processes would be much more difficult than for the BSB. To deal with statistical dependency, we need to consider the probability law of the underlying process (e.g., entropy or the marginal pdfs) explicitly. On the other hand, the sparsity criterion does not require such explicit information. In fact, one can even find the BSB for each realization rather than for the whole set of realizations, which is impossible for the LSDB. See Saito et al. [23], [22] for further information about this issue.

Second, it is now clear that both criteria prefer sharply concentrated (i.e., peaky) marginal distributions. There is, however, a fundamental difference: the sensitivity to the location (mean) of the marginal pdfs. The Shannon entropy is location invariant, i.e., its value does not change regardless of where the mean of the distribution is located, whereas the expected $\ell^p$ norm is very sensitive to the mean. This is one of the reasons why the LSDB is non-unique and different from the BSB, as shown in Theorems 9.5.3 and 9.5.4.
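The contrast between the two criteria can be made concrete with a small numerical sketch. It is illustrative only and assumes, as in the proofs above, that the spike process is a uniformly chosen standard unit vector and that the Householder basis (5.1) is $I - (2/n)\mathbf{1}\mathbf{1}^T$. The helper names are hypothetical.

```python
from collections import Counter
from math import log2


def standard_basis(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def householder(n):
    # I - (2/n) 1 1^T: diagonal 1 - 2/n, off-diagonal -2/n
    return [[1 - 2 / n if i == j else -2 / n for j in range(n)]
            for i in range(n)]


def total_entropy(U):
    """Sum over rows of the entropy (bits) of the row's value distribution."""
    n = len(U)
    total = 0.0
    for row in U:
        for count in Counter(row).values():
            total -= count / n * log2(count / n)
    return total


def expected_lp(U, p):
    """C_p = E ||Y||_p^p, where Y is a uniformly chosen column of U."""
    n = len(U)
    return sum(abs(v) ** p for row in U for v in row if v != 0) / n


for n in (8, 64, 512):
    print(n,
          round(total_entropy(standard_basis(n)), 4),
          round(total_entropy(householder(n)), 4),
          round(expected_lp(standard_basis(n), 0.5), 4),
          round(expected_lp(householder(n), 0.5), 4))
```

In every row of the output the two entropy columns agree, while $C_{1/2}$ stays at 1 for the standard basis and grows with $n$ for the Householder basis.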
Third, the LSDB unfortunately cannot tell how close it is to true statistical independence; it can only tell that it is the best one (i.e., the closest one to statistical independence) among the given set of possible bases. In order to quantify the absolute statistical dependence, we need to estimate the true high-dimensional entropy of the original process, $H(X)$, which is an extremely difficult task in general. We would like to note, however, a recent attempt to estimate the high-dimensional entropy of the process by Hero and Michel [12], which uses the minimum spanning trees of the input data and does not require us to estimate the pdf of the process. We feel that this type of technique will help in assessing the absolute statistical dependence of the process under the LSDB coordinates.

Then why were the sparse basis of Olshausen and Field and the ICA basis of Bell and Sejnowski more or less the same? Our interpretation of this phenomenon is the following. First of all, both teams applied their algorithms to the natural scene image patches after essentially centering and sphering them. Hence there is no location sensitivity problem of the BSB and the LSDB as we described above (although Olshausen and Field used the cost $\sum_{i=1}^{n} E \log(1 + Y_i^2)$ instead of $\sum_{i=1}^{n} E|Y_i|^p$, and Bell and Sejnowski used their "infomax" algorithm rather than directly minimizing the mutual information). This implies that these two algorithms both prefer the basis that makes the input image patches sharply concentrated around the origin. Second, the "edge-detecting" basis functions they obtained essentially convert an input image patch to a spike or spike-like image. In other words, in our opinion, the image patch size, such as 16 x 16 pixels, was crucial in their experiments, as Donoho and Flesia also observed [9]. Since those image patches are of small size, they tend to have simpler image contents such as simple oriented edges.
It seems to us that if their algorithms were computationally feasible for image patches of larger size, such as 64 x 64 or 128 x 128, both the BSB and the LSDB would be very different from such simple "edge-detecting" basis functions. These large image patches (due to rich scene variations and contents in patches of these sizes) cannot be converted to spikes by those simple basis functions. See also Remark 5 about this viewpoint.

These observations, therefore, suggest that the pursuit of sparse representations should be encouraged rather than that of statistically independent representations, if we believe that mammalian vision systems evolved and developed by the principle of data compression. This is also the viewpoint indicated by Donoho [8]. However, this does not mean to downgrade the importance of statistical independence altogether. If we want to separate mixed signals or to build empirical models of stochastic processes for simulation purposes, then pursuing statistical independence should be encouraged, and we expect to see further interplay between these two criteria.

Finally, there are a few interesting generalizations of the spike process, which need to be addressed in the near future. One is the spike process with varying amplitude. The spike process whose amplitude obeys the normal distribution was treated by Donoho et al. [10] to demonstrate the superiority of the non-Gaussian coding using spike location information over the Gaussian-KLB coding (see also a recent follow-up article by Weidmann and Vetterli [25]). We have started investigating this "generalized spike process" and have succeeded in obtaining the same result for the BSB as for the simple spike process dealt with in this paper, but different results for the KLB and the LSDB, which will be reported elsewhere [22]. The other generalization is to randomly throw multiple spikes into a single realization. If one throws more and more spikes into one realization, the standard basis gets worse in terms of sparsity. It will be an interesting exercise to consider the BSB and the LSDB for such situations.
Except in very special circumstances, it would be extremely difficult to find the BSB of a complicated stochastic process (e.g., natural scene images) that truly converts its realizations to the simple spike process. More likely, a theoretically and computationally feasible basis that sparsifies the realizations of a complicated process well (e.g., curvelets for the natural scene images [9]) may generate expansion coefficients that can be viewed as an amplitude-varying multiple spike process. In order to tackle this scenario, we certainly need to: 1) develop such a basis adapted to a specific stochastic process; and 2) deepen our understanding of the amplitude-varying multiple spike process. There is no doubt that these pursuits force us to explore the territory "beyond wavelets".

ACKNOWLEDGEMENT

The second author (N.S.) would like to thank Dr. Jean-Marie Aubry (Université Paris XII) for checking the proof of Theorem 9.5.1. This research was partially supported by NSF DMS-99-73032, DMS-99-78321, and ONR YIP N00014-00-1-0469.
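9.8 APPENDICES

9.8.1 Appendix A: Proof of Lemma 9.6.1

First we need to show another lemma as follows:

Lemma 9.8.1 Let $p_2 \geq p_1 \geq 1$ be positive integers such that $p_1 + p_2 < n$. Then
$$\frac{p_1}{n}\log\frac{p_1}{n} + \frac{p_2}{n}\log\frac{p_2}{n} \leq \frac{p_1 + p_2}{n}\log\frac{p_1 + p_2}{n} - \frac{2}{n} f\left(\frac{1}{n}\right),$$
where $f$ is defined in (6.7).

Proof The left-hand side of the inequality can be written as
$$\frac{p_1}{n}\log\frac{p_1}{n} + \frac{p_2}{n}\log\frac{p_2}{n} = \frac{p_1 + p_2}{n}\left[\frac{p_1}{p_1 + p_2}\log\frac{p_1}{n} + \frac{p_2}{p_1 + p_2}\log\frac{p_2}{n}\right]$$
$$= \frac{p_1 + p_2}{n}\left[\log\frac{p_1 + p_2}{n} + \frac{p_1}{p_1 + p_2}\log\frac{p_1}{p_1 + p_2} + \frac{p_2}{p_1 + p_2}\log\frac{p_2}{p_1 + p_2}\right]$$
$$= \frac{p_1 + p_2}{n}\left[\log\frac{p_1 + p_2}{n} - f\left(\frac{p_1}{p_1 + p_2}\right)\right]. \quad (8.1)$$
However, it is clear that
$$\frac{1}{2} \geq \frac{p_1}{p_1 + p_2} \geq \frac{1}{p_1 + p_2} > \frac{1}{n}.$$
From the monotonicity of $f(x)$ for $x \in [0, 1/2]$, we deduce
$$f\left(\frac{1}{n}\right) \leq f\left(\frac{p_1}{p_1 + p_2}\right),$$
which we can rewrite as
$$-f\left(\frac{p_1}{p_1 + p_2}\right) \leq -f\left(\frac{1}{n}\right).$$
This inequality, the nonnegativity of $f$, and the assumption of this lemma yield
$$-\frac{p_1 + p_2}{n} f\left(\frac{p_1}{p_1 + p_2}\right) \leq -\frac{2}{n} f\left(\frac{1}{n}\right).$$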
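This inequality combined with (8.1) completes the proof of Lemma 9.8.1.

Coming back to the proof of Lemma 9.6.1, we now use induction as follows.

$k = 3$: Since $\alpha_1 + \alpha_2 < n$, we can use Lemma 9.8.1 to assert
$$\frac{\alpha_1}{n}\log\frac{\alpha_1}{n} + \frac{\alpha_2}{n}\log\frac{\alpha_2}{n} \leq \frac{\alpha_1 + \alpha_2}{n}\log\frac{\alpha_1 + \alpha_2}{n} - \frac{2}{n} f\left(\frac{1}{n}\right).$$
Therefore,
$$\sum_{i=1}^{3} \frac{\alpha_i}{n}\log\frac{\alpha_i}{n} \leq \frac{\alpha_3}{n}\log\frac{\alpha_3}{n} + \frac{\alpha_1 + \alpha_2}{n}\log\frac{\alpha_1 + \alpha_2}{n} - \frac{2}{n} f\left(\frac{1}{n}\right)$$
$$= \frac{\alpha_3}{n}\log\frac{\alpha_3}{n} + \left(1 - \frac{\alpha_3}{n}\right)\log\left(1 - \frac{\alpha_3}{n}\right) - \frac{2}{n} f\left(\frac{1}{n}\right)$$
$$= -f\left(\frac{\alpha_3}{n}\right) - \frac{2}{n} f\left(\frac{1}{n}\right).$$
We used the fact $\sum_{i=1}^{3} \alpha_i = n$ to derive the equality in the second line of the above expression. Since $\alpha_j \geq 1$ for $j = 1, 2, 3$, we must have $(n-1)/n > \alpha_3/n \geq 1/n$. Considering the symmetry of $f(x)$ around $x = 1/2$ and its behavior, we can deduce that
$$f\left(\frac{\alpha_3}{n}\right) \geq f\left(\frac{1}{n}\right).$$
This nails down the case $k = 3$.

$k \Rightarrow k+1$: Let us demonstrate that, assuming that the formula is true for $k \geq 3$, it is still true for $k+1$. We can decompose the sum $\sum_{i=1}^{k+1} \frac{\alpha_i}{n}\log\frac{\alpha_i}{n}$ in the following way: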
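$$\sum_{i=1}^{k+1} \frac{\alpha_i}{n}\log\frac{\alpha_i}{n} = \sum_{i=1}^{k-1} \frac{\alpha_i}{n}\log\frac{\alpha_i}{n} + \frac{\alpha_k}{n}\log\frac{\alpha_k}{n} + \frac{\alpha_{k+1}}{n}\log\frac{\alpha_{k+1}}{n}. \quad (8.2)$$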
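But once again, since $\alpha_k + \alpha_{k+1} < n$, we can use Lemma 9.8.1 to reach
$$\frac{\alpha_k}{n}\log\frac{\alpha_k}{n} + \frac{\alpha_{k+1}}{n}\log\frac{\alpha_{k+1}}{n} \leq \frac{\alpha_k + \alpha_{k+1}}{n}\log\frac{\alpha_k + \alpha_{k+1}}{n} - \frac{2}{n} f\left(\frac{1}{n}\right).$$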
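Lemma 9.8.1, on which both the base case and the induction step rest, can be spot-checked by brute force. The sketch below is a numerical sanity check, not a proof. It assumes that $f$ in (6.7) is the binary entropy function in bits and that the right-hand side of the lemma carries the term $-(2/n) f(1/n)$, as reconstructed in this appendix. The name `lemma_gap` is hypothetical.

```python
from math import log2


def f(x):
    """Binary entropy in bits (assumed form of f in (6.7))."""
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)


def lemma_gap(p1, p2, n):
    """Right-hand side minus left-hand side of Lemma 9.8.1 (expected >= 0)."""
    lhs = p1 / n * log2(p1 / n) + p2 / n * log2(p2 / n)
    s = p1 + p2
    rhs = s / n * log2(s / n) - 2 / n * f(1 / n)
    return rhs - lhs


# All admissible triples with 1 <= p1 <= p2 and p1 + p2 < n, for n up to 40.
worst = min(lemma_gap(p1, p2, n)
            for n in range(3, 41)
            for p1 in range(1, n)
            for p2 in range(p1, n - p1))
print(worst >= 0)
```

The gap equals $\frac{p_1 + p_2}{n} f\left(\frac{p_1}{p_1 + p_2}\right) - \frac{2}{n} f\left(\frac{1}{n}\right)$, which is the quantity bounded in the last step of the lemma's proof.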
Consider now the off-diagonal entries of $U^T U$, for example,
$$(U^T U)_{1,2} = 0 = a_1 b_1 + a_2 b_2 + a_3 b_3 + b_4^2 + \cdots + b_n^2,$$
$$(U^T U)_{1,n} = 0 = a_1 b_1 + a_2 b_2 + b_3^2 + b_4^2 + \cdots + b_n^2.$$
Inserting (8.6) into these, we get two equations that assign different values to $a_1 b_1 + a_2 b_2$.
This is a contradiction (i.e., $a_1 b_1 + a_2 b_2$ cannot have two different values). Therefore $U$ cannot have two rows where the distinguishing entries $a_i$, $a_j$ share the same column index as in (8.4). It is clear that we cannot have more than two such rows. Therefore, $U$ must be of the form (8.3).

Now, let us compute the entries of (8.3). The normalization condition (8.5) still holds. Computing the diagonal entries of $U^T U = I_n$, we have
$$(U^T U)_{k,k} = 1 = a_k^2 + \sum_{i \neq k} b_i^2 \quad \text{for } k = 1, \ldots, n. \quad (8.7)$$
Combining (8.5) and (8.7), we have:
$$n b_k^2 = \sum_{i=1}^{n} b_i^2 \quad \text{for } k = 1, \ldots, n.$$
This implies that $b_1^2 = \cdots = b_n^2$. Then, from the normalization condition (8.5), we must have $a_1^2 = \cdots = a_n^2$ also. Consider now the off-diagonal entry of $U^T U$:
$$(U^T U)_{1,2} = 0 = a_1 b_1 + a_2 b_2 + (n-2) b_1^2.$$
Now, we must have $b_2 = b_1$ or $b_2 = -b_1$. So, the above equation can be written as
$$(U^T U)_{1,2} = 0 = a_1 b_1 \pm a_2 b_1 + (n-2) b_1^2.$$
This implies that either $b_1 = 0$ or $a_1 \pm a_2 + (n-2) b_1 = 0$. The case $b_1 = 0$ leads to $b_k = 0$ and $a_k = \pm 1$ for $k = 1, \ldots, n$, i.e., the standard basis. Let us consider now the other case, i.e., $a_1 \pm a_2 + (n-2) b_1 = 0$. Since $a_2 = a_1$ or $a_2 = -a_1$, these lead to either $b_1 = 0$ or $2a_1 + (n-2) b_1 = 0$. The former case has already been treated. Thus, let us proceed to the latter case. From this, we have
$$a_1 = \left(1 - \frac{n}{2}\right) b_1. \quad (8.8)$$
Inserting this into (8.5), we have
$$b_1^2 = \frac{4}{n^2}.$$
Consequently,
$$a_1^2 = 1 - (n-1)\frac{4}{n^2} = \left(\frac{n-2}{n}\right)^2.$$
Because (8.8) is also true for all $k$, i.e., $a_k = (1 - n/2) b_k$, $k = 1, \ldots, n$, we have:
$$a_k = \pm\frac{n-2}{n}, \quad b_k = \mp\frac{2}{n} \quad \text{for } k = 1, \ldots, n. \quad (8.9)$$
This means that the matrix $U$ must be of the following form or its permuted and sign-flipped versions:
$$U = B^{-1}_{O(n)} = \frac{1}{n}\begin{bmatrix} n-2 & -2 & \cdots & -2 \\ -2 & n-2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -2 \\ -2 & \cdots & -2 & n-2 \end{bmatrix}.$$
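As a quick exact-arithmetic check (an illustrative sketch with hypothetical helper names, not part of the original proof), one can confirm for small $n$ that the matrix displayed above is symmetric and has orthonormal rows:

```python
from fractions import Fraction


def householder(n):
    """(1/n) * [n-2 on the diagonal, -2 elsewhere], with exact entries."""
    return [[Fraction(n - 2, n) if i == j else Fraction(-2, n)
             for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


for n in range(2, 9):
    U = householder(n)
    assert U == transpose(U)                       # symmetric
    assert matmul(U, transpose(U)) == identity(n)  # orthonormal rows
print("symmetric and orthogonal for n = 2..8")
```

The diagonal entries of $U U^T$ are $((n-2)^2 + 4(n-1))/n^2 = 1$ and the off-diagonal entries are $(-4(n-2) + 4(n-2))/n^2 = 0$, which is what the assertions confirm.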
It turns out that this is symmetric, so we have $B = U$. This completes the proof of Lemma 9.6.5.

REFERENCES

[1] A. J. Bell and T. J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37:3327-3338, 1997.
[2] J.-F. Cardoso. An efficient batch algorithm: JADE. http://sig.enst.fr/~cardoso/guidesepsou.html. See also http://tsi.enst.fr/~cardoso/icacentral/index.html for collections of contributed ICA software.
[3] J.-F. Cardoso. High-order contrasts for independent component analysis. Neural Computation, 11:157-192, 1999.
[4] R. R. Coifman and M. V. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Trans. Inform. Theory, 38(2):713-719, Mar. 1992.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Interscience, New York, 1991.
[6] M. M. Day. The spaces $L^p$ with $0 < p < 1$. Bull. Amer. Math. Soc., 46:816-823, 1940.
[7] D. L. Donoho. On minimum entropy segmentation. In C. K. Chui, L. Montefusco, and L. Puccio, editors, Wavelets: Theory, Algorithms, and Applications, pages 233-269. Academic Press, San Diego, 1994.
[8] D. L. Donoho. Sparse components analysis and optimal atomic decomposition. Constructive Approximation, 17:353-382, 2001.
[9] D. L. Donoho and A. G. Flesia. Can recent innovations in harmonic analysis 'explain' key findings in natural image statistics? Network: Comput. Neural Syst., 12(3):371-393, 2001.
[10] D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies. Data compression and harmonic analysis. IEEE Trans. Inform. Theory, 44(6):2435-2476, 1998. Invited paper.
[11] P. Hall and S. C. Morton. On the estimation of entropy. Ann. Inst. Statist. Math., 45(1):69-88, 1993.
[12] A. O. Hero and O. J. J. Michel. Asymptotic theory of greedy approximations to minimal k-point random graphs. IEEE Trans. Inform. Theory, 45(6):1921-1938, 1999.
[13] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge Univ. Press, 1985.
[14] A. Hyvarinen. The FastICA package for MATLAB. http://www.cis.hut.fi/projects/ica/fastica/.
[15] J.-J. Lin, N. Saito, and R. A. Levine. An iterative nonlinear Gaussianization algorithm for resampling dependent components. In P. Pajunen and J. Karhunen, editors, Proc. 2nd International Workshop on Independent Component Analysis and Blind Signal Separation, pages 245-250. IEEE, 2000. June 19-22, 2000, Helsinki, Finland.
[16] J.-J. Lin, N. Saito, and R. A. Levine. An iterative nonlinear Gaussianization algorithm for image simulation and synthesis. Technical report, Dept. Math., Univ. California, Davis, 2001. Submitted for publication.
[17] B. A. Olshausen. Sparse coding simulation software. http://redwood.ucdavis.edu/bruno/sparsenet.html.
[18] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.
[19] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997.
[20] N. Saito. Local feature extraction and its applications using a library of bases. In R. Coifman, editor, Topics in Analysis and Its Applications: Selected Theses, pages 269-451. World Scientific Pub. Co., Singapore, 2000.
[21] N. Saito. Image approximation and modeling via least statistically dependent bases. Pattern Recognition, 34:1765-1784, 2001.
[22] N. Saito. The generalized spike process, sparsity, and statistical independence. In D. Rockmore and D. Healy, Jr., editors, Modern Signal Processing, MSRI Publications, Cambridge University Press, 2003. To appear.
[23] N. Saito, B. M. Larson, and B. Benichou. Sparsity and statistical independence from a best-basis viewpoint. In A. Aldroubi, A. F. Laine, and M. A. Unser, editors, Wavelet Applications in Signal and Image Processing VIII, volume Proc. SPIE 4119, pages 474-486, 2000. Invited paper.
[24] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Royal Soc. London, Ser. B, 265:359-366, 1998.
[25] C. Weidmann and M. Vetterli. Rate distortion behavior of sparse sources. Submitted to IEEE Trans. Info. Theory, Oct. 2001.
[26] M. V. Wickerhauser. Adapted Wavelet Analysis from Theory to Software. A K Peters, Ltd., Wellesley, MA, 1994. With diskette.
Beyond Wavelets, G. V. Welland (Editor). © 2003 Elsevier Science (USA). All rights reserved.
10 NONUNIFORM FILTER BANKS: NEW RESULTS AND OPEN PROBLEMS

SONY AKKARAKARAN AND P.P. VAIDYANATHAN
Department of Electrical Engineering 136-93, California Institute of Technology, Pasadena, CA 91125
sony@systems.caltech.edu, ppvnath@systems.caltech.edu
Abstract

A nonuniform filter bank (FB) is one whose channel decimation rates need not all be equal. While the theory and design of uniform FBs is a very well developed subject, there are several interesting open issues in the area of nonuniform FBs. Most nonuniform FB designs either result in approximate or near-perfect reconstruction, or involve cascading uniform FBs in tree structures. This leaves unanswered many important theoretical issues involved in obtaining perfect reconstruction (PR) in nonuniform FBs. The purpose of this paper is to address these issues. We only study FBs with integer decimation rates, as FBs with rational decimators can also be shown to be transformable to them. The central problem of interest is as follows: Let $S$ be a set of positive integers obeying maximal decimation (i.e., with reciprocals summing to unity). Find necessary and sufficient conditions on $S$ for existence of a PRFB belonging to some FB class $C$ and using $S$ as its set of decimators. The class $C$ is defined by some constraint on the filters of its constituent FBs; examples of interest are the class of all rational FBs (FBs with rational filters), FIR FBs, orthonormal FBs, etc. A condition that immediately suggests itself is the one stating that the integers be arrangeable in a tree so that the required PRFB can be built by cascading uniform PRFBs in a tree structure. However, this condition, while clearly sufficient, is not necessary for many classes $C$ of interest. In fact there are sets violating it which can be used to build delay-chain PRFBs (in which all filters are delays). Many of our new results focus on the class of rational FBs. We strengthen considerably the known necessary conditions in this case, and provide new ones. The basic problem remains unresolved: necessary and sufficient conditions are still unknown. However, we believe our work is an important step towards a full solution. We conclude by listing all known conditions, studying their inter-relationship, and pointing out several open problems.

10.1 INTRODUCTION

Figure 10.1. Nonuniform filter bank (input $x(n)$, analysis filters, subbands, synthesis filters, output $\hat{x}(n)$).

Figure 10.2. Uniform maximally decimated filter bank. (a) Showing analysis and synthesis filters. (b) Polyphase representation, with analysis polyphase matrix $E(z)$ and synthesis polyphase matrix $R(z)$.
Figure 1 shows an $M$-channel nonuniform filter bank (FB). The FB is said to be maximally decimated if the channel decimation rates $n_i$ are integers satisfying
$$\sum_{i=0}^{M-1} \frac{1}{n_i} = 1 \quad \text{(maximal decimation condition)}. \quad (1.1)$$
Figure 2a shows a maximally decimated uniform FB, which is a special case of Fig. 1 where $n_i = M$ for all $i$. For this case, the system can be equivalently redrawn using the analysis and synthesis polyphase matrices $E(z)$ and $R(z)$, as shown in Fig. 2b. The condition for perfect reconstruction (PR) is then easily expressed as $R(z) = E^{-1}(z)$. Due to this, the theory and design of uniform PRFBs is an extremely well developed subject. Numerous parameterization results list all possible $M$-channel uniform PRFBs with various sets of properties such as paraunitariness, FIR filters, linear phase filters, etc.

In contrast, several issues involved in achieving PR in nonuniform FBs remain unresolved. For example, given a general set of positive integers $n_i$ obeying maximal decimation (1.1), how do we determine whether or not there exists a rational PRFB (i.e., one with rational filters) using the $n_i$ as decimators? If the $n_i$ are all equal, clearly such a FB exists (as it is then uniform). Similarly, it also exists if the $n_i$ are arrangeable in a tree so that such a PRFB can be built by cascading uniform PRFBs in a tree structure (Section 4.1). This is the most common approach to achieving PR in nonuniform FBs. In particular, it is used to build the FBs that implement the dyadic wavelet transforms [11], [12]: Such a FB has a dyadic decimator-set, i.e., one of form $\{2, 2^2, \ldots, 2^{r-1}, 2^r, 2^r\}$ for some integer $r \geq 1$, and is built using a dyadic tree (i.e., one built from a cascade of $r$ 2-channel FBs).

However, there are sets of decimators $n_i$ that cannot be arranged in a tree as described above, and yet permit existence of rational PRFBs in which in fact all filters are delays. Further, even if the decimators are arrangeable in a tree, it is possible that there are PRFBs using those decimators that cannot be realized using the tree. These facts will be discussed in detail with examples in Section 4.2. Thus tree structures of uniform PRFBs are far from being a full solution to the PR problem for nonuniform FBs. Derivability of decimators from a tree (as described above) is a sufficient condition for existence of rational PRFBs using the decimators.
There are certain other conditions that are known to be necessary, e.g., there are no rational PRFBs using the decimator-set {2,3,6} because no two decimators of such a FB can be coprime (Section 6.1, [4]). However, a condition that is both necessary and sufficient remains unknown. The present work studies this and related problems. An important part of our study is to significantly improve upon the known conditions, i.e., to derive new ones, strengthen necessary conditions and weaken sufficient ones. Another contribution is to study the conditions for reducibility of PRFBs to tree structures. For example, it has been shown [3], [10] that all rational PRFBs with dyadic decimator-sets must be derivable from dyadic trees. In Section 7, we will considerably generalize this result. Although these problems in their full generality remain unresolved, we believe the present work to be an important step towards a complete understanding of this subject, an area so rich in open problems even after over two decades of filter bank research.
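The two arithmetic conditions mentioned so far, maximal decimation (1.1) and pairwise noncoprimeness, are easy to test for a candidate decimator set. The following sketch is illustrative only and uses hypothetical function names. Note that pairwise noncoprimeness is described above only as a necessary condition for rational PRFBs, so passing both tests does not by itself establish that a PRFB exists.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd


def is_maximally_decimated(decimators):
    """True if the reciprocals of the decimators sum exactly to 1, as in (1.1)."""
    return sum(Fraction(1, n) for n in decimators) == 1


def coprime_pairs(decimators):
    """Pairs of decimators that are coprime (these rule out rational PRFBs)."""
    return [(a, b) for a, b in combinations(decimators, 2) if gcd(a, b) == 1]


def dyadic_set(r):
    """Dyadic decimator-set {2, 4, ..., 2^(r-1), 2^r, 2^r}."""
    return [2 ** k for k in range(1, r + 1)] + [2 ** r]


print(is_maximally_decimated([2, 3, 6]), coprime_pairs([2, 3, 6]))
# True [(2, 3)]
print(is_maximally_decimated(dyadic_set(3)), coprime_pairs(dyadic_set(3)))
# True []
```

The set {2,3,6} is maximally decimated but contains the coprime pair (2,3), which matches the example in the text, while every dyadic set passes both tests.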
10.1.1 Relevant earlier work

Trees of uniform FBs, and near-PR designs: A very common approach to nonuniform PRFB design is to cascade uniform PRFBs in a tree-structure, e.g., as is done to implement dyadic wavelet transforms [11], [12]. However, as stated earlier, there are nonuniform PRFBs that cannot be built in this manner. Many works deal with approximate reconstruction (or 'near-PR') nonuniform FBs, e.g., the frequency domain approaches of Li et al. [7], the time domain methods of Nayebi et al. [8], and other references therein. These are very useful from a practical standpoint, giving FBs with excellent filter responses and low aliasing distortions. However, they do not address the many theoretical issues involved in obtaining exact reconstruction.
Figure 10.3. FB with rational decimators. (a) Single channel with decimator $q/p$. (b) Equivalent system of $p$ channels with decimator $q$. (c) A possible set of filter choices ensuring the equivalence.
FBs with fractional decimators: Kovacevic and Vetterli have studied a more general system [6] where each channel of the FB has a decimation rate that is fractional, i.e., of form $q/p$ where $p, q$ are coprime positive integers. Such a channel, shown in Fig. 3a, is completely equivalent to the system of Fig. 3b. By this we mean that given any one of these systems, we can choose the filters in the other so that the same input $x(n)$ for both systems always produces the same signals $s(n)$ and $y(n)$ as shown. A choice ensuring this is shown in Fig. 3c (polyphase vectors are defined in Section 1.3). The equivalence under this choice is provable using the discussion on fractional decimation in [11, Section 4.3.3]. If the $A_i(z)$ differ from the special choice of Fig. 3c, we can replace them by this choice and modify the $C_i(z)$ so that the signal $s(n)$ is unaffected. This is done by performing a $p$-th order polyphase decomposition of the $A_i(z)$, using the fact that $p, q$ are coprime, and moving the resulting polyphase matrix to the left. A similar comment holds for the $B_i(z)$.

From the equivalence shown in Fig. 3, we conclude that the PR problems for integer-decimated and rationally decimated FBs are fully equivalent. Another concern besides PR in rationally decimated FBs is the nature of their spectral analysis: Does a subband represent a contiguous portion of the input spectrum, or do the decimators and expanders in Fig. 3b cause it to contain separate parts, possibly mirrored and shuffled in order? This issue is studied in [6]; it becomes less serious if we allow modulators at appropriate points within the FB. However, as far as the PR problem is concerned, it is enough to study FBs with integer decimators, and that is the approach we shall use.

Other more general multirate structures: As we will see in Section 2.2, nonuniform PRFBs are hard to design because of certain structural constraints that their associated polyphase matrices must obey. This is the origin of the central problem studied in our work: These structural constraints cannot be obeyed by rational FBs unless their decimators satisfy various conditions, which we aim to characterize. However, the constraints vanish if we use more general systems in the channels of the FB, e.g., if the filters are allowed to be periodically time-varying (Section 2.3). Chen and Qiu [2] and Shenoy [9] have studied multirate and FB design using such more general structures. The PR design then allows as much or even more freedom than that in the well-studied PR designs for the traditional uniform FB of Fig. 2. Our work is restricted to the usual nonuniform FB structure of Fig. 1 that does not use such generalized multirate structures.

PR conditions on decimators, and reducibility to tree structures: A necessary condition on the (integer) decimators for PR with rational FBs was first stated in [5]. Called the compatibility condition, it was generalized by Djokovic and Vaidyanathan [4], who also pointed out another such condition (pairwise noncoprimeness). We will considerably generalize these conditions. Another related work has involved showing derivability of FBs using dyadic decimator-sets from dyadic trees [10], [3], as explained earlier. These results too will be significantly strengthened. The more general situations studied include certain non-dyadic sets, unconstrained FBs, and tree structures whose constituent FBs need not be uniform.

10.1.2 Outline
Section 2 reviews the PR conditions on the filters of uniform FBs, and their generalization to nonuniform FBs, derivable using a transformation of nonuniform FBs to equivalent uniform ones. It shows how in spite of this transform, the nonuniform PRFB design does not reduce to a uniform PR design, unless the filters of the nonuniform FB are allowed to be time varying. In Section 3 we formally state the central problem, and study its solution for classes of unconstrained FBs (where the filters of the FB have no constraints such as rationality). Section 4 analyzes the role of tree structures in the study of the main problem. It shows how tree structures of uniform PRFBs do not provide a full solution (Section 4.2), and how trees can be used to improve upon known PR conditions on the decimators (Section 4.3). Section 5 solves the central problem of the paper for the class of delay-chains (FBs in which all filters are delays): It states the necessary and sufficient condition for a set of decimators to be usable to build a PR delay-chain, and presents algorithms to test the condition. Subsequent sections focus mainly on the class of rational FBs. Section 6 states the earlier known necessary conditions on decimators of rational PRFBs, and generalizes them in several ways. Section 7 generalizes [10], [3] by finding weaker conditions on decimators under which all PRFBs using them can be derived from certain tree structures. Section 8 summarizes all known necessary PR conditions on the decimators, and studies their inter-relationships. We conclude by noting many open problems in the area.

10.1.3 Notations, definitions and assumptions
Standard notation: Superscripts $(*)$ and $(T)$ denote the complex conjugate and matrix (or vector) transpose respectively. We use boldface letters for matrices and vectors. We use lowercase letters for discrete sequences and uppercase letters for Fourier and z-transforms. Sometimes lowercase boldface letters are used for vector z-transforms. For sequences $h(n)$ without z-transforms that are rational functions of $z$, the notation $H(z)$ is an abbreviation for the Fourier transform $H(e^{j\omega})$. For LTI transfer matrices $\mathbf{H}(z)$, the 'paraconjugate' $\mathbf{H}^{*T}(1/z^*)$ is denoted by $\tilde{\mathbf{H}}(z)$. The $L$-th root of unity $e^{-j2\pi/L}$ is denoted by $W_L$, or by $W$ if the subscript value $L$ is understood. The Kronecker delta function is denoted by $\delta$ ($\delta(0) = 1$ and $\delta(x) = 0$ if $x \neq 0$).
Polyphase concepts [11]: The $M$-fold decimator and expander are represented by $\downarrow M$ and $\uparrow M$ respectively, as in Fig. 1. Given a sequence $h(n)$ with z-transform $H(z)$, its $M$-fold decimated version is the sequence $g(n) = h(Mn)$, with z-transform denoted by $(H(z))\downarrow_M$. Likewise, the $M$-fold expanded version of $h(n)$ is

$$\begin{cases} h(n/M) & \text{if } n/M \text{ is an integer} \\ 0 & \text{otherwise} \end{cases}$$

with z-transform denoted by $(H(z))\uparrow_M$. With $W = e^{-j2\pi/M}$, we have

$$(H(z))\downarrow_M = \frac{1}{M} \sum_{l=0}^{M-1} H(z^{1/M} W^l), \quad \text{and} \quad (H(z))\uparrow_M = H(z^M) \qquad (1.2)$$
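The two identities in (1.2) can be checked numerically for a short causal sequence. The following is a minimal sketch, not part of the original development: the helper names (`decimate`, `expand`, `ztransform`, `decimated_via_formula`) and the test sequence are hypothetical, and the check is done at a single point on the unit circle.

```python
import cmath

def decimate(h, M):
    """M-fold decimation: g(n) = h(Mn)."""
    return h[::M]

def expand(h, M):
    """M-fold expansion: h(n/M) when M divides n, 0 otherwise."""
    out = [0] * (len(h) * M)
    out[::M] = h
    return out

def ztransform(h, z):
    """Evaluate H(z) = sum_n h(n) z^{-n} for a causal finite sequence."""
    return sum(c * z ** (-n) for n, c in enumerate(h))

def decimated_via_formula(h, M, omega):
    """Right-hand side of the decimation identity in (1.2) at z = e^{j omega}."""
    W = cmath.exp(-2j * cmath.pi / M)
    z_root = cmath.exp(1j * omega / M)  # one choice of z^{1/M}
    return sum(ztransform(h, z_root * W ** l) for l in range(M)) / M

h = [1, 2, 3, 4, 5, 6, 7]   # arbitrary test sequence (an assumption)
M, omega = 3, 0.7
z = cmath.exp(1j * omega)

lhs_dec = ztransform(decimate(h, M), z)
rhs_dec = decimated_via_formula(h, M, omega)
lhs_exp = ztransform(expand(h, M), z)
rhs_exp = ztransform(h, z ** M)
print(abs(lhs_dec - rhs_dec) < 1e-9, abs(lhs_exp - rhs_exp) < 1e-9)
```

The sum over $l$ in the decimation formula cancels every term $h(n)z^{-n/M}$ with $n$ not a multiple of $M$, which is why any choice of the root $z^{1/M}$ gives the same result.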
Given filters $H_0(z), H_1(z), \ldots, H_{N-1}(z)$, their $M$-th order analysis polyphase matrix $\mathbf{E}(z)$ is the $N \times M$ matrix defined by

$$\mathbf{h}(z) \triangleq (H_0(z), H_1(z), \ldots, H_{N-1}(z))^T = \mathbf{E}(z^M)\,\mathbf{d}(z),$$

where $\mathbf{d}(z) = (1, z^{-1}, \ldots, z^{-(M-1)})^T$ is the length $M$ delay vector. Thus, $\mathbf{E}(z)$ has $i$-th column $(z^i \mathbf{h}(z))\downarrow_M$. Similarly, the $M$-th order synthesis polyphase matrix of the filters $F_0(z), F_1(z), \ldots, F_{N-1}(z)$ is the $M \times N$ matrix $\mathbf{R}(z)$ obeying $\mathbf{f}(z) \triangleq (F_0(z), F_1(z), \ldots, F_{N-1}(z)) = \tilde{\mathbf{d}}(z)\mathbf{R}(z^M)$. Thus the $i$-th row of $\mathbf{R}(z)$ is $(z^{-i}\mathbf{f}(z))\downarrow_M$. If the $H_i(z), F_i(z)$ are respectively the analysis and synthesis filters of a FB, then $\mathbf{E}(z), \mathbf{R}(z)$ are respectively said to be the $M$-th order analysis and synthesis polyphase matrices of the FB. An easily proved result that we often use is the following:

Lemma 1: Polyphase lemma. Let $\mathbf{e}(z), \mathbf{r}(z)$ be the $M$-th order analysis and synthesis polyphase matrices of the filters $H(z)$ and $F(z)$ respectively. Thus $\mathbf{e}(z)$ is a row vector and $\mathbf{r}(z)$ is a column vector. Then,

$$\mathbf{e}(z)\mathbf{r}(z) = (H(z)F(z))\downarrow_M \qquad (1.3)$$
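The polyphase lemma can be illustrated in the time domain for causal FIR filters. The sketch below assumes the conventions stated above, so that the $i$-th analysis polyphase component has impulse response $h(Mk + i)$ and the $i$-th synthesis component has impulse response $f(Mk - i)$. The function names and the two test filters are hypothetical.

```python
def at(seq, n):
    """Value of a causal finite sequence at index n (0 outside its support)."""
    return seq[n] if 0 <= n < len(seq) else 0

def convolve(a, b):
    """Coefficients of the product A(z)B(z) for causal finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def polyphase_inner_product(h, f, M, n):
    """n-th sample of e(z) r(z), with e_i(k) = h(Mk + i) and r_i(k) = f(Mk - i)."""
    span = len(h) + len(f)
    total = 0
    for i in range(M):
        for k in range(-span, span + 1):
            total += at(h, M * k + i) * at(f, M * (n - k) - i)
    return total

h = [1, -2, 3, 0, 5]   # arbitrary analysis filter (an assumption)
f = [2, 1, -1, 4]      # arbitrary synthesis filter (an assumption)
M = 3

decimated = convolve(h, f)[::M]   # samples of (H(z)F(z)) after M-fold decimation
via_polyphase = [polyphase_inner_product(h, f, M, n) for n in range(len(decimated))]
print(decimated, via_polyphase)
```

Both lists agree because every index $m$ has a unique representation $m = Mk + i$ with $0 \leq i < M$, so the double sum over $i$ and $k$ is the ordinary convolution sum evaluated at time $Mn$.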
Maximal decimation: All FBs studied in the paper are maximally decimated with integer decimation rates, even if this is not explicitly stated. Similarly, references to a 'set of decimators' (or 'decimator-set') always implicitly mean a set of positive integers (not necessarily distinct) obeying (1.1).

10.2 BACKGROUND: EQUIVALENT UNIFORM FBS; PR EQUATIONS

The main focus of the paper is to find conditions on the decimators that permit existence of various types of nonuniform perfect reconstruction (PR) FBs with those decimators. To do this, we must first know what conditions on the filters of the FB guarantee the PR property. This section begins by reviewing the PR conditions for uniform FBs. We then review the transformation of a nonuniform FB with decimators $n_i$ to an equivalent uniform FB with a decimation rate $L$ that is a multiple of all the $n_i$. This yields the general PR conditions for nonuniform FBs, which will be used in all the later sections. In spite of the possible transformation to uniform FBs, the nonuniform PRFB design problem by no means reduces to the uniform PR design. However, such a reduction does occur if the nonuniform FB is allowed to have filters that are LPTV(L) (linear periodically time varying with period $L$) instead of LTI. With LTI filters, achieving PR is harder, and is the subject of the later sections.
10.2.1 PR for uniform FBs, and the nonuniform to uniform transform
For the uniform FB of Fig. 2, the problem of achieving PR is very well understood. The following are three equivalent necessary and sufficient conditions on the filters for PR in this case [11]:

1 Biorthogonality condition. $(S_i(z)Q_j(z))\downarrow_M = \delta(i - j)$.

2 AC matrix formulation. Let $W = e^{-j2\pi/M}$. Then

$$\begin{bmatrix} A_0(z) \\ A_1(z) \\ \vdots \\ A_{M-1}(z) \end{bmatrix} \triangleq \frac{1}{M} \underbrace{\begin{bmatrix} S_0(z) & \cdots & S_{M-1}(z) \\ S_0(zW) & \cdots & S_{M-1}(zW) \\ \vdots & & \vdots \\ S_0(zW^{M-1}) & \cdots & S_{M-1}(zW^{M-1}) \end{bmatrix}}_{\text{alias cancellation (AC) matrix } \mathbf{S}(z)} \begin{bmatrix} Q_0(z) \\ Q_1(z) \\ \vdots \\ Q_{M-1}(z) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (2.1)$$

For any uniform FB (PR or otherwise), the $A_i(z)$ defined above are called the 'aliasing gains'. The PR condition (2.1) thus specifies all aliasing gains. It arises from the frequency domain relation between the output $\hat{X}(z)$ and input $X(z)$ of any uniform FB (PR or otherwise):

$$\hat{X}(z) = \sum_{i=0}^{M-1} A_i(z) X(zW^i) \qquad (2.2)$$
3 Polyphase formulation. If $\mathbf{E}(z), \mathbf{R}(z)$ are respectively the $M$-th order analysis and synthesis polyphase matrices of the FB (as in Fig. 2b), then $\mathbf{R}(z) = \mathbf{E}^{-1}(z)$. That this is equivalent to the biorthogonality condition stated earlier follows from the polyphase lemma (Section 1.3), which shows that the $ij$-th entry of $\mathbf{E}(z)\mathbf{R}(z)$ is precisely the quantity $(S_i(z)Q_j(z))\downarrow_M$ occurring in the biorthogonality condition.

Now any nonuniform FB (as in Fig. 1) is transformable into a uniform FB, which we will call its equivalent uniform FB [1], [4], [5], [6]. This transform is described by Fig. 4, which shows how a single channel with decimator $n_k$ is replaceable by $p_k$ channels with decimators $L = n_k p_k$. Repeating this process on all channels of the nonuniform FB, with $L$ as any common multiple of all its decimators $n_i$ (usually $L = \mathrm{lcm}\{n_i\}$), yields a uniform $L$-channel FB. The nonuniform FB has PR if and only if the equivalent uniform FB has PR. The filters in the uniform FB are various delayed versions of those in the nonuniform one. Inserting these relations between the filters into the PR conditions for uniform FBs gives the PR conditions for nonuniform FBs. These conditions, described next, generalize the uniform FB PR conditions, and are heavily used later.
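The bookkeeping behind this transformation is easy to sketch. The code below assumes that the maximal decimation condition (1.1) is the usual requirement that the reciprocals of the decimators sum to one, and it takes $L$ as the lcm of the decimators. The function name `equivalent_uniform_channels` and the index pair `(k, l)` are hypothetical labels, where `l` counts the $p_k$ copies of channel `k`.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def equivalent_uniform_channels(decimators):
    """Return L and the list of (k, l) pairs: channel k of the nonuniform FB
    contributes p_k = L / n_k uniform channels, indexed by l = 0, ..., p_k - 1."""
    # Assumption: maximal decimation means sum of 1/n_i equals 1.
    if sum(Fraction(1, n) for n in decimators) != 1:
        raise ValueError("decimators are not maximally decimated")
    L = reduce(lcm, decimators)
    channels = [(k, l) for k, n in enumerate(decimators) for l in range(L // n)]
    return L, channels

L, channels = equivalent_uniform_channels([2, 4, 4])
print(L, len(channels), channels)
```

Under maximal decimation the number of channels produced always equals $L$, since $\sum_k p_k = L \sum_k 1/n_k = L$. This is what makes the equivalent FB a uniform maximally decimated one.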
Figure 10.4. Transforming a nonuniform FB to an equivalent uniform FB
10.2.2 The general PR conditions for nonuniform FBs
Biorthogonality condition. The uniform FB biorthogonality condition, when applied to the uniform FB derived from the nonuniform one of Fig. 1, is equivalent to

$$(H_i(z)F_j(z))\downarrow_{\gcd(n_i, n_j)} = \delta(i - j) \qquad \text{(biorthogonality condition)} \qquad (2.3)$$

This has been observed earlier [4], [10]. Appendix A contains a proof for easy reference. The condition gets its name from its time-domain equivalent. To describe this, let $h_i(n), f_i(n)$ respectively be the impulse responses of $H_i(z), F_i(z)$. We define two sets of sequences

$$\{\mu_{ik}(n) = h_i^*(kn_i - n) \mid i = 0, 1, \ldots, M-1,\ k = \text{any integer}\} \qquad (2.4)$$

$$\{\eta_{jl}(n) = f_j(n - ln_j) \mid j = 0, 1, \ldots, M-1,\ l = \text{any integer}\} \qquad (2.5)$$

The action of the FB on its input $x(n)$ is now elegantly expressible using these sequences: The $j$-th subband signal $c_j(\cdot)$ and the FB output $\hat{x}(\cdot)$ are given by

$$c_j(l) = \langle x(n), \mu_{jl}(n) \rangle = \sum_{n=-\infty}^{\infty} x(n) h_j(ln_j - n), \quad \text{and}$$

$$\hat{x}(n) = \sum_{j=0}^{M-1} \sum_{l=-\infty}^{\infty} c_j(l)\,\eta_{jl}(n) = \sum_{j=0}^{M-1} \sum_{l=-\infty}^{\infty} c_j(l) f_j(n - ln_j)$$

Here $\langle a(n), b(n) \rangle = \sum_n a(n) b^*(n)$ is the inner product of the sequences $a(n)$ and $b(n)$ (in the space of all sequences $x(n)$ for which $\sum_n |x(n)|^2$ is finite). Thus, the FB output $\hat{x}(n)$ is a linear combination of the sequences from (2.5), using weights $c_j(l)$ that are inner products of the input $x(n)$ with the sequences from (2.4). Thus PR (i.e., $\hat{x}(n) = x(n)$) is achieved if the two sets (2.4), (2.5) form a biorthogonal system, i.e., if

$$\langle \mu_{ik}(n), \eta_{jl}(n) \rangle = \delta(i - j)\,\delta(k - l)$$

This can indeed be shown to be the 'time domain' equivalent of (2.3).
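Condition (2.3) can be checked directly for a small FIR example. The sketch below uses a hypothetical three-channel FB with decimators (2, 4, 4) and unnormalized Haar-type analysis filters. It further assumes that each synthesis filter is the time-reversed analysis filter scaled by $1/n_j$. These filters are chosen for illustration only and do not appear in the text above.

```python
from fractions import Fraction
from math import gcd

# Hypothetical example: unnormalized Haar-type analysis filters (causal, real).
decimators = [2, 4, 4]
analysis = [[1, -1], [1, 1, 1, 1], [1, 1, -1, -1]]

def product_coefficients(h_i, h_j, n_j):
    """Coefficients of H_i(z) F_j(z), keyed by time index t, assuming
    F_j(z) = (1/n_j) * sum_b h_j(b) z^{b} (time-reversed, scaled analysis filter)."""
    coeffs = {}
    for a, x in enumerate(h_i):
        for b, y in enumerate(h_j):
            coeffs[a - b] = coeffs.get(a - b, 0) + Fraction(x * y, n_j)
    return coeffs

def satisfies_biorthogonality(analysis, decimators):
    """Check (H_i F_j) decimated by gcd(n_i, n_j) equals delta(i - j) for all i, j."""
    for i, h_i in enumerate(analysis):
        for j, h_j in enumerate(analysis):
            g = gcd(decimators[i], decimators[j])
            for t, c in product_coefficients(h_i, h_j, decimators[j]).items():
                if t % g != 0:
                    continue  # this sample is discarded by the gcd-fold decimator
                expected = 1 if (i == j and t == 0) else 0
                if c != expected:
                    return False
    return True

print(satisfies_biorthogonality(analysis, decimators))
```

Exact rational arithmetic is used so that the comparison with 0 and 1 is not affected by rounding. Replacing the first filter by `[1, 1]` breaks the condition for the channel pair (0, 1), which gives a quick negative check.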
Figure 10.5. Polyphase matrix structure for uniform FBs derived from nonuniform ones
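AC matrix formulation [4]. In (2.1), we set $M = L$, $W = e^{-j2\pi/L}$, and the filters as those of the uniform FB derived as in Fig. 4, from the nonuniform FB of Fig. 1. The $i$-th row in (2.1) is a sum of filter product terms $S_j(zW^i)Q_j(z)$. We group terms arising from the $k$-th subband in Fig. 1, i.e., those with $S_j(z) = z^{-ln_k}H_k(z)$ and $Q_j(z) = z^{ln_k}F_k(z)$ for $l = 0, 1, \ldots, p_k - 1$ where $n_k p_k = L$ (see Fig. 4). This yields a sum of form $H_k(zW^i)F_k(z)A_{ik}$, where

$$A_{ik} = \sum_{l=0}^{p_k - 1} W^{-iln_k} = \sum_{l=0}^{p_k - 1} e^{j2\pi il/p_k} = \begin{cases} p_k & \text{if } i \text{ is a multiple of } p_k \\ 0 & \text{otherwise} \end{cases}$$

Thus, we can rewrite (2.1) using a new $L$-row AC matrix $\mathbf{H}(z)$ that has only $M$ columns (one for each analysis filter of the nonuniform FB), as follows:

$$\underbrace{[\mathbf{h}_0(z)\ \cdots\ \mathbf{h}_{M-1}(z)]}_{\text{AC matrix } \mathbf{H}(z)} \begin{bmatrix} F_0(z) \\ \vdots \\ F_{M-1}(z) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \text{ where} \qquad (2.6)$$

$$\mathbf{h}_i(z) \triangleq \frac{1}{n_i} \begin{bmatrix} \mathbf{h}'_i(z) \\ \mathbf{h}'_i(zW^{p_i}) \\ \vdots \\ \mathbf{h}'_i(zW^{(n_i - 1)p_i}) \end{bmatrix}, \text{ and } \mathbf{h}'_i(z) = [H_i(z)\ \underbrace{0\ \cdots\ 0}_{p_i - 1 \text{ zeros}}]^T \qquad (2.7)$$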
If $n_i = M$ and $p_i = 1$ for all $i$ (i.e., if the FB is uniform), the form of $\mathbf{H}(z)$ indeed reduces to that of (2.1).

Polyphase formulation. The PR condition is $\mathbf{R}(z) = \mathbf{E}^{-1}(z)$, just as for uniform FBs. However, as the equivalent uniform FB has interdependencies between the filters, its analysis polyphase matrix $\mathbf{E}(z)$ has a special structure [1]: Its rows can be partitioned into groups, where the $k$-th group corresponds to the $k$-th subband analysis filter $H_k(z)$ in Fig. 1. This group has $p_k = L/n_k$ rows as shown in Fig. 5. The first row is the $L$-th order analysis polyphase matrix (vector) of $H_k(z)$. Each subsequent row is formed by shifting length-$n_k$ blocks of the previous row to the right, with the last block multiplied by $z^{-1}$ and circulated back to the left end.^2 These rows are the polyphase vectors of filters $z^{-an_k}H_k(z)$ for $a = 1, 2, \ldots, p_k - 1$. Similarly, the synthesis polyphase matrix $\mathbf{R}(z)$ of the equivalent uniform FB has columns arrangeable into groups. The $k$-th group has a form like the transpose of that in Fig. 5, with the $E_i^k(z)$ replaced by the entries $R_i^k(z)$ of the $L$-th order synthesis polyphase vector of the synthesis filter $F_k(z)$, and the $z^{-1}$ factors replaced by $z$ elements.

^2 The submatrix of $\mathbf{E}(z)$ shown in Fig. 5 is block pseudocirculant with block size $1 \times n_k$ (generalizing the notion of pseudocirculants [11]).

The paraunitary case. The uniform FB of Fig. 2 is said to be paraunitary (or orthonormal) if $\mathbf{E}^{-1}(z) = \tilde{\mathbf{E}}(z)$, or in other words, if PR is obtained with $\mathbf{R}(z) = \tilde{\mathbf{E}}(z)$, or equivalently with $Q_i(z) = \tilde{S}_i(z)$. By generalization, the nonuniform FB of Fig. 1 is said to be orthonormal if PR is obtained (i.e., (2.3) is obeyed) with $F_i(z) = \tilde{H}_i(z)$. From the relations between the filters of the nonuniform and the equivalent uniform FB, we see that each of these is paraunitary if and only if the other is. Notice that the two sets of (2.4), (2.5) which form a biorthogonal system in any PRFB, will coincide, hence forming an orthonormal system, if and only if the FB is paraunitary. This is because $F_i(z) = \tilde{H}_i(z)$ is equivalent to $\eta_{jl}(n) = \mu_{jl}(n)$ in (2.4), (2.5). A general PRFB that is not necessarily orthonormal is often called a biorthogonal FB, due to the condition (2.3). Two other properties of orthonormal FBs, proved for uniform FBs in [11], are the unit energy and power complementarity properties, stated respectively as

$$\int_0^{2\pi} |H_i(e^{j\omega})|^2 \frac{d\omega}{2\pi} = 1, \quad \text{and} \quad \sum_{i=0}^{M-1} \frac{|H_i(e^{j\omega})|^2}{n_i} = 1$$
We can prove these for nonuniform FBs using the result for uniform ones and the transformation of Fig. 4.

10.2.3 Relation between the nonuniform and uniform PR designs

Transforming a nonuniform FB to an equivalent uniform one helps to find the PR conditions on its filters. These two FBs also share several properties (i.e., each has the property iff the other does). Examples are PR and paraunitariness; and rationality, stability, and FIR nature of filters. However, the equivalent uniform FB does not help in designing nonuniform PRFBs. This is due to its special structure: It has groups of filters that are delayed versions of each other. There are no known uniform PRFB design methods that allow imposition of this structure. Notice that the delayed versions of a filter have the same magnitude response, while uniform PRFB designs usually approximate ideal nonoverlapping analysis filter responses. Most choices of the analysis filters $H_i(z)$ of Fig. 1 yield an equivalent uniform FB with an invertible analysis polyphase matrix $\mathbf{E}(z)$. However, this is not sufficient for existence of LTI synthesis filters ($F_i(z)$ of Fig. 1) resulting in PR: For this we further require that the inverse $\mathbf{R}(z) = \mathbf{E}^{-1}(z)$ have the special structure described in Section 2.2. This added constraint is not always easy to satisfy. If $\mathbf{E}(z)$ is paraunitary, then $\mathbf{R}(z)$, being equal to $\tilde{\mathbf{E}}(z)$, automatically has the desired structure, and a nonuniform (paraunitary) PRFB is possible. However, again none of the many known parameterizations of uniform paraunitary FBs [11] allow imposition of the special structure of Section 2.2 that $\mathbf{E}(z)$ must have in order to represent a nonuniform FB.

The structural constraints on $\mathbf{E}(z)$ and $\mathbf{R}(z)$ can however be completely given up if the filters in the nonuniform FB are allowed to be LPTV(L) instead of LTI [1]. This is shown by Fig. 6, in which $p_k = L/n_k$ channels of a uniform $L$-channel (maximally decimated) FB are converted into a single channel with decimator $n_k$.
The analysis and synthesis filters in this channel are LPTV(L). The procedure is repeated for each $k$ using disjoint subsets of channels of the uniform FB. Clearly the nonuniform FB has the PR property if and only if the uniform one does. In the rest of the paper, we assume all analysis and synthesis filters of all FBs to be LTI. The nonuniform PR design is then significantly harder.

Figure 10.6. Equivalence between uniform FBs and nonuniform FBs with LPTV filters
10.3 PROBLEM STATEMENT, AND UNCONSTRAINED FBS

10.3.1 Problem statement

The nonuniform perfect reconstruction (PR) FB design problem in its full generality can be stated as follows:

1 Conditions on decimators for PR. Given a set of positive integers $n_i$ satisfying the maximal decimation condition (1.1), find necessary and sufficient conditions on the $n_i$ for existence of a PR FB in some specified class C of FBs, having the $n_i$ as decimators.

2 Parameterization of the PRFBs. When the $n_i$ satisfy such a condition, find all possible PRFBs in C having $n_i$ as decimators.

The FB class C here is defined by some constraint on the filters of its constituent FBs. Important examples that we will consider are delay-chains (FBs in which all filters are delays), rational FBs and FIR FBs. Other constraints that the class C can impose are realness of filter coefficients, stability of filters, and paraunitariness (or orthonormality). Note that in general the class definition does not directly by itself impose any constraint on either the number of channels or the nature of the decimators in the FB. However, the requirement that a FB in the class be maximally decimated and have PR could impose various constraints on these parameters. The statement of the problem is to characterize (a) the nature of these constraints, and (b) all PRFBs in C having a general decimator-set that obeys these constraints. The solution to the problem of course depends on the FB class C. It is completely known for delay-chains, but unknown for rational FBs. Notice that the parameterization problem depends on first finding conditions on the decimators for PR, which can be quite tough in itself. So we will mainly focus on finding conditions for PR.
Our aim will be to weaken the sufficient conditions and strengthen necessary ones until we obtain a set of necessary and sufficient conditions (the final goal, which we do not always achieve). We will also derive some results on the parameterization problem, especially in connection with tree structures.
Figure 10.7. Ideal contiguous-stacked complex coefficient brickwall FB

Figure 10.8. Ideal contiguous-stacked real coefficient brickwall FB
10.3.2 FBs with unconstrained complex and real coefficient filters
Let the class C in the above formulation be simply the class of all FBs, with no filter constraints (i.e., allowing ideal brickwall filters etc.). Then a PRFB in C always exists, no matter what the decimators $n_i$ are (of course, provided they obey (1.1)). This is because the FB in Fig. 7, with ideal contiguous-stacked brickwall filters, always has PR. In fact it is a paraunitary FB. We will hence exclude this class C from all further discussion. Note that the filters of Fig. 7 always have complex coefficients.

Now let C be the class of all real coefficient FBs (i.e., FBs in which all filters have real coefficients). No other constraint is imposed, so the filters could still be ideal. However, it is now more difficult to find conditions on the decimators for existence of PRFBs in C. Taking a cue from Fig. 7, we can examine brickwall FBs, i.e., FBs as in Fig. 1 where the filters $H_i(e^{j\omega})$ have nonoverlapping supports, are constant on their supports and $H_i(z) = F_i(z)$. Since the $H_i$ partition the input spectrum, PR is possible if and only if for each $i$, the $i$-th channel perfectly reconstructs all inputs that are bandlimited to the passband of $H_i(e^{j\omega})$. (In fact we then get a paraunitary PRFB, by suitable scaling of the filters.) This equivalently means that $H_i(e^{j\omega})$ has an aliasfree($n_i$) support. For the (real coefficient) FB of Fig. 8, the bandpass sampling theorem states that this happens iff the band edges of $H_i$ are at integer multiples of $\pi/n_i$ [6]. Thus, the FB of Fig. 8 has PR if and only if

$$\sum_{i=0}^{k} \frac{1}{n_i} \text{ is an integer multiple of } \frac{1}{n_{k+1}} \text{ for all } k = 0, 1, \ldots, M-2 \qquad (3.1)$$

Thus, a given set of decimators $n_i$ can be used to build a real coefficient PRFB of the form of Fig. 8 if and only if (3.1) holds for some ordering of the $n_i$. For example, the set {2,3,6} obeys this condition (with ordering (2,6,3) or (3,6,2)). The set {2,3,7,42} violates the condition (it is the only such set with $\leq 4$ decimators).
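The ordering test implied by (3.1) is simple to automate. The following sketch assumes that (3.1) requires each partial sum of the reciprocals $1/n_0 + \cdots + 1/n_k$ to be an integer multiple of $1/n_{k+1}$, and it tries every ordering of a given decimator-set. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def ordering_satisfies_condition(ordering):
    """Check that each partial sum of 1/n_i is an integer multiple of 1/n_{k+1}."""
    partial = Fraction(0)
    for k in range(len(ordering) - 1):
        partial += Fraction(1, ordering[k])
        if (partial * ordering[k + 1]).denominator != 1:
            return False
    return True

def valid_orderings(decimators):
    """All distinct orderings of the decimator-set that satisfy the condition."""
    return sorted({p for p in permutations(decimators)
                   if ordering_satisfies_condition(p)})

print(valid_orderings([2, 3, 6]))
print(valid_orderings([2, 3, 7, 42]))
```

For {2,3,6} this returns exactly the two orderings quoted above, and for {2,3,7,42} it returns an empty list. The exhaustive search over permutations is only practical for small sets, which is all that is needed here.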
However, this does not preclude existence of PRFBs with more complicated stackings of nonoverlapping real coefficient brickwall filters, e.g., as in Fig. 9. Given a set $S$ of decimators, does such a PRFB using the set $S$ always exist? Does its nonexistence imply that there is no PRFB using $S$ with real coefficient filters (ideal or otherwise) at all? The answers are not currently known to the authors.

Figure 10.9. Non-contiguous stacked ideal real coefficient brickwall FB

An important class of FBs studied in the later sections is that of all rational FBs, i.e., those in which all filters have rational transfer functions. As Section 6.1 will show, neither of the above decimator-sets {2,3,6} and {2,3,7,42} permits existence of a rational PRFB (since they have pairs of coprime decimators). Thus it is tempting to conclude that the decimators of rational PRFBs are more restricted than those of real coefficient PRFBs. Indeed, intuition suggests that for any decimator-set $S$, existence of rational PRFBs using $S$ implies that of real coefficient rational PRFBs using $S$. This is in fact true for all sets $S$ for which rational PRFBs are currently known to exist. However, as we will see later, there are many sets for which it is not known whether either rational PRFBs or PRFBs with real coefficient filters (rational or otherwise) exist. Thus, in general we do not know whether existence of the former implies that of the latter. The constraint of realness of filter coefficients will not be applied or studied further in the rest of the paper.

10.4 TREE STRUCTURES
Cascading uniform PRFBs in a tree structure is the most common method of designing nonuniform PRFBs. As pointed out in Section 1, this method, though useful, is far from providing a complete PR theory of nonuniform FBs, i.e., a full solution to either of the two basic problems posed in Section 3.1. However, tree structures do provide very useful tools in the study of these problems. This section aims at analyzing their role in this study. Section 4.1 defines some basic terminology we will often use later in describing and studying tree structures. Section 4.2 analyzes the method of cascading uniform PRFBs in tree structures, and shows with examples how it falls short of a full PR theory of nonuniform FBs. Section 4.3 presents general methods that use trees to improve upon known PR conditions on the decimators of nonuniform FBs belonging to various FB classes. By ’improving a PR condition’ we mean strengthening a necessary condition, or weakening a sufficient one. These methods will be applied to specific conditions later on. 10.4.1 Basics and terminology
A tree structured FB is one of the form shown in Fig. 10, built by repeated insertion of FBs into the subbands of other FBs. These constituent FBs of the tree structure will be called its units. They could be either uniform or nonuniform FBs, and may themselves be tree structured FBs. The terms parent, child, root and leaf units will often be used to describe the relative positions of the units in the tree; their meanings are presumed
Figure 10.10. Tree structure of filter banks
$n_N$ is a multiple of all $n_i < n_j$. It then suffices to verify existence of valid delays $l_k$ obeying (5.2) for all $n_k < n_N$. As an extreme case, if every $n_j$ is a multiple of all $n_i < n_j$ (i.e., every $n_j$ divides all $n_i > n_j$), then a delay-chain PRFB always exists. In fact the decimator-set is then derivable from a uniform-tree (Appendix B). Fact 1 is also useful in proving Theorem 3 which follows soon.

Nonuniqueness of delay-chains: When a decimator-set allows building of a PR delay-chain FB, in general this delay-chain is not unique. The non-uniqueness can be much deeper than that caused simply by adding a constant delay to all the filters. For example, when several delay-chains are possible, it could happen that some of them are also derivable from uniform-trees, while some others are not, as seen in Section 4.2.
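Existence of a PR delay-chain for a given decimator-set can be explored by a small backtracking search. The sketch below rests on one assumption that is stated loudly here: a maximally decimated delay-chain has PR exactly when every input sample lands in exactly one subband, i.e., when the residue classes $l_i \bmod n_i$ partition the integers. The function name `find_delays` is hypothetical, and the search returns one valid choice of residues rather than all of them. The sign convention of the delays does not affect existence.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def find_delays(decimators):
    """Search for delays l_i such that the residue classes l_i mod n_i
    partition the integers. Returns a list of delays, or None."""
    # Assumption: maximal decimation means sum of 1/n_i equals 1.
    if sum(Fraction(1, n) for n in decimators) != 1:
        return None
    L = reduce(lambda a, b: a * b // gcd(a, b), decimators)
    order = sorted(range(len(decimators)), key=lambda i: decimators[i])
    covered = [False] * L          # one period of the sample-ownership pattern
    delays = [None] * len(decimators)

    def place(pos):
        if pos == len(order):
            return True
        i = order[pos]
        n = decimators[i]
        # Equal decimators are interchangeable, so force increasing delays among them.
        start = 0
        if pos > 0 and decimators[order[pos - 1]] == n:
            start = delays[order[pos - 1]] + 1
        for l in range(start, n):
            samples = range(l, L, n)
            if any(covered[s] for s in samples):
                continue
            for s in samples:
                covered[s] = True
            delays[i] = l
            if place(pos + 1):
                return True
            for s in samples:
                covered[s] = False
        return False

    return delays if place(0) else None

example_from_ref4 = [6, 10, 15] + [30] * 20
print(find_delays([2, 4, 8, 8]))
print(find_delays([2, 3, 6]))
print(find_delays(example_from_ref4) is not None)
```

The first set illustrates the extreme case in which every decimator divides all larger ones, the second has a coprime pair and admits no delay-chain, and the third is the 23-decimator example of [4] discussed in Section 5.3. The symmetry-breaking step keeps the search small when a decimator value is repeated many times.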
10.5.3 Delay-chains vs. uniform-trees

Our study of tree structures showed that (a) known PR conditions on decimators can sometimes be strengthened using trees (Section 4.3), and (b) derivability of the decimators from a uniform-tree is a sufficient PR condition for all FB classes that we study (Section 4.2). Does this teach us more about delay-chains? Firstly, the condition (5.2) is both necessary and sufficient for existence of PR delay-chains. Hence it remains unaltered by the procedures of Section 4.3. Next, the uniform-tree condition is not necessary, as we now show:

Theorem 3: PR delay-chains without uniform-trees. There are infinitely many PR delay-chain FBs that cannot be derived from uniform-trees. Such FBs can be built using every set of decimators of the form $S = \{n_0, n_1, n_2, L, L, \ldots, L\}$, where $L = \mathrm{lcm}(n_0, n_1, n_2)$, and $n_0 = m_1 m_2$, $n_1 = m_2 m_0$, $n_2 = m_0 m_1$ where the $m_i$ are pairwise coprime integers greater than unity. (Here $L$ occurs $L\left(1 - \sum_{i=0}^{2} 1/n_i\right)$ times in $S$.)

Proof: By Fact 1, decimators of $S$ allow building of a PR delay-chain FB iff there are integers $l_0, l_1, l_2$ obeying (5.2) for $i, j \in \{0, 1, 2\}$. This condition is easily ensured, in fact it holds iff $\gcd(n_i, n_j) \geq 2$ for $i, j \in \{0, 1, 2\}$ with strict inequality for at least one $i \neq j$. (One can then make a valid choice of the $l_i$ from the numbers 0, 1, 2.) Further if $\gcd(n_0, n_1, n_2) = 1$, the set cannot be derived from a uniform-tree (Appendix B). Both these requirements are satisfied by the choice of $n_i$ stated by the theorem.

An example of a delay-chain PRFB not derivable from a uniform-tree was first shown in [4]. Its set of decimators {6, 10, 15, 30, 30, ..., 30} (30 occurring 20 times) is a special case of the construction of Theorem 3 with $(m_0, m_1, m_2) = (5, 3, 2)$. This is not the only way to produce such examples: Delay-chain PRFBs can also be built with the decimator values 6, 10, 15, 30 when the numbers of their respective occurrences are 2, 4, 1, 6 or 2, 3, 2, 7. The former set of decimators is the smallest such example.^3 It can be used as the root of a tree to derive the example of [4], but not the latter example. In all these cases, the decimators have no common factor, ensuring that they are not derivable from uniform-trees. In fact if the decimators of a delay-chain PRFB do have a common factor, the FB can be built from smaller PR delay-chains as follows:

Fact 2. Let all decimators in a PR delay-chain FB have common factor $K > 1$. Then the FB can be derived from a tree structure in which each unit is a PR delay-chain FB and the root is uniform with decimator $K$.

Proof: Let $x(n)$ be the FB input. For $0 \leq k < K$, let $f_k(n) = x(Kn - k)$, which is the $k$-th subband signal in a uniform $K$ channel delay-chain PRFB. Now consider the $i$-th channel of the given PR delay-chain, with decimator $n_i$, analysis filter $z^{-l_i}$, and hence, subband signal $x(n_i n - l_i)$. Since $n_i$ is a multiple of $K$, either all its samples lie in the sequence $f_k(n)$, or none of them do (depending on whether or not $l_i \equiv k \pmod{K}$). We now collect all subbands whose samples do lie (entirely) in $f_k(n)$. Due to the PR condition for delay-chains (Theorem 2), these subbands collectively contain all samples of $f_k(n)$ (as none of the other subbands have any of them), and each of these samples occurs in exactly one of these subbands.
Further the delays in all these subbands are equal (to $k$) modulo $K$. Thus these subbands can be generated by inserting a suitable delay-chain PRFB as a child (in a tree) in the $k$-th subband signal $f_k(n)$ of a uniform $K$ channel delay-chain PRFB. Repeating this process for $k = 0, 1, \ldots, K - 1$ yields the desired tree structure. □

Remarks:

1 The above result does not generalize easily to other classes of FBs (besides delay-chains). For example, consider the decimators {4,4,4,4}, having common factor $K = 2$. These decimators can be used to build rational and FIR PRFBs that are not derivable from any tree structure (besides the trivial one).

2 A common factor $K > 1$ among all decimators does not by itself ensure their derivability from a tree whose root is uniform with $K$ as decimator.^4 However, if the decimators also allow building of a delay-chain PRFB, then by Fact 2, there is at least one such tree, as the FB itself is derivable from such a tree.

3 All decimators of a PR delay-chain FB need not have a common factor $K > 1$ (see the example in Theorem 3). However, further conditions on the decimators can force such a common factor to exist, thus making Fact 2 apply. For example, suppose the PR delay-chain has a decimator of value $m$ occurring $m - 1$ times ($m$ is thus the smallest decimator). Then all decimators must have $m$ as a factor. This is provable by a slight extension of the proof of Fact 2. In fact it even generalizes to rational FBs in place of delay-chains (Theorem 5, Section 7), although this is harder to prove.

^3 This is true when size is measured by either the number of decimators, or their lcm, or the largest one. In fact there is no other example with 13 or fewer decimators. This is verifiable by exhaustive search aided by a computer and Fact 2.

^4 The set of decimators {4,6,6,10,10,10,10,60} shows this for $K = 2$. The choice of root prevents the leaves from obeying (1.1).

10.6 THE CLASS OF RATIONAL FBS
In this section and most of Section 7, the FB class C of interest is that of rational FBs, i.e., FBs in which all filters are rational. We seek necessary and sufficient conditions on a decimator-set $S$ for existence of a rational PRFB using $S$. The weakest known sufficient condition is that of existence of a PR delay-chain (Section 5). This is clearly sufficient since delay-chains are rational FBs, but is it also necessary? Or is there a decimator-set which does not permit existence of PR delay-chains, but allows building of rational PRFBs (whose filters are not all delays)? This is a major open question in the PR theory of nonuniform FBs.

A possible approach to answer the above question is to try to build a rational PRFB with decimators that do not allow building of PR delay-chains. However, starting with an arbitrary decimator-set such as $S = \{2, 3, 6\}$ does not help, as $S$ violates a known necessary condition (called 'compatibility', Section 6.1) on the decimators of a rational PRFB. Such sets must be excluded, and to this end it helps to derive more necessary conditions. This is our main contribution in this section. The previously known necessary conditions for PR are described in Section 6.1. Each subsequent subsection develops a new necessary condition that is strictly stronger than a previously known one. Table 1 (Section 8) presents a comprehensive summary of all known conditions, many of which are new results of the present work. The table studies the interrelationship between the conditions, and lists example decimator-sets illustrating their use.

All the new necessary conditions we develop still collectively remain insufficient for existence of delay-chain PRFBs, and thus it is still not known whether they are sufficient for existence of rational PRFBs. Our work reduces the 'gap' between the necessary conditions and the sufficient one. Proving that the sufficient condition is in fact necessary would in some sense render obsolete most of the present section. However this appears tough to do; in fact the statement may not even be true. Our work is a step towards the truth.

10.6.1 Previously known necessary conditions on decimators
1. Pairwise noncoprimeness. No two decimators of a rational PRFB can be coprime [4]. If $\gcd(n_i, n_j) = 1$ for two decimators $n_i, n_j$ in Fig. 1, the biorthogonality condition (2.3) for PR implies $H_i F_j = 0$ and $(H_i F_i)\downarrow_{n_i} = (H_j F_j)\downarrow_{n_j} = 1$. This is impossible for a rational FB, as $H_i F_j = 0$ forces $H_i = 0$ or $F_j = 0$.

2. Compatibility. Every decimator occurring only once must divide some other decimator [1, 5, 4]. In particular, the largest decimator must occur at least twice. As Section 6.4 will show, without this condition the rational FB cannot even be a nonzero LTI system, let alone have PR.

3. Strong compatibility. This condition, developed in [4], places a lower bound $b_j \geq 1$ on the number of occurrences $N_j$ of each decimator $n_j$. The condition is stated as follows:

$$N_j \geq b_j = \frac{1}{p_j}\left(\min_{p_i \neq p_j} \mathrm{lcm}(p_i, p_j)\right), \quad \text{where } p_i = \frac{L}{n_i}, \qquad (6.1)$$

where $L = K\,\mathrm{lcm}\{n_i\}$ for any integer $K > 0$. This will be shown in Section 6.4, which in fact proves a new condition strictly stronger than the above.
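The three conditions above can be checked mechanically. The following is a minimal Python sketch (Python 3.9+ for `math.lcm`), not code from the original text; the function names are illustrative assumptions, and $L$ is taken to be $\mathrm{lcm}\{n_i\}$, which is permitted because the bound (6.1) does not depend on the multiple $K$.

```python
from collections import Counter
from itertools import combinations
from math import gcd, lcm  # math.lcm requires Python 3.9+


def pairwise_noncoprime(decimators):
    """Condition 1: no two decimators are coprime."""
    return all(gcd(a, b) > 1 for a, b in combinations(decimators, 2))


def compatible(decimators):
    """Condition 2: a decimator occurring once must divide a distinct decimator."""
    counts = Counter(decimators)
    return all(
        n > 1 or any(v != d and v % d == 0 for v in counts)
        for d, n in counts.items()
    )


def strong_compat_bound(decimators, d):
    """Lower bound b_j of (6.1) for decimator value d, using L = lcm of all values."""
    values = set(decimators)
    L = lcm(*values)
    p_d = L // d
    candidates = [lcm(L // v, p_d) for v in values if v != d]
    if not candidates:          # uniform set: b_j is defined as 1
        return 1
    return min(candidates) // p_d   # lcm(p_i, p_d) is always a multiple of p_d


def strongly_compatible(decimators):
    """Condition 3: N_j >= b_j for every distinct decimator value."""
    counts = Counter(decimators)
    return all(n >= strong_compat_bound(decimators, d) for d, n in counts.items())


if __name__ == "__main__":
    for s in ([2, 4, 6, 24, 24], [2, 5, 10, 10, 10], [2, 4, 6, 12]):
        print(s, pairwise_noncoprime(s), compatible(s), strongly_compatible(s))
```

Run on the example sets quoted in this subsection, the sketch reproduces the stated relationships: {2,4,6,24,24} is compatible but not strong compatible, {2,5,10,10,10} is strong compatible but contains a coprime pair, and {2,4,6,12} is pairwise noncoprime but not compatible.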
Note that the bound $b_j$ of (6.1) is independent of the integer $K$. Also, it only needs verification for distinct decimator values, because if $n_i = n_j$ then $N_i = N_j$, $b_i = b_j$. For a uniform set of decimators, $p_i = p_j$ for all $i, j$, so we define $b_j = 1$ here (so that the bound holds). Excluding this case, $b_j = 1$ iff $p_j$ is a multiple of some $p_i \neq p_j$, i.e., iff $n_j$ divides some distinct decimator $n_i$. So the bound need not be checked for such decimators.

Also, strong compatibility implies compatibility, because it demands that any $n_j$ occurring only once (i.e., with $N_j = 1$) must have $b_j = 1$, i.e., must divide some other decimator. In fact strong compatibility is a strictly stronger necessary condition than compatibility, as shown by the set of decimators {2,4,6,24,24}. However it does not imply pairwise noncoprimeness [1] (shown by {2,5,10,10,10}). Likewise, a set could satisfy pairwise noncoprimeness but violate compatibility (and hence strong compatibility), e.g., {2,4,6,12}.

10.6.2 The pairwise gcd test
Theorem 4: Pairwise gcd test. Among the decimators of a rational PRFB, there cannot be a subset of $g + 1$ decimators such that the gcd of any two elements from the subset is a factor of $g$. In particular (for $g = 1$), this implies the pairwise noncoprimeness condition (Section 6.1).

Proof: As with pairwise noncoprimeness, the proof uses the biorthogonality condition (2.3) for PR. Let $g + 1$ decimators $n_0, n_1, \ldots, n_g$ be such that the gcd of any pair divides $g$. From (2.3), $(H_i(z)F_j(z))\downarrow$ […] $\mathbf{R}(z)$.

10.6.3 Tree version of strong compatibility
In Section 4.3, we saw how, given a necessary condition $P$ on the decimators for PR, we could form its 'tree version' $P'$, which is a stronger (though not necessarily strictly stronger) necessary condition. We can apply this process to the conditions of Section 6.1. Some thought shows that both the pairwise noncoprimeness and the compatibility conditions are preserved by tree structures, and are hence identical to their tree versions (as seen in Section 4.3). However, the same is not true of strong compatibility: its tree version is strictly stronger than itself. This is shown by the two-unit tree in Fig. 13. Both units $R$ and $S$ are strong compatible, and $S$ allows building of rational PRFBs (as it is uniform). However the resulting set of decimators is not strong compatible. Hence, though $R$ obeys strong compatibility, it violates its tree version. A complete algorithm to test this new necessary condition is described in Appendix E. Its derivation involves characterizing trees similar to that in Fig. 13. This is done by the following results:
Figure 10.13. Showing that strong compatibility is not preserved by trees
Fact 3. Consider a set $T$ of decimators derived from a 2-unit tree structure having root $R$ and leaf $S$ attached to decimator $m_0$ of $R$. Suppose $R$, $S$ are strong compatible but $T$ is not. Then $S$ is a uniform unit, i.e., all its decimators have equal value $K$. The decimator $m_0$ of $R$ does not occur in $T$, i.e., it occurs only once in $R$. The decimator $m_0 K$ of $T$ obtained at the leaf $S$ also occurs in $R$. Decimators of this value $m_0 K$ are the only ones violating the strong compatibility lower bound on the number of their occurrences in $T$.

Fact 4. Let a set $D$ of decimators satisfy strong compatibility but violate its tree version. Then there is a tree $\mathcal{T}$ generating a set $T$ of decimators, such that $\mathcal{T}$ and $T$ have the following properties:

1. The tree $\mathcal{T}$ has root $D$. All leaves of the tree are uniform and are children of its root. All decimators obtained at the leaves have equal value $d$.
2. If $d_i$ are the decimators of $D$ to which leaves are attached in $\mathcal{T}$, then no decimator in $T$ has value $d_i$.
3. If $d \notin D$, then $d = \mathrm{lcm}\{d_i\}$. Hence, if $d \notin D$, the $d_i$ are not all equal (for otherwise, $d = d_i \in D$).
4. Decimator $d \in T$ violates the strong compatibility lower bound on the number of its occurrences in $T$.

Fact 3 is proved in Appendix C and used to prove Fact 4 in Appendix D. Fact 4 gives an algorithm to test whether the set $D$ obeys the tree version of the strong compatibility condition: we find all trees with root $D$ and properties 1, 2 and 3 listed in Fact 4. It can be seen that there are only finitely many such trees, and from Fact 4, $D$ violates the condition if and only if one of these trees also obeys property 4. This idea is the basis of the detailed algorithm of Appendix E.

10.6.4 The AC-matrix test
The necessary condition derived here relies heavily on the AC matrix formulation (2.6),(2.7) of the PR condition on the filters of the FB. The algorithm to test the condition is described in Appendix F, and may be taken as the statement of the condition (i.e., this condition, unlike the earlier ones, does not have a short, simple statement). Like the test of Section 6.3, this test also strictly strengthens strong compatibility, but in an independent direction. In this section we derive two lemmas that explain the operation of the test, illustrate the test with examples, and thus justify the algorithm of Appendix F. Deriving the new test also proves the necessity of strong compatibility for PR, a result assumed in deriving the test of Section 6.3. We further show that (simple) compatibility is necessary even if we allow the rational FB to violate PR but merely insist that it be an LTI system (i.e., an alias-free FB) that is not identically zero.
Two key results used by the test

Lemma 2. In Fig. 1, if $\sum_k H_{i_k}(z)F_{i_k}(z) = 0$ for any set of $i_k$, $0 \leq i_k < M$, then the FB cannot have PR.

Proof: If the FB of Fig. 1 has PR, $(H_{i_k}(z)F_{i_k}(z))\downarrow_{n_{i_k}} = 1$ by biorthogonality (2.3). Let $L = \mathrm{lcm}\{n_i\}$, thus $(H_{i_k}(z)F_{i_k}(z))\downarrow_L = (1)\downarrow_{(L/n_{i_k})} = 1$. So $\left(\sum_k H_{i_k}(z)F_{i_k}(z)\right)\downarrow_L \neq 0$, violating $\sum_k H_{i_k}(z)F_{i_k}(z) = 0$.

Lemma 3. Given rational filters $B_i(z), C_i(z)$, $0 \leq i < N$, let $W = e^{-j2\pi/L}$ and $G_l(z) = \sum_{i=0}^{N-1} B_i(zW^l)C_i(z)$. If $G_l(z) = 0$ for $N$ values of $l$ occurring consecutively in an arithmetic progression, then $G_l(z) = 0$ for all values of $l$ in this progression. (The lemma in fact holds for any nonzero complex $W$.)

Proof: If $N = 1$, the lemma must be taken to mean that if $B_0(zW^l)C_0(z) = 0$ for some $l$, then it holds for all $l$. This is clearly true: rational filters $B_0(z), C_0(z)$ obey $B_0(zW^l)C_0(z) = 0$ iff $B_0 = 0$ or $C_0 = 0$ or both. (Note however that this is in general false if we remove the rationality constraint.) Hence, let $N > 1$. Let the $N$ given consecutive values of $l$ in arithmetic progression be $s, s + d, s + 2d, \ldots, s + (N-1)d$. The lemma can then be restated by defining

$$\mathbf{b}(z) = [B_0(zW^s), \ldots, B_{N-1}(zW^s)], \quad \mathbf{c}(z) = [C_0(z), \ldots, C_{N-1}(z)]^T,$$
as follows: If $\mathbf{b}(zW^{nd})\mathbf{c}(z) = 0$ for $n = 0, 1, \ldots, N-1$ then it is true for all integers $n$. To show this, form the square matrix $\mathbf{B}(z)$ with rows $\mathbf{b}(zW^{nd})$, $0 \leq n < N$. By the premise of the lemma, $\mathbf{B}(z)\mathbf{c}(z)$ is the zero vector. This implies linear dependence of the columns of $\mathbf{B}(z)$, and hence of its rows, as it is square. So $\sum_{n=0}^{N-1}$ […]

[…] 1 at Step 1, as […] is the smallest decimator. At Step 2 in formation of the partition, if we sequentially select elements from the smallest upwards, the condition ensures that at some stage the reciprocals of the selected elements will sum to unity. Repeating this process results in a valid partition, and further each of its groups also satisfies the condition. Thus the proof is completed by induction on the number of decimators.

Derivability of a set of decimators from uniform-trees implies existence of various types of PRFBs (including PR delay-chains) using those decimators. Thus, any conditions necessary for such existence are also necessary for derivability from uniform-trees. Their necessity is often provable directly from the above algorithms. For example, without pairwise noncoprimeness (Section 6.1), $p = 1$ at Step 1 of the root-to-leaves algorithm. If compatibility (Section 6.1) is violated, i.e., if a decimator $d$ does not divide any other decimator, then eventually $m = d$ and […] $= 1$ at Step 2 of the leaf-to-root algorithm, i.e., there are no sets $S_k$. As tests for such necessary conditions are inconclusive whenever they are satisfied, they cannot replace the earlier complete algorithms, though they can potentially increase their efficiency.
10.10.3 Appendix C: Proof of Fact 3

Let $R = \{m_0, \ldots, m_{M-1}\}$, $S = \{k_0, \ldots, k_{K-1}\}$. So $T = \{n_0, \ldots, n_{K+M-2}\}$ with $n_i = m_0 k_i$ for $i = 0, 1, \ldots, K-1$ and $n_{K-1+i} = m_i$ for $i = 1, 2, \ldots, M-1$. Let $L = \mathrm{lcm}\{n_i\}$, $p_i = L/n_i$. Let $n_i$ occur $N_i$ times in $T$. Let $b_i$ be the strong compatibility lower bound on $N_i$. The proof is in two parts.

Part 1: Uniformity of S. Suppose $S$ is not a uniform unit; we will then show that $b_j \leq N_j$ for all $j$, i.e., $T$ is strong compatible. Indeed for $j = 0, 1, \ldots, K-1$ we have from (6.1),

$$p_j b_j = \min_{p_i \neq p_j} \mathrm{lcm}(p_i, p_j) \leq \min \text{[…]}$$

For $j \geq K$, if $n_j = m_0$ then $b_j = 1 \leq N_j$, because $n_j$ divides a distinct decimator $n_0 = n_j k_0$. If $n_j \neq m_0$, then $N_j \geq N_j^R$, the number of occurrences of $n_j = m_{j-(K-1)}$ in $R$. Let $b_j^R$ be the strong compatibility lower bound on $N_j^R$. Thus $b_j^R \leq N_j^R \leq N_j$, and with

$$A = \min_{i > 0,\ p_{K-1+i} \neq p_j} \mathrm{lcm}(p_{K-1+i}, p_j) \quad \text{and} \quad B = \mathrm{lcm}(L/m_0, p_j),$$

we have

$$p_j b_j^R = \min(A, B), \qquad (10.8)$$

while

$$p_j b_j = \min_{p_i \neq p_j} \mathrm{lcm}(p_i, p_j). \qquad (10.9)$$

Thus if $A \leq B$ in (10.8) (e.g., this holds if $m_i = m_0$ for some $i > 0$), then clearly $b_j \leq b_j^R \leq N_j$. Otherwise,

$$p_j b_j \leq \min_{i < K,\ p_i \neq p_j} \mathrm{lcm}(p_i, p_j) \leq B = p_j b_j^R,$$

as $p_i = L/(m_0 k_i)$ for $i < K$, and nonuniformity of $S$ again ensures that $\mathrm{lcm}(p_i, p_j)$ is not being minimized over an empty set. (Nonuniformity of $S$ is not needed here if $L/(m_0 K) \neq p_j$.) So again $b_j \leq b_j^R \leq N_j$. Thus, $b_j \leq N_j$ for all $j$, i.e., $T$ is strong compatible, contradicting the premise of Fact 3. Hence $S$ must be a uniform unit, i.e., $k_0 = k_1 = \ldots = k_{K-1} = K$.

Part 2: Necessary conditions for $b_j > N_j$. We have already shown in Part 1 that if $j \geq K$, then $b_j > N_j$ is possible only if $m_0$ occurs only once in $R$ and $m_0 K = n_j$. The proof of Fact 3 will be completed if we show a similar statement for $j < K$, i.e., that $b_j > N_j$ is possible only if $m_0$ occurs only once in $R$ and $m_0 K = n_i$ for some $i \geq K$. To show this, note that for all $j < K$, all the $n_j$ are identical (shown by Part 1), and hence the same holds for the $N_j$ and the $b_j$. Also $N_j \geq K$. Thus it suffices to show that $b_0 \leq K$ if either $m_0 = m_l = n_{K-1+l}$ for some $l > 0$, or $m_0 K \neq m_i$ for all $i > 0$. If $m_0 = m_l = n_{K-1+l}$ for some $l > 0$, then

$$p_0 b_0 = \min_{p_i \neq p_0} \mathrm{lcm}(p_i, p_0) \leq \mathrm{lcm}(p_{K-1+l}, p_0) = \mathrm{lcm}\left(\frac{L}{m_0}, \frac{L}{m_0 K}\right) = \frac{L}{m_0} = p_0 K,$$

hence $b_0 \leq K$. If on the other hand $m_0$ occurs exactly once in $R$, then $m_0 F = m_l = n_{K-1+l}$ for some $F > 1$, $l > 0$, since $R$ is compatible. Thus if $m_0 K \neq m_i$ for all $i > 0$, then

$$p_0 b_0 = \min_{p_i \neq p_0} \mathrm{lcm}(p_i, p_0) \leq \mathrm{lcm}(p_{K-1+l}, p_0) = \mathrm{lcm}\left(\frac{L}{m_0 F}, \frac{L}{m_0 K}\right) \leq \frac{L}{m_0} = p_0 K,$$

hence $b_0 \leq K$ again. This establishes the claim, hence proving Fact 3.
10.10.4 Appendix D: Proof of Fact 4

From the premise of Fact 4, there is a tree $\mathcal{T}'$ in which each unit is either $D$ or allows building of rational FBs (e.g., uniform units), such that $\mathcal{T}'$ generates a set of decimators that is not strong compatible. Note that every unit in $\mathcal{T}'$ is strong compatible. We now perform a series of operations on $\mathcal{T}'$, each yielding a new tree with all the properties of $\mathcal{T}'$, until finally we get the tree $\mathcal{T}$ with the desired properties as in Fact 4. If the root of $\mathcal{T}'$ has a child that is not a leaf, then this child, along with all its descendants, forms a tree with fewer units than $\mathcal{T}'$. We can assume that this tree generates a strong compatible decimator-set (else we can replace $\mathcal{T}'$ by this tree and repeat the process). We then view this tree as a single unit. This makes every child of the root of $\mathcal{T}'$ a strong compatible leaf. Next, we delete any leaf such that the residual tree generates a decimator-set that is not strong compatible. This yields the desired tree $\mathcal{T}$ having all properties of $\mathcal{T}'$. We now show that $\mathcal{T}$ and the decimator-set $T$ it generates have all the properties listed in Fact 4.

Properties 1, 2, 4: For any leaf $S$ of $\mathcal{T}$, we see that $\mathcal{T}$ can be redrawn as a 2-unit tree with strong compatible units $R$ and $S$. However $\mathcal{T}$ itself generates the set $T$ that is not strong compatible. Thus we can use Fact 3 to conclude the following: (a) All leaves of $\mathcal{T}$ are uniform. (b) For any decimator value obtained at a leaf of $\mathcal{T}$, decimators of $T$ with that value are the only ones in $T$ that violate the strong compatibility lower bound on the number of their occurrences in $T$. (c) Property 2 of Fact 4 holds. Now (b) implies that all decimators obtained at the leaves have the same value $d$. Also, (a) implies that $\mathcal{T}$ has root $D$: otherwise the root allows building of rational PRFBs, and hence so does $\mathcal{T}$ (as all children of its root are uniform leaves), violating the fact that $T$ is not strong compatible. This completes the proof of property 1. Property 4 follows from this and conclusion (b) listed above. Thus we have shown properties 1, 2, 4 of Fact 4.

Property 3: Let $k_i$ be the decimator value of the leaf attached to $d_i \in D$ to form $\mathcal{T}$. As $d_i k_i = d$, we have $d = C\,\mathrm{lcm}\{d_i\}$ where $C = \gcd\{k_i\}$. We must show that if $d \notin D$, then $C = 1$. In fact, this may be false. Our approach is to assume that $d \notin D$, and then create a new tree $\mathcal{T}^*$ generating a decimator-set $T^*$ with all the properties of Fact 4. This is done by replacing every leaf decimator $k_i$ with $k_i/C$. (If $k_i = C$ this means deleting the leaf.) Clearly property 1 of Fact 4 continues to hold, with the decimators obtained at the leaves now having value $d^* = d/C = \mathrm{lcm}\{d_i\}$.

To prove property 2, let decimator $d_i$ of $D$ have a leaf attached to it in $\mathcal{T}^*$. Then it also has a leaf (uniform with decimator $k_i$) attached in $\mathcal{T}$. As $d_i \notin T$ (by property 2 for $T$), the only way to have $d_i \in T^*$ is that $d_i$ be the newly formed decimator $d/C$. This however means that $k_i = C$ (as $d = d_i k_i$), i.e., the leaf attached to $d_i$ in $\mathcal{T}$ has been deleted in $\mathcal{T}^*$, contradicting the assumption on $d_i$. Thus $d_i \notin T^*$, i.e., $T^*$ obeys property 2.

Next we prove property 3. As already seen, if $k_j = C$ for some $j$, then $d^* = d/C = d_j \in D$. Thus, if $d^* \notin D$, then $k_j > C$ for all $j$, i.e., decimators $d_i$ with leaves attached in $\mathcal{T}$ are the same as those with leaves attached in $\mathcal{T}^*$. So property 3 holds for $T^*$ from $d^* = d/C = \mathrm{lcm}\{d_i\}$.

Lastly, we show property 4, i.e., that $d^*$ violates the strong compatibility lower bound $b^*$ on the number $N^*$ of its occurrences in $T^*$. Let $N$ be the number of occurrences of $d$ in $T$, and let $b$ be the strong compatibility lower bound on $N$. Let $L$ be any common multiple of the decimators of $T$. We must show that $b^* > N^*$. Since $T$ obeys property 4, we have $b > N$. Also, by construction of $T^*$ and the hypothesis $d \notin D$, we have $N^* \geq N/C$. The inequality is strict only if $d/C \in T$, but this would imply (by definition (6.1) of $b$) that $b \leq \left(\frac{d}{L}\right)\mathrm{lcm}\left(\frac{L}{d}, \frac{LC}{d}\right) = C$. Since $N \geq k_i \geq C$, we get $b \leq N$, a contradiction. Thus $d/C \notin T$, and hence $N^* = N/C$. Lastly, $b^* = \left(\frac{d^*}{L}\right)\mathrm{lcm}\left(\frac{L}{d^*}, \frac{L}{m}\right)$ for some $m \in T^*$, $m \neq d/C$. Thus $m \in D$ and $m \in T$ too, and $m \neq d$ by the hypothesis $d \notin D$. So $b \leq \left(\frac{d}{L}\right)\mathrm{lcm}\left(\frac{L}{d}, \frac{L}{m}\right) \leq C b^*$, i.e., $b^* \geq b/C > N/C = N^*$ (using $b > N$). Thus $b^* > N^*$ as required.
10.10.5 Appendix E: Testing Tree Version of Strong Compatibility

Given a decimator-set $D$, let $V = \{v_0, v_1, \ldots, v_{K-1}\}$ be the set of distinct decimator values in $D$, with $v_i$ occurring $N_i$ times in $D$. Let $L$ be any multiple of all the $v_i$, i.e., of $\mathrm{lcm}\{v_i\}$, and let $p_i = L/v_i$. Then $D$ satisfies the tree version of strong compatibility if and only if Routine 1 below returns the value 'TRUE' for all $v_i \in V$ and Routine 2 returns the value 'TRUE'.

Routine 1: (To be performed for all $v_i \in V$)

1. Initialization: Set $M = N_i$, $A = V$ and delete $v_i$ from $A$.
2. If $A$ is empty, return(TRUE). Else, let $j = l$ minimize $\mathrm{lcm}(p_i, p_j)$ over all $j$ such that $v_j \in A$. If $M < \mathrm{lcm}(p_i, p_l)/p_i$, return(FALSE).
3. If $v_l$ does not divide $v_i$, return(TRUE). Else, add $N_l(v_i/v_l)$ to $M$ and delete $v_l$ from $A$. This represents attaching to every decimator of value $v_l$ a leaf that is uniform with decimator $v_i/v_l$. Then go to Step 2.

Routine 2:

1. Find all subsets $S$ of $V$ having at least two but less than $K - 1$ elements, such that the lcm $l(S)$ of all elements of $S$ does not divide any $v_j \in V$.
2. For each $S$ of Step 1, let $a(S)$ be the sum of all the numbers $N_i(l(S)/v_i)$ for all $v_i \in S$. Let $b(S)$ be the minimum of $\left(\frac{l(S)}{L}\right)\mathrm{lcm}\left(\frac{L}{l(S)}, \frac{L}{v_i}\right)$ over all $v_i \notin S$. This step represents attaching to every decimator whose value $v_i$ lies in $S$ a leaf unit that is uniform with decimator $l(S)/v_i$, so that all decimators thus obtained at the leaves have value $l(S)$. In the resulting tree-structured set of decimators, $a(S)$ is the number of occurrences of decimator $l(S)$ and $b(S)$ is the strong compatibility lower bound on $a(S)$.
3. If $a(S) \geq b(S)$ for all $S$ above, return(TRUE). Else return(FALSE).

The action of the routines is independent of which multiple of $\mathrm{lcm}\{v_i\}$ we choose $L$ to be. To explain how the above test works, refer to the statement of Fact 4. Routine 2 lists all trees $\mathcal{T}$ obeying properties 1, 2, 3 of Fact 4 such that $d \notin D$ (see property 3), and returns a 'FALSE' value if any of these obey property 4. The set $S$ of Step 1 represents the choice of the $d_i$ of property 2.
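The two routines above can be sketched in Python as follows (Python 3.9+ for `math.lcm`). This is an illustrative reading of the steps as stated, not code from the original text: the function names are assumptions, $L$ is taken as $\mathrm{lcm}\{v_i\}$, and ties in the minimization of Step 2 of Routine 1 are broken arbitrarily, since the text does not specify a rule.

```python
from collections import Counter
from itertools import combinations
from math import lcm  # requires Python 3.9+


def routine1(values, counts, L, vi):
    """Routine 1 for one distinct decimator value vi."""
    p = {v: L // v for v in values}
    M = counts[vi]                       # Step 1: M = N_i, A = V \ {v_i}
    A = set(values) - {vi}
    while A:                             # Step 2: empty A means TRUE
        vl = min(sorted(A), key=lambda v: lcm(p[vi], p[v]))
        if M < lcm(p[vi], p[vl]) // p[vi]:
            return False
        if vi % vl != 0:                 # Step 3: v_l does not divide v_i
            return True
        M += counts[vl] * (vi // vl)     # attach uniform leaves to every v_l
        A.remove(vl)
    return True


def routine2(values, counts, L):
    """Routine 2: subsets S of V with 2 <= |S| < K - 1."""
    K = len(values)
    for size in range(2, K - 1):
        for S in combinations(values, size):
            l_S = lcm(*S)
            if any(v % l_S == 0 for v in values):
                continue                 # l(S) divides some v_j: skip S
            a = sum(counts[v] * (l_S // v) for v in S)
            p_S = L // l_S
            b = min(lcm(p_S, L // v) for v in values if v not in S) // p_S
            if a < b:
                return False
    return True


def tree_strongly_compatible(decimators):
    """Tree version of strong compatibility, as tested by Routines 1 and 2."""
    counts = Counter(decimators)
    values = sorted(counts)
    L = lcm(*values)
    return (all(routine1(values, counts, L, v) for v in values)
            and routine2(values, counts, L))


if __name__ == "__main__":
    print(tree_strongly_compatible([2, 4, 4]))
    print(tree_strongly_compatible([2, 3, 24, 24, 36, 36, 36]))
```

As a hypothetical illustration not taken from the text, the set {2,5,10,10} is strong compatible, yet Routine 1 rejects it for $v_i = 10$: attaching a two-channel uniform leaf to the decimator 5 yields four decimators of value 10, while the remaining decimator 2 raises the lower bound to 5.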
We demand that $S$ must have at least two elements, and that $l(S) \neq v_j$ for all $v_j \in V$, to ensure that property 3 holds with $d \notin D$. In fact we further demand that $l(S)$ must not divide any $v_j \in V$, for if it does, $b(S) = 1$ at Step 2. We also exclude sets $S$ with $\geq K - 1$ elements, for then $\mathcal{T}$ generates a set with at most two distinct decimators. Such a set, being derivable from a uniform-tree (Appendix B), is always strong compatible, i.e., $a(S) \geq b(S)$ will hold at Step 3.

Routine 1 becomes a test for strong compatibility if we delete Step 3 in it. Hence we can assume strong compatibility of the given set of decimators. Thus the only task remaining is to examine whether there is a tree $\mathcal{T}$ obeying all properties of Fact 4 with $d \in D$ in property 3. This is achieved by the addition of Step 3. To see this, let there be such a tree $\mathcal{T}$, with $d = v_i$, producing a set $T$ of decimators. The quantity $b = \mathrm{lcm}(p_i, p_l)/p_i$ of Step 2 is the lower bound on $N_i$, which holds by assumption of strong compatibility. Now the number $N_T$ of occurrences of $v_i$ in $T$ is at least $N_i$. Further, if $v_l \in T$, then the strong compatibility lower bound on $N_T$ does not exceed $b$, and hence cannot be violated. Thus $v_l \notin T$, i.e., all decimators of value $v_l$ must have leaves attached to them to convert them into decimators of value $v_i$. This justifies Step 3.

In the special case when $L = \mathrm{lcm}\{v_j\} \in V$, Routine 2 can be skipped (it always returns 'TRUE'), and Routine 1 needs execution only for $v_j = L$ (it returns 'TRUE' for all other $v_j$). This is provable from the fact that for $v_j = L$, $p_j = 1$. In general, Routine 1 appears to be the important part of the test: there are relatively few decimator-sets for which violation of the test is detected by Routine 2 but not by Routine 1 (examples of such sets being {2,3,24,24,36,36,36} and {2,4,6,48,48,72,72,72}).

10.10.6 Appendix F: Algorithm for the AC Matrix Test
In the given set of decimators, let $v_0, v_1, \ldots, v_{K-1}$ be the distinct decimator values, with $v_j$ occurring $N_j$ times. Let $L$ be any common multiple of the $v_j$, and let $p_j = L/v_j$. The algorithm is then as follows:

1. Initialization. Create a matrix $U$ with rows numbered 0 to $L - 1$ and columns 0 to $K - 1$, where the $lj$-th entry $u_{lj}$ is 1 if $l$ is a multiple of $p_j$, and zero otherwise. Thus $U$ is initialized to describe the positions of the zero and nonzero entries in the AC matrix (2.6),(2.7). In particular, $u_{0j} = 1$ for all $j$.
2. Set $U' = U$ (saving the current value of $U$ in $U'$). For all $l, j$ such that $u_{lj}$ is the only entry in the $l$-th row having value unity, set $u_{lj} = 2$. This identifies sets of filters having the same decimator value $v_j$, and satisfying an equation of the form $\sum_i B_i(zW^l)C_i(z) = 0$.
3. For each $d = kp_j$ for integer $k$ obeying $1 \leq kp_j \leq \lfloor L/2 \rfloor$, let $c_s(n) = s + nd$ for $s = 0, p_j, 2p_j, \ldots, d - p_j$. If $u_{lj} = 2$ for $l = c_s(n) \pmod{L}$ for $N_j$ consecutive integers $n$, set $u_{lj} = 2$ for $l = c_s(n) \pmod{L}$ for all integers $n$. Do this for each $j = 0, 1, \ldots, K - 1$. (This represents use of Lemma 3.)
4. If $u_{0j} = 2$ for any $j$, the given set of decimators fails the AC matrix test. (This is where we apply Lemma 2.) If $U' = U$, the set passes the test. If neither of these happens, go to Step 2.

Passing the above test is a necessary condition on the decimators of any rational PRFB, as the discussion of Section 6.4 proves. The test outcome is independent of which common multiple of the $v_j$ we choose $L$ to be. The above algorithm may be made more efficient in many ways (e.g., we can declare the test as passed if $U' = U$ after Step 2); our main purpose here is to state a correct (rather than highly efficient) algorithm.

Lastly, we prove that the above test implies strong compatibility. Consider any fixed $j \in \{0, 1, \ldots, K - 1\}$, and find the smallest $l > 0$ such that $u_{lj}$ is not set to value 2 at Step 2. This is the smallest nonzero multiple of $p_j$ that is also a multiple of some $p_i \neq p_j$, i.e., it is $\min_{p_i \neq p_j} \mathrm{lcm}(p_i, p_j) = p_j b_j$ where $b_j$ is as in (6.1). Thus, after Step 2, $u_{lj} = 2$ for $l = kp_j$ for $k = 1, 2, \ldots, b_j - 1$. So if $N_j < b_j$, Step 3 will use the sequence $c_0(n)$ with $d = p_j$ to set $u_{lj} = 2$ for all $l = np_j$. In particular it sets $u_{0j} = 2$, which means that the test is failed (see Step 4). Hence if the test is passed, we have $N_j \geq b_j$ for all $j$, which is the strong compatibility condition (6.1).

10.10.7 Appendix G: Proofs of Theorems 6,7
Proof of Theorem 6: We will prove the claim of the theorem after replacing its premises (7.2)-(7.5) about the decimator-set $D$ by the following premise: The set $D$ has two nonempty disjoint subsets $S$, $T$ such that

$$\sum_{n_i \in S} \frac{1}{n_i} = \frac{1}{N} \text{ for some integer } N, \qquad (10.10)$$

$$|T| = N - 1, \text{ and} \qquad (10.11)$$

$$\gcd(n_i, n_j) = \text{factor of } N \text{ whenever } n_i \in S \cup T,\ n_j \in T,\ i \neq j. \qquad (10.12)$$
This suffices because from a rational PRFB obeying (7.2)-(7.5), we can create one obeying (10.10)-(10.12) by inserting in each of its channels with decimator $n_i \in T_1$ a uniform rational PRFB with decimator $N/n_i$. This process preserves the channels corresponding to the decimator subset $S$, and creates $\sum_{n_i \in T_1}(N/n_i)$ new decimators each of value $N$. The set $T$ consists of $T_2$ and these new decimators; thus (10.11) follows from (7.5), and (10.12) from (7.4) and the fact that the new decimators have value $N$. Having proved the claim using (10.10)-(10.12), we remove the inserted uniform leaf FBs to prove it under the original premise (7.2)-(7.5).

Part 1: Proof under additional assumption that all $n_i \in S$ are multiples of $N$. Let us be given a rational PRFB with decimator-set $D$ and filters as in Fig. 1, such that $D$ has disjoint subsets $S$, $T$ obeying (10.10)-(10.12). Let $\mathbf{E}(z), \mathbf{R}(z)$ respectively be the $N$-th order analysis and synthesis polyphase matrices of the analysis and synthesis filters corresponding to channels with decimators $n_j \in T$. Let $\mathbf{e}_i(z)$ be the $N$-th order analysis polyphase vector of $H_i(z)$ where $n_i \in S$. From (10.11), $\mathbf{E}(z), \mathbf{R}(z)$ have sizes $(N-1) \times N$ and $N \times (N-1)$ respectively. We use (10.12) with the PR condition (2.3) and the polyphase lemma, as in Section 6.2. This shows that $\mathbf{e}_i(z)\mathbf{R}(z) = \mathbf{0}$, and that $\mathbf{E}(z)\mathbf{R}(z)$ is a $(N-1) \times (N-1)$ diagonal matrix, none of whose diagonal entries is identically zero. This implies (using rationality of the filters) that $\mathbf{R}(z)$ has $N - 1$ linearly independent columns. All the $\mathbf{e}_i(z)$, being 'orthogonal' to all these columns, must be 'proportional', i.e., $\mathbf{e}_i(z) = H'_i(z)\mathbf{a}(z)$ for some rational filters $H'_i(z)$ and vector $\mathbf{a}(z)$. Let $A(z)$ be the filter with $\mathbf{a}(z)$ as its $N$-th order analysis polyphase vector. Computing $H_i(z)$ from $\mathbf{e}_i(z)$ shows that $H_i(z) = A(z)H'_i(z^N)$. A similar argument shows that for all $i$ such that $n_i \in S$, $F_i(z) = B(z)F'_i(z^N)$ for some rational $B(z)$, $F'_i(z)$.
Thus, under the additional assumption that all decimators in $S$ are multiples of $N$, we see that the given rational PRFB is derivable from a two-unit tree of rational FBs. The units of the tree have decimator-sets exactly as desired, and using Theorem 8, their filters can further be modified so that they also have PR. This completes Part 1 of the proof.

Part 2: Extending Part 1 to nonrational FBs in the setting of Theorem 5. When the original premises (7.2)-(7.5) of Theorem 6 are obeyed in the special manner that results in the premise of Theorem 5, the effect on (10.10)-(10.12) is to cause $D = S \cup T$ and $n_j = N$ for all $n_j \in T$. Now in Part 1, the diagonal elements of $\mathbf{E}(z)\mathbf{R}(z)$ are $(H_j(z)F_j(z))\downarrow_N$ where $n_j \in T$ (by the polyphase lemma). Thus, in the above special case, by (2.3), in fact $\mathbf{E}(z)\mathbf{R}(z)$ is the identity. Hence we can choose the $A(z), B(z)$ of Part 1 to have $N$-th order analysis and synthesis polyphase vectors $\mathbf{a}(z), \mathbf{b}(z)$ respectively, such that the $N \times N$ matrices

$$\begin{bmatrix} \mathbf{E}(z) \\ \mathbf{a}(z) \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \mathbf{R}(z) & \mathbf{b}(z) \end{bmatrix}$$

have product equal to identity. This is possible even without any rationality restriction on the filters (of course $A, B$ are then nonrational in general). These matrices now become the polyphase matrices of the root FB. Thus, the root automatically has PR, and hence so does the leaf (since the overall FB has PR), without the need to use Theorem 8 (which requires filter rationality). Thus, for the special case of Theorem 5 (as distinct from the general setting of Theorem 6), we have extended Part 1 to nonrational FBs.

Part 3: Proving the additional premise used in Part 1, using filter rationality. For each $i$ such that $n_i \in S$ we insert a $q_i$-channel uniform rational PRFB within the $i$-th channel of the given PRFB, where $q_i = \mathrm{lcm}(N, n_i)/n_i$. This forms $q_i$ new decimators of value $n_i q_i$. Let $S'$ be the set of these decimators. Then, the newly formed tree-structured rational PRFB also has a decimator-set satisfying the premises (10.10)-(10.12), with $S$ replaced
by $S'$ and $T$ unchanged. Indeed, (10.10),(10.11) obviously hold, while (10.12) follows from the observation that if $\gcd(n_i, n_j)$ is a factor of $N$ and $q_i$ contains precisely the factors of $N$ that are not present in $n_i$ (i.e., $q_i = \mathrm{lcm}(N, n_i)/n_i$) then $\gcd(n_i q_i, n_j)$ is also a factor of $N$. Further, $S'$ also obeys the additional assumption that its elements are multiples of $N$, by the choice of the $q_i$. Let $q_i > 1$ and consider two analysis filters $C_l(z)$, $l = 0, 1$ of the $q_i$-band leaf FB inserted in the channel with decimator $n_i \in S$. The corresponding analysis filters of the new tree-structured FB are $H_i(z)C_l(z^{n_i})$. However, using Theorem 6 (which Part 1 has proved for the new FB), these filters have the form $A(z)D_l(z^N)$ for some rational $D_l(z)$, $A(z)$, where $A(z)$ is independent of $l$, $i$. Taking ratios of these filters (a crucial step that requires filter rationality) shows that

$$\frac{C_0(z^{n_i})}{C_1(z^{n_i})} = \frac{D_0(z^N)}{D_1(z^N)}, \qquad (10.13)$$
which implies that each equals $X_i(z^{\mathrm{lcm}(N, n_i)})$ for some rational $X_i(z)$. Replacing $z$ by $z^{1/n_i}$ and using the definition of $q_i$, we have $C_0(z)/C_1(z) = X_i(z^{q_i})$. This means that the $q_i$-th order analysis polyphase vectors $\mathbf{e}'_l(z)$ of $C_l(z)$, $l = 0, 1$, are linearly dependent, as $\mathbf{e}'_0(z) = \mathbf{e}'_1(z)X_i(z)$. Thus, the inserted $q_i$-band uniform leaf FB with the filters $C_l(z)$, while assumed to have PR, has an analysis polyphase matrix that is not invertible (since it contains the rows $\mathbf{e}'_l(z)$, $l = 0, 1$). This contradiction disproves the assumption that $q_i > 1$. Hence $q_i = 1$, or in other words, $n_i$ is a multiple of $N$.

Proof of Theorem 7: We first write the input-output relations, analogous to (2.2), of the systems of Fig. 15:

$$\hat{X}(z) = \frac{1}{KM}\sum_{l=0}^{KM-1} X(zW^l)G_l(z) \quad \text{for Fig. 15a} \qquad (10.14)$$

$$\hat{X}(z) = \frac{1}{M}\sum_{l=0}^{M-1} A(zW^{Kl})B(z)X(zW^{Kl}) \quad \text{for Fig. 15b} \qquad (10.15)$$

Here $G_l$ are as defined in statement (a) of Theorem 7, and (10.15) uses the PR property of the FB formed by the $H'_i, F'_i$. That (b) implies (a) in Theorem 7 follows directly by comparing (10.14) and (10.15), and holds even without any rationality requirements on the filters. We now prove that (a) implies (b) (for which the rationality is essential).

Form the $M$-th order AC matrix $\mathbf{H}(z)$ (of size $M \times K$) using analysis filters $H_i(z)$, i.e., let the $q$-th row of $\mathbf{H}(z)$ be $(H_0(zW^{Kq}), H_1(zW^{Kq}), \ldots, H_{K-1}(zW^{Kq}))$ for $q = 0, 1, \ldots, M - 1$. Let $\mathbf{f}(z) = (F_0(z), F_1(z), \ldots, F_{K-1}(z))^T$. Thus, the condition (a) is equivalent to $\mathbf{H}(zW^l)\mathbf{f}(z) = \mathbf{0}$ for $l = 1, 2, \ldots, K - 1$. Replacing $z$ by $zW^{-l}$, $\mathbf{H}(z)\mathbf{f}(zW^{-l}) = \mathbf{0}$. Now the $K - 1$ columns $\mathbf{f}(zW^{-l})$, $l = 1, 2, \ldots, K - 1$ are linearly independent. For otherwise, there are rational filters $\alpha_l(z)$ such that $\sum_{l=1}^{K-1}\alpha_l(z)\mathbf{f}(zW^{-l}) = \mathbf{0}$ for all $z$, where $1 \leq j < K$ and $\alpha_j(z) \neq 0$. Dividing this by $\alpha_j(z)$ and replacing $z$ with $zW^j$ shows that $\mathbf{H}(z)\mathbf{f}(z) = \mathbf{0}$ too. This would mean that $G_l(z) = 0$ for all integers $l$. This shows, by (10.14), that the system of Fig. 15a is identically zero, contradicting the premise of the theorem. Thus, the $K - 1$ columns $\mathbf{f}(zW^{-l})$, $l = 1, 2, \ldots, K - 1$ are linearly independent, and each row of $\mathbf{H}(z)$ is 'orthogonal' to all these columns (i.e., their product is identically zero). Hence all these rows must be 'proportional' to each other, i.e., $\mathbf{h}_i(z) = C_i(z)\mathbf{h}_0(z)$ for some scalar filter $C_i(z)$, where $\mathbf{h}_i(z)$ is the $i$-th row of $\mathbf{H}(z)$. This means that $H_i(zW^K)/H_0(zW^K) = H_i(z)/H_0(z) = D_i(z)$, i.e., $D_0(z) = 1$ and for $i = 1, 2, \ldots, K - 1$, $D_i(e^{j\omega}) = D_i(e^{j(\omega + 2\pi/M)})$, i.e., $D_i(e^{j\omega})$ is periodic with period $2\pi/M$. So
$D_i(e^{j\omega}) = P_i(e^{jM\omega})$, i.e., by rationality, $D_i(z) = P_i(z^M)$. Thus, $H_i(z) = A(z)H'_i(z^M)$ where $A(z) = H_0(z)$ and $H'_i(z) = P_i(z)$, showing that the analysis banks of Figs. 15a and 15b can be made equivalent. Next, replacing $z$ with $zW^{-l}$ in condition (a) of the theorem shows that the condition holds even if each $H_i$ is interchanged with $F_i$. Hence the same process can be repeated for the synthesis banks.

The above process may not ensure PR for the $K$-band FB formed by the $H'_i, F'_i$ (which we will refer to as the leaf FB). However, $G_l$ now takes the form $G_l(z) = A(zW^l)B(z)\sum_{i=0}^{K-1}H'_i(z^M W^{Ml})F'_i(z^M) = A(zW^l)B(z)G'_l(z^M)$, where $G'_l(z) = \sum_{i=0}^{K-1}H'_i(zW_K^l)F'_i(z)$ and $W_K = W^M = \exp(-j2\pi/K)$. Thus, condition (a) implies that $G'_l(z) = 0$ for $l = 1, 2, \ldots, K - 1$. (The alternative $A(zW^l)B(z) = 0$ is infeasible as it makes the systems identically zero.) Now the input-output relation of the leaf FB is $V(z) = \frac{1}{K}\sum_{l=0}^{K-1}Y(zW_K^l)G'_l(z)$ (analogous to (2.2)). Thus the leaf FB is LTI with (rational) transfer function $U(z) = G'_0(z)/K$. Hence, dividing all the $H'_i(z)$ by $U(z)$ and multiplying $A(z)$ by $U(z^M)$ gives a new system with all the properties desired in condition (b). This proves that (a) implies (b).

10.10.8 Appendix H: Proof of Theorem 8

It suffices to prove the result for 2-unit trees, as we can continue by induction. A general 2-unit tree is specifiable as follows: The triples of (analysis filter, synthesis filter, decimator) are $(H_i(z), F_i(z), m_i)$, $i = 0, 1, \ldots, M - 1$ for the root and $(A_i(z), B_i(z), k_i)$, $i = 0, 1, \ldots, K - 1$ for the leaf, which is attached to decimator $m_0$ of the root. Thus the filters allowing and requiring modification are $H_0, F_0$ and the leaf filters $A_i, B_i$. The overall FB is unaffected iff the modifications preserve all the products $H_0(z)A_i(z^{m_0})$ and $F_0(z)B_i(z^{m_0})$.

Realizing stability, FIR filters: Let all the $H_0(z)A_i(z^{m_0})$ be stable. Then for every unstable pole $z = p$ of $A_j(z)$, there are $m_0$ unstable poles in $A_j(z^{m_0})$, one at each $m_0$-th root of $p$.
To cancel these $m_0$ unstable poles, we must have $H_0(z) = H'_0(z)C(z^{m_0})$ where $H_0, H'_0$ have the same set of poles and $C(z) = 1 - z^{-1}p$, so that $C(z^{m_0})$ is FIR with $m_0$ zeroes at the right places. Hence, replacing $H_0$ by $H'_0$ and the $A_i$ by $A_iC$ removes the unstable pole of $A_j$ and preserves the analysis filters of the overall FB. Thus all $A_i$ can be made stable. Similarly, if $H_0$ has an unstable pole $p$, each $A_i(z^{m_0})$ must have a zero at $p$, and hence for each $i$, $A_i(z) = A'_i(z)(1 - p^{m_0}z^{-1})$ where $A_i, A'_i$ have the same set of poles. Thus, replacing $A_i$ by $A'_i$ and $H_0(z)$ by $H_0(z)(1 - p^{m_0}z^{-m_0})$ removes the unstable pole of $H_0$. Thus all filters can be made stable while preserving the overall FB. Similarly, if all the $H_0(z)A_i(z^{m_0})$ are FIR, the above argument can be repeated for all poles (rather than just the unstable ones), and all analysis filters can be made FIR.

Realizing PR, orthonormality: If the overall FB has PR, from (2.3) we get

$$\bigl(H_0(z)A_i(z^{m_0})F_0(z)B_j(z^{m_0})\bigr)_{\downarrow \gcd(m_0k_i,\, m_0k_j)} = \Bigl(\bigl(H_0(z)F_0(z)\bigr)_{\downarrow m_0} A_i(z)B_j(z)\Bigr)_{\downarrow \gcd(k_i,\, k_j)} = \delta(i-j). \qquad (10.16)$$
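The middle equality in (10.16) combines $\gcd(m_0k_i, m_0k_j) = m_0\gcd(k_i, k_j)$ with the identity $\bigl(X(z)Y(z^{m_0})\bigr)_{\downarrow m_0} = \bigl(X(z)\bigr)_{\downarrow m_0}Y(z)$. The sketch below checks both numerically. It is only an illustration under simplifying assumptions: the filters are short causal FIR sequences with random coefficients (the text allows general rational filters), and the decimator values and the helper names `decimate` and `expand` are hypothetical.

```python
import math
import numpy as np

def decimate(h, m):
    """Keep every m-th coefficient: impulse response of H(z) decimated by m."""
    return h[::m]

def expand(h, m):
    """Insert m-1 zeros between coefficients: impulse response of H(z^m)."""
    out = np.zeros((len(h) - 1) * m + 1)
    out[::m] = h
    return out

rng = np.random.default_rng(0)
m0, ki, kj = 3, 4, 6                 # hypothetical decimators (illustrative)
x = rng.standard_normal(10)          # stands in for H_0(z) F_0(z)
y = rng.standard_normal(5)           # stands in for A_i(z) B_j(z)

# gcd(m0*ki, m0*kj) = m0 * gcd(ki, kj), so decimating by the left-hand gcd
# equals decimation by m0 followed by decimation by gcd(ki, kj).
g = math.gcd(ki, kj)
assert math.gcd(m0 * ki, m0 * kj) == m0 * g

# Left side of (10.16): multiply X(z) by Y(z^{m0}), then decimate by m0*g.
lhs = decimate(np.convolve(x, expand(y, m0)), m0 * g)
# Middle expression: decimate X(z) by m0, multiply by Y(z), decimate by g.
rhs = decimate(np.convolve(decimate(x, m0), y), g)
identity_holds = bool(np.allclose(lhs, rhs))
print(identity_holds)
```

Polynomial multiplication is implemented as convolution of coefficient sequences, so the comparison is between the impulse responses of the two sides.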
With rational filters $X(z), Y(z)$ defined such that $XY = (H_0F_0)_{\downarrow m_0}$, let $A'_i = A_iX$, $B'_i = B_iY$ for all $i$. Thus from (10.16), $\bigl(A'_i(z)B'_j(z)\bigr)_{\downarrow \gcd(k_i, k_j)} = \delta(i-j)$, i.e., replacing each $A_i$ by $A'_i$ and $B_i$ by $B'_i$ causes the leaf FB to obey (2.3) and hence to have PR. The overall FB is preserved on replacing $H_0(z)$ by $H'_0(z) = H_0(z)/X(z^{m_0})$ and $F_0(z)$ by $F'_0(z) = F_0(z)/Y(z^{m_0})$. Since now both the leaf and the overall FB have PR, the root must have PR too. Thus the root and leaf have been modified as desired. Further, if the overall FB is orthonormal, then it has PR with $F_0(z)B_i(z^{m_0}) = \tilde{T}_i(z)$ where
$T_i(z) = H_0(z)A_i(z^{m_0})$ (and of course, $F_i = \tilde{H}_i$ for $i > 0$). Using $\widetilde{PQ} = \tilde{P}\tilde{Q}$, this means that (10.16) holds with $F_0, B_i$ replaced by $\tilde{H}_0, \tilde{A}_i$ respectively. So we repeat, with these substitutions, the earlier arguments used to make the root and leaf PR, and choose $X$ such that $Y = \tilde{X}$, i.e., such that $X\tilde{X} = (H_0\tilde{H}_0)_{\downarrow m_0} = W(z)$. (This is possible by spectral factorization, as $W(z)$ is rational and $W(e^{j\omega}) \geq 0$.) This ensures that the root and leaf are modified to be PR with $F'_0 = \tilde{H}'_0$ and $B'_i = \tilde{A}'_i$. In other words, for all FBs, PR is obeyed and the synthesis filter corresponding to a given analysis filter $D$ is $\tilde{D}$. Thus both the root and leaf have been modified to be orthonormal rational FBs.

ACKNOWLEDGEMENT

Work supported in part by the National Science Foundation Grant MIP 0703755 and ONR Grant N00014-99-1-1002.

REFERENCES

[1] S. Akkarakaran and P.P. Vaidyanathan, New results and open problems on nonuniform filter banks, in Proc. IEEE ICASSP, Phoenix, AZ, Mar. 1999.
[2] T. Chen and L. Qiu, General multirate building structures with application to nonuniform filter banks, IEEE Trans. Ckts. Syst.-II, 45 (1998), 948-958.
[3] S. Dasgupta and A. Pandharipande, On biorthogonal nonuniform filter banks, preprint.
[4] I. Djokovic and P.P. Vaidyanathan, Results on biorthogonal filter banks, Appl. Comp. Harmonic Anal., 1 (1994), 329-343.
[5] P.-Q. Hoang and P.P. Vaidyanathan, Non-uniform multirate filter banks: Theory and design, in Proc. IEEE ISCAS, Portland, Oregon, May 1989, pp. 371-374.
[6] J. Kovacevic and M. Vetterli, Perfect reconstruction filter banks with rational sampling factors, IEEE Trans. Sig. Proc., 41 (1993), 2047-2066.
[7] J. Li, T.Q. Nguyen, and S. Tantaratana, A simple design method for near-perfect-reconstruction nonuniform filter banks, IEEE Trans. Sig. Proc., 45 (1997), 2105-2109.
[8] K. Nayebi, T.P. Barnwell, III, and M. Smith, Nonuniform filter banks: A reconstruction and design theory, IEEE Trans. Sig. Proc., 41 (1993), 1114-1127.
[9] R.G. Shenoy, Multirate specifications via alias-component matrices, IEEE Trans. Ckts. Syst.-II, 45 (1998), 314-320.
[10] A.K. Soman and P.P. Vaidyanathan, On orthonormal wavelets and paraunitary filter banks, IEEE Trans. Sig. Proc., 41 (1993), 1170-1183.
[11] P.P. Vaidyanathan, Multirate Systems and Filter Banks, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[12] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Englewood Cliffs, NJ: Prentice-Hall, 1995.
INDEX

B-splines
    cardinal, 151
QR factorization, 236
O(n), 225, 230, 231, 236, 241, 246
SL^±(n,R), 225, 230, 235, 247
ℓ^p quasi-norm, 226
ℓ^p norm, 226, 236, 237, 247
GL(n,R), 230, 232, 239, 244, 247
GLa(n,R), 230
(VMR), 152
, 106, 109
energy potential, 134

AC
    alias cancellation, 263
    matrix test, 280
anisotropy scaling relation for curves, 81
antipodally-symmetrized, 40

basis dictionary, 225, 231
Bayesian reconstruction, 133
Beamlab, 36
Bessel family, 155
best basis, 225
best sparsifying basis (BSB), 227, 230, 246
biorthogonal bases, 66
biorthogonal wavelets, 109
biorthogonal windowed Fourier bases, 65
brushlets, 61

channel decimation rates, 257
coarse level extrapolation, 112
coding, 123
complex wavelets, 63
cortex transform, 90
covariance matrix, 234
CT
    computed tomography, 135
curvature, 133

decimation
    integer, 262
    maximal, 262
decimator
    tree structured, 270
delay-chains, 267
differential entropy, 227
dilation equation, 109
Discontinuity Separation Property, 118
discrete Shannon entropy, 232
dual
    approximate, 157

edge artifacts, 105
Edge Effects, 21
edge-detection, 96
efficient representations, 82
EM
    expectation-maximization, 136
energy prior, 133
ENO, 105
ENO-wavelet transform, 105
entropy, 230, 237-239, 242, 247

Fast Fourier Transform
    pseudopolar, 32
fast slant stack, 48
fast wavelet transform, 110
FB
    FIR, 267
    M-channel nonuniform, 258
    nonuniform filter bank, 257
    rational, 267
    uniform-tree, 270
FFT
    pseudopolar, 35, 48
filter, 109
filter bank
    directional (DFB), 88
    pyramidal directional (PDFB), 89
FIO tiling, 33
FIR, 266
folding, 67
Fourier
    polar approach, 47
Fourier domain
    pseudopolar, 32
Fourier transform
    pseudopolar, 19
frame, 1
    bounds, 90
    tight, 90, 97, 155
    tight directional wavelet of L^2(R^2), 96
frames
    sibling, 155
    spline-wavelet tight, 150
frequency domain tiling, 33

Gabor transforms, 32
Gaussian random noise, 128
GGMRF
    generalized Gaussian MRF, 137
Gibbs phenomena, 27
Gibbs' phenomenon, 105, 106
GMRF
    Gaussian Markov random field, 134
group
    affine, 213
    co-affine, 215

HalfDome, 17
high frequency, 106
Hilbert pair, 63
Householder reflection, 231
Householder reflector, 233, 241, 245, 247

ICD, 140
ICD method, 137
image compression, 123
independent component analysis (ICA), 228, 247
inter-orthogonal, 168

Karhunen-Loève basis, 229, 234
kinetic energy
    total, 134

Laplacian pyramid, 85
least square extrapolation, 112
least statistically-dependent basis (LSDB), 228, 231, 232, 241, 246
log-likelihood, 134
LPTV(L)
    linear periodically time varying with period L, 263
LTI
    linear time invariant, 261

MAP
    maximum a posteriori, 135
marginal distribution, 227, 247
maximally decimated, 258
mechanical image model, 138
minimization
    log-posterior, 133
MLE
    maximum likelihood estimate, 136
morphological principle, 135
MRI
    magnetic resonant imaging, 135
multiresolution, 105
mutual information, 227, 241, 245, 247

non-Gaussian, 230, 248
non-linear approximation (NLA), 82

ortho-ridgelets, 16

paraconjugate, 261
paraunitary, 266
periodized folding, 67
pixtron, 134
polynomial extrapolation, 112
polyphase, 262
positron emission tomography (PET), 134
potential energy, 134
PR
    perfect reconstruction, 257
PRFB
    delay-chain, 257
principle
    least action, 133
prior
    Gaussian Markov random field, 133
pseudo-Radon plane, 32
pyramidal directional filter banks, 83

QFB, 88
quantization, 123
quasi-FIO tiling, 44

Radon domain
    digital, 49
Radon Isometry, 40
    adjoint, 40
Radon transform, 135
relative entropy, 227
ridgelet
    orthonormal, 1
    transform, 1, 83
ridgelet domain, 19
Ridgelet Packet domain, 51
ridgelet packets, 43
ridgelet tiling, 33
ridgelets
    orthonormal, 31
Riesz bases, 66
Riesz bounds, 66

sampling
    bandpass, 268
scaling relation
    anisotropy, 83
signal denoising, 128
sparse representation, 81
sparsity, 226, 233, 245, 246
SPECT
    single-photon emission computed tomography, 134
spike process, 228, 237, 239, 246
spline multiresolution analysis, 154
standard basis, 230, 231, 233, 239, 241, 246, 248
statistical independence, 227, 233, 246
steerable pyramid, 90
system
    affine, 215
    co-affine, 215
    discrete affine, 217
    discrete co-affine, 217
    quasi-affine, 217

thresholding, 107, 126
tiling
    digital ridgelet, 16
    ridgelet, 16
tilings
    'FIO', 33
    'wavelet-like', 33
total variation, 107

uncertainty, 238
unfolding, 67
unitary extension principle (UEP), 151
unitary matrix extension criterion, 151

vanishing moment recovery functions (VMR), 149
vanishing moments, 109
    of order m, 152
variational principles, 107

Walsh basis, 231, 235, 244
wavelet, 105
    Meyer, 35, 45
    periodized Meyer, 35
wavelet coefficients, 106, 110
wavelet equation, 109
wavelet frame, 155
wavelet packets, 45
wavelet tiling, 33
wavelet transform, 106
    2-D discrete (DWT2), 99
wavelets
    discrete, 213
    Meyer, 17
    periodized, 17
Wilson-like basis, 45