ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 97
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/ Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in
Imaging and Electron Physics EDITED BY PETER W. HAWKES CEMES / Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
VOLUME 97
ACADEMIC PRESS San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 1996 by ACADEMIC PRESS. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc. 525 B Street, Suite 1900, San Diego, California 92101-4495. USA http://www.apnet.com
Academic Press Limited, 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/
International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014739-4
PRINTED IN THE UNITED STATES OF AMERICA
96 97 98 99 00 01 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS . . . ix
PREFACE . . . xi

Image Representation with Gabor Wavelets and Its Applications
RAFAEL NAVARRO, ANTONIO TABERNERO, AND GABRIEL CRISTÓBAL
I. Introduction . . . 2
II. Joint Space-Frequency Representations and Wavelets . . . 8
III. Gabor Schemes of Representation . . . 19
IV. Vision Modeling . . . 37
V. Image Coding, Enhancement, and Reconstruction . . . 50
VI. Image Analysis and Machine Vision . . . 61
VII. Conclusion . . . 75
References . . . 79

Models and Algorithms for Edge-Preserving Image Reconstruction
L. BEDINI, I. GERACE, E. SALERNO, AND A. TONAZZINI
I. Introduction . . . 86
II. Inverse Problem, Image Reconstruction, and Regularization . . . 94
III. Bayesian Approach . . . 98
IV. Image Models and Markov Random Fields . . . 104
V. Algorithms . . . 118
VI. Constraining an Implicit Line Process . . . 129
VII. Determining the Free Parameters . . . 141
VIII. Some Applications . . . 153
IX. Conclusions . . . 181
References . . . 184

Successive Approximation Wavelet Vector Quantization for Image and Video Coding
E. A. B. DA SILVA AND D. G. SAMPSON
I. Introduction . . . 191
II. Wavelets . . . 195
III. Successive Approximation Quantization . . . 205
IV. Successive Approximation Wavelet Lattice Vector Quantization . . . 221
V. Application to Image and Video Coding . . . 226
VI. Conclusions . . . 252
References . . . 253

Quantum Theory of the Optics of Charged Particles
R. JAGANNATHAN AND S. A. KHAN
I. Introduction . . . 257
II. Scalar Theory of Charged-Particle Wave Optics . . . 259
III. Spinor Theory of Charged-Particle Wave Optics . . . 322
IV. Concluding Remarks . . . 336
References . . . 356

Ultrahigh-Order Canonical Aberration Calculation and Integration Transformation in Rotationally Symmetric Magnetic and Electrostatic Lenses
JIYE XIMEN
I. Introduction . . . 360
II. Power-Series Expansions for Hamiltonian Functions and Eikonals in Magnetic Lenses . . . 361
III. Generalized Integration Transformation on Eikonals Independent of (r × p) in Magnetic Lenses . . . 369
IV. Canonical Aberrations up to the Ninth-Order Approximation in Magnetic Lenses . . . 381
V. Generalized Integration Transformation on Eikonals Associated with (r × p) in Magnetic Lenses . . . 389
VI. Eikonal Integration Transformation in Glaser's Bell-Shaped Magnetic Field . . . 393
VII. Generalized Integration Transformation on Eikonals in Electrostatic Lenses . . . 396
VIII. Conclusion . . . 403
References . . . 407

Erratum and Addendum for Physical Information and the Derivation of Electron Physics
B. ROY FRIEDEN . . . 409

INDEX . . . 413
CONTRIBUTORS Numbers in parentheses indicate the pages on which the authors’ contributions begin.
L. BEDINI (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
GABRIEL CRISTÓBAL (1), Daza de Valdés (CSIC), Instituto de Óptica, 28006 Madrid, Spain
EDUARDO A. B. DA SILVA (191), Depto de Electronica, Universidade Federal do Rio de Janeiro, Cep 21945-970 Rio de Janeiro, Brazil
B. ROY FRIEDEN (409), Optical Sciences Center, University of Arizona, Tucson, Arizona 85721
I. GERACE (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
R. JAGANNATHAN (257), Institute of Mathematical Sciences, CIT Campus, Taramani, Madras 600113, India
S. A. KHAN (257), Institute of Mathematical Sciences, CIT Campus, Taramani, Madras 600113, India
RAFAEL NAVARRO (1), Daza de Valdés (CSIC), Instituto de Óptica, 28006 Madrid, Spain
E. SALERNO (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
DEMETRIOS G. SAMPSON (191), Zographou, Athens 15772, Greece
ANTONIO TABERNERO (1), Facultad de Informática, Universidad Politécnica de Madrid, 28660 Madrid, Spain
ANNA TONAZZINI (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
JIYE XIMEN (359), Department of Radio Electronics, Peking University, Beijing 100871, People's Republic of China
PREFACE
This volume contains three contributions from image science and two from electron optics. It concludes with an erratum and addendum to the chapter by Frieden that appeared in volume 90 (1995). Although it is not usual to publish errata in this serial, for the simple reason that readers are not likely to be aware of subsequent corrections, I have made an exception here because of the importance and wide-ranging nature of the work reported by Frieden. I am convinced that his ideas will be recognized by our successors as a major advance in theoretical physics and it therefore seemed reasonable to ensure that they are expressed correctly here. Two chapters examine different aspects of wavelets. R. Navarro, A. Tabernero, and G. Cristóbal describe image representation using Gabor wavelets, with sections on vision modeling, on coding, enhancement, and reconstruction, and on image analysis and machine vision. E. A. B. da Silva and D. G. Sampson discuss successive approximation wavelet vector quantization for image and video coding, a most interesting use of wavelets of great practical importance. The chapter on image science, by L. Bedini, I. Gerace, E. Salerno, and A. Tonazzini, deals with a very common problem in image processing: How can images be restored without suppressing small features of interest, notably edges? This question raises deep and difficult questions of regularization, which we meet in most ill-posed problems. The authors analyze these and discuss in detail some ways of solving them. The chapter by R. Jagannathan and S. A. Khan is really a complete monograph on a little-studied question, namely the development of electron optics when the spin of the electron is not neglected. Generally, electron optics is developed from the everyday Schrödinger equation, as though the electron had no spin; although this is certainly justified in virtually all practical situations, it is intellectually frustrating that this approximation does not emerge as a special case of a more general theory based on the Dirac equation. This study goes a long way toward remedying this situation and I am delighted to include it here. We conclude with a shorter chapter by J.-Y. Ximen, whose work has already appeared as a supplement to this serial. This is concerned with higher order aberrations of electron lenses.
I am most grateful to all these authors for the work and time they have devoted to their contributions and I conclude as usual with a list of forthcoming contributions. Peter W. Hawkes
FORTHCOMING CONTRIBUTIONS

Nanofabrication (H. Ahmed and W. Chen)
Finite-element methods for eddy-current problems (R. Albanese and G. Rubinacci)
Use of the hypermatrix (D. Antzoulatos)
Image processing with signal dependent noise (H. H. Arsenault)
The Wigner distribution (M. J. Bastiaans)
Hexagon-based image processing (S. B. M. Bell)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Modern map methods for particle optics (M. Berz and colleagues)
Cadmium selenide field-effect transistors and display (T. P. Brody, A. van Calster, and J. F. Farrell)
ODE methods (J. C. Butcher)
Electron microscopy in mineralogy and geology (P. E. Champness)
Electron-beam deflection in color cathode-ray tubes (B. Dasgupta)
Fuzzy morphology (E. R. Dougherty and D. Sinha)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Miniaturization in electron optics (A. Feinerman)
Liquid metal ion sources (R. G. Forbes)
The critical-voltage effect (A. Fox)
Stack filtering (M. Gabbouj)
Median filters (N. C. Gallagher and E. Coyle)
Quantitative particle modeling (D. Greenspan, vol. 98)
Structural analysis of quasicrystals (K. Hiraga)
Formal polynomials for image processing (A. Imiya)
Contrast transfer and crystal images (K. Ishizuka)
Morphological scale-spaces (P. Jackway, vol. 98)
Optical interconnects (M. A. Karim and K. M. Iftekharuddin)
Surface relief (J. J. Koenderink and A. J. van Doorn)
Spin-polarized SEM (K. Koike)
Sideband imaging (W. Krakow)
The recursive dyadic Green's function for ferrite circulators (C. M. Krowne, vol. 98)
Near-field optical imaging (A. Lewis)
Vector transformation (W. Li)
SAGCM InP/InGaAs avalanche photodiodes for optical fiber communications (C. L. F. Ma, M. J. Deen, and L. E. Tarof)
SEM image processing (N. C. MacDonald)
Electron holography and Lorentz microscopy of magnetic materials (M. Mankos, M. R. Scheinfein, and J. M. Cowley, vol. 98)
Electron holography of electrostatic fields (G. Matteucci, G. F. Missiroli, and G. Pozzi)
The dual de Broglie wave (M. Molski)
Electronic tools in parapsychology (R. L. Morris)
Phase-space treatment of photon beams (G. Nemes)
Aspects of mirror electron microscopy (S. Nepijko)
The imaging plate and its applications (T. Oikawa and N. Mori, vol. 99)
Representation of image operators (B. Olstad)
Z-contrast in materials science (S. J. Pennycook)
HDTV (E. Petajan)
The wave-particle dualism (H. Rauch)
Electron holography (D. Saldin)
Space-variant image restoration (A. de Santis)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Set-theoretic methods in image processing (M. I. Sezan)
Focus-deflection systems and their applications (T. Soma)
Mosaic color filters for imaging devices (T. Sugiura, K. Masui, K. Yamamoto, and M. Tani)
New developments in ferroelectrics (J. Toulouse)
Electron gun optics (Y. Uchikawa)
Very high resolution electron microscopy (D. van Dyck)
Morphology on graphs (L. Vincent)
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOL. 97
Image Representation with Gabor Wavelets and Its Applications
RAFAEL NAVARRO
Instituto de Óptica "Daza de Valdés" (CSIC), Serrano 121, 28006 Madrid, Spain
ANTONIO TABERNERO
Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, 28660 Madrid, Spain
and
GABRIEL CRISTÓBAL
Instituto de Óptica "Daza de Valdés" (CSIC), Serrano 121, 28006 Madrid, Spain
I. Introduction . . . 2
II. Joint Space-Frequency Representations and Wavelets . . . 8
  A. Joint Representations, Wigner Distribution, Spectrogram, and Block Transforms . . . 8
  B. Wavelets . . . 11
  C. Multiresolution Pyramids . . . 13
  D. Vision-Oriented Models . . . 16
III. Gabor Schemes of Representation . . . 19
  A. Exact Gabor Expansion for a Continuous Signal . . . 23
  B. Gabor Expansion of Discrete Signals . . . 30
  C. Quasicomplete Gabor Transform . . . 34
IV. Vision Modeling . . . 37
  A. Image Representation in the Visual Cortex . . . 37
  B. Gabor Functions and the RFs of Cortical Cells . . . 41
  C. Sampling in the Human Visual System . . . 45
V. Image Coding, Enhancement, and Reconstruction . . . 50
  A. Image Coding and Compression . . . 50
  B. Image Enhancement and Reconstruction . . . 54
VI. Image Analysis and Machine Vision . . . 61
  A. Edge Detection . . . 63
  B. Texture Analysis . . . 64
  C. Motion Analysis . . . 72
  D. Stereo . . . 74
VII. Conclusion . . . 75
References . . . 79
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. INTRODUCTION

In image analysis and processing, there is a classical choice between spatial and frequency domain representations. The former, consisting of a two-dimensional (2D) array of pixels, is the standard way to represent discrete images. This is the typical format used for acquisition and display, but it is also common for storage and processing. Space representations appear in a natural way, and they are important for shape analysis, object localization, and description (either photometric or morphologic) of the scene. There is much processing that can be done in the space domain: histogram modification, pixel and neighbor operations, and many others. On the other hand, there are many tasks that we can perform in the Fourier (spatial frequency) domain in a more natural way, such as filtering and correlations. These two representations have very important complementary advantages that we often have to combine when developing practical applications. An interesting example is our own visual system, which has to perform a variety of complex tasks in real time and in parallel, processing nonstationary signals. Figure 1 (redrawn from Bartelt et al., 1980) illustrates this problem with a simple nonstationary 1D temporal signal. (It is straightforward to extend the following discussion to 2D images or even 3D signals, such as image sequences.) The four panels show different ways of representing a signal corresponding to two consecutive musical notes. The upper left panel shows the signal as we would see it when displayed on an oscilloscope. Here we can appreciate the temporal evolution, namely, the periodic oscillations of the wave, and the transition from one note to the next. Although this representation is complete, it is hard with a simple glimpse to say much about the exact frequencies of the notes. The Fourier spectrum (upper right) provides an accurate global description of the frequency content of the signal, but it does not tell us much about the timing and order of the notes. Despite the fact that either of these two descriptions may be very useful for sound engineers, the music performer would prefer a stave (bottom left) of a musical score, that is, a conjoint representation in time (t axis) and frequency (logarithmic frequency axis). The Wigner distribution function (Wigner, 1932), bottom right, provides a complete mathematical description of the joint time-frequency domain (Jacobson and Wechsler, 1988), but at the cost of very high redundancy (doubling the dimension of the signal). Regular sampling of a signal with N elements in the spatial (or frequency) domain will require N² samples in the conjoint domain defined by the Wigner distribution to be stored and analyzed. Although this high degree of redundancy may be necessary in
FIGURE 1. Four different descriptions of the same signal: time domain (upper left); frequency domain (upper right); conjoint: stave (lower left) and Wigner distribution function (lower right). Reprinted with permission from Bartelt et al., The Wigner distribution function and its optical production, Optics Comm. 32, 32-38. Copyright 1980, Elsevier Sci. Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.
some especially difficult problems (Cristóbal et al., 1990), such an expensive redundancy cannot be afforded in general, particularly in vision and image processing tasks (2D or 3D signals). The musician will prefer a conjoint but compact (and meaningful) code, like the stave: only two samples (notes) are required to represent the signal in the example of Fig. 1. Such a conjoint but compact code is more likely to be found in biology, combining usefulness with maximum economy. A possible approach to building a representation with these advantages is to optimally sample the conjoint domain, trying to diminish redundancy without losing information. The uncertainty principle tells us that there exists a limit for joint (space-frequency) localization (Gabor, 1946; Daugman, 1985); that is, if we apply fine sampling in the space (or time) domain, then we must apply coarse frequency sampling, and vice versa. The uncertainty product limits the minimum area for sampling the conjoint domain. Gabor, in his Theory of Communication (1946), observed that Gaussian wave packets (Gabor wavelets or Gabor functions) minimize such conjoint
FIGURE 2. Two ways of sampling the conjoint time-frequency domain, with sampling units having constant area: homogeneous (left); adapting the aspect ratio to the spatial-frequency band (right).
uncertainty, being optimal sampling units, or logons, of the conjoint domain. The left panel of Fig. 2 shows the “classical” way of homogeneously sampling this 2D space-frequency conjoint domain. The right panel represents smarter sampling, as the one used in wavelet or multiscale pyramid representations and presumably by our own visual system. Here, the sampling area is kept constant, but the aspect ratio of the sampling units changes from one frequency level to the next. This is smarter sampling because it takes into account that low-frequency features will tend to occupy a large temporal (or spatial) interval requiring rather coarse sampling, whereas high frequencies require fine temporal (or spatial) sampling. In both cases, the sampling density is very important. Critical sampling (Nyquist) will produce the minimum number of linearly independent elements ( N ) to have a complete representation of the signal; a lower sampling density will cause aliasing artifacts, whereas oversampling will produce a redundant representation (this will be further discussed later). One of the most exciting features of wavelet and similar representations is that they appear to be useful for almost every signal processing application (either acoustical, 1D; 2D images or 3D sequences), including the modeling of biological systems. However, despite several early developments of the basic theory (Haar, 1910), only in the 1980s were the first applications to image processing published. Wigner (1932) introduced a complete joint representation of the phase space in quantum mechanics; Gabor (1946) proposed Gaussian wave packets, logons or information quanta, for optimally packing information. Cohen (1966) developed a generalized framework for phase space distribution functions, showing that
most of these conjoint image representations belong to a large class of bilinear distributions. Any given representation is obtained by choosing an appropriate kernel in the generalized distribution. Until recently, these theoretical developments were not accompanied by practical applications in signal processing. Apart from the availability of much cheaper and more powerful computers, several factors have accelerated this field in the 1980s and 1990s. On the one hand, Gabor functions were successfully applied to model the responses of simple cells in the brain's visual cortex, in both 1D (Marčelja, 1980) and 2D (Daugman, 1980). On the other hand, Bastiaans (1981) and Morlet et al. (1982) provided the theoretical basis for a practical implementation of the Gabor and other expansions. Further generalizations of the Gabor expansion (Daugman, 1988; Porat and Zeevi, 1988) and the development of wavelet theory (Grossman and Morlet, 1984; Meyer, 1988; Mallat, 1989b; Daubechies, 1990) have opened broad fields of applications. In particular, wavelet theory has constituted a unifying framework, merging ideas coming from mathematics, physics, and engineering. One of the most important applications has been to image coding and compression, because of its technological relevance. In fact, many conjoint schemes of representation, such as multiresolution pyramids (Burt and Adelson, 1983), or the discrete cosine transform (Rao, 1990) used in Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) image and video standards, were specifically directed to image compression. A Gabor function, or Gaussian wave packet, is a complex exponential with a Gaussian modulation or envelope. From now on, we will use the variable t (time) for the 1D case and x, y for 2D (despite the fact that this review is mainly focused on 2D images, it is simpler and more convenient to use a 1D formulation that can be easily generalized to the 2D case). In one dimension, the mathematical expression of a Gabor function is
$$g_{t_0,\omega_0}(t) = a\,\exp[-\alpha^2 (t - t_0)^2]\,\exp[i(\omega_0 t + \phi)]. \tag{1}$$
The two labels $t_0$, $\omega_0$ stand for the temporal and frequency localization or tuning. The parameter $\alpha$ determines the half-width of the Gaussian envelope, and $\phi$ is the phase offset of the complex exponential. The most characteristic property of the Gabor functions is that they have the same mathematical expression in both domains. The Fourier transform of $g_{t_0,\omega_0}(t)$ will be

$$G_{t_0,\omega_0}(\omega) = \frac{a\sqrt{\pi}}{\alpha}\,\exp\!\left[-\frac{(\omega-\omega_0)^2}{4\alpha^2}\right]\exp[-i(\omega t_0 - \phi')], \tag{2}$$

where $\phi' = \omega_0 t_0 + \phi$. This property, which allows fast implementations in either the space or frequency domain, along with their optimal localization
(Gabor, 1946), will yield a series of interesting applications. Moreover, by changing a single parameter, the bandwidth α, we can continuously shift the time-frequency, or in 2D the space/spatial-frequency localization, from one domain to the other. For instance, visual models (as well as those for most applications) use fine spatial sampling (high localization) and coarse sampling of the spatial-frequency domain (see Section IV). In addition to the two possible computer implementations, in the space (or time) and in the Fourier domain (Navarro and Tabernero, 1990), Bastiaans (1982) proposed a parallel optical generation of the Gabor expansion. Subsequently, several authors (Freysz et al., 1990; Li and Zhang, 1992; Sheng et al., 1992) reported optical implementations. In the two-dimensional case, it is common to use Cartesian spatial coordinates but polar coordinates for the spatial-frequency domain:
$$g_{x_0,y_0,f_0,\theta_0}(x,y) = \exp\{i[2\pi f_0(x\cos\theta_0 + y\sin\theta_0) + \phi]\}\,\mathrm{gauss}(x-x_0,\,y-y_0), \tag{3a}$$
where the Gaussian envelope has the form

$$\mathrm{gauss}(x,y) = a\,\exp\!\left(-\alpha^2\left[(x\cos\theta_0 + y\sin\theta_0)^2 + \gamma^2(x\sin\theta_0 - y\cos\theta_0)^2\right]\right). \tag{3b}$$
The four labels $x_0$, $y_0$, $f_0$, $\theta_0$ stand for the spatial and frequency localization. The parameters α and γ define the bandwidth and aspect ratio of the Gaussian envelope, respectively (we have restricted the Gaussian to have its principal axis along the $\theta_0$ direction); φ is again the phase offset. Apart from the interesting properties mentioned previously, Gaussian wave packets (or wavelets), GWs, also have some drawbacks. Their lack of orthogonality makes the computation of the expansion coefficients difficult. A possible solution is to find a biorthogonal companion basis that facilitates the computation of the coefficients for the exact reconstruction of the signal (Bastiaans, 1981). This solution is computationally expensive, and the interpolating biorthogonal functions can have a rather complicated shape. Several practical solutions for finding the expansion coefficients have been proposed, such as the use of a relaxation network (Daugman, 1988). By oversampling the signal to some degree, we can obtain dual functions more similar in shape to the Gabor basis (Daubechies, 1990). The redundancy inherent in oversampling is, of course, a bad property for coding and compression applications. However, for control systems, redundancy and lack of orthogonality are desirable properties that are necessary for robustness. Biological vision (and sensory systems in general) lacks orthogonality, producing a redundancy that is highly expensive,
this being the price of robustness. The use of redundant sampling permits us to design quasicomplete Gabor representations (Navarro and Tabernero, 1991) that are simple, robust, and fast to implement, providing reconstructions with a high signal-to-noise ratio (SNR) and high visual quality. A minor drawback is that Gabor functions are not pure passband, which is a basic requirement for being an admissible wavelet (but their DC response is very small anyway, less than 0.002 for a 2D, one-octave bandwidth Gabor function). These drawbacks have motivated the search for other basis functions, orthogonal when possible. This, along with the wide range (still increasing) of applications and the merging of ideas from different fields, has produced the appearance of many different schemes of image representation in the literature (we will review the most representative schemes in Section II, before focusing on GWs in Section III). Almost every author seems to have a favorite scheme and basis function, depending on his or her area of interest, personal background, etc. In our case, there are several reasons why GWs (Gabor functions) constitute our favorite basis for image representation. Apart from optimal joint localization (as pointed out by Gabor), good behavior of Gaussians, and robustness, perhaps the most interesting property is that they probably have the broadest field of application. For a given application (for example, coding, edge detection, motion analysis) one can find and implement an optimal basis function. For instance, Canny (1986) has shown that Gaussian derivatives are optimal for edge detection in noisy environments. Gabor functions are probably not optimal for most applications, but they perform well in almost all cases and in most of them are even nearly optimal. This can be explained intuitively in terms of the central limit theorem (Papoulis, 1989), i.e., that the cumulative convolution of many different kernels will result in a Gaussian convolution. The following is not a rigorous but only an intuitive discussion: The good fit obtained with GWs to the responses of cortical neurons could be, roughly speaking, a consequence of the central limit theorem in the sense that from the retina to the primary visual cortex, there is a series of successive neural networks. In a rough linear approach, we can realize each neural layer as a discrete convolution. Thus, the global effect would be approximately equivalent to a single Gaussian channel. Although this idea is far from having a rigorous demonstration, it has been applied to the implementation of multiscale Gabor filtering (Rao and Ben-Arie, 1993). On the other hand, with the central limit theorem in mind, one could tend to think that when trying to optimize a basis function (a filter) for many different tasks simultaneously, the resulting filter could tend to show a Gaussian envelope.
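To make Eqs. (3a)-(3b) concrete, the following is a minimal numerical sketch (not the authors' implementation) of a complex 2D Gabor function. The use of numpy, the sampling grid, and the parameter values in the example are assumptions made only for illustration; the amplitude a of Eq. (3b) is set to 1.

```python
import numpy as np

def gabor_2d(shape, x0, y0, f0, theta0, alpha, gamma=1.0, phi=0.0):
    """Complex 2D Gabor function following Eqs. (3a)-(3b): a complex carrier of
    frequency f0 along orientation theta0, modulated by a Gaussian centred at (x0, y0)."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    xr = (x - x0) * np.cos(theta0) + (y - y0) * np.sin(theta0)   # rotated, shifted coordinates
    yr = (x - x0) * np.sin(theta0) - (y - y0) * np.cos(theta0)
    envelope = np.exp(-alpha**2 * (xr**2 + gamma**2 * yr**2))    # Eq. (3b), with a = 1
    carrier = np.exp(1j * (2*np.pi*f0*(x*np.cos(theta0) + y*np.sin(theta0)) + phi))  # Eq. (3a)
    return carrier * envelope

# Example (arbitrary values): a filter tuned to 1/8 cycles/pixel at 45 degrees
g = gabor_2d((64, 64), x0=32, y0=32, f0=1/8, theta0=np.pi/4, alpha=0.1)
```

A real (even or odd) filter is obtained by taking the real or imaginary part of g.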
The field of application of GWs and similar schemes of image representation is huge and continuously increasing. They are highly useful in almost every problem of image processing, coding, enhancement, and analysis, and low- to mid-level vision (including modeling biological vision). Moreover, multiscale and wavelet representations have provided important breakthroughs in image understanding and analysis. Furthermore, Gabor functions are a widely used tool for visual testing in psychophysical and physiological studies. Gaussian envelopes are very common in grating stimuli to measure contrast sensitivity, to study shape, texture, and motion perception (Caelli and Moraglia, 1985; Sagi, 1990; Geri et al., 1995; Watson and Turano, 1995), or to model brightness perception (du Buf, 1995). Although these applications are beyond the scope of this review, we want to mention them because of their increasing relevance. All these facts suggest that GWs are especially suitable for building general-purpose environments for image processing, analysis, and artificial vision systems. Here, we have classified the most relevant applications in three groups: modeling of early processing in the human visual system in Section IV; applications to image coding, enhancement, and reconstruction in Section V; and applications to image analysis and machine vision in Section VI. Prior to these applications, we review the main conjoint image representations in Section II, and then Section III specifically treats Gabor representations.

II. JOINT SPACE-FREQUENCY REPRESENTATIONS AND WAVELETS

A. Joint Representations, Wigner Distribution, Spectrogram, and Block Transforms

Stationary signals or processes are statistically invariant over space or time (e.g., white noise or sinusoids), and thus we can apply a global description or analysis to them (e.g., Fourier transform). As in the example of Fig. 1, an image composed of several differently textured objects will be nonstationary. Images can also be affected by nonstationary processes. For instance, optical defocus will produce a spatially invariant blur in the case of a flat object that is perpendicular to the optical axis of the camera. However, in the 3D world, defocus will vary with the distance from the object to the camera, and hence it will be nonstationary in general. The result is a spatially variant blur that we cannot describe as a conventional convolution. Spatially variant signals and processes can be better characterized by conjoint time-frequency or space/spatial-frequency representations.
1. Wigner Distribution Function

Wigner (1932) introduced a bilinear distribution as a conjoint representation of the phase space in quantum mechanics. Later, Ville (1948) derived the same (Wigner or Wigner-Ville) distribution in the field of signal processing. As we have mentioned before, we will be using the variable t for the 1D case (equivalent expressions can be derived for the 2D spatial domain or higher dimensions). For a continuous and integrable signal f(t), the symmetric Wigner distribution (WD) is given by (Claasen and Mecklenbrauker, 1980)

$$W_f(t,\omega) = \int_{-\infty}^{\infty} f\!\left(t+\frac{s}{2}\right) f^*\!\left(t-\frac{s}{2}\right) e^{-i\omega s}\,ds, \tag{4}$$
where s is the integrating variable, ω is the frequency variable, and f* stands for the complex conjugate of f. The WD belongs to the Cohen class of bilinear distributions (Cohen, 1966), in which each member is obtained by introducing a particular kernel, φ(ξ, τ), in the generalized distribution (Jacobson and Wechsler, 1988). These bilinear distributions C(t, ω) can be expressed as the 2D Fourier transform of weighted versions of the ambiguity function:

$$C(t,\omega) = \frac{1}{4\pi^2}\iint \phi(\xi,\tau)\,A(\xi,\tau)\,e^{-i(\xi t + \tau\omega)}\,d\xi\,d\tau, \tag{5}$$

where A(ξ, τ) is the ambiguity function

$$A(\xi,\tau) = \int f\!\left(s+\frac{\tau}{2}\right) f^*\!\left(s-\frac{\tau}{2}\right) e^{i\xi s}\,ds. \tag{6}$$
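As an illustration of the definition in Eq. (4), here is a small, hedged sketch of a discrete pseudo Wigner-Ville distribution. It is not part of the original text; the discretization (circular lag axis, FFT over the lag variable) and the numpy code are assumptions made for the example.

```python
import numpy as np

def wigner_ville(f):
    # Discrete pseudo Wigner-Ville distribution of a 1D complex signal (a sketch of Eq. (4)):
    # each row is the DFT, over the lag variable, of f[n + m] * conj(f[n - m]).
    N = len(f)
    W = np.zeros((N, N))
    for n in range(N):
        L = min(n, N - 1 - n)                 # largest half-lag that stays inside the signal
        m = np.arange(-L, L + 1)
        r = f[n + m] * np.conj(f[n - m])      # instantaneous autocorrelation in the lag m
        kernel = np.zeros(N, dtype=complex)
        kernel[m % N] = r                     # place the lags on a circular axis
        W[n, :] = np.fft.fft(kernel).real     # Fourier transform over the lag
    return W

# Two consecutive "notes," as in the example of Fig. 1 (arbitrary frequencies)
t = np.arange(256)
sig = np.where(t < 128, np.exp(2j * np.pi * 0.10 * t), np.exp(2j * np.pi * 0.20 * t))
W = wigner_ville(sig)                         # rows: time, columns: frequency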
The Wigner distribution, because of its bilinear definition, contains cross-terms, complicating its interpretation, especially in pattern recognition applications.

2. Complex Spectrogram

Another way to obtain a conjoint representation is through the complex spectrogram, which can be expressed as a windowed Fourier transform:
$$F(t,\omega) = \int_{-\infty}^{\infty} w(s-t)\,f(s)\,e^{-i\omega s}\,ds, \tag{7}$$
where w(s) is the window that introduces localization in time (or space). The signal can be recovered from the complex spectrogram by the inversion formula (Helstrom, 1966):

$$f(s) = \frac{1}{2\pi}\iint F(t,\omega)\,w(s-t)\,e^{i\omega s}\,dt\,d\omega. \tag{8}$$
The Wigner-Ville distribution can be considered as a particular case of the complex spectrogram, where the shifting window is the signal itself (complex conjugated). Both the spectrogram and the Wigner-Ville distribution belong to the Cohen class (with kernels φ = W_w(t, ω) and φ = 1, respectively), are conjoint, complete, and invertible representations, but at the cost of high redundancy. When the window w(s) is a Gaussian, we can make a simple change, calling
$$g_{t,\omega}(s) = w(s-t)\,e^{i\omega s}. \tag{9}$$
Then g_{t,ω}(s) is a Gabor function, and Eq. (7) becomes

$$F(t,\omega) = \int f(s)\,g^*_{t,\omega}(s)\,ds = \langle f, g_{t,\omega}\rangle. \tag{10}$$
Therefore, we can obtain the "gaussian" complex spectrogram at any given point (t, ω) as the inner product between the signal f and a localized Gabor function. The decomposition of a signal into its projections on a set of displaced and modulated versions of a kernel function appears in quantum optics and other areas of physics. The elements of the set {g_{t,ω}(s)} are the coherent states associated with the Weyl-Heisenberg group that sample the phase space (t, ω). The spectrogram of Eq. (10) provides information about the energy content of the signal at (t, ω), because the inner product captures similarities between the signal f and the "probe" function g_{t,ω} that is localized in the joint domain. To recover the signal in the continuous case, we rewrite Eq. (8) as

$$f(s) = \frac{1}{2\pi}\iint \langle f, g_{t,\omega}\rangle\, g_{t,\omega}(s)\,dt\,d\omega. \tag{11}$$
The window function does not need to be Gaussian in general. However, as we said in the Introduction, Gabor functions have the advantage of maximum joint localization; i.e., they achieve the lower bound of the joint uncertainty. This has also been demonstrated in the 2D case for separable Gabor functions (Daugman, 1985). Signal uncertainty is commonly defined in terms of the variances of the marginal energy distributions associated with the signal and its Fourier transform. An alternative definition of informational uncertainty (Leipnik, 1959) has been introduced in terms of the entropy of the joint density function. Interestingly, Leipnik (1960) found that Gabor functions (among others) are entropy-minimizing signals. [See Stork and Wilson (1990) for a more recent discussion of alternative metrics or measures of joint localization.]
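The Gaussian-windowed (Gabor) spectrogram of Eq. (10) can be computed directly as inner products with the probe functions g_{t,ω}. The following brute-force sketch is only illustrative; the window width, hop size, and frequency grid are arbitrary assumptions, not values taken from the text.

```python
import numpy as np

def gabor_spectrogram(f, sigma=16.0, hop=4, n_freqs=64):
    # Complex spectrogram with a Gaussian window: F(t, w) = <f, g_{t,w}>  (Eq. (10)).
    N = len(f)
    s = np.arange(N)
    times = np.arange(0, N, hop)
    freqs = np.pi * np.arange(n_freqs) / n_freqs          # frequencies in [0, pi)
    F = np.zeros((len(times), n_freqs), dtype=complex)
    for i, t in enumerate(times):
        window = np.exp(-((s - t) ** 2) / (2.0 * sigma ** 2))  # Gaussian window centred at t
        for k, w in enumerate(freqs):
            g = window * np.exp(1j * w * s)                # Gabor "probe" function g_{t,w}(s)
            F[i, k] = np.vdot(g, f)                        # sum of conj(g) * f, i.e. <f, g>
    return F
```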
3. Block Transforms

Both the WD and the complex spectrogram involve high redundancy and permit exact recovery of the signal in the continuous case. In practical signal processing applications, we have to work with a discrete number of samples. In the Fourier transform, the complex exponentials constitute the basis functions, in both the continuous and discrete cases. For the latter case, signal recovery is guaranteed for band-limited signals with a sampling frequency greater than or equal to the Nyquist frequency. The WD also permits signal recovery in the discrete case (Claasen and Mecklenbrauker, 1980). In the case of the discrete spectrogram, with a discrete number of windows, image reconstruction is guaranteed only under certain conditions (this will be discussed in Section III). When looking for a complete but compact discrete joint image representation, one can think of dividing the signal into nonoverlapping blocks and independently processing each block (contrary to the case of overlapping continuously shifted windows). Each block is a localized (in space, time, etc.) portion of the signal. Then if we apply an invertible transform to each block, we will be able to recover the signal whenever the set of blocks is complete. This is the origin of a series of block transforms, of which the discrete cosine transform (DCT) is the most representative example (Rao, 1990). Current standards for image and video compression are based on the DCT. However, the sharp discontinuities between image blocks may produce ringing and other artifacts after quantization, especially at low-bit-rate transmission, that are visually annoying. We can eliminate these artifacts by duplicating the number of blocks, in what is called the lapped orthogonal transform (LOT) (Malvar, 1989). This is a typical example of oversampling, which generates a linear dependence (redundancy) that improves robustness (this is discussed further in Section III). We will see later that if we apply a blocklike decomposition in the Fourier domain, we can obtain a multiscale or multiresolution transform. In block transforms orthogonality is guaranteed, but there is not a good joint localization.
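A minimal sketch of the block-transform idea just described (independent, invertible transforms of non-overlapping blocks). The 8 x 8 block size and the scipy DCT routines are assumptions made for the example; this is not the JPEG coder itself.

```python
import numpy as np
from scipy.fft import dct, idct

def block_dct(image, B=8):
    # Independent 2D DCT of non-overlapping B x B blocks (JPEG-style analysis).
    H, W = image.shape                       # assumes H and W are multiples of B
    out = np.zeros((H, W))
    for i in range(0, H, B):
        for j in range(0, W, B):
            block = image[i:i+B, j:j+B].astype(float)
            out[i:i+B, j:j+B] = dct(dct(block, axis=0, norm='ortho'),
                                    axis=1, norm='ortho')
    return out

def block_idct(coeffs, B=8):
    # Exact inverse: the block DCT basis is orthogonal, so recovery is lossless.
    H, W = coeffs.shape
    out = np.zeros((H, W))
    for i in range(0, H, B):
        for j in range(0, W, B):
            block = coeffs[i:i+B, j:j+B]
            out[i:i+B, j:j+B] = idct(idct(block, axis=0, norm='ortho'),
                                     axis=1, norm='ortho')
    return out
```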
B. Wavelets

In wavelet theory, the signal is represented with a set of basis functions that sample the conjoint time-frequency (or space/spatial-frequency) domain, providing a local frequency representation with a resolution matched to each scale, so that

$$f(t) = \sum_i c_i\,\psi_i(t), \tag{12}$$
where ψ_i are the basis functions and c_i are the coefficients that constitute the representation in that basis. The key idea of a wavelet transform is that the basis functions are obtained by translations and dilations of a unique wavelet. A wavelet transform can be viewed as a decomposition into a set of frequency channels having the same bandwidth on a logarithmic scale. The application of wavelets to signal and image processing is recent (Mallat, 1989b; Daubechies, 1990), but their mathematical origins date back to 1910, with the Haar (1910) orthogonal basis functions. After Gabor's seminal Theory of Communication (1946), wavelets and similar ideas were used in solving differential equations, harmonic analysis, theory of coherent states, computer graphics, engineering applications, etc. [See, for instance, Chui (1992a, 1992b), Daubechies (1992), Meyer (1993), and Fournier (1994) for reviews on wavelets.] Grossman and Morlet (1984) introduced the name wavelet (continuous case) in the context of geophysics. Then the idea of multiresolution analysis was incorporated along with a systematic theoretical background (Meyer, 1988, 1993; Mallat, 1989b).
where the translation and dilation coefficients ( b and a, respectively) of the basic function vary continuously. In electrical engineering, this is called a "constant" Q resonant analysis. The continuous wavelet transform W of a function f E L2(%),i.e., square integrable, is
The basis function ψ must satisfy the admissibility condition of finite energy (Mallat, 1989a). This implies that its Fourier transform is pure bandpass, having a zero DC response, Ψ(0) = 0. Thus, the function ψ must oscillate above and below zero as a wave packet, which is the origin of the name wavelet. The wavelet transform (WT) has a series of important properties. We list only a few of them. The WT is an isometry, up to a proportional coefficient, from L²(ℝ) into the space of functions of scale and position (Grossman and Morlet, 1984). It can be discretized by sampling both the scale (frequency) and position (space or time) parameters as shown in Fig. 2b. Another property is that wavelets easily characterize local regularity, which is interesting in texture analysis. In the discrete case, more interesting in signal processing, there exist
necessary and sufficient conditions that the basis functions have to meet so that the WT has an inverse (Daubechies, 1992). An especially interesting class of discrete basis functions is orthogonal wavelets. A large class of orthogonal wavelets can be related to quadrature mirror filters (Mallat, 1989b). There are important desirable properties of wavelets that are not fully compatible with orthogonality, namely, small (or finite at least) spatial support, linear phase (symmetry), and smoothness. This last property is very important in signal representation to avoid annoying artifacts, such as ringing and aliasing. The mathematical description of smoothness has been made in terms of the number of vanishing moments (Meyer, 1993), which determines the convergence rate of the wavelet approximation to a smooth function. Finite impulse response (small support) is necessary for having spatial localization. Among these desirable features, orthogonality is a very restrictive condition that may be relaxed to meet other important properties, such as better joint localization. In particular, the use of linearly dependent (redundant) biorthogonal basis functions (Daubechies, 1990) makes it possible to meet smoothness, symmetry, and localization requirements while keeping most of the interesting properties derived from orthogonality.
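The Haar basis mentioned above is the simplest orthogonal wavelet and makes the discrete analysis/synthesis structure easy to see. The sketch below is only illustrative (not from the original text); it assumes signal lengths that are multiples of 2 at every level.

```python
import numpy as np

def haar_analysis(x, levels=3):
    # Orthogonal Haar wavelet decomposition of a 1D signal: at each level the signal is
    # split into low-pass (scaling) and high-pass (wavelet) halves, and the low-pass
    # part is decimated and re-used at the next, coarser scale.
    coeffs, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))     # detail (wavelet) coefficients
        approx = (even + odd) / np.sqrt(2.0)           # scaling coefficients
    coeffs.append(approx)                              # coarsest approximation
    return coeffs

def haar_synthesis(coeffs):
    # Perfect reconstruction: Haar is orthogonal, so the same filters invert the transform.
    approx = coeffs[-1]
    for high in reversed(coeffs[:-1]):
        even = (approx + high) / np.sqrt(2.0)
        odd = (approx - high) / np.sqrt(2.0)
        out = np.empty(2 * len(approx))
        out[0::2], out[1::2] = even, odd
        approx = out
    return approx
```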
C. Multiresolution Pyramids

Multiresolution pyramids are a different approach to joint representations (Burt and Adelson, 1983). The basic idea is similar to that of the block transforms but applied to the frequency domain. Let {W_i(ω)} be a set of windows that completely cover the Fourier domain, i.e., Σ_i W_i(ω) = 1. Then we can decompose the Fourier transform F of the signal in a series of bands so that
$$f(t) = \sum_i f_i(t) = \sum_i \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\,[W_i(\omega)\,e^{i\omega t}]\,d\omega. \tag{15}$$
Here we have represented the signal as the sum of filtered versions, f_i(t), one for each window (band). This produces a representation that is localized in space (or time) and frequency (depending on the width of the window). The product within the bracket is a sort of Fourier (complex) wavelet that forms a complete basis. The set of windows {W_i(ω)} can be implemented as a bank of filters. Mallat (1989a) has shown that there exists a one-to-one correspondence between the coefficients of a wavelet expansion and those of multiresolution pyramid representations, as illustrated in Fig. 3. This is done through a mother wavelet and a scaling function φ. Figure 4 shows an example of a scaling function in both spatial
FIGURE 3. The Fourier-windowed transform (STFT) as a filter bank. If the window is a Gaussian, the modulated filter bank produces a Gabor transform. The output of this bank can be plotted on a joint diagram as in Fig. 2. The entries in any column represent the DFT of the corresponding batch of data. Each row represents the contribution to each harmonic from the bank filter. Redrawn by permission from Rioul and Vetterli, Wavelets and signal processing. IEEE Signal Proc. Mag. 8, 14-38. Copyright 1991 IEEE.
and Fourier domains as well as its associated wavelet function, also in both domains. The basic idea is to split the signal into its lower and higher frequency components. One of the main applications of multiresolution representations is in coding and compression, in which each frequency band is sampled to achieve a maximum rate of compression. The name pyramid comes from the fact that the sampling rate depends on the bandwidth of each particular subband (Tanimoto and Pavlidis, 1975). Therefore, if we put the samples of each band on top of the previous one we obtain a pyramid. There are basically two different strategies for sampling. Critical sampling is used to eliminate redundancy so that the conjoint representation has no more samples than the original signal. Although we can obtain higher rates of compression with critical sampling, it has an important cost. Namely, we end up with a representation that is not robust (losing a single sample will cause very disturbing effects) and that is not translational invariant (i.e., a small displacement of the signal will produce a representation that is completely different), which preclude its application to vision (Simoncelli et al., 1992). In some applications, it is possible to solve the translation dependence by a circular shift of the data (Coifman and Donoho, 1995). However, a much more robust representation is obtained by Nyquist
FIGURE 4. Example of a scaling function φ(x) (upper left) and its transfer function Φ(ω) (lower left), along with the impulse response of the associated wavelet filter ψ(x) (upper right) and its Fourier transform Ψ(ω) (lower right). Redrawn by permission from Mallat, A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Patt. Anal. Machine Intell. 11, 674-693. Copyright 1989 IEEE.
sampling of each band, i.e., taking samples with a frequency double the maximum frequency present in the band. The result will be a shiftable and robust multiscale transform, at the cost of some redundancy. One practical problem is that of designing filters with a finite impulse response, simultaneously having good frequency resolution. One solution is to use quadrature mirror filters consisting of couples of low pass and high pass that are in phase quadrature (Esteban and Galand, 1977). This constitutes an orthogonal basis that permits obtaining good localization in both domains, avoiding aliasing artifacts, and obtaining an exact reconstruction of the signal.
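The pyramid construction just described (low-pass filter, subsample, keep the band-pass difference) can be sketched in a few lines. This is a hedged illustration in the spirit of Burt and Adelson's scheme, not their exact filters: the Gaussian smoothing width and the nearest-neighbor expansion used here are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def expand(small, shape):
    # Nearest-neighbor expansion back to the finer grid (an assumption of this sketch).
    up = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def build_pyramid(image, levels=3, sigma=1.0):
    bands, current = [], image.astype(float)
    for _ in range(levels):
        low = gaussian_filter(current, sigma)
        small = low[::2, ::2]                      # reduce: coarser, subsampled low-pass
        bands.append(current - expand(small, current.shape))   # band-pass (detail) layer
        current = small
    bands.append(current)                          # low-pass residual
    return bands

def reconstruct(bands):
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = band + expand(current, band.shape)   # exact inverse of each analysis step
    return current
```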
FIGURE 5. Original image (a); wavelet transform pyramid with biorthogonal basis functions (b); recovered image (c) after thresholding the coefficients (d).
The extension to 2D (for application to image processing) of most of the analysis done above in 1D is straightforward. Figure 5 shows an example of a multiscale wavelet transform (b) of a woman’s portrait (a), including the application to compression: after thresholding the coefficients as explained in Section V,A (d) and image recovered (c) from (d). D. Viwn-Oriented Models
One striking fact about joint multiscale representations and wavelets is that a similar representation has been found in the human visual system
IMAGE REPRESENTATION WITH GABOR WAVELETS
17
(see Section IV). Marr (1982) and co-workers established the basis for the modern theory of computational vision defining the primal sketch. It consisted of detecting edges (abrupt changes in the gray levels of images) by applying a Laplacian of a Gaussian operator and then extracting the zero crossings. This is done at different scales (resolutions). Using scaled versions of this operator, Burt and Adelson (1983) constructed the Laplacian pyramid. Each layer is constructed by duplicating the size of the Laplacian operator, so that both the peak frequency an the bandwidth are divided by 2. In their particular pyramid implementation, they first obtained low pass-filtered versions of the image using Gaussian filters, then subtracted the results from the previous version. Then they subsampled the low pass-filtered version and repeated the process several times. Consequently, the Nyquist sampling of low pass-filtered versions of the image gives (1/2)* less samples, producing the pyramid scheme. This yields an overcomplete representation with 4/3 more coefficients than the original image. One important experimental finding in human vision is orientation selectivity, which is not captured by the Laplacian pyramid. Consequently, Daugman (1980) used 2D Gabor functions (GFs) to fit experimental data, and Watson (1983) implemented a computational model of visual image representation with GFs. By sampling the frequency domain in a lossless and polar-separable way, Watson (1987a) introduced an oriented pyramid called the cortex transform that permitted a complete representation of the image. The filters, four orientations by four frequencies plus low-pass and high-pass residuals, are constructed in the Fourier domain as the product of a circularly symmetric dom filter with an orientation selectivity fan filter (see Fig. 6a). The impulse response of the cortex filter (Fig. 6b) roughly resembles a 2D Gabor function with ringing artifacts. Marr (1982), Young (1985, 1987), and others have proposed Gaussian derivatives (GDs) as an alternative to Gabor functions for modeling the receptive fields of simple cortical cells. Figure 7 shows the four first derivatives in lD, and their frequency responses, Go, G,, G,, and G,, respectively correspond to the Gaussian and its first, second, and third derivatives. Cauchy filters (Klein and Levi, 1985) or even Hermite polynomials with a Gaussian envelope (Martens, 1990a, 1990b) have also been used but to a much smaller extent. Gabor functions turn out to be a particular case of Hermite polynomials when the degree of the polynomial tends to infinity. GDs are commonly used in the literatiire as an alternative to Gabor functions, having very similar properties but with the additional advantage of being pure bandpass (i.e., meeting the admissibility condition of wavelets), but at the cost of lower flexibility, i.e., fixed orientations, etc. (GDs are orthogonal only when centered on a fiycd origin of coordinates,
18
RAFAEL NAVARRO ET AL.
FIGURE6. Construction of a cortex filter in the frequency domain: (a) dom filter; (b) fan filter; (c) the cortex filter as the product of a dom and a fan filter; (d) the spatial impulse response of a cortex filter resembling Gabor function. Reprinted by permission from Watson. The cortex transform: rapid computation of simulated neural images. Comp. W .G ~ p h . Image h c . 39,311-327. Copyright 1987 Academic Press, Orlando, FL.
but under translation they lose their orthogonality). To solve this problem, steerable filters can be synthesized in any arbitrary orientation as a linear combination of a set of basis filters (Freeman and Adelson, 1991). Figure 8 shows examples of steerable filters constructed from the second derivatives of a Gaussian, G,, and their quadrature pairs H,. Figure 9 illustrates the design of steerable filters in the Fourier domain. Based on steerable filters, Simoncelli et al. (1992) have proposed a shiftable multiscale transform. Perona (1995) has developed a method for generating deformable kernels to model early vision. Trying to improve the biological plausibility of spatial sampling, Watson and Ahumada (1989) proposed a hexagonal-oriented quadrature pyramid,
IMAGE REPRESENTATION WITH GABOR WAVELETS
19
FIGURE 7. Gaussian derivative wavelets (left) along with their frequency responses (right). g o , . . . ,g, correspond to a Gaussian and the first, second, and third derivatives, respectively.
with basis functions that are orthogonal, self-similar, and localized in space and spatial frequency. However, this scheme has some unrealistic features such as multiple orientation selectivity. In summary, a large variety of schemes of image representation have appeared in different fields of application, including vision modeling. In particular, Fig. 10 shows 1D profiles and frequency responses for Gabor functions with different frequency tuning. We have mentioned Gabor functions briefly in this section, but we will give a detailed analysis next. For a thorough comparative evaluation and optimal filter design for several of the more used decomposition techniques see Akansu and Haddad (1992). 111. GABORSCHEMES OF REPRESENTATION
To introduce Gabor schemes of representation, let us consider the question of reconstructing a signal from a sampled version of the complex spectrogram (Section 11,A). It was shown [Eq. (1011 that a sample of the spectrogram at time t and frequency w could be seen as the projection of
20
RAFAEL NAVARRO ET AL.
FIGURE8. G, and H, quadrature pair basis filters (rows a and d) that span the space of all rotations of their respective filters. G, and H2 have the same frequency response (rows b and e) but a 90" shifted phase (quadrature). Rows (c) and (f) show equivalent x-y separable basis functions. Reprinted by permission from Freeman and Adelson. The design and use of steerable filters for image analysis, enhancement, and wavelet representation. IEEE Trans. Pan. Anul. Mach. Intell. 13,891-906. Copyright 1991 IEEE.
the signal onto a modulated and displaced version of the window, g,, Js). Instead of a continuum, we now have only a discrete set of functions: { g n m ( s ) )= { g n T , m w ( s )= } (w(s - nT)eimwS},with n , m integers,
( 16) that sample the joint domain at points (nT,mW). Recovering the signal from the sampled spectrogram is equivalent to reconstruct f(s> from its
21
IMAGE REPRESENTATION WITH GABOR WAVELETS
C
d
f
e
FIGURE9. Design of a steerable digital filter in the frequency domain. (a) The desired radial frequency distribution; (b) the corresponding angularly symmetric 2D frequency response obtained through frequency transformation. The resulting responses of the four steerable filters (c)-(f) are obtained by multiplying by cos3(v- Oil. Reprinted by permission from Freeman and Adelson. The design and use of steerable filters for image analysis, enhancement, and wavelet representation. IEEE Trans. Pan. Anal. Mach. Infell. 13,891-906, Fig. 6, p. 895. Copyright 1991 IEEE.
projections on that set, that is, with a summation on the indexes (n,rn> instead of a double integral in t and o.Another related problem would be to express f(s) as a linear combination of the set of functions {gnm(s)}:
In the continuous case, Eq. (11) provides us with the answer to both questions, as it uses both the projections (f,g,, and the functions g,, to recover the signal. One could say that in that case, the same set of functions is used for the analysis (obtaining the projections) and synthesis (regenerating the signal). As we will see, that is not true, in general, when one counts only on a discrete number of projections. In that case, expressing f(s) as an expansion of a set of functions may constitute a problem different from using the projections of f(s) onto that set to recover it. These two problems are closely related, as we will see in Section II1,A. The Gabor expansion arises when the basis functions gnm(s)in Eq. (17) are obtained by displacements and modulations of a Gaussian window function [ w ( s ) ,in Eq. (16)]. Gabor (1946) based his choice of the window on the fact that the Gaussian has minimal support in the joint domain. Later, Daugman (1985) showed that this was also the case in 2D.
>,
22
RAFAEL NAVARRO ET AL.
o.:m
0 -1
-5
0
5
-05
0
5
FIGURE10. One-dimensional Gabor functions (left) and their frequency responses (right). The peak frequencies (O,f,, 2fl, and 4fl) and bandwidths correspond to a multiscale logarithmic scheme.
There are many possibilities when designing a Gabor expansion. Apart from choosing the width of the Gaussian envelope (which determines the resolution in both domains) and the phase of the complex exponential, the key issue is to decide the sampling intervals T (time or space) and W (frequency) that govern the degree of overlap of the “sampling” function g,,(s). Intuitively, it seems clear that a sampling lattice too sparse ( T , W large) will not allow exact reconstruction of the signal. The original choice of Gabor (1946) was TW = 2a,which corresponds to the Nyquist density. This is the minimum required to preserve all the information and, therefore, is called the critical sampling case. Schemes with TW < 2a correspond to oversampling. For a fixed 7” we can continuously vary the ratio T / W depending on whether we want more resolution in one or the other domain. The main problem of this expansion is the lack of orthogonality of the Gabor functions, which makes the computation of the expansion coefficients an, difficult. The task is trivial when the set {g,,,(s)) is orthogonal, because in that case the coefficient an, are the projections onto the same set of functions; that is, the analysis and synthesis windows are the same.
For example, an orthogonal set is generated if the window function is a rectangular pulse, as in block transforms. However, that window is not well localized in the frequency domain (as opposed to Gaussian windows), and therefore the coefficient a_{nm} may capture components of the signal far from the desired frequency mW. Unfortunately, this is a general drawback of orthogonal sets. The Balian-Low theorem (see Daubechies, 1990) states that no orthogonal set of functions can be generated from a window that is well localized in both domains. Therefore, as we mentioned before, joint localization and orthogonality of the set of functions are properties that cannot be met simultaneously. Much work has been done to overcome the problem of the lack of orthogonality of the Gabor functions, developing efficient ways to compute or approximate the expansion coefficients a_{nm}. This will be the subject of most of this section.

A. Exact Gabor Expansion for a Continuous Signal

Here we shall follow the theoretical formulation of Daubechies (1990) and review the main approaches to solving the continuous case. For this purpose, we will introduce the so-called biorthogonal functions (Bastiaans, 1981, 1985; Porat and Zeevi, 1988) and the related Zak transform (Eizinger, 1988; Zeevi and Gertner, 1992; Bastiaans, 1994). We delay until the next subsection the discussion of the discrete case, where the computation of the coefficients is transformed into solving a linear system of (many) equations. For simplicity we restrict the discussion to the 1D case. All integrals and summations are in Z unless otherwise stated. Following Daubechies, given a set of coherent states [displaced and modulated versions of a seed function, Eq. (16)], which we will call {ψⱼ(s)} (for simplicity we consider just one index), we define an operator T that maps a function f(s) in L² (square-integrable functions) into the sequence formed by its projections onto the coherent states:

T(f(s)) = {⟨f, ψⱼ⟩},   (18)

and the corresponding operator T*, which reverses the process, mapping a sequence of coefficients {cⱼ} into a function:

T*({cⱼ}) = Σⱼ cⱼ ψⱼ(s).   (19)
Now, if we define 𝒯 as T*T, this new operator maps L² into L². T computes the projections of f(s) onto the set {ψⱼ(s)}, and T* reconstructs a function g(s) from the resulting sequence. However, in general g(s) ≠ f(s); i.e., the operator 𝒯 is not the identity. Consequently, trying to regenerate a function from its projections will not always recover the original signal.
This is, in operator notation, the already known fact that in general we cannot compute the coefficients of the Gabor expansion by simply calculating the inner products, as the set of Gabor functions is not orthogonal. To be able to reconstruct the signal, apart from T being a one-to-one map, in practice, stability is also required. This means that if two signals g(s) and f(s) are similar, their sequences of projections should be close too. Mathematically, we want

A ‖f‖² ≤ Σⱼ |⟨f, ψⱼ⟩|² ≤ B ‖f‖²,  with A > 0, B < ∞,   (20)

so that if ‖f - g‖ → 0, the sum of the squared differences of the projections should also tend to zero. The foregoing condition can be expressed using operator notation as

A I ≤ 𝒯 ≤ B I,   (21)
with I the identity matrix. A set of functions that generates an operator T complying with the foregoing conditions is said to form a frame (Duffin and Schaeffer, 1952). The constants A , B are called frame bounds and determine some important properties. A frame can be seen as a generalization of the concept of a linear basis in a Hilbert space, being able to generate the space, but leaving, in general, “too many” vectors. An irreducible frame will be a basis with linearly independent elements; otherwise, the frame is redundant with elements that are not linearly independent. There are two advantages in using redundant frames. First, redundant frames are not orthogonal, and as we mentioned before (Low-Balian theorem), relaxing the orthogonality condition permits elements with better localization. Second, the linear dependence of the elements of a redundant frame implies robustness, in the sense that the combination of elements can “do the work” of another element that is lost, destroyed, etc. Orthogonal bases are a particular case of nonredundant, linearly independent frames whose functions present bad localization properties. Using T, we can construct a dual set of functions that also constitutes a frame as +;(s)
=
T-’*;(s).
(22)
The dual frame is very useful because, if we denote by T̃ the operator that maps a function onto its projections over the dual functions, then T̃*T = T*T̃ = I; that is, the signal can be recovered exactly from its projections onto one of the two frames by using the other one as the synthesis set:

f(s) = Σⱼ ⟨f, ψⱼ⟩ ψ̃ⱼ(s) = Σⱼ ⟨f, ψ̃ⱼ⟩ ψⱼ(s).
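In a finite, discretized setting the operators of Eqs. (18)-(22) become ordinary matrices, so the frame bounds of Eqs. (20)-(21), the dual functions of Eq. (22), and the reconstruction above can all be checked directly. The following NumPy sketch is only a numerical illustration with arbitrary lattice and window parameters; it is not an efficient Gabor-analysis algorithm.

import numpy as np

S, T_step, W, sigma = 128, 8, 2 * np.pi / 16, 8.0     # oversampled lattice (illustrative)
s = np.arange(S)

atoms = np.array([np.exp(-0.5 * ((s - n * T_step) / sigma) ** 2) * np.exp(1j * m * W * s)
                  for n in range(S // T_step) for m in range(-8, 9)])

# Frame operator (the finite-dimensional version of T*T): sum_j psi_j psi_j^H.
frame_op = atoms.T @ atoms.conj()                      # S x S Hermitian matrix
eigvals = np.linalg.eigvalsh(frame_op)
print("estimated frame bounds A, B:", eigvals.min(), eigvals.max())

# Dual frame, Eq. (22): each dual atom is (T*T)^(-1) applied to the atom.
dual_atoms = np.linalg.solve(frame_op, atoms.T).T

# Reconstruction from the projections onto the original frame.
f = np.random.randn(S)
proj = atoms.conj() @ f                                # <f, psi_j>
f_rec = proj @ dual_atoms                              # sum_j <f, psi_j> psi~_j
print("reconstruction error:", np.abs(f_rec - f).max())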
[Figure caption, fragment] ... error (d): difference between (b) and (c). From Heeger. Model for the extraction of image flow. J. Opt. Soc. Am. A 4, 1455-1471. Copyright 1987 Optical Society of America. Reprinted by permission.
FIGURE39. Decoding of a random-dot stereogram by a cooperative algorithm. The stereogram appears at the top. The algorithm gradually reveals the structure through a few iterations: 0, 1, 2, 3, 4, 5, 6, 8, and 14. The different shades of gray represent different disparity values. From Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York. Copyright 1982 W. H. Freeman and Company. Used with permission of W.H. Freeman and Co.
some requirements (completeness, orthogonality) permits us to design quasicomplete redundant schemes which may constitute an interesting alternative for real-time implementations. There are a number of open issues that will certainly foster new research in the field. Apart from the choice of the basis functions and the sampling strategies (log-polar, Cartesian, etc.), we want to remark on two of them here. On the one hand, there
is still a lack of experimental data about important aspects of the visual representation. For instance, the finding of pairs of neurons in phase quadrature (Pollen and Ronner, 1983) still lacks further confirmation. On the other hand, we still need to develop widely accepted metrics and standards to evaluate objectively the quality of images and representations. These metrics must consider both how the visual system will respond to the image under test and what the visually relevant information contained in that image is.
ACKNOWLEDGMENTS

This work has been partially supported by the Spanish CICYT under grant TIC94-0849. We especially thank Dr. John Daugman for a critical revision of the manuscript and Oscar Nestares and Javier Portilla for their kind collaboration in preparing several figures and graphics.
REFERENCES Adelson, E. H.,and Bergen, J. R. (1985). J. Opt. SOC. Am. A 2, 284-299. Ahumada, A., and Tabernero, A. (1992). OSA Annual Meeting Technical Digest 23, 130-131. Akansu, A. N., and Haddad, R. A. (1992). “Multiresolution Signal Decomposition.” Academic Press, Boston. Anderson, P. (1992). Wavelet transforms and image compression. MsSci. thesis, Chalmers University of Technology, Goteborg, Sweden. Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I. (1992). IEEE Trans. Image Process. 1, 205-220. Bargmann, V . , Buttera, P., Girardello, L., and Klauder, J. R. (1971). Rep. Math. Phys. 2, 221 -228. Bartelt, H. O., Brenner, K. H., and Lohmann, A. W. (1980). Opt. Comm. 32, 32-38. Bastiaans, M. J. (1981). Opt. Engineer. 20, 594-598. Bastiaans, M. J. (1982). Opt. Acra 29, 1349-1357. Bastiaans, M. J. (1985). IEEE Trans. ASSP 33, 868-873. Bastiaans, M. J. (1994). Appl. Opf.33,5241-5255. Battiti, R., Amaldi, E., and Koch, C. (1991). Int. J. Comput. Vision 6, 133-145. Bhatia, M., Karl,, W. C., and Willsky, A. S. (1993). Proc. SPIE 2034,58-69. Blanc-Fkraud, L., Charbonnier, P., Lobel, P., and Barlaud, M. (1994). h c . IEEE ICASSP, pp. 491-494, Adelaide, Australia. Bovik, A. C., Clark, M., and Geisler, W. S. (1990). IEEE Trans. PAMI 12, 55-73. Braccini, C. Gambardella, G., Sandini, G., and Tagliasco, V. (1982). Biol. Cyber. 44, 47-58. Bradley, J . N., and Brislawn, C. M. (1993). Proc. SPIE 1961, 293-304. Braithwaite, R. N., and Beddoes, M. P. (1992). IEEE Trans. Image Process. 1, 243-234. Burt, P. L., and Adelson, E. H. (1983). IEEE Trans. Comm. 31, 532-540. Caelli, T., and Moraglia, G. (1985). Vision Res. 25, 671-684. Campbell, F. W., and Robson, J. G. (1968). J . Physiol. (Lond.) 197, 551-566.
Campbell, F. W., Cooper, G. F., and Enroth-Cugell, C. (1969). J. Physiol. (Lond.) 203, 223-235. Canny, J. (1986). IEEE Trans. PAM1 8, 679-698. Chui, C. K., Ed. (1992a). “An Introduction to Wavelets.” Academic Press, San Diego, CA. Chui, C. K., Ed. (1992b). “Wavelets: A Tutorial in Theory and Applications.” Academic Press, San Diego, CA. Claasen, T. A. C. M., and Mecklenbrauker, W. F. G. (1980). Parts I, 11, and 111. PhillipsJ. Res. 35,211-250, 276-300,372-398. Clark, M., and Bovik, A. C. (1989). Putt. Recogn. 22, 707-717. Cohen, L. (1966). J . Math. Phys. 7, 781-786. Coifman, R. R., and Donoho, D. L. (1995). In “Wavelets and Statistics” (A. Antoniadis, Ed.). Springer-Verlag, New York. Cristbbal, G., and Navarro, R. (1994a). Putt. Recog. Lett. 15, 273-277. Cristbbal, G., and Navarro, R. (1994b). Proc. IEEE-Sp Symposium on Time-Frequency TimeScale Analysis, pp. 306-309, Philadelphia, PA. Cristbbal, G., Gonzalo, C., and B e d s , J. (1991). I n “Advances in Electronics and Electron Physics” (P. W. Hawkes, Ed.), Vol. 80, pp. 309-404. Academic Press, San Diego, CA. Curcio, C. A., and Allen, K. A. (1990). J. Comp. Neurol. 300,5-25. Curcio, C. A., Sloan. K. R., Jr., Kalina, R. E., and Hendrikson, A. E. (1990). J. Comp. Neurol. 292, 497-523. Daubechies, I. (1990). IEEE Trans. Inform. Theory 36, 961-1005. Daubechies, I. (1992). “Ten Lectures on Wavelets.” SIAM, Philadelphia, PA. Daubechies, I., Grossman, A., and Meyer, Y. (1986). J . Math. Phys. 27, 1271-1283. Daugman, J. G. (1980). Ksion Res. 20, 847-856. Daugman, J. G. (1984). Viwn Res. 24,891-910. Daugman, J. G. (1985). J . Opt. SOC. A m . A 2, 1160-1169. Daugman, J. G. (1987). J. Opt. SOC.A m . A 5, 1142-1148. Daugman, J. G. (1988). IEEE Trans. ASSP 36, 1169-1179. Daugman, J. G. (1993). IEEE Tram. PAM1 15, 1148-1161. U.S.Patent 5,291,560 (March 1, 1994). Daugman, J. G., and Downing, C. J. (1995). J. Opt. SOC. A m . A 12, 641-660. DeValois, R. L., Albrecht, D. G., and Thorell, L. G. (1982a). Viion Res. 22,545-559. DeValois, R. L., Yund, E. D., and Hepler, N. (1982b). V O n Res. 22, 531-544. Devore, R. A., Jawerth, B., and Lucier. B. J. (1992). IEEE Trans. lnf. Theory 38, 719-746. Djamdji, J. P., and Bijaoui, A. (1995). IEEE Trans. Geosci. Remote Sensing 33, 67-76. Donoho, D. (1995). IEEE Trans. Info. Theory 41, 613-627. du Buf, J. M. H. (1990). S i g n a l h e s s . 21, 221-240. du Buf, J. M. H.,and Heitkamper, P. (1991). Signal Process. 23, 227-244. du Buf, J. M. H., and Fischer, S. (1995). Opt. Engineer 34, 1900-1911. Duffin, R. J., and Schaeffer, A. C. (1952). Tmns. Am. Math. SOC.72, 341-366. Dunn, D., and Huggins, W. E. (1994). IEEE Trans. 16, 130-149. Ebrahimi, T., and Kunt, M. (1991). Opt. Engineer. 30,873-880. Ebrahimi, T., Reed, T. R., and Kunt, M. (1990). Signal Process. 5, 769-772. Ehlers, M. (1991). ISPRS J. Photogrammetry Rem. Sensing 46, 19-30. Eizinger, P. D. (1988). Elec. Lett. 24, 810-811. Eizinger, P. D., Raz, S.,and Farkash, S. (1989). Elec. Letr. 25, 80-82. Enroth-Cugell, C., and Robson, J. G. (1966). J. Physiol. (Lond.) 187, 517-552. Esteban, D., and Galand, C. (1977). Proc. Intnl. Conf. on Acoust. Speech and Signal Proc. ICASSP, pp. 191-195, Washington, D.C. Fahnestock, J. D., and Schowenderdt, R. A. (1983). Opt. Engineer, 22, 378-381.
Field, D. J. (1987). J. Opt. SOC.A m . A 4, 2379-2394. Fogel, I., and Sagi, D. (1989). Biol. Cyber. 61, 103-113. Fournier, A., Ed. (1994). Wavelets and their applications in computer graphics. IGGRAPH’94 Course Notes, University of British Columbia. Freeman, W. T., and Adelson, E. H. (1991). IEEE Trans. PAMI 13, 891-906. Freysz, E., Pouligny, E., Argoul, F., and Ameodo, A. (1990). Phys. Rev. Lett. 64, 745-748. Froment, J., and Mallat, S. (1992). In “Wavelets: A tutorial in Theory and Applications” (C. K. Chui, Ed.), pp. 655-678, Academic Press, San Diego, CA. Gabor, D. (1946). J. Ins?. Electr. Eng. 93, 429-457. Gabor, D. (1965). Lab. hues?. 14, 801-807. Genossar, T., and Porat, M. (1992). IEEE Trans. Systems, Man Qbemetics 22, 449-460. Geri, G., Lyon, D. R., and Zeevi, Y. Y.(1995). Vision Res. 35,495-506. Gonzdlez, R. C. (1986). In “Handbook of Pattern Recognition and Image Processing” (T. Y. Young and K. S. Fu, Eds.), pp. 191-213, Academic Press, San Diego, CA. Greenspan, H., Goodman, R., Chellappa, R., and Anderson, C. H. (1994). IEEE Trans. PAMI 16, 894-901. Gross, M. H., and Koch, R. (1995). IEEE Trans. Viualization Computer Graphics 1, 44-59. Grossman, A., and Morlet, J. (1984). SL4M J. Math. 15, 723-736. Haar, A. (1910). Math. Ann. 69, 331-371. Haralick, R. M., and Shapiro, L. G. (1992). “Computer and Robot Vision,” Vol. I. AddisonWesley, Reading, MA. Haralick, R. M., and Shapiro, L. G. (1993). “Computer and Robot Vision,” Vol. 11. AddisonWesley, Reading, MA. Hawken, M. J., and Parker, A. J. (1987). Proc. R . SOC. Lond. B 231, 251-288. Hebb, D. 0. (1949). “The Organization of Behaviour.” John Wiley & Sons, New York. Heeger, D., and Pentland, A. P. (1986). IEEE Proc. Workshop on Motion: Representation and Anafysis, pp. 131-136, Charleston, SC. Heeger, D. J. (1987). J. Op?. SOC. Am. A 4, 1455-1471. Heitger, F., Rosenthaler, L., von der Heydt, R., Peterhans, E., and Kubler, 0. (1992). Viion Res. 32, 963-981. Helstrom, C. W. (1966). IEEE Trans. lnf. Theory. 12, 81-82. Hilton, M. L., Jawerth, B. O., and Sengupta, A. (1994). Mulrimedia Sys?ems 2, 218-227. Horn, B. K. P., and Schunk, B. G. (1981). Artif. Inrell. 17, 185-203. Hubel, D. H., and Wiesel, T. N. (1962). J. Physiol. (Lond.) 160, 106-154. Hummel, R., and Marriot, R. (1989). IEEE Trans. ASSP. 37, 2111-2130. Jacobson, L. D., and Wechsler, H. (1988). Signal Process. 14, 37-68. Jiihne, B. (1991). “Digital Image Processing.” Springer-Verlag, Berlin. Jain, A. K. (1989). “Fundamentals of Digital Image Processing,” Prentice Hall, Englewood Cliffs, NJ. Jain, A. K., and Bhattachajee, S. K. (1992). Pa??. Recog. 25, 1459-1477. Jain, A. K., and Farrokhinia, F. (1991). Pa??.Recog. 24, 1167-1186. Jones, J., and Palmer, L. (1987). J. Neurophyswl. 58, 1233-1258. Julesz, B., Gilbert, E. N., Sheep, L. A., and Frisch, H. L.(1973). Perception 2, 391-405. Kelly, D. H. (1990). Proc. SPIE 1249, 90-117. Klein, S. A., and Levi, D. M. (1985). J. Op?. Soc. Am. A 2, 1170-1190. Kritikos, H. N., and Farnum, P. T. (1987). IEEE Trans. System, Xan, Qbemerics 17, 978-981. Kulikowski, J. J., MarEelja, S., and Bishop, P. 0. (1982). Biol. Cybet. 43, 187-198. Landy, M. S., and Movshon, J. A., Eds. (1991). “Computational Models of Visual Processing.” MIT Press, Cambridge, MA.
Lang, M., Guo, H., Odegard, J. E., and Burrus, C. S. (1995). Proc. SPIE 2491, 640-651. Lau, P., Papanikolopoulos, N. P., and Boley, D. (1993). Elec. Left. 29, 2182-2183. Leipnik, R. (1959). Information Control 2, 64-79. Leipnik, R. (1960). Information Control 3, 18-25. Levingston, M., and Hubel, D. (1988). Science 240, 740-749. Lewis, A. S., and Knowles, G. (1992). IEEE Trans. Image Process. 1, 244-250. Li, H., Manjunath, B. S., and Mitra, S. K. (1995). Gruph. Models Image Process. 57, 235-245. Li, Y., and Zhang, Y. (1992). Opt. Engineer. 31, 1865-1885. Lindenbaum, M., Fischer, M., and Bruckstein, A. (1994). Putt. Recog. 27, 1-8. Lourens, T., Petkov, N., and Kruizinga, P. (1994). Fututv Gener. Syst. 10, 351-358. MacLeod, 1. D. G., and Rosenfeld, A. (1974). Vision Res. 14, 909-915. Maffei, L., and Fiorentini, A. (1973). Vision Res. 13, 1255-1267. Malik, J., and Perona, P. (1990). J . Opr. SOC.A m . A 7, 923-932. Mallat, S. G. (1989a). IEEE Trans. ASSP 37, 2091-2110. Mallat, S. G. (1989b). IEEE Trans. PAMI 11, 674-693. Mallat, S. G., and Hwang, W. L. (1992). IEEE Tmns. Info. Theury 38, 617-643. Mallat, S. G., and Zhong, S. (1992). IEEE Trans. PAMI 14, 710-732. Malvar, H. S. (1989). IEEE Trans. ASSP 37,553-559. MarEelja, S . (1980). 1. Opr. SOC.Am. 70, 1297-1300. Marr, D. (1982). “Vision: A Computational Investigation into the Human Representation and Processing of Visual Information.” Freeman, New York. Marr, D., and Hildreth, E. C. (1980). Proc. R. Sac. Lond. B 207, 187-217. Marr, D., and Poggio, T. (1976). Science 194, 283-287. Marr, D., and Poggio, T. (1979). Proc. R. Sac. Lond. B 204, 301-328. Martens, J. B. (1990a). IEEE Trans. ASSP 38, 1595-1606. Martens, J. B. (1990b). IEEE Trans. ASSP 38, 1607-1618. Martinez-Uriegas, E., Peters, J. D., and Crane, H. D. (1993). Proc. SPIE 1913, 462-472. Mehrotra, R., Namuduri, K. R., and Ranganathan, N. (1992). Putt. Recog. 25, 1479-1494. Meyer, Y.(1988). “Ondelettes et OpCrateurs.” Hermann, Paris. Meyer, Y. (1993). “Wavelets, Algorithms and Applications.” SIAM, Philadelphia, PA. Miller, K. D. (1990). In “Neuroscience & Connectionist Theory” (M.A. Cluck and D. E. Rumnelhart, Eds.), pp. 267-353, Lawrence Erlbaum Associates, Hillsdale, NJ. Miller, K. D., Keller, J. B., and Stryker, M. P. (1989). Science 245, 605-615. Morlet, J., Forgeau, I., and Giard, D. (1982). Geophysics 47, 203-236. Movhson, J. A., Thompson, 1. D., and Tolhurst, D. J. (1978a). J. Physiol. (Lond.) 283, 53-77. Movhson, J. A., Thompson, I. D., and Tolhurst, D. J. (1978b). J . Physiol. (Lond.) 283, 79-99. Movhson, J. A,, Thompson, I. D., and Tolhurst, D. J. (1978~).J . Physiol. (Lond.) 283, 101-120. Navarro, R., and Tabernero, A. (1991). Multidim. Sys. Signal Process. 2, 421-436. Navarro, R., Santamaria, J., and Gbmez, R. (1987). Asiron. Astrophys. 174, 344-351. Portilla, J., and Tabernero, A. (1995). lnstituto de Optica (CSIC), Navarro, R., Nestares, 0.. Technical Report no. 51, Madrid, Spain. Nill, N. B., and Bouzas, B. H. (1992). Opt. Engineer. 31, 813-825. Olson, T., and DeStefano, J. (1994). IEEE Trans. Signal Process. 42, 2055-2067. Papoulis, A. (1989). “Probability, Random Variables and Stochastic Processes.” McGraw-Hill, New York. Pattison, T. R. (1992). Biol. Cyber. 67, 97-102. Peli, E. (1987). Opf. Engineer. 87, 655-660. Peli, E. (1990). J. Opf.Sac. Am. A 7, 2032-2040. Perona, P. (1995). IEEE Trans. Pan. Anal. Machine Intell. PAMI 17, 488-499.
Peyrin, R., Zaim, M., and Goutte, R. (1993). J . Math. Imaging h i o n 3, 105-121. Pollen, D. A., and Ronner, S. F. (1983). IEEE Trans. Systems, Man, Cybernetics 13, 907-916. Porat, M., and Zeevi, Y. Y. (1988). IEEE Trans. PAM1 10, 452-468. Porat, M., and Zeevi, Y. Y. (1989). IEEE Trans. Biomed. Eng. 36, 115-129. Qian, S., and Chen, D. (1993). IEEE Trans. Signal Process. 41, 2429-2438. Qian, S., Chen, K, and Li, S. (1992). SignalProcess. 27, 177-185. Rao, K. R. (1990). “Discrete Cosine Transform: Algorithms, Advantages, Applications.” Academic Press, Boston. Rao, K. R., and Ben-Arie, J. (1993). Analog Integr. Circuit Signal Process. 4, 141-160. Rioul, 0.. and Vetterli, M. (1991). IEEE Signal Process. Mag. 8, 14-38. Rodieck, R. W. (1965). Vision Res. 5, 583-601. Rolls, E. T., and Cowey, A. (1970). Exp. Brain Res. 10, 298-310. Rovamo, J., Virsu, V., and Nasanen, R. (1978). Nature 271, 54-56. Sagi, D. (1990). Vl.Res. 30, 1377-1388. Sakkit, B., and Barlow, H. B. (1982). Biol. Cyber. 43,97-108. Sanger, T. D. (1989). Neural Networks 2, 459-473. Santamaria, J., and Gbmez, M. T. (1993). Proc. Annual Meeting of the European Opiical Society, EOS’93, pp. 97-98, Zaragoza, Spain. Shannon, C. E. (1948). Bell Syst. Tech. J . 27, 370-423, 623-656. Sheng, Y., Roberge, D., and Szu, H. H. (1992). Opt. Engineer. 31, 1840-1845. Shustorovich, A. (1994). Neural Networks 7, 1295-1301. Simoncelli, E. P. (1993). Distributed representation and analysis of visual motion. Ph.D. Thesis, MIT, Cambridge, MA. Simoncelli, E. P., and Adelson, E. H. (1990). In “Subband Image Coding” (J. W. Woods, Ed.), Chap. 4, Kluwer, Norwell, MA. Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992). IEEE Trans. lnf. Theory 38,587-607. Stockman, T. G., Cannon, T. M., and Ingebretsen, R. G. (1975). Proc. IEEE 63,678-692. Stork, D. G., and Wilson, H. R. (1990). J. Opt. SOC.Am. A 7, 1362-1373. Stryker, M. P., and Harris, W. (1986). J. Neurosci. 6, 2117-2133. Super, B. J., and Bovik, A. C. (1991). J. V h a 1 Comm. Image Repres. 2, 114-128. Sutter, A., Beck, J., and Graham, N. (1989). Percept. Psychophys. 46,312-332. Tabernero, A., and Navarro, R. (1990). Annual meeting of the Optical Society of America Boston, Conference Digest, 25. Tabernero, A., and Navarro, R. (1993a). Perception 22 (Suppl.), 130-131. Tabernero, A., and Navarro, R. (1993b). In “Optics in Medicine, Biology and Environmental Research” (G. von Bally and S. Khanna, Eds.), Vol. 1, pp. 272-274, Elsevier, Amsterdam. Tanimoto, S., and Pavlidis, T. (1975). Comp. Graphics Image Process. 4, 104-119. Teo, P. C., and Heeger, D. (1994). Proc. SPlE 2179, 127-141. Thomas, J. P., and Gille, J. (1979). J. Opt. SOC. Am. 69, 652-660. Toet, A. (1992). Opt. Engineer. 31, 1026-1031. Turner, M. (1986). Biol. Cyber. 55, 71-82. Van Essen, D. C., Newsome, W. T., and Maunsell, J. H. R. (1984). hlion Res. 24, 429-448. Ville, J. (1948). Cables ef Transmission 2A,61-74. Wang, H., and Yan, H. (1992). Elec. Lett. 28, 1755-1756. Wang, H., and Yan, H. (1993). J. Elecf. Imaging 2,38-43. Watson, A. B. (1983). In “Physical and Biological Processing of Images” (A. C. Slade, Ed.), pp. 100-114, Springer-Verlag, Berlin. Watson, A. B. (1987a). Comp. &ion, Graph. Image Process. 39, 311-327. Watson, A. B. (1987b). J . Opt. SOC. Am. A 4, 2401-2417.
Watson, A. B. (1990). J. Opi. SOC. Am. A 7 , 1943-1954. Watson, A. B., Ed. (1993). “Digital Images and Human Vision.” MIT Press, Cambridge, MA. Watson, A. B., and Ahumada, J. A. J. (1985). J. Opi. SOC. Am. A 2, 322-341. Watson, A. B., and Ahumada, J. A. J. (1989). IEEE Trans. Bwmed. Eng. 36.97-106. Watson, A. B., and Turano, K. (1995). Viiion Res. 35,325-336. Weber, J., and Malik, J. (1995). h i . J . Comp. Vision 14, 67-81. Webster, M. A., and DeValois, R. L. (1985). J. Opi. SOC.Am. A 2, 1124-1132. Weiman, C. F. R., and Chaikin, G. (1979). Compuier Graphics Image Process. 11, 197-226. Wexler, J., and Raz, S. (1990). Signal Process. 21, 201-220. Wigner, E. (1932). Phys. Rev. 40, 749-759. Wilson, H. R., and Bergen, J. R. (1979). Viiion Res. 19, 19-32. Woods, J. W., and ONeil, S. D. (1986). IEEE Trans. ASSP 34, 1278-1288. Yao, J. (1993). IEEE Trans. Image Process. 2, 152-159. Young, R. A. (1985). General Motors Research Labs. Technical Report GMR-4920. Young, R. A. (1987). Spatial Viiwn 2,213-293. Young, R. A. (1993). General Motors Research Labs. Technical Report GMR-7878. Yuille, A. L., Kammen, D. M. (1989). Biol. Cyber. 61, 183-194. Zak, J. (1967). Phys. Reu. Lett. 19, 1385-1397. Zeevi, Y. Y., and Gertner, I. (1992). J. Viual Comm. Image Repres. 3, 13-23. Zibulski, M., and ZRevi, Y. Y. (1993). IEEE Trans. Signal Process. 41, 2679-2687.
Models and Algorithms for Edge-Preserving Image Reconstruction
L. BEDINI, I. GERACE, E. SALERNO, AND A. TONAZZINI
Consiglio Nazionale delle Ricerche, Istituto di Elaborazione della Informazione, Via Santa Maria 46, I-56126 Pisa, Italy
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
   A. Regularization and Smoothness . . . . . . . . . . . . . . . . . . . 86
   B. Accounting for Discontinuities . . . . . . . . . . . . . . . . . . . . 89
   C. Edge-Preserving Reconstruction Algorithms . . . . . . . . . . . . 91
   D. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
II. Inverse Problem, Image Reconstruction, and Regularization . . . . 94
   A. Objects, Observations, and the Direct Problem . . . . . . . . . . 94
   B. Data and the Inverse Problem . . . . . . . . . . . . . . . . . . . . 95
   C. Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
III. Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
   A. Composition of States of Information . . . . . . . . . . . . . . . . 98
   B. Solving the Inverse Problem . . . . . . . . . . . . . . . . . . . . . 99
   C. Optimal Estimators Based on Cost Functions . . . . . . . . . . . 101
   D. The Gaussian Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
IV. Image Models and Markov Random Fields . . . . . . . . . . . . . . 104
   A. MRFs and Gibbs Distributions . . . . . . . . . . . . . . . . . . . . 106
   B. Introducing Discontinuities . . . . . . . . . . . . . . . . . . . . . . 108
V. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
   A. Monte Carlo Methods for Marginal Modes and Averages . . . . 119
   B. Stochastic Relaxation for MAP Estimation . . . . . . . . . . . . . 120
   C. Suboptimal Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 124
VI. Constraining an Implicit Line Process . . . . . . . . . . . . . . . . . 129
   A. Mean Field Approximation . . . . . . . . . . . . . . . . . . . . . . 131
   B. Extended GNC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
   C. Sigmoidal Approximation . . . . . . . . . . . . . . . . . . . . . . . 137
VII. Determining the Free Parameters . . . . . . . . . . . . . . . . . . . . 141
   A. Regularization Parameter . . . . . . . . . . . . . . . . . . . . . . . 143
   B. MRF Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . 146
   C. Parameter Estimation from Training Data . . . . . . . . . . . . . 149
VIII. Some Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
   A. Explicit Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
   B. Implicit Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
IX. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. INTRODUCTION

Image restoration and reconstruction are fundamental in image processing and computer vision. Indeed, besides being very important per se, they are preliminary steps for recognition and classification and can be considered representative of a wide class of tasks performed in the early stages of biological and artificial vision. As is well known, these are ill-posed problems: a unique and stable solution cannot be found from the observed data alone, so regularization techniques must always be used. The rationale is to force some physically plausible constraints on the solutions by exploiting a priori information. The most common constraint is to assume globally smooth solutions. Although this may render the problem well-posed, it was evident from the start that the results were not satisfactory, especially when working with images where abrupt changes are present in the intensity. As these discontinuities play a crucial role in coding the information present in the images, many researchers tried to introduce refinements in the regularization techniques to preserve them. One idea was to introduce, as constraints, functions that vary locally with the intensity gradient, so as to weaken the smoothness constraint where it has no physical meaning. Another approach, closely related to the first, is to consider discontinuities as explicit unknowns of the problem and to introduce constraints on their geometry. In both approaches, the computation is extremely complex, and thus several algorithms have been proposed to find the solution within feasible computation times. In this chapter we begin with Tikhonov’s regularization theory and then formalize the edge-preserving reconstruction and restoration problems in a probabilistic framework. We review the main approaches proposed to force locally varying smoothness on the solutions, together with the related computation schemes. We also report some of our results in the fields of restoration of noisy and blurred or sparse images and of image reconstruction from projections.

A. Regularization and Smoothness
From a mathematical point of view, image restoration and reconstruction, as well as most problems of early vision, are inverse and ill-posed in the sense defined by Hadamard (Poggio et al., 1985; Bertero et al., 1988). This means that the existence, uniqueness, and stability of the solution cannot be guaranteed (see Courant and Hilbert, 1962). This is due to the fact that information is lost in the transformation from the image to the data,
especially in applications where only a small number of noisy measurements are available. To compensate for this lack of information, a priori knowledge should be exploited to “regularize” the problem, that is, to make the problem well-posed and well-conditioned, so that a unique and stable solution can be computed (Tikhonov, 1963; Tikhonov and Arsenin, 1977). In general, a priori knowledge consists of some regularity features for the solution and certain statistical properties of the noise. One approach to regularization consists of introducing a cost functional, which is obtained by adding stabilizers, expressing various constraints on the solution, to the term expressing data consistency. Each stabilizer is weighted by an appropriate parameter. The solution is then found as the minimizer of this functional (Poggio et al., 1985; Bertero et al., 1988). A number of different stabilizers have been proposed; their choice is related to the implicit model assumed for the solution. In most cases, such models are smooth in some sense, as they introduce constraints on global smoothness measurements. In standard regularization theory (Tikhonov and Arsenin, 1977), quadratic stabilizers, related to linear combinations of derivatives of the solution, are used. It has been proved that this is equivalent to restricting the solution space to generalized splines, whose order depends on the orders of the derivatives (Reinsch, 1967; Poggio et al., 1985). Another classical stabilizer is entropy, which leads to maximum entropy methods. Many authors insisted on the superiority of the entropy stabilizer over any different choice. Maximum entropy has indeed two indisputably appealing properties. First, it forces the solution to be always positive. Second, it yields the most uniform solution consistent with the data, ensuring that the image features result from the data and are not artifacts. For this reason, maximum entropy methods have been extensively studied and used in image restoration/reconstruction problems (Minerbo, 1979; Burch et al., 1983; Gull and Skilling, 1984; Frieden, 1985). In Leahy and Goutis (1986) and Leahy and Tonazzini (1986), the model-based interpretation of regularization methods was well formalized and the explicit form of the model for the solution was given for a set of typical stabilizers. This interpretation shows that no stabilizer can be considered superior to the others, which means that it should be chosen from our prior expectations about the features of the solution. Another approach to regularization, which proves to be intimately related to the variational approaches, is the Bayesian approach. The solution and the data are considered as random variables and all kinds of information as a suitable probability density, from which some optimal solution must be extracted. The reconstruction problem is thus transformed into an inference problem (Jaynes, 1968, 1982; Backus, 1970;
Franklin, 1970). Tarantola (1987) proposed a general inverse problem theory, completely based on Bayesian criteria and fully developed for discrete images and data. Tarantola argued that any existing inversion algorithm can be embedded in this theory, once the appropriate density functions and estimation criterion have been established. Tarantola’s theory enables a deep insight into inverse problems, and it can also be used to interpret or compare different results or algorithms. However, translating each state of information into an appropriate density function is one of the difficulties of this theory. Once this has been done, the so-called prior density, expressing the extra information, is combined with the likelihood function, derived from the measurements and from the data model, thus resulting in the posterior density. This can be maximized, to give the maximum a posteriori (MAP) estimate, or used to derive other estimates. One solution is to look for the estimate that minimizes the expected value of a suitable error function, such as the MPM (maxima of the posterior marginals) and the TPM (thresholded posterior means) estimates. These estimates minimize the expected value of the total number of incorrectly estimated elements and the sum of the related square errors, respectively. Whereas MPM and TPM have a pure probabilistic interpretation, the MAP estimate can be seen as a generalization of the variational approach described above. Indeed, a cost functional can always be seen as the negative exponent (posterior energy) of an exponential form expressing a posterior density. Thus, minimizing a cost functional is equivalent to maximizing a posterior density. From this point of view, the stabilizer can also be seen as the negative logarithm of the prior and the prior as the exponential of the negative stabilizer. By virtue of this equivalence, hereafter the terms “cost functional” and “energy” will be used indifferently. In many cases, the cost functional is convex. This means that standard descent algorithms can be used to find the unique minimum (Scales, 1985). Nevertheless, because the dimension of the space where the optimization is performed is the same as the image size (typically 256 X 256 pixels or more), the cost for implementing these techniques is very high. This is especially true when the cost functional is highly nonquadratic, as in the case of the entropy stabilizer. Neural networks could be a powerful tool for solving convex, even nonquadratic, optimization problems. This is related to the ability of a stable continuous system to reach an equilibrium state, which is the minimum of an associated Liapunov function (La Salle and Lefschetz, 1961). Electrical analog models of neural networks have been proposed as a basis for their practical implementation (Poggio and Koch, 1985; Poggio, 1985; Koch et al., 1986). The computation power of these circuits is based
on the high connectivity typical of neural systems and on the convergence speed of analog electric circuits in reaching stable states. In Bedini and Tonazzini (1990, 19921, we suggested using the Hopfield neural network model (Hopfield, 1982, 1984, 1985; Hopfield and Tank, 1986) to effectively solve the problem of the restoration of blurred and noisy images. Another problem arising in the variational approach to regularization is the choice of the parameters in the cost functional. Regarding a convex cost functional as the Lagrangian associated with a constrained minimization problem and the parameters as the Lagrange multipliers, the necessary conditions for the minimum also specify equations to be satisfied by the Lagrange multipliers (Luenberger, 1969). In many cases, however, the solution of these equations is a formidable computational problem, because they are nonlinear. Bedini et al. (1991) proposed a different method that allows the Lagrange multipliers to be estimated with a relatively low cost. The method is based on the primal-dual theory for solving convex optimization problems (Luenberger, 1984). The original, primal, problem is reformulated in an equivalent form so that the related dual problem can be solved through a single unconstrained maximization. The solution of the original problem is then related to the solution of the dual problem through a model that depends on the particular stabilizer adopted. The method has been derived for the restoration of blurred, noisy images and for different kinds of stabilizers, such as cross-entropy and energy. An alternative, which is common in standard regularization, is merely to consider the regularization parameters as weights that balance data consistency and a priori information in the cost functional. The choice of these weights is still a critical task, as they considerably affect the quality of the reconstructions, so that some objective criteria should be devised to determine them. Some of these criteria are shown in Tikhonov and Arsenin (1977), Golub el al. (1979), and De Mol (1992).
B. Accounting for Discontinuities
out excessively penalizing the high gradients that occur at the boundaries between different regions. Some authors suggested using nonquadratic yet still convex stabilizers (Besag, 1989; Green, 19901, and others proposed nonconvex stabilizers that have a finite asymptotic behavior at infinity (Geman and McClure, 1985, 1987; Blake and Zisserman, 1987a, b; Geman and Reynolds, 1992). Another approach to treating image discontinuities is to augment the cost functional by means of the explicit introduction of a line process. Terzopoulos (1986, 1988) introduced a class of “controlled-continuity” stabilizers that preserve the discontinuities by spatially controlling the smoothness of the image. Discontinuities must be located in advance (Terzopoulos, 1986; March, 1988) or can be considered unknowns of the problem (Terzopoulos, 1988; March, 1989). Blake and Zisserman (1987a, b) and Mumford and Shah (1989) proposed variational techniques to optimize a functional where the interaction between the intensity field and the unknown discontinuity set is described by a particular weak membrane energy. Blake and Zisserman assumed a discrete image model and proposed a graduated nonconvexity (GNC) algorithm to recover a piecewise smooth solution image, by first eliminating the binary line process from the weak membrane energy. In a continuous setting, Mumford and Shah established a cost functional with a singular part representing the discontinuity set. The numerical optimization of such a functional is a very difficult problem. March (1992) uses the r-convergence theory (see De Giorgi, 1977; Ambrosio and Tortorelli, 1990) to put the cost functional of Mumford and Shah into a more tractable form. The singular cost functional is transformed into an elliptic functional, which can be minimized by standard numerical techniques. The discontinuity set can be precisely located by means of a sequence of smooth functions, converging to Terzopoulos’ continuity control function. Geman and Geman (1984) outlined the advantages of using a Bayesian approach in which the image is modeled as a pair of Markov random fields (MRFs). One of them (the intensity process) represents the multilevel field of the pixel values; the other (the line process) is a two-level field representing the discontinuities. The Clifford-Hammersley theorem establishes the equivalence between MRFs and Gibbs distributions. The local correlations between image elements can thus be expressed in the form of Gibbs priors. This is particularly useful, as it allows many image features to be accounted for by introducing simple local terms into the prior. It is also flexible in exploiting certain physical and geometrical constraints on the lines, such as their smoothness and connection features. In this approach, the solution to the reconstruction problem has generally been defined as a MAP estimate, although other estimators have been
proposed (Marroquin, 1984; Marroquin et al., 1987). Nevertheless, the nonconvexity of the cost functionals arising in these cases entails using minimization algorithms with extremely high complexity. Many attempts have been made to reduce the complexity of nonconvex optimization or to devise algorithms at feasible costs. As already seen, when the stabilizer acts locally on the image intensity and satisfies certain properties, the discontinuities can be preserved without introducing extra variables. Geman and Reynolds (1992) established a strict relationship between explicit and implicit line treatment. They derived a “duality theorem,” which relates a class of primal energies with implicit lines and a class of dual energies with explicit lines. Primal and dual energies are equivalent, in that their global minima over the intensity variables coincide. The duality theorem states the conditions under which this equivalence exists and gives a tool for deriving the dual from the primal. Implicit line treatment is interesting because efficient, although suboptimal, deterministic algorithms can be used instead of the stochastic algorithms usually required for explicit line treatment. These algorithms have mainly been developed for noninteracting lines. This means that significant constraints are not enforced on line geometry. On the other hand, prior information on the interactions between lines is available in many reconstruction problems. For instance, the discontinuities associated with edge contours are often connected (hysteresis) and thin (nonmaximum suppression). Introducing this information into the problem would greatly improve the quality of the reconstructed image. Thus, some authors proposed approaches which are suitable for treating interacting lines in implicit form. They found that some approximations are needed. Geiger and Girosi (1991) proposed the mean field theory to average out the binary line process from the weak membrane energy and derived some ad hoc approximations to enforce the line continuation constraint. In the context of a GNC algorithm for image reconstruction, we adopted a different approximation, which permits connected and thin lines to be obtained (Bedini et al., 1994a).
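A minimal numerical sketch may help fix ideas about eliminating the binary line process. In one common formulation of the weak membrane, each local smoothness term has the form λ²t²(1 − l) + αl, with t the local intensity difference and l ∈ {0, 1} the line element; minimizing over l yields the truncated quadratic min(λ²t², α), the corresponding implicit-line potential. The values of λ and α below are arbitrary, and the fragment only illustrates this elementary identity; it is not the reconstruction code discussed later in the chapter.

import numpy as np

lam, alpha = 2.0, 1.0            # illustrative smoothness weight and line penalty
t = np.linspace(-2, 2, 401)      # local intensity difference u_i - u_{i-1}

# Weak-membrane local energy with an explicit binary line element l:
e_no_line = lam**2 * t**2            # l = 0: smoothness enforced
e_line = alpha * np.ones_like(t)     # l = 1: pay alpha, no smoothness penalty

# Minimizing over the binary line variable gives the implicit-line potential:
g_implicit = np.minimum(e_no_line, e_line)       # min(lam^2 t^2, alpha)

# The optimal line element is "on" wherever breaking the surface is cheaper,
# i.e., for |t| larger than sqrt(alpha)/lam.
print("discontinuity threshold sqrt(alpha)/lambda =", np.sqrt(alpha) / lam)
print("implicit potential saturates at alpha =", g_implicit.max())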
C. Edge-Preserving Reconstruction Algorithms
As already seen, in regularization techniques involving discontinuities the cost functional is not convex, and the usual descent algorithms do not ensure that the global minimum will be found. In principle, the computation thus has to be performed by using stochastic relaxation algorithms, to
avoid local minima at which descent algorithms would get stuck. Most of these algorithms employ simulated annealing techniques. Despite their convergence properties, these algorithms are computationally very heavy, owing to the size of the problems treated and the number of iterations required. Two main strategies have been adopted to reduce the complexity of the problem and/or the execution times. In the first, the problem is transformed or approximated to permit the use of totally or partially deterministic algorithms. In the second, parallel algorithms are studied, for use on general-purpose or dedicated parallel architectures. With an application for the restoration of blurred and noisy images with explicit line treatment, Geman and Geman (1984) proposed finding the MAP estimate through a parallel Gibbs sampler algorithm. This is possible in image restoration, in that the posterior distribution is still Gibbsian, with relatively small neighborhoods. Parallel approaches to image reconstruction from projections (or any other problem that has the degradation operator with a broad support) are not so direct, because the posterior probability is no longer Gibbsian. In Bedini and Tonazzini (1992) a mixed-annealing algorithm is proposed for the parallel computation of the MAP estimate, which is suitable for both image restoration and reconstruction. In this algorithm, the minimization is performed by an annealing scheme, which can be considered as the cooperation of two computational blocks. The first performs a quadratic minimization over the continuous intensity variables and can be effectively implemented using a linear neural network. The second block updates the binary line process by means of a Gibbs sampler and can be implemented by a grid of processors working in parallel. In this hybrid architecture, the linear neural network would support most of the computation. Iterated conditional modes (ICM) is a deterministic algorithm proposed by Besag (1986) for discrete intensity fields. ICM approximates the MAP estimate by computing the maximum of the posterior probability of each image element, conditioned on the values assumed by all other elements at the previous iteration. Extensions to continuous intensity fields are interesting when simple closed forms for the solutions can be derived. ICM can also be used to update the binary line process, for instance, within a mixed-annealing scheme. In these cases, however, better results can be obtained by a slight modification of ICM, called iterated conditional averages (ICA), which can prevent the (continuous) line process from converging faster than the intensity process (Johnson et al., 1991). Blake and Zisserman (1987a, b) derived their cost functional for implicit line treatment by eliminating the binary line process from the weak membrane energy. Because it remains nonconvex, these researchers developed a special optimization algorithm, which is based on the minimization
of a series of approximations of the original cost functional. The minimizations can be performed by standard gradient descents. This is the graduated nonconvexity algorithm, which aroused much interest for its simplicity and reduced computational complexity compared with stochastic approaches. In their mean field annealing approach, Geiger and Girosi (1991) provided a parametric family of energy functions, converging to the same cost functional as Blake and Zisserman’s, and showed that, when applied to this family, GNC can be seen as a deterministic annealing. They approximated the global minimum by iteratively minimizing the energy functions through the solution of deterministic equations. Geman and Yang (1994) proposed a linear algebraic method for implementing regularization with implicit discontinuities. When suitable auxiliary variables are introduced, the posterior distribution becomes Gaussian in the intensity variables, with a block circulant covariance matrix. A simulated annealing algorithm with simultaneous updating of all the pixels can thus be designed using fast Fourier transform (FFT) techniques. In emission tomography, iterative deterministic algorithms based on the expectation-maximization method have been adopted for maximum likelihood estimation (Dempster et al., 1977). Since expectation-maximization exploits knowledge on the random nature of the physical data generation process, these algorithms produce better reconstructions than those of the classical methods, such as filtered backprojection. Generalized expectation-maximization (GEM) algorithms have also been proposed to solve the MAP problem that arises when Gibbs priors are introduced to stabilize the solutions. These techniques can address discontinuities either implicitly or explicitly (Hebert and Leahy, 1989; Gindi et al., 1991; Leahy and Yan, 1991). Like mixed annealing, GEM can handle mixed continuous and binary variables by splitting each iteration into two independent steps, one acting on the continuous variables and the other on the binary variables. For this reason, GEM permits the incorporation of various forms of interactions among discontinuities. GEM algorithms have also been derived for transmission tomography from Poisson data (Lange and Carson, 1984). In Salerno et al. (1993) and Bedini et al. (1994d) we considered the GEM approach for transmission tomography, assuming a Gaussian data model and explicit discontinuities.
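As a toy illustration of the ICM idea described above, consider the simplest possible model (identity degradation, Gaussian noise, and a quadratic smoothness prior, all of which are illustrative assumptions, not the models used later in the chapter): the conditional posterior of each pixel given its neighbors is then Gaussian, and its mode has a closed form, so each sweep simply sets every pixel to that mode using the values of the previous iteration.

import numpy as np

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64)); x_true[16:48, 16:48] = 1.0      # piecewise-constant scene
g = x_true + 0.3 * rng.standard_normal(x_true.shape)          # noisy observation

lam = 2.0                       # smoothness weight (illustrative)
x = g.copy()
for _ in range(20):             # ICM sweeps
    neigh_sum = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
                 np.roll(x, 1, 1) + np.roll(x, -1, 1))
    # For the energy (x_i - g_i)^2 + lam * sum_neigh (x_i - x_j)^2,
    # the conditional mode of pixel i given its four neighbors is:
    x = (g + lam * neigh_sum) / (1.0 + 4.0 * lam)
print("RMSE noisy:", np.sqrt(((g - x_true) ** 2).mean()),
      "RMSE ICM:", np.sqrt(((x - x_true) ** 2).mean()))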
D. Overview

This chapter is a review (by no means complete) of edge-preserving regularization for image reconstruction and restoration. It obviously reflects our particular point of view and is particularly influenced by our experience in this field. We first describe regularization and the problem of discontinuities in the deterministic and Bayesian approaches (Sections II and III). In Sections IV and V, we introduce our point of view on image reconstruction, which is Bayesian and considers discrete images and data. Some of the issues that are still open in this field, namely the introduction of constraints in the discontinuity set and parameter estimation, are treated in Sections VI and VII, respectively. In Section VIII, we show some applications of the techniques and algorithms described, taken from our previous work, regarding image restoration and tomography. We present the results of some experiments, showing the influence of different methods for introducing the discontinuities in the image model.
II. INVERSE PROBLEM, IMAGE RECONSTRUCTION, AND REGULARIZATION
In this section, we define the inverse problem of image reconstruction and restoration and introduce an approach for its solution, which is developed in the following sections. Although the physical processes involved and the numerical difficulties in implementing and executing the algorithms are very different in different cases, reconstruction and restoration are formally analogous and can be treated in a unified framework. For this reason, we speak of image reconstruction, meaning both the problems addressed in this chapter. Both the image and the data to be measured can be modeled as continuous or discrete functions. In this respect, the data generation model can be continuous-continuous, continuous-discrete, or discrete-discrete (Andrews and Hunt, 1977). However, most works assume a discrete-discrete model, so we too will refer to this type of model, although the formal development of the theory is often more general. In this section in particular, all the relations are valid for any kind of model.

A. Objects, Observations, and the Direct Problem
Consider an N-dimensional vector space X that contains the images to be reconstructed, x, whose elements are the image pixels; we call it the object space. Suppose that there exists an operator, 𝒜, related to some physical process, which maps the object space onto another M-dimensional vector space, Y (the observation space), namely, which transforms each x ∈ X into a unique element y ∈ Y, so that y = 𝒜(x). The study of the physical system consists of deriving a model for 𝒜. Because the physical process may be very complex or even partially unknown, this model is generally an approximated operator, A. For example, in image restoration A could be a convolution operator, with a certain point spread function; in tomographic reconstruction, A could be the Radon transform. In these two examples, the model A is linear and approximates sufficiently well the real process. This situation is very frequent in practical applications. A linear approximation for the true operator 𝒜 enables the exploitation of the very powerful tools of linear algebra. Computing y from x will be referred to as the direct problem, because the goal is to obtain the effect, or response, of a known physical system to the input object, or stimulus. In a causal world, the solution of the direct problem is unique, but there will always be some uncertainty. In fact, operator A introduces an error n₁ (the model error), because, as already mentioned, some physical approximations are introduced to model the data generation process. It is

y = Ax + n₁.   (1)
B. Data and the Inverse Problem

Suppose now that (as always happens) our measurement system introduces additional uncertainty in the data, and call g the vector obtained through the measurement system. Then

g = y + n₂ = Ax + n,   (2)
where n is the system noise, generated jointly by the model error, n₁, and the measurement error, n₂. The inverse problem of image reconstruction consists of estimating x from g and A. This can be done by imposing data consistency, i.e., by searching for an x such that Ax is not too far from g. However, this is always an ill-posed problem, in the sense of Hadamard, in that the existence, uniqueness, and stability of the solution are never guaranteed (Tikhonov and Arsenin, 1977). Existence is not assured because the system noise can cause incompatibility between data and images. Uniqueness is not assured because A is usually a noninvertible operator. In the continuous-continuous or continuous-discrete model, stability is related to the continuity of the inverse operator. In the discrete-discrete model, a continuous generalized inverse operator always exists, so that the problem is always well-posed. However, the inverse operator is generally ill-conditioned, and finite (although very small) errors in the data may be highly amplified in the solutions.
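This noise amplification can be reproduced with a few lines of NumPy: build a small discrete blur matrix A, generate data with a very small amount of noise, and apply the generalized inverse. The blur width, signal size, and noise level below are arbitrary illustrative values, not taken from the text.

import numpy as np

rng = np.random.default_rng(1)
N = 100
s = np.arange(N)

# Discrete Gaussian blur matrix A (a toy linear model as in Eq. (2)).
A = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 3.0) ** 2)
A /= A.sum(axis=1, keepdims=True)

x = (np.abs(s - 50) < 15).astype(float)           # simple piecewise-constant object
g = A @ x + 1e-3 * rng.standard_normal(N)         # data with very small noise

x_naive = np.linalg.pinv(A) @ g                   # generalized-inverse solution
print("condition number of A:", np.linalg.cond(A))
print("max |x_naive - x|:", np.abs(x_naive - x).max())   # typically enormous despite tiny noise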
C. Regularization

The means of overcoming ill-posedness and ill-conditioning is regularization, i.e., any technique for obtaining a unique and stable solution by studying a well-posed restriction of an originally ill-posed problem. This restriction is always knowledge based. Note that the data set is not the only piece of information we have on the problem and that even the classical reconstruction and restoration algorithms, which do not make explicit use of any knowledge but the data set g, actually impose very strong constraints on the solutions. These constraints are always based on implicit assumptions on the solutions. We will show below how additional information can be explicitly introduced into the problem to obtain a regularized solution. Let us start with data consistency, and suppose we have a criterion to determine whether Ax is too far from the data; that is, we know a constant K such that any estimate for x must satisfy the following condition:

‖g - Ax‖² ≤ K,   (3)

where the squared norm term (any norm in the data space) is called the residual. Intuitively, K is smaller for lower noise and, in the limit, goes to zero for a noise-free system; K can be defined more precisely if we introduce particular assumptions on the noise. The set defined by Eq. (3) contains the so-called feasible solutions of the problem. As already said, condition (3) alone often establishes an ill-posed problem. Using the constrained least-squares approach (Andrews and Hunt, 1977), image reconstruction can be reformulated as a variational problem, and its solution can be computed by optimizing suitable cost functions over the set of feasible solutions. These functions model our prior information on the solution to be sought and can be seen as particular norms in the object space. Let C(x) be one of these functions. Minimizing C(x) subject to Eq. (3), if C is convex, leads to a unique solution, which is also the solution of an equivalent unconstrained minimization problem:

x̂ = arg minₓ ‖g - Ax‖² + λC(x),   (4)

for a particular value of the Lagrange multiplier λ. In standard regularization (Tikhonov and Arsenin, 1977) this estimate has another interpretation: the solution image results from a compromise between the residual, related to data consistency, and the cost functional, now called a stabilizer, which enforces regularity in the solution. In this case, the nonnegative regularization parameter, λ, is no longer a Lagrange multiplier and determines this compromise, in that its value can be tuned to balance the effect of data and regularity on the solution. Intuitively, λ should be large if the data set is heavily corrupted by noise and small otherwise. In Section VII, we shall see some selection strategies for the regularization parameter. Here are some examples of the stabilizers:
C_1(x) = \|x\|^2,    (5)

C_2(x) = \|\nabla x\|^2,    (6)

C_3(x) = \sum_i x_i \log x_i.    (7)
When used in Eq. (4), stabilizer C_1 prevents the solution from having a large energy, stabilizer C_2 prevents the solution gradient from having a large energy, and stabilizer C_3 gives the so-called maximum entropy solution. These three functionals, and many others, enforce global constraints on the solution: they require a particular property of the solution to be satisfied everywhere in its support region. From physical considerations (e.g., coherence of matter), this property is often smoothness. For the case of C_2, for example, first-order smoothness means no large gradient magnitudes. Higher order smoothness can be obtained using the Laplacian or higher order differential operators. For C_3, it can be proved (see Jaynes, 1982; Burch et al., 1983) that the solution obtained is maximally flat, compatible with data consistency. The use of global constraints has been proved to give unique and stable solutions in image reconstruction, but normally the smoothness assumption is not valid everywhere. In fact, any physical image can be assumed piecewise smooth, i.e., composed of a number of connected regions where smoothness is verified, separated from one another by a set of curvilinear boundaries, where the smoothness constraint has no physical meaning. Reconstructing one such image using a global smoothness stabilizer leads to a solution that is oversmoothed across the boundaries. In image analysis, whatever its purpose, boundary detection and location are very important tasks. It is thus clear that a reconstructed image with no boundary information is unacceptable for most applications.
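Purely as an illustration of the three global stabilizers (5)-(7), the following sketch evaluates simple discrete versions of them on an intensity array; the function names and the finite-difference approximation of the gradient are our own choices, not part of the original formulation.

```python
import numpy as np

def energy_stabilizer(x):
    # C1(x) = ||x||^2: penalizes solutions with large total energy.
    return np.sum(x ** 2)

def gradient_stabilizer(x):
    # C2(x) = ||grad x||^2, with forward finite differences used as a
    # discrete approximation of the gradient.
    dx = np.diff(x, axis=0)
    dy = np.diff(x, axis=1)
    return np.sum(dx ** 2) + np.sum(dy ** 2)

def entropy_stabilizer(x, eps=1e-12):
    # C3(x) = sum_i x_i log x_i, defined for positive intensities;
    # eps guards against log(0).
    return np.sum(x * np.log(x + eps))
```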
The way to reconstruct images while preserving the boundaries (edge-preserving image reconstruction) is to adopt stabilizers that act locally on the image to be reconstructed. Although we could also define local stabilizers in the present context, we prefer to introduce them in the framework of the Bayesian techniques, which give a more comprehensive point of view on image reconstruction and to which the standard regularization approach shown here can be brought back in particular cases.
III. BAYESIAN APPROACH
According to the general inverse problem theory as presented by Tarantola (1987), the Bayesian approach to the solution of any inverse problem is based on considering any piece of information on the problem as a measure density function on the appropriate space. In particular, any measure density becomes a probability density if it can be normalized, and each parameter of the problem is considered as a random variable, with its probability function. In this theory, the solution to an inverse problem is the joint density (or posterior) obtained by composing the densities for the model information with those for prior knowledge on the solution, the data, and the measurement system. Using the posterior as the final state of information, image reconstruction can be treated as a parameter estimation problem. The theory is fully developed for the discrete case. The generalization for a continuous model is not complete because not all the properties of probability densities can be extended to the infinite-dimensional case. What follows in this section should thus be considered valid for a discrete-discrete model. In the following, we briefly outline the ideas and the formulas of the composition of states of information and the theory of estimation. We also show a particular case in which the result is the same as the one obtained by standard regularization. We use the same notation and functional space definitions introduced in the previous section; a complete treatment can be found in Tarantola's textbook.
A. Composition of States of Information
Let us consider a sample space Ω, mapped, as usual, onto a vector space M, isomorphic to R^N, whose elements are N-vectors describing the events in Ω. We can establish a probability measure in Ω, described by a probability density in M. This probability density, say f, is a state of information on M, and each subset in M has its measurable information
content on the basis of f. The way to solve an inference problem is to collect all possible states of information on M and derive the final state of information from them. If m is the generic element of M, the formula for the composition of two states of information, related to the densities f_1(m) and f_2(m), is the following:

(f_1 \wedge f_2)(m) = \frac{f_1(m)\, f_2(m)}{\mu(m)}.    (8)
This can be shown to be a generalization of the AND logical operator. The normalizing density μ(m) is called noninformative or zero information density and enables the final information content to be kept independent of possible changes of coordinates in M or, equivalently, of different parametrizations of the space Ω (see also Jaynes, 1968).
B. Solving the Inverse Problem

We shall use Eq. (8) to solve our inverse problem after we have translated all our information into probability densities. In this case, the vector space M is the Cartesian product of the object and the observation spaces. The vector m will thus be obtained by juxtaposing vectors x and y.

1. Theoretical Information
Let us consider the direct problem; that is, let us try to model the operator that maps images into data under the form of a probability density. With reference to the observation model, because of the presence of the model error, it appears that the relationship between y and x can be described only in probabilistic terms. This can be done by using a conditional probability θ(y | x). From this density, we derive the joint density of observations and objects, assuming at this stage no prior knowledge on the real object:

\Theta(y, x) = \theta(y \mid x)\, \mu_X(x),    (9)
where μ_X is the zero information density for the object. For example, if the error is null or negligible, we have

\theta(y \mid x) = \delta(y - Ax),    (10)

where δ(·) is the Dirac delta function. Another example, the Gaussian observation model, will be shown in Section III,D.
2. Prior Information
Let us now introduce a probability density function that contains all kinds of prior information on the problem. Assuming independence between a priori information on objects and observations, we have:

\rho(y, x) = \rho_Y(y)\, \rho_X(x).    (11)

The density ρ_Y represents the prior state of information on the observations, which includes knowledge of the data set g; the density ρ_X represents prior knowledge on the solution. If we assume no prior information on the values of the observations, we have:

\rho_Y(y) = \nu(g \mid y)\, \mu_Y(y),    (12)
where g is the measured data vector, μ_Y is the zero information density for the observations, and ν(g | y) is the probability density of the data, given the observations, which is characteristic of the measurement system used. This formula is a generalization of the first equality in Eq. (2). As far as the density ρ_X (hereafter called prior) is concerned, we have already said that it should be designed to enforce smoothness on the solution. Sections IV, VI, and VII deal with the determination of useful priors for the solution.

3. Posterior Density
Let us now apply Eq. (8) to compose the information contained in (9) and (11); also using (12), we obtain:

\sigma(y, x) = \rho_X(x)\, \theta(y \mid x)\, \nu(g \mid y),    (13)
which does not depend on the unspecified zero information densities. The density function σ contains all the posterior information about both the object and the observations and can be considered as the solution of the inverse problem. If we look for an estimate of x, we should evaluate the marginal posterior density for that vector:

\sigma_X(x) = \rho_X(x) \int \theta(y \mid x)\, \nu(g \mid y)\, dy.    (14)
Since the estimation of y is out of our scope, we shall consider Eq. (14) to be the solution of the reconstruction problem. The unessential subscript X
will hereafter be omitted from the notations of the posterior marginal σ and of the prior density ρ.
C. Optimal Estimators Based on Cost Functions

The posterior σ(x) is the ultimate state of information, from which we should find an optimal estimate for the object x. To this end, we must choose the particular optimality criterion to follow. A good approach to the representation of optimality criteria is to introduce cost functions. Let us define a distance function, C(x*, x), between two different objects x* and x, and call it cost of x* with respect to x. Let x_opt be the image to be estimated, and let us try to evaluate an x* to minimize the cost C(x*, x_opt). Unfortunately, the only information we have on the solution is contained in σ(x), and thus C(x*, x_opt) cannot be directly computed. However, if σ(x) is reasonably concentrated around x_opt, we can assume C(x*, x_opt) to be equal to the expected value of C(x*, x) according to σ(x). Assuming that the image pixels have discrete values, we have

E_\sigma\{C(x^*, x)\} = \sum_{x \in M} C(x^*, x)\, \sigma(x),    (15)
where the summation range M is now the set of all possible configurations assumed by x. Let us now introduce a property that may help to minimize (15) over x*. It can be proved (Marroquin, 1985) that, if

C(x^*, x) = \sum_i C_i(x_i^*, x_i), \qquad i = 1, 2, \ldots, N,    (16)

with

C_i(a, b) = 0 \ \text{for}\ a = b, \qquad C_i(a, b) > 0 \ \text{otherwise},    (17)

then

E_\sigma\{C_i(x_i^*, x_i)\} = \sum_{x_i \in \Lambda} C_i(x_i^*, x_i)\, \sigma^i(x_i),    (18)

where Λ is the discrete set of the values assumed by the ith pixel, and σ^i is the related marginal density. In other words, if the cost is an additive function over all the image pixels, under condition (17), the minimizer of (15) over x* can be found by minimizing separately the expectations of the C_i's with respect to the related marginals. The difficulty inherent in this approach is the calculation of the marginals, in that each of them is obtained by summing σ(x) over a huge high-dimensional space.
The particular choice of the cost function will influence the estimate of the image and will clarify exactly how this estimate is optimal. Below, we introduce three different cost functions with the related estimation criteria.

1. Maximizer of the Posterior Marginals

The following cost function:

C(x^*, x_{opt}) = \sum_i \big[1 - \delta_{x_i^*,\, x_{opt,i}}\big]    (19)
is the count of the pixels of x* that are different from the corresponding pixels of x_opt. Observe that this function has the property (16)-(17), and thus it can be minimized separately for each i. The expectation of each term is the sum of the marginal minus its value for x_i = x_i^*. The minimizers of the costs per pixel are thus the maximizers of the marginals. The derived estimate is thus called maximizer of the posterior marginals (MPM).

2. Marginal Posterior Mean
The cost function

C(x^*, x_{opt}) = \sum_i (x_i^* - x_{opt,i})^2    (20)

measures the total squared distance of x* from x_opt. Once again, the minimization can be made using the marginals (18). Let us show the expected cost of the ith pixel:

E\{C_i(x_i^*, x_i)\} = \sum_{x_i \in \Lambda} (x_i^* - x_i)^2\, \sigma^i(x_i) = x_i^{*2} - 2\langle x_i\rangle\, x_i^* + \langle x_i^2\rangle,    (21)
where ⟨x_i⟩ denotes the expectation of the ith pixel. The minimizer of this expression is obtained by choosing for x_i^* the discrete value nearest to ⟨x_i⟩. It is easy to verify that the expectation of C_i in this case is the variance of the pixel value over the marginal σ^i.

3. Maximum a Posteriori Probability
The function

C(x^*, x_{opt}) = \begin{cases} 0 & \text{for } x^* = x_{opt}, \\ 1 & \text{for } x^* \ne x_{opt} \end{cases}    (22)

assigns the same cost to any image different from x_opt. It is easy to verify that, using Eq. (22) in Eq. (15), we obtain the sum of the posterior density,
minus σ(x*). The minimum cost will then be obtained if x* is the maximizer of σ(x); the cost function (22) thus defines the well-known maximum a posteriori (MAP) estimation criterion.

D. The Gaussian Case
Now, let us show how the Bayesian approach can lead to functionals similar to those met in standard regularization, Eqs. (4)-(7). In fact, this can be achieved if the observation and data models are considered to be Gaussian. We adopt here the same notation introduced in Section III,B. Let us assume that the observations are related to the true image through operator A, up to a Gaussian deviation with variance σ_1²:

\theta(y \mid x) = \big(2\pi\sigma_1^2\big)^{-M/2} \exp\left\{-\frac{\|y - Ax\|^2}{2\sigma_1^2}\right\},    (23)
where M is the number of observation samples. Furthermore, let us suppose that the measured data are affected by a Gaussian error with variance σ_2²:

\nu(g \mid y) = \big(2\pi\sigma_2^2\big)^{-M/2} \exp\left\{-\frac{\|g - y\|^2}{2\sigma_2^2}\right\}.    (24)
If we use Eqs. (23) and (24) in Eq. (14) to calculate the joint posterior, we find the convolution between two Gaussian densities, which is another Gaussian density with variance σ² = σ_1² + σ_2², multiplied by the prior density ρ:

\sigma(x) = \rho(x)\, \frac{\exp\left\{-\|g - Ax\|^2 / (2\sigma^2)\right\}}{\big(\sqrt{2\pi\sigma^2}\,\big)^{M}}.    (25)

Note that this equation can be seen as the posterior density f(x | g), obtained by the Bayes rule:

f(x \mid g) = \frac{f(g \mid x)\, \rho(x)}{f(g)},    (26)
ignoring the denominator, which is constant in x. Equation (14) is thus a generalization of the Bayes rule. The function which appears in Eq. (14) as an integral becomes the usual likelihood density only when the data and the measurement models are Gaussian.
Let us take the negative logarithm of Eq. (25), ignoring the constant terms:

-\log\sigma(x) = \frac{\|g - Ax\|^2}{2\sigma^2} - \log\rho(x) = E(x).    (27)
Maximizing the posterior σ is equivalent to minimizing the negative log-posterior, or posterior energy, E(x). Observe that the minimization of (27) gives the same solution as Eq. (4), with λ = 2σ² and C(x) = −log ρ(x). Thus, the Gaussian assumptions (23) and (24) can reduce the MAP estimation to the standard regularization formula (4). The introduction of a stabilizer in standard regularization, as presented in the previous section, can thus be interpreted as a particular choice of the log-prior in a Bayesian setting. If ρ(x) is assumed to be constant, the MAP estimation is reduced to the minimization of the square norm term, derived from the Gaussian likelihood function. This is the well-known maximum likelihood (ML) criterion, which leads, for example, to the least squares and pseudoinverse solutions. ML solutions are often very unstable, and special stopping criteria have been developed for the iterative optimization algorithms to avoid too many artifacts in the images (Veklerov and Llacer, 1987). Another strategy for obtaining a stable solution is to give the prior ρ(x) a suitable form. In the previous section, we showed three global log-prior functions; in the next section, we will introduce a general framework to establish local, edge-preserving, priors.
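As a minimal sketch of the correspondence just described (names, the flattened-image convention, and the generic negative log-prior argument are our own assumptions), the posterior energy of Eq. (27) can be evaluated as follows.

```python
import numpy as np

def posterior_energy(x, g, A, sigma2, neg_log_prior):
    # E(x) = ||g - A x||^2 / (2 sigma^2) - log rho(x); minimizing this energy
    # gives the MAP estimate and, with lambda = 2 sigma^2 and
    # C(x) = -log rho(x), coincides with the regularization functional (4).
    residual = g - A @ x
    return residual @ residual / (2.0 * sigma2) + neg_log_prior(x)

# Example use (e.g., with scipy.optimize.minimize), taking the energy
# stabilizer as negative log-prior:
#   x_map = minimize(lambda v: posterior_energy(v, g, A, s2, lambda u: u @ u), x0)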
IV. IMAGE MODELS AND MARKOV RANDOM FIELDS
We have shown that the Bayesian approach to regularization offers a way to express our prior knowledge in the form of prior models for the solution. As the aim of image reconstruction is to recover an intensity map that represents the spatial distribution of some physical quantity, these models should be able to describe the complex nature of an image. In particular, they should be able to take into account a variety of attributes which, although related to the behavior of the intensity, have an independent characterization. These attributes can be, for instance, intensity discontinuities, texture types, and connected components. Introducing a priori knowledge available on such attributes would help the reconstruction task and would make the reconstructed images more suitable for image analysis. We have already stressed the importance of taking discontinuities
into account. In this section, we introduce a general theoretical framework underlying edge-preserving image reconstruction, and we define a class of image models which are suitable for describing the behavior of both the intensity and the discontinuity fields. These models should provide a common characterization for a wide set of images, without overconstraining the features about which we have no prior knowledge. Moreover, the dependence among the field elements should be local, in that the correlation between two elements which are far enough away from each other is not expected to be large in real images. This feature can facilitate the implementation of distributed and parallel reconstruction algorithms. In the literature on image modeling, both deterministic (Andrews and Hunt, 1977) and stochastic (Jeng and Woods, 1988) models have been considered. However, the most common is the class of stochastic models, in which images are considered to be sample functions of an array of random variables (random field). Much research has been devoted to Markov random field (MRF) models on finite lattices (Geman and Geman, 1984; Derin and Kelly, 1989; Jeng and Woods, 1991). Indeed, this class of models can describe complex image structures, where both intensity pixels and discontinuities can be treated. Let us start by characterizing the intensity attribute alone. In this case, if we deal with an N × N discrete image, the lattice will be the N × N pixel grid, and the site set S will be any arrangement of the grid, for instance S = {(i, j), i = 1, 2, ..., N; j = 1, 2, ..., N}. On the site set, we can define a neighborhood system 𝒢 = {𝒢_s, s ∈ S}.
If s and t are two distinct sites of the lattice, their neighborhoods, 𝒢_s and 𝒢_t, respectively, have the following properties:

1. s ∉ 𝒢_s;
2. s ∈ 𝒢_t ⟺ t ∈ 𝒢_s.

In particular, we can define a homogeneous neighborhood system where the neighborhood of each site (i, j) is the following set:

\mathcal{G}_{i,j} = \{(k, l) \in S : 0 < (k - i)^2 + (l - j)^2 \le r\},
and r is the order of the neighborhood. Associated with the neighborhood system, there is a set 𝒞 of cliques. A clique, C, is a single site or a subset of sites such that any two distinct sites
in C are neighbors. The cliques are thus uniquely determined by the neighborhood system chosen. Let us call X a family of random variables associated with S, X = {X_s, s ∈ S}, whose values x_s lie on a common discrete or continuous set Λ, and call Ω the set of all the possible configurations

\Omega = \{x : x_s \in \Lambda,\ s \in S\}.
In our case, each x_s represents the value assumed by X_s at site s. X is an MRF with respect to the couple (S, 𝒢) if and only if

p(x) > 0 \quad \forall x \in \Omega,    (28a)

p(x_s \mid x_t, \forall t \ne s) = p(x_s \mid x_t, \forall t \in \mathcal{G}_s) \quad \forall s \in S,\ \forall x \in \Omega,    (28b)
where p(x) is the probability density of the random vector X. In other words, the conditional density of the element in s, given all the other elements, actually depends only on the values of its neighbors.
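For concreteness only, the first-order homogeneous neighborhood of a site on an N × N lattice can be enumerated as in the small helper below; the function name and the boundary handling are our own choices.

```python
def first_order_neighbors(i, j, n):
    # Neighbors of site (i, j) on an n x n lattice for the first-order
    # (4-connected) homogeneous neighborhood system; candidate sites lying
    # outside the lattice are simply discarded.
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for (a, b) in candidates if 0 <= a < n and 0 <= b < n]
```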
A. MRFs and Gibbs Distributions

Any process satisfying Eq. (28a) is completely determined by specifying all the conditional densities. Nevertheless, specifying the conditional densities and deriving the related joint probability is not straightforward. This problem is overcome if the process also satisfies Eq. (28b), i.e., if the process is an MRF. Indeed, by virtue of the Clifford-Hammersley theorem, the joint density of an MRF is that of a Gibbs process, with the same neighborhood system:

p(x) = \frac{1}{Z} \exp\left\{-\frac{U(x)}{\beta}\right\}, \qquad U(x) = \sum_{C \in \mathcal{C}} V_C(x),    (29)
where Z is the normalizing constant, U(x) is called energy function, β is a positive parameter which controls the "peaking" of the distribution, and the potentials V_C(x) are functions supported on the cliques of the field. This important result allows the joint density of an MRF to be directly derived by specifying the potentials instead of the conditional probabilities. This makes it very easy to model and constrain the local behavior of an MRF with specified neighborhood and clique systems.
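As an illustration only (function names and the restriction to pair cliques are our own choices), the Gibbs energy and the corresponding unnormalized density can be evaluated as follows for a first-order neighborhood system.

```python
import numpy as np

def gibbs_energy(x, potential):
    # U(x) = sum over cliques of V_C(x); here only the pair cliques of a
    # first-order neighborhood (vertically and horizontally adjacent pixels)
    # are used, with a user-supplied potential of the pixel difference.
    u = np.sum(potential(x[1:, :] - x[:-1, :]))   # vertical pairs
    u += np.sum(potential(x[:, 1:] - x[:, :-1]))  # horizontal pairs
    return u

def gibbs_density_unnormalized(x, potential, beta=1.0):
    # p(x) proportional to exp(-U(x) / beta); the partition function Z is
    # intractable in general, so only the unnormalized value is returned.
    return np.exp(-gibbs_energy(x, potential) / beta)
```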
Assuming an MRF image model and a linear and Gaussian data model, the posterior density (25) and the posterior energy (27) become:

\sigma(x) \propto \exp\left\{-\frac{\|g - Ax\|^2}{2\sigma^2} - \frac{U(x)}{\beta}\right\}    (30)

and

E(x) = \frac{\|g - Ax\|^2}{2\sigma^2} + \frac{U(x)}{\beta},    (31)

respectively. Once again, it is immediate to recognize in Eq. (31) the typical form of the cost functionals derived by the constrained optimization and standard regularization approaches, considering 2σ²/β as the regularization parameter. In fact, when U(x) is chosen as a measure for the image derivatives of a given order, it represents a roughness penalty, and Eq. (31) exactly corresponds to the cost functional used in standard regularization. This choice is correctly interpretable as the energy associated with a Gibbs distribution, in that it refers to a neighborhood system whose cliques are the sets of sites needed to compute locally the partial derivatives of a predefined order. In this case, U(x) can be written as

U(x) = \sum_{C \in \mathcal{C}} \big[D_C^k(x)\big]^2,    (32)
where D_C^k(x) are finite-difference approximations to the kth order (k = 0, 1, 2, 3, ...) partial derivatives of x, expressed as functions of the pixels in C (Geman and Reynolds, 1992). Note that, for k = 0 and k = 1, respectively, Eq. (32) is reduced to the energy stabilizer in Eq. (5) and to the gradient stabilizer in Eq. (6). The entropy penalty in Eq. (7) can be seen as an energy associated with a Gibbs distribution whose only nonzero potential cliques are those made by single elements. Another approach to establishing a prior energy consists of adopting the class of homogeneous Gaussian MRFs (GMRFs). In this subclass of models, the interaction among neighboring pixels can be given by means of the following parametric model:

x_{i,j} = \sum_{(k,l) \in N} h_{k,l}\, x_{i-k,\,j-l} + n_{i,j},    (33)
where:

(i) N = {(k, l) | 0 < k² + l² ≤ r}, r being the order of the neighborhood, is the coefficient support region for the space-invariant neighborhood of each pixel.
(ii) {h_{k,l}} are a suitable set of coefficients.

(iii) n_{i,j} is a Gaussian, zero-mean, random field satisfying the following covariance constraint:

E[n_{i,j}\, n_{m,n}] = \begin{cases} \sigma^2 & \text{if } (i, j) = (m, n), \\ -h_{i-m,\,j-n}\,\sigma^2 & \text{if } (i - m,\, j - n) \in N, \\ 0 & \text{otherwise}. \end{cases}    (34)
It is straightforward to verify that a homogeneous GMRF has a Gibbs distribution, as in Eq. (29). In the case of first-order neighborhoods, the cliques can contain one or two sites, and the potentials V_C(x) are given by
B. Introducing Discontinuities

Although formally expressed as sums of local functions, the prior energies considered above force a global smoothness constraint on the image, due to the propagation of the smoothing throughout the image domain. This drawback can be overcome by introducing prior energies that permit us to locally break or relax the smoothness constraint where discontinuities are likely to occur. This can be accomplished by introducing the discontinuities as explicit auxiliary variables of the problem. The same goal can be reached using particular stabilizers that are able to preserve discontinuities without treating extra variables. In both cases, this can be done in the context of MRF models.

1. Implicit Treatment
In Eq. (32) each potential is a quadratic function of the partial derivatives. This function has the desirable properties of being positive, even, finite in zero, and increasing with the magnitude of its argument. Nevertheless, it increasingly penalizes high differences between neighboring pixels, and this prevents the discontinuities from being recovered. This can be avoided by replacing the quadratic function with a nonquadratic function which retains the good properties mentioned above but allows sharp transitions between distinct regions to be preserved. The general form of the prior
energy in this case is

U(x) = \sum_{C \in \mathcal{C}} \phi\big(D_C^k(x)\big),    (36)
where α and λ are positive parameters. Function φ (also called neighbor interaction function; see Blake and Zisserman, 1987a) is a positive and increasing function of the derivatives D_C^k(x), thus enforcing a kth order smoothness constraint on x. Its form, however, can be chosen so as to relax this constraint where it is more likely to have a discontinuity. Various forms for φ have been proposed in the literature, with different features concerning convexity and asymptotic behaviors. In particular, the following nonconvex functions (Geman and McClure, 1985; Blake and Zisserman, 1987a; Gindi et al., 1991; Geman and Reynolds, 1992):
share the two properties that their limit at infinity is finite and that φ(√r) is concave. The plots of these three functions are reported in Fig. 1. When these functions are used in Eq. (36), their meaning can be easily understood. They encourage neighboring pixels to have similar values if the derivatives are lower than λ. Beyond this value, a further increase in the derivatives is allowed, with a relatively small increase in the penalty. The differences between neighboring pixels within smooth regions are thus penalized without excessively penalizing the larger differences occurring at the boundaries between different regions of the image. Other nonquadratic, but convex, functions are

\phi_4(t) = |t|,    (40)

\phi_5(t) = \log\cosh(t),    (41)

again shown in Fig. 1. The first was proposed by Besag (1989) and the second by Green (1990) for Bayesian reconstruction in emission tomography. The behavior at infinity of (40) and (41) is linear; discontinuities are thus allowed, but excessive intensity jumps are penalized. The effect of these functions is thus a compromise between a parabola and an asymptotically finite stabilizer such as those shown in Eqs. (37)-(39). Note that the convexity of functions φ_4 and φ_5 facilitates the global minimization of the posterior energy.
FIGURE 1. Plots of the neighbor interaction functions for implicit line treatment of Eqs. (37), (38), (39), (40) (dotted line), and (41).
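The truncated parabola is written explicitly later in the text (Eq. (66)); purely as an illustration, the sketch below evaluates it together with the two convex interactions (40)-(41). The rational, Geman-McClure-type function is included only as a plausible example of a nonconvex interaction with a finite limit at infinity; the exact parameterization of Eqs. (38)-(39) is not reproduced here.

```python
import numpy as np

def phi_truncated_parabola(t, lam, alpha):
    # Weak-membrane interaction, phi(t) = min(alpha, lambda t^2) (cf. Eq. (66)):
    # quadratic below the threshold, constant above it.
    return np.minimum(alpha, lam * np.asarray(t) ** 2)

def phi_rational(t):
    # Geman-McClure-type interaction t^2 / (1 + t^2): an assumed, normalized
    # example of a nonconvex function with a finite limit at infinity.
    t = np.asarray(t)
    return t ** 2 / (1.0 + t ** 2)

def phi_absolute(t):
    # Convex interaction of Eq. (40): linear growth at infinity.
    return np.abs(t)

def phi_logcosh(t):
    # Convex interaction of Eq. (41): quadratic near zero, linear at infinity.
    return np.log(np.cosh(t))
```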
All the functions shown above can take discontinuities into account, without introducing extra variables. However, no information on the geometrical line structure can be introduced, at least not straightforwardly. For example, these functions do not distinguish between isolated discontinuities, possibly due to peaks of noise, and connected discontinuities that are part of an object boundary. Blake and Zisserman argue that the truncated parabola of Eq. (37) has a natural hysteresis property; i.e., it tends to promote unbroken edges, without any need to impose additional penalties on line endings. Below, we will show that a more manageable and comprehensive way of treating geometrically constrained discontinuities is to consider them as explicit unknowns of the problem.
2. Explicit Treatment

In this approach, the original image is regarded as a pair of interacting MRFs, (X, L), where X is the matrix of the pixel intensities and L is a new field, associated with the discontinuities. We call X the intensity process and L the line process.
FIGURE 2. Grid of intensity and line elements.
In the simplest case L will be made of binary elements, with values 1 ("line on") and 0 ("line off"). More generally, the line elements can be associated with continuous values, for instance in the range [0, 1]. Typically, the line elements are localized in a rectangular interpixel grid and are distinguished into vertical and horizontal elements (Fig. 2). Under this assumption, L will be given by L_h and L_v, the (N − 1) × N and N × (N − 1) random matrices associated with the horizontal and vertical line elements, respectively. The values assumed by the elements of L_h and L_v will be denoted by h_{i,j} and v_{i,j}, respectively. The set of sites for the global field (X, L) will be given by the union of the intensity and line sites. The configuration space will be the set of pairs (x, l). In this case the neighborhood system 𝒢 and the related clique system 𝒞 must be defined on the mixed set of sites, allowing adjacent pixels and lines to be neighbors. Thus the prior distribution for (X, L) is

p(x, l) = \frac{1}{Z} \exp\{-U(x, l)\},    (42a)

where

U(x, l) = \sum_{C \in \mathcal{C}} V_C(x, l)    (42b)

is the prior energy function. The potentials can be defined on homogeneous cliques (made of intensity sites alone or line sites alone) or mixed cliques (made by intensity and line sites). The general form (42b) thus admits the following decomposition:

U(x, l) = U_1(x) + U_2(l) + U_3(x, l),
where U_1(x) models the local constraints of the intensity process, U_3(x, l) enforces the dependence between pixel intensities and line element configurations, and U_2(l) represents the mutual relationships among neighboring line elements. Besides making the introduction of constraints on the line geometry simple and direct, this approach also allows cooperative processing; for instance, simultaneous reconstruction and edge detection can be performed. The above theory was principally stated by Geman and Geman, who referred to discrete MRFs. Because in many applications it is useful to consider the intensity field Gaussian, compound Gauss-Markov (CGM) models were proposed which combine a continuous intensity process with a discrete (binary) line process. As in the simple GMRFs, interaction among intensity pixels can be expressed in parametric form. However, the coefficients and the noise process depend on the values locally assumed by the line field. Equation (33) is thus modified as follows:

x_{i,j} = \sum_{(k,l) \in N} h_{k,l}(l)\, x_{i-k,\,j-l} + n_{i,j}(l),    (43)
where the coefficients h_{k,l}(l) are controlled by the line process, and the noise process is a conditionally Gaussian noise whose variance is controlled by l. Computational properties for a broad subclass of CGM models are given in Jeng and Woods (1990). We focus on the class of piecewise smooth images, with connected and thin binary discontinuities. For this class, the neighborhood system in Fig. 3, which is of first order with respect to the intensity sites, is considered sufficient. In Fig. 4 we show two size-two line cliques, one size-four line clique, and two size-three mixed cliques. The possible configurations, up to rotation, for these line cliques are shown in Fig. 5.
FIGURE 3. First-order neighborhood system for (a) horizontal line element; (b) vertical line element; (c) intensity element.
FIGURE 4. Cliques for the neighborhood system in Fig. 3: (a and b) size-two line cliques; (c) size-four line clique; (d and e) size-three mixed cliques.
The mixed cliques shown in Fig. 4 allow us to enforce a local first-order smoothness constraint, according to the following expression for U_3(x, l):

U_3(x, l) = \lambda \sum_{i,j} (x_{i+1,j} - x_{i,j})^2 (1 - h_{i,j}) + \alpha \sum_{i,j} h_{i,j} + \lambda \sum_{i,j} (x_{i,j+1} - x_{i,j})^2 (1 - v_{i,j}) + \alpha \sum_{i,j} v_{i,j},    (44)
where λ and α are positive parameters and \sqrt{\alpha/\lambda} represents a threshold on the gradient, above which a discontinuity is likely to be created. In other words, the term U_3(x, l) in the prior encourages solutions with discontinuities where the horizontal or vertical gradient is higher than the threshold and are smoothly varying elsewhere. Considered individually, λ is a regularization parameter that promotes smoothing in the absence of discontinuities, and α represents the cost of creating a discontinuity, so as to prevent the creation of too many discontinuities.
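Assuming the first-order form above, with horizontal line elements separating vertically adjacent pixels and vertical elements separating horizontally adjacent ones, the mixed-clique energy can be evaluated as in the following sketch; the array shapes and names are our own conventions, not those of the original implementation.

```python
import numpy as np

def u3_energy(x, h, v, lam, alpha):
    # x: (N, N) intensities; h: (N-1, N) horizontal line elements between
    # vertically adjacent pixels; v: (N, N-1) vertical line elements between
    # horizontally adjacent pixels.  Each clique contributes
    # lambda * D(x)^2 * (1 - l) + alpha * l, so an active line element
    # suspends the smoothness penalty at the price alpha.
    dv = x[1:, :] - x[:-1, :]
    dh = x[:, 1:] - x[:, :-1]
    return (np.sum(lam * dv ** 2 * (1.0 - h) + alpha * h)
            + np.sum(lam * dh ** 2 * (1.0 - v) + alpha * v))
```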
FIGURE 5. Possible configurations, up to rotation, for the line cliques of Fig. 4: (a) no line; (b) termination; (c) turn; (d) straight continuation; (e) T-junction; (f) cross; (g) no line; (h) single line; (i) double line.
If a higher order neighborhood system is chosen for the intensity elements, Eq. (44) can be generalized to take into account higher order derivatives:

U_3(x, l) = \sum_{C \in \mathcal{C}} \left\{ \lambda \big[D_C^k(x)\big]^2 (1 - l_C) + \alpha\, l_C \right\},    (45)
where l_C is the line element related to the clique C. The term U_2(l) should reflect our prior expectations concerning the structure of discontinuities; for example, we know that lines are generally sparse and connected. A very general form for U_2(l) is a table of values related to all the possible configurations for the line cliques. It can be used when there are not too many configurations. A low value of U_2(l) makes the corresponding configuration more likely; conversely, the greater U_2(l) is, the more unlikely the configuration will be. For instance, it is reasonable to associate a high value with the line termination configuration, in order to impose a low probability of abrupt line endings. For the line cliques in Fig. 4, the term U_2(l) has the following form:
where V_1, V_2, and V_3 are tabular functions associated with all the possible configurations of the size-four and size-two cliques, respectively. To enforce constraints such as the favoring of line continuation and the inhibition of adjacent parallel lines, simple analytic forms for U_2(l), which still refer to the same line cliques, can be given as alternatives. One of these forms is
where the first and second terms penalize the formation of adjacent parallel lines (double lines), the next six terms favor the formation of continuous lines, and the last five terms penalize the formation of branches and crosses. The values of the corresponding parameters should thus be negative and can be chosen to give different probabilities to straight lines and turns.

3. Duality Theorem
Let us consider the general form of the posterior energy with explicit lines:

E(x, l) = \frac{\|g - Ax\|^2}{2\sigma^2} + U(x, l),    (48)

which should be minimized if a MAP estimation is required. If we define a function F(x) such that

F(x) = \inf_{l} E(x, l),    (49)

then

\arg\min_{x} F(x) = x^*,

where

(x^*, l^*) = \arg\min_{(x, l)} E(x, l).

Thus, the search for the global minimum of the posterior energy function E(x, l) can be restricted to the set of pairs (x, l*(x)), where, for each x, l*(x) is the minimizer of the energy function over l. In particular, given the structure of E(x, l), it is

F(x) = \frac{\|g - Ax\|^2}{2\sigma^2} + U(x),

where

U(x) = \inf_{l} U(x, l).    (50)
For particular forms of U(x, l), U(x) can be computed analytically, and F(x) becomes a function that addresses the discontinuities implicitly rather than explicitly. Considering a prior in the form U_3(x, l) in Eq. (44), and exploiting the independence of the line elements, Blake and Zisserman (1987a) calculated a U(x) in the form (36), where the neighbor interaction function φ is the truncated parabola (37). Geman and Reynolds (1992) generalized this result and derived a duality theorem to establish the conditions on U(x) and U(x, l) for which Eq. (50) holds, so that minimizing E(x, l) is equivalent to minimizing F(x). They started from a prior energy U(x), called primal, in the form of Eq. (36).
Their duality theorem shows sufficient conditions, to be satisfied by φ, for the existence of a function U(x, l) satisfying Eq. (50). U(x, l) is called dual and contains an explicit "line process" l, suitably correlated to the intensity process. The theorem is formulated as follows. Given a function φ(t) with the following properties in [0, +∞):

1. φ(0) = 0;
2. φ(√r) is concave;
3. lim_{t→+∞} φ(t) = α < +∞;

then there exist two functions ξ(b) and ψ(b), defined on an interval [0, M], such that

\phi(t) = \inf_{0 \le b \le M} \big\{ \xi(b)\, t^2 + \psi(b) \big\},    (51)

and satisfying the following properties:

1. ψ(b) is decreasing;
2. ψ(0) = α;
3. ψ(M) = 0;
4. ξ(b) is increasing;
5. ξ(0) = 0.
The geometrical proof of the theorem is based on the fact that φ(√r) is the lower envelope of a one-parameter family of straight lines y = m t + q, where m = ξ(b) and q = ψ(b). Thus, if φ(√r) is strictly concave, then ψ(b) is strictly decreasing, ξ(b) is strictly increasing, and M is the right-hand derivative of φ(√r) at the origin. The theorem allows us to define the dual prior energy U(x, l) in the form:

U(x, l) = \sum_{C \in \mathcal{C}} \left\{ \xi(l_C)\, \big[D_C^k(x)\big]^2 + \psi(l_C) \right\},    (52)
with the property that F(x), as defined in Eq. (49), can be seen as the minimum of E(x, l), as defined in Eq. (48), with respect to l. Thus, if (x*, l*) is the minimizer of E, then x* is the minimizer of F, and the problems of minimizing functions (48) and (49) are, in this sense, equivalent. In the dual posterior energy E, the line process l is directly associated with the intensity discontinuities. As in F(x), the term [D_C^k(x)]² enforces a magnitude limitation on the value of the kth derivative, that is, a smoothness constraint on x. Nevertheless, the increasing function ξ(·) weakens this constraint where l_C assumes a low value, thus marking the presence of
a discontinuity. The function ψ has the effect of "balancing" the energy, thus preventing too many lines from being created. Note that, whereas so far the line elements have been considered as being binary, here they can assume continuous nonnegative values in [0, M]. Each of the three neighbor interaction functions in Eqs. (37)-(39) satisfies the requirements of the duality theorem and then leads to a corresponding dual prior energy of the type (52), for suitable forms of the functions ξ and ψ. When φ(t) is the truncated parabola (37), Geman and Reynolds showed that Eq. (51) is satisfied with
\xi(b) = b, \qquad \psi(b) = \alpha(1 - b), \qquad 0 \le b \le 1;    (53a)

the infimum is reached with

b^* = \begin{cases} 1 & \text{if } t^2 \le \alpha, \\ 0 & \text{otherwise}, \end{cases}
and the line process is thus binary. In case (38) it is

\xi(b) = b, \qquad \psi(b) = b - 2\sqrt{b} + 1, \qquad 0 \le b \le 1,

from which the minimizer of Eq. (51) is

b^* = \frac{1}{(t^2 + 1)^2}.
For Eq. (39), Geman and Reynolds computed the corresponding pair ξ(b), ψ(b) and the associated minimizer of Eq. (51) (53b).
Aubert et al. (1994) extended the duality theorem by relaxing the assumptions on φ. They proved that a function whose behavior at infinity is at most linear may act equally well in preserving discontinuities from excessive smoothing. It can easily be verified that function (41) satisfies the hypotheses of this extended theorem. The duality theory was derived for noninteracting line processes. If interaction terms between two or more discontinuities have to be included in the image model, to encourage or penalize particular line configurations, the discontinuities should be treated explicitly. Alternatively, if we want to maintain implicit lines (for instance, for computational purposes), we must suitably approximate E(x, l) to eliminate the line process. This subject will be developed in Section VI.
V. ALGORITHMS

In the previous sections, we showed how the image reconstruction and restoration problems become estimation problems, in which the expectation of a particular cost function over the posterior density must be minimized. We showed three cost functions, leading to the maxima of the posterior marginals (MPM), the marginal posterior means, and the joint posterior mode (MAP) criteria. The posterior densities to be treated are rather complicated functions, especially when edge-preserving priors are used. In particular, they are generally nonconvex and depend on a very large number of variables. The MAP criterion, which requires the maximization of the joint posterior, must thus be implemented as a nonconvex optimization algorithm. The minimization of the two other cost functions shown in Section III,C is reduced, for the MPM criterion, to the maximization of many one-dimensional functions and, for the marginal mean criterion, to the computation of many averages over one-dimensional densities. However, these one-dimensional densities are obtained by integrating the joint density over a high-dimensional domain. Stochastic methods have been studied to obtain good solutions in all these cases, but they are not the only approaches that can be used. For example, for MAP, deterministic algorithms are often more efficient than stochastic ones and can reach nearly optimal solutions. In this section, we describe stochastic methods for computing the estimates shown in Section III,C and two particular deterministic strategies for MAP estimation. Unless otherwise specified, field x can be
interpreted both as the only intensity process and as the coupled intensity-line process.
A. Monte Carlo Methods for Marginal Modes and Averages

Because of the high dimension of the joint posterior σ(x), the integrations needed to derive the marginals σ^i(x_i) cannot be performed by means of deterministic numerical integration procedures, in that a very large integration grid would have to be used, and this would result in a very inefficient algorithm from a computational viewpoint. Stochastic integration methods can reduce by several orders of magnitude the number of points at which the function must be evaluated. The basic idea of stochastic integration is that a much more efficient integration grid can be obtained by sampling at a higher rate where the integrand assumes higher values. Following a Monte Carlo integration method, this is accomplished by randomly generating points of the integration domain, in accordance with a suitable probability density (see Hammersley and Handscomb, 1985). In our case, this density is σ(x); we thus need a procedure for generating random samples of the integration space distributed as σ(x). A suitable procedure is the well-known Metropolis algorithm (Metropolis et al., 1953), which simulates the evolution of a multivariate physical system at its thermal equilibrium. Given a density function in the form σ(x) = exp{−E(x)}, the elements of x are visited in any order, and a random value is generated for each of them. Let x* be the state of x when an update for the value of a pixel is proposed, and let x^k be the previous state. The value ΔE = E(x*) − E(x^k) is then calculated; if it is negative, that is, if the update lessens the total energy of the system, the update is accepted, and we let x^{k+1} = x*. If the energy change is positive, the update of the pixel is accepted with probability exp{−ΔE}. In practice, a random number τ, uniformly distributed in [0, 1), is generated and, if τ < exp{−ΔE}, the update is accepted; otherwise, the proposed update is refused, and x^{k+1} = x^k. It can be proved that the successive updates of x are a homogeneous Markov chain whose stationary state is distributed with density σ(x). In practice, after a sufficiently large number, say h, of updates, the system reaches the "thermal equilibrium," and the successive updates of x are distributed in accordance with σ(x). In Geman and Geman (1984), another technique for drawing a sample from a probability density is shown: this is the so-called Gibbs sampler algorithm. Geman and Geman proved that, if σ(x) is a Gibbs distribution, a random sample distributed as σ(x) can be drawn by updating each pixel
on the basis of its local conditional probability, which depends only on the state of its neighborhood. This means that the algorithm can be easily parallelized. Using a Metropolis algorithm to generate random samples, x^{(k)}, of the integration space and a Monte Carlo integration method, the posterior marginal for the ith pixel is

\sigma^i(a) \simeq \frac{1}{n - h} \sum_{k = h + 1}^{n} \delta\big(x_i^{(k)} - a\big), \qquad a \in \Lambda,    (59)
that is, for each value of x_i, the marginal is given by the relative frequency of that value among the samples generated after equilibrium. From Eq. (59), the MPM estimate is given by

\hat{x}_i = \arg\max_{a} \sigma^i(a) \qquad \forall i,    (60)

and the marginal means are

\langle x_i \rangle = \sum_{a \in \Lambda} a\, \sigma^i(a) \qquad \forall i.    (61)
The algorithm (61) for the estimation by the marginal means criterion is called the thresholded posterior mean (TPM), because each pixel is estimated as the mean of the values it assumes after a threshold number, h , of updates.
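Purely as an illustration of the procedures just described (and not of the authors' implementation), the following sketch performs single-pixel Metropolis sweeps for a density σ(x) ∝ exp{−E(x)} and then forms the MPM and TPM estimates from the samples collected after the burn-in sweeps; all function names are our own.

```python
import numpy as np

def metropolis_sweep(x, energy, values, rng):
    # One full sweep of single-pixel Metropolis updates: a proposed value is
    # always accepted when it lowers E(x), and accepted with probability
    # exp(-dE) otherwise.
    e_current = energy(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = rng.choice(values)
        e_new = energy(x)
        d_e = e_new - e_current
        if d_e <= 0 or rng.random() < np.exp(-d_e):
            e_current = e_new        # accept the proposed update
        else:
            x[idx] = old             # refuse it and restore the old value
    return x

def mpm_tpm_estimates(samples, values):
    # samples: configurations collected after the threshold number h of
    # sweeps.  Each pixel's empirical marginal is its histogram over the
    # samples; its mode gives the MPM estimate and its mean the TPM estimate.
    stack = np.stack(samples)
    counts = np.stack([(stack == v).sum(axis=0) for v in values])
    mpm = np.asarray(values)[np.argmax(counts, axis=0)]
    tpm = stack.mean(axis=0)
    return mpm, tpm
```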
B. Stochastic Relaxation for MAP Estimation

We said that the Metropolis algorithm is capable of generating points of the integration domain distributed according to σ(x). It is interesting to see that the exploration of the state space in this case is such that transitions with increasing energy are allowed with nonzero probability. We can control the generation of increasing energy transitions so as to skip the local energy minima and reach a global minimum. This can be accomplished by introducing a parameter to control the peaking of the posterior. This is the principle underlying the simulated annealing minimization algorithm, which is described below.

1. Simulated Annealing
Simulated annealing is the numerical counterpart of the thermal evolution of a physical system, characterized by a large number of permissible energy states, during an annealing process. The system is suitably heated, so that
virtually any state has the same probability of occurring. The temperature is then lowered very slowly, so that the system passes through a series of states of thermal equilibrium, until it "freezes" in a minimum energy state. If the cooling schedule is too fast, the system reaches a state corresponding to a local energy minimum. The numerical procedure generates a nonuniform Markov chain whose stationary state converges to the uniform distribution over the modes of a specified energy function. Let us modify the posterior density (30) by a "temperature" parameter T:

\sigma_T(x) \propto \exp\left\{-\frac{E(x)}{T}\right\}.    (62)
Observe that, if the temperature is high, the density is practically flat over its domain, and all the changes proposed by a Metropolis algorithm are accepted. When the temperature goes to zero, it can be proved that (62) becomes uniform on the set of the global energy minima and zero elsewhere. If we start from high temperatures and reach the thermal equilibrium for several, slowly decreasing, values of the temperature, we are guaranteed to reach a global energy minimum. Suitable convergence criteria for simulated annealing can be found in Geman and Geman (1984) and Aarts and Korst (1989). We do not report here theoretical criteria for convergence in distribution, as they lead to computational requirements that cannot be fulfilled by any feasible procedure. However, practical criteria for reaching a good solution can be established and validated experimentally, although they do not guarantee convergence in distribution. In order to obtain feasible annealing algorithms, we must choose:

• An initial value, T_0, for the temperature.
• A criterion for deciding whether quasiequilibrium is reached at each temperature.
• A suitable cooling schedule.
• A criterion for determining the final temperature value and stopping the iterations.

For example, Kirkpatrick et al. (1983) proposed the following rules (a schematic implementation is sketched after the list):

• The initial temperature value is established by starting with quite a high temperature and gradually increasing it until the ratio between the numbers of accepted and proposed transitions is practically one.
• The iteration at temperature T_k should be long enough to permit the quasiequilibrium to be reached. This situation is reached after at least
The initial temperature value is established by starting with quite a high temperature and gradually increasing it until the ratio between the numbers of accepted and proposed transitions is practically one. The iteration at temperature Tk should be long enough to permit the quasiequilibrium to be reached. This situation is reached after at least
122
L. BEDINI ET AL.
a fixed number of transitions have been accepted. The transitions are accepted with a decreasing probability for decreasing temperatures, so the iteration might be too long at low temperatures. This difficulty is overcome by fixing a maximum number of iterations per temperature value. The following cooling schedule is proposed: Tk+l = aTk,
0
where a is a real constant in the range [0.8, 0.991. The algorithm is terminated when, for a certain number of consecutive temperature values, the energy remains unaltered.
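The following sketch puts the rules above together; default values and function names are our own assumptions, and the initial temperature is taken as given rather than tuned with the acceptance-ratio rule.

```python
import numpy as np

def simulated_annealing(x, energy, values, rng, t0=10.0, a=0.95,
                        min_accepted=1000, max_sweeps=20, patience=5):
    # Geometric cooling T_{k+1} = a * T_k; at every temperature, sweeps are
    # repeated until a minimum number of transitions has been accepted or a
    # maximum number of sweeps is reached; the loop stops when the energy
    # stays (numerically) unchanged for several consecutive temperatures.
    t, e_current, unchanged = t0, energy(x), 0
    while unchanged < patience:
        accepted, sweeps, e_start = 0, 0, e_current
        while accepted < min_accepted and sweeps < max_sweeps:
            for idx in np.ndindex(x.shape):
                old = x[idx]
                x[idx] = rng.choice(values)
                e_new = energy(x)
                d_e = e_new - e_current
                if d_e <= 0 or rng.random() < np.exp(-d_e / t):
                    e_current, accepted = e_new, accepted + 1
                else:
                    x[idx] = old
            sweeps += 1
        unchanged = unchanged + 1 if np.isclose(e_current, e_start) else 0
        t *= a
    return x
```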
Other practical rules have been proposed by Aarts and van Laarhoven (see Aarts and Korst, 1989) and Garnero et al. (1991).

2. Mixed Annealing

Introducing a line process in the image model, as shown in Section IV,B,2, raises the computational burden of minimizing the posterior. In cases in which the posterior is a convex function of the intensity process alone, the stochastic procedure of simulated annealing can be modified in order to become less expensive. In particular, in Marroquin et al. (1987), a mixed procedure is proposed, where stochastic steps are alternated with deterministic steps that support almost all the computational load of the minimization. In Bedini and Tonazzini (1992) and Bedini et al. (1993a) we propose a similar procedure (mixed annealing), with applications to image restoration and image reconstruction from projections, respectively. The minimization with respect to the intensity process is performed with a standard conjugate gradient algorithm, while the line elements are updated with a Gibbs sampler algorithm. Because the posterior energy E(x, l) is a convex function of x for any fixed configuration of the line process, the search for the global minimum can be restricted to the set of configurations (x*(l), l), where x*(l) is the minimizer of the posterior energy with the line process fixed at l (optimal conditional estimate). The nonconvex posterior energy restricted to this set is now a Gibbs energy, with the same neighborhood system as that of the prior. E(x*(l), l) can be thus minimized by an annealing scheme in which the random samples can be drawn by a Gibbs sampler. In Bedini et al. (1993a) we described this annealing procedure as follows:

1. An initial temperature, T_0, and a cooling schedule are chosen; the number N_0 of iterations spent at temperature T_0 and an initial guess
l_0 for the line process are given.
2. For each temperature T_k and number of iterations N_k, starting from the line configuration l_k, the Gibbs sampler draws a line process sample l_{k+1}, with density

\pi_k(l) \propto \exp\left\{-\frac{E(x^*(l),\, l)}{T_k}\right\}.    (63)
3. New values T_{k+1} and N_{k+1} are selected, and the algorithm repeats steps 2 and 3, until the stop criterion is satisfied.

Note that, at each update of a single line element by the Gibbs sampler, a new optimal conditional estimate x* should be evaluated. For this reason, the algorithm is still very expensive, but its most expensive part is the deterministic one. Due to the small size of the neighborhoods, and because the line process is binary, the cost of drawing a sample from the line process is low. In Bedini and Tonazzini (1992), we proposed a possible dedicated architecture to perform the mixed-annealing algorithm. It is based on an analog Hopfield neural network and a grid of digital processors. The neural network performs the deterministic minimization over the intensity process. The digital grid, which receives the intensity process as its input, implements the Gibbs sampler algorithm. We also proposed an approximated version of the same algorithm, which experimentally gave good results and can be implemented on conventional hardware. Let l'_k be a generic configuration generated by the Gibbs sampler starting from l_k and differing in only a few elements from l_k. Let us assume that E(x*(l'_k), l'_k) = E(x*(l_k), l'_k). Step 2 in the mixed-annealing algorithm can thus be split as follows:

2a. Compute x*(l_k) for a given l_k.
2b. Starting from l_k, the Gibbs sampler draws a line process sample with density

\pi_k(l) \propto \exp\left\{-\frac{E(x^*(l_k),\, l)}{T_k}\right\}.    (64)
Assuming that a complete scan of the line grid slightly modifies the status of the line process, using Eq. (64) rather than Eq. (63) can considerably reduce the computational cost of the algorithm. In Section VIII, we show some results obtained using this approximated algorithm.
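A schematic, self-contained rendering of the approximated procedure (steps 2a-2b) is given below; the conditional minimizer and the posterior energy are passed in as functions, and all names and defaults are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def mixed_annealing(l0, optimal_x, energy_xl, line_sites, rng,
                    t0=1.0, a=0.9, n_temps=50):
    # optimal_x(l): conditional minimizer x*(l) of the posterior energy for a
    #               fixed line configuration (e.g., a conjugate-gradient solve).
    # energy_xl(x, l): posterior energy E(x, l).
    # line_sites: indices of the binary line elements in the array l.
    l, t = l0.copy(), t0
    for _ in range(n_temps):
        x_star = optimal_x(l)                  # step 2a (deterministic)
        for s in line_sites:                   # step 2b (Gibbs sampler scan)
            e = np.empty(2)
            for value in (0, 1):
                l[s] = value
                e[value] = energy_xl(x_star, l)
            p_on = 1.0 / (1.0 + np.exp((e[1] - e[0]) / t))
            l[s] = 1 if rng.random() < p_on else 0
        t *= a                                 # cooling schedule
    return optimal_x(l), l
```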
C. Suboptimal Algorithms
In Section IV,B,1 we said that the advantages of treating discontinuities implicitly are mainly computational. In fact, by eliminating the line process we can devise deterministic minimization algorithms, which are generally cheaper than stochastic ones, although they give no theoretical guarantee of global convergence. Note that the practical annealing algorithms do not give such a guarantee either. In this section, we present two classes of algorithms that can be totally or partially deterministic and can also treat explicit discontinuities, by eliminating the line process using the duality theory or by splitting a minimization step as in the mixed-annealing procedure.

1. Graduated Nonconvexity

An implicitly edge-preserving neighbor interaction function, e.g., one of
those shown in Section IV,B,1, eliminates the explicit line process from the posterior, but the posterior is still a nonconvex function. Blake and Zisserman (1987a) derived a fully deterministic algorithm to minimize a posterior energy with a neighbor interaction function of the type (37); this algorithm can also be extended to other types of functions. Blake (1989) proved its lower complexity when compared with stochastic relaxation techniques. This algorithm is called graduated nonconvexity (GNC). It is based on a sequence of approximations, F^p(x), of the posterior energy in the form of Eq. (49), depending on a real parameter, p ∈ [0, p*], and such that F^0(x) = F(x) and F^{p*}(x) is a convex function. Gradient descent algorithms are then applied to minimize the modified posterior energies, for decreasing values of p, starting from p = p*. The starting point of each minimization is the minimizer found for the previous value of p. If F^{p*} is already a good approximation of F, then the algorithm can reach from the first iteration a point that is close to the desired global minimizer. As F^p approaches F, this estimate is refined, so that a good approximation of the global minimizer can be obtained. Once the suboptimal estimate for the intensity x has been obtained, an explicit line process can be recovered from the neighbor interaction function, as shown in Section IV,C. The fundamental step for the implementation of a GNC procedure is the creation of a sequence of approximations such as those described above. This step is specific to any particular posterior energy. For the usual Gaussian likelihood and the implicit weak membrane model, Blake and
Zisserman built the following primal energy:

F(x) = \frac{\|g - Ax\|^2}{2\sigma^2} + \sum_{C \in \mathcal{C}} \phi\big(D_C^1(x)\big),    (65)
with

\phi(t) = \min(\alpha, \lambda t^2).    (66)
To construct their GNC procedure, they substituted this neighbor interaction function with the following series of piecewise polynomials:

\phi^p(t) = \begin{cases} \lambda t^2 & \text{if } |t| < q, \\ \alpha - c\,(|t| - r)^2 / 2 & \text{if } q \le |t| < r, \\ \alpha & \text{otherwise}, \end{cases}    (67)

with

c = \frac{c^*}{p}, \qquad r^2 = \alpha\left(\frac{2}{c} + \frac{1}{\lambda}\right), \qquad q = \frac{\alpha}{\lambda r}.
Figure 6 shows a typical diagram for a function in the form (67).
FIGURE 6. Typical plot of a function in the form (67), which approximates function (37) (see Fig. 1 for a comparison).
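Only as an illustration of the outer GNC iteration (not of any specific published implementation), the loop over decreasing values of p can be organized as follows; grad_f_p is assumed to return the gradient of the approximated posterior energy F^p.

```python
import numpy as np

def gnc(x0, grad_f_p, p_values, step=1e-3, n_iters=200):
    # Start from the convex approximation F^{p*}, run a gradient descent on
    # each F^p, and warm-start every minimization with the minimizer found
    # for the previous value of p.
    x = np.array(x0, dtype=float)
    for p in p_values:                 # e.g. p*, p*/2, p*/4, ..., down to 0
        for _ in range(n_iters):
            x -= step * grad_f_p(x, p)
    return x
```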
Gerace (1992) derived a series of approximations, again piecewise polynomials in t with coefficients depending on p, λ, and α, for a function of the type (39).
It is straightforward to verify that, for p = 0, Eq. (67) is reduced to Eq. (66) and Eq. (68) to a function of the type (39). The important issue is now the search for a value p* such that the resulting F^{p*} is convex. In Blake and Zisserman (1987a), this is done by "balancing" the positive second derivative of the likelihood term against the negative second derivative of φ^{p*}. If φ^{p*} is designed to satisfy

\frac{\partial^2 \phi^{p^*}}{\partial t^2} \ge -c^* \qquad \forall t,

then the Hessian of F^{p*} is positive definite. In their case, Blake and Zisserman found the condition 0 < c* < 1/(8σ²). In practice, c* should be chosen such that φ^{p*} is as close as possible to φ, and this leads to c* = 1/(8σ²). The application of the same criterion to Eq. (68) leads to a corresponding condition on c*, for generic values of λ and α (Gerace, 1992). In Fig. 7, we show an example of function (68). Recently, Moré and Wu (1995) outlined a general method for obtaining a family of approximations for an objective function. This method is based on applying the Gaussian filter to the objective function. The role of parameter p is played here by the variance of the Gaussian kernel. Under fairly general assumptions, it can be proved that a value of the variance
FIGURE 7. Typical plot of a function in the form (68), which approximates function (39) (see Fig. 1 for a comparison).
always exists beyond which the filtered function is convex; this value plays the role of p*. Applying the Gaussian filter to a general function is not easy; however, it becomes easier if the function assumes a particular form.

2. Generalized Expectation Maximization

The expectation-maximization (EM) approach was proposed to solve maximum-likelihood estimation problems (Dempster et al., 1977) and has been applied to tomographic reconstruction problems (see Shepp and Vardi, 1982; Lange and Carson, 1984; Levitan and Herman, 1987; Hebert and Leahy, 1989; Green, 1990; Hebert and Gopal, 1992). Note that, in these applications, the posterior density is not Gibbsian, because the integral relationship between the data and the image gives rise to neighborhoods that coincide with the entire image. This means that the computation required for maximum likelihood or MAP is much more complex than the one required for other image processing applications. In particular, the large "posterior" neighborhood prevents the application of efficient parallel procedures. The EM approach was devised as a tool to develop effective optimization algorithms, although their convergence is guaranteed only on a stationary point of the objective function. This approach consists of reformulating the estimation problem in a new parameter space, where the optimization is simpler than the one to be performed in the original space. The key point of the procedure is, in fact, to define this auxiliary space; in practice, this definition is often suggested by physical considerations,
although a physical relationship between the original and the auxiliary variables is not actually needed. Suppose we have a sample space G of observable data, denoted the incomplete data space. We use a particular realization g ∈ G to estimate a solution x, by maximizing the likelihood function f(g | x). The direct maximization of function f may be difficult. The basic idea is to solve our problem in a new sample space, Z, called the complete data space, where this problem is simpler. Space Z does not need any particular property, and its elements do not need to be observable quantities. The only requisite is that the elements of G must be obtained by an explicit many-to-one relation from the elements of Z. Let us denote with z an element of Z, with ℓ(z | x) the likelihood function in the new space, and with h : Z → G the many-to-one relation. The EM procedure consists of iteratively finding the maximizer over x of the likelihood f, making use of the associated density ℓ. To obtain this result, we establish an initial guess x⁰ and perform iteratively the following two steps:
&[log[P(z Ix)llg,x"]. (71a) M-step: calculate the new estimate x " + I as the maximizer of (71a):
These two steps can be interpreted intuitively as follows. We cannot directly maximize the function log[/(z 1x11, because it has been derived in an unobservable space. We are thus able only to maximize its expectation conditioned to the knowledge of the observed data and the current estimate of the unknowns. The proof that the successive estimates of x converge to a stationary point of f<s 1x1 can be found in Dempster et al. (1977). This property remains valid even if the M-step is not actually a maximization but is substituted by the following: M'-step: determine a new estimate x"" such that The EM algorithm, modified by substituting Eq. (71b) with Eq. (72), is known as the generalized ~pectation-maximization(GEM) algorithm. Its convergence properties are roughly the same as those of the original EM, except the convergence rate, which is slower in GEM. For the choice between the two approaches, one must bear in mind that the higher
EDGE-PRESERVING IMAGE RECONSTRUCTION
129
convergence rate of EM is normally paid in computational complexity, which is higher for step (71b) than for step (72). The choice should thus be guided by considerations of the complexity in evaluating the objective function. The EM/GEM strategy can be applied to a MAP problem by simply adding the log-prior - U(x) to the function to be maximized (or increased) in the M-step (or M'-step). A GEM strategy can also be applied in cases of posteriors with explicit line processes (see Bedini et al., 1994d1, by splitting the M-step as follows: M,-step: find an x"+ such that: E,[log[J(z
IX"+
')I Ig, x " ] - U(X"+
2 E,[log[/'(zIx")]lg,x"]
M,-step: find an 1""
1,
I")
- U(x",I")
(73)
such that
E,[log[/(
2
Ix"+
')I Ig,x"] - U(X"+
1,
2 ~ ~ [ l o g [ / ' ( z l x " + ~ ) ] l g , x "-]
I"+
1)
U(X"+l,l").(74)
These relations are easily verified, bearing in mind that 0
The line process, as defined in Section IV,B,2, is unobservable and thus does not appear in the likelihood function /'. The log-prior does not depend on either the complete or the incomplete data.
Note that the M-step is split almost as in mixed annealing. In Section VIII, we show an application of this strategy, where the M,-step is performed stochastically. AN IMPLICIT LINEPROCESS VI. CONSTRAINING
So far our analysis of edge-preserving reconstruction techniques has identified two major requirements: (1) feasible image models that can capture all available information, and (2) efficient, even if only near-optimal, algorithms. It would appear that none of the methods reviewed here can deal with both these requirements. Specifically, the relatively efficient deterministic algorithms have all been derived for models that treat implicit discontinuities and do not allow for self-interactions. This prevents the exploitation of the important piece of prior information regarding the geometrical features of the discontinuities. For instance, we know that in most real images discontinuities tend to form connected, thin, and closed
130
L. BEDINI ET AL.
curves, which occasionally have sharp direction changes (corner points, crosses, T-junctions, etc.). This can be seen as a sort of local smoothness property of the line field, which recursively extends the one of the intensity field. This fact has been well described and handled in a continuous setting (March, 1992). As already stressed, the MRF-based approach with explicit lines is very good at modeling constrained discontinuities, but at the price of high nonconvexity for the related energy functions. Moreover, because these functions have mixed, continuous, and binary variables, it is difficult to devise an optimization scheme besides stochastic relaxation. Modeling constraints on the discontinuity field becomes much more complicated when the line process is addressed implicitly. In fact, all the neighbor interaction functions proposed so far can only predict the presence or the absence of a discontinuity, and a formal relationship among the implicit and explicit approaches has been found only for noninteracting lines. Since disregarding evident and well-defined properties of the field to be reconstructed is not in the spirit of regularization, some attempts have been made to deal with the problem of addressing a constrained line process while maintaining the computational advantages of deterministic minimization. All the methods reviewed below are based on considering a model with explicit and self-interacting discontinuities and then on adopting suitable approximations of the prior energy U(x,I) that allow the elimination of the line process itself, thus resulting in the implicit line treatment. These methods have been designed for very simple self-interactions of the lines, such as expressing a line continuation constraint or a penalization of parallel adjacent lines. The line continuation constraint, also referred to as the hysteresis property, is characterized by the following form of U2W: '2(')
C h l , J h l , J +Elf f
C
I.J
1.1
-
=
"l,J'l+1,J9
(75)
where parameter E takes positive values in [0,1) and controls the amount of line propagation. The price to be paid to create a discontinuity is decreased by E(Y when a discontinuity at a neighboring site is present. The discontinuity field is generally thin, in the sense that multiple responses to a single edge are not feasible (nonmaximum suppression) (Canny, 1986). Thus, to penalize the formation of adjacent parallel lines, the following form can be chosen for U2(1):
Uz(1)
=
+Y
C
hi.jht+l,j
I7J
where y can now take any positive value.
+Y
C 1,J
ui,,ui,j+]*
(76)
131
EDGE-PRESERVING IMAGE RECONSTRUCTION
Below, we will restrict our analysis to the line continuation constraint; parallel line inhibition is a slight modification of the former. A. Mean Field Approximation Geiger and Girosi (19911, in the framework of statistical mechanics and mean field techniques, derived an approximated solution for both the intensity and the discontinuity fields, which is also suitable for treating self-interacting discontinuities. They first considered MRF models without interactions of the line process and used the mean field theory to obtain mean statistical values for the intensity field x and for the line process 1. These values are actually functions of the data and the partition function 2. Because of the practical difficulties with computing the partition function, they proposed to eliminate the line process I from 2 and then derived some approximations to obtain a set of nonlinear equations with reduced complexity. They called these equations deterministic to stress the deterministic character of the whole procedure. More specifically, they considered the function Z in the form:
z
=
C exp[ -E(x,I)/T]
(77)
x, I
where the energy E(x,I) is the sum of the data term and the prior energy U(x,I) which, in this case, consists of the mixed term alone. Owing to the independence of the single line elements, the summation on I can be computed analytically and this results in a new expression for Z, where the mixed term now assumes the form of a function U,(X) depending on the temperature T. Assuming that the fluctuations around the mean values are small, suitable approximations can be adopted to obtain the following set of equations for the mean values X i , j , j , and Ei, j .
zj,
21.1 . . = g1.1 ..-2u2A[(X.1.1. - X i , j + l ) ( l +(ii,j
-Ei+l,j)(l
-
-zi,j)
-Ei,j) - ( X i , j - l -Xi,j)(l
-
(Xi-1.j
X.1 . 1.)(1 - % j - , , j ) ] ,
(78a)
1
h 1.1 . .=
-u.
-
-Ei,j-l)
.
1.1
=
(78b) 1
( 78c)
132
L. BEDINI ET AL.
This set of equations can be solved using a fast, parallel, and iterative scheme. Note that the family of functions &(XI can be regarded as a family of approximating functions to be used in a GNC algorithm, with the neighborhood interaction function given by:
( f
~ $ ~ (=t A) t 2 - T In 1 + exp - -(
[
cy
- At2)
)].
(79)
In particular, it is straightfonvard to verify that becomes the truncated parabola, when T goes to zero. Gerace (1992) showed that, using Eqs. (78), a suitably large value of T exists such that UT(x)is convex, as required by the GNC algorithm. Geiger and Girosi also showed that the case of interacting discontinuities can be treated by simply augmenting the prior energy with a term U2(l). They explicitly considered the case in which the line continuation constraint is enforced [Eq. (791 and again derived a set of deterministic equations in the form (78a) for the mean values Zi,j , while for the mean values and Ei, they obtained:
xi,
1
The main drawback in this case is that, for each iteration, the transcendental equations (80) must be solved. In Geiger and Girosi (1989), ad hoc approximations are adopted to obtain a local version of Eqs. (80). Similar equations can be derived when different constraints are considered, for instance to enforce constraints which penalize the formation of adjacent parallel lines [Eq. (7611.
B. ExtendedGNC A different way to implicitly manage the line continuation constraint is given in Bedini et al. (1994a, b). It consists of approximating U2(I) in Eq.
133
EDGE-PRESERVING IMAGE RECONSTRUCTION
(75) in the following way:
where
In practice, this assumption means that we approximate the true values of with the functions 5 ( x i ,j - ! - x i + j - 1 the line elements hi,j - and 0;and t ( x i - l , j - x i - I , j + l ) which depend only on the intensity gradients across the line elements themselves. The approximated energy is 9
which can easily be minimized over I to give the following primal energy:
+
C 4( x i , j
- xi + 1 , j
9
xi. j + 1 - xi + 1, j + 1 )
(84)
3
i,j
where the neighbor interaction function
4(ll,t2)= min A ( l - s ) t : S
+ as - & a s t ( t 2 ) ,
tl,t2E R,s
does not depend on the particular site (i, j ) and is given by
E
(0,l)
134
L. BEDlNl ET AL.
with
The term \/a(1 - & ) / A is called suprathreshold. If the intensity gradient is greater than the threshold a line element will be created; if the intensity gradient is lower than the suprathreshold, the smoothness constraint will become active; if the gradient value falls between the suprathreshold and the threshold, the creation of a line will depend on the gradient across a neighboring line element. In Fig. 8, a surface plot of function (85) is reported for particular values of the parameters a,A, and
m,
E.
Analogous results can be obtained when U,(I) assumes a form which inhibits line parallelism; it suffices to substitute -&a with y in Eqs. (83) and (8617 ( X , , , t I - x, i I , , + I ) with ( X , + l . , - X,+Z.,)’ and ( X i + l . , - x, t I., + I ) with ( x , , , + I - x ,,,, 2 ) in (811, (83) and (84). To minimize the primal energy (84) using a GNC algorithm, a convex approximation F* (x) must first be provided, by constructing an appropriate neighbor interaction function &*. Following the criterion given by Blake and Zisserman (1987a1, in Bedini et al. (1994a) we proved that, when A = I, if 4* satisfies
a *&*
-(t,,t*) at;
2 -c*
where 0 < c* < 1/32a2, then the Hessian of F* is positive definite. For adjacent parallel line inhibition, the same result holds with 0 < c* < 1/24a2. In practice, to have &* as close as possible to 4, we set c* = 1/32a2. We first derived a two-parameter family of energy functions F ( p > ” ) , which are continuous and continuously differentiable, and identified from them a convex function, by applying inequalities (87). The F ( P * u )are ~ constructed by replacing 4 in (84) by suitable neighbor interaction func-
EDGE-PRESERVING IMAGE RECONSTRUCTION
135
FIGURE8. Typical surface plot of the neighbor interaction function (85).
tions 4 ( P v u ) , whose definition follows the same criteria adopted to derive functions 4“ in the previous section. In formulas, we have
otherwise, (88)
m,
where s = + P ( t ) is given by Eq. (67), with c* = 1 / 3 2 a 2 , and 4,P(t)is the same as 4P(t>,with a substituted by a ( l - E ) . Function U ( P , ~ ) ( ~is) given by
In Fig. 9, a surface plot of function (88) is shown, for particular values of the parameters A, a, E , p , and u. The neighbor interaction function 4(tl,t 2 ) can be recovered from Eqs. (88) and (89) in the limit of p to 0 and u to \lcu/h. The upper bounds for both p and u are given by those values p* and u* for which the corresponding F* = F ( P * * ’ * ) is found to be
136
L. BEDINI ET AL.
FIGURE9. Typical surface plot of a function in the form (881, which approximates function (85) (see Fig. 8 for a comparison).
convex. In Bedini et al. (1994a) we proved that p* and u* must satisfy the following inequalities: p* 2 1,
(90a)
2 ( 6-
d
m
6 ( u * - s)
2p*A
+ c*
) 9
8A s c * ( u * - s)
-
In order to find suitable values for u* and p* from these inequalities, we ) first chose u* such that (90b) is verified; then, substituting u* in ( 9 0 ~ and (90d), we looked for a p* 2 1 that verifies them. Such a p* will always exist because, in the limit of p* to infinity, the left-hand terms of both (904 and (90d) go to infinity. Moreover, the greater the u* chosen, the smaller the p* needed. The GNC algorithm begins by minimizing F* = F(P**'*).Then p is For every decreased from p* to 0, while u is decreased from u* to fl. value of p and u, F(P,") is minimized by a gradient descent, starting with ) . call this algorithm extendedthe local minimum of the previous F ( P * UWe GNC (E-GNC).
137
EDGE-PRESERVING IMAGE RECONSTRUCTION
C. Sigmoidal Approximation In Bedini et al. (1995), we proposed a way to eliminate the line process from a generic energy function E(x, I) of the following form:
+A
C ( x i , j -xi+*,j)*(1 i.j
- hi,j) + a
C h i . j + a C ui,j i,j
i,j
C Q h ( h i . j ~ h m , n ~ u m , n ~ ( ~ ~ Nn h) ( i y j ) ) i.j
+ C Qo(ui, j ' h m , n urn,,, ( m , n)
E
N ( i 9
d),
(91)
i7i
where the sixth and seventh term express constraints on the configurations of the discontinuities, through self-interactions of any order of the lines. The order of the interactions is determined by the size of the neighborhoods, Nh and N,, adopted for the generic horizontal and vertical line elements, respectively. We suggested substituting each line element in (91) with a function qT(r),depending on a parameter T and with values in [O, 11, where t is some measure related to the local intensity gradient. Function q T ( t ) is chosen with the following properties: (a) For any T > 0, q,(t) is increasing and continuously differentiable. (b) as T goes to zero, q T ( t ) converges to the function
where 8 is the step threshold, which may be site dependent. We chose our functions q T ( t )in the family of the sigmoid functions and assumed the difference between two adjacent pixels as a measure of the local intensity gradient. For the generic horizontal line element, hi,j , we set
138
L. BEDINI ET AL.
and, for the generic vertical line element, ui,j , 1
(92b) In Fig. 10, plots of the sigmoid function q T ( t )are reported for different values of temperature T and for 0 = 60. Given Eqs. (92a) and (92b), Eq. (91) assumes the form:
'- 4
T = 1000 1r = 2000 1r = 3000
p
0.a-
0.60.4-
t
O
0
20
, 40
60
b 80
FIGURE10. m i c a 1 plots of the sigmoid function of temperature.
, 100 120 q T ( t )of Eqs. (92) for different values
EDGE-PRESERVING IMAGE RECONSTRUCTION
139
ensures that there exists a function F(x), in general nonconvex and nondifferentiable, which is the limit of the sequence F,(x) as T goes to zero. Moreover, a value for T always exists such that the corresponding FT(x) is convex. Indeed, it can be immediately verified that, when T goes to infinity, the sigmoid function converges to the constant 1/2. In practice, a finite value, even if it is high, for T*, such that F J x ) is convex can be found, following the general criterion established by (87). The application of this criterion under the nonrestrictive hypothesis that the discrete gradient of x is bounded, leads to an inequality for T* which depends only on A and a. These conditions permit the application of GNC-type algorithms, based on the successive minimization of the various F,(x) via gradient descent techniques. Note the form that F ( x ) assumes in particular cases. Let us first assume Q,, = Q, = 0; in this case, Eq. (91) becomes the weak membrane energy. The extended form of Eq. (93) is now given by llg - AxlI2 FT(X) =
2a2 . j ) ~ e x -((x;,j p[ -xi+l.j)2 e 2 ) ) / ~+] a + cA(Xi+j- ~1i ++ Iexp[ -((xi,j r+l.j)2- e 2 ) / ~ ] -
-x.
i. j
which, in the limit of T to zero, becomes:
where +(t)
=
(
it2
if I ~ 0. (22) Our aim is to relate the beam wavefunction in the field-free output region,
+out(r)
=
+(rl
- 22
%It),
(23)
to the beam wavefunction in the field-free input region, +in(r) =
Jl(r1 , z
SZin),
(24)
so that the values of the observable beam characteristics in the output region can be related to their values in the input region using the wavefunction. To this end, the most desirable starting point would be a z-evolution equation for Jl(r I ,z ) linear in d / d z . So, first, we cast Eq. (15) in such a form using a method similar to the way in which the Klein-Gordon equation is written in the Feshbach-Villars form (linear in a/&), unlike the Klein-Gordon equation (quadratic in d / d t ) . (See Appendix A for the Feshbach-Villars form of the Klein-Gordon equation.) Let us write Eq. (16) as
k(r) defining
1/2
=
{,ti- i 2 ( r ) )
,
(25)
264
R. JAGANNATHAN AND S. A. KHAN
Now, let
Then, Eq. (27) is equivalent to
--(-k , i
p4)(
;;)
d - i
dz
with fi: = fi; + 8;.Hereafter, for notational convenience we shall not usually indicate the r-dependence of $, A, k , f , etc. Next, define
(:I)
$1
+
*2
I ( $1
-
*2)
1 =
Consequently, Eq. (29) can be written as
265
OPTICS OF CHARGED PARTICLES
Multiplying Eq. (31) throughout by -1 and taking the A, term on the left-hand side to the right-hand side we get
--(
i *+ k , d z 4-
)=A(;:),
( 32)
with
A = -a,+9+2,
(33)
where 1 is the 2 x 2 identity matrix and oy and a- are, respectively, the y and z components of the triplet of Pauli matrices
Let us note that Z-? has been partitioned, apart from the leading term -u,,into an:‘even’’ term k? which does not couple $+ and Jr- and an “odd” term 5‘ which couples JI+ and In order to understand 3. (32) better let us see what it means in the case of propagation of the beam through free space. In free space, with +(r) = 0 and A = (O,O, 01, we have RZ(r)= 0 and
+-.
”(*+)
4 k, dz
JI-
where V: = d 2 / d x 2 kOy,k,, 1, namely,
=
-1-7v: 2ko 1 -V: 2ki
+ d2/dy2. A
--v: 2ki 1
1 + -v; 2ki
plane wave with a given k,
=
(k,,,
266
R. JAGANNATHAN AND
S. A. KHAN
is associated with
It can easily be checked that this
(?) satisfies Eq. (37). For a quasi-
paraxial beam moving close to the +z-direction, with k,, > 0 and k,, = k,, it is clear from Eq. (39) that $+P $-. By extending this observation it can be seen easily that for any wavepacket of the form
$(r)
=
/d3kocp(k0)+k,,(r),
/d3k0tcp(k0)~*= 1
with lkol = k , , k , , = k , , k , , > 0, (40) representing a monochromatic quasiparaxial beam moving close to the ~. +z-direction,
=i
!d% cp(k0) #ko. + (r) ~ d % c p o ( k d ~-(r) ~,, with lkol = k,, k,, = k,, k,, > 0, (41)
is such that $+(r) P $-(r). Thus, in general, in the representation of Eq. (32) we should expect $+ to be large compared to (I- for any monochromatic quasiparaxial beam passing close to the +z-direction through the system supporting beam propagation.
OPTICS OF CHARGED PARTICLES
267
The purpose of casting Eq. (27) in the form of Eq. (32) will be obvious now when we compare the latter with the form of the Dirac equation
where
and
01 =
( a x ,ay,a,) and
P are the 4
X 4
Dirac matrices given by
As is well known, for any positive energy Dirac q in the non-relativistic situation (In1< mot) the upper components (Wu)Aarelarge compared to the lower components (*,I. The “even” operator &YDAdoes not couple the large and small components and the “odd” operator does couple them. Using mainly the algebraic property
PgD
= -$DP
(48)
the Foldy-Wouthuysen technique expands the Dirac Hamiltonian H D in a series with l/moc as the expansion parameter. This leads to a good understanding of the nonrelativistic limit of the Dirac equation by showing how the Dirac Hamiltonian can be seen as consisting of a nonrelativistic part and a systematic series of correction terms depending on the extent of deviation from the nonrelativistic situation (see Appendix B for a risum6 of the original Foldy-Wouthuysen theory). The analogy between Eq. (32) and the Dirac equation should be clear now. The correspondences are: positive forward propagation of the beam close to the +r-direction energy Dirac particle, paraxial beam (1- I I hk) nonrelativistic motion (In14 mot), deviation from paraxial condition (aberrating system) deviation from nyrelativistic situation (relativistic motion). Also, it may be noted that in H [sce Eq. (33)] a, plays the role of P and analogously to Eq. (48) we have a,@= -@a,. Hence, by applying a Foldy-Wouthuysen-
- - -
268
R. JAGANNATHAN AND S .
A.
KHAN
like technique to Eq. (32) it should be possible to analyze the beam propagation in a systematic way up to any desired order of approximation. In the Foldy-WouthFysen theory a series of transformation! are used to ?ake the odd term in H D as small as desired: in the present HD [Eq. (4411 BD is of first order in l/m,c$or +/rn,c) and the transformations applied lead to representations of HD in which the corresponding odd terms 5ontain only s2ccessive higher powers of l/m,c so that one can choose an HD with an 19, as small as desired for the purpose of approximation. We shall apply this !echnique to Eq. (32) to arrive at a representation in which the odd term B pill be as small as we would like. We can label the order of smallness of B by the lowest power of l/k,, the expansion parameter in this context, highzr order smallness corresponding to higher power. The smallness of B in Eq. (32) is of order l/k& Following the Foldy-Wouthuysen method, let us define
Equation (32) is now transformed into
where
1
-
P),
OPTICS OF CHARGED PARTICLES
269
with the odd term of order l/ki [see Appendix B for details of the calculation leading to Eqs. (50)-(53)]. By an2ther transformation 9f the same type as in Eq. (49) with b replaced by @(') we can transform H") to a I%(2) with an &2) of order l/k$ Since the even part of H(')already represents a sufficiently good approximation for our purpose, we shall not continue this process further. Hence, we write
with H"' = - a,
dropping the odd term. Note that Let us now look at
(2;)
+ $(I)
[a,h] =
-
a.
corresponding to the plane wave +,$)
(55)
in
free space. We get
1 1
-V2
4ki
+it!-
1 4ki
-V;
1
showing that +i'd,+P for a quasiparaxial beam. This result easily extends to the wavepacket of the form in Eq. (40). Thus, in general, we can take +$I) P +)I! in Eq. (54) for the beam wavefunctions of interest to us.
270
R. JAGANNATHAN AND S. A. KHAN
We can express this property that compared to $$I), as
$$'I
P
$?, or essentially $!) = 0
Then Eq. (54) can be further approximated to read
with
We want to get the z-evolution equation for $(r). To this end, let us now retrace the transformations and rewrite Eq. (58) in terms of which $,
=
$ [see Eq. (28)].Substituting in Eq. (58)
we get (see Appendix B for details of the calculation)
=
A.(
;;),
(::)
in
27 1
OPTICS OF CHARGED PARTICLES
where
Since h, has nonzero entries only along the diagonal, Eq. (61) describes the z-evolution of and J12 independently. We are interested in the z-evolution of II,= and hence we have i dl(l - - = I?+,
+,
ko
with
-
4 -1 - -A, hk0
dz
1 2ki
- -(D:
,
-i2)
1 dZ
dz
1 + -(b: Ski
2
-
P)
1
-
-([(b: 16ki
-
[b;, x4A , ] dz
dz
P),
272
R. JAGANNATHAN AND
S. A. KHAN
Multiplying Eq. (63) throughout by hk, we get
. a*
*
=&"* O '
lh-
dz
with
8 = -Po
- qA,
1 + -(+t +p)
2 Po
+ --+; +e2) - - ( [ ( a t 1
2
8Po
1
+jq,
16Po"
;it
where* we have used the re1a:ions p o = hk,, g 2 = h2k2, and = - h2D; . Now, we can identify as the optical Hamiltonian corresponding to ih d / d z or -fi, ( = - the canonical momentum in the z-direction). Since j d2rl Jl(r I ,z)I2, representing the probability of finding the particle in the xy-plane located at z , need not be a constant along the z-axis (only l&d2rlJl(rI ,z)I2 = 1 ) the z-evolution of +(rL , z ) , given by Eq. (651, is not necessarily unitary. This implies that representing ih d / d z need not be hermitian. Actually, one should expect a loss of intensity of the forward propagating beam along the z-axis since there will, in general, be reflections at the bouqdaries of the system. So it is not surprising to find non-hermitian terms in above [the term l / p i ; the general pattern in the above expansion procedure is that among the terms (even powers of l / p o ) alternate terms are nonhermitian]. Of course, the effect of these nonhermitian terms can be Txpected to be quite sm$l and negligible. Hence, we can approximate Zo,fucher, to a hermitian W, by dropping the nonhermitian terms (or, taking W, = +A?:)); later on, we shall analyze the small influence these nonhermitian terms have on the optics of the system. Thus, we write
8
-
)=
( ~ ( ~ ( ~ , Z O ) ) , Y ( ~ ( Z , Z O ) ) )
.
(123)
Substituting the result of Eq. (122) in Eq. (1211, we get Gp(rI ,z ; r ,,, z , )
=
e ( i ~ ' ) p o ( z - z o )I ~p (e(( rz , z , ) ) , z ; r I ,,, z , ) , ( 124)
where Gp(r * ( e( z z , )) z ; 9
9
I .o 9 2,)
=
(r I ( e( Z,Z,) )lop( z
9
2,)
Ir I ,o ) (125)
is the Green's-function corresponding to the time-dependent-oscillator-like Hamiltonian No, p. The exact expressions for the transfer operator op(z, 2,) and the Green's function Gp(rI (O(z,z,)), z ; r I ,,, z,), which, in fact, take into account all the terms in the infinite series in Eq. (771, can be written down. We shall
,
O
~
~
OPTICS OF CHARGED PARTICLES
287
closely follow the prescription by Wolf (19811, which can be used for getting the evolution operator and the corresponding Green’s function for any system with a time-dependent Hamiltonian quadratic in (r I ,fi ) (see Appendix F for some details). This is possible because of the Lie algebraic structure generated by the operators {r: ,at ,r -fiI + fi I -r I). The results in this case are:
if hp(z , z o ) # 0, (127) Gp(rJ.
where
(e(z’z0))~z;rI,09z0)
288
R. JAGANNATHAN AND
S. A. KHAN
with gp(z, z , ) and hp(z,z,) as two linearly independent solutions of either ( x and y) component of the equation r", ( z ) satisfying the initial conditions
+ F( z)r
g p ( z o , z o )= N p ( Z o J o ) and the relation
1,
=
I
( z ) = 0,
hp(Zo,Zo)=g;(Zo,Zo)
( 130) =
0, (131)
for any z 2 2., (132) gp(z, z,)Mp(z, 2,) - h p ( z ,z,)g;(z, 2,) = 1, As we shall see soon, Eq. (130) is the classical paraxial ray equation for the beam, modulo a rotation about the z-axis; with z interpreted as time, Eq. (130) is the equation of motion for the classical system associated with the p , namely, the isotropic two-dimensional harmonic oscillaHamiltonian tor with the time-dependent circular frequency Now, from Eqs. (120), (1241, (127), and (1281, it follows that
h0,
*(rl
dm.
9 2 )
if hp(z , 2 , )
f
0, (133)
if hp(z , z,)
=
0, (134)
representing the well-known general law of propagation of the paraxial beam wavefunction in the case of a round magnetic lens (Glaser, 1952, 1956; Glaser and Schiske, 1953). Equation (133) is precisely same as Eq. (58.42) of Hawkes and Kasper (1994) except for the inclusion of the extra phase factor e i 2 " ( z - z ~and ) ~ Athe ~ Larmor rotation factor in the final z-plane; these extra factors would not appear if we remove the axial phase
289
OPTICS OF CHARGED PARTICLES
factor in the beginning itself and introduce a rotated coordinate frame as is usually done. We shall not elaborate on the well-konwn practical uses of the general propagation law [Eq. (13311: it may just be mentioned that Eq. (133) is the basis for the development of Fourier transform techniques in the theory of electron optical imaging process (for details, see Hawkes and Kasper, 1994). As is clear from Eq. (1341, if h p ( z , z o )vanishes at, say, z = zi, i.e., hp(zi,z,) = 0, then we can write 1
.
+ ( r l , i , z i ) = -e'Yo(ZI*zo)+(rI,i(s)/M,z,), M withM=gp(zi,z,),6= e(zi,zo), 2lr ~ o ( z i , z o )= -[(zi
-20)
+gb(zi,~o)r:,i/2~],
(135)
A0
1
I+(rl,i,zi)12 =
zI+(r, ,i(s)/M,Zo)l
2
.
( 136)
This demonstrates that the plane at zi, where h p ( z , z , ) vanishes, is the image plane and the intensity distribution at the object plane is reproduced exactly at the image plane with the magnification M = gp(zi,z , ) [see Eq. (58.41) of Hawkes and Kasper, 19941 and the rotation through an angle
a = e(zi,z,)
=
1d z e y z ) zi
20
As is well known, the general phenomenon of Larmor precession of any charges particle in a magnetic field is responsible for this image rotation obtained in a single-stage electron optical imaging using a round magnetic lens. It may also be noted that the total intensity is conserved: obviously,
290
R. JAGANNATHAN AND
S. A. KHAN
We shall assume the strength of the lens field, or the value of ~ ( z )to , be such that the first zero of hp(z, zo) is at z = zi > z,. Then, as we shall see below, M is negative as should be in the case of a convergent lens forming a real inverted image. So far, we have looked at imaging by paraxial beam from the point of view of the Schrodinger picture. Let us now look at this single-stage Gaussian imaging using the Heisenberg picture, i.e., through the transfer maps ((r )(zo), (p )(zo)) + ((r )(z), ( p )(z)). Using Eqs. (83), (98), (118), and (126), we get
,
(I. )(z)
,
,
,
(*(z,)lfi;(z, z0).,4(z, z0)l@(zo)) zo)lfi;(z, = (z,) -sin
+ sin e ( z , zo)(y)(zo), e(z, zo)(x)(zo) + cos e(z, zo)(y)(zo)),
(P* ( - e ( z * zo)))(zo) = (COS e ( z , ~ ~ ) < p , > (+ z sin ~ ) e ( z , zo)(p,)(z0), -sin e(z, zo)(px)(zo) + cos e ( z , z,)( py)(z0)). (140) Similarly, we have
(P, )(z)
= pogb(z9 zo)(r.
(-e(z9 zo)))(zo)
+ h'p(z,zO)(~l(-e(z,zO)))(zO)'
(I41)
29 1
OPTICS OF CHARGED PARTICLES
At the image plane at z = zi, where hp(zi,z,) Eqs. (139) and (141) becomes
=
0, the transfer map in
(r, >(zi) = M ( r , ( - 6 ) ) ( z , ) ,
(P, )(zi) =PogL(zi,zo)(r, (-'))(zo)
+
(P, (-'))(zo)/MS
(142)
where 6 is given in Eq. (137), M = gp(zi,z,), and 1/M = h',(zi, z,) [see Eq. (132)l. The content of Eq. (142) is essentially the same as that of Eq. (136); i.e., at the image plane a point-to-point, or stigmatic, image of the object is obtained and the image is magnified M times and rotated through an angle 6. It may also be verified directly that, as implied by Eq. (142), (r, )(zi)
=
/d2ri r1,iI#(rl,i,zi)12
=
- /d2ri r , , i l # ( r ~ , i ( 6 ) / M , z ~ ) 1 2
1
M2
= M/d2ro
r l ,,( -6)I#(r*
,o*
z,)12
= M(r. ( -*))(zo). (143) Let us now see how (r )(z) and (p )(z) evolve along the z-axis. Since
,
d
i ,
A
- Up(2, 2,) dZ
a ,
=
- ~WOUP(Z z o.) ,
dZ U ,'( 2, 2,) =
it follows that d -(r I >(2)
dz
d
-(PI dz
=
1
,
nU,'( 2, z,)A,,
x(i + ( z , ) ~ f i ~ ( 2,)z , [Qo,p,r ,] fip(z9z,)I+(z,)), 1
(144)
(145)
> ( z ) = -fi ~ + ~ ~ o ) l f i , + ~ ~ , ~ o ~ [ Q , , p ~ ~ l (146) ]fip~~~~o
Explicitly, these equations of motion, Eqs. (145) and (146), become
292
R. JAGANNATHAN AND
T(Z) =
P(Z) =
I
0 0 -F(z) 0
-eyz) 0
S. A. KHAN
1 0 0 0
0 0 0 -F(z)
0 1 0 0
o
0
0
0
0
ep(z)
(149) *
where
-
-sin O( z , 2,) 0
cos f3( z , z , )
0
0
0
cos e( z , z , )
sin e( z , z , )
’
OPTICS OF CHARGED PARTICLES
293
If we now go to a rotated coordiante system such that we can write
with ( X , Y ) and (Px, P y ) respectively as the components of position and momentum in the new coordinate frame, then Eq. (150) takes the form
Note that xyz and XYz frames coincide at z = z o .Then, the equations of motion for ( R , ) ( z ) = ( ( X ) ( z ) ,( Y ) ( z ) ) and (P, ) ( z > = ((Px)(z), ( P y ) ( z ) )become
From Eq. (156) it follows that
which represent the paraxial equations of motion with reference to the rotated coordinate frame; now, compare Eq. (158) with Eq. (130). Equation (159) is not independent of Eq. (158) since it is just the consequence of the relation ( d / d z ) ( R ,) ( z ) = (P, ) ( z ) / p o [see Eq. (15611, and a solution for (R ,) ( z ) yields a solution for (P, ) ( z ) .
294
R. JAGANNATHAN A N D S. A. KHAN
Equation (156) suggests that, due to its linearity, we can write its solution, in general, as
where, as already mentioned above, the second relation follows from the first assumption in view of the first relation of Eq. (1561, namely, ( d / d r ) ( R , ) ( z ) = (P, )(z)/po.Substituting the first relation of Eq. (160) in Eq. (158) it follows from the independence of (R ) ( z , ) and (P, ) ( z , ) that
,
g;(zJo)
+ F ( z ) g , ( z , z , ) = 0,
h”,z,zo) + F ( z ) h , ( z , z , ) = 0
(161)
Since at z = z , the matrix in Eq. (160) should become identity we get the initial conditions for gp(z,z,) and hp(z, 2,) as
gp(z,,z,) = ~ p ( z o , z o=) 1,
=g6(z0,z,)
hp(Z,,Z,)
=
0. (162)
In other words, g ( z , z , ) and h,(z, z,) are two linearly independent solutions of either (X or Y)component of Eq. (158) subject to the initial conditions in Eq. (162). From the constancy of the Wronskian of any pair of independent solutions of a second order differential equation of the type in Eq. (158) we get g p b ZO)h‘,(ZY
20)
-h
p b
= gp(z0, Zo)h’p(zo, 2,)
zo)gb(z,
20)
- hp(Z0,
Z O k b ( Z 0 , 20) =
1,
for any z 2 zo. (163) Thus, it is seen that the solutions of Eq. (1301, g,(z, z , ) and h,(z, z,), contained in Eqs. (126)-(128), (1331, and (134) can be obtained by integrating Eq. (156). Note that we can formally integrate Eq. (156) by applying the formula in Eq. (72) in view of the analogy between Eq. (65) and Eq. (156): the amtrix in Eq. (160) can be obtained using Eq. (72) by replacing (-(i/h)$,) by the matrix in Eq. (156). The result obtained gives g,(z, 2,) and h,(z, 2), as infinite series expressions in terms of F ( z ) . Then, with
OPTICS OF CHARGED PARTICLES
295
and 9 ( z , z,) as given by Eq. (153), Eq. (150) is seen to be the matrix form of Eqs. (139)-(141). This establishes the correspondence between the transfer operators in the Schrodinger picture and the transfer matrices in the Heisenberg picture: e(i/fi)Wz9zo)Lz
fip(z, z o )
+9( z , z , ) ,
= e(i/fi)e(z.4zfi-p(G =
~
(
~
,
~
o
)
20)
~
Explicitly, g,(
2, 2,) =
1 - /z&l 20
/"'&F( 20
-
@p(zJo) - , . % ( Z 7 Z O ) ,
2)
s,(2 9 2 0 )
=( z~& z, , ~) ao( z)J , ) .
(165)
296
R. JAGANNATHAN AND S. A. KHAN
x / z 3 d z z jz2dz1F( zl)( 20
2,
-zo)}+
... .
(169)
zo
It is easy to verify directly that these expressions for gp(z, z,) and hp(z, 2,) satisfy Eqs. (161) and (162). The transfer operator defined by Eqs. (71)-(74) [or Eqs. (75)-(77)1 is an ordered product of the transfer operators for successive infinitesimal distances from the initial point to the final point, an expression of the Huygens principle in operator form. Hence it can be written as an ordered product of transfer operators for successive finite distances covering the entire distance from the initial point to the final point. Thus, we can write cp(Z,Zo)
c
> z,,
= ~ ~ , p ( ~ , ~ r ) ~ ~ , p ( ~ r , z ~ ) ~ D , pwith ( ~ ] 2,, ~ o Z) ,I , ~
(170)
where D refers to the drift in the field-free space and L refers to the propagation through the lens field. Consequently, one has
$(rl
9 2 )
= / ~ 2 ~ r / ~ z ~ ~ / ~ , 2z ;~r ~ o , ~, ~~ z ,1 )p ( ~ ~
x GL.p(r I , I f 21;
x
I , I , 21)
, 1+(r I ,o
~ D , p (1 r , I , 21; r I ,n z o
9
zo)
*
(171)
297
OPTICS OF CHARGED PARTICLES
Using the direct product notation for matrices,
where
a= e(t,z,)
=
e(Zl,rr).
Since F ( z ) = 0 outside the lens region, we have, from Eqs. (166)-(169),
with
Il as the 2 x 2 identity matrix. For the lens region
=
(gp*L gP,L
"IL)
h'P,L
8
R(6),
(175)
with g , , = gp(zr,zI), h , , = hp(zr,q), g;,, = g$z, Z ~ ) I ~ = ~lip,, ,, = h',(z, Z ~ ) I ~ = ~ , Then, . substituting Eqs. (17314175) in Eq. (170) we get the
298
R. JAGANNATHAN AND S. A. KHAN
identity
If we now substitute
then Eq. (177) becomes the familiar lens equation
u1 + ,1= T 1,
(179)
with the focal length f given by
Equation (178) shows that the principal planes from which the object distance ( u ) and the image distance ( u ) are to be measured in the case of a thick lens are situated at
OPTICS OF CHARGED PARTICLES
299
The explicit expression for the focal length is now obtained from Eqs. (168) and (180):
To understand the behavior of this expression [Eq. (18211 for the focal length, let us consider the idealized model in which B ( z ) = B = constant in the lens region and 0 outside. Then l/f = (qB/2p0) sin(qBw/2po) where w = (zr - z , ) is the width, or thickness, of the lens. This shows that the focal length is always nonnegative to start with and is then periodic with respect to the variation of the field strength. Thus, the round magnetic lens is convergent up to a certain strength of the field beyond which it belongs to the class of divergent lenses, although this terminology is never used due to the fact that the divergent character is really the result of very strong convergence (see Hawkes and Kasper, 1989a, p. 229). In practice, the common round magnetic lenses used in electron microscopy are convergent. The paraxial transfer matrix from the object plane to the image plane now takes the form
as is seen by simplifying Eq. (176) for z = zi using Eqs. (177)-(180). Note that in our notation both u and u are positive and M is negative, indicating the inverted nature of the image, as should be in the case of imaging by a convergent lens. Another observation is in order. When the object is moved to -03, i.e., u + w, u is just f. Hence, the focus is situated at zF = zPi + f = zr -k fgp,*.I (184) Now, with the object situated at any z , < z , the transfer matrix from the object plane to the back focal plane becomes (185) as is seen by substituting z
= zF
in Eq. (176) and simplifying using Eqs.
300
R. JAGANNATHAN AND S. A. KHAN
(178), (180), and (184). The corresponding wave transfer relation in Eq. (133) shows that, apart from unimportant phase factor and constant multiplicative factor, the wavefunction in the back focal plane is equal to an inverse Fourier transform of the object wavefunction at z , < z , (see Hawkes and Kasper, 1994, pp. 1248-1249 for more details). Let us now consider the lens field to be weak such that l:'dzF(z)
4
w = 2, - 2,.
l/w,
( 186)
I
Note that 1: d z F ( z ) has the dimension of reciprocal length and for the weak lens it is considered to be very small compared to the reciprocal of the characteristic length of the lens, namely, its width. In such a case, the formula for the focal length [Eq. (182)] can be approximated to give 1
- = /:"F(
q2
q2
1
7 dzB2(Z ) (187) 4Po Z I 4Po --m f l which, first derived by Busch (1927), is known as Busch's formula for a thin axially symmetric magnetic lens (see Hawkes and Kasper, 1989a, Chapters 16 and 17 for details of the classical theory of paraxial, or Gaussian, optics of rotationally symmetric electron lenses). A weak lens is said to be thin since in this case f*w ( 188) as seen from Eqs. (186) and (187). For the thin lens the transfer matrix can be approximated as Z) = 7 /"dzB2( z ) =
1 1 - - ( z - zp)
f
1
--
f
1
--(zp - z,)(z
f
1 1 - -(
f
1
- zp)
Zp - z o )
1 with zp = ~ ( z+ ,2,). (189)
In this case the two principal planes collapse into a single principal plane at the center of the lens. If imaging occurs at z = zi for a given zo then u = zp - z , and u = zi - zp satisfy the lens equation l/u + l / u = l/f
OPTICS OF CHARGED PARTICLES
301
and the transfer matrix from the object plane to the image plane becomes
( -y/f l;M)
8 R(6), with
M
=
-v/u.
From the structure of the
transfer matrix in Eq. (189) it is clear that apart from rotation and drifts through field-free regions in the front and back of the lens the effect of a thin lens is essentially described by the transfer matrix
(-bf
:)
which, as seen from Eq. (1341, corresponds to multiplication of the wavefunction by the phase factor exp(-(i?r/A,f)r:) as is well known. As has been emphasized by Hawkes and Kasper (1994) (see pp. 1249-12501, although the attractive paraxial theory is in full agreement with the corresponding classical corpuscular theory it is certainly wrong, since the inevitable lens aberrations and all diffraction at the beam-confining apertures are neglected. Let us now look at the aberrations due to the beam not being ideally paraxial.ATo this end, we shall treat the nonparaxial teArmsin the Hamiltonian W, as perturbations to the paraxial Hamiltonian W , and use the well-known technique of time-dependent perturbation theory in quantum mechanics utilizing the so-called interaction picture. In the classical limit this treatment tends to the similar approach pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al. 1988; Rangarajan et al., 1990, Ryne and Dragt, 1991; see also Forest et al., 1989; Forest and Hirata, 1992) for dealing with the geometrical optics of aberrating charged-particle optical systems. When the beam deviates from the ideal pacaxial condition, as is usually the case in practice, we have to retain in [Eq. (66)] terms of order higher than second in (r I ,fi *). Thus, going beyond tbe paraxial approximation, the next approximation entails retaining the 3 terms of order up to fourth in (r I ,fi I ). To this end, we substitute in [Eq. (6611
. Using the result
[see Eqs. (139)-(141)1, with fip = fip(z,zo), g = gp(z7z o ) , h = h&z, Zo)? g' = g&, z0), h' = h&, zo), and 8 = O(z, z,), and Eqs. (193) and (198), we find, after considerable but straightforward algebra, that
304
R. JAGANNATHAN AND S. A. KHAN
where { A ,B }
= AB
+ BA and 7/ l
C(Z,2,)
=
z
&{( a 4 - a a ” ) h 4 + 2a2h2hr2- W4},
LO
51 / z &{( a 4- aa”)gh3+ a’(gh)’hh’ + g’h’’),
K ( z , zo) =
20
[:&(
k(z,z,) =
1
A ( z , 2,)
=
5/
2
( i1 a ” -
1
&{( a 4 - a a ” ) g 2 h 2
zo
+2a2gg’hh‘ + g”h‘’ - a’), a ( z , z , ) =/‘&(($a’’ LO
1
F( z , 2,)
=
12&((
a 4-
- a 3 ) g h - ag’h’),
aa”)g2h2
zo
+ g”h” + 2 a 2 ) , 1 z = 5 / dz{( a4 - a a ” ) g 3 h + a2gg‘(gh)’+ gl’h’), +(u2(g2h’2 + g”h2)
D( z , z , )
=0
E( Z , 2,)
=
1 z 5 / &{( a 4 - a a ” ) g 4 + 2ag2g’’ + g’4}.
(203)
LO
From Eqs. (195) and (199), we have
I$(+
~p(~,z,)q:)(z,z,)I$(z,)~ (204) which represents the generalization of the paraxial propagation law in Eq. (1331, corresponding to the inclusion of the lowest order aberrations. Now the transfer map becomes (L’L
=
)(3)(Z) =
(q:ppk fipq:))(zo), (@C:$ cpqi))(
(205)
(p 1 )(3)( 2) = I 2,) (206) with = 4 : ) ( z ,z,) and ( * - - ) ( z , ) = ($(z,)l-.I$(zJ). The subscript (3) indicates that the correction to the paraxial (or first order) result
4;)
OPTICS OF CHARGED PARTICLES
305
incorporated involves up to third-order polynomials in (r I ,$ I). Explicitly, (X)(3)(4
where the geometrical aberrations, or the deviations from the paraxial
involving expectation values of homogeneous third-order polynomials in (rI ,$,). Hence the subscript (3) for ( A x ) ( ~ ) ( z )( ,A Y ) ( ~ ) ( z )etc., , and the
306
R. J A G A N N A M AND S. A. KHAN
name third-order aberrations, Note that, here, we are retaining only the single coymutator terms in the application of the formula in Eq. (98) to compute V,$&, qi;yqi), etc., since the remaining multiple commutator terms lead to polynomials in (r ,,fi I 1 which are only of degree 2 5 and are to be ignored in order to be consistent with the fact that we have retained only terms up,fo fourth order in (r ,,fi I 1 in the Hamiltonian and the transfer operator U. Obviously, the plane at which the influence of aberrations is to be known is the image plane at z = zi:
where
308
R. JAGANNATHAN AND S. A. KHAN
A =A(zi,z0),
Q
=Q(Z,,Z,), F =F(zi,z0),
D
d
= d( z i ,z 0 ) ,
= D( z i ,z 0 ) ,
E
= E(
zi, z 0 ) .
(214)
With reference to the aberrations of position [Eqs. (210) and (211)l the constants C,, K, k , A , Q, F, D , and d are known as the aberration coefficients corresponding, respectively, to spherical aberration, coma, anisotropic coma, astigmatism, anisotropic astigmatism, curvature of field, distortion, and anisotropic distortion [see Hawkes and Kasper (1989a) for a detailed picture of the effects of these geometrical aberrations on the quality of the image and the classical methods of computation of these aberrations; see Ximen (1991) for a treatment of the classical theory of geometrical aberrations using position, momentum, and the Hamiltonian equations of motion]. The gradient aberrations [Eqs. (212) and (21311 do not affect the single-stage image but should be taken into account as the input to the next stage when the lens forms a part of a complex imaging system. It is interesting to note the following symmetry of the nine
309
OPTICS OF CHARGED PARTICLES
aberration coefficients: under the exchange g h, the coefficients transform as C,c)E, K c) D, k t)d , A t)F , and a remains invariant. To see the connection A c) F we have to use the relation gh’ - hg’ = 1. Introducing the notations u =x
+ iy,
u
= ($x
+ i&)/po,
(215)
the above transfer map [Eqs. (209)-(213)1 can be written in a compact matrix form (see Hawkes and Kasper, 1989a, Chapter 27, for the aberration matrices in the classical context) as follows:
1 0
0 1
c, ik-K
2K ia-2A
D+id
2A+ia
2k -a -a
2d
X
F id-D
K+ik\ -F
I
310
R. JAGANNATHAN AND S . A. KHAN
Let us now look at the wavefunction in the image plane. From Eq. (204) we have
x/d2ro(rI,i(g)/MI~:)(zi,Zo)Irr
x +(rI ,O)
,o)
(217)
20).
lq&,
When there are no aberrations (r I,i(S) / M zo)lrI , ) = 6’(r + , - r l , i( S ) / M ) and hence one has the stigmatic imaging as seen earlier from Eqs. (135) and (136). It is clear from Eq. (217) that when aberrations are present the resultant intensity distribution in the image plane will represent only a blurred and distorted version of the image. Usually, is approximated by keeping only the most dominant aberration term, namely the spherical aberration term, which is independent of the position of the object point [see Eqs. (210) and (211)l. An important result to be recalled here in this connection is the celebrated Scherzer theorem (Scherzer, 1936) which shows that the spherical aberration coefficient C, is always positive and cannot be reduced below some minimum value governed by practical limitations [to obtain this result from the expression for C given in Eq. (2031, see Dragt and Forest (198611. Attempts to correct this aberration have a long history since Scherzer’s theorem and there seems to be much to be achieved yet in this direction (see Hawkes and Kasper, 1989a, b, 1994). Let us also note that, in practice, there are further modifications required to be incorporated in the general propagation law [Eq. (204)] (for details of practical transfer theory, using Fourier transform techniques, and aspects of the influence of diffraction and
4:)
OPTICS OF CHARGED PARTICLES
31 1
aberrations on resolution in electron microscopy, see Hawkes and Kasper, 1994, Chaper 65). For example, one has to take into account the following aspects: (i) the specimen may not be exactly in the plane conjugate to the (fixed) image plane so that a drift factor (&,) of the type in Eq. (91) with a suitable Az (known as defocus; Scherzer, 1949) will have to be considered in defining the actual object wavefunction, and (ii) the diffraction by the beam-confining aperture behind the lens. Now, we have to emphasize an important aspect of the aberrations as revealed by the quantum theory in contrast to the classical theory. We have identified the quantum mechanical expectation values ( r ,) ( z ) and (p I ) ( z ) / p , as the classical ray variables associated with position and gradient of the ray intersecting the z-plane. Then, with the expressions for the various aberration coefficients being the same as their respective classical expressions (of course, under the approximations considered), Eqs. (210)-(213) correspond exactly to the classical expressions for aberrations of position and gradient provided we can replace (fix$; ), ({fix,$ 2, - r , + r , .$,I), < ( x , i ” , ) , etc., respectively, by ( ~ , ) ( ( p , + ) ~( p , ) ), 4((~>(p,> +~( y ) ( p , ) ( p y ) ) , ~ ( ( X ) ( ( P , +) ~(p,,)’)), etc. But, that cannot beAdone. In quactum mechanics, in general, for any observable 0, (+lf(O)l+) =f((+lO)l+)) only when the state I+) is an eigenstate of 0 and, for any two observables, say 0, and O,, only when the state I +) is a simultaneous eigenstate of both 0, and 0, can we have (i,hlf(6,,6,>1+) =f((+16,1+),(+(1621*)). It is thus clear that for the wavepackets involved in electron optical imaging the above-mentioned replacement is not allowed. As a result we see that the aberrations depend not on& on ( r I ) and (p ) but also on the higher order central moments of the wavepackets. Thus, for example, contrary to the classical wisdom, coma, astigmatism, etc. cannot vanish when the object point lies on the axis. As an illustration, ( ( r . ,B:}Xz,), one of the terms contributing to consider the term coma [see Eqs. (210) and (211)] which, being linear in position, is the dominant aberration next to the spherical aberration. The corresponding classical term, ((dX/dzl2 + ( d y / d z ) 2 ) r , at z,, vanishes obviously for an object point on the axis. But, for a quantum wavepacket with (r I ) ( z , ) = (0,O) the value of ( ( r ,fi:})(z,) need not be zero since it is not linear in (r, )(zo). More explicitly, we can write, with S r , = r , - ( r ,) and
,
N
,
a, B, =
-(P,),
312
R. JAGANNATHAN AND S . A. KHAN
=
2(r, > ( z O ) < p > l (z0)’ + 2(r1 ) ( Z o ) ( ( W 2
+((ar,
+ 2(ISr,
,(sax)’ 9
+
+
(vy)’)(~o)
(VY)’})GO)
~ ~ J ( ~ o ) ( P A ~ o )
+ 2({% w y } ) ( ~ o ) ( P y ) ( ~ o ) ~ (218) showing clearly that this coma term is not necessarily zero for an object point on the axis, i.e., when (r )(zo) = (0,O).Equation (218) also shows how this coma term for off-axis points ((r ) ( z o )# (0, )) also depends on the higher order central moments of the wavepacket besides the position ((r I ) ( z o ) )and the slope ((p )(zo)/p,,) of the corresponding classical ray. When an aperture is introduced in the path of the beam to limit the transverse momentum spread one will be introducing un3
,
d m ,
Jm)
Ay = certainties in position coordinates (Ax = and hence the corres onding momentum uncertainties (Ap, = Apy = in accordance with Heisenberg’s uncertainty principle, and this would influence the aberrations. However, the schemes for corrections of the aberrations may not be affected very much since these schemes depend only on the matching of the aberration coefficients and the quantum mechanical expressions for these coefficients turn out to be, under the approximations considered, the same as the classical expressions. Before closing, we have to consider a few other points: If we go beyond the approximation in Eq. (66) to include higher order terms in then, in general, we will have
dm’,
d ~ ) ,
2,
n
OPTICS OF CHARGED PARTICLES
313
where ~ & ] ( Z , Z , ) is to be calculated using the formula in Eqs. (76) and (77) keeping in the corresponding ?&&, z,) only terms of order up to 2n in (r I ,fi ,). Using Eq. (2201, aberration beyond the third order can be computed following the same procedure used above for studying the third-order aberrations. Here again, in the application of Eq. (98) to calculate the transfer maps for r I and @ I [Eqs. (205) and (206)l the series of commutators on the right-hand side of Eq. (98) should be truncated in such a way that only terms of order up to (2n - 1) in (rI ,$,I are retained. Comments on the effect! of the hermitian and nonhermitian terms dropped from to obtain W, in Eq. (192) [or in general in Eq. (219)] are in order. The hermitian terms we have been dropping are terms of nonclassical origin proportional to powers of A, such that they will vanish in the geometrical optics (or classical) limit when we make the replacments fi I = - ihV, + p , and A, + 0. Under the approximation considered above the terms dropped are
8
where the superscript (A,) indicates the explicit A, dependence. Taking into account the influence of the above terms [Eqs. (221) and (222)] is straightforward. Note that A:?) is a paraxial term and should be added to A , while computing the paraxial transfer operator $ ( z , z,)JEq. (126)]. Using the prescription outlined in Appendix F, one gets for gp(z,z , ) the same expression as in Eq. (126), but having (g$z, z o ) - Aig,(z, z,)F’(z)/ 161~’)and (h’’(z, 2,) - Aih,(z, zo)F’(z)/l6.rr2) instead of g$z, z , ) and h’,,(z,z,), respectively, and, with g,(z, z , ) and h,(z, z , ) satisfying the modified paraxial equation
r’y
+
(
F(z)
-
A: 4 F”(z)- -
16w2 2561~~ replacing Eq.(1301, and the initial conditions
314
R. JAGANNATHAN AND S. A. KHAN
The relation g,h’, - h,gl, = 1 is true at all z as before. Consequently, the paraxial properties of the system are slightly modified and the changes are easily computed. Since the additions are proportional to powers of A, they are essentially small compared to the clas$cal parts and vanish in, the geometrical optics limit (A, + 0,.The term W?$ has to be added to W0,(4) to compute the corresponding L& and this leads to the modification of the aberration coefficients. For example, the modified spherical aberration coefficient turns out to be
l’iI
c, = - 1‘dz
-
2
(ff4 -
ffff”)
z,
A4, +y ( a4a” + 3 2 ~
[ +[
+
(Y(Y’~CY”
1
+ a2a’”’’’)h4 A:
~ ( ~ C Y-Cah” X‘ ad“‘) - 7 f f 4 :4 1 6 ~
1
3 A: 2 a 2 + y ( ~ ” f ’ h2N2 ’ 32-
d 2
1
h3h’
where h = h,(z, z , ) satisfies Eqs. (223) and (224). Since the nonclassical A,-dependent contributions to are very small compared to the dominant classical part, Scherzer’s theorem would not be affected. Let us now consider the nonhermitian term
c,
5
which is really an %ntihermitianterm. Since it a paraxial term its effect will be to modify V,(z, z , ) when we add it to H,,. If we retain any such antihermitian term in the paraxial Hamiltonian the reulstant transfer operator .5$z, z , ) obtained using the formula in Eqs. (73) and (741, will, in general, have the form
315
OPTICS OF CHARGED PARTICLES
where i,(z, z , ) and ZA(z, z,) are, respectively, the hermitian and antihermitian correction terms to theAmainpart ?(z, zok it may be npted that any term of the type ( i / h ) [ A ,B ] is hermitian when both A and k are hermitian or antihermitian and such a term is antihermitian if one of the two operators is hermitian and the other is antihermitian. When $,p(z, 2,) is used to calculate the transfer maps
(a,&
( r l >(2,)
-+
( r L>(z)
it is seen that the hermitian correction term modifies the paraxial map while the antihermitian correction term leads to an overall real scaling factor 1/( $ ~ z o ~ l e ~ ~ * i ' ~ ~ f iaffecting ~ l + ~ z othe ~ ) ,image magnification, as a consequence of nonconservation of intensity, and contributes to ,e ( - i i ~ / cand ) e(-i'~/*)'$ aberrations since the terms like e(-i'A/fi)tr e(-i'A/*)lead on expansion, respectively, to hermitian terms of the fork r + nonlinear terms in (r ,, and $ + nonlinear terms in ( r , ) only. In the present case, the term .$Ao) in Eq. (226) does not lead to any hermitian correction term (note that pd?;)(z(')), ~;;)(Z(~))I = 0 for any z(l) and d2))and its contribution to the optics is only through the antihermitian correction term affecting the conservation of intensity and adding to the aberrations. Since the effects of the &-dependent hermitian and antihermitian terms are quite small, as found above in a preliminary analysis, we in proposed that all such terms may be treated as perturbations and clubbed with the aberration terms to be dealt with using the interaction picture. In the computation of the corresponding transfer operator &-dependent terms may be retained up to any desired order of accuracy. Thus, for example, in the present case, to obtain the effects of the terms @$),A$& and 4,"$ we may replace z,) in Eq. (204) by a %r(z, z,) which is to be computed by using the formula in Eqs. (73) and (74) with = A:,,, + fi(Ao)l 0. P + @,?if +d,$)' and keeping the commutator terms up to the desired level of accuracy in terms of powers of A, and such that the resultant polynomial in (r ,$ 1 is only of order four. It should also be
-
8
qi)(z,
8
316
R. JAGANNATHAN AND S. A. KHAN
noted that the precise forms of the A,-dependent correction terms depend on the order of approximation, in terms of powers of p , , chosen to expand the Hamiltonian H in Eq. (32) to arrive at the optical Hamiltonian W, in Eq.(66). We shall not elaborate on this topic further since the calculations are straightforward [more details will be available elsewhere (Khan, 199611. 3. Some Other Examples In this section, we consider a few other examples of the application of the general formalism of the scalar theory of charged-particle wave optics. The examples we shall treat briefly are the magnetic quadrupole lens, the axially symmetric electrostatic lens, and the electrostatic quadrupole lens (see Hawkes and Kasper, 1989a, b, for the practical utilizations of these lenses). The straight optic axes of these lenses are assumed to be in the z-direction. Let us briefly recapitulate the essential aspects of the general framework of the theory. We are interested in the study of the z-evolution of a quasimonochromatic quasiparaxial charged-particle b5am being transported through the lens system. The Hamiltonian Z, of the system, governing the z-evolution of the beam through the optical Schrodinger equation
can be written as
where p , is the magnitude of the design momentum corresponding to the mean kinetic energy with which a constituent particle of the quasimonoenergetic beam enters the systlem, from the field-free input region, in a path close to the + z direction, W,,is the hermitiy paraxial Hamiltonian [in general a quadratic expression in (r I ,fi W,,is the hermitian aberration (or perturbation) Hamiltonian [a polynomial of degree > 2 in (r I ,fiL)], and 2$Ao) is a sum of hermitian and antihermitian expressions with explicit A, dependence containing paraxial as well as nonpara$ai terms. In the geoemtrical optics limit (A, + 0)2$^0) vanishes, unlike W,, and Aqa, which tend to the corresonding classical expressions in this limit. From Eqs. (229) and (2301, we have that l@(z)> = @ z ,
Z,)l@(Z,)),
(231)
OPTICS OF CHARGED PARTICLES
317
with
where aexp( -(i/tL)lL dz( 1)) is the path-ordered exponential to be computed using Eqs. (73) and (74). When A , is a sum of r: , {r I .fi I +fi I -rI), and , as in the case of the examples we are considering, Up(z,z o ) may be computed “exactly,” in the same form as in Eq. (126) and as exactly as gp(z,z , ) and hp(z,z , ) can be tbtained, using the procedure outlined in Appendix F. The expression for $ ( z , z,) can be calculated up to the desired order of approximation consistent with the approgmation made in obtaining the nonparaxial and A,,-dependent parts of &”, in Eq. (230). Then, using Eq. (230, the behavior of the system can be understood by analyzing the average values of r l and p l at any final plane of interest, namely,
at
(235)
in terms of the state l@(zo)) at any desired initial plane. Wh-en .$z,zo) in Eq. (231) is approximated by the dominenat paraxial part, Up(z,zo), alone [see Eq. (23211 one gets the ideal, or the desirable, behavior of the system expected on the basis of the paraxial (or Gaussian) geometrical optics: in this case, the transfer map ((r I )(zo),((p I ) ( z o ) ) + ((r I Xz), ((p ) ( z ) ) , for any z-plane, is linear in ((r )(zo), ((p A )(zo)). Here, we shall treat briefly the magnetic quadrupole lens, the electrostatic round lens, and the electrostatic quadrupole lens according the scheme outlined above. In each case, we shall explicitly consider only the ideal behavior of the system in order to identify the essential characteristics of the system. The deviations from the ideal behavior leading to the various classical and nonclassical aberration effects as well as A,-dependent corrections to the paraxial optics can be studied exactly in the same
318
R. JAGANNATHAN AND S. A. KHAN
way as in the case of the magnetic round lens, which has been treated above the some detail.
a. Magnetic Quadmpole Lens. Let us consider the ideal magnetic quadrupole lens, with the optic axis along the z-direction, consisting of the field B = ( - Q ~ Y -, Q m x , o ) , constant in the lens region ( zI Iz Iz,), (236) Qm= 0 outside the lens region (2z,)
(
corresponding to the vector potential
Since there is no electric field in the lens region we can take +(r) Then, from Eq. (66) the optical Hamiltonian & is obtained as
& = -Po + A P + kio,a 0.
+/@*o),
=
0.
(238)
Since A,,, is independent of z , the exact expression for the unitary paraxial transfer operator can be immediately written down: with Az = (z - z,),
319
OPTICS OF CHARGED PARTICLES
analogous to Eq. (170) in the case of the round lens. The corresponding paraxial transfer map for (r I ,p I) becomes
TXL
cosh( @ w ) =
\ @ sinh(@
w)
I
o=(:
:),
1
-sinh( @ w )
@ cash(@ w ) 1
in( fiw )
cos( @ w )
~ , ( d ) =1 ( ~ d1 ) ,
K=-
(244) Po
It is readily seen from this map that the lens is divergent (convergent) in the xz-plane and convergent (divergent) in yz-plane when K > 0 ( K < 0). In other words, a line focus is produced by the quadrupole lens. In the weak field case, when w 2 4 1/IKI [note that K has the dimension of (length)-'] the lens can be considered as a thin lens with the focal lengths given by 1 1 -= -(245) f(X) f ( Y ) 3 -wK. Study of deviations from the ideal behavior [Eq. (244)l due to
A,,
and
2@*0) is straightforward using the scheme outlined above [Eqs. (231)-(235)1
and we shall not consider it here. In the field of electron optical technology, for particle energies in the range of tens or hundreds of kilovolts up to a few megavolts, quadrupole lenses are used, if at all, as components in abelration-correcting units for round lenses and in devices required to produce a line focus. Quadrupole lenses are strong focusing: their fields exert a force directly on the electrons, toward or away from the axis, whereas in round magnetic lenses, the focusing force is more indirect, arising from the coupling between B,
320
R. JAGANNATHAN AND S. A. KHAN
and the aximuthal component of the electron velocity. So it is mainly at higher energies, where round lenses are too weak, that the strong focusing quadrupole lenses are exploited to provide the principal focusing field (see Hawkes and Kasper, 1989a, b for more details). Magnetic quadrupole lenses are the main components in beam transport systems in particle accelerators [for details see, e.g., Month and Turner (1989) and the textbooks by Conte and MacKay (1990, and Wiedemann (1993, 1995) and references therein]. b. The Axially Symmetric Electrostatic Lens. An electrostatic round lens, with axis along the z-direction, consists of the electric field corresponding to the potential
inside the lens region ( z , s z 5 zr).Outside the lens + ( z ) = 0. Using this value of 4(r) in Eqs. (26) and (66), with A = (0, 0, O), the optical Hamiltonian of the lens takes the form,
OPTICS OF CHARGED PARTICLES
321
The unitary paraxial transfer operator l&(z,z,) can be obtained as outlined in Appendix F, in terms of minus the first term ( -pol which contributes only a multiplicative phase factor to the wavefunction. In this case, unlike the situation for the magnetic round lens, the coefficient of is seen to depend on 2. The calculation is straightforward and the paraxial transfer map reproduces the well-known classical results (see Hawkes and Kasper, 1989a). Here we have just demonstrated that @ can be brought to the general form, as required by Eq. (2301, for application of the scheme of calculation outlined above. It may be noted that we have assumed the lens potential +(r I ,z ) to vanish outside the lens region. In other words, we have considered the unipotential (einzel) lens having the same constant potential at both the object and the image side. There is no loss of generality in this assumption of our scheme, since the so-called immersion lens, with two different constant potentials at the object and the image sides, can also be treated using the same scheme simply by considering the right boundary (zr) of the lens to be removed to infinity and including the constant value of the potential on the image side in the definition of 4(r I ,2 ) .
A,,
c. The Electrostatic Quadrupole Lens. For the ideal electrostatic quadrupole lens with 1
44r) = z Q e ( x 2 - y 2 ) , constant in the lens region 0 outside the lens region
% = -Po
( 2 ,Iz Iz , ) ,
(z
z r ) ,
(252)
(253)
322
R. JAGANNATHAN AND S. A. KHAN
1
9=
E
+ moc2
,
qQ, l =-
CPO CPO and there ae! no A,-dependent terms-up to this approximation. Simply by comparing W,,in Eq. (253) with the W,,of the magnetic quadrupole lens [Eq. (23911 it is immediately seen that a thin electrostatic quadrupole lens, of thickness w = z, - z , , has focal lengths given by
1
-=---
1
I
wqQe(E + mot')
(257) CZP,2 Again, it is straightforward to study the deviations from the ideal behavior using the scheme outlined above. f(Z)
f‘Y’
111. SPINOR THEORY OF CHARGED-PARTICLE WAVEOPTICS
A. General Formalism: Systems with Straight Optic Axis
The developments of a formalism of spinor electron optics (Jagannathan et al., 1989; Jagannathan, 1990; Khan and Jagannathan, 1993) has been mainly due to a desire to understand electron optics entirely on the basis of the Dirac equation, the equation for electrons, since in the context of electron microscopy the approximation of the Dirac theory to the scalar Klein-Gordon theory seems to be well justified (Ferwerda et al., 1986a, b), under the conditions obtaining in present-day electron microscopes, and accelerator optics is almost completely based on classical electrodynamics (see e.g., Month and Turner, 1989; Conte and MacKay, 1991; Wiedemann 1993, 1995, and references therein). The algebraic structure of this spinor formalism of electron optics, built with a Foldy-Wouthuysen-like transformation technique, was later found (Khan and Jagannathan, 1994, 1995) to be useful in treating the scalar theory of charged-particle wave optics based on a Feshbach-Villars-like representation, as we have already seen in the earlier sections. Now, we shall present the essential details of the wave optics of the Dirac particles (spin- particles) in the case of systems with straight optic axis along the z-axis and demonstrate its application by considering the magnetic round lens and the magnetic quadrupole lens.
OPTICS OF CHARGED PARTICLES
323
We shall use the same notations as in the previous sections, for describing the optical system, the wavefunction (now, with four components), the Hamiltonian (now, a 4 X 4 matrix), etc., which will be clear from the context. Let us start with the time-dependent Dirac equation written in the dimensionless form
A
H,
=
p
+ 8,.,+ 8,, A
As is well known, in the nonrelativistic situation (In1+ mot), for any positive-energy the upper components are large compared to the lower components The even operator 8, does not couple q,, and q , and the odd operator g, couples them. Further, one has to note the algebraic relations
pgD = -gDp,
(262) Let us consider the optical system under study to be located between the planes z = zI and z = z,. Any positive-energy spinor wavefunction obeying Eq. (258) and representing an almost paraxial quasimonoenergetic Dirac particle beam being transported through the system in the +z-direction would be of the form
P
=
A p (Po,
lpl,
E(p) =
p&D = & D p .
+ d n , IPII ' P
P
=
(P, , P ,
=
+Fz)
(264)
324
R. JAGANNATHAN AND S. A. KHAN
p p ( l a + ( P ) 1 2+ lU-(P)12)
=
(265)
1,
where {u *(p) exp[(i/hXp * r - E(p)t)]) are the standard positive-energy free-particle plane-wave spinors (see, e.g., Bjorken and Drell, 1964). We are interested in relating the scattering state wavefunction 9 ( r I ,z; t ) at different planes along the z-axis. To this end, we shall assume the relation
+(r I ,z(2); p )
=
ck ld2r(1)(ry)l.$k(z(2),
z ( l ) ;p)Ir(l))+k(r(j), z(l);p
j,k
=
),
1 , 2 , 3 , 4 , (266)
for +(r I ,z ; p ) , such that we have
I*(
z(*), t ) )
=
in the paraxial case ( Ap
z ( ~ )z('); , po)l*( z(');t ) ) ,
2:
0). (267)
It is obvious that the desired z-propagator z('); pol, corresponding to p o t the mean value of p for the beam, is to be gotten by integrating for z-evolution the time-independent equation q4J
QL.+.
P-m,cz+-a,
m0c
(
ih-
m0c d
az
)I
+qA, $ ( r l , z ; p o ) = 0,
obtained for Wr, t ) = +(r I ,z ; pol expi - ( i / h ) E ( p o ) t ) . Now, multiplying Eq. (268) by mOca,/pOthroughout from the left and rearranging the
325
OPTICS OF CHARGED PARTICLES
terms we get
where E is the kinetic energy of the beam particle entering the system from the field-free input region [i.e., moc2+ E = E(po)l. Noting that, with I as the 4 X 4 identity matrix, 1 1 - ( I + X % ) P X % ( I - xa,) = P , $ 1 + x % ) ( l - xu,) = 2 (272) 1 9
let us define a transformation *-b*'=
M
M+,
1
=
-(Z
a
Then, V ,I satisfies a Dirac-like equation ih, d+' - (MAM- 1) +' 27T dz
=
+ xa,).
(273)
A'+',
(274)
7)44
.+
1 - -@,, Po 1 I - ,moq4JXcYz,
E
+ mOc2
8 =- p
CPO
A
1 8 = -xa Po
Po
(277)
7)= CPO
For a monoenergetic quasiparaxial beam, with IpI = p o , p z > 0, and pz = p,,, entering the optical system from the field-free input region, +' has its upper pair of components large compared to the lower pair of components as can be v:rifiFd using the form of i+h(! ,z 5 z , ) given in Eqs. !264) aFd (265). In H', 8 is an even operator, B is an odd operator, PU = -BP, and Pg = &3, Now we can apply the Foldy-Wouthuysen transformation
326
R. JAGANNATHAN AND S. A. KHAN
technique to reduce the strength of the odd operator 8^ to any desired level taking l/p, as the expansion parameter. The first transformation
leads to the result
= @- -pg2 - 21
= - : p2 i
81
[[ 8,
[@,@I A
ih, ( d g ) ) ] 1 ++ -g4, (282) 21r d z 8
A
2($))
[2,@] +
1 ,
- -83. 3
(283)
There are a few technical points to note here. The Hamiltonian fi' is not hermitian: this is related to the fact that Cg_ ld2rl+$r I ,z)I2 need not be conserved along the z-axis. The transformation in Eq. (279) is not unitary. The equations (279)-(283) can be written down from the corresponding equations of Appendix B [(B6) and (BlO)] simply by using the analogy t + z , moc2 + -1, and h + hO/27r, which follows from a comparison of Eqs. (274) and (275) with Eqs. (Bl) and (B2). It may also be noted that having the equations in dimensionless form is helpFI for symbolic algebraic manipulations in the above calculations. Now, contains only higher power of l/po compared to @. The second transformation,
leads to the result
327
OPTICS OF CHARGED PARTICLES
with g2containing only higher powers of l/p, compared to another such transformation,
2,. After
we have
fi(3) =
+ g3+ g3,
-p
(290)
g3= gl(c + g2,g+ g2), g3= g1(2+ &,&
--$
g2),
(291)
with g3containing only hig$er powers of l/po compared to at this stage and omitting 83,we can write
ih, -272
go)= 2 - -pg2 1 2
+ - 1p
fii(3)+(3),
dz -
[(
1 8, A
8
( ,. [ ..
g2.Stopping
“)I]
”,g] + ih,( A
2a
+ in,(
dz
6 4 + [8,8]- dg)]’) A
2a
8
dz
(293)
It can be shown that the above transformations make the lower composuccessively smaller and smaller compared to the upper nents of components for a quasiparaxial beam moving in the +z-direction. In other words, one can write
+
p+‘3’
Now, tis) is found to be of the form
+‘3’.
(294)
328
R. JAGANNATHAN AND S. A. KHAN
Taking into account Eq. (2941, we can approximate Eq. (289) further, getting
To enable physical interpretation directly in terms of the familiar Dirac wavefunction let us return to the original Dirac representation by retracing the transformations: .
+(3)
+
.
*
*
+ = M-le-iSle-iS2e-i.f3
(3)
~-1~-i5$,(3) 9
(297)
1
il+ i2+ i3- ?([i1,i2] + [i1,i3] + [i2,i3]) 1
--“i1,i2],i3] 4 . *.*
Implementing this inverse transformation in Eq. (2961, with calculation done up to the desired level of accuracy in terms of l/p, (here, up to l/pi), and, finally multiplying throughout by p o we get
a ih ~ I J ~ ( Z )=%I+(z)), ) e-i.Gp,Aei4
-i h - i y -
ei9)
az
(299)
)M
The resulting optical Hamiltonian of the Dirac particle will have the form
% = -Po +
+ l171,a
+&(*a)
+2@*o*q
(301)
where A,,,, A,,,, and are scalar terms ( - I ) and % ( * o * ~ ) is a 4 x 4 matrix term which also vanishes in the limit A, + 0, like Now, the performance of the optical system under study, corresponding to the assumed values of the potentials +(r) and Ah), can be calculated using the
329
OPTICS OF CHARGED PARTICLES
same scheme $s in Eqs. $231)-(235); the matrix term 2$Ao* can also be clubbed with W,,and q ( * O ) and treatingysing the interaction picture. It is found that the optical Hamiltonians W0)in the Klein-Gordon theory and the Dirac theory do not differ in their 'classical'parts (Ao,, + A,,). Thus the Klein-Gordon t4eory without the term &(Ao) and the Dirac theory without the terms and 2 $ A o v u ) are identical, effectively, as seen below. Note that for an observable 0 of the Dirac particle, with the corresponding hermitian operator 6 given in a 4 x 4 matrix form, the expectation value is defined by (O)(Z) =
-
( @( 2 ) l a @( 2))
(@(z)l@( 2))
l d 2 r $ * ( r l 7 ~ ) 6 j k @ k ( r l9 2 ) c;=,/ d 2 r * * ( r * , Z ) * ( ' L , z )
c;,k=l
Hence, the map ( 0 ) ( z o ) 4 ( O ) ( z )becomes ( O ) (z )
'
(302)
330
R. JAGANNATHAN AND S. A. KHAN
-
When the terms and are dropped from the Dirac optical Hamiltonian it becomes I and the corresponding transfer operator also becomes I with respect to the spinor index: i.e.,$Jz, z,) = $2, z,)Sj,. Then, although all four components of (+,, t,b2, &, +,,I contribute to the averages of r I ,p I ,etc., as seen from the above definitions, one can think of them as due to a single component effectively, since the contributions from the four components cannot be identified individually in the final results. Thus, in this case, there would be no difference between the “classical” transfer map for ((c I ) ( z ) ,( p I ) ( z ) )[Eqs. (205)-(208)] and the corresponding transfer map in the Dirac theory. In this sense, the Dirac theory and the Klein-Gordon theory are identical scalar theories when A,-dependent terms are ignored in the Klein-Gordon theory and A,-dependent scalar and matrix terms are ignored in the Dirac theory. We shall consider below, very briefly, a few specific examples of the above formalism of the Dirac theory of charged-particle wave optics.
-
+ +,
B. Applications
1. Free Propagation: Difiaction
For a monoenergetic quasiparaxial Dirac beam propagating in free space along the +z-direction Eq. (274) reads
with
(p o 9 ’ y = ( p i - J q z . Thus, p o d f = -po p + xa I -fi I can be identified with the classical optical Hamiltonian for free propagation of a monoenergetic quasiparaxial beam, with the square root taken in the Dirac way. Although in the present case it may look as if one can take such a square root using only the three 2 X 2 Pauli mmatrices, it is necessary to use the 4 X 4 Dirac matrices in order to take into account the two-component spin and the propagations in the forward and backward directions along the z-axis considered separately. It can be verified that for the paraxial planewave solutions of Eq. (306) corresponding to forward propagation in the + z direction, with p , > 0 and Ip I I 4 pr = p o t the upper pair of components are large compared to the lower pair of components, analogous to
d m ,
33 1
OPTICS OF CHARGED PARTICLES
the nonrelativistic positive-energy solutions of the free-particle Dirac equation. In the same way as the free-particle Dirac Hamiltonian can be diagonalized by a Foldy-Wouthuysen transformation (see Appendix B) the odd part in fit can be completely removed by a transformation: with
we have
1
=
--(I&E)P.
(310)
Po
Now, invoking the fact that JI" will have lower components very small compared to the upper components in the quasiparaxial situation, we can write iho a+"
21T dz
Po
Then, making the inverse transformation
+ = M-leBxal'iLe
$
(312)
9
Eq. (311) becomes
2 -( d m ) =
1
3
-Po
+ @-2Pll ;
1
+ 8Po
+
'.*
9
(314)
332
R. JAGANNATHAN AND S. A. KHAN
+
exactly as in the scalar case [see Eq. (8811 except for the fact that now has four components. Then it is obvious that the diffraction pattern due to a quasiparaxial Dirac-particle beam will be the superposition of the patterns due to the four individual components (JI,, J12, J13, +J of the spinor representing the beam: for a highly paraxial beam the intensity distribution of the diffraction pattern at the xy-plane at z will be given by [see Eq. (9511
+
where the plane of the diffra$ting object is at z,. It is clear th:t when the presence of a fieldAmakes &", acquire a matrix component ~ ( A o O . u )the l transfer operator f l z , z , ) would have a nontrivial matrix structure leading to interference between the diffracted amplitudes (+,, +b2, +3, JIJ. When the monoenergetic beam is not sufficiently paraxial to allow the approximations made above one can directly use the free z-evolution equation
obtained by setting Eq. (3161, we have
I+(.)>
=
C#J =
0 and A
=
(0, 0,O) in Eqs. (26914271). Integrating
il:
exp - Az(poPxa, + i(ZXbY- zybx)))l+(zo)L
AZ = (Z - z,),
(318)
the general law of propagation of the free Dirac wavefunction in the +z-direction, showing the subtle way in which the Dirac equation mixes up the spinor components [for some detailed studies on the optics of general free Dirac waves, in particular, diffraction, see Rubinowicz (1934, 1957, 1963,1965), Durand (19531, and Phan-Van-Loc (1953,1954,1955,1958a, b, 196011.
OpfICS OF CHARGED PARTICLES
333
2. The Axial& Symmetric Magnetic Lens
In this case, following the procedure of obtaining & as outlined above, we get A
‘% = -Po + H 0 , p A
+
Ao,a
+$(AD)
+ 2 ( A D ’ d ,
(319)
335
OPTICS OF CHARGED PARTICLES
+
@oAo a”( 2 ) I iP0 A: a( 2 ) a’”(2 ) 3 2 ~ 64r2
( --
Po A: + ,xPaz( 64P
2 a ’ ( $r:
1 2
- -a’( 2 ) Q”( 2)r:
i
-
Comparing with the scalar case it is seen that the difference in the scalar part ( I ) lies only in the A,-dependent term. Thus as already noted, even the scalar approximation of the Dirac theory is, in principle, different from the Klein-Gordon theory, although it is only a slight difference exhibited in the A,-dependent terms. The matrix part in $ in the Dirac theory, ( * o S u ) , adds to the deviation from the Klein-Gordon theory. Without further ado, let us just note that the position aberration ( 8 r 1 2 ( 3 J ~ 0 ) gets additional contributions of every type from the matrix part g(Ao.u). For example, the additional spherical aberration type of contribution is
&
where h is the “classical” h,(z, zo). Obviously, such a contribution, with unequal weights for the four spinor components, would depend on the nature of I+I(z,)) with respect to spin.
336
R. JAGANNATHAN AND S. A. KHAN
3. The Magnetic Quadrupole Lens
Now, for the ideal magnetic quadrupole lens,
is different Again, it is seen that, the &-dependent scalar term, R(Ao) from the corresponding one in the Klein-Gordon theory.
IV. CONCLUDING REMARKS In fine, we have reviewed the quantum mechanics of charged-particle beam transport through optical systems with a straight optic axis at the single-particle level. To this end, we have used an algebraic approach which molds the wave equation into a form suitable for treating quasimonoenergetic quasiparaxial beams propagating in the forward direction along the axis of the system. We have considered both the Klein-Gordon theory and the Dirac theory with examples. In particular, we have dealt with the magnetic round lens and the magnetic quadrupole lens in some detail. It is found that in the treatment of any system a scalar approxima-
OPTICS OF CHARGED PARTICLES
337
tion of the Dirac spinor theory would differ from the Klein-Gordon theory, but with the difference being only in terms proportional to powers of the de Broglie wavelength such that in practical electron optical devices there is no significant difference between the two treatments. The spin-dependent contributions in the Dirac theory are also found to be proportional only to powers of the de Broglie wavelength. So the contributions to the optics from such terms dependent on the de Broglie wavelength and spin could be expected to be visible only at very low energies. This vindicates the conclusion of Ferwerda et al., (1986a, b) that the reduction of the Dirac theory to the Klein-Gordon theory is justified in electron microscopy. Perhaps the extra contributions of the Dirac theory could be relevant for low-energy electron microscopy (LEEM) where the electron energies are only in the range 1-100 eV (see Bauer, 1994, for a review of LEEM). Regarding some other approaches to the quantum mechanics of charged-particle optics, we note the following: a path integral approach to the spinor electron optics has been proposed (Liiiares, 1993); a formal scalar quantum theory of charged-particle optics has also been developed with a Schrodinger-like basic equation in which the beam emittance plays the role of h (Dattoli et al., 1993). In the context of probing the small differences between the KleinGordon and Dirac theories, another aspect that should perhaps be taken into account is the question of proper definition of the position operator in relativistic quantum theory related to the problem of localization (Newton and Wigner, 1949). It should be interesting to study the transfer maps using the various proposals for the position operators for the spin-0 and spin- particles (e.g., see Barut and Rqczka, 1986). Throughout the discussion we have kept in mind only the application of charged-particle beams in the low-energy region compared to accelerator physics. However, the frameworks of the scalar and spinor theories described above are applicable to accelerator optics as well. In particular, the formalism we have discussed should be well suited for studying the quantum mechanical features of accelerator optics, since its structure has been adapted to handling beam propagation problems (for a quantum mechanical analysis of low-energy beam transport using the nonrelativistic Schrodinger equation see Conte and Pusterla, 1990). Also, as is well known, in accelerator optics the spin dynamics of beam particles is traditionally dealt with using the semiclassical Thomas-BargmannMichel-Telegdi equation (see, e.g., Montague, 1984). As has been shown by Ternov (19901, it is possible to derive this traditional approach to spin dynamics from the Dirac equation and also to get a quantum generalization of it. It should be worthwhile to study spin dynamics using the beam optical representation of the Dirac theory described above.
338
R. JAGANNATHAN AND S. A. KHAN
An important omission in the discussion is the study of systems with a curved optic axis such as bending magnets, which are essential components of charged-particle beam devices (see Hawkes and Kasper, 1989b, Part X). In these cases, the coordinate system used will have to be naturally the one adapted to the geometry, or the classical design orbit, of the system. Then in the scalar theory one has to start with the Klein-Gordon equation written in the suitably chosen curvilinear coordinate system and the two-component form of the wavefunction will have to be introduced in such a way that one component describes the beam propagating in the forward direction along the curved optic axis and the other component describes the beam moving in the backward direction. Starting with such a two-component representation one can follow exactly the same approach as above using the Foldy-Wouthuysen technique, to filter out the needed equation for the forward-propagating beam. The rest of the analysis will follow the same scheme of calculations as described above. Similarly, for the Dirac theory we can start with the Dirac equation written using the chosen set of curvilinear coordinates following the method of construction of the Dirac equation in a generally covariant form (see, e.g., Brill and Wheeler, 1957). Then the treatment of the given system follows in the same way, via the Foldy-Wouthuysen transformations, as discussed above (for some preliminary work along these lines, see Jagannathan, 1990). There are also other important omissions from our account of the quantum mechanics of particle optics: coherence, holography,. . . . For such matters we refer the reader to Hawkes and Kasper (1994). Any physical system is a quantum system. If it exhibits classical behavior, it should be understandable as the result of an approximation of a suitably formulated quantum theory. We have seen that the classical mechanics of charged-particle optics, or the geometrical charged-particle optics, follows from identifying, Ci. la Ehrenfest, the quantum expectation values of observables, like r I ,p I , and polynomials in (r I ,p I), with the corresponding classical ray variables. The quantum corrections to the classical theory, at the lowest level of approximation, leaving out the effects depending on the de Broglie wavelength and spin (if # 01, arise from the dependence of the aberrations on not only the quantum averages of r I and p I but also the higher order central moments of polynomials in (r I ,p I1. This implies, for example, that the off-axis aberrations, considered to vanish for an object point on the axis according to the classical theory, would not vanish, strictly speaking, due to the quantum corrections. Another way in which the classical theory can be recovered from the quantum theory is to describe the action of the transfer operator on the quantum operators, in the Heisenberg picture, in the classical language using the correspondence
OPTICS OF CHARGED PARTICLES
339
principle by which we make the replacements h + 0, the quantum operators + the classical observables, the commutator brackets ( ( l / i h ) [ A ,i l l + the classical Poisson brackets ((A, B)). Then, the formalism described tends to the Lie algebraic approach to the geometrical charged-particle optics pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et a!., 1990; Ryne and Draft, 1991; see also Forest et al., 1989; Forest and Hirata, 1992). In the context of understanding the classical theory of charged-particle optics on the basis of the quantum theory, it should also be mentioned that a phase-space approach to the quantum theory of charged-particle optics, using the Wigner function, may prove useful. Use of the Wigner function in the scalar theory of paraxial electron optics has been found (Jagannathan and Khan, 1995) to have attractive features (see also Castafio, 1988, 1989; Castafio et al., 1991; Polo et al., 1992; Hawkes and Kasper, 1994, Chapter 78 and references therein). In this connection, it may also be noted that the Wigner function can be extended to the relativistic case in a natural gauge-covariant way using an operator formalism and such an approach admits a straightforward second quantization leading directly to a many-body theory (Elze and Heinz, 1989). It should be worthwhile to see how such an approach can be used in the quantum theory of charged-particle beam optics so that one can take into account the many-body effects also.
APPENDIX A. The Feshbach-Villars Form of the Klein-Gordon Equation
The method we have followed to cast the time-independent Klein-Gordon equation into a beam optical form linear in d / d z , suitable for a systematic study, through successive approximations, using the Foldy-Wouthuysenlike transformation technique borrowed from the Dirac theory, is similar to the way the time-dependent Klein-Gordon equation is transformed (Feshbach and Villars, 1958) to the Schrodinger form, containing only a first-order time derivative, in order to study its nonrelativistic limit using the Foldy-Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). Defining d
a=-*, dt
340
R. JAGANNATHAN AND S. A. KHAN
the free particle Klein-Gordon equation is written as
Introducing the linear combinations ih
iii
the Klein-Gordon equation is seen to be equivalent to a pair of coupled differential equations:
Equation (A41 can be written in a two-component language as
with the Feshbach-Villars Hamiltonian for the free particle, bY
kow,given
s2u, + '-ay. s2 +-
= moc2uz
2m0
2m0
For a free nonrelativistic particle with kinetic energy 4 moc2 it is seen that is large compared to Y -. In presence of an electromagnetic field, the interaction is introduced through the minimal coupling
*+
OPTICS OF CHARGED PARTICLES
341
The corresponding Feshbach-Villars form of the Klein-Gordon equation becomes
I?'"
=
moc2uz+ 2 + 2,
As in the free-particle case, in the nonrelativistic situation 1I'+ is large compared to 1I'-. The even term 2f does not couple 1I'+ and 1I'- whereas and 1I'-. Starting from Eq. (A@, the nonrelativistic limit of the Klein-Gordon equation, with various correction terms, can be understood using the Foldy-Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). It is clear that we have just adopted the above technique for studying the z-evolution of the Klein-Gordon wavefunction of a charged-particle beam in an optical system comprising a static electromagnetic field. The additional feature of our formalism is the extra approximation of dropping a- in an intermediate stage to take into account the fact that we are interested only in the forward-propagating beam along the z-direction.
2 is odd, which couples 1I'+
B. The F o e - WouthuysenRepresentation of the Dirac Equation The main framework of the formalism of charged-particle wave optics, used here for both the scalar theory and the spinor theory, is based on the transformation technique of the Foldy-Wouthuysen theory which casts the Dirac equation in a form displaying the different interaction terms between the Dirac particle and an applied electromagnetic field in a nonrelativistic and easily interpretable form (Foldy and Wouthuysen, 1950; see also Pryce, 1948; Tani, 1951; see Acharya and Sudarshan, 1960, for a
342
R. JAGANNATHAN AND S . A. KHAN
general discussion of the role of Foldy-Wouthuysen-type transformations in particle interpretation of relativistic wave equations). In the FoldyWouthuysen theory the Dirac equation is decoupled through a canonical transformation into two two-component equations: one reduces to the Pauli equation in the nonrelativistic limit and the other describes the negative-energy states. Analogously, in the optical formalism the aim has been to filter out from the nonrelativistic Schrodinger equation, or the Klein-Gordon equation, or the Dirac equation, the part which describes the evolution of the charged-particle beam along the axis of an optical system comprising a stationary electromagnetic field, using the FoldyWouthuysen technique. Let us describe here briefly the standard Foldy-Wouthuysen theory so that the way it has been adopted for the purposes of the above studies in charged-particle wave optics will be clear. The Dirac equation in presence of an electromagnetic field is
=
moc*p + 2 + g,
(B2)
with Z? = qc$ and = c a 3. In the nonrelativistic situation the upper pair of components of the Dirac spinor q are large compared to the lower pair of components. The operator k? which does not couple the large and small components of q is called even and 2 is called an odd operator which couples the large to the small components. Note that
p&=
-&,
pk? = Z?p.
033)
Now the search is for a unitary transformation, q + the equation for V does not contain any odd operator. In the free particle case (with 4 = 0 and 3 = Wouthuysen transformation is given by
=
OW,such that
a> such
9+
=
a Foldy-
OPTICS OF CHARGED PARTICLES
343
This transformation eliminates the odd part completely from the free-particle Dirac Hamiltonian reducing it to the diagonal form: jh
-*‘ d
at
= eif(moc$
+
fi)e-ifq’
In the general case, when the electron is in a time-dependent electromagnetic field it is not possible to construct an exp(i$) which removes the odd operators from the transformed Hamiltonian completely. Therefore, one has to be content with a nonrelativistic expansion of the transformed Hamiltonian in a power series in l/m,c2 keeping terms through any desired order. Note that in the nonrelativistic case, when IpI 4 m0c2 th? transformation operator fiF = exp(iS^) with s^ = -iP&/2moc2, where @ = c a fi is the odd part of the free Hamiltonian. So in the general case we can start with the transformation
-
Then, the equation for
W1) is
344
R. JAGANNATHAN AND
S. A.
KHAN
where we have used the identity
Now, using Eq. (98) and the identity “ a 1 -( e - i ( l ) ) = (1 + A ( t ) + - i ( t ) z dt 2!
x =
(
1
x
-( d
at
1
1
1
-A(t) + -A(ty 2!
-
1 -.) -A(t), 3!
1
1
+A(t)+ -A(t)z + -A(t), 2! 3! dA(t) --
dt
1 3!
--(
1 +2!
1
+ -3!~ < t ) 3
{
dA(t) -A(t)
*
dt
***
1 dmd t
A
+A(t)-
dA(t) dA(t) ---a(t)2 + A(t)-A(t) dt at
1
A
”””)
+A(t )z - ...)
3
-[ A(t),-4 dA(
-a&) - 1 at 2! 1
--
3!
&),
- -1 &), 4! with
A = i$,
we find
at
t)
[A(t),%I] [A W ,[AO),*I] .
(B8)
OPTICS OF CHARGED PARTICLES
345
Substituting in Eq. (B9), AD = moc2/3+ & + d,simplifying the right-hand side using the relations P& = -& and P& = @, and collecting everything together, we have
f i g ) = moc2P + &,
+ dl,
with $?,and 2, obeying the relations /3d, = -dl P and P&, = g1/3 exactly like & and d.It is seen that while the term d in I-?, is of orper zero with respect to the expyxion paramfter l/moc2 [i.e., U = O ( ( l / m , ~ ~ ) ~the ) ] odd part of H g ) , namely U,, tontains only terms of order l/moc2 and higher powers of l/moc2 [i.e., Hl = O((l/moc2))1. To reduce the strength of the odd terms further in the transformed Hamiltonian a second Foldy-Wouthuysen transformation is applied with the same prescription: .\1r(2) = e&.\Ir(l)
%=--
i
&,
After this transformation,
a
ifc -*(a at
= f i WD. \ I r ( 2 ) ,
f i g ) = moc2P + g2+ i2,
where, now, g2= O((l/moc2)2). After the third transformation
A
A
A
g3= g2= 8,,
where
A
@3
”’)
(
= - [g2,g2] + i h - at 2m,c2
(B14)
’
S3= O ( ( l / m , ~ ~ ) ~So) .neglecting g3, 1
fig) = m,c$ + ,i+? -pb’ 2m,c2 -
[ (
a81
-8,[ b,4 + ih 1
8m2,c4
at
ItAm$ybe noted that starting with the second transformation successive ( g ,d)pairs can be obtained recursively using the rule
and retai$ng only the r$evant terms of desired order at each step. With B = qc#J and d = c a - 6,the final reduced Hamiltonian [Eq. (B15)I is, to the order calculated,
--
”’
8rn;c’
divE
with the individual terms having direct physical interpretations. The terms in the first parentheses result from the expansion of showing the effect of the relativistic mass increase. The second and third terms are the electrostatic and magnetic dipole energies. The next two terms, taken together (for hermiticity), contain the spin-orbit interaction.
-4
347
OPTICS OF CHARGED PARTICLES
The last term, the so-called Darwin term, is attributed to the zitterbewegung (trembling motion) of the Dirac particle: because of the rapid coordinate fluctuations over distances of the order of the Compton wavelength (2.rrh/m0c) the particle sees a somewhat smeared-out electrical potential. It is clear that the Foldy-Wouthuysen transformation technique expands the Dirac Hamiltonian as a power series in the parameter l/moc2 enabling the use of a systematic approximation procedure for studying the deviations from the nonrelativistic situation. Noting the analogy between the nonrelativistic particle dynamics and the paraxial optics, the idea of the Foldy-Wouthuysen form of the Dirac theory has been adopted to study the paraxial optics and deviations from it by first casting the relevant wave equation in a beam optical form resembling exactly the Dirac equation [Eqs. (Bl)-(B2)] in all respects [i.e., a multicomponent 1J' having the upper half of its components large compa!ed to the lower 5omponents and the Hamiltonian having an even part (g),an odd part (a),a suitable expansion parameter characterizing the dominant forward propagation and a leading term with a /Mike coefficient commuting with i? and anticommuting with g].The additional feature of our formalism is to return finally to the original representation after making an extra approximation, dropping p from the final reduced optical Hamiltonian, taking into account the fact that we are interested only in the forward-propagating beam.
C. The Magnus Formula The Magnus formula is the continuous analogue of the famous Baker-Campbell-Hausdorff (BCH) formula
aeLi
=
,a+ Li + l/Z[ a. Li]+
1/12([
a,[ A,i l l + [ [ a, Li],B])+.
' '
(C1)
Let it be required to solve the differential equation d -dut ( t )
=i(t)u(t)
(C2)
to get u ( T ) at T > to, given the value of u(to).For an infinitesimal A t , we can write ~ ( t , , ~ t = )eA~a(~o)u(t,). (C3) Iterating this solution we have u( to + 2 A t ) = e A f ~ ( f ~ + A l ) e A l ~t ( f ~ ) ~ ( 01, U ( t o+ 3 A t ) = e A l a ( ~ , + 2 A ~ ) e A l a ( l , + A ' ) e A ~ a ( ' o ) u( t o ) , and so on. (C4)
+
348
R. JAGANNATHAN AND S. A. KHAN
If T = to + N A t we would have
Thus, u ( T ) is given by computing the product in Eq. (C5) using successively the BCH formula [Eq. (Cl)] and considering the limit At 4 0, N + 03 such that N A t = T - to. The resulting expression is the Magnus formula (Magnus, 1954): u( T ) =
*
T , t,)u( t o )
9
To see how Eq. (C?) is obtained let us substitute the assumed foTm of the solution, u ( t ) = S t , to)u(to),in Eq. (C2). Then it seems that St,t o ) obeys the equation
a ,4t,to)
=i2 + T46(r
PI3 +
T28(r
PI4* ( 109)
By mathematical procedures similar to those shown in Section 111, i.e., substituting Eqs. (102)-(109) into Eqs. (41)-(44), we can derive a series of equations for calculating both Tiand S i j (i, j = 2,4,6,8).
390
JIYE XIMEN
A. Normalized Fourth-OrderHamiltonian Function in Terms of T40
9
T22 9 s40
9
s22
Obviously, Ta and S4, exactly coincide with T4 and S4 as presented in Section II1,A. Moreover, we can derive dimensionless H , from Eqs. (7) and (8): = Pr2
H,
p
=
+ Qp2,
1 1 -k3b3 -kb” 2 8 ’
Q =
1 -kb. 2
(110)
Consequently, one can derive T, and S,, by the following equation:
H22 -
1 1 T2, = ?kbrp - -kb’r2, 4
S,
1 + -kb” 8
= k3b3
(112)
B. Normalized Sixth-OrderHamiltonian Function in Terms of T60 9 T42
9
T24
s60
s42
9
s24
Obviously, Tm and S,, exactly coincide with T6 and Section 1II.B.
aT24 aT24 -
dr
dp
dH, dr
dH2 ap
-
s 6
as presented in
391
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
C. Normalized Eighth-OrderHamiltonian Function in Terms of T80 9 T62
7
T44
7
T26
7
'80
7
'62
7
s44
7
s26
Obviously, T,, and S,, exactly coixide with T, and S, as presented in Section II1,C. H62
- [ T 2 2 9 H601 - iT407 H 4 2 1 -
-
dr
dp
dH2 -
dH2 -
ar
aT,
-dz
H221 - [ q 2 9 H 4 ~ I
= S,r4,
dp aT26 aT26 -
ar dH2 dr
ap dH2 ap
a
- - =T26
dz
S2,r2.
D. Normalized Tenth-Order Hamiltonian Function in Terms of T I O O , T 8 2 , T647 T467 T28, '1007
'827 '649 '467 ' 2 ,
Obviously, TI,, and S,,, exactly coincide with T I , and S,, as presented in Section II1,D.
- [T22, -[T80,
- [T407 H221
- [T629
H621 H401
- iT607 H4,1
-
[T429 H601
392
aT64 aT64 -
ar aH2 ar
ap aH2 ap
- - aT, =
dz
s64r6,
(119)
aT46 aT46 -
ar
aH2 dr
ap dH2 dp
a T46 az
- s46r4,
In summary, we may classify generalized integration transformation into five groups: (i) TZ, S22;TZ4,S24; T26, s26; T28, satisfy the integration transformation similar to that shown in Eqs. (111)-(112). (ii) Tm,S40; T42, S42;T44,S44;T46,s 4 6 satisfy the similar integration transformation as shown in Section III,A. (iii) Tho, s60; T62, S62; Ta4,S,, satisfy the integration transformation similar to that shown in Section II1,B. (iv) TEo,SEo;T82, s 8 2 satisfy the integration transformation similar to that shown in Section III,C. (v) Tlo0,S,,, satisfy exactly the same integration transformation for TI,, S,, as shown in Section III,D. Therefore, in principle, one can calculate both intrinsic and combined aberrations in up to the ninth-order approximation, including isotropic aberrations containing the zero or even power of the product (r x p), and anisotropic aberrations containing the odd power of the product (r X p).
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
393
VI. EIKONAL INTEGRATION TRANSFORMATION IN GLASER’S BELL-SHAPED MAGNETIC FIELD In Glaser’s bell-shaped magnetic field (Glaser, 1952; Ximen 1983, 1986; Hawkes and Kasper, 19891, the axial distribution of magnetic induction can be expressed by an analytical formula: B(z) =
BO
1
+ (2/a)2 ’
where Bo is the maximum value of the axial magnetic induction at the center of the lens ( 2 = 01, a is the half-width of the magnetic field. Glaser’s bell-shaped magnetic field is a very important theoretical model, because not only can its Gaussian trajectory equation be solved analytically, but also its primary-third-order aberrations can be exactly expressed by analytical formulae (Glaser, 1952). Based on the results provided in the present study, one can confidently conclude that higher and ultrahigh-order aberrations in Glaser’s bell-shaped magnetic field can also be completely expressed in analytical formulae. In fact, by using the convention of dimensionless notations, i.e., Eq. (16)’ the dimensionless axial distribution of magnetic induction b ( z ) and its derivatives can be derived as follows: 1
I
b(2) = 1 +z2’ 22
b ’ ( z )= (1
+ z2)2 ’
394
JIYE XIMEN
b‘6’(z )
4 6 0 8 0 ~ ~ 5 7 6 0 0 ~ ~ 172802’
=
(1 bC”( z )
-
+ z2)7
(1
+
6 ’
6 4 5 1 2 0 ~ ~ 9676802’
= -
+ zz))”
(1 b@’(z )
=
+
(1
(1
z2)
+ z2)7
-
720
-
+ 2)’
(1
403200~~ (1
+z2)
6 +
+
z2)4’
403202 (1
+ z2)5
’
10321920~~ 18063360~~ 9 6 7 6 8 0 0 ~ ~
-
(1
-
1612800~’
(1
+
(1
+z2)9
+
+z2)6
+
Z’f
+ 2)’
(1
40320
(1
+z2)5’
In a bell-shaped magnetic field, the Gaussian trajectory equation can be derived from Eq. (35): r’ = p ,
p’
rff+
k’
-k2b2r,
=
(1 + z ’ )
’ r = 0,
where k’ is a dimensionless lens-strength parameter (Glaser, 1952). Substituting z
=
cp = arcctg z,
ctg cp,
z,
=
ctg Q,,
cp, =
arcctg z , (125)
and defining w = (1
+ k’)l’*
one can obtain two particular solutions of Eq. (124): rJz) rs( z )
=
sin cp,
1
rL(z) = rb( z )
=
w
sin Q,
sin cp,[
w
cos P,
+-[--0cos 0
=
-
1 sin w(cp w sin Q, sin Q
sin cp
cos w ( cp
p,)
Q,) 9
( cp - q,) +-cos pa sin wsin cp
9
0
[ - w cos w ( cp - (p,) sin cp + sin w ( cp - 9,) cos cp] ,
sin w ( cp - 9,) sin cp w ( cp
+ cos o(Q - cp,)
- pa) sin cp
cos ‘91
+ sin w( cp - q,) cos c p ] . (127)
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
395
Obviously, particular solutions r&), rs ( z ) satisfy initial conditions Eq. (77) and Wronsky determinant Eq. (78). By using Eq. (761, the first-order trajectory and momentum can be expressed as follows: rg = rsra + rapa,
p g = rhr,
+ rLpa.
( 128)
It is to be noted that, substituting Eq. (123) into Eqs. (491, (571, (65), and (741, we obtain field distribution functions S,, s6, s8, S,, in correspondingorder normalized eikonals.
s4 =
k’ 24(1
k’ s6 =
1440( 1
+ z’)
+ 2’) ( -201
4(1
+ 8k2 + 7z2),
+ 740k’ + 288k4
+3510z2 - 2769k2z2- 2769z4), kZ s8 =
40320(1
+ z’)
( -9018
(130)
+ 23478k’ + 17636k4+ 5760k6
+252396z2 - 220449k’~’ - 77812k4z2
+
+
- 5 4 5 6 3 4 ~ ~ 174819k2z4 160632z6), (131)
s,,
k’ =
3628800( 1 + z ’ ) l 0 X(
-968895
+ 2195451k’ + 2303748k4 + 1441304k6+ 403200k’
+41960700z2 - 37379520k2z2- 23633514k4z2- 7093928k6z2
+
+ 22493010k4z4
- 174877650~~76761495k2z4
+ 147583980~~ - 26072910k2z6- 228903752’).
(132)
Evidently, by substituting Eqs. (12914132) into Eqs. (9014931, higher and ultrahigh-order position and momentum aberrations in Glaser’s bell-shaped magnetic field can be completely calculated and expressed in analytical formulae, From Eqs. (129)-(132), it is to be seen that field-distribution functions S,, decrease rapidly with increase of the number n(n = 2,3,4,5). Therefore, weights of higher ad ultrahigh-order aberrations with respect to the total aberration decrease remarkably. It is expected that these theoretical results are useful for estimating effects of ultrahigh-order aberrations in magnetic lenses.
396
JIYE XIMEN
VII. GENERALIZED hTEGRATION TRANSFORMATION ON EIKONAIS IN ELECTROSTATIC LENSES In the present section, a rotationally symmetric pure electrostatic system will be discussed. The Hamiltonian function H is defined and expanded as follows (Glaser, 1952; Sturrock, 1955; Hawkes and Kasper, 1989; Ximen 1990a, b, 1991, 1995):
H = -{4-(P-P)p2,
+
H = H , + H , + H4 H6 + + H,, + . (133) In order to establish the canonical aberration theory in up to the tenthorder approximation, the electrostatic potential 4 is expanded into power series (Glaser, 1952; Ximen, 1983, 1986, 1995):
4
=
V(Z)
-a
- + a , ~ ( ~ ) (rl2r - a , ~ ( ~ )- (r13 r r14 - uloV1o)(r- r)’ + , (134)
, ~ ( z ) ( r r)
+ a,V’)(r 1
a2=q,
*
*.-
1
1 a4=64,
U6 =
36 X 64 ’
1 1 (135) ‘lo = 36 x 64 x 64 x 100 ’ 36 x 64 x 64 ’ where V ( z ) is the axial distribution of the electrostatic potential, and a, =
ff,
I
-j/1/2.
In a rotationally symmetric pure electrostatic system, an electron trajectory is not rotated by a magnetic field, thus there is no (r X p) term in the Hamiltonian function. Therefore, the Hamiltonian function can be simplified and expressed in physical units instead of in dimensional form. In order to describe the canonical aberration theory in up to the tenth-order approximation, we have to list all nonvanishing field-distribution functions with respect to H2 in Eq. (61, H4 in Eq. (71, H6 in Eq. (91, H8 in Eq. (11) and H , , in Eq. (13) as follows (Ximen, 1995):
ULTRAHIGH-ORDERCANONICAL, ABERRATIONS
I -
1
- 24576V5/2
397
( 105Vrr3- 45W”V‘4’ + 2V2V‘6’),
The Gaussian trajectory equation in an electrostatic system is given by: d d V” p’ = - - d (H 2 ) = - 2 M r = -r’ = - ( H , ) = p/~’/2, r 4V/’/2I-. dP ( 140)
398
JIYE XIMEN
In following paragraphs, we have further performed a generalized integration transformation on eikonals in a rotationally symmetric electrostatic system, and then derived a set of different-order normalized eikonals, which are position-dependent and momentum-independent. Thus we can also calculate intrinsic and combined aberrations by the same method as shown in Section IV. A. Normalized Fourth-OrderEikonal in Terms of T4 and S,
According to Eqs. (71, (40, and (136) we obtain: T4 = t 3 1 p 3 r+ t 2 , p 2 r 2+ t1,pr3 + tO4r4, 1
t31 =
8V’
t,,
V’
=
16V3/’ ’
ti3 =
(141)
1 V” V t 2 - 7)’ 32( V +
( 142)
t,=-
Obviously, these results coincide with those presented in the literature (Seman, 1955, 1958). B. Normalized Sixth-Order Eikonal in Terms of T, and S,
According to Eqs. (9), (421, and (137) we obtain:
+ tZ4p2r4+ t,,prS + tO6r6, 1 t,, = -( - 2VI2 + W ”,} 48V3
T6 = t4,p4r2+ t,,p’r3 V’ t42 =
--
t,,
-(210V4 - 33OW”V”
=
32V5/’ ’
1 7680V4
(144)
+ 84V’V”’
+39V2V’V‘3’ - 26V3V4’}, t,
=
1 (840V” - 2040W’3V” 948V2V’V’’’ 46080V9/’ +408V2V’2V‘3’- 182V3V”V‘3’- 95V’V’V‘4’ + 26V4V”’}, (145)
+
399
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
s,
1
=
(756OVt6 - 22O50Wf4Vf‘+ 16080V2V’2V’f2 92160V ‘ ‘ I 2
+ 6120V2V‘3V(3’- 5853V3V’V’V(3’ +364V4(V(3))2- 1191V3v12V(4)+ 296V4VfV(4) - 1464V3V”’
+216V4V’V(’) - 32V5V(6)).
(146)
C. Normalized Eighth-Order Eikonal in Terms of T, and S , According to Eqs. (10, (431, and (138) we obtain:
T,
=
t,,
+ t62p6r2+ t,,p5r3 + t4,p4r4 + t3Sp3r5+ t2,p2r6+ tl,pr7 + to,r8,
t,,p’r
1 128V3 ’
= --
t,,
t44 =
t3’
=
=
I62 =
-
( 147)
5 V’ 256 V 7/2 ’
1 1536V4
- -( 2 1 v 2 + 5W“),
1 2048V9/2 ( -4OW‘V”
1 ( -840V4 61440V5
+ 9V2V3’},
+ 9 O W 2 V ” + 34V2V”’
+219V2V‘V‘,) - 46V3V‘4’), t26 =
1 (-756OV” 368640V “I2
+ 11760W’3V” - 549OV2V’V’‘
- 1065V2V’2V‘3’ + 1091V3Vf’V‘3)+ 249V3V’V‘4’ - 58V4V‘5’), 1 t17 =
5160960V
( -8316OVf6 - 1638O0W4V” - 95760V2V’2V’f2 + 11574V3Vf’, - 30975V2V’3V‘3’ + 31774V3V’V’fV(3) -2182V4( V(,))’ -672V4V’V(5)
+ 6435V3V’2V(4)- 5806V4V”V(4)
+ 556V5V@)),
400 t,
JIYE XIMEN =
1 4128768OVl3/’ X
( -49896OVf7 + 119448OW5V” - 804720V2V‘3V”2 + 119520V3V’V” - 287700V2V f 4F3’+ 338040V3V 2V”V‘3’ -40211V4V”2V(3)- 36138V4V’(V ( 3 ) ) 2+ 65400V3V’3V(4) -57873V4V/‘V“V4’+ 9330V5V”’V‘4’ - 7779V4V’2V‘5’ +3888V5V”V”’
s -
-
+ 2068VsV‘V‘6’ - 556V6V7’),
(148)
( - 6486480V” + 19792080W’6V“ - 18461520V2V’4V’”2+ 5186160V’V’2V”3
82575360V15/2
- 136800V4V”4- 4978260V2V’5V/‘3’
+ 7762860 V
V’ V“V 3 ) 2130135V4V’V” V ( 3 ) -856770V4Vf2( + 224392V5Vf’(V ( 3 ) ) 2
+962640V3V’4V(4)- 1236345V4V’2V‘rV(4) + 105744V5V‘f2V(4) 275184V5V‘V(3)V(4) -7404V6( V(4))2- 169695V4Vr3V(5)
+
+ 155838VsV’V”V‘s’ - 26436V6V‘3’V‘5’ +25122V5V”2V(6’- 4O88V6V”V6’ -4692V6VV‘7’
+ 832V7V8’).
( 149)
D. Normalized Tenth-Order Eikonal in Terms of T,, and S,, According to Eqs. (13), (44) and (139) we obtain:
+ t6,p6r4+ t S 5 p 5 r 5 + t4,p4r6+ t37p3r7+ t z a p 2 r 8+ t,,pr9 + tO1OrlO,(150) 1 7 vr 1 = -t73 = - { V 2+ W ” } , t,, = 1 2 8 ~ ’4 512V9I2’ 128V5 TI,
t91
=
t , , p 9 r + t8,p8r2 + t,,p7r3
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
‘*
=
1 { -6174V2V‘V’’2 73728OVl3l2
+
+ 2008V3V”V‘3) +891V3V’V‘4’
t37 =
1 (4158OVf6 - 99540W’4V‘’ 5160960V7 -4464V3V‘”
40 1
- 144V4V‘5’},
+ 35532V2V’zV’’2
+ 15582V2V/’3V‘3’+ 1116V3V’v”V(3’
+ -915V4V‘V5) + 86V5P6)),
-917V4( V ( 3 ) ) 2 2715V3V’2V(4)- 3 6 2 ~ ~ v ” v ( ~ )
t28 =
1 27525120V15/2
x (36O360Vt7 - 99792OW”V”
+ 756000V2~’3v2
+ 214200V2V’4V‘3’ - 217448V3V’2V’’V‘3’ - 22428V3V3V4’ +49345V4V”2V(3)+ 9468V4V’( V3))’ + 33579V4V‘V’’V‘4) - 4796V5V‘3’V‘4’ - 1047V4V’2V(5) -3030VsV”V‘5’ - 338V5V’P6’ + 128V6V‘”), - 188664V3V’V”3
1 ‘19 =
1486356480V8
x (16216200Vt8 - 51060240W’6V” + 48036240V2V’4V’2 - 15150240V3V’2V’’3+ 852912V4Vtt4+ 13056120V2V’5V(3) - 18966528V3V’3v“V(3)
+ 6302001V4V/’V”2Vv‘3’
+ 1503516V4V’’( V ( 3 ) ) -2 618588V5V”(W3))’ - 1573236V3V’4V‘4’+ 2256231V4V’zVzV(4) -532200VsVt’2V(4) - 422562VsV’V(3)V(4)+ 49944v6( ~ ( ~ 9 ’ + 112581V4V’3V/‘5’- 240192V5V‘V’’V(5’ + 46956V6V‘3’V‘5’ - 25668VsV’2V(6) + 42528V6V”V(6) +3180V6V‘V‘’’ - 2728V7V‘8)),
402
JIYE XIMEN
1 to10 =
14863564800V I 7 l 2 X
(1297296OOVf9- 459459000W’7V“ + 5 189184oOV V” V 2- 207431280V V’3V” + 19266912V4V’V”4 + 129396960V2V’6V(3) - 240“5120V3V’4V’’V‘3’ + 107398704V4V’2V ” 2 V ( 3 ) - 5629134V5v”3V(3) + 24980592V4Vt3(Y ( 3 ) ) 2 - 15910194V5V’V”( V ( 3 ) ) 2 + 618588V6( V 0 ) ) 3 - 17747100V3V’5V‘4’ + 26229000V4V”V” P4) - 6790878V’V’
V4)- 5645673V’ V” V(3)V(4)
+ 1595250V6V”V(3”c/(4)+ 361422V6V’(V(4))2
+2023560V4V‘4V(5)- 306”75V5V’2V’’V(5) +418062V6Vf’2V(5)+ 756666V6V’V(3)V(5) - 92916‘v7V4’V‘’’ - 360945V5V’3V‘6’ + 424050V6L”V”V(6’ - 84444V7V3)V6’+ 32028V6V‘2V(7)- 22236V7V”V‘7’
- 1O948V7V’V@)+ 2728V8V9)), 1 ‘lo
=
(151)
29727129600V 1912 X
(2205403200V”0 - 9145936800W‘8V“
+ 129080952O0V2Vr6Vff2- 7196500080V3V’4V”3 + 1318651488V4V’2V’’4- 28353024V’V”’ + 2601078480V2V’7V(3)- 6203652840V3V’5~‘’V(3) + 4031128836V4V’3V”2V(3)- 585020205V5V’V”3V‘3’ + 7O4915568V4Vf4(V3))’- 683332146V5V2V~‘(V39’ + 62109492V6V”2( + 34913328V6V‘( V ( 3 ) ) 3
+ 851509260V4V’4V”V‘4’ + 14061024V6V’”3V‘4’ - 186324579V5V’3V‘3’V(4)+ 11”46580V6V’V’’V(3’V(4) - 6509268V7( V(3))2Vc4) + 12166956V6V2(V(4))2 - 1461864V7Vf(V(4))2 + 53706240V4V’5V(5) - 439043220V3V’6V‘4’
- 383907699V’V’2V”2V‘4’
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
403
- 89523000V5V’3V’fV(5)+ 26601246V6V’V“2V(5)
+ 21200226V6V’*Y(3)V(5) - 61413OOV’ V ” V ( 3 ) V ( 5 ) -3116808V7V’V(4)V(5) + 185832V8( V ( 5 ) ) 2 - 5515335V5V‘4V‘6’
+ 8821530V6V’2V ” V ( 6 )
- 595344V V” V @ ) 24182O4V V’V‘3’V‘6’
+ 73200V8V(4)V(6)+ 882030V6V’3V(7)
- 1027020V7V’V’’V(7’+ 213360V8V‘3’V(7) -1
+
2 2 1 0 0 ~ ~ ~i 4~ 9~ 2~ 8( ~~ ~) ~ ” ~ ( ~ )
+ 24624V8V’V(9)- 4448V9V(*0)).
(152)
So far, in rotationally symmetric pure electrostatic systems, we have performed a generalized integration transformation on eikonals and derived a set of different-order normalized eikonals, which are position dependent and momentum independent. These normalized eikonals greatly facilitate calculating intrinsic and combined aberrations by the same method as shown in Section IV. However, it is to be emphasized that, for rotationally symmetric electrostatic lenses, only isotropic aberrations exist, but no anisotropic aberration appears. VIII. CONCLUSION Based on the ultrahigh-order canonical aberration theory (Ximen, 19951, we have derived the power-series expressions for Hamiltonian functions up to the tenth-order approximation in rotationally symmetric magnetic and electrostatic systems. In the ultrahigh-order abberation calculations, the key point is that the derivatives r’ and p’ must retain necessary high-order terms in the total derivative of the integration factor T,,. It is the author’s contribution that the ultrahigh-order derivative equation (34) and the ultrahigh-order Poisson brackets [T,, ,H,, I have been introduced into the generalized integration transformation on eikonals for deriving ultrahighorder canonical aberrations. For investigating magnetic systems, by transforming physical quantities into corresponding dimensionless ones, we have derived the canonical power-series expressions for dimensionless eikonal functions up to the tenth-order approximation. Obviously, in power-series expressions of Hamiltonian functions, dimensionless field-distribution functions with the even power of the magnetic field describe isotropic aberrations, and the
404
JIYE XIMEN
field-distribution functions with the odd power of the magnetic field describe anisotropic aberrations. We have successfully performed a series of generalized integration transformations on eikonals independent of the constant product (r X p) and on eikonals associated with the constant product (r X p), thus obtaining a set of different-order normalized eikonals, which are position dependent and momentum independent. According to canonical aberration theory, knowing different-order eikonal functions enables us to calculate both intrinsic and combined aberrations up to the ninth-order approximation by means of a gradient operation on the corresponding-order eikonal function in a rotationally symmetric magnetic system. Because normalized eikonals are position dependent and momentum independent, it is much easier to performing their higher and ultrahigh-order gradient operations. Therefore, in principle, we can calculate not only isotropic but also anisotropic, intrinsic, and combined aberrations in up to the ninth-order approximation. Precisely speaking, third-, fifth-, seventh-, and ninth-order canonical position and momentum aberrations have been completely expressed in concise and explicit form. By a similar theoretical method, we have also performed a series of generalized integration transformation on eikonals in electrostatic systems, thus obtaining a set of different-order normalized eikonals which are position dependent and momentum independent. Therefore, we can calculate intrinsic and combined aberrations in up to the ninth-order approximation by means of a gradient operation on a corresponding-order eikonal function in a rotationally symmetric electrostatic system. It is to be emphasized that this progress facilitates numerically calculating ultrahigh-order canonical aberrations in practical rotationally symmetrical magnetic and electrostatic systems. As an application, we have calculated higher and ultrahigh-order position and momentum aberrations and expressed them in analytical formulae for Glaser’s bell-shaped magnetic field. For such a bell-shaped magnetic field, weights of higher and ultrahigh-order aberrations with respect to the total aberration decrease remarkably with increase of the aberration order n ( n = 3,5,7,9). It is expected that the present theoretical results will be useful for estimating effects of ultrahigh-order aberrations in magnetic lenses. The canonical aberration theory has several main advantages: the momentum aberrations are much simpler than the same-order slope aberrations; the normalized eikonal expressions enable us to calculate position and momentum aberrations, including axial and off-axial aberrations, at any observation plane in magnetic or electrostatic systems with rectilinear or curvilinear axes. In principle, the canonical aberration theory can be utilized to calculate higher than ninth-order canonical aberrations, including intrinsic and combined position and momentum aberrations, in rota-
rotationally symmetric magnetic or electrostatic systems. It is evident that the calculation of ultrahigh-order canonical aberrations is very complicated. However, the theoretical features of the canonical aberration theory, i.e., its conciseness and simplicity, its position dependence and (to a certain extent) momentum independence, its symmetry properties, and its recursive structure, give us the attractive possibility of calculating ultrahigh-order canonical aberrations with the computer software MATHEMATICA.
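As a concrete illustration of the gradient operation on a position-dependent eikonal, the following is a minimal symbolic sketch (in Python/SymPy rather than MATHEMATICA). The fourth-degree eikonal S4 and its coefficients A and B are placeholders, not the eikonals derived in the text; Glaser's field appears only in the dimensionless form b(z) = 1/(1 + z²) to show how the axial derivatives entering the field-distribution functions can be generated.

```python
# Schematic sketch, not the chapter's derivation: momentum aberrations are
# read off as the gradient of a position-dependent eikonal.  S4 is a generic
# fourth-degree polynomial with placeholder coefficients A and B.
import sympy as sp

x, y, A, B = sp.symbols('x y A B', real=True)
r2 = x**2 + y**2                          # rotationally invariant r.r

S4 = A * r2**2 + B * x**2 * r2            # placeholder third-order eikonal

# Third-order "momentum aberrations" as the gradient of the eikonal
dpx = sp.expand(sp.diff(S4, x))
dpy = sp.expand(sp.diff(S4, y))
print(dpx)                                # 4*A*x**3 + 4*A*x*y**2 + ...
print(dpy)

# Glaser's bell-shaped field in dimensionless form, b(z) = 1/(1 + z**2);
# the field-distribution functions involve b and its axial derivatives.
z = sp.symbols('z', real=True)
b = 1 / (1 + z**2)
b2 = sp.simplify(sp.diff(b, z, 2))        # b''(z)
b4 = sp.simplify(sp.diff(b, z, 4))        # b''''(z)
print(b2, b4)
```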
APPENDIX

In Eqs. (20), (23), (26), and (29), the dimensionless field-distribution functions L, M, N, L_n, J_n, and I_n have been presented in detail. Based on the previous chapter (Ximen, 1995), we will list the other dimensionless field-distribution functions of H_2n (n = 2, 3, 4, 5) that appeared in Eqs. (7), (9), (11), and (13).
In particular, N_2 = (3/4)k²b², N_3 = (1/2)k³b³, JM_3 = (15/16)k²b², JN_2 = (5/4)k³b³, and JN_3 = (5/8)k⁴b⁴; the remaining functions JL_1, JL_2, JM_1, JM_2, and JN_1 are polynomials in kb that also involve the axial derivatives b'' and b⁽⁴⁾ [Eqs. (A1)-(A6)].
Obviously, dimensionless field-distribution functions with the even power of the magnetic field kb describe isotropic aberrations, and dimensionless field-distribution functions with the odd power of the magnetic field kb describe anisotropic aberrations.

ACKNOWLEDGMENT
This work was supported by the Doctoral Program Foundation of the Institute of Higher Education of China.
REFERENCES

Arnold, V. I. (1978). "Mathematical Methods of Classical Mechanics." Springer-Verlag, New York.
Glaser, W. (1933a). Z. Physik 81, 647.
Glaser, W. (1933b). Z. Physik 83, 104.
Glaser, W. (1952). "Grundlagen der Elektronenoptik." Springer, Vienna.
Goldstein, H. (1980). "Classical Mechanics," 2nd ed. Addison-Wesley, Reading, MA.
Hawkes, P. W. (1966/67). Optik 24, 252-262, 275-282.
Hawkes, P. W., and Kasper, E. (1989). "Principles of Electron Optics." Academic Press, London.
Plies, E., and Typke, D. (1978). Z. Naturforsch. 33a, 1361.
Scherzer, O. (1936a). Z. Physik 101, 23.
Scherzer, O. (1936b). Z. Physik 101, 593.
Seman, O. I. (1955). Trudy Inst. Fiz. Astron. Akad. Nauk Eston. SSR No. 2, 3-29, 30-49.
Seman, O. I. (1958). "The Theoretical Basis of Electron Optics." Higher Education Press, Beijing.
Sturrock, P. A. (1955). "Static and Dynamic Electron Optics." University Press, Cambridge.
Ximen, J. (1983). "Principles of Electron and Ion Optics and Introduction to Aberration Theory." Science Press, Beijing.
Ximen, J. (1986). Aberration theory in electron and ion optics. In "Advances in Electronics and Electron Physics" (P. W. Hawkes, Ed.), Suppl. 17. Academic Press, New York.
Ximen, J. (1990a). Optik 84, 83.
Ximen, J. (1990b). J. Appl. Phys. 68, 5963.
Ximen, J. (1991). Canonical theory in electron optics. In "Advances in Electronics and Electron Physics" (P. W. Hawkes and B. Kazan, Eds.), Vol. 81, p. 231. Academic Press, Orlando, FL.
Ximen, J. (1995). Canonical aberration theory in electron optics up to ultrahigh-order approximation. In "Advances in Imaging and Electron Physics" (P. W. Hawkes, Ed.), Vol. 91, p. 1. Academic Press, San Diego.
Erratum and Addendum for Physical Information and the Derivation of Electron Physics

B. ROY FRIEDEN

Optical Sciences Center, University of Arizona, Tucson, Arizona 85721
Soon after the publication of Frieden (1995), it was found that some key equations were off by a factor of c, the speed of light. The corrected equations lead to a new physical interpretation of Fisher information I. Also, some improvements have been made in the physical model for the information approach that is the basis for the chapter. These will be briefly mentioned. Equation (VII.19a) should have an extra factor of c,
Correspondingly, Eqs. (VII.19b) should read
Then Eq. (VII.20) reads
I = (4/ħ²) ∫∫ dp dE P(p, E)(−p² + E²/c²)    (VII.20)
and Eq. (VII.21) becomes

(VII.21)

The lack of a c in the first factor then obviates the following remark about c five lines below: "In the first factor, quantity c is shown elsewhere (Section IX) to be constant." The key result of these corrections is as follows. Information I in Eq. (VII.26) becomes
I = J = (2mc/ħ)² = (2/λ̄)²    (VII.26)
where λ̄ is the Compton wavelength for the particle. Now, by Eq. (III.10b) of the chapter, I relates to the minimum mean-square error of estimation of the particle four-position, e_eff², as

e_eff² = 1/I.    (III.10b)
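As a quick numerical illustration of Eqs. (VII.26) and (III.10b), the following minimal check uses standard values for the electron (the constants are assumed, not taken from the original text) and shows that the predicted minimum rms error is one-half the reduced Compton wavelength ħ/(mc).

```python
# Illustrative check of I = (2mc/hbar)^2 and e^2 = 1/I for an electron.
# The numerical constants are standard values, not from the text.
hbar = 1.054571817e-34     # reduced Planck constant, J*s
m_e = 9.1093837015e-31     # electron mass, kg
c = 2.99792458e8           # speed of light, m/s

lam_bar = hbar / (m_e * c)         # reduced Compton wavelength, ~3.86e-13 m
I = (2.0 * m_e * c / hbar) ** 2    # Eq. (VII.26), equivalently (2/lam_bar)^2
e = 1.0 / I ** 0.5                 # Eq. (III.10b): e^2 = 1/I

print(lam_bar, e, lam_bar / 2)     # e coincides with lam_bar/2, ~1.93e-13 m
```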
Hence, Eq. (VII.26) predicts that the minimum root-mean-square error e is one-half the Compton wavelength. This is reasonable, since the Compton wavelength is a limiting resolution length in the measurement of particle position. The upshot is that the information-based derivation (now) makes a reasonable prediction on resolution, as well as deriving the Klein-Gordon and Dirac equations (the main thrust of the chapter).

The improvements in the model for the information procedure are twofold. The first is as follows. It previously had to be assumed axiomatically that the total physical information (I - J) is zero at its extremum. In fact it was recently found (Frieden and Soffer, 1995) that the zero, and the extremization, may be explained on the basis of a zero-sum game of information transfer that transpires between the data measurer and nature. The information I preexisting in the data has to come from somewhere. That "somewhere" is the physical phenomenon (nature) underlying the measurement. Nature's version of I is the information form J. Thus, whereas the data information I is expressed abstractly as Eq. (VI.7),
I = 4 Σ_{n=1}^{4} ∫ dr ∇q_n(r) · ∇q_n(r),    (VI.7)
in terms of the "mode functions" q_n defining the probability density p, nature's information J is I expressed in terms of the physical parameters governing the measurement. Since I = J the game is zero sum, and since the measurer and nature both "want" to maximize their information states, the variation δ(I - J) = 0 as required.¹

The second improvement in the theory lies in the physical basis for the form (VI.7). It previously had to be assumed that the mode functions are in an idealized state during the single gedanken measurement that underlies the theory. This state was called the "characteristic state" and corresponds to the situation where the mode functions q_n(r) have no overlap of their support regions r. [Such mode functions allow for an additivity of information I, as expressed by the summation in form (VI.7).] Unfortunately, the characteristic state is unphysical in many problems, such as the quantum mechanical free-field particle in a box.

¹ Most recently, this was found to follow from the perturbation of the system wave function at the measurement. See B. R. Frieden and B. H. Soffer, "Extreme physical information as a natural process," Phys. Rev. E (submitted).
We recently found² that the same form (VI.7) follows if, instead of the single gedanken measurement, many independent measurements of the desired parameter are made. At measurement n, the system is in the "prepared" state q_n. Modes q_n are physically realizable, since they are the solutions to the very differential equation (Schrödinger wave equation, Dirac equation, etc.) that the information procedure derives. In this way, the unphysical assumption of nonoverlap of modes q_n(r) is avoided. The theory has been significantly strengthened in this way.
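To see numerically why the form (VI.7) is a Fisher information, here is a minimal one-dimensional, single-mode check (not part of the original erratum): with q(x) = sqrt(p(x)), the identity ∫ p'(x)²/p(x) dx = 4 ∫ q'(x)² dx holds, and for a Gaussian density of width σ both sides equal 1/σ².

```python
# Minimal 1-D, single-mode check of the identity behind Eq. (VI.7):
# integral of p'^2/p equals 4 * integral of q'^2, with q = sqrt(p).
# For a Gaussian of width sigma both equal 1/sigma^2.
import numpy as np

sigma = 2.0
x = np.linspace(-40.0, 40.0, 200001)
dx = x[1] - x[0]

p = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
q = np.sqrt(p)

dp = np.gradient(p, dx)
dq = np.gradient(q, dx)

I_prob = np.sum(dp**2 / p) * dx        # integral of p'^2 / p
I_ampl = 4.0 * np.sum(dq**2) * dx      # 4 * integral of q'^2
print(I_prob, I_ampl, 1.0 / sigma**2)  # all approximately 0.25
```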
REFERENCES

Frieden, B. R. (1995). Physical information and the derivation of electron physics. In "Advances in Imaging and Electron Physics" (P. W. Hawkes, Ed.), Vol. 90, pp. 123-204. Academic Press, San Diego.
Frieden, B. R., and Soffer, B. H. (1995). Lagrangians of physics and the game of Fisher-information transfer. Phys. Rev. E 52, 2274.
² B. R. Frieden and W. J. Cocke, "Foundation for Fisher information-based derivations of physical laws," Phys. Rev. E (in press).
Index
A
Accelerators, optics, 337 Algebraic reconstruction (ART), image reconstruction, 160-162 Algorithms block matching algorithms, 235-237 compression algorithm, 193 edge-preserving reconstruction algorithms, 91-93, 118-129 extended-GNC algorithm, 132-136, 171-175 generalized expectation-maximization algorithm, 93, 127-129, 153, 162-166 graduated nonconvexity algorithm, 90, 91, 93, 124-127, 153, 168-175 EZW algorithm, 221, 232 Gibbs sampler algorithm, 119-120, 123 image discontinuities, 89-91, 108-118 low-bit-rate video coding, 237-240 Metropolis algorithm, 119, 120 mixed annealing minimization algorithm, 122-123, 158 overlapped block matching algorithm, 235, 237-252 SA-W-LVQ algorithm, 220-226 simulated annealing minimization algorithm, 120-122, 155 suboptimal algorithms, 124-127 Approximation scaling factor, 211 Approximation vector, 211 Arithmetic coding, 194
B
Baker-Campbell-Hausdorff formula, 347 Barnes-Wall lattice, 217-218 Bayesian approach, regularization, 87-88, 98-104 Bayesian classification, pixels, 68 Biorthogonal functions, Gabor expansion, 27-29 Blind restoration problem, 141 Block matching algorithms, motion estimation, 235-237 Block transforms, joint space-frequency representation, 11 Boltzmann machine (BM), 150-151, 153
C
Canonical aberration theory, ultrahigh order, 360-406 Characteristic state, 410 Charged-particle wave optics, 257-259, 336-339 Feshbach-Villars form of Klein-Gordon equation, 263, 322, 339-341 Foldy-Wouthuysen representation of Dirac equation, 267-269, 322, 341-347 Green's function for nonrelativistic free particle, 280, 350-351 for system with time-dependent quadratic Hamiltonian, 351-355
Charged-particle wave optics (Continued) Klein-Gordon equation, 259, 276, 337, 338 Feshbach-Villars form, 263, 322, 339-341 Magnus formula, 347-349 matrix element of rotation operator, 351 scalar theory, 316-317 axially symmetric electrostatic lenses, 320-321 axially symmetric magnetic lenses, 282-316 electrostatic quadrupole lenses, 321-322 free propagation, 279-282 general formalism, 259-279 magnetic quadrupole lenses, 317-320 spinor theory, 258 axially symmetric magnetic lenses, 333-335 free propagation, 330-332 general formalism, 322-330 magnetic quadrupole lenses, 226 Clifford-Hammersley theorem, 90 Cliques, neighborhood system, 105-106 Codebooks lattice codebooks, 218-220, 227-230 multiresolution codebooks, 202 regular lattices, 218-220 successive approximation quantization, 214-216 Coding, see Image coding Combined aberrations, 360-361 Complex spectrogram, conjoint image representation, 9-10 Compression, see Image compression Compression algorithm, 193 Compton wavelength, 410 Computed tomography, image formation, 59-60 Conjoint image representation, 2-4, 5 Gabor wavelets, 19-37 Continuous signals, exact Gabor expansion, 23-30 Cost functions, 88 optimal estimators based on, 101-103
D Daugman's neural network, image reconstruction, 31, 52 DCT, 51, 52, 194
Deblurring, image reconstruction, 155- 159 Decoder, digital coding, 192 Denoising, image enhancement, 56-58 Difference of Gaussian (DOG), receptive field, 44-45 Differential pulse code modulation (DPCM), 51 Diffraction, charged-particle beam scalar theory, 279-282 spinor theory, 330-332 Digital coding, 192-194 Dirac equation, Foldy-Wouthuysen representation, 267-269,322, 341-347 Discontinuities image processing, 89-91 image reconstruction, 108-118 duality theorem, 91, 115-118 explicit lines, 110-115, 154-166 implicit lines, 108-110, 166-181 line continuation constraint, 130-141, 142 Discrete cosine transform, 11 Discrete signals, exact Gabor expansion, 30-33 Discrete spectrogram, 11 Duality theorem, image processing, 91, 115-118
E Edge detection Gabor functions, 7 wavelets, 63-64 Edge-preserving reconstruction algorithms, 91-93, 118-129 extended GNC algorithm, 132-136, 171-175 generalized expectation-maximization (GEM) algorithm, 93, 127-129, 153, 162-166 graduated nonconvexity (GNC) algorithm, 90,91, 93, 124-127, 153, 168-175 Electron optics canonical aberration theory, 360-406 lenses axially symmetric electrostatic lenses, 320-321 axially symmetric magnetic lenses, 282-316,333-335 magnetic quadrupole lenses, 317-320, 336
415
Electron wave optics, see Charged-particle wave optics Electrostatic lenses charged-particle wave optics, 320-321 integration transformation, 396-403 Encoder, digital coding, 192 Entropy coding, 193-194 Expectation-maximization (EM) approach, image processing, 127-129 Explicit lines, image reconstruction, 110-115, 154-166 Extended-GNC (E-GNC) algorithm, 132-136, 171-175 EZW algorithm, 221, 232
F Filtered backprojection (FBP), image reconstruction, 160-162 Fingerprint database, image compression, 54 Finite state scalar quantization, 203 Fisher information, new interpretation, 409 Foldy-Wouthuysen representation, Dirac equation, 267-269,322,341-347 Fractal dimension, image, 69-70 Free propagation, charged-particle beam scalar theory, 279-282 spinor theory, 330-332 G
Gabor-DCT transform, 52 Gabor expansion biorthogonal functions, 27-29 exact Gabor expansion, 23-30 image enhancement, 54-55 quasicomplete, 34-37 Gabor functions (Gabor wavelets, Gaussian wave packets, GW), 3-4,5, 7 applications, 78 human visual system modeling, 41-45 continuous signal, 23-27 biorthogonal functions, 27-29 Zak transform, 29-30 discrete signals, 30-33 Daugman’s neural network, 31-32 direct method, 32-33 drawbacks, 6-7 image analysis and machine vision, 61-78
image coding, 50-54 image enhancement, 54-59 image reconstruction, 59-60 machine vision, 61-66 mathematical expression, 5 orthogonality, 6, 11, 13, 22 quasicomplete Gabor transform, 34-37 receptive field of visual cortical cells, 41-45 vision modeling, 17, 34-35, 41-45 Gabor transform, quasicomplete, 34-37 Gaussian derivatives edge detection, 64 vision modeling, 17, 19, 45 Gaussian Markov random fields (GMRFs), 107 Gaussian wavelets machine vision, 61, 63 texture analysis, 64-68 Gaussian wave packets, see Gabor functions Generalized Boltzmann machine (GBM), 150-151 Generalized expectation-maximization (GEM) algorithm image processing, 93, 153 tomographic reconstruction, 162-166 Generalized integration transformation, eikonals electrostatic lenses, 396-403 magnetic lenses, 369-381, 389-392 Gibbs distributions, Markov random fields, 106-108 Gibbs sampler algorithm, 119-120, 123 Graduated nonconvexity (GNC) algorithm, image processing, 90, 91, 93, 124-127, 153, 168-175 Green's function for nonrelativistic free particle, 280, 350-351 for system with time-dependent quadratic Hamiltonian, 351-355
H Hadamard matrix, 217-218 Hexagonal-oriented quadrature pyramid, joint space-frequency representations, 19, 20 Huffman coding, 51, 194
Human vision Gabor functions, 17,34-37,41-45 joint representations, 16-19, 37-50 receptive field, 40-44 sampling, 45-50 Hyperparameters MRF hyperparameters, 146-149 regularization, 141-143
I
Image analysis, 61-63 edge detection, 7, 63-64 motion analysis, 72-74 stereovision, 74-75, 76-78 texture analysis, 64-72 Image coding algorithms EZW coding, 221, 232 SA-W-LVQ, 221-226 arithmetic coding, 194 digital coding, 192-194 entropy coding, 193-194 Gabor expansion, 50-54 Huffman coding, 51, 194 low-bit-rate video coding, 232-252 partition priority coding, 201 predictive coding, 51, 193 regularization, 147-148 still images, 226-232 transform coding, 51, 193 wavelets, 198-205 Image compression, 192 applications, 50-51, 54 fingerprint database, 54 methods, 51 standards, 51, 194 wavelet transforms, 52-53, 194, 198-205 Image deblurring, 155-159 Image discontinuities, see Discontinuities Image enhancement denoising, 56-58 Gabor expansion, 54-55 image fusion, 58-59 Image fusion, image enhancement, 58-59 Image processing discontinuities, 89-91, 108-118 duality theorem, 91, 115-118 explicit lines, 110-115, 154-166 implicit lines, 108-110, 166-181 line continuation constraint, 130-141, 142 duality theorem, 91, 115-118 expectation-maximization (EM) approach, 127-129 generalized expectation-maximization (GEM) algorithm, 93, 127-129, 153, 162-166 graduated nonconvexity (GNC) algorithm, 90, 91, 93, 124-127, 153, 168-175
iterated conditional modes, 92 theory, 2-3 Image quality, measuring, 61 Image reconstruction, 86-87, 181-184 algebraic reconstruction (ART), 160-162 applications, 153-154 explicit lines, 110-115, 154-155 implicit lines, 91, 108-110, 166-181 blind restoration problem, 141 Daugman’s neural network, 31, 52 deblurring, 155-159 discontinuities duality theorem, 91, 115-118 explicit treatment, 110-115, 154-166 implicit treatment, 108-110, 166-181 line continuation constraint, 130-141, 142
edge-preserving algorithms, 91-93, 118-129
extended GNC algorithm, 132-136, 171-175
GEM algorithm, 93, 127-129, 153, 162-166
GNC algorithm, 90, 91, 93, 124-127, 153, 168-175
edge-preserving regularization, 93-94 theory, 104-118 filtered backprojection, 160-162 inverse problem, 94-98,99-101 regularization, 87-89 Bayesian approach, 87-88, 98-104 discontinuities, 89-91, 108-118 inverse problem, 94-98, 99-101 three-dimensional, 59-60 tomographic reconstruction, 159-166 Image representation, 75, 78-79 Gabor schemes, 19-23 continuous signals, 23-30 discrete signals, 30-33 quasicomplete Gabor transform, 34-37
image analysis, 61-63 edge detection, 7, 63-64 motion analysis, 72-74 stereovision, 74-75, 76-78 texture analysis, 64-72 image coding, see Image coding image compression, 192 applications, 50-51, 54 fingerprint database, 54 methods, 51 standards, 51, 194 wavelet transform, 52-53, 194, 198-205 image enhancement and reconstruction, 37, 54-56 denoising, 56-58 Gabor expansion, 54-55 image fusion, 58-59 image quality metrics, 10, 61 three-dimensional reconstruction, 59-60 joint space-frequency representations, 3, 8 block transforms, 11 complex spectrogram, 9-10 multiresolution pyramids, 13-16 vision-oriented models, 16-19 wavelets, 11-13 Wigner distribution function, 9 machine vision, 61-78 orthogonality, 6-7, 11, 13, 22 theory, 2-7 vision modeling Gabor functions, 17, 34-37, 41-45 sampling in human vision, 45-50 visual cortex image representation, 37-41 Implicit lines image processing, 91 image reconstruction, 108-110, 166-181 Informational uncertainty, 10 Integration transformation electrostatic lenses, 396-403 Glaser's bell-shaped magnetic field, 393-395 magnetic lenses, 369-381, 389-392 Inverse problem, image reconstruction, 94-98, 99-101 Isolated zero, 224 Isotropic intrinsic aberrations, 360-361 Iterated conditional modes (ICM), image processing, 92
J Joint space-frequency representations, 3, 8 block transforms, 11 complex spectrogram, 9-10 multiresolution pyramids, 13-16 vision-oriented models, 16-19 wavelets, 11-13 Wigner distribution function, 9 JPEG, 51,54
K Klein-Gordon equation charged-particle wave optics, 259, 276, 337, 338 Feshbach-Villars form, 263, 322, 339-341
L Laplacian pyramid, image compression, 51-52 Lapped orthogonal transform, 11 Lattice codebooks, 218-220, 227-230 Lattice packing, 216 Lattice vector quantization, 194 Likelihood function, 88 Line continuation constraint, 130 extended GNC,132-136 mean field approximation, 131-132 sigmoidal approximation, 137-141 Logons, 4 Low-Balian theorem, 23, 28 Low-bit-rate video coding, 232-252 algorithm, 237-240
M Machine vision, 61-78 Gabor function, 61-66 Gaussian wavelets, 61, 63 Magnetic lenses canonical aberrations, 381-388 charged-particle wave optics axially symmetric lenses, 282-316, 333-335 quadrupole lenses, 317-320, 336 integration transformation, 369-381, 389-392
Magnetic lenses (Continued) power-series expansions eikonal, 366-369 Hamiltonian function, 361-366 Magnus formula, 347-349 MAP (maximum a posteriori) estimate, 88 edge-preservingalgorithm, 92,102-103, 104 Mapping, 51, 193 image compression, 51-52 Marginal posterior mean, cost function, 102 Markov random fields (MRFs) Gibbs distributions, 106-108 image processing, 90,105 Maxima of the posterior marginals estimate, see MPM estimate Maximum a posteriori estimate, see MAP estimate Maximum likelihood (ML) criterion, 104,145, 149 Maximum pseudolikelihood (MPL) estimate, 148 Metropolis algorithm, 119, 120 Mixed annealing minimization algorithm, 122-123, 158 ML criterion, see Maximum likelihood criterion Modularity, human visual system, 39 Monte Carlo methods, image regularization, 119-120 Morozov’s discrepancy principle, 143 Motion analysis, vision systems, 72-74 MPL, see Maximum pseudolikelihood estimate MPM (maxima of the posterior marginals) estimate, 88, 102 MRFs, see Markov random fields Multiresolution codebooks, 202 Multiresolution pyramids, joint space-frequency representation, 13-16 Multishell lattice codebooks, 219
N Neighborhood system, 105 Neighbor interaction function, 109 Neural networks Daugman’s neural network, 31,52 generalized Boltzmann machine (GBM), 150-151 optimization, 88-89
Neuron, receptive field (RF), 40-41 Noise removal deblurring, 155-159 image enhancement, 56-58 Noise shaping, 198 O
Optics accelerators, see Accelerators, optics charged particles, see Charged-particle wave optics Optimization, neural networks, 88-89 Orthogonality,image representation, 6-7,11, 13,22 Orthogonal wavelets, 13-16 Overlapped block matching algorithm, 235, 237-252
P Parallelism, human visual system, 39 Partition priority coding (PPC), 201 Physical information, erratum and addendum, 409-411 Posterior density, 100-101 Postprocessing, digital coding, 192-193 Predictive coding, 51, 193 Preprocessing, digital coding, 192-193 Primal sketch, 64 Prior density, 88 Prior information, 100 Probability density function, states of information, 98-101 Propagation, charged-particle beam scalar theory, 279-282 spinor theory, 330-332 Psychophysics, vision modeling, 35,39,40,43
Q Quadrature pyramid, joint space-frequency representations, 19, 20 Quantization defined, 51, 193 image compression, 51 scalar quantization, 193, 194 finite state scalar quantization, 203 wavelets, 200-201, 202-203
successive approximation quantization convergence, 211-214 orientation codebook, 214-216 scalar case, 205-206 vectors, 207-211 successive approximation wavelet lattice vector quantization (SA-W-LVQ), 191-194, 252 coding algorithm, 220-226 image coding, 226-232 theory, 193-220 video coding, 232-252 vector quantization, 51, 193 wavelets, 201-202, 203 Quantum theory charged particle wave optics, 257-259, 336-339 aberrations, 311-312 scalar theory, 259-322 spinor theory, 258, 322-336 Quasicomplete Gabor transform, 34-37
R Receptive field (RF) Gabor function, 41-43 neuron, 40-41 Reconstruction, see Image reconstruction Redundancy, temporal redundancy, 235 Redundancy removal, 192, 193 Regularization, 87-89 Bayesian approach, 87-88, 98-104 discontinuities, 89-91, 108-1 18 duality theorem, 115-118 dual theorem, 91 explicit treatment, 110-115, 154-166 implicit treatment, 108-110, 166-181 line continuation constraint, 130-141, 142 edge-preserving algorithms, 91-93, 118-129 extended GNC algorithm, 132-136, 171- 175 GEM algorithm, 93, 127-129, 153, 162- 166 GNC algorithm, 90, 91, 93, 124-127, 153, 168-175 edge-preserving regularization, 93-94 Markov random fields, 90,105, 106-108 theory, 104-118 hyperparameters, 141-143
inverse problem, 96-98, 99-101 Gaussian case, 103-104 optimal estimators based on cost functions, 101-103 posterior density, 100-101 prior information, 100 states of information, 98-101 Regularization parameter, 96, 143-146 Regular lattices, 216 Risk for estimation, 144 Risk for protection, 144 S
Sampling vision modeling, 3-4, 45-50 visual cortex, 45-50 SA-W-LVQ, see Successive approximation wavelet lattice vector quantization Scalar quantization, 193, 194 finite state scalar quantization, 203 wavelets, 200-201, 202-203 Scalar theory charged-particle wave optics, 316-317 axially symmetric electrostatic lenses, 320-321 axially symmetric magnetic lenses, 282-316 electrostatic quadrupole lenses, 321-322 free propagation, 279-282 general formalism, 259-279 magnetic quadrupole lenses, 317-320 Signal processing, 192 compression, 51, 192-194 digital coding, 192-194 Gabor functions, 5 theory, 2-3 Signal redundancy, 192 Signal uncertainty, 10 Simulated annealing minimization algorithm, 120-122,155 Single-shell lattice codebooks, 218 Smoothness, image processing, 89,97 Spatial sampling, visual cortex, 47-50 Spectrogram complex spectrogram, 9-10 discrete spectrogram, 11 reconstructing signal from, 19 Sphere packing, 216 Spin dynamics, 337
Spinor theory charged-particle wave optics, 258 axially symmetric magnetic lenses, 333-335
free propagation, 330-332 general formalism, 322-330 magnetic quadrupole lenses, 336 States of information, regularization, 98-99 Stereo vision, 74-75 Still image coding, 226-232 Stochastic integration, image regularization, 119
Suboptimal algorithms, 124-127 Successive approximation quantization convergence, 211-214 orientation codebook, 214-216 scalar case, 205-206 vectors, 207-211 Successive approximation wavelet lattice vector quantization (SA-W-LVQ), 191-194, 252
coding algorithm, 220-226 image coding, 226-232 theory, 193-220 successive approximation quantization, 205-220
wavelet transforms, 195-205 video coding, 232-252
T Temporal redundancy, 235 Texture analysis, Gaussian wavelets, 64-68 Three-dimensional image reconstruction, 59-60
Tomography image formation, 59-60 image reconstruction, 159-166 TPM (thresholded posterior means) estimate, 88, 120 Transform coding, 51, 193 Two-dimensional wavelet transforms, 197
U Ultrahigh-order canonical aberration theory, 360-406 Uncertainty, informational uncertainty, 10
V Vector quantization, 51, 193 successive approximation wavelet vector quantization (SA-W-LVQ), 191-194 coding algorithm, 220-226 image coding, 226-232 theory, 193-220 video coding, 232-252 wavelets, 201-202, 203 Vector wavelet transform, 202 Video coding, low-bit-rate, 232-252 Video signals, 192 Vision modeling Gabor functions, 17, 34-37,41-45 joint representations, 16-19,37-50 receptive field, 40-44 sampling, 3-4,45-50 Visual cortex image representation, 37, 39-45 sampling, 45-50 Visual psychophysics, 35,39,40,43
W Wavelet coefficients scalar quantization, 200-201, 202-203 vector quantization, 201-202, 203 Wavelets edge detection, 63-64 signal and image processing, 3-5, 11-13, 52-53
Wavelet transforms, 12, 52-53, 194 defined, 195 image compression, 52-53, 194, 198-205 theory, 195-197 two-dimensional, 197 Wigner distribution function, 2-3, 9
X X-ray transmission tomography, image reconstruction, 159 Z
Zak transform, Gabor expansion, 29-30 Zero-tree root, 224 Zero-trees, 202-203