Academic Press is an imprint of Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
32 Jamestown Road, London NW1 7BY, UK
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA

First edition 2009

Copyright © 2009, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material. Notice No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-12-374769-3 ISSN: 1076-5670 For information on all Academic Press publications visit our Web site at elsevierdirect.com Printed in the United States of America 09 10 11 12 10 9 8 7 6 5 4 3 2 1
Contents

Preface
Contributors
Future Contributions

1. Surface Plasmon-Enhanced Photoemission and Electron Acceleration with Ultrashort Laser Pulses
   Péter Dombi
   1. Introduction
   2. Electron Emission and Photoacceleration in Surface Plasmon Fields
   3. Numerical Methods to Model Surface Plasmon-Enhanced Electron Acceleration
   4. Experimental Results
   5. The Role of the Carrier-Envelope Phase
   6. Conclusions
   Acknowledgments
   References

2. Did Physics Matter to the Pioneers of Microscopy?
   Brian J. Ford
   1. Introduction
   2. Setting the Scene
   3. Traditional Limits of Light Microscopy
   4. Origins of the Cell Theory
   5. Pioneers of Field Microscopy
   6. The Image of the Simple Microscope
   Acknowledgments
   References

3. Image Decomposition: Theory, Numerical Schemes, and Performance Evaluation
   Jérôme Gilles
   1. Introduction
   2. Preliminaries
   3. Structures + Textures Decomposition
   4. Structures + Textures + Noise Decomposition
   5. Performance Evaluation
   6. Conclusion
   Appendix A. Chambolle’s Nonlinear Projectors
   References

4. The Reverse Fuzzy Distance Transform and its Use when Studying the Shape of Macromolecules from Cryo-Electron Tomographic Data
   Stina Svensson
   1. Introduction
   2. Preliminaries
   3. Segmentation Using Region Growing by Means of the Reverse Fuzzy Distance Transform
   4. Cryo-Electron Tomography for Imaging of Individual Macromolecules
   5. From Electron Tomographic Structure to a Fuzzy Objects Representation
   6. Identifying the Subunits of a Macromolecule
   7. Identifying the Core of an Elongated Macromolecule
   8. Conclusions
   Acknowledgments
   References

5. Anchors of Morphological Operators and Algebraic Openings
   M. Van Droogenbroeck
   1. Introduction
   2. Morphological Anchors
   3. Anchors of Algebraic Openings
   4. Conclusions
   References

6. Temporal Filtering Technique Using Time Lenses for Optical Transmission Systems
   Dong Yang, Shiva Kumar, and Hao Wang
   1. Introduction
   2. Configuration of a Time-Lens–based Optical Signal Processing System
   3. Wavelength Division Demultiplexer
   4. Dispersion Compensator
   5. Optical Implementation of Orthogonal Frequency-Division Multiplexing Using Time Lenses
   6. Conclusions
   Acknowledgment
   Appendix A
   Appendix B
   Appendix C
   References

Contents of Volumes 151–157

Index
Preface
Ben Kazan, 1982

Before describing the contents of this volume, let me first say a few words about Benjamin Kazan, one of the Honorary Associate Editors of these Advances, whose death on January 14, 2009 was mentioned briefly in the preface to volume 157. He was editor of the Academic Press series, Advances in Image Pickup and Display from 1974 to 1983, after which the title was absorbed into Advances in Electronics and Electron Physics (the earlier title of these Advances). Ben Kazan, born in New York in 1917, received his B.S. degree from the California Institute of Technology, Pasadena, in 1938 and his M.A. from Columbia University, New York, in 1940. In 1961, he was awarded the D.Sc. degree by the Technical University of Munich. From 1940 to 1950, he was Section Head at the Signal Corps Engineering Laboratories, working on the development of new microwave storage and display tubes. For the next eight years, he was engaged in work on colour television tubes and solid-state intensifiers at the RCA Research Laboratories. From 1958 to 1962, he was head of the Solid-state Display Group at Hughes Research Laboratories, after which he moved to Electro-Optical Systems, an affiliate of the Xerox Corporation, again working on solid-state and electro-optical systems. From 1968 to 1974, he was employed at the IBM Thomas J. Watson Research Center. His last position was head of the Display Group at the Palo Alto Research Center of the Xerox Corporation. A dinner was held in his honour at Xerox, as he was the person holding the most patents there. In addition to his editorship of Advances in Image Pickup and Display, he was co-author of two books (notably, Electronic Image Storage with M. Knoll,
Academic Press, New York 1968) and was also editor of the Proceedings of the Society for Information Display. He was a Fellow of this Society as well as a member of the American Physical Society. In his leisure hours, he played the violin and enjoyed books about music and medical topics, biographies and many other subjects. He was a man of great kindness and generosity and will be greatly missed by his family and friends. On behalf of the publishers and myself, I extend our sincerest condolences to Gerda Mosse-Kazan, his widow.

The present volume contains six chapters on very different subjects, ranging from the early history of the microscope to mathematical morphology, time lenses, fuzzy sets and electron acceleration. We begin with a study of surface-plasmon-enhanced photoemission and electron acceleration using ultrashort laser pulses by P. Dombi. This is a very young subject and P. Dombi explains in detail what is involved and the physics of these complicated processes. This is followed by a fascinating article on the development of (light) microscopy by B.J. Ford, with the provocative title ‘Did physics matter to the pioneers of microscopy?’ He has chosen to work back to Hooke and van Leeuwenhoek, starting with the microscopes we know today. I do not need to do more than urge all readers of these Advances to plunge into this chapter, which is truly ‘unputdownable’! How can an image be decomposed into its various structural and textural components? This is the subject of the chapter by J. Gilles, who provides a very lucid account of recent progress in this area. The mathematical preliminaries, which cover all the newer kinds of wavelets – ridgelets, curvelets and contourlets – form an essential basis on which the remainder reposes. The fourth chapter, by S. Svensson, brings together two different topics: fuzzy distance transforms and electron tomography. 
Once again, the opening sections provide a solid mathematical basis for the application envisaged and I am certain that this full introductory account to these techniques will be heavily used. The next chapter will appeal to mathematical morphologists: here, M. van Droogenbroeck describes the notion of anchors of morphological operators and algebraic openings. This concept is placed in context and the chapter forms a self-contained account of this particular aspect of mathematical morphology. The volume ends with another new subject, time lenses for optical transmission systems, by D. Yang, S. Kumar and H. Wang. Spatial imaging has a perfect analogy in the time domain and this is exploited for temporal filtering. The authors introduce us to the subject before going more deeply into the possible ways of pursuing this analogy. As always, I thank the authors for all the trouble they have taken to make their work accessible to a wide readership. Peter W. Hawkes
Contributors

Péter Dombi
Research Institute for Solid-State Physics and Optics, Konkoly-Thege M. út, Budapest, Hungary

Brian J. Ford
Gonville & Caius College, University of Cambridge, UK

Jérôme Gilles
DGA/CEP - EORD Department, 16bis rue Prieur de la Côte d’Or, Arcueil, France

Stina Svensson
Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden

M. Van Droogenbroeck
University of Liège, Department of Electrical Engineering and Computer Science, Montefiore, Sart Tilman, Liège, Belgium

Dong Yang, Shiva Kumar, and Hao Wang
Department of Electrical and Computer Engineering, McMaster University, Canada
Future Contributions

S. Ando
    Gradient operators and edge and corner detection
K. Asakura
    Energy-filtering x-ray PEEM
W. Bacsa
    Optical interference near surfaces, sub-wavelength microscopy and spectroscopic sensors
C. Beeli
    Structure and microscopy of quasicrystals
C. Bobisch and R. Möller
    Ballistic electron microscopy
G. Borgefors
    Distance transforms
Z. Bouchal
    Non-diffracting optical beams
A. Buchau
    Boundary element or integral equation methods for static and time-dependent problems
B. Buchberger
    Gröbner bases
E. Cosgriff, P. D. Nellist, L. J. Allen, A. J. d’Alfonso, S. D. Findlay, and A. I. Kirkland
    Three-dimensional imaging using aberration-corrected scanning confocal electron microscopy
T. Cremer
    Neutron microscopy
A. V. Crewe (Special volume on STEM, 159)
    Early STEM
A. Engel (Special volume on STEM, 159)
    STEM in the life sciences
A. N. Evans
    Area morphology scale-spaces for colour images
A. X. Falcão
    The image foresting transform
R. G. Forbes
    Liquid metal ion sources
C. Fredembach
    Eigenregions for image classification
J. Giesen, Z. Baranczuk, K. Simon, and P. Zolliker
    Gamut mapping
A. Gölzhäuser
    Recent advances in electron holography with point sources
M. Haschke
    Micro-XRF excitation in the scanning electron microscope
P. W. Hawkes (Special volume on STEM, 159)
    The Siemens and AEI STEMs
L. Hermi, M. A. Khabou, and M. B. H. Rhouma
    Shape recognition based on eigenvalues of the Laplacian
M. I. Herrera
    The development of electron microscopy in Spain
H. Inada and H. Kakibayashi (Special volume on STEM, 159)
    Development of cold field-emission STEM at Hitachi
M. S. Isaacson (Special volume on STEM, 159)
    Early STEM development
J. Isenberg
    Imaging IR-techniques for the characterization of solar cells
K. Ishizuka
    Contrast transfer and crystal images
A. Jacobo
    Intracavity type II second-harmonic generation for image processing
B. Jouffrey (Special volume on STEM, 159)
    The Toulouse high-voltage STEM project
L. Kipp
    Photon sieves
G. Kögel
    Positron microscopy
T. Kohashi
    Spin-polarized scanning electron microscopy
O. L. Krivanek (Special volume on STEM, 159)
    Aberration-corrected STEM
R. Leitgeb
    Fourier domain and time domain optical coherence tomography
B. Lencová
    Modern developments in electron optical calculations
H. Lichte
    New developments in electron holography
M. Mankos
    High-throughput LEEM
M. Matsuya
    Calculation of aberration coefficients using Lie algebra
S. McVitie
    Microscopy of magnetic specimens
J. Mendiola Santibañez, I. R. Terol-Villalobos, and I. M. Santillán-Méndez (Vol. 160)
    Connected morphological contrast mappings
I. Moreno Soriano and C. Ferreira
    Fractional Fourier transforms and geometrical optics
M. A. O’Keefe
    Electron image simulation
D. Oulton and H. Owens
    Colorimetric imaging
N. Papamarkos and A. Kesidis
    The inverse Hough transform
K. S. Pedersen, A. Lee, and M. Nielsen
    The scale-space properties of natural images
E. Rau
    Energy analysers for electron microscopes
P. Rudenberg (Vol. 160)
    The work of R. Rüdenberg
R. Shimizu, T. Ikuta, and Y. Takai
    Defocus image modulation processing in real time
S. Shirai
    CRT gun design methods
A. S. Skapin and P. Ropret (Vol. 160)
    The use of optical and scanning electron microscopy in the study of ancient pigments
K. C. A. Smith (Special volume on STEM, 159)
    STEM in Cambridge
T. Soma
    Focus-deflection systems and their applications
P. Sussner and M. E. Valle
    Fuzzy morphological associative memories
L. Swanson and G. A. Schwind (Special volume on STEM, 159)
    Cold field-emission sources
I. Talmon
    Study of complex fluids by transmission electron microscopy
M. E. Testorf and M. Fiddy
    Imaging from scattered electromagnetic fields, investigations into an unsolved problem
N. M. Towghi
    l_p norm optimal filters
E. Twerdowski
    Defocused acoustic transmission microscopy
Y. Uchikawa
    Electron gun optics
K. Vaeth and G. Rajeswaran
    Organic light-emitting arrays
V. Velisavljevic and M. Vetterli (Vol. 160)
    Space-frequency quantization using directionlets
S. von Harrach (Special volume on STEM, 159)
    STEM development at Vacuum Generators, the later years
J. Wall, M. N. Simon, and J. F. Hainfeld (Special volume on STEM, 159)
    History of the STEM at Brookhaven National Laboratory
I. R. Wardell and P. Bovey (Special volume on STEM, 159)
    STEM development at Vacuum Generators, the early years
M. H. F. Wilkinson and G. Ouzounis
    Second generation connectivity and attribute filters
P. Ye
    Harmonic holography
Chapter 1

Surface Plasmon-Enhanced Photoemission and Electron Acceleration with Ultrashort Laser Pulses

Péter Dombi

Research Institute for Solid-State Physics and Optics, Konkoly-Thege M. út, Budapest, Hungary

Advances in Imaging and Electron Physics, Volume 158, ISSN 1076-5670, DOI: 10.1016/S1076-5670(09)00006-8. Copyright © 2009 Elsevier Inc. All rights reserved.

Contents

1. Introduction
2. Electron Emission and Photoacceleration in Surface Plasmon Fields
   2.1. Emission Mechanisms
   2.2. Emission Currents
   2.3. Electron Acceleration in Evanescent Surface Plasmon Fields
3. Numerical Methods to Model Surface Plasmon-Enhanced Electron Acceleration
   3.1. Elements of the Model
   3.2. Model Results
4. Experimental Results
   4.1. Surface Plasmon-Enhanced Photoemission
   4.2. Generation of High-Energy Electrons
   4.3. Time-Resolved Studies of the Emission
5. The Role of the Carrier-Envelope Phase
   5.1. Light-Matter Interaction with Few-Cycle Laser Pulses, Carrier-Envelope Phase Dependence
   5.2. Carrier-Envelope Phase-Controlled Electron Acceleration
6. Conclusions
Acknowledgments
References
1. INTRODUCTION

It was shown recently that ultrashort, intense laser pulses are particularly well suited for the generation of electron and other charged-particle beams in both the relativistic and the nonrelativistic intensity regimes of laser-solid interactions (Irvine, Dechant, & Elezzabi, 2004; Leemans et al., 2006, and references therein). One method to generate well-behaved, optically accelerated electron beams with relatively low-intensity light pulses is surface plasmon polariton (SPP)-enhanced electron acceleration. Because the SPP field is intrinsically enhanced with respect to the field of the SPP-generating laser pulse, substantial field strengths can be created in the vicinity of metal surfaces with simple, high-repetition-rate, unamplified laser sources. This results in both SPP-enhanced electron photoemission and electron acceleration in the SPP field.

SPP-enhanced photoemission was demonstrated in several experimental publications. Typical photocurrent enhancement values ranged from ×50 to ×3500, achieved solely by SPP excitation (Tsang, Srinivasan-Rao, & Fischer, 1991). In addition to SPP-enhanced photoemission, the electrons in the vicinity of the metal surface can undergo significant cycle-by-cycle acceleration in the evanescent plasmonic field. This phenomenon, termed SPP-enhanced electron acceleration, was discovered recently and was experimentally demonstrated to be suitable for the production of relatively high-energy, quasi-monoenergetic electron beams using simple femtosecond lasers (Irvine et al., 2004; Kupersztych, Monchicourt, & Raynaud, 2001; Zawadzka, Jaroszynski, Carey, & Wynne, 2001). In this scheme, the evanescent electric field of SPPs accelerates photo-emitted electrons away from the surface. This process can be so efficient that multi-keV kinetic energy levels can be reached without external direct current (DC) fields (Irvine and Elezzabi, 2005; Irvine et al., 2004). 
This method seems particularly advantageous for the generation of well-behaved femtosecond electron beams that can later be used for infrared pump/electron probe methods, such as ultrafast electron diffraction or microscopy (Lobastov, Srinivasan, & Zewail, 2005; Siwick, Dwyer, Jordan, & Miller, 2003). These time-resolved methods using electron beams can gain importance in the future by enabling material characterization with high spatial and high temporal resolution at the same time. They will become particularly interesting if the attosecond temporal resolution domain comes within reach of electron diffraction and microscopy methods, as suggested recently (Fill, Veisz, Apolonski, & Krausz, 2006; Stockman, Kling, Krausz, & Kleineberg, 2007; Varró and Farkas, 2008). Moreover, studying the spectral properties of femtosecond electron beams has the potential to reveal ultrafast excitation dynamics in solids and to provide the basis for a single-shot measurement tool of the carrier-envelope (CE) phase (or the optical waveform) of ultrashort laser
pulses, as we suggested recently (Dombi and Rácz, 2008a; Irvine, Dombi, Farkas, & Elezzabi, 2006). Other waveform-sensitive laser-solid interactions that have already been demonstrated (Apolonski et al., 2004; Dombi et al., 2004; Fortier et al., 2004; Mücke et al., 2004) suffer from low experimental contrast; therefore, it is necessary to look for higher-contrast tools for direct phase measurement. Motivated by these possibilities, it was shown numerically (and also partly experimentally) that surface plasmonic electron sources can be controlled with ultrashort laser pulses so that they deliver highly directional, monoenergetic electron beams readily synchronized with the pump pulse (Dombi and Rácz, 2008a; Irvine et al., 2004, 2006). We developed a simple semiclassical approach for the simulation of this process, analogous to the three-step model of high harmonic generation (Corkum, 1993; Kulander, Schafer, & Krause, 1993). In this chapter, we review the basic elements of this model and show that it delivers the same results as a much more complicated treatment of the problem based on the rigorous, but computationally time-consuming, solution of Maxwell’s equations. Results gained with this latter method showed very good agreement with experimental electron spectra (Irvine, 2006). We also provide new insight into the spatiotemporal dynamics of SPP-enhanced electron acceleration, which is also important if one intends to realize adaptive emission control methods (Aeschlimann et al., 2007).
2. ELECTRON EMISSION AND PHOTOACCELERATION IN SURFACE PLASMON FIELDS

2.1. Emission Mechanisms

Laser-induced electron emission processes of both atoms and solids are determined by the intensity of the exciting laser pulse. At low intensities, where the field of the laser pulse is not sufficient to distort the binding potential significantly, multiphoton-induced processes dominate at visible wavelengths. These nonlinear processes can be described by a perturbative approach. Light-matter interaction is predominantly nonadiabatic in this regime and is governed by the evolution of the amplitude of the laser field, or, in other words, by the intensity envelope of the laser pulse.

Tunneling or field emission takes over at higher intensities. In this emission regime the potential is distorted by the laser field to such an extent that the electron can tunnel through (or, at even higher intensities, pass above) the modulated potential barrier, the width of which is determined by the laser field strength. The interaction is determined by the instantaneous field strength of the laser pulse; the photocurrent generated in this manner follows the field evolution adiabatically. This interaction type is also referred to as the strong-field regime of nonlinear optics. The difference between multiphoton-induced and field emission is illustrated in Figure 1.

FIGURE 1 Schematic illustration of photo-induced electron emission processes in different laser-intensity regimes when the work function of the metal is more than twice the photon energy (typical for most metals and for near-infrared wavelengths). (a) Multiphoton-induced photoemission. (b) Tunneling or field emission through the potential barrier, the width of which is modulated by the laser field. [Labels in both panels: vacuum level and Fermi level, separated by typically ~5 eV.]

There are, of course, intermediate intensities where the contributions of multiphoton and field emission processes become comparable. This case is termed non-adiabatic tunnel ionization and its theoretical treatment is considerably more complicated (Yudin and Ivanov, 2001). It should be mentioned that at significantly higher intensities characteristically different plasma and relativistic effects can also contribute to the light-matter interaction process. This regime, however, is not discussed here.

It follows from simple considerations that the average oscillation energy of an electron in the field of an infinite electromagnetic plane wave is
$$U_p = \frac{e^2 E_l^2}{4 m \omega^2}, \tag{1}$$
where the electron charge and rest mass are denoted by $e$ and $m$, respectively, $\omega$ is the angular frequency, and $E_l$ is the field strength of the laser field. This quantity is called the ponderomotive potential in the literature. The analysis by Keldysh (1965) yielded the perturbation parameter $\gamma$, which proved to be an efficient scale parameter to describe bound-free transitions induced by laser fields. Its value is given by
$$\gamma^2 = \frac{W}{2U_p} = \left(\frac{\omega\sqrt{2mW}}{eE_l}\right)^2, \tag{2}$$
where $W$ is the binding energy of the most weakly bound electron in an atom (the ionization potential) or the work function of the metal. It can be shown that for $\gamma \gg 1$ multiphoton-induced processes dominate, whereas the condition $\gamma \ll 1$ indicates the dominance of field emission. The intensity corresponding to $\gamma \sim 1$ marks the transition regime between multiphoton-induced and field emission (Farkas, Chin, Galarneau, & Yergeau, 1983; Tóth, Farkas, & Vodopyanov, 1991), and this parameter region is sometimes termed the non-adiabatic tunnel ionization regime (Yudin and Ivanov, 2001). It can also be shown that $\gamma = \tau_t \omega$ holds, where $\tau_t$ is the Büttiker–Landauer traversal time for tunneling (Büttiker and Landauer, 1982).
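As a rough numerical illustration of Eqs. (1) and (2), the following sketch evaluates the ponderomotive potential and the Keldysh parameter for one set of inputs. The inputs (800 nm wavelength, a 5 eV work function, an intensity of 10^13 W/cm^2) are illustrative assumptions, and the conversion from intensity to peak field assumes a linearly polarized plane wave in vacuum; the function names are hypothetical.

```python
import math

# Physical constants in SI units
E_CHARGE = 1.602176634e-19   # elementary charge (C)
E_MASS = 9.1093837015e-31    # electron rest mass (kg)
C_LIGHT = 2.99792458e8       # speed of light in vacuum (m/s)
EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)


def peak_field(intensity_w_cm2):
    """Peak field (V/m) of a plane wave with the given cycle-averaged intensity."""
    intensity_si = intensity_w_cm2 * 1.0e4  # W/cm^2 -> W/m^2
    return math.sqrt(2.0 * intensity_si / (C_LIGHT * EPS0))


def ponderomotive_potential_ev(field, wavelength_m):
    """Eq. (1): U_p = e^2 E_l^2 / (4 m omega^2), returned in eV."""
    omega = 2.0 * math.pi * C_LIGHT / wavelength_m
    u_p_joule = (E_CHARGE * field) ** 2 / (4.0 * E_MASS * omega ** 2)
    return u_p_joule / E_CHARGE


def keldysh_gamma(field, wavelength_m, work_function_ev):
    """Eq. (2): gamma = omega * sqrt(2 m W) / (e E_l)."""
    omega = 2.0 * math.pi * C_LIGHT / wavelength_m
    w_joule = work_function_ev * E_CHARGE
    return omega * math.sqrt(2.0 * E_MASS * w_joule) / (E_CHARGE * field)


# Illustrative (assumed) inputs: 800 nm light, W = 5 eV, 1e13 W/cm^2
field = peak_field(1.0e13)
u_p = ponderomotive_potential_ev(field, 800e-9)
gamma = keldysh_gamma(field, 800e-9, 5.0)
print(f"E_l = {field:.3e} V/m, U_p = {u_p:.3f} eV, gamma = {gamma:.2f}")
```

For these assumed inputs the Keldysh parameter comes out at roughly 2, that is, of order unity, which places such an intensity in the transition region between the two emission regimes.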
2.2. Emission Currents

2.2.1. Multiphoton-Induced Emission

As suggested by the previous considerations, the time dependence of the electron emission currents can be described by different formulas in the multiphoton and the field emission cases. During multiphoton-induced emission the energy of $n$ photons is converted into overcoming the work function of the metal and into the kinetic energy of the freed electron: $n\hbar\omega = E_{\mathrm{kin}} + W$. In this case, the probability of electron generation is proportional to the $n$th power of the intensity of the laser field:

$$j(t) \propto I^n(t). \tag{3}$$
This formula yields a very good approximation of the temporal emission profile, provided that no finite-lifetime intermediate states exist. For example, the full quantum mechanical description of the multiphoton-induced photoemission process recently yielded a very similar dependence (Lemell, Tong, Krausz, & Burgdörfer, 2003), although with a somewhat asymmetric temporal profile. Thus, in this case it is the momentary amplitude of the field oscillation that determines the emission probability. As a result of formula (3), if we take, for example, a Gaussian laser pulse profile $I(t)$, the electron emission curve $j(t)$ has a full width at half maximum (FWHM) that is $\sqrt{n}$ times shorter than the FWHM of the original $I(t)$ curve (Figure 2).

FIGURE 2 Examples of electron emission temporal profiles for a few-cycle laser pulse with a duration of 3.5 fs (intensity full width at half maximum (FWHM)). The dotted curve depicts the field envelope evolution. The dashed curve is the photocurrent temporal distribution for a three-photon-induced photoemission. The solid curve is the photocurrent profile for tunneling electron emission from the surface, determined by the Fowler–Nordheim equation (see text for further details). [Plot axes: time (fs), from −8 to 8; photocurrent (arb. u.), from 0 to 1.]

2.2.2. Field or Tunneling Emission

The case of field or tunneling emission can be described by more complex equations. Depending on the model used, several tunneling formulas have been proposed. The one used most generally for metals, both for static and for oscillating laser fields, is the so-called Fowler–Nordheim equation (Binh, Garcia, & Purcell, 1996; Hommelhoff, Sortais, Aghajani-Talesh, & Kasevich, 2006), where the electric field dependence of the tunneling current is described by

$$j(t) \propto \frac{e^3 E_l(t)^2}{8\pi h W t^2(w)} \exp\left(-\frac{8\pi\sqrt{2m}\,W^{3/2}}{3he\,|E_l(t)|}\, v(w)\right), \tag{4}$$
where $E_l(t)$ denotes the laser field strength, $e$ and $m$ the electron charge and mass, respectively, and $h$ is the Planck constant. $W$ stands for the work function of the metal. $v(w)$ is a slowly varying function taking into account the image force of the tunneling electron, with $0.4 < v(w) < 0.8$, and the value of the function $t(w)$ can be taken as $t(w) \approx 1$ for tunneling emission, with $w = e^{3/2}\sqrt{E_l/4\pi\varepsilon_0}/W$. The characteristic form of the $j(t)$ curve for this case is shown in Figure 2. The electron emission is concentrated mainly in the vicinity of those instants when the field strength reaches its maximum value. Note that the experimental investigation of pure field emission is very limited for metals (at visible wavelengths), since the damage threshold of bulk metal surfaces and thin films lies around an intensity of $10^{13}$ W/cm$^2$, which is very close to the intensity value where the $\gamma \sim 1$ condition is met. A practical approach to circumvent this problem is needed in order to investigate these processes experimentally. The exploitation of far-infrared sources proved suitable for this purpose, since there the $\gamma \sim 1$ condition can be met at much lower intensities (Farkas et al., 1983). In addition, plasmonic field enhancement can be exploited in the visible spectral region so that $\gamma \ll 1$ can be achieved for metal films without damaging the surface. This latter method is also more advantageous because of the lack of ultrashort laser sources in the far-infrared domain. The phenomenon of plasmonic field enhancement is described in detail in the next section.
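The two temporal emission profiles discussed in this section can be compared with a short numerical sketch. It evaluates formula (3) for a Gaussian pulse and checks the sqrt(n) shortening of the FWHM, and it evaluates the Fowler–Nordheim expression of Eq. (4) for an oscillating field. Several ingredients are assumptions made only for illustration: v(w) and t(w) are held constant (0.6 and 1), the peak field of 10^10 V/m and the 5 eV work function are arbitrary, Eq. (4) is applied with |E_l| on both half-cycles exactly as written, and all function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
E_MASS = 9.1093837015e-31    # electron rest mass (kg)
H_PLANCK = 6.62607015e-34    # Planck constant (J s)
C_LIGHT = 2.99792458e8       # speed of light in vacuum (m/s)


def gaussian_intensity(t, fwhm):
    """Normalized Gaussian intensity envelope with the given intensity FWHM."""
    return math.exp(-4.0 * math.log(2.0) * (t / fwhm) ** 2)


def multiphoton_current(t, fwhm, n):
    """Formula (3): j(t) proportional to I(t)^n."""
    return gaussian_intensity(t, fwhm) ** n


def fowler_nordheim_current(field, work_function_ev, v=0.6, t_w=1.0):
    """Eq. (4) with constant v(w) and t(w) (a simplifying assumption)."""
    if field == 0.0:
        return 0.0
    w = work_function_ev * E_CHARGE
    prefactor = E_CHARGE ** 3 * field ** 2 / (8.0 * math.pi * H_PLANCK * w * t_w ** 2)
    exponent = (8.0 * math.pi * math.sqrt(2.0 * E_MASS) * w ** 1.5 * v
                / (3.0 * H_PLANCK * E_CHARGE * abs(field)))
    return prefactor * math.exp(-exponent)


def laser_field(t, fwhm, wavelength_m, peak, cep=0.0):
    """Carrier wave times the field envelope (square root of the intensity envelope)."""
    omega = 2.0 * math.pi * C_LIGHT / wavelength_m
    return peak * math.sqrt(gaussian_intensity(t, fwhm)) * math.cos(omega * t + cep)


def numerical_fwhm(times, values):
    """FWHM of a single-peaked sampled curve (must fall below half maximum in the window)."""
    half = 0.5 * max(values)
    i_peak = values.index(max(values))
    i = i_peak
    while values[i] > half:   # walk left to the first sample at or below half maximum
        i -= 1
    t_left = times[i] + (half - values[i]) * (times[i + 1] - times[i]) / (values[i + 1] - values[i])
    j = i_peak
    while values[j] > half:   # walk right likewise
        j += 1
    t_right = times[j - 1] + (half - values[j - 1]) * (times[j] - times[j - 1]) / (values[j] - values[j - 1])
    return t_right - t_left


fwhm = 3.5e-15  # intensity FWHM of the pulse (s), as in Figure 2
times = [(-10.0 + 0.005 * k) * 1e-15 for k in range(4001)]
intensity = [gaussian_intensity(t, fwhm) for t in times]
three_photon = [multiphoton_current(t, fwhm, 3) for t in times]
tunneling = [fowler_nordheim_current(laser_field(t, fwhm, 800e-9, 1.0e10), 5.0) for t in times]
ratio = numerical_fwhm(times, three_photon) / numerical_fwhm(times, intensity)
print(f"FWHM(j_3) / FWHM(I) = {ratio:.4f}, 1/sqrt(3) = {1.0 / math.sqrt(3.0):.4f}")
```

The `tunneling` list can be plotted against `times` to see emission bursts concentrated around the field extrema, in contrast to the smooth three-photon curve that follows the envelope.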
2.3. Electron Acceleration in Evanescent Surface Plasmon Fields

After photoemission has taken place from the metal surface, the electrons travel in vacuum dressed by the SPP field. This situation can be approximated by solving the classical equations of motion for the electron in the electromagnetic field of the surface plasmons. This concept is somewhat similar to the three-step model of high harmonic generation on atoms, where the electron is considered a free particle after tunneling photoionization has been induced by the electric field of the laser pulse (Corkum, 1993; Kulander et al., 1993). We adapted a similar model to the SPP environment where, instead of a single atom, a solid surface is involved that determines the conditions for recollision. Because of the presence of the surface, many electrons recollide or cannot even accelerate because the Lorentz force points toward the surface at the instant of photoemission, or, in other words, at the instant of the “birth” of the electron in vacuum. This latter situation is also modeled by recombination; therefore, these electrons must be disregarded when the properties of the electron bunch are determined. The rest of the electrons experience cycle-by-cycle kinetic energy gain and become accelerated along the electric field gradient. The mechanism is the same if the envelope of the laser pulse is made up of only a few optical cycles; in that case, however, the final kinetic energy is not built up from a large number of incremental, cycle-by-cycle energy gains as it is for long pulses. Because the electrons spend less time in the field of few-cycle SPPs, the expected final kinetic energy is lower. These intuitive predictions are confirmed numerically in the upcoming sections.
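The classical picture described above can be sketched in a strongly simplified, one-dimensional form: an electron is born at rest at the surface and moves only along the surface normal in an exponentially decaying, oscillating field. This is not the model of the chapter, only a minimal illustration under stated assumptions: the field component parallel to the surface and the magnetic force are neglected, the temporal envelope is Gaussian, the peak field (10^10 V/m), the 5 fs pulse length and the 247 nm decay length are illustrative values, and electrons that return to the surface are simply discarded. All names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
E_MASS = 9.1093837015e-31    # electron rest mass (kg)
C_LIGHT = 2.99792458e8       # speed of light in vacuum (m/s)


def surface_normal_field(y, t, peak_field, omega, decay_const, pulse_fwhm):
    """Assumed field normal to the surface: Gaussian pulse, exponential decay in y."""
    envelope = math.exp(-2.0 * math.log(2.0) * (t / pulse_fwhm) ** 2)
    return peak_field * envelope * math.cos(omega * t) * math.exp(-decay_const * y)


def final_energy_ev(t_birth, peak_field=1.0e10, wavelength_m=800e-9,
                    decay_length_m=247e-9, pulse_fwhm=5e-15,
                    t_end=40e-15, dt=2e-18):
    """Integrate m*y'' = -e*E_y(y, t) for an electron born at rest at y = 0.

    Returns the final kinetic energy in eV, or None if the electron returns
    to the surface (y < 0) and is therefore discarded.
    """
    omega = 2.0 * math.pi * C_LIGHT / wavelength_m
    alpha = 1.0 / decay_length_m

    def accel(y, t):
        field = surface_normal_field(y, t, peak_field, omega, alpha, pulse_fwhm)
        return -E_CHARGE * field / E_MASS

    y, v, t = 0.0, 0.0, t_birth
    a = accel(y, t)
    while t < t_end:
        # velocity-Verlet step (the force depends on position and time only)
        y += v * dt + 0.5 * a * dt * dt
        if y < 0.0:
            return None
        t += dt
        a_new = accel(y, t)
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return 0.5 * E_MASS * v * v / E_CHARGE


period = 800e-9 / C_LIGHT
pushed_in = final_energy_ev(t_birth=0.0)            # force points into the metal at birth
escaping = final_energy_ev(t_birth=0.375 * period)  # force points outward and is still growing
print(pushed_in, escaping)
```

Scanning `t_birth` over the pulse and discarding the `None` results gives a feel for which birth instants contribute to the electron bunch; a fuller treatment would also include the field component along the surface.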
3. NUMERICAL METHODS TO MODEL SURFACE PLASMON-ENHANCED ELECTRON ACCELERATION

3.1. Elements of the Model

As discussed previously, SPP-enhanced electron acceleration involves distinct physical processes: (i) the coupling of the incident light and surface plasmonic electromagnetic fields, (ii) the photoinjection of the electrons into vacuum from the metal layer, and (iii) the subsequent acceleration of free electrons by the decaying SPP field on the vacuum side of the surface. The elements of the model that we used correspond to these individual steps of the process; therefore, they are presented in separate sections below.

3.1.1. Solution of the Field

In order to determine SPP fields accurately, Maxwell’s equations can be solved with the so-called finite difference time-domain (FDTD) method. This approach was used for the Kretschmann–Raether SPP coupling configuration in previous studies (Irvine and Elezzabi, 2006; Irvine et al., 2004). In this case, the components of the electric field, the electric displacement, and the magnetic intensity vectors are solved on a grid placed over the given geometry. Since the FDTD method provides the complete numerical solution of Maxwell’s equations, it is computationally rather intensive, and more complex geometries cannot be handled with simple personal computers because of the processor time required. Therefore, we proposed analytic formulas to describe SPP fields (Dombi and Rácz, 2008a). Based on the well-known fact that these fields decay exponentially with distance from the surface (Raether, 1988), we took an analytic expression for the SPP field components on the vacuum side of the metal layer in the form of

$$E_y^{\mathrm{SPP}}(x, y, t) = \eta E_0 E_{\mathrm{env}}(x, t) \cos\left(k_{\mathrm{SPP}} x - \omega t + \varphi_0\right) \exp(-\alpha y), \tag{5a}$$

$$E_x^{\mathrm{SPP}}(x, y, t) = \eta a E_0 E_{\mathrm{env}}(x, t) \cos\left(k_{\mathrm{SPP}} x - \omega t - \frac{\pi}{2} + \varphi_0\right) \exp(-\alpha y), \tag{5b}$$

where $E_0$ is the field amplitude, $E_{\mathrm{env}}(x, t)$ is an envelope function determined by the temporal and spatial beam profiles of the incoming Gaussian pulse, $\eta$ is the field enhancement factor resulting from plasmon coupling (Raether, 1988), $k_{\mathrm{SPP}}$ is the SPP wave vector, $\omega$ is the carrier frequency, $\varphi_0$ is the CE phase of the laser pulse, and $\alpha$ is the decay constant of the plasmonic field in vacuum (the inverse of the decay length), given by

$$\alpha = \sqrt{k_{\mathrm{SPP}}^2 - \frac{\omega^2}{c^2}} \tag{6}$$
(Irvine and Elezzabi, 2006). For laser pulses with a central wavelength of 800 nm, the evanescent decay parameter α = 247 nm−1 follows from Eq. (6). We used the value of a = 0.3 according to the notion that the amplitudes of the x- and y-components of the plasmonic field have this ratio according to the numerical solution of Maxwell’s equations (Irvine and Elezzabi, 2006). It can be concluded that the field given by Eqs. (5a) and (5b) approximates the exact SPP field with very good accuracy by comparing our results to
those of Irvine and Elezzabi (2006). The distribution of the field amplitude in the vicinity of the surface is shown in Figure 3, which agrees very well with the above-mentioned calculation. We also succeeded in reproducing the vector representation of the field depicted in Figure 3 of Irvine and Elezzabi (2006) with this method. The vector field calculated with our model is depicted in the inset of Figure 3.

FIGURE 3 Illustration of the setup for the generation of electron beams by surface plasmon-enhanced electron acceleration with the distribution of the electric field amplitude on the vacuum side of the surface, field vectors (inset) and electron trajectories. Axes: x (micron), y (micron). For further details, see text. (Source: Dombi and Rácz (2008a).)

3.1.2. Electron Emission Channels and Currents Induced by Plasmonic Fields

After the determination of the field, a point array can be placed along the prism surface and the spatial and temporal distribution of the photoemission (induced by the SPP field) along the surface can be examined, assuming that field emission takes place at higher intensities. To this end, we applied the Fowler–Nordheim equation routinely used in studies involving electron emission from metal nanotips (Hommelhoff, Kealhofer, & Kasevich, 2006; Hommelhoff, Sortais et al., 2006; Ropers, Solli, Schulz, Lienau, & Elsaesser, 2007). This equation describes the instantaneous tunneling current; its use is justified because plasmonic fields carry substantial field enhancement factors (up to ×100) compared to the generating field. In this way, one obtains a spatially and temporally resolved map of tunneling probabilities determined by the SPP field. The temporal distribution, for example, can be seen in Figure 2. Similar probability distribution curves also result for the spatial coordinates. According to these probabilities, each photoemitted and SPP-accelerated electron that is examined can be assigned a corresponding weight. This weight must be used to accurately determine the final kinetic energy spectrum of the electron beam.
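The weighting step of Section 3.1.2 can be illustrated with a minimal sketch. It evaluates the y-component of the field, Eq. (5a), at the surface (y = 0) on a grid of emission points and instants and converts it to normalized tunneling weights. Several ingredients are assumptions made only for this illustration and are not taken from the chapter: the elementary Fowler–Nordheim form without image-charge correction (the chapter's Eq. (4) may differ), a 5.1 eV work function, a Gaussian envelope in both time and space, the convention that emission occurs only while E_y < 0, and all function and parameter names.

```python
import math

# Fowler-Nordheim constants in practical units (elementary form,
# no image-charge correction; an assumption of this sketch).
FN_A = 1.541434e-6   # A eV V^-2
FN_B = 6.830890      # eV^-3/2 V nm^-1
C_NM_PER_FS = 299.792458  # speed of light in nm/fs


def surface_field(x_nm, t_fs, peak_v_per_nm=58.0, wavelength_nm=800.0,
                  decay_length_nm=247.0, tau_fs=5.0, spot_fwhm_nm=4000.0,
                  phi0=0.0):
    """E_y of Eq. (5a) at y = 0, in V/nm (58 V/nm = 5.8e10 V/m).

    Assumption: E_env(x, t) is a product of Gaussians whose intensity
    FWHM values are tau_fs (time) and spot_fwhm_nm (space).
    """
    k0 = 2.0 * math.pi / wavelength_nm
    omega = k0 * C_NM_PER_FS                      # rad/fs
    alpha = 1.0 / decay_length_nm                 # decay constant, 1/nm
    k_spp = math.sqrt(k0 ** 2 + alpha ** 2)       # from alpha^2 = k_SPP^2 - k0^2
    ln2 = math.log(2.0)
    env = math.exp(-2.0 * ln2 * (t_fs / tau_fs) ** 2)
    env *= math.exp(-2.0 * ln2 * (x_nm / spot_fwhm_nm) ** 2)
    return peak_v_per_nm * env * math.cos(k_spp * x_nm - omega * t_fs + phi0)


def fn_current_density(field_v_per_nm, work_function_ev=5.1):
    """Elementary Fowler-Nordheim current density (A/nm^2); zero for field <= 0."""
    if field_v_per_nm <= 0.0:
        return 0.0
    return (FN_A * field_v_per_nm ** 2 / work_function_ev
            * math.exp(-FN_B * work_function_ev ** 1.5 / field_v_per_nm))


def emission_weights(xs_nm, ts_fs, **field_kwargs):
    """Normalized weights w[i][j] for emission at xs_nm[i] and instant ts_fs[j]."""
    raw = []
    for x in xs_nm:
        row = []
        for t in ts_fs:
            ey = surface_field(x, t, **field_kwargs)
            # Assumption: electrons leave the metal only while E_y < 0, i.e.
            # while the force -e*E_y points into the vacuum half-space y > 0.
            row.append(fn_current_density(-ey))
        raw.append(row)
    total = sum(sum(row) for row in raw)
    if total <= 0.0:
        return raw
    return [[w / total for w in row] for row in raw]
```

Each weight can then be attached to the trajectory launched from the same point and instant, so that trajectories born near the field extrema dominate the final spectrum.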
FIGURE 4 Two selected electron trajectories for a 5 fs-long SPP exciting laser pulse (a) and a 30 fs-long pulse (b) illustrating the difference between the few-cycle and the multicycle case. The central wavelength of the laser pulse is 800 nm in both cases. Panel titles: (a) τ0 = 5 fs, (b) τ0 = 30 fs. Axes: x (nm), y (nm), t (fs). (Courtesy of P. Rácz.)
3.1.3. Particle Acceleration in the Evanescent Field

As a final step in the numerical model, the vacuum trajectory of each photoemitted electron in the plasmonic field is investigated for each point in the above-mentioned array and for several emission instants. This is done by numerically solving the free-electron equations of motion in the SPP field given by Eqs. (5a) and (5b). Some representative trajectories are shown in Figure 3 (gray curves). Two selected trajectories for 5-fs-long exciting pulses (FWHM) as well as for 30-fs-long pulses are depicted in Figure 4, illustrating the difference between the acceleration process in the few-cycle and in the multicycle case. In some cases, the electron trajectories involve a recollision with the metal surface; when this happens, no electron emission is assumed. In all other cases, the final kinetic energies and directions of the photoemitted and photoaccelerated electrons are placed in a matrix for each emission point in space and for each emission instant.

Figure 5 illustrates the temporal distribution of the final kinetic energies as a function of the electron “birth” instant for a maximum plasmonic field strength of $5.8 \times 10^{10}$ V/m and for electrons emitted from the central spot of the illuminated surface for a 5-fs-long exciting pulse with 800-nm central wavelength. Figure 5 demonstrates similarities to the corresponding kinetic energy distributions of atomic electrons after being accelerated by the ionizing laser field (Reider, 2004). As opposed to that case, it is important to note here that only roughly one-fourth of all emission instants contribute to the acceleration process. This is due to the symmetry breaking of the metal surface and the associated electron recollision and reabsorption processes, as discussed in Section 2.3. Macroscopic emission distributions and electron spectra can be calculated
after the assessment of each trajectory by integrating the above-described emission maps along the spatial and/or temporal coordinates.

FIGURE 5 Surface plasmon-accelerated electron energy as a function of the birth instant of the electrons (scatterplots). The electric field of the plasmon generating 5-fs laser pulse (illustrated with solid and dashed lines) has either a “cosine” (dashed) or “minus cosine” waveform (solid) under the same envelope. The corresponding electron energies for the cosine waveform are depicted as circles, whereas for the minus cosine waveform as squares. Axes: kinetic energy (eV) versus time (fs). See text for further pulse parameters.
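The trajectory step of Section 3.1.3 can be sketched as follows. The sketch integrates the non-relativistic equations of motion of a single electron in the field of Eqs. (5a) and (5b) with a velocity-Verlet scheme, discards the trajectory on recollision (y < 0), and otherwise returns the final kinetic energy. It is not the author's code: the integrator, the purely temporal Gaussian envelope, the neglect of the magnetic field, the zero initial velocity, and all names and default values are assumptions chosen for illustration; the numerical defaults are the parameters quoted in the text.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
E_MASS = 9.1093837015e-31    # electron mass (kg)
C_LIGHT = 2.99792458e8       # speed of light (m/s)


def optical_period(wavelength=800e-9):
    """Duration of one optical cycle in seconds."""
    return wavelength / C_LIGHT


def spp_field(x, y, t, peak, omega, k_spp, alpha, tau, a=0.3, phi0=0.0):
    """(E_x, E_y) following Eqs. (5b) and (5a), in V/m.

    Assumption: E_env is a purely temporal Gaussian with intensity FWHM tau.
    """
    env = math.exp(-2.0 * math.log(2.0) * (t / tau) ** 2)
    amp = peak * env * math.exp(-alpha * y)
    phase = k_spp * x - omega * t + phi0
    return a * amp * math.cos(phase - math.pi / 2.0), amp * math.cos(phase)


def final_energy_ev(t0, peak=5.8e10, wavelength=800e-9, decay_length=247e-9,
                    tau=5e-15, phi0=0.0, t_end=None, steps_per_cycle=200):
    """Kinetic energy (eV) at t_end of an electron born at rest at x = y = 0.

    Returns None if the electron recollides with the surface (y < 0).
    """
    omega = 2.0 * math.pi * C_LIGHT / wavelength
    alpha = 1.0 / decay_length
    k_spp = math.sqrt((omega / C_LIGHT) ** 2 + alpha ** 2)  # Eq. (6) inverted
    dt = optical_period(wavelength) / steps_per_cycle
    if t_end is None:
        t_end = max(t0, 0.0) + 3.0 * tau  # envelope is negligible by then

    def accel(x, y, t):
        ex, ey = spp_field(x, y, t, peak, omega, k_spp, alpha, tau, phi0=phi0)
        return -E_CHARGE * ex / E_MASS, -E_CHARGE * ey / E_MASS

    x = y = vx = vy = 0.0
    t = t0
    ax, ay = accel(x, y, t)
    while t < t_end:
        # Velocity-Verlet step: positions first, then averaged acceleration.
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        if y < 0.0:
            return None  # recollision: no emission is assumed
        t += dt
        ax_new, ay_new = accel(x, y, t)
        vx += 0.5 * (ax + ax_new) * dt
        vy += 0.5 * (ay + ay_new) * dt
        ax, ay = ax_new, ay_new
    return 0.5 * E_MASS * (vx * vx + vy * vy) / E_CHARGE


if __name__ == "__main__":
    period = optical_period()
    for i in range(-8, 9):
        birth = i * period / 8.0
        print(f"birth {birth * 1e15:+.2f} fs -> {final_energy_ev(birth)}")
```

Scanning the birth instant over the pulse in this way gives one column of the energy matrix described above; for a nearly continuous-wave field the sketch lets only electrons born in roughly one quarter of each cycle escape, in line with the one-fourth estimate in the text.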
3.2. Model Results

3.2.1. Electron Acceleration with Multiphoton-Induced Emission

We first checked whether the modeling results reproduce earlier measured and simulated spectra (published in Irvine et al. (2004, 2006), Irvine and Elezzabi (2006)) to gain confidence in our simplified three-step model. To this end, we carried out simulations for the same parameters as those published in these papers. Although for the time being we assume multiphoton-induced electron emission for these simulations (as previously used in these references), we must mention that this assumption does not necessarily hold for higher intensities. However, our purpose in this case was to reproduce former results; therefore, the spatiotemporal distribution of photoemission was described by $j(t, x) \sim I^n(t, x)$, according to Eq. (3). Here $n = 3$ is used, in accordance with the 4–5 eV work function of most metal surfaces and films and the 1.5 eV photon energy at 800 nm. Figure 6a depicts macroscopic electron spectra gained with our model for peak plasmonic fields of $1.9 \times 10^{11}$ V/m, $2.7 \times 10^{11}$ V/m, and $3.7 \times 10^{11}$ V/m, respectively (the FWHM duration of the input Gaussian laser pulse was 30 fs with a central wavelength of 800 nm). This figure can therefore be directly compared with the results in Irvine and Elezzabi (2006) (see Figure 6b). The characteristics
of the electron spectra are very well reproduced, as well as the linear scaling of the kinetic energies of the most energetic electrons with intensity. Slight differences in the peak and cutoff positions can be attributed to the approximate nature of the SPP field expression [Eqs. (5a) and (5b)] used in our case in contrast to the more accurate numerical field solution used by Irvine and Elezzabi (2006).

FIGURE 6 (a) Macroscopic electron spectra at peak plasmonic fields of $1.9 \times 10^{11}$ V/m (solid line), $2.7 \times 10^{11}$ V/m (dashed line), and $3.7 \times 10^{11}$ V/m (dotted line) for a Gaussian input laser pulse of 30-fs FWHM duration with a central wavelength of 800 nm. The model used was based on the simplified SPP field description given by Eqs. (5a)–(5b). (b) Electron spectra for the same input parameters with the field calculated with a full FDTD-based simulation. Axes: (a) electron counts (a.u.) versus kinetic energy (eV); (b) counts (a.u.) versus kinetic energy (keV); curves labeled $1.9 \times 10^{9}$, $2.7 \times 10^{9}$, and $3.7 \times 10^{9}$ V/cm. (Source of (b): Irvine and Elezzabi (2006).)

In another comparative simulation we changed the input pulse length to 5 fs FWHM, and assumed that this pulse is focused to a spot with 60-µm diameter on the prism surface. The field peak was $1.9 \times 10^{11}$ V/m. Figure 7 shows that the spectrum of the electron beam gained with this approach reproduces the spectrum computed with other methods, such as the one in Irvine and Elezzabi (2006) (depicted with a dashed curve in Figure 7). Slight differences in the cutoff positions can still be observed; however, all spectral features and the position of the main peak are exactly the same. Thus, the applicability of analytic field expressions [Eqs. (5a) and (5b)] and the robustness of our approach are confirmed by these examples.

3.2.2. Electron Acceleration with Field Emission

We now turn our attention to modeling electron spectra by assuming field emission from the metal surface, which is a more realistic assumption for higher-intensity input beams, approaching the damage threshold of thin metal films. The experimental motivation of this study is that high-repetition-rate, ultrafast laser output delivering focused intensity in this range is achievable with simple titanium:sapphire oscillators with an extended cavity, as we demonstrated recently (Dombi and Antal, 2007;
Dombi, Antal, Fekete, Szipőcs, & Várallyay, 2007; Naumov et al., 2005). We then used the Fowler–Nordheim formula, as given by Eq. (4), and resolved the photoaccelerated electron beam both angularly and spectrally, assuming a maximum input field of $5.8 \times 10^{10}$ V/m, which is a rather realistic maximum value considering the damage threshold of gold and silver films. We also assumed a tunneling time of 600 attoseconds which, in our model, describes the delay between the actual distortion of the potential by the field and the corresponding appearance of the electron in the continuum.

FIGURE 7 Electron spectrum for a 5-fs generating pulse with a peak plasmonic field strength of $1.9 \times 10^{11}$ V/m, assuming multiphoton photoemission calculated with simplified numerical methods (solid curve) and electron spectrum for the same input parameters with the electric field calculated with a full FDTD-based simulation (source for dashed curve: Irvine and Elezzabi (2006)). Axes: electron counts (a.u.) versus kinetic energy (eV). See text for further details.

Several emission maps are presented in the following text, using realistic parameters to reveal the fine structure of the acceleration process and to draw conclusions about macroscopically observable properties of the electron beams generated. We examined the final kinetic energy distribution of SPP-accelerated electrons along the plasmon propagation direction (x-axis, representing emission locations along the surface) for a few-cycle interacting pulse with a Gaussian pulse shape, 15-fs and 5-fs intensity FWHM, and $\varphi_0 = 0$ CE phase (which means that envelope and field maxima coincide). The central wavelength was 800 nm. The pulse was assumed to be focused on a spot with a diameter of 4 µm on the prism surface so that a peak plasmon field strength of $5.8 \times 10^{8}$ V/cm (Keldysh gamma of 0.31) was reached. With this effective intensity value we have already taken into account that substantial field enhancement factors (up to ×100) can be achieved with respect to the SPP generating field. The spatial and spectral distribution of the emitted electrons along the plasmon propagation direction was calculated with these simulation
parameters (in false color representation in Figures 8a and d) for two different pulse lengths to illustrate few-cycle effects. Whereas in the multicycle regime (15-fs pulse length) in Figure 8a a much more structured distribution can be observed, in Figure 8d (5-fs pulse length) the emission is concentrated primarily at a single structure on the emission map, providing a better-behaved electron beam.

FIGURE 8 Normalized photoacceleration maps (kinetic energy distribution of electrons emitted at different points of the surface: (a), (d) and (g), in grayscale representation); angular and kinetic energy distribution ((b), (e) and (h)); and macroscopic electron spectra ((c), (f), and (i)) of surface plasmon-accelerated electrons for three example parameter sets. Panels (a)–(c) are for 15-fs and (d)–(i) are for 5-fs laser pulses. In panels (g)–(i) we restricted the emission to a spot with 300-nm radius, as illustrated in (g). We modeled a nanolocalized emission region with this approach. See text for further details. (Source: Dombi and Rácz (2008a).)

It can also be seen that the emission of high-energy electrons is localized to the center of the illuminated spot and that the number of distinct structures on the emission maps roughly corresponds to the number of optical cycles in the generating pulse. This is because the “birth” interval of those electrons in the continuum that can leave the vicinity of the surface is limited to about one-fourth of every laser cycle. This is due to the breaking of the symmetry by the surface such that positive and negative
half-cycles are not identical from this point of view. Every laser cycle has one such favored interval, and electrons emitted in each of these intervals spend different amounts of time in the field; hence, they undergo different acceleration.

An even more conspicuous property seems crucially important for the applications of this electron source. Figures 8b and 8e depict the angular-kinetic energy distributions of the emitted electron beams, showing the direction in which the energetic electrons leave the surface. The emission is confined to a small range of angles, supporting a directionally emitted electron beam ideally suited for novel ultrafast techniques. Provided that the pulse length is in the few-cycle range (Figure 8e), the angular emission map is reduced to a single distinct structure corresponding to a highly directional, quasi-monoenergetic electron beam, representing the most favorable regime of SPP-enhanced electron acceleration. By integrating any of the distributions along the x-axis we derive the macroscopically observable electron spectra depicted in Figures 8c and f. The spectrum in Figure 8f has a FWHM $\Delta E_{\mathrm{kin}}/E_{\mathrm{kin}}$ value of 0.22 (with $E_{\mathrm{kin}}$ denoting the electron kinetic energy), corresponding to a quasi-monoenergetic spectrum. The spectral properties of this electron beam can be further enhanced under experimental circumstances by applying a retarding potential to suppress the low-energy wing of the spectrum.

The integrated spectra in Figures 8c, f, and i show a significant difference compared with the one in Figure 7a. This can be attributed exclusively to the different emission regimes (multiphoton vs. tunneling) involved. The sharp temporal distribution of the tunneling peaks located at the field maxima favors the emission of electrons at those time instants when they can gain significant kinetic energy. The sharp spectral cutoff is at the same location as the highest-energy electrons in the multiphoton case; however, it is primarily these high-energy electrons that are represented in the field emission case; therefore, a sharp peak appears in the spectrum. On the other hand, the low-energy wings of the spectra in Figures 8c and f display a broader feature, making the source less suitable for ultrafast applications.

To generate spectra with higher monoenergeticity, we suggest the application of spatial confinement of the emission area on the metal surface. This can be carried out experimentally by various nanofabrication techniques, for example, by depositing a dielectric layer on top of the metal with a nanoscale opening where the dielectric overlayer is absent and the metal surface is exposed to vacuum. Another possibility is roughening a small rectangular area on top of the metal surface, thereby enhancing the emission from that portion of the film. These potential schemes were taken into account in our simulations by selecting only smaller areas of the surface illuminated by the laser beam, and we considered only those photoelectrons that were emitted from this area. Results are shown in Figures 8g–i where the same emission maps and spectra are given as in Figures 8d–f with
the only difference that only electrons coming from a 300-nm-wide central portion of the surface were considered. By so confining the emission area, the distribution in Figure 8h shows a highly enhanced contrast. This means that even more monoenergetic spectra and even more directional beams can be generated from this spatially confined source. The $\Delta E_{\mathrm{kin}}/E_{\mathrm{kin}}$ value of the integrated spectrum can be improved by almost an order of magnitude, to 0.033 (see Figure 8i). Our results suggest that SP electron acceleration offers a robust and powerful technique for the generation of ultrafast, monoenergetic, highly directional electron beams (Dombi and Rácz, 2008a).
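The monoenergeticity measure used above, the FWHM of the main spectral peak divided by the peak energy, can be computed from simulated trajectories with a short sketch. It bins weighted final energies into a macroscopic spectrum and then reads off the relative FWHM by linear interpolation at half maximum. The binning scheme, the interpolation, and all function names are assumptions made for this illustration; the chapter does not state how its 0.22 and 0.033 values were extracted.

```python
import math


def weighted_spectrum(energies_ev, weights, bin_width_ev, e_max_ev):
    """Histogram of final energies, each entry weighted by its emission weight.

    Entries equal to None (recollided electrons) are skipped.
    Returns (bin_centers, counts).
    """
    n_bins = int(math.ceil(e_max_ev / bin_width_ev))
    counts = [0.0] * n_bins
    for energy, weight in zip(energies_ev, weights):
        if energy is None:
            continue
        idx = int(energy // bin_width_ev)
        if 0 <= idx < n_bins:
            counts[idx] += weight
    centers = [(i + 0.5) * bin_width_ev for i in range(n_bins)]
    return centers, counts


def relative_fwhm(energies, counts):
    """FWHM of the highest peak divided by the energy of that peak."""
    peak = max(range(len(counts)), key=lambda i: counts[i])
    if counts[peak] <= 0.0:
        raise ValueError("spectrum is empty")
    half = 0.5 * counts[peak]

    def crossing(i_in, i_out):
        # Linear interpolation between a sample above half maximum (i_in)
        # and its neighbor at or below half maximum (i_out).
        frac = (counts[i_in] - half) / (counts[i_in] - counts[i_out])
        return energies[i_in] + frac * (energies[i_out] - energies[i_in])

    lo = peak
    while lo > 0 and counts[lo - 1] > half:
        lo -= 1
    hi = peak
    while hi < len(counts) - 1 and counts[hi + 1] > half:
        hi += 1
    # If the spectrum never drops to half maximum, fall back to the grid edge.
    left = crossing(lo, lo - 1) if lo > 0 else energies[0]
    right = crossing(hi, hi + 1) if hi < len(counts) - 1 else energies[-1]
    return (right - left) / energies[peak]
```

Only the highest peak is measured, so a structured multicycle spectrum with several comparable peaks would need a more careful definition than this one.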
4. EXPERIMENTAL RESULTS

4.1. Surface Plasmon-Enhanced Photoemission

It is well known that the efficiency of several light-matter interaction phenomena and applications, such as Raman scattering, plasmonic biosensors (Lal, Link, & Halas, 2007, and references therein), surface harmonic generation (Quail, Rako, Simon, & Deck, 1983; Simon, Mitchell, & Watson, 1974), and other surface physical and chemical processes, can be significantly enhanced by the roughness of the metal surface involved. It was recently shown that even high harmonic generation on atoms is possible in the vicinity of tailored, nanostructured metal surfaces with the help of this phenomenon (Kim et al., 2008). It was shown that the common reason for the increased effects in most such cases is mainly the field enhancement and SPP coupling due to the roughness of the metal surface involved. It is known that the incident electromagnetic field can be enhanced by a factor of up to ×100 (Raether, 1988) on a rough surface if SPPs are also coupled. This means an enhancement of $10^4$ in intensity, which corresponds to a $10^8$ enhancement in two-photon photoemission yield according to Eq. (3) in this favorable case. Moreover, even if the surface of a thin metal film is perfectly (atomically) flat, SPP coupling in the Kretschmann–Raether configuration alone results in a field enhancement factor of ×3–4 at the metal-vacuum interface with respect to the field of the incident beam (Raether, 1988). Even this effect means a drastic photoemission yield enhancement for a perturbative n-photon process. Therefore, one of the first examples of newly discovered femtosecond surface plasmon-enhanced phenomena was SPP-induced photoemission from metal surfaces (Tsang, Srinivasan-Rao, & Fischer, 1990). More systematic studies with Au, Ag, Cu, and Al surfaces revealed photoemission yield enhancement factors of ×50 to ×3500, which indicate field enhancement values of ×2 to ×8, suggesting that the surfaces involved were of relatively good surface quality (Tsang et al., 1991). Figure 9 shows the main results of these experiments. The curves show the intensity dependence of the photoelectron yield on double logarithmic scales. Therefore, the slope of
each linear fit equals the nonlinearity of the photoemission process. In each case, multiphoton-induced emission takes place since there is no deviation from the linear fits. Moreover, the enhancement of the SPP-enhanced photoelectron yield is illustrated compared with nonlinear photoemission induced from the same film without SPP coupling. These first pioneering results paved the way toward SPP-mediated electron acceleration. Later independent experiments confirmed these results (Chen, Boneberg, & Leiderer, 1993; Irvine et al., 2004).

FIGURE 9 The enhancement of SPP-induced multiphoton photoemission yield as a function of the intensity of the incident laser beam for four different surfaces plotted on double logarithmic scales. The slope of each linear fit equals the nonlinearity of the photoemission process. The lower data sets marked as “nonresonance” depict photoelectron yield from the same metal film without SPP coupling but with a similar illumination geometry. The substantial increase of the SPP-enhanced photoelectron yield is clearly illustrated with the upper curves plotted with solid symbols and marked with “SP”. Panels: Ag, Au, Cu, Al, with linear fits labeled with slopes of 2 and 3. Axes: peak electron current density (A/cm$^2$) versus peak laser power density (W/cm$^2$). (Source: Tsang et al. (1991).)

The fact that the electron yield is much higher if SPP coupling takes place than the yield at direct surface illumination without SPP coupling underscores a very important feature of SPP-enhanced emission processes. Namely, it can be stated that the SPPs induce the observed photocurrent primarily; therefore, it would be more appropriate to term the multiphoton-induced emission picture in this case as multiplasmon-induced electron emission. Accordingly, it is the enhanced SPP field that distorts the surface
potential in the field emission picture and lowers the tunneling barrier. This means that the field emission regime can be reached at much lower laser input intensities and strong-field phenomena can be induced with high-repetition-rate, cost-effective laser oscillators (see, e.g., Dombi and Antal, 2007; Dombi et al., 2007; Naumov et al., 2005).
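The enhancement arithmetic used in this section follows from the perturbative scaling of Eq. (3): if the yield scales as the n-th power of intensity and the intensity as the square of the field, a field enhancement η multiplies the n-photon yield by η^(2n). The short sketch below makes this explicit in both directions. The function names are hypothetical, and pairing the quoted yield bounds with particular orders n is an assumption of this illustration, since the text does not state which order applies to which metal.

```python
import math


def yield_enhancement(field_enhancement, n_photon):
    """n-photon yield gain for a given field enhancement: eta**(2n)."""
    return field_enhancement ** (2 * n_photon)


def field_enhancement_from_yield(yield_gain, n_photon):
    """Field enhancement implied by a measured n-photon yield gain."""
    return yield_gain ** (1.0 / (2 * n_photon))


if __name__ == "__main__":
    # A x100 field enhancement and a two-photon process, as in the text.
    print(yield_enhancement(100.0, 2))
    # Assumed pairings of the quoted x50 and x3500 yield gains with n = 3 and n = 2.
    print(field_enhancement_from_yield(50.0, 3))
    print(field_enhancement_from_yield(3500.0, 2))
```

With these assumed pairings the implied field enhancements fall close to the ×2 and ×8 bounds quoted from Tsang et al. (1991); other pairings of yield and order give intermediate values.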
4.2. Generation of High-Energy Electrons

In addition to their enhancement of photoemission yield, SPP fields can also accelerate the electrons that are set free from the surface, thanks to the mechanisms described in Section 2.1. Recently performed spectrally resolved measurements of SPP photoemission delivered the experimental confirmation of this powerful particle acceleration mechanism in evanescent plasmonic fields (Irvine et al., 2004; Kupersztych et al., 2001; Zawadzka, Jaroszynski, Carey, & Wynne, 2000; Zawadzka et al., 2001). The main features of these electron spectra, especially the scaling of cutoff energies resulting from this mechanism, could be explained within the framework of the semiclassical three-step model described in Section 3.1 (Irvine, 2006).

In detail, Zawadzka et al. demonstrated SPP-enhanced electron spectra extending up to 400 eV with 40 TW/cm$^2$ focused intensity in the Kretschmann SPP coupling configuration. The pulse length was 100–150 fs in that case (Zawadzka et al., 2000, 2001). Kupersztych et al. also showed this phenomenon with laser pulses that were 60 fs long and reached 8 GW/cm$^2$ focused intensity (Kupersztych et al., 2001). The highest electron energy was ∼40 eV in their experiments. SPPs were coupled on a grating surface, and in contrast to the results of Zawadzka et al., the electron spectra possessed a peak at higher energies. Irvine et al. demonstrated even more conspicuous results in 2004 by accelerating electrons in SPP fields up to 400 eV with a simple titanium:sapphire oscillator delivering merely 1.5-nJ pulse energy. The resulting focused intensity was 1.8 GW/cm$^2$. Most interestingly, the SPP-enhanced electron spectrum became quasi-monoenergetic, peaking at 300 eV with a FWHM of 83 eV (Figure 10). The increased enhancement and confined electron emission in the latter experiment can be explained by considering the surface morphology of the silver film.
Surface roughness effects alter the spatial distribution of the SPP field on a nanometer scale ( 1.7) and thus offers high image quality. This is the peak of perfection for a simple microscope. Each cell can be clearly observed, and even something of the internal structure can be discerned. This dispels the notion that single lenses produced indistinct images that were afflicted with severe aberration. (See Color Insert.)
Did Physics Matter to the Pioneers of Microscopy?
FIGURE 15 In quest of the ultimate performance–Horace Dall’s spinel microscope. The late Horace Dall of Luton, England, constructed exquisitely small simple microscopes. This one, dating from 1950, is his finest. The lens (marked here 400×) is ground from the mineral spinel, which has a refractive index higher than that of soda glass (1.5) and close to that of lead glass (1.7). It also has lower dispersion than glass, and thus offers the best results that a single lens could reasonably be expected to provide. The lens is mounted in a circular holder (left), which screws into the stage (right), and is held firm by means of a concentric spring (center). This fine microscope, since given to the Royal Microscopical Society in Oxford, was used by the author to show the extremes which a simple microscope could attain.
FIGURE 16 The image of a diatom under Brown’s 32.5× lens. The clarity obtainable with a single-lens microscope can be best demonstrated if we take serial magnifications, using different lenses, of the same specimen. In this case, we see a cell of the centric diatom Coscinodiscus. The diatoms secrete shells of silica that are typically perforated by regular arrays of apertures and are thus ideal specimens to act as test objects for microscopes. In this case, we are using the No. 3 lens magnifying 32.5×.
Brian J. Ford
FIGURE 17 The image of a diatom under Brown’s 170× lens. As the magnification is increased, we can begin to make out increasingly fine detail within the diatom frustule. This specimen measures 25 µm across, and under the No. 1 lens, magnifying 170×, we can already begin to discern that the darker features seen in Figure 16 are actually circular structures. Some structure is also appearing towards the periphery, where we can now see that the edge of the cell forms a translucent rim.
FIGURE 18 The image of a diatom under the spinel 400× lens. The highest-quality image obtainable with a simple microscope is offered by the spinel lens made by Dall, magnifying 400×, and here the perforations can be clearly observed. Features that seemed to coalesce into one under the No. 1 170× lens used by Robert Brown can now be resolved as discrete structures. We can also distinguish the radial patterning that marks the rim of the frustules. At 25 µm in diameter, this frustule is roughly twice the diameter of a typical cell.
During the nineteenth century, other manufacturers, like Dollond of London, took these concepts to yet greater heights by producing beautifully tooled portable instruments with increasingly high magnifications. These
FIGURE 19 The image of a diatom under a present-day Leitz microscope. Modern instruments use phase-contrast, differential interference, Hoffman modulation, or dark-ground microscopy to amplify structural detail. Here we are using a modern Leitz oil-immersion objective lens to show the detail revealed by a fine present-day microscope, but without the benefits of these contrast-enhancing optical systems. The separation of the fine radiating peripheral markings is 0.2 µm, close to the limits of light microscopy. They are just beyond the resolution of the single lens in Figure 18. These correlated images substantiate that simple microscopes produced surprisingly clear images of fine microscopical features.
instruments became well known. In the collections at the University Museum of Utrecht, Netherlands, is a Dollond microscope that is described in the published catalogue as the “Pocket Microscope of Robert Brown”, and gives magnifications for the various lenses as 185×, 330× and 480×. The microscope itself fits into the palm of your hand, and can collapse into a small leather-covered box little larger than a cigarette packet. There remained a mystery surrounding this instrument, however, for there was no record of how it could have been translocated from Brown’s home in Soho Square, London, to the Physics Laboratory in Utrecht University. The lesson here is—never rely on second-hand sources, even when dignified by print in a collections catalogue. It transpired that the microscope had never belonged to Robert Brown. The entry in the hand-written accessions book revealed to me a story that was significantly different from the printed list. This was, it said, a microscope “folgens Rob. Brown”; i.e., after, or following, Brown’s instrument. It was his type of microscope, and was never in his possession: mystery solved (Ford, 1985). A magnification of 480× would be a remarkable performance by a single lens. The working distance of the lens would be less than a millimeter; the lens itself would be no bigger than the head of a pin. In traveling to Utrecht to experiment with the microscope I realized that I would be faced with the summit of achievement for the maker of a single-lens instrument. But it was not to be. Sadly, the lens holder no longer held its tiny lens. This
design offered the highest magnification of any microscope ever put into production, and it was also the smallest such instrument in history. As such, it was a dead-end in development. We can see vague connections in concept with the portable microscope designed by John McArthur in Cambridge, but there is no direct lineage from this Dollond microscope to the present day. The Bancks designs, by contrast, clearly reveal the way ahead. We can follow the stages of development, and it is easy to contemplate the homogeneity between the design of these simple microscopes and the modern research microscope. The lineage is unmistakable. We have seen how the design of the Bancks type of microscope evolved during the first quarter of the nineteenth century from a modest lens support on a stand, with crude focusing, a circular stage, and a mirror (Figure 20) into a range of microscopes that variously boasted a substage condenser, concentric controls, and both coarse- and fine-focusing adjustments (see Figure 10). If achromatic microscopes had never been envisaged, this type of instrument would still be in vogue today, for the images that they can produce are impressive (Figure 21).
5. PIONEERS OF FIELD MICROSCOPY

Microscopes of the eighteenth century lacked the ingenious accoutrements of the botanical microscope of the early 1800s. As we have seen, these earlier microscopes were uncomplicated and the entire device fit into a boss set into the lid of its box. There were no finely designed fine-focusing mechanisms and little that was ingenious about them. Although they were often supplied with a Lieberkühn reflector (and thus could be used for entire botanical specimens), they were more often known, generically, as “aquatic microscopes” because they had been developed to study freshwater organisms. The origin of this design can be traced to one investigator, and a single instrument maker. It was the concept of John Ellis (1710–1776), an Irish-born British government official who spent much time based in Florida and Dominica, and who was an enthusiastic microscopist in his spare time. Ellis was an active member of the burgeoning class of natural historians who were investigating the wonders of an expanding world, and his social milieu is itself a fascinating commentary on the rapid expansion in awareness of the microscopic world (Duyker & Tingbrand, 1995). He had used microscopes for years, but found the instruments then available were unsatisfactory for the study of freshwater microscopical life. So he turned to a well-established instrument maker in London who had provided him with microscopes and proposed an alteration in design (see Figure 20). Microscopes of the time enclosed the specimen in a confining stage, which meant that delicate living organisms could be crushed or, if they survived intact, could not be
FIGURE 20 The aquatic microscope designed by John Ellis. John Ellis (1710–1776) commissioned the production of this aquatic microscope seventy years before Bancks, in 1754. His design was initially made by the London instrument maker, John Cuff, and it has a solid square-section vertical brass pillar, compared with the hollow tube of the Bancks design that was to follow. The stage has no embellishments and is meant to support a watchglass (containing aquatic microorganisms) or alternatively an ivory slider that would be simply laid across the circular stage. The brush (P) is fashioned from a quill, the hollow end of which could be used to transport drops of water to the watchglass (M). Note too the Lieberkühn (G); one is shown fitted to this microscope (top). This illustration is from “Essays on the Microscope by George Adams” (1750–1795), instrument maker to King George III, printed by Dillon and Keating in 1787.
Brian J. Ford
FIGURE 21 Chromatic aberration and the botanical microscope. Here we see a transverse section of fern rhizome, and the main features are several large vessels—the tube-like structures that convey sap from the roots to the fronds. Using the No. 1 lens from the Robert Brown microscope (see Figure 11) we can see the histological structures clearly, and the supporting cells that surround the vessels are all well resolved. The color plate also shows significant chromatic aberration. It is important to note that, although the spurious colors are apparent, they do not greatly detract from the clarity of the image. Standard textbooks cite the rainbow-hued fringes that prevented early microscopists from seeing clearly but, as can be seen, this is a conclusion published by commentators who have never had the benefit of seeing the images they describe. (See Color Insert.)
reached by a dissecting needle or a probe, which an investigator might wish to utilize. It was a London instrument maker to whom Ellis vouchsafed the task, John Cuff (1708–1792). Here we have a successful craftsman of the highest standards, already accustomed to producing compound microscopes (admittedly with uncorrected lenses, but finely tooled and beautifully finished). The wooden cases were meticulously constructed of hardwood (apple, mahogany, oak) and lined with baize. The microscopes were of lacquered brass, with accessories to help the user hold the specimen or illuminate it in a variety of ways. For their time, they were as perfect as a microscope could be; but they were bulky, which prevented their being used for visits to the countryside, and the design of the stage made it difficult to study aquatic organisms. Something altogether simpler was required.

The design which Ellis devised had a vertical pillar supporting a circular stage, into which a concave watch-glass could easily fit, and which could be mounted into the lid of the microscope box (Figure 22). The lens arm could be raised or lowered to focus the image, and also turned from side to side to scan across the stage, which is important for studies of pond life. Beneath the stage was a double-sided mirror, one plane, the other concave. The whole instrument could be disassembled and packed into the small wooden box, itself typically adorned with shagreen and finely finished. This
FIGURE 22 From a pioneering era: the microscope of Linnaeus. This instrument was owned by Carl Linnaeus, father of taxonomy, and was photographed by the author at Uppsala, Sweden, where it is preserved at Linnaeus’ former home. In this design, the wooden box was typically covered with shagreen made by polishing the spines from shark or ray skin. As in the microscope illustrated in Figure 20, a Lieberkühn is shown in the fitted position. Only one low-power lens now remains in this microscope case, and it gives images of poor quality. Linnaeus had something of a blind spot for microorganisms, and I have found no record of his using his microscope to any great effect. Great taxonomist though he was, Linnaeus was no microscopist.
was a microscope that could fit easily into the coat pocket. It was easy to use and, unlike the microscopes for which Cuff was already well known, these were simple microscopes. The problems caused by aberrations were minimized with a single lens, and the user had a highly portable instrument with a wide range of uses both in the field and back at the desk. The basic design also meant that these microscopes were affordable, and the intelligentsia could easily obtain one of their own.

The principal problem these instruments posed was one of nomenclature: Are they properly described as a Cuff, or Ellis, microscope? Both terms are used, but here I will settle for Ellis. Although it was manufactured by Cuff, the original description was “Mr Ellis’ Aquatic Microscope”. It was Ellis’ design, after all, and many manufacturers subsequently produced versions of their own.

Not everybody who bought one of these diminutive microscopes used it. In Sweden, the father of taxonomic terminology, Carl Linnaeus, purchased an Ellis microscope in a sharkskin case (see Figure 22). It came with two lenses mounted into a holder bearing a Lieberkühn, which made the
FIGURE 23 Macroscopic observations by Carl Linnaeus. These are the closest I have found to true microscopical observations by Linnaeus. The drawings show the crane fly, Pedicia rivosa (below) and the moss, Funaria hygrometrica (above). When Robert Hooke portrayed the same moss in his book, Micrographia (1665), he clearly showed the cells of which each leaflet is comprised; these pictures by Linnaeus lack such fine detail, and show little more than can be seen with the naked eye (compare with Figures 21 and 39), though the venation of the crane fly wings is well portrayed. The illustration is from Linnaeus’ journal for 1732 at the Linnean Society, to whom the author extends grateful acknowledgement.
microscope eminently suitable for a busy botanist (Report, 1932) but there is no first-hand evidence that Linnaeus used it. None of his surviving drawings, or published diagrams, shows microscopical detail. There is an indifferent drawing of the crane fly, Pedicia (formerly Tipula) rivosa dating from 1732, for which a low-power lens might have been employed, and a few macroscopical botanical studies, all of which could have been made by the naked eye, but nothing more detailed than that (Figure 23). Linnaeus was also surprisingly uninterested in the myriad microscopic organisms that had been documented before his time. As we can see from the published accounts, Linnaeus was always vague about microorganisms (Linnaeus, 1758). He set down a genus “Microcosmus” which was defined as “Corpus variis heterogeneis tectum,” and recognized Volvox globator as “Corpus liberum, gelatinosum, rotundatum, artubus destitutum”. Someone must have drawn his attention to amœbæ for Linnaeus also recorded the genus as “Chaos 2. V. polymorpho-mutabilis” and indeed the common pond amœba was designated Chaos chaos (L) well into the Victorian era.

Linnaeus’ microscope survives in Uppsala to this day. Of the lenses, only one remains and it is of poor quality. This low-power lens is suitable for only the most basic investigations, and it does not have the
quality that is ordinarily associated with lenses made by Cuff. Perhaps it is a magnifier dating from earlier in Linnaeus’ career and used for close views of plant specimens; in any event, the microscope is now useless for microscopy. Expert taxonomist and indefatigable collector though Linnaeus may have been, he was certainly no microscopist and he missed out on this fundamentally important realm of life. It was his greatest “blind spot”. Why would Linnaeus have purchased a microscope, if not to use it? There was one minute organism that he did describe; this was Hydra, the fresh-water polyp that he named in 1758. Hydra was an extremely popular organism for study by late eighteenth-century microscopists, and indeed interest in this diminutive creature remains current (Lenhoff, 1983). It was specifically for the study of Hydra that the Ellis aquatic microscope had originally been conceived. The problem with early simple microscopes was that they were delicate and troublesome to use. It was hard to mount the specimen; harder still to focus it. Many designers had tried to find ways to make the task easier, and one of the most widespread designs was a screw-barrel microscope (Clay & Court, 1932). In this instrument, the lens was mounted at one end of a tube. The specimen was slid into the body of the microscope, and the lens focused by screwing its holder in or out of the tube. This design had been perfected by James Wilson (1665–1740), who presented it to a meeting of the Royal Society of London in 1702 (see Figure 23). It proved to be very useful for botanists and others working in the field (Wilson, 1743). Commentators have alleged that Wilson plagiarized his design from one already announced by Nicholas Hartsoecker (1656–1725). It is certainly true that Hartsoecker, a Dutch physicist and pupil of Huygens, had indeed constructed a screw-barrel microscope in 1694, eight years before Wilson. 
It is also clear that the idea of the design had spread to England, for Wilson’s account did not describe himself as the designer, only as the maker. He described the screw-barrel microscope as “Late Invented,” which clearly acknowledges that it had been devised elsewhere. Screw-barrel microscopes were handy devices for observing specimens that were amenable to sliding into the spring-loaded specimen holder. These were the microscopes that gave rise to the ivory slider, a thin sliver of bone or ivory as wide as a pencil and as long as a matchstick. Countersunk holes (usually four in number) were set into the slider, and the dried specimen was held in position between two disks of mica that were secured in position with circlips (see Figure 8). This form of mounting was ideal for insects, butterfly wings and antennae, wood sections, fabrics, and hairs. Sliders are widely found as collectors’ items and the specimens that they contain are often in excellent order, despite three centuries in store. Although the idea of a “slider” seems decidedly dated, it gave rise to the microscope “slide” that is universally familiar to present-day microscopists.
The manufacture of slides was made feasible by the introduction of “patent plate” or “flattened crown” glass in the 1850s. At the same time, Chance Bros of Birmingham, England, began the production of coverslips described as being “of various degrees of thickness, from 1/20th to 1/250th of an inch” (Carpenter, 1862). These slides are the standard mounting materials in the present day. In imitation of the appearance of the ivory sliders, it was the convention to cover early glass slides with paper, only the circular areas of the coverslip remaining clear. These are now popular items in slide collections. Slides and sliders are of the greatest value for the preservation and examination of specimen material, particularly entire small specimens and thin sections of larger objects (like tissue sections). They are of less value for the examination of delicate living organisms, like Hydra, the body of which is typically 1 cm in length and which is disturbed (and even structurally disrupted) by perturbations in its environment. Hence the move from the screw-barrel instrument to the aquatic microscope—this alone, with its open stage and watch-glass specimen chamber, allowed microscopists to examine the world of Hydra in a near-natural state.
5.1. The Polype Shows the Way

It was the demands imposed by Hydra that led to the development of the aquatic and, later, botanical microscopes. But why? Hydra is one pond creature of many. Was there a reason for the sudden burst of popularity that propelled it to the number one slot on the microscopists’ charts? There was a reason, and it lay in the Netherlands, where a young Swiss naturalist was engaged as a tutor and began a series of experiments that effectively launched the era of experimental biology.

This experimenter was Abraham Trembley (1710–1784) of Geneva (Breen, 1956), who was appointed tutor-in-residence to the two children of Count Bentinck of The Hague, Netherlands. He cultured the polyps in glass vessels, observing them as they grew and reproduced. Then he began to experiment with them, everting their bodies, transplanting parts of one onto another, observing how one could use the organism as an experimental animal and coining the concept of transplantation (Trembley, 1744). The experiments would have been a wonderful introduction to microscopical zoology for Trembley’s young charges, and they had many implications for experimental biology. But they would have remained undisclosed in The Hague had it not been for the encouragement of the distinguished philosopher René Réaumur (1683–1757) and the growing interest of the Royal Society, to whom he communicated Trembley’s findings. For two years the Royal Society withheld support, repeatedly asking for supplies of the organisms and details of how the experiments were performed. Eventually they were convinced. Martin Folkes (1690–1754) at the Society wrote a description of Trembley’s work as “one of the most
beautiful discoveries in natural philosophy” and the London philosophers suddenly started to show interest (Lenhoff & Lenhoff, 1986). One of these was Henry Baker (1698–1774) who, faced with this remarkable series of experiments, hastily compiled a popular book on the use of the microscope. Baker was a remarkable character who made a living as a young man teaching deaf-mutes to communicate and helping to show them how to live fulfilling lives. His work as a teacher brought him to the attention of Daniel Defoe (1660–1731), who in 1727 invited Baker to visit him at home. Two years later, Baker married Sophia, Defoe’s daughter, and his acceptance into learned London society was complete. In 1741 he was elected Fellow of the Royal Society.

Baker threw himself whole-heartedly into amateur science. He was captivated by Trembley’s work, and immediately set about carrying out experiments of his own with Hydra. The published results appeared at length in Baker’s book, enticingly entitled The Microscope Made Easy and published in 1743 (Baker, 1743) (Figure 24). The book was dedicated to “Martin Folkes Esqr; President, And to the Council and Fellows of the Royal Society of London” and it presents a popular account of the state of microscopy at the time. Baker was no great innovator, but he was an enthusiastic popularizer and set down his account of the experiments on Hydra, which he describes as “an insect” (it is of course a coelenterate) discovered by Mr Trembley “who now resides in Holland”. Baker tried to reprise Trembley’s experiments and added a few observations of his own. The list of experiments that Baker described was comprehensive and attracted widespread attention:

I. Cutting off a Polype’s Head
II. Cutting a Polype into two Pieces, Transversely
III. A Polype cut into three Pieces Transversely
IV. Cutting the Head of a Polype in four Pieces
V. Cutting a Polype in two Parts, Lengthways
VI. Cutting a young Polype in two Pieces whilst still hanging to its Parent
VII. Cutting a Polype lengthwise through the Body, without dividing the Head
VIII. A Repetition of the foregoing Experiment, with different Success
IX. Cutting a Polype in two Places through Head and Body, without dividing the Tail
X. Cutting off half a Polype’s Tail
XI. Cutting a Polype transversely, not quite through
XII. Cutting a Polype obliquely, not quite through
XIII. Slitting a Polype open, cutting off the End of its Tail
XIV. Cutting a Polype with four young Ones hanging to it
XV. Quartering a Polype
XVI. Cutting a Polype in three Pieces the long way
XVII. An Attempt to turn a Polype, and the Event
FIGURE 24 Baker publishes Leeuwenhoek’s studies of the common polyp, Hydra viridis. The extensive studies of Hydra carried out by Trembley stimulated interest in aquatic microscopy, yet he was not the first to study the organism. Leeuwenhoek sent to London a series of studies of Hydra starting in his letter to the Royal Society dated 25 December 1702. The findings appeared in Philosophical Transactions (1703). [This version of Leeuwenhoek’s studies appeared in Baker (1743). ‘‘The Microscope made Easy.’’ R Dodsley & Co., London, p. 93.] This popular book gave publicity to many earlier workers, and hosts of amateur observers sought a way of observing these polyps for themselves.
XVIII. Turning a Polype inside out
XIX. An Attempt to make the divided parts of different Polypes unite
XX. A speedy Reproduction of a new Head
XXI. A young Polype becoming its Parent’s Head
XXII. A cut Polype producing a young One, but not repairing itself.
Interest in Hydra increased enormously. From being an insignificant organism that few had even noticed, it was suddenly the most fascinating of subjects for any microscopist. And this was where the problems began, for the popular Wilson screw-barrel microscope could not cope with a delicate creature like this. This is the microscope that Baker describes in detail as his book opens:

The first that I shall mention, is MR. WILSON’S Single Pocket Microscope; the Body whereof made either of Brass, Ivory, or Silver....

Baker proceeds by describing how this instrument can be mounted on a stand, leaving the observer’s hands free; and concludes with the claim that this microscope:

... is as easy and pleasant in its Use, and as fit for the most curious Examination of the Animalcules and Salts in Fluids... it is as likely to make Discoveries in Objects that have some Degree of Transparency, as any Microscope I have ever seen or heard of.

Not everyone agreed, John Ellis for one. Ellis found that the screw-barrel design tended to crush delicate specimens and was a hindrance to his investigations of these aquatic organisms, and he resolved to design something more appropriate. Thus it was that the popularity of Hydra as an experimental subject led to the design of the aquatic microscopes that established their direct lineage to the bench microscope of today.
5.2. The Dutch Draper’s Roots

Trembley believed that he had been the first to discover Hydra. He was wrong. This organism had already been described and figured more than 30 years earlier by the renowned Dutch microscopist, Antony van Leeuwenhoek (1632–1723). Indeed, Leeuwenhoek’s studies of Hydra feature in the same book by Henry Baker as the experiments by Trembley. Leeuwenhoek brings us yet deeper in history back towards the birth of microscopy. All his life he used home-made simple microscopes with crude focusing controls, which in use required diligence, skill, and unimaginable patience (see Figures 20 and 22). They were the crudest single-lens microscopes ever used for serious scientific research, yet they provided astonishing results. Leeuwenhoek himself was still working on microscopy when aged 90 and lying on his death-bed, yet for all his indefatigable research and his indomitable persistence, he founded no school of microbiology (van Neil, 1949) and left no group of devoted followers to carry on his teachings. His influence lived on, of course, and was clearly a trigger to Baker’s enthusiasms (Baker, 1743) in addition to stimulating others to look into the remarkable world he had revealed.

My work, which I’ve done for a long time, was not pursued in order to gain the praise I now enjoy, but chiefly from a craving after knowledge, which I notice resides
in me more than in most other men. And therewithal, whenever I found out anything remarkable, I have thought it my duty to put down my discovery on paper, so that all ingenious people might be informed thereof.
(Antony van Leeuwenhoek, in a letter to the Royal Society of London dated 12 June 1716.)

It was Antony van Leeuwenhoek who founded the modern science of microscopical biology. Apart from his description of Hydra, he produced innovative and unprecedented studies of sperm cells and blood corpuscles, cell nuclei and bacteria, protozoa and algae; the whole realm of the microscope lay within his grasp. His lenses were produced by painstakingly grinding beads of soda glass into biconvex magnifiers a few millimeters across, some magnifying several hundred diameters, or, more rarely, by blowing a bubble of glass and utilizing the small “nipple” that formed at the far end as his lens (Figure 32). Each of his microscopes was small, the body of each being not much larger than a rectangular postage-stamp and made of workable metals like brass and silver that he extracted from the ore. Unlike the lenses, the bodies were not fine examples of workmanship; they did not need to be. Their purpose was simply to hold specimen and lens in juxtaposition and clearly in focus so that observations could be made over time. Then, once Leeuwenhoek knew what he wanted to portray, he handed over the entire instrument to his limner and commanded that the view be drawn for posterity. Leeuwenhoek himself could not draw, and said so in his records. He always had a skilled artist to perform this task. Other contemporaneous Dutch microscopists, like the gifted Jan Swammerdam (1637–1680), could make their own drawings but Leeuwenhoek lacked the skill.

There is a parallel connection between Leeuwenhoek and the Dutch painter Johannes Vermeer (1632–1675).
The births of both men were recorded on the same page of the baptismal register of the Old Church in Delft, and, when the young Vermeer met his untimely demise at the age of 43, it was Leeuwenhoek, by this time a town official, who was appointed executor to Vermeer’s estate. Because the two were contemporaries, it has been claimed that Leeuwenhoek appears in Vermeer’s paintings but there is no evidence for the assertion, and little resemblance between the persons in the Vermeer paintings and the surviving portraits of Leeuwenhoek. It has even been suggested that Vermeer was Leeuwenhoek’s limner, but that is even more fanciful. In truth, we still have no notion who that person (or, more likely, persons) might have been.

As we have seen, Leeuwenhoek’s studies were revived by Baker (1743) and his research was published by Hoole (1798–1807) in 1798. An 1875 biography was written by Haaxman (Haaxman, 1875) and in 1932 Dobell published his masterful biography (Dobell, 1932). My 1991 book, The Leeuwenhoek Legacy (Ford, 1991) is perhaps best seen as a modest supplement to Dobell’s definitive work. Looking at Leeuwenhoek’s prodigious output gives an overriding impression of devotion and a highly developed ability
FIGURE 25 A brass Leeuwenhoek microscope from Delft, Netherlands. This is the only Leeuwenhoek microscope still held in his hometown of Delft and is the only one that lacks its lens. Most of his surviving microscopes were based on this design. The microscope is to be found behind the plate-glass window of a display cabinet in the foyer to the Technical University. The body plates are fashioned from Leeuwenhoek’s home-made brass alloy and measure 22 mm × 46 mm. The two plates are fixed together by means of four small rivets and the main securing screw at the base. With this type of diminutive microscope, Leeuwenhoek was the first to observe a great range of now commonplace specimens, including cell nuclei and bacteria.
to record his unprecedented observations with accuracy. In many cases, the descriptions and figures that are recorded in Leeuwenhoek’s voluminous letters are so meticulously accurate that each species can immediately be recognized by present-day microscopists. Leeuwenhoek’s hand-made microscopes were crudely made, with just enough attention to detail to ensure that they functioned properly (Figure 25). The design of the microscopes was simple, but highly effective (Figure 26), and the structures he observed seem so remarkably detailed for such an unsophisticated instrument that detractors have doubted his claims since the day they were made. Professor R. V. Jones once recounted to me how he had supervised an examination for medical students in which one question asked for an explanation of why it was theoretically impossible for Leeuwenhoek to have observed bacteria with a simple microscope. This is much the same kind of skepticism that surrounded Robert Brown’s claims (Report, 1932) when his microscope was examined. There are many die-hard skeptics in science.

During Leeuwenhoek’s life, his work became widely known. His discoveries even attracted the interest of King Charles II⁵ who, according to a

5 Dobell (1932, p. 184), incorrectly cited in Fournier, M. (1996). The Fabric of Life, Microscopy in the Seventeenth Century. Baltimore and London: Johns Hopkins University Press, p. 234, reference 100.
FIGURE 26 How the silver Leeuwenhoek microscopes were designed. Two of the surviving microscopes have body plates made from silver that Leeuwenhoek beat into sheet metal at his home in Delft. This diagram has been prepared by the author to illustrate how it was constructed and used in practice. The 25 mm × 45 mm plates were secured together in this case by six rivets, doubtless because of the softness of the silver. A long main screw, beaten and then tapped to provide a thread, is used to raise and lower the transverse stage. The inclined screw is used to focus the image. The stage of this microscope is triangular in section and the specimen pin has been made from wrought iron. A specimen was fixed to the point with sealing wax, and was contained—if necessary—within a flat-sided glass phial in which aquatic microorganisms could flourish.
letter from Robert Hooke to Leeuwenhoek in 1679, “was desirous to see them and very well pleased with the Observations and mentioned your Name at the same time.” Hoole records that Leeuwenhoek had also presented “a Couple of his Microscopes” to Queen Mary II, wife of William III of Orange. Leeuwenhoek also met (and presented a microscope to) Peter the Great in 1698. Leeuwenhoek’s discoveries helped to found the science of experimental biology (Ford, 1995) and he had become renowned across Europe by the time he died. There are nine surviving Leeuwenhoek microscopes, not all of them of proven provenance (Ford, 1991) and, in addition, we now have the legacy of the nine original specimen packets that Leeuwenhoek bequeathed to the Royal Society (Ford, 1981a,b). They include plant tissue sections, slices of optic nerve, dried aquatic algae and aerial sections of seeds, all prepared with diligence and uncommon skill (Figure 27).
6. THE IMAGE OF THE SIMPLE MICROSCOPE

Here we arrive at the crucial question: Was it possible for Leeuwenhoek to make the discoveries that are associated with his name? Was he just,
FIGURE 27 Original Leeuwenhoek specimens sent to the Royal Society of London. On 2 April 1686, Leeuwenhoek sent his final two packets of specimens to London. The first (top) contained, he stated, ‘‘a cotton seed cut into 24 round slices.’’ This was the launch of serial sectioning as an aid to the study of microanatomy. The second packet (bottom) contained ‘‘9 seeds from the cotton tree which have been stripped of their involucres and in which the leaves have been separated.’’ They are fine examples of microdissection; in this packet, the detached specimen (top left corner) clearly shows the cotyledons, plumule, and radicle visible to the naked eye. Dutch scholars, relying on microfiche copies of the pages, described the packets as ‘‘drawn rectangles,’’ missing entirely the fact that specimens lay concealed within.
as has been alleged, a dilettante who exaggerated his results? What can an observer perceive with a single-lens microscope? Could bacteria truly have been discovered with anything so simple? We have already seen that a single lens can theoretically resolve structures as small as 0.25 µm, and this is sufficient in principle to resolve many species of bacteria, typical examples of which are >2 µm in breadth. The best microscope of Leeuwenhoek that is known to survive is in the collection of the University Museum of Utrecht, Netherlands. The lens, when calibrated by my late colleague J. van Zuylen of Zeist, has a focal length of 0.94 mm and magnifies 266×. I used this microscope to image a dry smear of my own blood, and the results displayed the erythrocytes with remarkable clarity. These cells, which Leeuwenhoek first described in 1674, are biconcave discs with an in vivo diameter of 7.8 µm that is reduced to 7.2 µm in dried smears (note that Swammerdam, unbeknown to Leeuwenhoek, may have observed the erythrocytes of Rana, the frog, in 1658). Scattered among the erythrocytes (more popularly known as red corpuscles, or red cells) are leucocytes (white blood cells). Of these, the granulocyte has a curiously lobed nucleus in which each lobe measures approximately 2 µm (Figure 28). These lobes are of bacterial dimensions, and my photomicrographs (taken with the Utrecht microscope) clearly show these structures. As we have seen in previous examples, the problem of chromatic aberration is minimal. There are some slight perturbations in color, but not enough to prevent the microscopist from seeing what needs to be seen.

We have a second strand of evidence. I examined the specimens of the stem of elder, Sambucus, that Leeuwenhoek sent to the Royal Society of London in 1674 (Figure 29). In one specimen I observed a fine fibril under the scanning electron microscope (SEM). On the SEM scale it measured 0.7 µm across.
I had managed to find the same fibril as had been visible in the micrographs taken through the Leeuwenhoek lens, so we had directly comparable images in both Leeuwenhoek’s microscope and a present-day electron microscope. This made calibration an easy matter.

Finally there is the practical demonstration of living bacteria with a simple microscope. We can see how the Victorian microscopists resolved spiral bacteria from the published images (Figure 30). For this experiment I utilized a spinel lens ground by my good friend, the late Horace Dall of Luton, which was calibrated to magnify 395× (and is thus comparable with the best surviving Leeuwenhoek microscope). With this modern version of a single-lens microscope I easily observed living aquatic spiral bacteria of the genus Spirillum (Figure 31). The results were unambiguous. The bacteria could be well resolved, thus confirming that Leeuwenhoek had living bacteria within his range (Ford, 1998).⁶ These three results leave no room for doubt.

6 This article has been reprinted as: Premières images au microscope, Pour la Science, 249, 169–173; and Frühe Mikroskopie, Spektrum der Wissenschaft, June, 68–71. [Other editions in Chinese, Japanese, Polish, and Spanish].
FIGURE 28 Human blood cells imaged by the Leeuwenhoek microscope at Utrecht. Some additional studies were made at the conclusion of the photomicrography session with the Leeuwenhoek microscope (see Figure 33). A smear of blood on a coverslip was air-dried and focused for photography. This image shows the entire field of view; noteworthy is the high proportion of the field that is usably in focus. For such small specimens, chromatic aberration is inconsequential. Note (top right) a single leucocyte with its lobed nucleus clearly resolved. The cell itself is some 12 µm across, and the lobes are about 2 µm in diameter, roughly the size of bacteria like Staphylococcus. This picture, taken almost as an afterthought, vividly demonstrates the capacity of the single lens. As in all of the photographs in this chapter, the image contrast and color have been optimized by Adobe Photoshop CS2. (See Color Insert.)
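The figures quoted for the Utrecht lens (focal length 0.94 mm, magnification 266×) can be checked with a little arithmetic. The sketch below assumes the conventional rule for a simple magnifier, in which magnification is the least distance of distinct vision (taken here as 250 mm) divided by the focal length; the text does not say which convention van Zuylen used, so the constant and the function names are illustrative assumptions only.

```python
# Assumed convention: magnification of a simple lens = near point / focal length.
# The 250 mm near point is a standard textbook value, not a figure from the text.
NEAR_POINT_MM = 250.0


def simple_lens_magnification(focal_length_mm, near_point_mm=NEAR_POINT_MM):
    """Magnification of a single lens under the assumed near-point convention."""
    return near_point_mm / focal_length_mm


def focal_length_for(magnification, near_point_mm=NEAR_POINT_MM):
    """Focal length (mm) implied by a quoted magnification, same assumption."""
    return near_point_mm / magnification


# Utrecht Leeuwenhoek lens: focal length 0.94 mm, as quoted in the text.
utrecht_magnification = simple_lens_magnification(0.94)
print(round(utrecht_magnification))  # 266

# Dall's spinel lens is quoted only as 395x; this focal length is inferred.
dall_focal_length_mm = focal_length_for(395)
print(round(dall_focal_length_mm, 2))  # 0.63

# Apparent size of a 2 micrometre nuclear lobe seen through the Utrecht lens.
lobe_apparent_mm = 0.002 * utrecht_magnification
print(round(lobe_apparent_mm, 2))  # 0.53
```

Under this assumption the 0.94 mm focal length reproduces the quoted 266×, and a 2 µm lobe appears about half a millimetre across to the eye, which is consistent with the claim that such structures are comfortably visible.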
6.1. Analyzing the Image

To the optical physicist, it is the lens data that matter above all. The results obtained by van Zuylen give the measurable parameters that define the theoretical performance of the Utrecht lens, and he gives a linear magnification of 266×, numerical aperture 0.13, calculated resolution 1.16 µm, measured resolution 1.35 µm. Van Zuylen measured the lens as 1.2 mm thick, and the glass from which it is made is standard soda (bottle or window) glass of refractive index 1.5, just as one would expect. Very well, but what does this mean in practice? If we take the literal interpretation of these figures, then it seems logical to deduce that anything measuring 1.

These spaces are endowed with the following norm:
\[
\|f\|_{W^{k,p}} = \left( \sum_{i=0}^{k} \|f^{(i)}\|_{L^p}^{p} \right)^{1/p} = \left( \sum_{i=0}^{k} \int |f^{(i)}(t)|^{p} \, dt \right)^{1/p}. \tag{38}
\]
An interesting particular case is for $p = 2$, denoted $H^k = W^{k,2}$, because of their relation with the Fourier series. More information about the Sobolev spaces can be found in the book by Adams (1975).

2.7.2. Besov Spaces
The next kind of spaces are Besov spaces $B^s_{p,q}$. Functions taken in $B^s_{p,q}$ have $s$ derivatives in $L^p$. The parameter $q$ permits more precise characterization of the regularity. A general description of these spaces can be found in Triebel (1992). In this paper, we give only their connection with wavelets. Indeed, different expressions exist for the norm associated with Besov space, but one uses the wavelet coefficients; see (39):

$$\forall f \in B^s_{p,q}, \quad \|f\|_{B^s_{p,q}} = \left[\sum_{n} |\alpha_n|^{p}\right]^{1/p} + \left\{\sum_{j=0}^{+\infty} 2^{j\left(\frac{d}{2}-\frac{1}{p}+s\right)q} \left[\sum_{n} 2^{j\frac{p}{2}} |\beta_{jn}|^{p}\right]^{q/p}\right\}^{1/q}. \tag{39}$$
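The homogeneous version is

$$\forall f \in \dot{B}^s_{p,q}, \quad \|f\|_{\dot{B}^s_{p,q}} = \left\{\sum_{j=-\infty}^{+\infty} 2^{j\left(\frac{d}{2}-\frac{1}{p}+s\right)q} \left[\sum_{n} 2^{j\frac{p}{2}} |\beta_{jn}|^{p}\right]^{q/p}\right\}^{1/q}, \tag{40}$$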
where $\alpha_n$ and $\beta_{jn}$ are the coefficients issued from the wavelet expansion (see Section 2.2).

2.7.3. Ridgelet Spaces
In the same way, Candès defines the ridgelet spaces $R^s_{p,q}$, endowed with a norm based on the ridgelet coefficients.
Jérôme Gilles
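Definition 3. For $s > 0$ and $p, q > 0$, we say that $f \in R^s_{p,q}$ if $f \in L^1$ and

$$\mathrm{Ave}_u \|Rf(u,\cdot) \ast \varphi\|_{L^p} < \infty \quad \text{and} \quad \left\{ 2^{js}\, 2^{j(d-1)/2} \left(\mathrm{Ave}_u \|Rf(u,\cdot) \ast \psi_j\|_{L^p}^{p}\right)^{1/p} \right\}_{j} \in \ell^q(\mathbb{N}), \tag{41}$$

where $Rf(u,t) = \int_{u \cdot x = t} f(x)\,dx$ is the Radon transform of $f$ ($u = (\cos\theta; \sin\theta)$). The function $\varphi$ is the scale function associated with $\psi$.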
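Then the induced norm is defined by

$$\|f\|_{R^s_{p,q}} = \mathrm{Ave}_u \|Rf(u,\cdot) \ast \varphi\|_{L^p} + \left\{\sum_{j \ge 0} \left[2^{js}\, 2^{j(d-1)/2} \left(\mathrm{Ave}_u \|Rf(u,\cdot) \ast \psi_j\|_{L^p}^{p}\right)^{1/p}\right]^{q}\right\}^{1/q} \tag{42}$$

and its homogeneous version $\dot{R}^s_{p,q}$

$$\|f\|_{\dot{R}^s_{p,q}} = \left\{\sum_{j \in \mathbb{Z}} \left[2^{js}\, 2^{j(d-1)/2} \left(\mathrm{Ave}_u \|Rf(u,\cdot) \ast \psi_j\|_{L^p}^{p}\right)^{1/p}\right]^{q}\right\}^{1/q}. \tag{43}$$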
As in the Besov case, these norms can be calculated from the ridgelet coefficients. Let $w_j(u,b)(f) = \langle f(x), \psi_j(u \cdot x - b)\rangle$ for $j \ge 0$ and $v(u,b)(f) = \langle f(x), \varphi(u \cdot x - b)\rangle$ be these ridgelet coefficients; then

$$\|f\|_{R^s_{p,q}} = \left(\int |v(u,b)(f)|^{p}\,du\,db\right)^{1/p} + \left\{\sum_{j \ge 0} \left(2^{js}\, 2^{j(d-1)/2} \left(\int |w_j(u,b)(f)|^{p}\,du\,db\right)^{1/p}\right)^{q}\right\}^{1/q}. \tag{44}$$
More information can be found in Candès (1998).

2.7.4. Contourlet Spaces
Inspired by the previous spaces, we propose to define the contourlet spaces, which will be denoted $Co^s_{p,q}$.
Image Decomposition: Theory, Numerical Schemes, and Performance Evaluation
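Definition 4. Let $s > 0$ and $p, q > 0$; if $f \in Co^s_{p,q}$, then

$$\|f\|_{Co^s_{p,q}} = \left[\sum_{n} |\alpha_{j_0,n}|^{p}\right]^{1/p} + \left\{\sum_{j \le j_0} 2^{j\left(\frac{d}{2}-\frac{1}{p}+s\right)q} \left[\sum_{k=0}^{2^{l_j}-1} \sum_{n} 2^{j\frac{p}{2}} |\beta_{j,k,n}|^{p}\right]^{q/p}\right\}^{1/q}, \tag{45}$$

or in the homogeneous case,

$$\|f\|_{\dot{Co}^s_{p,q}} = \left\{\sum_{j \in \mathbb{Z}} 2^{j\left(\frac{d}{2}-\frac{1}{p}+s\right)q} \left[\sum_{k=0}^{2^{l_j}-1} \sum_{n} 2^{j\frac{p}{2}} |\beta_{j,k,n}|^{p}\right]^{q/p}\right\}^{1/q}, \tag{46}$$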
where $\alpha_{j_0,n}$ and $\beta_{j,k,n}$ are the contourlet coefficients mentioned in Section 2.6.

2.7.5. Bounded Variation (BV) Spaces
The last space of interest is the $BV$ space, the space of functions of bounded variation. This space is widely used in image processing because it is a good candidate to model structures in images.

Definition 5. The space $BV$ over a domain $\Omega$ is defined as

$$BV(\Omega) = \left\{ f \in L^1(\Omega);\ \int_\Omega |\nabla f| < \infty \right\}, \tag{47}$$

where $\nabla f$ is the gradient, in the distributional sense, of $f$ and

$$\int_\Omega |\nabla f| = \sup_{\vec{\varphi}} \left\{ \int_\Omega f\, \operatorname{div} \vec{\varphi}\ ;\ \vec{\varphi} \in C_0^1(\Omega, \mathbb{R}^2),\ |\vec{\varphi}| \le 1 \right\}. \tag{48}$$

This space is endowed with the following norm:

$$\|f\|_{BV} = \|f\|_{L^1} + \int_\Omega |\nabla f|. \tag{49}$$

But in general, we only keep the second term, which is well known as the total variation of $f$. In the rest of the paper, we will use the notation

$$J(f) = \int_\Omega |\nabla f|. \tag{50}$$
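As a concrete illustration of the total variation $J(f)$ of Eq. (50), the following is a minimal NumPy sketch of one common discretization (isotropic, forward differences, zero difference on the last row and column). The discretization and the names `total_variation`, `bv_norm`, and `stripes` are our assumptions for illustration only; no particular discrete gradient is fixed at this point of the chapter.

```python
import numpy as np


def total_variation(f):
    """Discrete isotropic total variation J(f) using forward differences."""
    f = np.asarray(f, dtype=float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]  # differences along axis 0, zero on the last row
    gy[:, :-1] = f[:, 1:] - f[:, :-1]  # differences along axis 1, zero on the last column
    return float(np.sum(np.sqrt(gx ** 2 + gy ** 2)))


def bv_norm(f):
    """Discrete analogue of Eq. (49): L1 norm plus total variation."""
    return float(np.sum(np.abs(f))) + total_variation(f)


def stripes(omega, n=256):
    """Hypothetical test pattern cos(omega * x1) sampled on the unit square."""
    x1 = (np.arange(n) + 0.5) / n
    return np.cos(omega * x1)[:, None] * np.ones((1, n))


flat = np.full((8, 8), 3.0)   # constant image
step = np.zeros((8, 8))
step[:, 4:] = 1.0             # one vertical edge of height 1
print(total_variation(flat), total_variation(step))

for omega in (20.0, 80.0):
    v = stripes(omega)
    print(omega, np.linalg.norm(v), total_variation(v))
```

On the stripe pattern, the $L^2$ norm stays almost unchanged while the total variation grows roughly in proportion to the frequency, which is why $J$ penalizes oscillating patterns.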
More information about the BV space is available in Haddad (2005) and Vese (1996). We now have all the basic tools needed to describe the image decomposition models. The next two sections present the structures + textures and structures + textures + noise models, respectively.
3. STRUCTURES + TEXTURES DECOMPOSITION
The starting point of the image decomposition models is the work of Meyer (2001) about the Rudin–Osher–Fatemi (ROF) algorithm (Rudin et al., 1992). Let us recall the ROF model. Assume $f$ is an observed image that is the addition of the ideal scene image $u$, which we want to retrieve, and a noise $b$. The authors propose to minimize the following functional to get $u$:

$$F^{ROF}_{\lambda}(u) = J(u) + \lambda \|f - u\|^2_{L^2}. \tag{51}$$
This model assumes that $u$ is in $BV$ because this space preserves sharp edges. This algorithm gives good results and is very easy to implement by using the nonlinear projectors proposed by Chambolle (2004) (see Appendix A). Now if we take the image decomposition point of view, $f = u + v$, the functional in Eq. (51) can be rewritten as

$$F^{ROF}_{\lambda}(u, v) = J(u) + \lambda \|v\|^2_{L^2}. \tag{52}$$
We remind the reader that decomposition means $u$ is the structures part and $v$ the textures part. Meyer shows that this model is not adapted to achieve this decomposition. To see this, the following example illustrates that the more a texture is oscillating, the more it is removed from both the $u$ and $v$ parts.

Example 1. Let $v$ be a texture created from an oscillating signal over a finite domain. Then $v$ can be written ($x = (x_1, x_2)$) as follows:

$$v(x) = \cos(\omega x_1)\,\theta(x), \tag{53}$$

where $\omega$ is the frequency and $\theta$ the indicator function over the considered domain. Then we can calculate the $L^2$ and $BV$ norms of $v$, respectively. We get

$$\|v\|_{L^2} \approx \frac{1}{\sqrt{2}} \|\theta\|_{L^2}, \tag{54}$$
which is constant $\forall \omega$ and does not specially capture textures. In addition,

$$\|v\|_{BV} = \frac{\omega}{2\pi} \|\theta\|_{L^1}, \tag{55}$$
which grows as $\omega \to \infty$ and then clearly rejects textures. In order to adapt the ROF model to capture the textures in the $v$ component, Meyer proposes to replace the $L^2$ space by another space, called $G$, which is a space of oscillating functions. He proves that this space is the dual space of $\mathcal{BV}$ (where $\mathcal{BV} = \{ f \in L^2(\mathbb{R}^2),\ \nabla f \in L^1(\mathbb{R}^2) \}$, which is close to the $BV$ space and the total variation described earlier in the paper); see Meyer (2001) for more theoretical details about these spaces. This space $G$ is endowed with the following norm:

$$\|v\|_G = \inf_{g} \left\| \left( |g_1|^2 + |g_2|^2 \right)^{1/2} \right\|_{L^\infty}, \tag{56}$$
where $g = (g_1, g_2) \in L^\infty(\mathbb{R}^2) \times L^\infty(\mathbb{R}^2)$ and $v = \operatorname{div} g$. If we calculate the G-norm of the oscillating texture in Eq. (53) of Example 1, we get

$$\|v\|_G \le \frac{C}{|\omega|}, \tag{57}$$
where $C$ is a constant. Then it is easy to see that this space $G$ is well adapted to capture textures. Now, the modified functional performing the structures + textures decomposition is

$$F^{YM}_{\lambda}(u, v) = J(u) + \lambda \|v\|_G, \tag{58}$$
where $f = u + v$, $f \in G$, $u \in BV$, $v \in G$. The drawback of this model is the presence of an $L^\infty$ norm in the expression of the G-norm (this does not allow classic variational calculus). The first people to propose a numerical algorithm to solve the Meyer model were Vese and Osher (2002). Their approach was to use the theorem stating that $\forall f \in L^\infty(\Omega)$, $\|f\|_{L^\infty} = \lim_{p \to \infty} \|f\|_{L^p}$, together with a slightly modified version of Meyer’s functional:

$$F^{OV}_{\lambda,\mu,p}(u, g) = J(u) + \lambda \|f - (u + \operatorname{div} g)\|^2_{L^2} + \mu \left\| \sqrt{g_1^2 + g_2^2} \right\|_{L^p}. \tag{59}$$
Then variational calculus applies and results in a system of three connected partial differential equations. All the details of the discretization of the equations are available in Vese and Osher (2002). This algorithm works well but is very sensitive to the choice of its parameters, which induces many instabilities.
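The limit $\|f\|_{L^\infty} = \lim_{p\to\infty}\|f\|_{L^p}$ on which the Vese–Osher approach relies can be checked numerically. The sketch below is only an illustration under our own assumptions: a discrete image, a domain of measure 1 (so that the integral becomes a mean), and a hypothetical helper `lp_norm` that factors out the maximum to avoid overflow for large $p$.

```python
import numpy as np


def lp_norm(f, p):
    """Discrete L^p norm on a domain of measure 1: (mean |f|^p)^(1/p)."""
    a = np.abs(np.asarray(f, dtype=float)).ravel()
    m = a.max()
    if m == 0.0:
        return 0.0
    # Factor out the maximum so that (a / m) ** p never overflows.
    return float(m * np.mean((a / m) ** p) ** (1.0 / p))


f = np.array([[1.0, -2.0], [3.0, -4.0]])
for p in (1, 2, 10, 50, 200):
    print(p, lp_norm(f, p))
print("sup norm:", np.abs(f).max())
```

The printed values increase with $p$ and approach the largest absolute value of $f$. The convergence is slow, which is consistent with needing a fairly large $p$ for the $L^p$ term to behave like the $L^\infty$ norm.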
Another way to solve Meyer’s model was proposed by Aujol et al. (Aujol, 2004; Aujol, Aubert, Blanc-Féraud, & Chambolle, 2003; Aujol, Gilboa, Chan, & Osher, 2006). The authors propose a dual-method approach that naturally arises because of the dual relation between the $G$ and $BV$ spaces. The problem is assumed to be in the discrete case and defined over a finite domain $\Omega$. They proposed a modified functional to minimize,

$$F^{AU}_{\lambda,\mu}(u, v) = J(u) + J^*\!\left(\frac{v}{\mu}\right) + (2\lambda)^{-1} \|f - u - v\|^2_{L^2} \tag{60}$$

and

$$(u, v) \in BV(\Omega) \times G_\mu(\Omega). \tag{61}$$

The set $G_\mu$ is the subset in $G$ where $\forall v \in G_\mu$, $\|v\|_G \le \mu$. Moreover, $J^*$ is the characteristic function over $G_1$ with the property that $J^*$ is the dual operator of $J$ ($J^{**} = J$). Thus,

$$J^*(v) = \begin{cases} 0 & \text{if } v \in G_1, \\ +\infty & \text{else.} \end{cases} \tag{62}$$
The interesting point is that the aforementioned Chambolle projectors are the projectors over the sets $G_\mu$, $\forall \mu$; these operators will be denoted $P_{G_\mu}$ in the rest of the paper. More details about these projectors can be found in Chambolle (2004) and are recalled in Appendix A. Then the authors propose an iterative algorithm that gives the minimizers $(\hat{u}, \hat{v})$ of $F^{AU}_{\lambda,\mu}(u, v)$.
• Let us fix $v$; we seek the minimizer $u$ of

$$\inf_{u}\ J(u) + (2\lambda)^{-1} \|f - u - v\|^2_{L^2}. \tag{63}$$

• Now we fix $u$ and seek the minimizer $v$ of

$$\inf_{v}\ J^*\!\left(\frac{v}{\mu}\right) + \|f - u - v\|^2_{L^2}. \tag{64}$$

Chambolle’s results show that the solution of Eq. (63) is given by

$$\hat{u} = f - \hat{v} - P_{G_\lambda}(f - \hat{v}) \tag{65}$$

and the solution of Eq. (64) by

$$\hat{v} = P_{G_\mu}(f - \hat{u}). \tag{66}$$
Then the numerical algorithm is
1. Initialization: $u_0 = v_0 = 0$
2. Iteration $n + 1$:
   $v_{n+1} = P_{G_\mu}(f - u_n)$
   $u_{n+1} = f - v_{n+1} - P_{G_\lambda}(f - v_{n+1})$
3. We stop the algorithm if $\max(|u_{n+1} - u_n|, |v_{n+1} - v_n|) \le \epsilon$ or if we reach a prescribed maximal number of iterations.

The authors prove that the minimizers $(\hat{u}, \hat{v})$ are also minimizers of the original Meyer functional [Eq. (58)], and that it is better to start by calculating $v_{n+1}$ than $u_{n+1}$; see Aujol (2004) and Aujol et al. (2003) for the complete proofs. Figure 2 presents the three original images (Barbara, House, and Leopard) used for tests in the rest of the paper. Figures 3–5 illustrate the results from Aujol’s algorithm. The chosen parameters are ($\lambda = 1$, $\mu = 100$), ($\lambda = 10$, $\mu = 1000$), and ($\lambda = 5$, $\mu = 1000$), respectively. For clarity, we enhanced the contrasts of the textured components. On each test we see that the separation between structures and textures works well. Some residual textures remain in the structures part; this can be explained by the fact that the parameter $\lambda$ acts as a tradeoff between the “power” of separability and too much regularization of $u$.

As the G-norm is difficult to handle, Meyer (2001) proposes to replace the space $G$ by the Besov space $\dot{B}^{\infty}_{-1,\infty}$ because $G \subset \dot{B}^{\infty}_{-1,\infty}$ (in the following, we will denote $E = \dot{B}^{\infty}_{-1,\infty}$). The advantage is that the norm of a function $v$ over this space can be defined from its wavelet coefficients. The corresponding model proposed by Meyer is

$$F^{YM2}_{\lambda}(u, v) = J(u) + \lambda \|v\|_E. \tag{67}$$
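The alternating $BV$–$G$ iteration described above (compute $v_{n+1}$, then $u_{n+1}$, until the change falls below $\epsilon$) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: Appendix A is not reproduced in this section, so the projector `project_G` assumes the fixed-point scheme of Chambolle (2004) with step $\tau = 1/8$ and a forward-difference gradient. All function names, the inner iteration count, and the synthetic test image are hypothetical choices made for illustration.

```python
import numpy as np


def grad(u):
    """Forward-difference gradient, zero difference at the far border."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy


def div(px, py):
    """Discrete divergence, defined as minus the adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d


def project_G(f, mu, n_iter=100, tau=0.125):
    """Approximate P_{G_mu}(f) with Chambolle's fixed-point iteration (assumed form)."""
    f = np.asarray(f, dtype=float)
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / mu)
        scale = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / scale  # keeps |p| <= 1 pointwise
        py = (py + tau * gy) / scale
    return mu * div(px, py)


def bv_g_decompose(f, lam, mu, n_outer=20, eps=1e-3):
    """Alternate v = P_{G_mu}(f - u) and u = f - v - P_{G_lam}(f - v)."""
    f = np.asarray(f, dtype=float)
    u = np.zeros_like(f)
    v = np.zeros_like(f)
    for _ in range(n_outer):
        v_new = project_G(f - u, mu)
        u_new = f - v_new - project_G(f - v_new, lam)
        change = max(np.abs(u_new - u).max(), np.abs(v_new - v).max())
        u, v = u_new, v_new
        if change <= eps:
            break
    return u, v


# Synthetic example: a bright square (structure) plus vertical stripes (texture).
n = 32
x = np.arange(n)
structure = np.zeros((n, n))
structure[8:24, 8:24] = 100.0
texture = 20.0 * np.sin(2.0 * np.pi * x / 4.0)[None, :] * np.ones((n, 1))
f = structure + texture
u, v = bv_g_decompose(f, lam=1.0, mu=100.0, n_outer=10)
print(np.abs(f - u - v).max())  # size of the residual f - u - v
```

The residual $f - u - v$ equals $P_{G_\lambda}(f - v)$ by construction, so a small $\lambda$ keeps $u + v$ close to $f$. The values of $\lambda$, $\mu$, and the iteration counts used here are illustrative and would need tuning on real images.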
Aujol and Chambolle were the first to propose a numerical algorithm that uses the space $E$. As previously, they reformulated the model in a dual-method approach, where $E_\mu$ is the subset of $E$ where $\forall f \in E_\mu$, $\|f\|_E \le \mu$, and $B^*(f)$ is the indicator function over $E_1$. Then the functional to minimize is

$$F^{AC}_{\lambda,\mu}(u, v) = J(u) + B^*\!\left(\frac{v}{\mu}\right) + (2\lambda)^{-1} \|f - u - v\|^2_{L^2}. \tag{68}$$
FIGURE 2 Original Barbara, House, and Leopard images.
FIGURE 3 BV–G structures + textures image decomposition of Barbara image (panels: Structures, Textures).
FIGURE 4 BV–G structures + textures image decomposition of House image (panels: Structures, Textures).
FIGURE 5 BV–G structures + textures image decomposition of Leopard image (panels: Structures, Textures).
Chambolle, DeVore, Lee, and Lucier (1998) proved the existence of a projector on this space, denoted $P_{E_\mu}$, defined by

$$P_{E_\mu}(f) = f - WST(f, 2\mu), \tag{69}$$

where $WST$ is the wavelet soft thresholding operator (we mean that we first perform the wavelet expansion of the function, then we do the soft thresholding of the wavelet coefficients, and end by reconstructing the image). Then the new numerical algorithm is as follows:
FIGURE 6 BV–$E_\mu$ structures + textures image decomposition of Barbara image (panels: Structures, Textures).
1. Initialization: $u_0 = v_0 = 0$
2. Iteration $n + 1$:
   $v_{n+1} = P_{E_\mu}(f - u_n) = f - u_n - WST(f - u_n, 2\mu)$
   $u_{n+1} = f - v_{n+1} - P_{G_\lambda}(f - v_{n+1})$
3. We stop if $\max(|u_{n+1} - u_n|, |v_{n+1} - v_n|) \le \epsilon$ or if we reach a prescribed maximal number of iterations.

The results obtained by this model are presented in Figures 6–8. This algorithm works, but its main drawback is that it captures some structure information (like the legs of the table in the Barbara image; see Figure 6). This behavior appears because the space $E$ is much bigger than the space $G$; in particular, the space $E$ contains functions that are not only textures.

Osher, Sole, and Vese (2002) explore the possibility of replacing the space $G$ by the Sobolev space $H^{-1}$. They propose the following functional ($v$ is obtained by $v = f - u$):

$$F^{VS}_{\lambda}(u) = J(u) + \lambda \|f - u\|^2_{H^{-1}}, \tag{70}$$

where $\|v\|_{H^{-1}} = \int |\nabla(\Delta^{-1}) v|^2\,dx\,dy$. The authors give the corresponding Euler–Lagrange equations and their discretization. Another way to
FIGURE 7 BV–$E_\mu$ structures + textures image decomposition of House image (panels: Structures, Textures).
FIGURE 8 BV–$E_\mu$ structures + textures image decomposition of Leopard image (panels: Structures, Textures).
numerically solve the problem is to use a modified version of Chambolle’s projector $P_{H^{-1}_\lambda}$ (see Appendix A). Figures 9–11 present the results obtained with this algorithm. Some other models were proposed that test different spaces to replace the $BV$ or $G$ spaces. We mention the work of Aujol et al. (Aujol and Chambolle, 2005; Aujol and Gilboa, 2006), who propose replacing the space $BV$ by the smaller Besov space $B^1_{1,1}$, or replacing $G$ by some Hilbert spaces, which permits the possibility of extracting textures with a certain directionality. Haddad (2005) proposes using the Besov space $\dot{B}^1_{1,\infty}$ instead of $BV$ (the norms over these two spaces are equivalent) with the $L^2$ norm for the $v$ part.
FIGURE 9 BV–$H^{-1}$ structures + textures image decomposition of Barbara image (panels: Structures, Textures).
FIGURE 10 BV–$H^{-1}$ structures + textures image decomposition of House image (panels: Structures, Textures).
Garnett, Jones, Triet, and Vese (2005) and Triet and Vese (2005) study the use of the spaces $\operatorname{div}(BMO)$, $\dot{BMO}^{-\alpha}$, and $\dot{W}^{-\alpha,p}$ to model the textures component.
4. STRUCTURES + TEXTURES + NOISE DECOMPOSITION
The previous algorithms yield good results but are of limited interest for noisy images (we add a Gaussian noise with $\sigma = 20$ to each test image of Figure 2; the corresponding noisy test images can be viewed in Figure 12). Indeed, noise can be viewed as a very highly oscillatory function (this means that noise can be viewed as living in the space $G$). Therefore, the algorithms
FIGURE 11 BV–$H^{-1}$ structures + textures image decomposition of Leopard image (panels: Structures, Textures).
FIGURE 12 Original Barbara, House, and Leopard images corrupted by Gaussian noise ($\sigma = 20$).
FIGURE 13 BV–G structures + textures image decomposition of the noisy Barbara image (panels: Structures, Textures).
incorporate the noise in the textures components. Then the textures are corrupted by noise (see Figure 13 for example). In this section, we present some extensions of the two-component model to the three-component model, $f = u + v + w$, which can discriminate among structures ($u$), textures ($v$), and noise ($w$).
4.1. BV–G–G Local Adaptive Model
In Gilles (2007b), we proposed a new model to decompose an image into three parts: structures ($u$), textures ($v$), and noise ($w$). As in the $u + v$ model, we consider that structures and textures are modeled by functions in the $BV$ and $G$ spaces, respectively. We also consider a zero-mean Gaussian noise added to the image. Let us view noise as a specific very oscillating function. By virtue of Meyer’s work (2001), where it is shown that the more oscillatory a function is, the smaller its G-norm, we propose to model $w$ as a function in $G$ and consider that its G-norm is much smaller than the norm of textures ($\|v\|_G \gg \|w\|_G$). These assumptions are equivalent to choosing

$$v \in G_{\mu_1}, \quad w \in G_{\mu_2}, \quad \text{where } \mu_1 \gg \mu_2. \tag{71}$$
To increase the performance, we propose adding a local adaptability behavior to the algorithm following an idea proposed by Gilboa, Zeevi, and Sochen (2003). These authors investigate the ROF model given by Eq. (51) and propose a modified version that can preserve textures in the denoising process. To do this, they do not choose λ as a constant on the entire image but as a function λ( f )(x, y) which represents local properties of the image. In a cartoon-type region, the algorithm enhances the denoising process by increasing the value of λ; in a texture-type region, the algorithm decreases
$\lambda$ to attenuate the regularization to preserve the details of textures. So $\lambda(f)(x, y)$ can be viewed as a smoothed partition between textured and untextured regions. Then, in order to decompose an image into three parts, we propose to use the following functional:

$$F^{JG}_{\lambda,\mu_1,\mu_2}(u, v, w) = J(u) + J^*\!\left(\frac{v}{\mu_1}\right) + J^*\!\left(\frac{w}{\mu_2}\right) + (2\lambda)^{-1} \|f - u - \nu_1 v - \nu_2 w\|^2_{L^2}, \tag{72}$$
where the functions $\nu_i$ represent the smoothed partition of textured and untextured regions (and play the role of $\lambda$ in Gilboa’s paper). The $\nu_i$ functions must have the following behavior:
• for a textured region, we want to favor $v$ instead of $w$. This is equivalent to $\nu_1$ close to 1 and $\nu_2$ close to 0,
• for an untextured region, we want to favor $w$ instead of $v$. This is equivalent to $\nu_1$ close to 0 and $\nu_2$ close to 1.
We see that $\nu_1$ and $\nu_2$ are complementary, so it is natural to choose $\nu_2 = 1 - \nu_1 : \mathbb{R}^2 \to\ ]0; 1[$. The choice of $\nu_1$ and $\nu_2$ is discussed after the following proposition, which characterizes the minimizers of $F^{JG}_{\lambda,\mu_1,\mu_2}(u, v, w)$.

Proposition 3. Let $u \in BV$, $v \in G_{\mu_1}$, and $w \in G_{\mu_2}$ be the structures, textures, and noise parts, respectively, and $f$ the original noisy image. Let the functions $(\nu_1(f)(\cdot,\cdot), \nu_2(f)(\cdot,\cdot))$ be defined on $\mathbb{R}^2 \to\ ]0; 1[$, and assume that these functions can be considered as locally constant compared to the variation of $v$ and $w$. Then a minimizer defined by

$$(\hat{u}, \hat{v}, \hat{w}) = \arg\min_{(u,v,w) \in BV \times G_{\mu_1} \times G_{\mu_2}} F^{JG}_{\lambda,\mu_1,\mu_2}(u, v, w), \tag{73}$$

is given by

$$\hat{u} = f - \nu_1 \hat{v} - \nu_2 \hat{w} - P_{G_\lambda}(f - \nu_1 \hat{v} - \nu_2 \hat{w}), \tag{74}$$

$$\hat{v} = P_{G_{\mu_1}}\!\left(\frac{f - \hat{u} - \nu_2 \hat{w}}{\nu_1}\right), \tag{75}$$

$$\hat{w} = P_{G_{\mu_2}}\!\left(\frac{f - \hat{u} - \nu_1 \hat{v}}{\nu_2}\right), \tag{76}$$
where PG µ denotes Chambolle’s non-linear projectors (see Appendix A). The proof of this proposition can be found in Gilles (2007b). As in the two-part BV –G decomposition model, we get an equivalent numerical scheme:
FIGURE 14 Texture partition $\nu_1$ obtained by local variance computation.
1. Initialization: $u_0 = v_0 = w_0 = 0$,
2. Compute $\nu_1$ and $\nu_2 = 1 - \nu_1$ from $f$,
3. Compute $w_{n+1} = P_{G_{\mu_2}}\!\left(\dfrac{f - u_n - \nu_1 v_n}{\nu_2 + \kappa}\right)$ ($\kappa$ is a small value used to prevent division by zero),
4. Compute $v_{n+1} = P_{G_{\mu_1}}\!\left(\dfrac{f - u_n - \nu_2 w_{n+1}}{\nu_1 + \kappa}\right)$,
5. Compute $u_{n+1} = f - \nu_1 v_{n+1} - \nu_2 w_{n+1} - P_{G_\lambda}(f - \nu_1 v_{n+1} - \nu_2 w_{n+1})$,
6. If $\max\{|u_{n+1} - u_n|, |v_{n+1} - v_n|, |w_{n+1} - w_n|\} \le \epsilon$ or if we did $N_{step}$ iterations, then stop the algorithm; else jump to step 3.

Concerning the choice of the $\nu_i$ functions, we were inspired by the work of Gilboa et al. (2003). The authors choose to compute a local variance on the texture + noise part of the image obtained by the ROF model ($f - u$). In our model, we use the same strategy but on the $v$ component obtained by the two-part decomposition algorithm. This choice is implied by the fact that the additive Gaussian noise can be considered as orthogonal to textures. As a consequence, the variance of a textured region is larger than the variance of an untextured region. So, in practice, we first compute the two-part decomposition of the image $f$. On the textures part, for all the pixels $(i, j)$, we compute the local variance on a small window (odd size $L$) centered on $(i, j)$. Finally, we normalize it to obtain values in $]0; 1[$. All the details about the computation of the $\nu_i$ functions can be found in Gilles (2007b). Figure 14 shows an example from the noisy Barbara image. As expected, the variance is higher in the textured regions and lower in the others. Figures 15–17 show the results of the $u + v + w$ decomposition we obtained by the BV–G–G local adaptive model. This model can separate noise from the textures. If we look more precisely, we can see that some residual noise remains in the textures, and some textures are partially captured in the noise
FIGURE 15 BV–G–G structures + textures + noise image decomposition of Barbara image (panels: Structures, Textures, Noise).
part. This is due to the choice of the parameters λ, µ1 , and µ2 which act on the separability power of the algorithm.
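The texture partition $\nu_1$ described above (local variance of the textures part on an odd window of size $L$, then normalization into $]0; 1[$) can be sketched as follows. The exact normalization is given in Gilles (2007b) and is not reproduced here, so the min–max rescaling, the reflecting border, the small constant `eps`, and the function name `texture_partition` are assumptions made for illustration only.

```python
import numpy as np


def texture_partition(v, window=3, eps=1e-6):
    """Return (nu1, nu2) from the local variance of the texture component v."""
    v = np.asarray(v, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd")
    r = window // 2
    padded = np.pad(v, r, mode="reflect")  # assumed border handling
    rows, cols = v.shape
    s1 = np.zeros_like(v)  # running sum of values in the window
    s2 = np.zeros_like(v)  # running sum of squared values in the window
    for di in range(window):
        for dj in range(window):
            block = padded[di:di + rows, dj:dj + cols]
            s1 += block
            s2 += block ** 2
    count = window * window
    variance = np.maximum(s2 / count - (s1 / count) ** 2, 0.0)
    span = variance.max() - variance.min()
    scaled = (variance - variance.min()) / (span + eps)  # values in [0, 1)
    nu1 = eps + (1.0 - 2.0 * eps) * scaled               # strictly inside ]0; 1[
    return nu1, 1.0 - nu1


# Synthetic texture component: checkerboard on the left half, flat on the right half.
i, j = np.indices((16, 16))
v = np.where(j < 8, 10.0 * (2 * ((i + j) % 2) - 1), 0.0)
nu1, nu2 = texture_partition(v, window=3)
print(nu1[:, :8].mean(), nu1[:, 8:].mean())
```

On this example $\nu_1$ is close to 1 over the checkerboard and close to 0 over the flat half, so $v$ is favored in textured regions and $w$ elsewhere, as required by the two bullet conditions on the $\nu_i$ functions.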
4.2. Aujol–Chambolle BV–G–E Model
At the same time as our work, Aujol and Chambolle addressed the same structures + textures + noise decomposition problem (Aujol & Chambolle, 2005). They proposed a model close to the one described in the previous subsection, with the difference that they consider the noise as a distribution taken in the Besov space $E = \dot{B}^{\infty}_{-1,\infty}$. Then the associated functional is

$$F^{AC2}_{\lambda,\mu,\delta}(u, v, w) = J(u) + J^*\!\left(\frac{v}{\mu}\right) + B^*\!\left(\frac{w}{\delta}\right) + (2\lambda)^{-1} \|f - u - v - w\|^2_{L^2}, \tag{77}$$
FIGURE 16 BV–G–G structures + textures + noise image decomposition of House image (panels: Structures, Textures, Noise).
where $u \in BV$, $v \in G_\mu$, and $w \in E_\delta$ as defined in the previous sections. The authors prove that the minimizers are (see Aujol and Chambolle (2005)):

$$\hat{u} = f - \hat{v} - \hat{w} - P_{G_\lambda}(f - \hat{v} - \hat{w}), \tag{78}$$

$$\hat{v} = P_{G_\mu}(f - \hat{u} - \hat{w}), \tag{79}$$

$$\hat{w} = P_{E_\delta}(f - \hat{u} - \hat{v}) = f - \hat{u} - \hat{v} - WST(f - \hat{u} - \hat{v}, 2\delta), \tag{80}$$

where $WST(f - \hat{u} - \hat{v}, 2\delta)$ is the WST operator applied on $f - \hat{u} - \hat{v}$ with a threshold set to $2\delta$. Then the numerical algorithm is given by
1. Initialization: $u_0 = v_0 = w_0 = 0$,
2. Compute $w_{n+1} = f - u_n - v_n - WST(f - u_n - v_n, 2\delta)$,
FIGURE 17 BV–G–G structures + textures + noise image decomposition of Leopard image (panels: Structures, Textures, Noise).
3. Compute $v_{n+1} = P_{G_\mu}(f - u_n - w_{n+1})$,
4. Compute $u_{n+1} = f - v_{n+1} - w_{n+1} - P_{G_\lambda}(f - v_{n+1} - w_{n+1})$,
5. If $\max\{|u_{n+1} - u_n|, |v_{n+1} - v_n|, |w_{n+1} - w_n|\} \le \epsilon$ or if we performed $N_{step}$ iterations, then stop the algorithm; else jump to step 2.

The results of this algorithm on our test images are shown in Figures 18–20, respectively. We can see that textures are better denoised by this model. This is a consequence of a better noise modeling by distributions in the Besov space. But the residual texture in the noise part is larger than the one given by our algorithm. Another drawback appears in the structures part; the edges in the image are damaged because some important wavelet coefficients are removed. Gilles (2007b) also shows how to add the local adaptivity behavior of the BV–G–G model to the BV–G–E model. We refer the reader to Gilles (2007b) to see the BV–G–E local
FIGURE 18 BV–G–E structures + textures + noise image decomposition of Barbara image (panels: Structures, Textures, Noise).
adaptivity functional and find the corresponding results. This modified version shows less improvement compared to the original. We prefer to explore the replacement of wavelets by new geometric multiresolution tools such as contourlets.
4.3. The BV–G–$\dot{Co}^{\infty}_{-1,\infty}$ Decomposition Model
As mentioned previously, the new directional multiresolution tools, such as curvelets or contourlets, exhibit very good results in denoising. They also better reconstruct the edges in an image. So the idea of replacing wavelets by curvelets or contourlets naturally arises. In this paper, we focus on the choice of contourlets. This choice is equivalent to replacing the Besov space in the model described in the previous subsection by the homogeneous contourlet space $\dot{Co}^{\infty}_{-1,\infty}$. Then, the equivalent functional is given in Eq. (81)
FIGURE 19 BV–G–E structures + textures + noise image decomposition of House image (panels: Structures, Textures, Noise).
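as below:

$$F^{Co}_{\lambda,\mu,\delta}(u, v, w) = J(u) + J^*\!\left(\frac{v}{\mu}\right) + J^*_{Co}\!\left(\frac{w}{\delta}\right) + (2\lambda)^{-1} \|f - u - v - w\|^2_{L^2}, \tag{81}$$

where $J^*_{Co}(f)$ is the indicator function over the set $Co_1$ if we denote $Co_\delta = \left\{ f \in Co^{\infty}_{-1,\infty}\ /\ \|f\|_{Co^{\infty}_{-1,\infty}} \le \delta \right\}$ (the norm over the contourlet spaces is defined in Section 2.7.4), defined by

$$J^*_{Co}(f) = \begin{cases} 0 & \text{if } f \in Co_1, \\ +\infty & \text{else.} \end{cases} \tag{82}$$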
Then, the following proposition gives the solutions that minimize the previous functional.
FIGURE 20 BV–G–E structures + textures + noise image decomposition of Leopard image (panels: Structures, Textures, Noise).
Proposition 4. Let $u \in BV$, $v \in G_\mu$, $w \in Co_\delta$ be the structures, textures, and noise components derived from the image decomposition. Then the solution

$$(\hat{u}, \hat{v}, \hat{w}) = \arg\inf_{(u,v,w) \in BV \times G_\mu \times Co_\delta} F^{Co}_{\lambda,\mu,\delta}(u, v, w) \tag{83}$$

is given by

$$\hat{u} = f - \hat{v} - \hat{w} - P_{G_\lambda}(f - \hat{v} - \hat{w}),$$
$$\hat{v} = P_{G_\mu}(f - \hat{u} - \hat{w}),$$
$$\hat{w} = f - \hat{u} - \hat{v} - CST(f - \hat{u} - \hat{v}; 2\delta),$$

where $P_{G_\lambda}$ is the Chambolle nonlinear projector and $CST(f, 2\delta)$ is the Contourlet Soft Thresholding operator of $f - u - v$.
Proof. The components $\hat{u}$, $\hat{v}$ are obtained by the same arguments used in the proof of Proposition 3 (this proof is available in Gilles, 2007b). The particular point concerns the expression of $\hat{w}$ in terms of the soft thresholding of the contourlet coefficients. Assume we want to minimize $F^{Co}_{\lambda,\mu,\delta}(u, v, w)$ with respect to $w$; this is equivalent to finding $w$ solution of (we set $g = f - u - v$)

$$\hat{w} = \arg\min_{w \in Co_\delta} \left\{ \|g - w\|^2_{L^2} \right\}. \tag{84}$$

We can replace it by its dual formulation: $\hat{w} = g - \hat{h}$, such that

$$\hat{h} = \arg\min_{h \in Co^{1}_{1,1}} \left\{ 2\delta \|h\|_{Co^{1}_{1,1}} + \|g - h\|^2_{L^2} \right\}. \tag{85}$$

We can use the same approach used by Chambolle et al. (1998). Let $(c_{j,k,n})_{j \in \mathbb{Z},\, 0 \le k \le 2^{(l_j)},\, n \in \mathbb{Z}^2}$ and $(d_{j,k,n})_{j \in \mathbb{Z},\, 0 \le k \le 2^{(l_j)},\, n \in \mathbb{Z}^2}$ denote the coefficients issued from the contourlet expansions of $g$ and $h$, respectively. As contourlets form a tight frame, with a bound of 1, we have (we denote $\Omega = \mathbb{Z} \times [[0, 2^{(l_j)}]] \times \mathbb{Z}^2$)

$$\|g\|^2_{L^2} = \sum_{(j,k,n) \in \Omega} |c_{j,k,n}|^2. \tag{86}$$

Then Eq. (85) can be rewritten as

$$\sum_{(j,k,n) \in \Omega} |c_{j,k,n} - d_{j,k,n}|^2 + 2\delta \sum_{(j,k,n) \in \Omega} |d_{j,k,n}|, \tag{87}$$

which, since the sum decouples over the coefficients, is equivalent to minimizing each term

$$|c_{j,k,n} - d_{j,k,n}|^2 + 2\delta |d_{j,k,n}|. \tag{88}$$

Chambolle et al. (1998) prove that the solution of this kind of problem is the soft thresholding of the coefficients $(c_{j,k,n})$ with $2\delta$ as the threshold. Then $\hat{h} = CST(g, 2\delta)$, which by duality implies that $\hat{w} = g - CST(g, 2\delta)$. We conclude that

$$\hat{w} = f - \hat{u} - \hat{v} - CST(f - \hat{u} - \hat{v}, 2\delta), \tag{89}$$

which ends the proof.
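The shrinkage mechanism behind the per-coefficient problem (88) can be checked by brute force. With the convention $S_t(c) = \mathrm{sign}(c)\max(|c| - t, 0)$, the scalar cost $|c - d|^2 + 2t|d|$ is minimized by $d = S_t(c)$. How the chapter's threshold argument $2\delta$ in $CST(g, 2\delta)$ maps onto $t$ depends on the normalization of the thresholding operator, which is not spelled out in this section, so $t$ below should be read as a generic threshold. The function names and sample coefficients are hypothetical.

```python
import numpy as np


def soft_threshold(c, t):
    """Shrink c toward zero by t: sign(c) * max(|c| - t, 0)."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def scalar_objective(d, c, t):
    """Per-coefficient cost |c - d|^2 + 2 t |d| (same shape as Eq. (88))."""
    return (c - d) ** 2 + 2.0 * t * np.abs(d)


t = 1.0
coeffs = np.array([-3.0, -0.4, 0.0, 0.7, 2.5])
grid = np.linspace(-5.0, 5.0, 100001)  # brute-force candidates for d

# Minimize the scalar cost on the grid, one coefficient at a time.
brute = np.array([grid[np.argmin(scalar_objective(grid, c, t))] for c in coeffs])
closed = soft_threshold(coeffs, t)

print(brute)            # grid minimizers
print(closed)           # closed-form shrinkage
print(coeffs - closed)  # remainder c - S_t(c), i.e. c clipped to [-t, t]
```

The remainder $c - S_t(c)$ is the coefficient clipped to $[-t, t]$. This mirrors the duality step of the proof, where the noise component is what is left after subtracting the thresholded part.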
The corresponding numerical scheme is the same as in the BV –G–E algorithm, except we replace the wavelet expansion by the contourlet expansion in the soft thresholding:
1. Initialization: $u_0 = v_0 = w_0 = 0$,
2. Compute $w_{n+1} = f - u_n - v_n - CST(f - u_n - v_n, 2\delta)$,
3. Compute $v_{n+1} = P_{G_\mu}(f - u_n - w_{n+1})$,
4. Compute $u_{n+1} = f - v_{n+1} - w_{n+1} - P_{G_\lambda}(f - v_{n+1} - w_{n+1})$,
5. If $\max\{|u_{n+1} - u_n|, |v_{n+1} - v_n|, |w_{n+1} - w_n|\} \le \epsilon$ or if we performed $N_{step}$ iterations, then stop the algorithm; else jump to step 2.

FIGURE 21 BV–G–Co structures + textures + noise image decomposition of Barbara image (panels: Structures, Textures, Noise).
Figures 21–23 show the results obtained by replacing wavelets by contourlets. The advantage of using geometric frames is that they preserve the integrity of oriented textures well, as seen in the zoomed images in Figure 24. In this section, we presented many decomposition models. We can imagine the use of other frames and bases like curvelets, cosines, and so on. The idea of decomposing an image by thresholding different basis expansion coefficients corresponds to the recent theory of morphological component
FIGURE 22 BV–G–Co structures + textures + noise image decomposition of House image (panels: Structures, Textures, Noise).
analysis (MCA) (Bobin, Starck, & Moudden, 2007; Bobin, Starck, Fadili, & Donoho, 2007). This approach seeks sparse representations of the different components and is useful for source separation.
5. PERFORMANCE EVALUATION
The previous section described different decomposition models based on specific function spaces. But one question arises: Which is the best one? This section addresses this question by defining well-adapted criteria and their associated metrics. We build a special test image by creating different components separately and then adding them. We will denote by $f_0$ the test image composed of $u_0$ (the structures reference image) + $v_0$ (the textures reference image) + $w_0$ (the noise reference image). We finish by giving the measures obtained for this image.
FIGURE 23 BV–G–Co structures + textures + noise image decomposition of Leopard image (panels: Structures, Textures, Noise).
5.1. Test Image
Because we want to compare the quality of each extracted component, we create specific components: $u_0$ for structures, $v_0$ for textures, and $w_0$ for noise. Textures are built from sine functions over some finite domains; structures are made by drawing some shapes with suitable software such as GIMP. The noise part is simply a Gaussian noise with $\sigma = 20$. The $u_0$ and $v_0$ reference parts and the recomposed test image are shown in Figure 25.
5.2. Evaluation Metrics Assume the test image is composed of known reference images u 0 , v0 , and w0 . We choose the following criteria to measure the decomposition quality: the L 2 -norms of errors u − u 0 and v − v0 , where u and v are the structures and textures components issued from the decomposition. Another quantity
FIGURE 24 Zoomed images for the textured components of wavelet- and contourlet-based algorithms (panels: Wavelet thresholding, Contourlet thresholding).
FIGURE 25 Structures and textures reference images and the recomposed test image.
that is interesting to evaluate is the residual structures + textures present in the noise component $w$. To measure this quantity we prove the following proposition.

Proposition 5. Let $b(i, j)$ denote a Gaussian noise of variance $\sigma^2$ and $d(i, j)$ an image free of noise (we assume that the intercorrelation between $b$ and $d$ is negligible). Let $f = Ad + b$ be a simulated noise + residue image, where $A \in \mathbb{R}$ corresponds to the residue level. Then

$$\|\gamma_f - \gamma_b\|_{L^2} \approx A^2, \tag{90}$$

where $\gamma_f$ and $\gamma_b$ are the autocorrelation functions of $f$ and $b$, respectively.

Proof. We start by calculating the autocorrelation function of $f$:

$$\gamma_f(k, l) = \sum_{(i,j) \in \mathbb{Z}^2} f(i, j)\, f^*(i + k, j + l). \tag{91}$$
FIGURE 26 Residual reference image.
However, we assume that images are real; then $f(i, j) = f^*(i, j)$ and we deduce that

$$\gamma_f(k, l) = \sum_{(i,j) \in \mathbb{Z}^2} \left[A d(i, j) + b(i, j)\right] \left[A d(i + k, j + l) + b(i + k, j + l)\right] \tag{92}$$

$$= \sum_{(i,j) \in \mathbb{Z}^2} A^2 d(i, j)\, d(i + k, j + l) + \sum_{(i,j) \in \mathbb{Z}^2} b(i, j)\, b(i + k, j + l) + \sum_{(i,j) \in \mathbb{Z}^2} \left[A d(i, j)\, b(i + k, j + l) + A d(i + k, j + l)\, b(i, j)\right] \tag{93}$$

$$= A^2 \gamma_d(k, l) + \gamma_b(k, l) + A \left(\gamma_{db}(k, l) + \gamma_{bd}(k, l)\right). \tag{94}$$
Now we examine the norm $\|\cdot\|_{L^2}$ of this autocorrelation function. First, notice that $\gamma_b(k, l) = \sigma^2 \delta(k, l)$ (where $\delta(k, l)$ is the Kronecker symbol) because we assumed that the noise is Gaussian. The statement of the proposition assumed that the intercorrelations are negligible; in practice, it is easy to check that the quantity $A(\gamma_{db}(k, l) + \gamma_{bd}(k, l))$ is negligible compared to $A^2 \gamma_d(k, l)$. We deduce that

$$\gamma_f(k, l) - \gamma_b(k, l) \approx A^2 \gamma_d(k, l); \tag{95}$$

then, by passing to the norm, we get

$$\|\gamma_f - \gamma_b\|_{L^2} \approx A^2 \|\gamma_d\|_{L^2}. \tag{96}$$
To illustrate this proposition, assume that we take the image in Figure 26 as d(i, j) and we generate an image b(i, j) full of gaussian noise (σ = 20). Then we compose the image f = Ad + b for the different values A ∈ {0.05; 0.1; 0.2; 0.3; 0.4; 0.5; 0.6; 0.7; 0.8; 0.9} (this means that more and more residue appears as A increases, see Figure 27 top row).
FIGURE 27 Noisy reference images affected by different residual levels and their associated autocorrelation images (panels: A = 0.05, A = 0.3, A = 0.8).
A       $\|\gamma_f - \gamma_b\|_{L^2}$
0.05    849.093432
0.1     3312.071022
0.2     13099.095280
0.3     29367.800483
0.4     52118.223554
0.5     81350.371724
0.6     117064.247377
0.7     159259.851531
0.8     207937.184693
0.9     263096.247142

[Graph: $\|\gamma_f - \gamma_b\|_{L^2}$ (vertical axis) versus $A$ (horizontal axis).]

FIGURE 28 Results of the measure norm $\|\gamma_f - \gamma_b\|_{L^2}$ for the different values of $A$ (left) and its associated graph.
Figure 28 gives the measured values of $\|\gamma_f - \gamma_b\|_{L^2}$ and shows the associated graph. As announced by the proposition, the norm of the autocorrelation difference grows quadratically with $A$. We will use this metric in the next subsection to evaluate the residual quantity in the noise parts at the output of the different decomposition algorithms.
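The experiment above can be imitated with a minimal Python sketch. It is not the code used for Figure 28: the noise-free image `d` is a hypothetical striped pattern rather than the image of Figure 26, the autocorrelation is the circular one computed with the FFT (an assumption made for brevity), and the function names are illustrative.

```python
import numpy as np

def autocorrelation(image):
    """Circular autocorrelation of a real image, computed with the FFT."""
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(np.abs(spectrum) ** 2))

def residue_measure(f, b):
    """L2 norm of the difference between the autocorrelations of f and b."""
    return np.linalg.norm(autocorrelation(f) - autocorrelation(b))

rng = np.random.default_rng(0)
size = 64
columns = np.arange(size)
# Hypothetical noise-free image d: vertical stripes of amplitude 100.
d = 100.0 * np.sin(2.0 * np.pi * columns / 8.0)[None, :] * np.ones((size, 1))
# Gaussian noise with sigma = 20, as in the text.
b = rng.normal(0.0, 20.0, size=(size, size))

levels = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
measures = [residue_measure(A * d + b, b) for A in levels]

# If the measure behaves like A^2 * ||gamma_d||, then measure / A^2 is
# roughly constant once the cross terms become negligible.
for A, m in zip(levels, measures):
    print(f"A = {A:4.2f}   measure / A^2 = {m / A**2:.4e}")
```

With such a setup the ratio printed in the last column should stabilize as $A$ increases, which is the quadratic behavior stated in Eq. (96).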
TABLE 1 Evaluation measures obtained for all u, v, w decomposition algorithms

Algorithm    $\|\tilde{u} - u_0\|_{L^2}$    $\|\tilde{v} - v_0\|_{L^2}$    $\|\gamma_w - \gamma_{w_0}\|_{L^2}$
$F^{JG}$     792.8                          1844.9                         423.2
$F^{AC2}$    873.5                          2832.4                         423.5
$F^{Co}$     984.6                          1598.6                         255.3
5.3. Image Decomposition Performance Evaluation
In this subsection we apply three-part image decomposition to the test image built in Section 5.1 and use the metrics defined in Section 5.2 to evaluate the performance of the algorithms. In this chapter, we restrict the choice of the different parameters to the ones that give the best visual performance; in the future, a more global test, in terms of parameter variability, could explore the complete behavior of the algorithms. The chosen parameters are
• Algorithm $F^{JG}$: λ = 10, µ1 = 1000, µ2 = 100, and a window size of 3 × 3 pixels,
• Algorithm $F^{AC2}$: λ = 1, µ = 500, and δ = 9.4 (κ = 0.2 and σ = 20),
• Algorithm $F^{Co}$: λ = 1, µ = 500, and δ = 23.5 (κ = 0.5 and σ = 20).
Figure 29 shows the outputs of the different algorithms, while Table 1 gives the corresponding measures. We can see that the BV–G–G-based algorithm $F^{JG}$ has the smallest error for the structures image, but the textures are slightly less well preserved than with the contourlet-based model $F^{Co}$. Its noisy part is of the same quality as that of the wavelet-based model $F^{AC2}$. Moreover, it is clear that the $F^{Co}$ algorithm gives the best denoising performance and has the least residue; it also has the best score for the quality of the textures. Even if its visual quality seems to be close to that of the $F^{JG}$ algorithm, the contourlet-based model has the worst score on the structures component. Globally, as expected, the model based on contourlet expansion gives the best decomposition.
6. CONCLUSION
This chapter provides an overview of structures + textures image decomposition. We also present the extension to the decomposition of noisy images and show that it is necessary to adopt a three-part decomposition model (structures + textures + noise). The different models are based on the bounded-variation space to describe the structures component of an image. The textures are defined by the space G of oscillating functions proposed by Meyer; different strategies can be used for the noise. Some other function spaces can be chosen; most often it is equivalent to choosing the best basis
FIGURE 29 Outputs of the decomposition algorithms. First row: $F^{JG}$ algorithm; second row: $F^{AC2}$ algorithm; last row: $F^{Co}$ algorithm.
or frame to represent the different components. This approach follows the same philosophy as the principle of morphological component analysis recently introduced by the work of Bobin et al. (Bobin, Starck, and Moudden (2007); Bobin, Starck, Fadili, et al. (2007)). An interesting property used in the BV–G–G model is the local adaptability of the algorithm, obtained by choosing a nonconstant parameter ν. Some recent theoretical work on the Besov and Triebel–Lizorkin spaces seems to provide some insight into the local behavior of an image (in terms of local scales). Here this approach is used to improve the quality of the decomposition. The main problem of the decomposition models, and it remains an open question, is the choice of the different parameters. Aujol et al. (2006) propose a method of automatically selecting the parameter λ, but it is very expensive in computing time. We are currently working on possible solutions.
We have proposed a method, which consists of building specific test images and using three different metrics, to evaluate the quality of the components produced by the different decomposition algorithms. The first tests seem to confirm that the model based on the thresholding of contourlet coefficients is the best one. However, more complete tests, based on different test images with different kinds of textures, noise, or structures and on tuning the different parameters, are needed. This could help us to understand completely the behavior of this kind of algorithm. The last topic explored in this study is the application of image decomposition. A previous study (Gilles, 2007a) shows that the BV–G model enhances thin and long structures. We then use the textures component as the input of a road detection algorithm in aerial images. We believe that many applications could be created in the future.
APPENDIX A. CHAMBOLLE'S NONLINEAR PROJECTORS
Chambolle (2004) proposes an algorithm based on a nonlinear projector to solve a certain category of total variation-based functionals. This appendix summarizes this work. Some proofs are provided because they are relevant to the rest of the chapter.
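A.1. Notations and Definitions
We assume the processed image is of size $M \times N$. We denote $X = \mathbb{R}^{M\times N}$ and $Y = X \times X$.

Definition 6. Let $u \in X$; then the discrete gradient of $u$, written $\nabla u \in Y = X \times X$, is defined by
\[ (\nabla u)_{i,j} = \left( (\nabla u)^1_{i,j},\ (\nabla u)^2_{i,j} \right), \tag{97} \]
with, $\forall (i,j) \in [[0, \dots, M-1]] \times [[0, \dots, N-1]]$,
\[ (\nabla u)^1_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < M-1 \\ 0 & \text{if } i = M-1 \end{cases} \tag{98} \]
\[ (\nabla u)^2_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < N-1 \\ 0 & \text{if } j = N-1. \end{cases} \tag{99} \]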
Definition 7. Let $p \in Y$ ($p = (p^1, p^2)$); we define the numerical divergence operator $\operatorname{div} : Y \to X$ such that $\operatorname{div} = -\nabla^*$ ($\nabla^*$ is the adjoint operator of $\nabla$) by the following:
\[ (\operatorname{div} p)_{i,j} = \begin{cases} p^1_{i,j} - p^1_{i-1,j} & \text{if } 0 < i < M-1 \\ p^1_{i,j} & \text{if } i = 0 \\ -p^1_{i-1,j} & \text{if } i = M-1 \end{cases} \;+\; \begin{cases} p^2_{i,j} - p^2_{i,j-1} & \text{if } 0 < j < N-1 \\ p^2_{i,j} & \text{if } j = 0 \\ -p^2_{i,j-1} & \text{if } j = N-1. \end{cases} \tag{100} \]
We recall that $\langle -\operatorname{div} p, u \rangle_X = \langle p, \nabla u \rangle_Y$.
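The two definitions above can be written as a short NumPy sketch. The function names `grad` and `div` and the storage of $p = (p^1, p^2)$ as an array of shape (2, M, N) are choices made here for illustration; the sketch assumes $M, N \ge 2$.

```python
import numpy as np

def grad(u):
    """Discrete gradient of Eqs. (97)-(99); returns an array of shape (2, M, N)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]   # (grad u)^1, zero on the last row
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]   # (grad u)^2, zero on the last column
    return g

def div(p):
    """Discrete divergence of Eq. (100) for p of shape (2, M, N)."""
    d = np.zeros(p.shape[1:])
    # First component, differences along i.
    d[0, :] += p[0, 0, :]
    d[1:-1, :] += p[0, 1:-1, :] - p[0, :-2, :]
    d[-1, :] += -p[0, -2, :]
    # Second component, differences along j.
    d[:, 0] += p[1, :, 0]
    d[:, 1:-1] += p[1, :, 1:-1] - p[1, :, :-2]
    d[:, -1] += -p[1, :, -2]
    return d

# Numerical check of the adjoint relation <-div p, u>_X = <p, grad u>_Y.
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 7))
p = rng.normal(size=(2, 6, 7))
lhs = np.sum(-div(p) * u)
rhs = np.sum(p * grad(u))
print(abs(lhs - rhs))
```

The printed difference should be at the level of floating-point rounding, which is a convenient way to check that a gradient and divergence pair has been implemented consistently.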
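A.2. Total Variation
In the discrete case, the total variation can be written as
\[ J(u) = \sum_{i,j} |(\nabla u)_{i,j}|. \tag{101} \]
[...] $\tau > 0$; $p^0 = 0$; $n > 0$, we get
\[ p^{n+1}_{i,j} = p^n_{i,j} + \tau \left[ \left( \nabla\!\left( \operatorname{div}(p^n) - \frac{g}{\lambda} \right) \right)_{i,j} - \left| \left( \nabla\!\left( \operatorname{div}(p^n) - \frac{g}{\lambda} \right) \right)_{i,j} \right| p^{n+1}_{i,j} \right]. \tag{135} \]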
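Finally, we get the following iterative formulation:
\[ p^{n+1}_{i,j} = \frac{p^n_{i,j} + \tau \left( \nabla\!\left( \operatorname{div}(p^n) - \frac{g}{\lambda} \right) \right)_{i,j}}{1 + \tau \left| \left( \nabla\!\left( \operatorname{div}(p^n) - \frac{g}{\lambda} \right) \right)_{i,j} \right|}. \tag{136} \]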
Chambolle proves the following important theorem.

Theorem 3. If $\tau < \frac{1}{8}$, then $\lambda \operatorname{div}(p^n)$ converges to $P_{G_\lambda}(g)$ when $n \to +\infty$.

In practice, we note that the choice $n = 20$ is sufficient to reach the desired convergence.
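As an illustration, the iteration of Eq. (136) can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names, the array layout of $p$ (shape (2, M, N)), and the default values τ = 0.12 and 20 iterations are assumptions chosen to satisfy τ < 1/8 and to follow the remark on n = 20.

```python
import numpy as np

def grad(u):
    """Discrete gradient (forward differences, zero on the last row/column)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g

def div(p):
    """Discrete divergence, defined so that div = -grad^*."""
    d = np.zeros(p.shape[1:])
    d[0, :] += p[0, 0, :]
    d[1:-1, :] += p[0, 1:-1, :] - p[0, :-2, :]
    d[-1, :] += -p[0, -2, :]
    d[:, 0] += p[1, :, 0]
    d[:, 1:-1] += p[1, :, 1:-1] - p[1, :, :-2]
    d[:, -1] += -p[1, :, -2]
    return d

def chambolle_projection(g, lam, tau=0.12, n_iter=20):
    """Iterate Eq. (136) from p^0 = 0 and return (lam * div(p^n), p^n)."""
    p = np.zeros((2,) + g.shape)
    for _ in range(n_iter):
        q = grad(div(p) - g / lam)
        q_norm = np.sqrt(q[0] ** 2 + q[1] ** 2)   # pointwise |q_{i,j}|
        p = (p + tau * q) / (1.0 + tau * q_norm)
    return lam * div(p), p

rng = np.random.default_rng(1)
g = rng.normal(size=(16, 16))
projection, p = chambolle_projection(g, lam=0.5)
print(np.max(np.sqrt(p[0] ** 2 + p[1] ** 2)))   # the dual field stays in the unit ball
```

Each update keeps $|p_{i,j}| \le 1$ pointwise, which is why the division by $1 + \tau|\cdot|$ can be read as a relaxed projection onto the unit ball.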
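A.4. Extension
The previous result can be extended to the case of the BV–$\mathcal{H}$ functional, where $\mathcal{H}$ is a Hilbert space such that there exists a linear positive symmetric operator $K$ that defines the following inner product over $\mathcal{H}$:
\[ \langle f, g \rangle_{\mathcal{H}} = \langle f, K g \rangle_{L^2}. \tag{137} \]
Then, if we want to minimize
\[ J(u) + \frac{\lambda}{2} \| f - u \|^2_{\mathcal{H}}, \tag{138} \]
we can use the following modified Chambolle projector:
\[ p^{n+1}_{i,j} = \frac{p^n_{i,j} + \tau \left( \nabla\!\left( K^{-1} \operatorname{div}(p^n) - \lambda g \right) \right)_{i,j}}{1 + \tau \left| \left( \nabla\!\left( K^{-1} \operatorname{div}(p^n) - \lambda g \right) \right)_{i,j} \right|}. \tag{139} \]
The corresponding convergence theorem is given below.

Theorem 4. If $\tau < \frac{1}{8 \|K^{-1}\|_{L^2}}$, then $\frac{1}{\lambda} K^{-1} \operatorname{div}(p^n)$ converges to $\hat{v}$ when $n \to +\infty$, and $f - \frac{1}{\lambda} K^{-1} \operatorname{div}(p^n) \to \hat{u}$, where $\hat{u}$ is the minimizer of Eq. (138).

A special case is $K = -\Delta^{-1}$, which corresponds to the Sobolev case $\mathcal{H} = H^{-1}$.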
REFERENCES
Adams, R. (1975). Sobolev spaces. Academic Press.
Aujol, J. (2004). Contribution à l'analyse de textures en traitement d'images par méthodes variationnelles et équations aux dérivées partielles. Doctoral thesis. University of Nice-Sophia Antipolis, France.
Aujol, J., & Chambolle, A. (2005). Dual norms and image decomposition models. International Journal of Computer Vision, 63(1), 85–104.
Aujol, J., & Gilboa, G. (2006). Constrained and SNR-based solutions for TV-Hilbert space image denoising. Journal of Mathematical Imaging and Vision, 26(1–2), 217–237.
Aujol, J., Aubert, G., Blanc-Féraud, L., & Chambolle, A. (2003). Decomposing an image. Application to textured images and SAR images. Technical Report. University of Nice-Sophia Antipolis.
Aujol, J., Gilboa, G., Chan, T., & Osher, S. (2006). Structure-texture image decomposition: Modeling, algorithms, and parameter selection. International Journal of Computer Vision, 67(1), 111–136.
Bamberger, R., & Smith, M. (1992). A filter bank for the directional decomposition of images: Theory and design. IEEE Transactions on Signal Processing, 40(4), 882–893.
Bobin, J., Starck, J.-L., & Moudden, Y. (2007). Sparsity and morphological diversity in blind source separation. IEEE Transactions on Image Processing, 16(11), 2662–2674.
Bobin, J., Starck, J.-L., Fadili, J., & Donoho, D. (2007). Morphological component analysis: An adaptative thresholding strategy. IEEE Transactions on Image Processing, 16(11), 2675–2681.
Burt, P., & Adelson, E. (1983). The Laplacian pyramid as a compact image code. IEEE Transactions on Communication, 31(4), 532–540.
Candès, E. (1998). Ridgelets: Theory and applications. Doctoral thesis. Department of Statistics, Stanford University.
Candès, E., & Donoho, D. (1999). Curvelets: A surprisingly effective nonadaptive representation of objects with edges. Technical Report. Department of Statistics, Stanford University. Available at: http://www.curvelet.org/papers/Curve99.pdf.
Candès, E., Demanet, L., Donoho, D., & Ying, L. (2005). Fast discrete curvelet transforms. Multiscale Modeling and Simulation, 5, 861–899.
Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1–2), 89–97.
Chambolle, A., DeVore, R., Lee, N., & Lucier, B. (1998). Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage. IEEE Transactions on Image Processing, 7, 319–335.
Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: Society for Industrial and Applied Mathematics.
Do, M. (2001). Directional multiresolution image representations. Doctoral thesis. Department of Communication Systems, Swiss Federal Institute of Technology, Lausanne.
Do, M. (2003). Contourlets and sparse image representations. In SPIE conference on wavelet applications in signal and image processing X, San Diego, USA.
Do, M., & Vetterli, M. (2001). Pyramidal directional filter banks and curvelets. In IEEE international conference on image processing (ICIP).
Do, M., & Vetterli, M. (2002). Contourlets: A directional multiresolution image representation. In IEEE international conference on image processing (ICIP).
Do, M., & Vetterli, M. (2003a). The contourlet transform: An efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12), 2091–2106.
Do, M., & Vetterli, M. (2003b). Framing pyramids. IEEE Transactions on Signal Processing, 51, 2329–2342.
Donoho, D., & Duncan, M. (1999). Digital curvelet transform: Strategy, implementation and experiments. Technical Report. Department of Statistics, Stanford University. Available at: http://www.curvelet.org/papers/DCvT99.pdf.
Garnett, J.B., Jones, P.W., Triet, M.L., & Vese, L. (2005). Modeling oscillatory components with the homogeneous spaces $\dot{BMO}^{-\alpha}$ and $\dot{W}^{-\alpha,p}$. Technical Report. UCLA CAM Report 07-21. Available at: ftp://ftp.math.ucla.edu/pub/camreport/cam07-21.pdf.
Gilboa, G., Zeevi, Y., & Sochen, N. (2003). Texture preserving variational denoising using an adaptive fidelity term. In Proceedings of VLSM (pp. 137–144). Available at: http://www.math.ucla.edu/~gilboa/pub/vlsm03.pdf.
Gilles, J. (2007a). Noisy image decomposition: A new structure texture and noise model based on local adaptivity. Journal of Mathematical Imaging and Vision, 28(3), 285–295.
Gilles, J. (2007b). Choix d'un espace de représentation image adapté à la détection de réseaux routiers. In Traitement et Analyse de l'Information: Méthode et Application (TAIMA) Workshop.
Haddad, A. (2005). Méthodes variationnelles en traitement d'image. Doctoral thesis. Ecole Normale Supérieure de Cachan, France.
Härdle, W., Kerkyacharian, G., Picard, D., & Tsybakov, A. (1997). Wavelets, approximation and statistical applications. In Proceeding of Paris–Berlin seminar. http://www.quantlet.com/mdstat/scripts/wav/html/index.html.
Mallat, S. (1999). A wavelet tour of signal processing (2nd ed.). London: Academic Press.
Meyer, Y. (1993). Wavelets: Algorithms and applications. Philadelphia: Society for Industrial and Applied Mathematics.
Meyer, Y. (2001). Oscillating patterns in image processing and in some nonlinear evolution equations. In The fifteenth Dean Jacqueline B. Lewis memorial lectures. Providence, RI: American Mathematical Society.
Osher, S., Sole, A., & Vese, L. (2002). Image decomposition and restoration using total variation minimization and the $H^{-1}$ norm. Multiscale Modeling and Simulation, 1(3), 349–370.
Po, M., & Do, M. (2006). Directional multiscale modeling of images using the contourlet transform. IEEE Transactions on Image Processing, 15(6), 1610–1620.
Rudin, L., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268.
Triebel, H. (1992). Theory of function spaces II. Basel: Birkhäuser.
Triet, M. L., & Vese, L. (2005). Image decomposition using total variation and div(BMO). Multiscale Modeling and Simulation, 4(2), 390–423.
Vese, L. (1996). Variational methods and partial differential equations for image analysis and curve evolution. Doctoral thesis. University of Nice-Sophia Antipolis, France.
Vese, L., & Osher, S. (2002). Modeling textures with total variation minimization and oscillating patterns in image processing. Journal of Scientific Computing, 19(1–3), 553–572.
Vidakovic, B., & Mueller, P. (1991). Wavelets for kids. http://www.isye.gatech.edu/brani/wp/kidsA.ps (introductory course on wavelets).
Chapter 4
The Reverse Fuzzy Distance Transform and its Use when Studying the Shape of Macromolecules from Cryo-Electron Tomographic Data
Stina Svensson
Contents
1. Introduction
2. Preliminaries
   2.1. The Reverse Fuzzy Distance Transform
   2.2. The Centers of Maximal Fuzzy Balls
3. Segmentation Using Region Growing by Means of the Reverse Fuzzy Distance Transform
4. Cryo-Electron Tomography for Imaging of Individual Macromolecules
   4.1. Methods for Analyzing Cryo-Electron Tomography Data
   4.2. Specific Imaging Settings
   4.3. Phantoms Constructed From Structures Deposited in the Protein Data Bank
   4.4. Simulated Data
5. From Electron Tomographic Structure to a Fuzzy Objects Representation
6. Identifying the Subunits of a Macromolecule
7. Identifying the Core of an Elongated Macromolecule
8. Conclusions
Acknowledgments
References
Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
Advances in Imaging and Electron Physics, Volume 158, ISSN 1076-5670, DOI: 10.1016/S1076-5670(09)00009-3. Copyright © 2009 Elsevier Inc. All rights reserved.
1. INTRODUCTION
Information used for many applications in research or industry is collected in the form of digital images. The interpretation of these images can be done by manual visual inspection or in a more (semi-)automatic way by using various computerized methods. The latter are preferable because they potentially increase both the speed and the objectivity of the analysis. Typically scientists are interested in identifying the structures, in the following denoted objects, in the digital image, and in drawing some conclusions regarding, for example, their shape. In some cases, the image acquisition technique forces researchers to base the methods on information in images with low signal-to-noise ratio (SNR). Moreover, in some cases the objects of interest are represented by only a small number of image points. In the first case, it is difficult to locate the border of the object with high accuracy; that is, to decide whether a certain image point belongs to the object or not. In the latter case, the shape of the object is difficult to analyze, as a relatively small change in the positioning of the border may result in a relatively large change of a measured shape feature. In such cases, we can gain robustness by using fuzzy set methods. Partial membership to a set was introduced by Zadeh (1965) and, more in an image analysis context, by Rosenfeld (1979, 1984). The idea is to avoid binary, or crisp, segmentation into object and background and instead assign each image point a membership value related to its degree of "belongingness" to the structure of interest. The segmented fuzzy object is then used in the subsequent analysis. Such analysis is less dependent on small changes in the border, something that may be imposed by a crisp segmentation, as well as more scale invariant. (See, for example, Udupa and Samarasekera, 1996, and followers.)
The approach is well established in the field of medical imaging (see, e.g., the review by Udupa and Saha, 2003). Recently it has also gained interest in the field of electron tomography (ET) (Bongini et al., 2007; Garduño, Wong-Barnum, Volkmann, & Ellisman, 2008). It can be used not only for identification and representation of the objects, but also for extracting more stable measurements; see, for example, Bloch (2005); Bogomolny (1987); Chanussot, Nyström, and Sladoje (2005); Sladoje and Lindblad (2007). When analyzing the shape of an object (crisp or fuzzy), the distance between image points often provides useful information. For example, we might be interested in the thickness of an elongated structure. For discrete images, computing distances between image points in a way that is both computationally convenient and mathematically correct has been a crucial issue. Rosenfeld and Pfaltz (1966) presented the first ideas and were followed by, among others, Borgefors (1986), with an improved approximation of the Euclidean distance. In a distance transform (DT), each point in the object is assigned a value corresponding to the
distance to its closest point in the background. The DT can then be used for subsequent shape analysis. The grey-weighted distance, for which the grey-level of a point is added to the spatial distance to the background, was introduced by Rutovitz (1968). This introduces the question of how to weight the different dimensions compared to each other. Shortly thereafter, Levi and Montanari (1970) introduced another grey-weighted distance, for which the mean of the grey-levels of two neighboring points is multiplied by the spatial distance between them. The latter was put into a theoretical framework and denoted the fuzzy distance transform (FDT) by Saha, Wehrli, and Gomberg (2002). This term will be used in this chapter. Fuzzy is used to stress the fact that distances are computed on a fuzzy object. The grey-weighted distance in Rutovitz (1968) is similar to what is denoted topographic distance by Philipp-Foliguet, Vieira, and De Albuquerque Araújo (2001). The notion of topographic distance was introduced by Meyer (1994) but with a grey-weighting based on the local gradient between two neighboring points. Put into a more general framework, the FDT is equal to the geodesic time with the fuzzy object as a geodesic mask (Soille, 1994). A DT can be used not only to directly extract shape information from the (crisp) object it represents, but also to extract shape descriptors of the object. The shape descriptors can, in turn, be used to facilitate shape measures or to achieve a compact representation of the object for efficient storage of the information in the image. One such representation is the medial axis introduced by Blum (1967), where the object is represented by a curve centrally located in the object. Blum (1967) described this shape descriptor for two-dimensional (2D) objects, but it can also be used for 3D objects, being a medial axis or a medial surface depending on the shape of the object.
A medial representation, such as the medial axis, is well suited to represent an elongated object and can be used, for example, to facilitate measurement of its length. By assigning the distance values in the DT to the points in the medial representation, it also provides information about the thickness of the object. An object can be seen as the union of a number of balls, each defined by its center point and radius. In fact, the distance values in a DT are the radii of balls (for the corresponding distance function) that are subsets of the object. Many of these balls are redundant for the shape, as they are subsets of other balls. A maximal ball is defined as a ball that is not a subset of any other single ball. The center points of such balls, the set of centers of maximal balls (CMBs), give a medial representation of the object, as they constitute a set centrally located in the object (Arcelli & Sanniti di Baja, 1988). By adding a ball to each point in the set, where the ball's radius is equal to the distance value for the point, the object can be recovered. The concept of CMBs can be generalized to a fuzzy setting, where it deals with fuzzy objects, yielding the centers of maximal fuzzy balls (CMFBs). This generalization was described by Svensson (2007a, 2008). Fuzzy is used to indicate that the
balls are extracted from the FDT. Various other aspects of balls in nonconvex domains have been described earlier (e.g., in Bloch, 2000; Sanniti di Baja and Svensson, 2002). As mentioned, a crisp object can be recovered from its CMBs. This recovery process can be efficiently implemented using the reverse distance transformation (see, for example, the work by Nyström and Borgefors, 1995). This concept can also be generalized to a fuzzy setting, resulting in the reverse fuzzy distance transform (RFDT) (Svensson, 2007b). It is of interest not only for the recovery of the (support of a) fuzzy object from its CMFBs, but also because it can be used as a region-growing process for the segmentation of subunits of a fuzzy object, following the approach described for crisp objects in Svensson and Sanniti di Baja (2002). We have previously described the RFDT for 2D and 3D images, and the CMFBs for 2D images (Svensson, 2007a,b, 2008). Here, we recall this theoretical framework and apply it to shape analysis for a specific application. In fact, through collaborations with the Electron Tomography group at the Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden, where cryo-electron tomography (ET) is used for structural biology research, many shape-related issues have arisen in which the mentioned methods can be applied, resulting in a more robust analysis of the 3D reconstructions. As the application in mind concerns 3D images, the theory is described for the 3D case even though it can be applied equally well to 2D images. Two-dimensional images are used as illustrations for easier understanding. In Section 2, we provide some preliminaries, with special focus on the RFDT (Section 2.1) and CMFB (Section 2.2). In Section 3, we show how the RFDT can be used as a region-growing algorithm.
We continue with a description of the application in mind (Section 4) and give some ideas on how a 3D reconstruction can be converted to a fuzzy object (Section 5). The latter is not the focus for this manuscript but still is essential. In Section 6, we show how the region growing by RFDT can be used to identify the subunits of a macromolecular structure. Section 7 describes how CMFBs can be used to find the core of an elongated macromolecular structure.
2. PRELIMINARIES
We recall the definition in Zadeh (1965): Let $X$ be the reference set; then a fuzzy subset $A$ of $X$ is defined as a set of ordered pairs $A = \{(x, \mu_A(x)) \mid x \in X\}$, where $\mu_A : X \to [0, 1]$ is the membership function of $A$ in $X$. A 3D fuzzy digital object $O$ is a fuzzy subset defined on $\mathbb{Z}^3$; that is, $O = \{(v, \mu_O(v)) \mid v \in \mathbb{Z}^3\}$, where $\mu_O : \mathbb{Z}^3 \to [0, 1]$. A voxel $v$ belongs to the support of $O$ if $\mu_O(v) > 0$.

The fuzzy distance between two points $v$ and $u$ in a fuzzy object $O$ is defined as the length of the shortest path between $v$ and $u$. In Levi and Montanari (1970) and Saha et al. (2002), the fuzzy distance $d_O^{\langle 1, \sqrt{2}, \sqrt{3} \rangle} : \mathbb{Z}^3 \times \mathbb{Z}^3 \to \mathbb{R}$ between $v$ and $u$ in $O$ is set to
\[ d_O^{\langle 1, \sqrt{2}, \sqrt{3} \rangle} = \min_{\langle v = v_1, \dots, v_m = u \rangle} \sum_{i=1}^{m-1} \frac{1}{2} \left( \mu_O(v_i) + \mu_O(v_{i+1}) \right) \cdot w(v_{i+1} - v_i), \tag{1} \]
where $w(v_{i+1} - v_i)$ is the spatial Euclidean distance between $v_i$ and $v_{i+1}$; that is, 1 if $v_i$ and $v_{i+1}$ are face neighbors, $\sqrt{2}$ if they are edge neighbors, and $\sqrt{3}$ if they are vertex neighbors. We remark that, for two voxels $v_1 = (x_1, y_1, z_1)$, $v_2 = (x_2, y_2, z_2) \in \mathbb{Z}^3$ for which $\max\{|x_1 - x_2|, |y_1 - y_2|, |z_1 - z_2|\} = 1$, we use the notion face neighbor if $|x_1 - x_2| + |y_1 - y_2| + |z_1 - z_2| = 1$, edge neighbor if $|x_1 - x_2| + |y_1 - y_2| + |z_1 - z_2| = 2$, and vertex neighbor if $|x_1 - x_2| + |y_1 - y_2| + |z_1 - z_2| = 3$.

The FDT can be computed using a sequential algorithm similar to the one used for computing the DT of a 3D binary image (Borgefors, 1996). However, it is computationally inefficient because repeated sets of forward and backward scans over the image are necessary, as opposed to binary images, where the DT of an nD image can be calculated in two scans. The number of iterations depends on the problem domain and may be large. Therefore, in Saha et al. (2002), an approach using sorted priority queues is proposed. As there is no difference in the results, we use a sequential algorithm for simplicity, even if it is more time consuming. The Euclidean distance is approximated using weights for the local distance between two neighboring voxels. If a 3 × 3 × 3 neighborhood is taken into account, the $\langle 3, 4, 5 \rangle$ distance is preferably used, where 3 is the weight to a face neighbor, 4 the weight to an edge neighbor, and 5 the weight to a vertex neighbor (Borgefors, 1996). To compute the FDT for a fuzzy object $O$, the initial values, $D_0^F$, of the points in the support of $O$ are set to infinity and those of the remaining points to 0. Increasing fuzzy distances are then locally propagated over $O$ in forward and backward scans. For the $\langle 3, 4, 5 \rangle$ FDT, using Eq. (1) is straightforward.
In scan $i$, $i \ge 1$, a voxel $v \in O$ is assigned the temporary distance value $D_i^F(O, \langle 3, 4, 5 \rangle, v)$:
\[ D_i^F(O, \langle 3, 4, 5 \rangle, v) = \min_{u \in \mathrm{scan}_i} \left[ D_{i-1}^F(O, \langle 3, 4, 5 \rangle, v + u) + \frac{1}{2} \left( \mu_O(v) + \mu_O(v + u) \right) \cdot w(u) \right], \tag{2} \]
where $w(u)$ refers to the spatial distance weight used for the neighbor $u$, given in local coordinates around $v$, and $\mathrm{scan}_i$ to the set of neighbors used in a forward scan ($i$ odd) and in a backward scan ($i$ even). The weights are shown in Figure 1. The thick lines separate the neighborhood into the part
[Figure 1: weight masks for the three slices of the 3 × 3 × 3 neighborhood]

z = 1        z = 0        z = −1
5 4 5        4 3 4        5 4 5
4 3 4        3 0 3        4 3 4
5 4 5        4 3 4        5 4 5
FIGURE 1 Weights used for computation of the FDT taking into account a 3 × 3 × 3 neighborhood. The thick lines indicate which part is used in the forward (z = −1 and upper part of z = 0) and backward scan (lower part of z = 0 and z = 1).
used in the forward (front and upper) and in the backward scan (lower and back). The central voxel is included in both scans. The process is repeated until, in scan $l + 1$, no further updating of voxel values is performed. The result is an image (FDT) in which $v$ is assigned the value $D^F(O, \langle 3, 4, 5 \rangle, v) = D_l^F(O, \langle 3, 4, 5 \rangle, v)$, that is, its fuzzy $\langle 3, 4, 5 \rangle$ distance from the complement of $O$. Note that, throughout this chapter, $D^F$ is used to denote the value of a voxel in an FDT and $d_O$ to denote the underlying distance function used to calculate an FDT.

The methods described in this chapter are applied to a fuzzy object $O$. Hence, to have a complete process starting directly from the provided data, algorithms are needed to extract a fuzzy object representation from the structure of interest. This can be done in various ways. How this is done in our case is described in Section 5. There we use the membership function often referred to as fuzzy connectedness, $c_A$ (Rosenfeld, 1979). For $c_A$, the strength of a contiguous path between two voxels $u$ and $v$ in a fuzzy subset $A$ is defined as the smallest membership value along the path, and the degree of connectedness $c_A(u, v)$ as the strength of the strongest path between $u$ and $v$; that is,
\[ c_A(u, v) = \max_{p \in \pi_{uv}} \left[ \min_{e \in E(p)} \mu_A(e) \right], \tag{3} \]
where $\pi_{uv}$ is the set of all paths between $u$ and $v$ and $E(p)$ is the set of all voxels along the path $p$. This process can be implemented using a raster scan technique similar to what is described for the FDT above (Smedby, Svensson, & Löfstrand, 1999) or using a dynamic programming approach (Udupa & Samarasekera, 1996).
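The sequential FDT computation of Eq. (2) can be illustrated with a minimal Python sketch. For brevity it is written for 2D images with the $\langle 3, 4 \rangle$ weights rather than the 3D $\langle 3, 4, 5 \rangle$ weights; the function names are illustrative, and the sketch assumes that the support of the object does not touch the image border (neighbors outside the image are simply ignored).

```python
import numpy as np

# (row offset, column offset, weight) for the 2D <3,4> half-masks.
FORWARD_MASK = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]
BACKWARD_MASK = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]

def _scan(mu, dist, row_order, col_order, mask):
    """One raster scan of Eq. (2); returns True if any value was lowered."""
    rows, cols = mu.shape
    changed = False
    for r in row_order:
        for c in col_order:
            if mu[r, c] <= 0:
                continue  # background pixels keep the value 0
            best = dist[r, c]
            for dr, dc, weight in mask:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    step = 0.5 * (mu[r, c] + mu[rr, cc]) * weight
                    best = min(best, dist[rr, cc] + step)
            if best < dist[r, c]:
                dist[r, c] = best
                changed = True
    return changed

def fuzzy_distance_transform(mu):
    """<3,4> FDT of a 2D membership image mu with values in [0, 1]."""
    mu = np.asarray(mu, dtype=float)
    rows, cols = mu.shape
    # Support pixels start at infinity, all other pixels at 0.
    dist = np.where(mu > 0, np.inf, 0.0)
    while True:
        forward = _scan(mu, dist, range(rows), range(cols), FORWARD_MASK)
        backward = _scan(mu, dist, range(rows - 1, -1, -1),
                         range(cols - 1, -1, -1), BACKWARD_MASK)
        if not (forward or backward):
            return dist  # no update in a full forward/backward pair

# Small example: a crisp 3 x 3 square (membership 1) on a 5 x 5 background.
square = np.zeros((5, 5))
square[1:4, 1:4] = 1.0
print(fuzzy_distance_transform(square))
```

Because every step cost is linear in the membership values, scaling all memberships by a constant scales the whole FDT by the same constant, which gives a simple sanity check for an implementation.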
2.1. The Reverse Fuzzy Distance Transform
A (crisp) object $O \subset \mathbb{Z}^n$ can be seen as the union of a set of balls $B_i(d, c_i, r_i) = \{x \in \mathbb{Z}^n \mid d(c_i, x) \le r_i\}$ for some distance function $d : \mathbb{Z}^n \times \mathbb{Z}^n \to \mathbb{R}$ with center points $c_i \in \mathbb{Z}^n$ and radii $r_i \in \mathbb{R}$, $i = 1, \dots, m$. Hence, $O$ can be represented by a set $S(O, d) = \{(c_i, r_i) \mid i = 1, \dots, m\}$, as $O$ is recovered from $S$ by placing a ball of radius $r_i$ at each $c_i$. This recovery process is efficiently implemented using the reverse distance transform (RDT). The RDT is computed by propagating local distance information in two scans over the image, starting from a set of points each assigned a distance value, for example, $S$. It results in a grey-level image in which each point assigned a distance value belongs to at least one of the balls. For details, see, for example, Nyström and Borgefors (1995).

The RDT can be generalized to a fuzzy setting, resulting in the RFDT. To compute the RFDT starting from a set $A = \{(c_i, r_i) \mid i = 1, \dots, m\}$, where $c_i \in O$ and $r_i \in \mathbb{Z}$, we assign initial values, $RD_0^F$, to all image points. Points in $A$ are assigned their respective $r_i$, while the remaining points are set to 0. Decreasing fuzzy distances are then propagated on $O$. For the $\langle 3, 4, 5 \rangle$ RFDT, and scan $i$, $i \ge 1$, a voxel $v \in O$ is assigned the temporary distance value $RD_i^F(O, \langle 3, 4, 5 \rangle, v)$:
\[ RD_i^F(O, \langle 3, 4, 5 \rangle, v) = \max_{u \in \mathrm{scan}_i} \left[ RD_{i-1}^F(O, \langle 3, 4, 5 \rangle, v + u) - \frac{1}{2} \left( \mu_O(v) + \mu_O(v + u) \right) \cdot w(u) \right], \tag{4} \]
where $w(u)$ refers to the spatial distance weight (see Figure 1) used for the neighbor $u$, given in local coordinates around $v$, and $\mathrm{scan}_i$ to the set of neighbors used in a forward scan ($i$ odd) and in a backward scan ($i$ even). The process is repeated until, in scan $l + 1$, no further updating of voxel values is performed. The result is an image (RFDT) in which each voxel $v$ is assigned the value $RD^F(O, \langle 3, 4, 5 \rangle, v) = RD_l^F(O, \langle 3, 4, 5 \rangle, v)$; that is, its reverse fuzzy $\langle 3, 4, 5 \rangle$ distance to its closest voxel in $A$.

To illustrate the effect of the RFDT and also to compare it with the RDT, we have constructed a set of 2D fuzzy objects $O_i$, $i = 1, \dots, 4$ (Figure 2, left column, rows one to four).
Each $O_i$ corresponds to the union of two balls ($O_i'$ and $O_i''$), where the radius of the support of $O_i'$ (left) is larger than the radius of the support of $O_i''$ (right). The border of each ball has linearly decreasing membership values. Two different slopes have been used, giving different degrees of fuzziness. The objects $O_i$, $i = 1, \dots, 4$, have been constructed in such a way that $O_1'$ and $O_1''$ both have weak slopes; $O_2'$ has a weak slope and $O_2''$ has a strong slope; $O_3'$ has a strong slope and $O_3''$ has a weak slope; and $O_4'$ and $O_4''$ both have strong slopes. Three crisp objects, $O_5$, $O_6$, and $O_7$, created by taking $\mu_{O_1} > 0$, $\mu_{O_2} > 0.6$, and $\mu_{O_3} > 0.6$, respectively, are also shown in Figure 2 (left column, rows five to seven). In Figure 2 (middle), the $\langle 3, 4 \rangle$ FDT and $\langle 3, 4 \rangle$ DT for $O_i$ for $i = 1, \dots, 4$, and $i = 5, \dots, 7$, respectively, are shown. (We remark that $\langle 3, 4 \rangle$ FDT and $\langle 3, 4 \rangle$ DT correspond, for 3D images, to the
Stina Svensson
FIGURE 2 Left: Fuzzy objects O_i, i = 1, . . . , 4, and crisp objects O_i, i = 5, 6, 7. Middle: The ⟨3, 4⟩ FDT for O_i, i = 1, . . . , 4, and the ⟨3, 4⟩ DT for O_i, i = 5, 6, 7. Right: The pixels, shown in grey and white, reached by the RFDT when applied to A_i for O_i, i = 1, . . . , 4, and by the RDT when applied to A_i for O_i, i = 5, 6, 7. The sets A_i, i = 1, . . . , 7, are shown overlaid in blue. The support of O_1 is outlined in green. (See Color Insert.)
above-described ⟨3, 4, 5⟩ FDT and ⟨3, 4, 5⟩ DT.) For each O_i, i = 1, . . . , 7, we have a set A_i(O_i, d_Oi) = {(c′_i, r′_i), (c″_i, r″_i)}, where r′_i and r″_i are set equal to the shortest fuzzy ⟨3, 4⟩ distance from c′_i and c″_i, respectively, to the complement of the support of O_i, for i = 1, . . . , 4, and to the shortest ⟨3, 4⟩ distance from c′_i and c″_i, respectively, to the complement of O_i, for i = 5, . . . , 7. The sets A_i, i = 1, . . . , 7, are shown in blue in Figure 2 (right column). In this case, A_i does not represent O_i, i = 1, . . . , 7, completely. The pixels reached when applying the RFDT for i = 1, . . . , 4, and the RDT for i = 5, . . . , 7,
The Reverse Fuzzy Distance Transform
respectively, to A_i, i = 1, . . . , 7, are shown in grey, corresponding to pixels closest to c′_i, and white, corresponding to pixels closest to c″_i, in Figure 2 (right column). Due to the choice of r′_i and r″_i, pixels on the border of the support of O_i are reached, but no pixels outside. For O_i, i = 1, . . . , 4, we see the effect of the fuzziness of the border of O′_i and O″_i. When different slopes have been used for the two balls, that is, for O_2 in row two and O_3 in row three, the border between the reached regions is slightly shifted toward the left (O_2) or the right (O_3). When the same slope has been used, that is, for O_1 in row one and O_4 in row four, the border between the reached regions is located in the same position, but the size of the regions differs. In the case of a weak slope, that is, for O_1, a smaller region is reached than for the strong slope, that is, for O_4. To a certain extent we can see the same effect for the crisp objects O_i, i = 5, . . . , 7, but there it depends on the threshold of the membership function used to create the objects. By using a fuzzy setting, we avoid this dependency and can directly analyze the (fuzzy) shape of the structure of interest. In Figure 2, the support of O_1, which is equal to the support of O_i, i = 2, . . . , 5, is outlined in green for easier comparison. For implementational aspects of the RFDT, we refer to Svensson (2007b). Note that for the computation of the local fuzzy distance between two voxels v and u required in the algorithm, µ_O(v) and µ_O(u) need to be known. Hence, the RFDT is actually computed on O. Saha et al. (2002) suggested using the Euclidean distance as the spatial local distance between voxels when computing the FDT [Eq. (1)]. We use Eq. (1) for the computation of the FDT and the RFDT but with the ⟨3, 4, 5⟩ distance instead of the Euclidean distance, following the concept of weighted DTs for binary images (Borgefors, 1996).
By using the ⟨3, 4, 5⟩ distance, we can work with integer numbers and still achieve a good approximation of the Euclidean distance. The RDT, as well as the RFDT, can be used as a region-growing process. We start from a set of seeds, each of which has a distance value as well as a label. The label is propagated together with decreasing distance information. The result is that points in the object are labeled with the label of the closest seed point. In fact, this type of label propagation was used for the examples shown in Figure 2 and is used in Section 3.
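The propagation rule in Eq. (4) can be sketched in a few lines. The following is a minimal illustration, not the implementation referred to in Svensson (2007b): it works in 2D with ⟨3, 4⟩ weights instead of the 3D ⟨3, 4, 5⟩ weights, and it assumes that a forward and a backward raster scan are alternated until no value changes. The function name `rfdt_34` and the data layout (nested lists for the membership values, a dictionary for the seed set A) are choices made for this example only.

```python
# Minimal 2D sketch of the reverse fuzzy distance transform (RFDT), Eq. (4),
# using <3,4> weights. Names and data layout are illustrative assumptions.

# (row offset, column offset, weight) for the forward and backward scans.
FORWARD = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]
BACKWARD = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]


def rfdt_34(mu, seeds):
    """mu: 2D list of membership values in [0, 1].
    seeds: dict {(row, col): radius}, the set A of centers and radii.
    Returns a 2D list; pixels with a value > 0 are the ones reached."""
    rows, cols = len(mu), len(mu[0])
    rd = [[0.0] * cols for _ in range(rows)]      # initial values RD_0
    for (r, c), radius in seeds.items():
        rd[r][c] = float(radius)

    changed = True
    while changed:                                # repeat until a scan pair changes nothing
        changed = False
        passes = (
            (FORWARD, range(rows), range(cols)),
            (BACKWARD, range(rows - 1, -1, -1), range(cols - 1, -1, -1)),
        )
        for mask, row_order, col_order in passes:
            for r in row_order:
                for c in col_order:
                    if mu[r][c] <= 0:             # propagate on the support of O only
                        continue
                    for dr, dc, w in mask:
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and mu[rr][cc] > 0:
                            # neighbor value minus the local fuzzy distance
                            cand = rd[rr][cc] - 0.5 * (mu[r][c] + mu[rr][cc]) * w
                            if cand > rd[r][c]:   # keep the maximum, as in Eq. (4)
                                rd[r][c] = cand
                                changed = True
    return rd


print(rfdt_34([[1.0] * 7], {(0, 3): 7}))
# [[0.0, 1.0, 4.0, 7.0, 4.0, 1.0, 0.0]]
print(rfdt_34([[0.5] * 7], {(0, 3): 7}))
# [[2.5, 4.0, 5.5, 7.0, 5.5, 4.0, 2.5]]
```

With the same seed radius, lower membership values make each step cheaper, so the seed reaches farther. This is how the membership values of O enter the reverse transform.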
2.2. The Centers of Maximal Fuzzy Balls
Given a (crisp) object O ∈ Z^n, a distance function d : Z^n × Z^n → R, and the corresponding DT computed on O, the distance value of a point c ∈ O in the DT can be interpreted as the radius r of a ball B(d, c, r) = {x ∈ Z^n | d(c, x) ≤ r} such that B ⊆ O and B(d, c, r + ε) ⊄ O, ε > 0. Let DTB = {B_i(d, c_i, r_i) | i = 1, . . . , m}, where c_i are all the points in O, r_i are the respective values in the DT, and m is the number of points in O. Hence, O = ⋃_{i=1}^{m} B_i, where B_i ∈ DTB.
B^M is denoted a maximal ball and c^M a center of a maximal ball, if for all B_i ∈ DTB, B_i ⊅ B^M, i = 1, . . . , k. Thus, ⋃_{i=1}^{k} B^M_i is equal to O. This means that O can be represented by its set of centers of maximal balls, denoted CMB(O, D), and O can be recovered from CMB(O, D) using the RDT. CMB(O, D) can be identified in one scan over the DT by value comparison. This is due to the fact that CMB(O, D) consists of the points in the DT that do not propagate distance information to neighboring points; that is, CMB(O, D) consists of "local maxima." Considering O ∈ Z^3 and the ⟨3, 4, 5⟩ distance, a voxel v belongs to CMB(O, ⟨3, 4, 5⟩) if, for every element n_i, i = 1, . . . , 26, in the neighborhood, given in local coordinates around v, with their respective weights w(n_i) ∈ {3, 4, 5},

\[
D(O, \langle 3,4,5\rangle, v + n_i) < D(O, \langle 3,4,5\rangle, v) + w(n_i), \qquad (5)
\]
where D(O, ⟨3, 4, 5⟩, v) is the distance value of voxel v found in the ⟨3, 4, 5⟩ DT of O. Special treatment is necessary for voxels with small values to avoid detecting false CMBs. For the ⟨3, 4, 5⟩ distance, voxels with value 3 are considered as having value 1 while performing the comparison (Arcelli & Sanniti di Baja, 1988). The reason is that not all distance values can occur in the ⟨3, 4, 5⟩ DT, where no voxels can have values 1 or 2. The identification process of CMB(O, ⟨3, 4, 5⟩) is valid for other DTs, with suitable adjustment of the condition for voxels with small distance values. CMB(O, D) is not only a compact representation of O, but is also often used as the set of nonremovable points in skeletonization (a commonly used medial representation) to guarantee that the object can be recovered from its CMB(O, D) (Sanniti di Baja, 1994). Once the RFDT has been introduced, we can use it to define what we will denote as the CMFB for a fuzzy object O. It is of interest to consider the concept of CMFBs for O since they can be used, for example, for fuzzy object-based skeletonization, representing the most internal structure of O. A (fuzzy) object O ∈ Z^n can to some extent be treated as a set of balls B^F_i(d_O, c_i, r_i) = {x ∈ Z^n | d_O(c_i, x) ≤ r_i} for some distance function d_O : Z^n × Z^n → R, with center points c_i ∈ Z^n and radii r_i ∈ R, i = 1, . . . , n, where the union of the set, ⋃_{i=1}^{n} B^F_i, is equal to the support of O. In the fuzzy case, we cannot, as in the crisp case, say that O can be represented by a set S(O, d_O) = {(c_i, r_i) | i = 1, . . . , m}, as B^F_i is dependent not only on c_i and r_i but also on O itself. Despite this, the concept is still of importance. The support of O can be obtained from S by means of the corresponding RFDT. We extend the concept of CMBs (the crisp case) to a fuzzy framework by introducing the CMFBs, denoted CMFB(O, D^F), where d_O : Z^n × Z^n → R is the underlying fuzzy distance function and D^F are the values in the corresponding FDT.
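The crisp criterion in Eq. (5), including the small-value rule, can be sketched as follows. This is an illustrative reading rather than a reference implementation: the function name `cmb_345` and the nested-list layout are assumptions, voxels outside the image are treated as background, and the replacement of value 3 by 1 is applied to the center voxel only, which is one interpretation of the rule stated above.

```python
# Sketch of CMB detection in a <3,4,5> distance transform, following Eq. (5).
from itertools import product


def cmb_345(dt):
    """dt: 3D nested list dt[z][y][x] holding a <3,4,5> DT (0 = background).
    Returns the list of voxels (z, y, x) that satisfy Eq. (5)."""
    nz, ny, nx = len(dt), len(dt[0]), len(dt[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    centers = []
    for z, y, x in product(range(nz), range(ny), range(nx)):
        value = dt[z][y][x]
        if value <= 0:
            continue                              # background voxel
        if value == 3:
            value = 1                             # small-value rule (assumed: center only)
        is_center = True
        for dz, dy, dx in offsets:
            zz, yy, xx = z + dz, y + dy, x + dx
            inside = 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx
            neighbor = dt[zz][yy][xx] if inside else 0
            weight = 2 + abs(dz) + abs(dy) + abs(dx)   # face 3, edge 4, vertex 5
            if not neighbor < value + weight:     # Eq. (5) violated for this neighbor
                is_center = False
                break
        if is_center:
            centers.append((z, y, x))
    return centers


# A 3 x 3 x 3 cube inside a 5 x 5 x 5 image: outer layer has value 3, center 6.
dt = [[[0] * 5 for _ in range(5)] for _ in range(5)]
for z, y, x in product(range(1, 4), repeat=3):
    dt[z][y][x] = 3
dt[2][2][2] = 6
print(cmb_345(dt))  # [(2, 2, 2)]
```

Without the small-value rule, the eight corner voxels of the cube would also pass the test (6 < 3 + 5), even though their balls are contained in the ball of the central voxel.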
Analogous to the crisp case, we define CMFB(O, D^F) to be the points that do not propagate distance information to neighboring points while calculating the FDT. CMFB(O, D^F) can be detected in one scan over the FDT of O by value comparison, taking into account also the
membership values in O. To be more precise, considering O ∈ Z^3 and the fuzzy ⟨3, 4, 5⟩ distance, a voxel v belongs to CMFB(O, ⟨3, 4, 5⟩) if, for every element n_i, i = 1, . . . , 26, in the neighborhood, with their respective weights w(n_i) ∈ {3, 4, 5},

\[
D^F(O, \langle 3,4,5\rangle, v + n_i) < D^F(O, \langle 3,4,5\rangle, v) + \tfrac{1}{2}\bigl(\mu_O(v) + \mu_O(v + n_i)\bigr) \cdot w(n_i), \qquad (6)
\]
where D^F(O, ⟨3, 4, 5⟩, v) is the distance value at v in the ⟨3, 4, 5⟩ FDT. The identification of CMFB(O, D^F) does not require any special treatment of small fuzzy distance values. This is a difference from the crisp case described above. (Note that this differs from what was earlier stated in Svensson, 2008, a discovery made after the publication was in press.) Following the definition, given the B^F(d_O, c, r) for which (c, r) represents a voxel v ∈ CMFB(O, D^F), there exists no other voxel u ∈ CMFB(O, D^F), with (c′, r′), such that the support of B^F(d_O, c, r) is a subset of the support of B^F(d_O, c′, r′). However, the support of B^F(d_O, c, r) can be a subset of the union of the supports of a set of B^F(d_O, c_i, r_i), i = 1, . . . , n (corresponding to voxels v_1, . . . , v_n in CMFB(O, D^F)). Hence, it is possible to find a smaller set that actually recovers the support of O. (We will see in the following text that a reduction of CMFB(O, D^F) is actually of interest.) However, such a reduction process is not a trivial task. The most brute-force way is to calculate the RFDT from CMFB(O, D^F) on a leave-one-out basis and compare its support with the support of O, meaning that the RFDT must be computed a number of times corresponding to the number of voxels in CMFB(O, D^F). In Borgefors and Nyström (1997), a more efficient reduction process was described for the crisp case. The examples shown here use the brute-force way, as a generalization of the algorithm in Borgefors and Nyström (1997) to a fuzzy setting is currently not available. We will denote the reduced set CMFBR(O, D^F). To show the advantages of a fuzzy approach compared with a crisp one, we use the example in Figure 3. The fuzzy object O_8 (Figure 3, left, top row) is composed of six balls of different fuzziness, but with supports having the same radius. The CMFB(O_8, ⟨3, 4⟩) is shown in blue.
Two crisp objects, O_9 and O_10, created by taking µ_O8 > 0 and µ_O8 > 0.4, respectively, are shown in Figure 3 (left, middle and bottom row), with CMB(O_9, ⟨3, 4⟩) and CMB(O_10, ⟨3, 4⟩) in blue. The support of O_8 is outlined in green for comparison. As can be seen, CMFB(O_8, ⟨3, 4⟩) provides a robust representation, still reflecting the fuzziness of the border of the balls, as the vertical part centrally located in each of the balls becomes more evident as the fuzziness decreases. CMB(O_9, ⟨3, 4⟩) and CMB(O_10, ⟨3, 4⟩) present a different situation: O_9 consists of balls having the same radius and, thus, CMB(O_9, ⟨3, 4⟩) has a similar constitution throughout the object.
FIGURE 3 Top row: A fuzzy object O_8 with CMFB(O, ⟨3, 4⟩) shown in blue (left) and with CMFBR(O, ⟨3, 4⟩) (right). Middle row: A crisp object O_9 with CMB(O, ⟨3, 4⟩) (left) and with CMBR(O, ⟨3, 4⟩) (right). Bottom row: A crisp object O_10 with CMB(O, ⟨3, 4⟩) (left) and with CMBR(O, ⟨3, 4⟩) (right). The support of O_8 is outlined in green. (See Color Insert.)
O_10 consists of balls with increasing radii, and, thus, CMB(O_10, ⟨3, 4⟩) gives a representation that varies more. Again, the vertical part becomes more evident as the radius increases. However, as for Figure 2, this is dependent on the threshold of the membership function used to create O_10. For comparison, Figure 3 also shows CMFBR(O_8, ⟨3, 4⟩), CMBR(O_9, ⟨3, 4⟩), and CMBR(O_10, ⟨3, 4⟩) (right column). The reduction process removes the spurious pixels in the far right ball for CMFB(O_8, ⟨3, 4⟩). In this case, the effect of the reduction is not that evident. However, as seen in other cases, especially for real data, it is crucial to remove unnecessary points to achieve a representation more suitable for subsequent analysis. In Figure 4a and d, two fuzzy objects O_11 and O_12 are shown with CMFBR(O_11, ⟨3, 4⟩) and CMFBR(O_12, ⟨3, 4⟩) in blue and the support of O_11 and O_12 outlined in green, respectively. O_11 shows that the internal grey-level structure is enhanced by the described fuzzy approach. The support of O_11 is a ball. Internally, O_11 has an ellipsoidal region of higher membership values. CMFBR(O_11, ⟨3, 4⟩) reflects both of these aspects. This is a property that cannot be achieved by using a crisp approach. For comparison, we show two crisp objects O_13 and O_14 (Figure 4b and c, respectively) with CMBR(O_13, ⟨3, 4⟩) and CMBR(O_14, ⟨3, 4⟩) in blue and the support of O_11 outlined in green. O_13 and O_14 have been created by taking µ_O11 > 0 and µ_O11 > 0.5, respectively. CMBR(O_13, ⟨3, 4⟩) gives a suitable representation of the support of O_11, while CMBR(O_14, ⟨3, 4⟩) emphasizes one aspect of the internal structure of O_11. Hence, in a crisp setting we need to choose which information is the most important. In a fuzzy setting, we can instead consider both aspects. O_12 illustrates another important aspect. It consists of a set of equally sized ellipsoids, all with a fuzzy border, placed at increasing distance from each other.
Because of this, the border of the support of O_12 has concavities of increasing size. Especially for a crisp approach, jaggedness
FIGURE 4 Two fuzzy objects (a) O_11 and (d) O_12 with CMFBR(O_i, ⟨3, 4⟩) in blue and the support of O_i outlined in green (i = 11, 12). Two crisp objects (b) O_13 and (c) O_14 with CMBR(O_i, ⟨3, 4⟩) in blue and the support of O_i outlined in green (i = 13, 14). O_13 and O_14 are constructed by thresholding of µ_O11. (See Color Insert.)
of a border will result in many spurious CMBs located around the main axis. CMFBR(O_12, ⟨3, 4⟩) shows the effect only when the concavities appear more evident.
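The CMFB criterion in Eq. (6) differs from the crisp test only in the membership-weighted step cost. The sketch below shows this in 2D with ⟨3, 4⟩ weights, the setting used for the figures. The function name `cmfb_34`, the nested-list layout, and the treatment of pixels outside the image as background (membership 0, distance 0) are assumptions made for illustration. The FDT is taken as given input.

```python
# Sketch of CMFB detection in a 2D <3,4> fuzzy distance transform, Eq. (6).

NEIGHBORS_34 = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3),
                (0, 1, 3), (1, -1, 4), (1, 0, 3), (1, 1, 4)]


def cmfb_34(mu, fdt):
    """mu: 2D list of membership values; fdt: 2D list with the <3,4> FDT of mu.
    Returns the pixels (row, col) in the support that satisfy Eq. (6)."""
    rows, cols = len(mu), len(mu[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            if mu[r][c] <= 0:
                continue                          # outside the support of O
            is_center = True
            for dr, dc, w in NEIGHBORS_34:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                mu_n = mu[rr][cc] if inside else 0.0
                d_n = fdt[rr][cc] if inside else 0.0
                # local fuzzy distance between the pixel and this neighbor
                step = 0.5 * (mu[r][c] + mu_n) * w
                if not d_n < fdt[r][c] + step:    # Eq. (6) violated
                    is_center = False
                    break
            if is_center:
                centers.append((r, c))
    return centers


# One-row example: a fuzzy "ball" with a soft border.
mu = [[0.0, 0.5, 1.0, 0.5, 0.0]]
fdt = [[0.0, 0.75, 3.0, 0.75, 0.0]]   # 0.75 = 0.5*(0.5+0)*3, 3.0 = 0.75 + 0.5*(1+0.5)*3
print(cmfb_34(mu, fdt))               # [(0, 2)]
```

The comparison is strict, so with floating-point FDT values a practical implementation may want a small tolerance. The example values above are exactly representable, so no tolerance is needed here.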
3. SEGMENTATION USING REGION GROWING BY MEANS OF THE REVERSE FUZZY DISTANCE TRANSFORM
Image segmentation is the process used to define the relevant structures (the objects) in an image by separating them from each other and from the nonrelevant parts, the background. It is a crucial step in the analysis of a digital image and is often, despite years of research, the most difficult part. Many segmentation algorithms are based on the concept of region growing. Seeds, which are a set of points carrying information about both position and identity label, are placed inside the potential objects, either manually or by some automatic process. The seed of a certain label is then allowed to propagate its label onto neighboring points, if they are similar enough according to some cost function, in an iterative manner. This process is done in parallel for all seeds. The label propagation from a seed terminates when a propagation front originating from a seed with a different label is reached or when there are no more neighboring points that are similar enough. The cost function can be, for example, based on grey-level homogeneity in the region corresponding to the seed or on gradient magnitude information extracted from the image. Methods that originate from this basic idea are, for example, level set based segmentation (Sethian, 1999) and watershed segmentation (WS) (Beucher & Lantuejoul, 1979). We here propose to use region growing by means of the RFDT. In this section, we assume that the
seeds are given. In Section 6, we show how seed detection can be used for a specific application. Given a fuzzy object O and with the aim to segment it, or rather decompose it, into relevant subunits with prior knowledge about seeds, we suggest emphasizing the shape further than other methods by using the RFDT as a region-growing process. Initially, we have O, its FDT calculated by d_O, and a set of labeled seeds L_0(O, D^F) = {(C_0i, R_0i, l_i) | i = 1, . . . , m}, where C_0i ⊂ O, and l_i and R_0i are their labels and distance values in the FDT, respectively. L_0 can be, for example, the local grey-level maxima in the FDT. The RFDT of L_0 is computed on O, where labels are propagated together with decreasing fuzzy distance information. By this, a subset of the points in O is assigned a label. The process is repeated using L_j(O, D^F) = {(C_ji, R_ji, l_i) | i = 1, . . . , m} as input, where C_ji ⊂ O are the points labeled l_i by the previous steps 0, . . . , j − 1, and R_ji are their corresponding distance values in the FDT. After a number of iterations, dependent on O and L_0, all points in O are assigned a label. We note that this process allows us to incorporate information about O and at the same time use fuzzy distance information (i.e., shape) in the region growing. When region growing is applied to the FDT, information about O and fuzzy distance information are also used, but they are weighted together in the FDT. In our opinion, the shape of an object is emphasized even more if region growing by means of the RFDT is applied, since the two types of information, from O and its FDT, are better exploited when used not only for computing the FDT and selecting seeds, but also in the actual region-growing process. A region-growing process resembling region growing by RFDT is the seeded WS described by Vincent (1993).
However, for seeded WS the region growing is done grey-level after grey-level, which means that we need to choose whether to use the information stored in the FDT or to base the region growing directly on the membership values of O. The RFDT can be implemented, for example, by using sorted priority queues in the same manner as seeded WS, giving comparable computational cost. When CMFB(O, D^F) (or CMFBR(O, D^F)) is not included in L_j(O, D^F) for any j, iterative usage of the RFDT will actually not be enough to assign a label to all voxels in O. A constrained FDT can be used for assignment of the remaining voxels. An unlabeled voxel in O is assigned the same label as its closest, using d^F, already labeled voxel. Region growing by RFDT is well suited for identification of rather spherical subunits of a fuzzy object, or, to use a slightly different terminology, rather spherical clustered structures. We use the fuzzy objects O_i, i = 1, . . . , 4, previously shown in Figure 2, to illustrate the region growing by RFDT. In Figure 5, the left column shows O_i with the support outlined in green and L_0(O_i, ⟨3, 4⟩) overlaid in magenta and blue, i = 1, . . . , 4. In the right column, the subunits resulting from the region growing are shown. Notice that the size of the region corresponding to a seed depends on the
FIGURE 5 Left column: Fuzzy objects O_i with L_0(O_i, ⟨3, 4⟩) in magenta and blue, i = 1, . . . , 4. Right column: Subunits in magenta and blue found using region growing by RFDT starting from L_0. The support of O_i, i = 1, . . . , 4, is outlined in green. (See Color Insert.)
significance of the subunit it corresponds to compared with the significance of the subunit with which it “competes”.
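The iterative labeling described in this section can be sketched with a priority queue, in the spirit of the queue-based implementation mentioned above. This is a simplified 2D ⟨3, 4⟩ illustration under several assumptions that the text does not fix: seed pixels keep their label and value during a pass, ties between equally strong fronts are broken arbitrarily by the queue order, and pixels that are never reached are simply left unlabeled (the constrained FDT step is not included). The names `rfdt_labels` and `region_growing` are hypothetical.

```python
# Sketch of region growing by iterated RFDT with label propagation (2D, <3,4>).
import heapq

NEIGHBORS = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3),
             (0, 1, 3), (1, -1, 4), (1, 0, 3), (1, 1, 4)]


def rfdt_labels(mu, seeds):
    """One RFDT pass. seeds: dict {(row, col): (distance value, label)}.
    Returns dict {(row, col): label} for all pixels reached with a value > 0."""
    rows, cols = len(mu), len(mu[0])
    heap = [(-float(value), r, c, label) for (r, c), (value, label) in seeds.items()]
    heapq.heapify(heap)                           # max-heap via negated values
    labels = {}
    while heap:
        neg_value, r, c, label = heapq.heappop(heap)
        if (r, c) in labels:
            continue                              # already reached with a value at least as high
        labels[(r, c)] = label
        for dr, dc, w in NEIGHBORS:
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or mu[rr][cc] <= 0:
                continue
            if (rr, cc) in seeds or (rr, cc) in labels:
                continue                          # seeds are fixed (assumption)
            cand = -neg_value - 0.5 * (mu[r][c] + mu[rr][cc]) * w
            if cand > 0:                          # only positive values propagate
                heapq.heappush(heap, (-cand, rr, cc, label))
    return labels


def region_growing(mu, fdt, seeds):
    """Repeat the RFDT, re-seeding every labeled pixel with its FDT value,
    until no further pixel receives a label."""
    labels = rfdt_labels(mu, seeds)
    while True:
        reseeded = {p: (fdt[p[0]][p[1]], lab) for p, lab in labels.items()}
        grown = rfdt_labels(mu, reseeded)
        if len(grown) == len(labels):
            return labels
        labels = grown


mu = [[0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]]
fdt = [[0.0, 1.5, 4.5, 7.5, 10.5, 7.5, 4.5, 1.5, 0.0]]
result = region_growing(mu, fdt, {(0, 4): (10.5, 1), (0, 7): (1.5, 2)})
print(sorted(result.items()))
```

In this one-row example the seed with the larger FDT value claims pixels 1 to 6, while the weaker seed keeps only its own pixel. This matches the remark that the size of a region reflects the significance of its subunit relative to the one it competes with.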
4. CRYO-ELECTRON TOMOGRAPHY FOR IMAGING OF INDIVIDUAL MACROMOLECULES
This section describes an application from the field of structural biology for which the above-described theoretical framework can be used to advantage. Macromolecular structures can be imaged using a transmission electron microscope (TEM). In fact, transmission electron microscopy is a powerful tool for increased understanding of biological processes. To achieve a three-dimensional (3D) view of the sample under study, one of two techniques can be applied: so-called single-particle imaging or individual-particle imaging. Both methods need to be combined with a reconstruction technique to obtain a 3D electron tomographic image from 2D micrographs (i.e., 2D projection images). We focus here on the latter approach. Micrographs of the sample are captured from different angles. The micrographs are then used to reconstruct a 3D image (in the following, 3D reconstruction). The 3D reconstruction can be determined by using the well-known filtered backprojection technique as described by Crowther, DeRosier, and Klug (1970), algebraic reconstruction techniques (ART) as described in Gordon,
Bender, and Herman (1970), the simultaneous iterative reconstruction technique (SIRT) as described in Gilbert (1972), or, which is the technique used in this work, the iterated regularization method Constrained Maximum Entropy Tomography (COMET), with filtered backprojection as prior, as described in Skoglund, Öfverstedt, Burnett, and Bricogne (1996) and Rullgård, Öktem, and Skoglund (2007). By using cryo-ET, it is possible to examine proteins and other macromolecules with a resolution of a few nanometers. The 3D reconstruction provides structural information about macromolecules that cannot be captured in any other way, as the properties of individual molecules can be examined. For full use of this unique type of information, methods for interpretation of the 3D reconstructions are necessary. This type of image is challenging. The image quality is rather poor for several reasons. The electron irradiation destroys the sample, which means that the total dose must be kept very low, resulting in images with low signal-to-noise ratio (SNR) and thereby low contrast. The electron microscope limits the angular range that can be examined to 120°–140°, resulting in limited-angle artefacts in the 3D reconstructions. The samples are often of uneven thickness, which skews the background level in the 2D micrographs and hence also in the 3D reconstructions. These are just a few of the image-quality challenges. In addition, the macromolecules often are represented by a rather small number of voxels. Considering all these facts, this theoretical framework is well suited for application to 3D reconstructions.
4.1. Methods for Analyzing Cryo-Electron Tomography Data
As mentioned previously, developing automatic, or even semi-automatic, methods for ET data is not a trivial task. This is true both when the data are collected in a single-particle manner and when they are collected in an individual-particle manner. Over the past 10 to 15 years a number of articles have been published on this topic, but the field still is far behind the developments in, for example, medical imaging. Methods for segmentation of subunits can be found in Volkmann (2002), Yu and Bajaj (2006), Baker, Yu, Chiu, and Bajaj (2006), Garduño et al. (2008), and Nguyen and Ji (2008), among others. As ET gives low-resolution data, it is often of interest to combine the structural information possible to extract from high-resolution data (i.e., docking of high-resolution structure, as resolved by, for example, X-ray crystallography) into low-resolution data. Such methods can be found in Wriggers, Milligan, and McCammon (1999), and Birmanns and Wriggers (2007), among others.
4.2. Specific Imaging Settings
To show the performance of our methods on experimental data, we have used the material published by Sandin, Öfverstedt, Wikström, Wrange, and Skoglund (2004). There, cryo-ET experiments on the monoclonal murine antibody IgG2a in solution were described. Antibodies are crucial
constituents of our immunological defense systems. They bind to foreign agents and target them, for instance, for destruction. The IgG antibody is the most abundant antibody in blood and has a molecular weight of about 150 kDa. It consists of three subunits, two fragment antigen binding (Fab) arms and a stem (Fc). The subunits are pairwise connected by a flexible hinge that allows for significant relative mobility. The three subunits can be identified in 3D reconstructions, but the resolution is too low to actually resolve the hinge region. The hinge region consists of 19 amino acids and, depending on the extent to which this region is stretched, the subunits appear either disconnected or connected. In the study, IgG2a solution was mixed with a solution of 10-nm colloidal gold particles at a ratio of 2:1. Ultrathin films of the mixed solution were plunged into liquid ethane at −175 °C, which causes a very rapid freezing rate with the effect that the water forms vitreous ice particles around the protein, preserving the hydrated structure and immobilizing them in the states they last occupied. The specimen was imaged under a low-dose condition (the total electron dose was ∼20 e/Å²) using a field emission gun (FEG) 200-keV TEM (Philips CM200). Two-dimensional micrographs were recorded over an angular range of 120°–125° at either every degree or every other degree on a CCD detector with a pixel size of 14 µm², at a magnification of 26,715×. The colloidal gold particles were used to align the micrographs to one another. Three-dimensional reconstructions were computed using COMET. For a more detailed description, see Sandin et al. (2004). In Figure 6, four 50 × 50 × 50 voxel extracts from the 3D reconstructions, each containing one manually identified IgG antibody, are shown volume-rendered. The antibodies shown are labeled IgG1 (top left), IgG2 (top right), IgG3 (bottom left), and IgG4 (bottom right).
4.3. Phantoms Constructed From Structures Deposited in the Protein Data Bank
To show various aspects of our methods in a more controlled way, we have constructed a set of phantoms from structures deposited in the Protein Data Bank (PDB) (Berman et al., 2000). From a PDB entry, it is possible to generate a 3D image of the corresponding macromolecule with a certain voxel size and resolution. For example, this can be done using a model where a Gauss kernel is placed at each atom position and multiplied by the mass of that atom (Pittet, Henn, Engel, & Heymann, 1999). The total density at a voxel is then calculated by adding the contributions from the Gauss kernels of atoms in the vicinity of the voxel. The resolution of the image is 2σ, where σ is the standard deviation used to calculate the Gauss kernels. This approach is a simplification of the macromolecule, as each atom is approximated by a kernel of the same shape. Instead, we have used RHOGEN, a program originating from the crystallographic community and
FIGURE 6 Four 50 × 50 × 50 voxel extracts from 3D reconstructions, each containing one manually identified IgG antibody (shown volume-rendered). Panels: IgG1, IgG2, IgG3, and IgG4.
adapted to ET at the Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden, and freely available for academic use,¹ to create the phantoms. Each atom is approximated by a function that reflects the shape of the atom and different oxidation states of the atom. The potential functions are different if the incoming scattering is from the electron cloud or from the nuclei; hence, the needed adjustment of the program. We have chosen three different structures to show the different features of our methods. They are, listed with reference to their respective PDB id, as follows:
1igt: This corresponds to murine IgG2a monoclonal antibody (mAb) 231 and is the only crystallographic structure of an intact IgG2 antibody (Harris, Larson, Hasel, & McPherson, 1997). Hence, it is the structure that best corresponds to the experimental data described in Section 4.2. As previously mentioned, the molecular weight of an IgG antibody is ∼150 kDa.
¹ Contact Prof. Ulf Skoglund ([email protected]) for a copy of the program.
2rec: This corresponds to the RecA protein from the experiments by Yu and Egelman (1997), where it was suggested that the RecA hexamer is a structural homologue of ring helicases. The RecA protein plays several key roles in genetic recombination and repair and has therefore been the focus of several electron microscopy and X-ray crystallography studies. In Birmanns and Wriggers (2007), this structure was used to illustrate the proposed algorithm for multiresolution anchor-point registration of biomolecular assemblies and their components. The molecular weight of RecA is ∼230 kDa.
1q5a: This corresponds to the S-shaped trans-interactions of cadherins described in He, Cowin, and Stokes (2003), which is a model based on the fitting of C-cadherin ectodomains (PDB id 1l3w; Boggon et al., 2002) to 3D reconstructions of desmosomes. Cadherins are transmembrane proteins that mediate adhesion between cells in the tissues of animals. Cell adhesion relies on interactions between cadherin molecules. Such interactions between cadherin molecules and their organization in the desmosome can only be studied in situ (as far as we are aware) using ET. Recently a model for the organization was built using this technique (Al-Amoudi, Castaño Diez, Betts, & Frangakis, 2007). The molecular weight of the structure (1q5a) is ∼200 kDa.
We start by constructing phantoms of atomic resolution (i.e., grid size 1 Å). These are shown in Figure 7 (left column). The 3D images are then low-pass filtered to a resolution of 30 Å and resampled to grid size 5.74 Å (i.e., the grid size of the experimental data). The resulting low-resolution phantoms are shown in Figure 7 (middle and right columns).
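The simple Gauss-kernel model mentioned earlier in this section (not the RHOGEN model actually used for the phantoms) can be sketched as follows. The function name `gaussian_phantom`, the atom tuple layout, the unnormalized kernel, and the cubic cutoff of three standard deviations are all assumptions for illustration. Reading atom positions and masses from a PDB file is left out.

```python
# Sketch of a Gauss-kernel density phantom: one mass-weighted Gaussian per atom.
import math


def gaussian_phantom(atoms, shape, voxel_size, sigma, cutoff=3.0):
    """atoms: list of (x, y, z, mass), coordinates in the same unit as voxel_size.
    shape: (nx, ny, nz) number of voxels; sigma: standard deviation of the kernel.
    Returns a nested list grid[i][j][k] with the summed density."""
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    reach = int(math.ceil(cutoff * sigma / voxel_size))   # kernel half-width in voxels
    for x, y, z, mass in atoms:
        ci, cj, ck = (int(round(v / voxel_size)) for v in (x, y, z))
        # only voxels in the vicinity of the atom receive a contribution
        for i in range(max(0, ci - reach), min(nx, ci + reach + 1)):
            for j in range(max(0, cj - reach), min(ny, cj + reach + 1)):
                for k in range(max(0, ck - reach), min(nz, ck + reach + 1)):
                    d2 = ((i * voxel_size - x) ** 2
                          + (j * voxel_size - y) ** 2
                          + (k * voxel_size - z) ** 2)
                    grid[i][j][k] += mass * math.exp(-d2 / (2.0 * sigma ** 2))
    return grid


# A single carbon-like atom (mass 12) in the middle of a 5 x 5 x 5 grid, 1 Å voxels.
grid = gaussian_phantom([(2.0, 2.0, 2.0, 12.0)], (5, 5, 5), voxel_size=1.0, sigma=1.0)
print(grid[2][2][2])   # 12.0 at the atom position
```

Every atom gets a kernel of the same shape here, which is exactly the simplification the text points out and the reason a program with atom-specific scattering functions was preferred.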
4.4. Simulated Data
One difficulty when evaluating methods intended for use in ET applications is the fact that no ground truth exists, and phantoms, constructed as in Section 4.3, provide a "too-ideal" view of the structures, as many artefacts are imposed by the image acquisition technique. This makes a fully objective evaluation hard to achieve. One way to overcome this to some extent is to construct a phantom and use a TEM simulator to simulate micrographs from it, which can then be used to obtain a 3D reconstruction of the phantom. The phantom can be constructed from any suitable structure deposited in PDB as described in Section 4.3. For our simulations, we use the TEM simulator implemented by Hans Rullgård, Department of Mathematics, Stockholm University, Stockholm, Sweden, which is freely available (download from URL: www.math.su.se/~hansr). The TEM model in
FIGURE 7 From top to bottom: Phantoms constructed of PDB id 1igt, 2rec, and 1q5a, respectively. The left column shows surface renderings of phantoms of atomic resolution. The middle column shows surface renderings of phantoms that have been low-pass filtered to a resolution of 30 Å and resampled to grid size 5.74 Å. The right column shows volume renderings of the phantoms in the middle column.
the simulator is described by Fanelli and Öktem (2008). To run the simulator, the user needs to provide a file containing a 3D image in raw format containing the model for the sample, that is, the phantom to be imaged, together with text files describing the parameters specifying the model for the ET experiment. We have used phantoms of atomic resolution as described in Section 4.3. The settings in the simulator correspond to the descriptions in Section 4.2, which means that the generated micrographs resemble micrographs from a FEG 200-keV TEM (Philips CM200). Three-dimensional reconstructions from the micrographs were done using COMET. Figure 8 shows the simulated 3D
FIGURE 8 From top to bottom: Simulated 3D reconstructions of phantoms constructed of PDB id 1igt, 2rec, and 1q5a, respectively. The left column shows maximum intensity projections and the right column shows surface renderings of the 3D reconstructions.
reconstructions as maximum intensity projections (left column) and surface rendered (right column). Of note, the simulated data are still rather ideal compared with the experimental data as the simulator takes into account many aspects of the image acquisition process but is far from complete. However, this technique
provides a much more realistic type of data than the phantoms and allows us to better verify the performance of our methods.
5. FROM ELECTRON TOMOGRAPHIC STRUCTURE TO A FUZZY OBJECTS REPRESENTATION
A 3D reconstruction is a 3D image containing the density values of the imaged structure. The methods described here require a fuzzy object as the input data. Hence, before applying the methods we need to extract a fuzzy object corresponding to the structure. This can be done in many different ways. Some articles on fuzzy segmentation of electron tomographic data sets, which is a process that can be used to extract a fuzzy object, have already been published (Garduño et al., 2008; Gedda, Skoglund, Öfverstedt, & Svensson, submitted for publication; Svensson, 2007b; Svensson et al., 2006). Because the connectedness of the image points making up the objects is imprecise, fuzzy set methods are well suited and, hence, it is probable that more publications on this topic will be forthcoming. To extract a fuzzy object from the data described in Section 4, we use the following approach. For the experimental data, some preprocessing is required to remove small variations in the 3D reconstructions, most likely related to noise. Because we want to leave significant features such as edges, an edge-preserving smoothing algorithm is a suitable choice. A number of algorithms are available for this purpose. An evaluation of such algorithms, focused on ET data, was recently published (Narasimha et al., 2008). We use anisotropic diffusion filtering, introduced by Perona and Malik (1990), since it has been shown to perform well on our data. This step is not required for the phantoms or for the simulated data because in those cases the original data provide sufficient input. After the data are preprocessed, we intend to extract the fuzzy object O corresponding to a certain structure by using the degree of belongingness a voxel in the image has with voxels that we are certain belong to the structure (seed points), that is, voxels placed in the most internal parts of the structure.
The degree of belongingness is measured using fuzzy connectedness $c$ [Eq. (3)]. The first step is then to identify suitable seed points. For this purpose, we follow the approach described, for example, in Svensson et al. (2006). There the FDT, calculated directly from the input data, is used to emphasize the most internal parts of the structure. Seed points are found among the voxels with the highest FDT values, in our case by simple thresholding. The seeds are considered to have $\mu_O = 1$. We calculate $c$ for all other voxels in the image. As we use a fuzzy setting, the positioning of the border of the support of $O$ is less crucial than if a crisp setting were used. We have chosen to include in $O$ the voxels having the 3000, 4500, and 4000 highest $c$ values for 1igt, 2rec, and 1q5a, respectively,
FIGURE 9 Top row: Fuzzy objects representing the experimental data of the IgG antibodies (IgG1, IgG2, IgG3, IgG4) in Figure 6. Bottom row: Fuzzy objects representing the simulated data of phantoms constructed of PDB id 1igt, 2rec, and 1q5a in Figure 8.
which is a number proportional to their molecular weight. Once the support of $O$ has been determined, $\mu_O$ is set to be the calculated $c$ values rescaled so that the voxels in the support of $O$ have values in the range [0, 1].

A different approach is suggested in Garduño et al. (2008). There seeds are marked manually in the image and are assigned different labels. One seed is used for the background. The seeds are then allowed to compete with each other (i.e., the label of a seed is propagated together with the membership value). A voxel is assigned the label of the seed to which it has the highest degree of belongingness. Furthermore, Garduño et al. (2008) use a more sophisticated membership function than Eq. (3), for which the expected homogeneity of the region corresponding to each seed is reflected. Hence, the membership function may differ for the different seeds.

Figure 9 shows (top row) the fuzzy objects representing the experimental data of IgG antibodies in Figure 6 and (bottom row) the fuzzy objects representing the simulated data of 1igt, 2rec, and 1q5a in Figure 8. The fuzzy objects representing the phantoms (Figure 7), not surprisingly, turn out very similar to the input data (but with voxel values in the range [0, 1]), and we therefore do not include them in Figure 9.
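The seed-based extraction described in this section can be sketched on a small 2D grid as follows. The sketch assumes the common max-min form of connectedness (the strength of a path is its weakest grey level, and the connectedness of a point is the strength of its strongest path to a seed), which may differ from Eq. (3). The function names, the 4-neighbourhood, and the rescaling by division with the largest value are likewise assumptions made for illustration.

```python
import heapq

def max_min_connectedness(image, seeds):
    """Connectedness of every pixel to the seed set (assumed max-min definition)."""
    rows, cols = len(image), len(image[0])
    top = float(max(max(row) for row in image))
    conn = [[0.0] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        conn[r][c] = top  # seeds get the highest value, i.e. membership 1 after rescaling
        heapq.heappush(heap, (-top, r, c))
    while heap:
        neg_strength, r, c = heapq.heappop(heap)
        strength = -neg_strength
        if strength < conn[r][c]:
            continue  # outdated queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # A path is only as strong as its weakest pixel.
                candidate = min(strength, float(image[nr][nc]))
                if candidate > conn[nr][nc]:
                    conn[nr][nc] = candidate
                    heapq.heappush(heap, (-candidate, nr, nc))
    return conn

def extract_fuzzy_object(conn, n_pixels):
    """Keep the n_pixels highest connectedness values and rescale them to (0, 1]."""
    ranked = sorted(((v, r, c) for r, row in enumerate(conn)
                     for c, v in enumerate(row)), reverse=True)
    kept = ranked[:n_pixels]
    largest = kept[0][0]
    return {(r, c): v / largest for v, r, c in kept}

image = [[0, 1, 1, 7],
         [1, 8, 6, 1],
         [0, 5, 2, 0]]
conn = max_min_connectedness(image, seeds=[(1, 1)])
mu = extract_fuzzy_object(conn, n_pixels=3)
```

Note that the bright pixel in the upper right corner obtains a low connectedness value because every path from the seed to it passes through dark pixels; this is the behaviour that distinguishes connectedness from plain thresholding.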
6. IDENTIFYING THE SUBUNITS OF A MACROMOLECULE

Two of the previously mentioned structures (IgG/1igt and 2rec) have been chosen to show the identification of subunits in a macromolecular
structure using region growing by RFDT (Section 3). This is a relevant topic because subunit identification is often required before subsequent analysis, where the goal of the analysis can be, for instance, to determine interrelations to various degrees (i.e., the structural conformation of the imaged macromolecule). Region growing by RFDT can be used for this purpose. As mentioned earlier, it is tailored for structures with rather spherical subunits. This is the case for a large set of macromolecules such as virus particles. What is required, given a fuzzy object representation $O$ of the macromolecule, is to detect suitable seeds [referred to as $L_0(O, D_F)$ in Section 3] for the region growing. In most cases, subunits have the highest density in their most internal parts; hence, $\mu_O$ will be close to 1 in those regions (following the construction of $\mu_O$ described in Section 5). The FDT computed on $O$ further emphasizes this fact. In previous publications, we have suggested the use of Euclidean- or fuzzy distance-based clustering of local grey-level maxima in the FDT computed on $O$ as a way to detect relevant seeds (Gedda et al., submitted for publication; Svensson, 2007b). Another simpler and less general, but still efficient, way to detect seeds is to use a suitable set of mathematical morphology operations. In this chapter, we chose the latter approach as it proved robust enough for the data at hand. For seed detection before region growing by RFDT, other approaches developed for ET data could be followed, such as the gradient vector diffusion method presented in Yu and Bajaj (2005).

We assume that each subunit of $O$ has an internal, rather homogeneous region of adjacent voxels $v_1, \ldots, v_n$ for which $\mu_O(v_i)$, $i = 1, \ldots, n$, is higher than in its surroundings. However, due to the poor quality of the input data, it is not guaranteed that there will be one single local grey-level maximum corresponding to each subunit.
Therefore, we detect such a region using a method called image reconstruction by erosion (Vincent, 1993). We start by creating a structuring element (SE) shaped as a ball and with a diameter slightly smaller than the expected diameter of one subunit. $O$ is then eroded by SE, that is, each voxel $v$ in the support of $O$ is assigned the smallest membership value found in a neighborhood of $v$ corresponding to SE. We denote the resulting image $\varepsilon_{SE}(O)$. $\varepsilon_{SE}(O)$ is then iteratively dilated by SE using $O$ as a constraint. The result is an image with a locally maximal plateau corresponding to each subunit. The plateau can be easily identified and is included in $L_0(O, D_F)$. For more details on this mathematical morphology-based technique, see Vincent (1993) and Soille (1999).

Once $L_0(O, D_F)$ has been identified, region growing by RFDT is applied. Figure 10 shows the subunits identified for the experimental data of the IgG antibodies in Figure 6. The subunits in this case correspond to the two Fab arms (yellow and orange) and the Fc stem (red). To identify the seeds we used a structuring element of diameter seven voxels (∼40 Å).
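The erosion followed by constrained dilation described above can be sketched as follows. This is an illustrative implementation only: the function names are hypothetical, the code works on small arrays of any dimension, and the constrained dilation uses a unit-radius neighbourhood by default (`grow_radius=1`), as is usual for morphological reconstruction, which is an assumption that may differ from the implementation used for the results in this chapter.

```python
from itertools import product

import numpy as np

def ball_offsets(radius, ndim):
    """Integer offsets inside a ball of the given radius (includes the origin)."""
    r = int(radius)
    return [off for off in product(range(-r, r + 1), repeat=ndim)
            if sum(o * o for o in off) <= radius * radius]

def _shift(a, off, fill):
    """Copy of `a` translated by `off`; uncovered positions are set to `fill`.
    Assumes every offset is smaller than the array size along its axis."""
    out = np.full(a.shape, fill, dtype=float)
    src = tuple(slice(max(0, -o), a.shape[i] - max(0, o)) for i, o in enumerate(off))
    dst = tuple(slice(max(0, o), a.shape[i] - max(0, -o)) for i, o in enumerate(off))
    out[dst] = a[src]
    return out

def erode(a, offsets):
    return np.min([_shift(a, off, np.inf) for off in offsets], axis=0)

def dilate(a, offsets):
    return np.max([_shift(a, off, -np.inf) for off in offsets], axis=0)

def seed_plateaus(mu, se_radius, grow_radius=1):
    """Erode mu by a ball, then dilate repeatedly while staying below mu."""
    mu = np.asarray(mu, dtype=float)
    marker = erode(mu, ball_offsets(se_radius, mu.ndim))
    grow = ball_offsets(grow_radius, mu.ndim)
    while True:
        grown = np.minimum(dilate(marker, grow), mu)  # mu acts as the constraint
        if np.array_equal(grown, marker):
            return marker
        marker = grown

# 1D membership profile with two subunits; the first has a noisy double peak.
profile = [0.0, 0.5, 0.9, 0.7, 0.9, 0.5, 0.1, 0.4, 0.8, 0.8, 0.3, 0.0]
plateaus = seed_plateaus(profile, se_radius=1)
```

In this toy profile the two peaks of the first subunit are merged into one plateau of height 0.7, and the second subunit yields one plateau of height 0.4, so each subunit contributes a single seed region even though it contains more than one local maximum.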
FIGURE 10 The Fab arms (yellow and orange) and the Fc stem (red) for the IgG antibodies (IgG1, IgG2, IgG3, IgG4) in Figure 6 identified using region growing by RFDT. (See Color Insert.)
One important feature to consider when evaluating a method is its robustness toward changes in the resolution of the input data. For this purpose, we refer to the phantoms described in Section 4.3. The PDB ids 1igt and 2rec were used to construct phantoms of resolution 20, 30, and 40 Å, with a grid size of 5.74 Å. (The 30 Å resolution phantoms are shown in Figure 7.) Figure 11 shows the subunits identified for each of the phantoms, where the subunits correspond to the Fab arms (yellow and orange) and the Fc stem (red) for 1igt and to the individual RecA subunits for 2rec. To identify the seeds we used a structuring element of diameter seven voxels (i.e., ∼40 Å) for 1igt and a structuring element of diameter three voxels (i.e., ∼17 Å) for 2rec. The subunits are satisfactorily identified, and the region growing by RFDT thereby appears robust to changes in resolution, in all cases except for the phantom constructed of PDB id 2rec at a resolution of 20 Å. There the border between the subunits shown in blue and yellow does not reflect the structure well. As the resolution becomes higher, the amount of internal structure becomes richer. This is useful for several structure-focused analyses, but it actually affects the seed detection process used here in a negative way. The grid size used is not fine enough to fully represent the internal structure so that, even though 2rec is symmetric, discretization caused the yellow seed to be weaker (in the sense of containing fewer relevant image points) than the blue seed. We use this example to show that, even with this difference, the fuzzy shape is better preserved than if other region growing methods are used.

In Section 3, region growing by RFDT was briefly compared with seeded WS (Vincent, 1993). In WS, the grey-level image is considered as a topographic map and the final segmentation corresponds to the catchment basins, one for each local maximum.
Region growing starts from labeled local grey-level maxima and propagates on a grey-level basis. When the propagation fronts from different maxima meet, a watershed is built to prevent further region growing. In a real application, the number of local maxima is often large. Many of the maxima correspond to small nonrelevant grey-level variations in the image that can result in
FIGURE 11 Subunits identified using region growing by RFDT for phantoms constructed of PDB id 1igt (top row) and 2rec (bottom row). The phantoms are at a resolution of, from left to right, 20, 30, and 40 Å and have a grid size of 5.74 Å. The subunits correspond to the Fab arms (yellow and orange) and the Fc stem (red) for 1igt and to the individual RecA subunits for 2rec. (See Color Insert.)
FIGURE 12 Subunits identified using region growing by RFDT (left) and seeded WS (right) for a phantom constructed of PDB id 2rec at a resolution of 20 Å, with grid size 5.74 Å.
oversegmentation. This can be overcome by only allowing region growing from seeds. WS is a well-known segmentation algorithm used in many different applications (see Meyer and Beucher (1990) and Vincent (1993) for overviews). Lately WS has also drawn interest in the electron microscopy community (see Volkmann (2002)). Therefore, it is relevant for use as a comparison. Figure 12 shows the subunits identified for the phantom constructed of PDB id 2rec at a resolution of 20 Å using region growing by RFDT (left) and seeded WS (right) starting from the same seeds. The positioning of the border between the subunits shown in blue and yellow is even more erroneous when seeded WS is used.
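For comparison, the seeded WS principle described above (propagation from labeled seeds in order of decreasing grey level) can be sketched with a priority queue. This is a simplified illustration on a 2D grid with a 4-neighbourhood: pixels are labeled as soon as a front reaches them and no explicit watershed line is stored. The function name and these simplifications are assumptions; the sketch is not region growing by RFDT.

```python
import heapq

def seeded_flooding(image, seeds):
    """Grow labeled seeds over `image`, highest grey levels first.

    `seeds` maps (row, col) to a positive integer label; 0 means unlabeled.
    Only pixels with a grey level above 0 (the support) are labeled.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    heap, order = [], 0
    for (r, c), label in seeds.items():
        labels[r][c] = label
        heapq.heappush(heap, (-image[r][c], order, r, c))
        order += 1
    while heap:
        neg_level, _, r, c = heapq.heappop(heap)
        level = -neg_level
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and labels[nr][nc] == 0 and image[nr][nc] > 0):
                labels[nr][nc] = labels[r][c]
                # The front never rises again: priority is the lowest level on the way.
                priority = min(level, image[nr][nc])
                heapq.heappush(heap, (-priority, order, nr, nc))
                order += 1
    return labels

image = [[1, 5, 9, 4, 2, 3, 8, 6, 1]]
labels = seeded_flooding(image, seeds={(0, 2): 1, (0, 6): 2})
```

Because the border is decided purely on a grey-level basis, a weak seed loses territory to a strong one, which is consistent with the border displacement observed for seeded WS in Figure 12.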
FIGURE 13 Subunits identified using region growing by RFDT for simulated data of phantoms constructed of PDB id 1igt (left) and 2rec (right). The subunits correspond to the Fab arms and the Fc stem (red, yellow, and orange) for 1igt. For 2rec, two subunits are identified (instead of the desired six) due to loss of information in the simulation process. (See Color Insert.)
Finally, Figure 13 shows the subunits identified for the simulated data in Figure 8 (top and middle row), corresponding to phantoms constructed of PDB id 1igt and 2rec. Region growing by RFDT was applied to their respective fuzzy object representations (Figure 9, bottom row, left and middle). For the seed detection we used the same settings as for the experimental data and for the phantoms. For 2rec, the resolution is insufficient to actually resolve the six subunits; the seed detection method fails to detect six separate regions.
7. IDENTIFYING THE CORE OF AN ELONGATED MACROMOLECULE

The third structure in Section 4.3, namely PDB id 1q5a, was chosen to illustrate the use of CMFBs in the analysis of elongated structures. For elongated structures, identification of subunits is often not an issue. Instead, we are interested in tracking the structure and measuring, for example, its length or the thickness along it. To facilitate such measurements, the core, preferably a centrally located curve, representing the original structure can be used. For this purpose, the CMFBs are of interest.

In Figure 14, the fuzzy objects $O_i$, $\mathrm{CMFB}(O_i, \langle 3, 4, 5 \rangle)$, and $\mathrm{CMFB}_R(O_i, \langle 3, 4, 5 \rangle)$, $i = 1, 2$, where $O_1$ represents the phantom constructed of PDB id 1q5a at a resolution of 30 Å and grid size 5.74 Å in Figure 7 and $O_2$ the simulated data of 1q5a in Figure 8, are shown surface-rendered. The number of voxels included is 5502 ($O_1$), 2133 ($\mathrm{CMFB}(O_1, \langle 3, 4, 5 \rangle)$), and 1355 ($\mathrm{CMFB}_R(O_1, \langle 3, 4, 5 \rangle)$) for the phantom and 3875 ($O_2$), 2361 ($\mathrm{CMFB}(O_2, \langle 3, 4, 5 \rangle)$), and 1061 ($\mathrm{CMFB}_R(O_2, \langle 3, 4, 5 \rangle)$) for the simulated data. Note that from $\mathrm{CMFB}_R(O_i, \langle 3, 4, 5 \rangle)$, the support of $O_i$ can be recovered using RFDT. Hence, we have an efficient way of representing $O_i$ while still preserving important aspects of its shape.

Figure 14 clearly shows the effect of the limited angular range in the electron microscope. Because the sample can only be examined over 120°–140°,
FIGURE 14 From left to right: The fuzzy objects $O_i$, $\mathrm{CMFB}(O_i, \langle 3, 4, 5 \rangle)$, and $\mathrm{CMFB}_R(O_i, \langle 3, 4, 5 \rangle)$, $i = 1, 2$, where $O_1$ is used to represent the phantom constructed of PDB id 1q5a in Figure 7 and $O_2$ the simulated data of 1q5a in Figure 8. Surface rendering is used in all subfigures.
the 3D reconstructions will have missing data artefacts. In the case of an elongated, tubular structure such as 1q5a, the result is that the cross section along its main axis resembles an ellipse more than the expected disk. This causes $\mathrm{CMFB}_R(O_2, \langle 3, 4, 5 \rangle)$ to be a medial surface rather than a medial curve.

For a fuzzy object $O$, $\mathrm{CMFB}(O, D_F)$ [as well as $\mathrm{CMFB}_R(O, D_F)$] constitutes a medial representation of $O$. One common difficulty when extracting a medial representation of a structure is the identification of end points. Intuitively, a medial representation can be obtained by "peeling" or thinning (i.e., iterative removal of all current border voxels until a curve has been obtained). This process, even when constraints are used for a topology-preserving thinning, evidently causes shortening of the structure. For $\mathrm{CMFB}(O, D_F)$, all relevant protrusions will be represented. As the support of $O$ can be recovered from $\mathrm{CMFB}(O, D_F)$, this set appears for some applications to be too rich a structure. For instance, if the interest is to measure the length of $O_1$ or $O_2$, $\mathrm{CMFB}_R(O_1, \langle 3, 4, 5 \rangle)$ and $\mathrm{CMFB}_R(O_2, \langle 3, 4, 5 \rangle)$ are preferably further reduced to curves, with the induced loss of information. We remark that $\mathrm{CMFB}(O, D_F)$ is not always as dense as for the examples shown for 1q5a, but can constitute a set of disconnected voxels internally located in $O$. To facilitate subsequent analysis,
such a sparse set is preferably connected to a curve. Yu and Bajaj (2006) suggested extracting the centerline by tracing the eigenvectors of local structure tensors. Initially, they detect seeds corresponding to local maximal grey-level points in the 3D reconstructions. The seeds are then connected by starting from the seeds in two opposite directions and following the principal axis, defined by the eigenvector corresponding to the minimum eigenvalue of the local structure tensor. A similar approach could be applied starting from $\mathrm{CMFB}_R(O, D_F)$ and calculating the local structure tensors from the fuzzy object.
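The principal axis used in such a tracing scheme can be estimated as sketched below. The sketch computes a single structure tensor from the grey-level gradients of the sub-volume that is passed in (the caller is assumed to supply a local window around the current point) and returns the eigenvector of the smallest eigenvalue. The function name and the unweighted averaging over the window are assumptions for illustration, not the procedure of Yu and Bajaj (2006).

```python
import numpy as np

def principal_axis(window):
    """Direction of least grey-level variation in a 3D window.

    Returns the unit eigenvector belonging to the smallest eigenvalue of the
    structure tensor, together with all eigenvalues in ascending order.
    """
    grads = np.gradient(np.asarray(window, dtype=float))  # one gradient array per axis
    g = np.stack([component.ravel() for component in grads])  # shape (3, n_voxels)
    tensor = g @ g.T / g.shape[1]  # averaged outer products of the gradient vectors
    eigenvalues, eigenvectors = np.linalg.eigh(tensor)  # eigenvalues in ascending order
    return eigenvectors[:, 0], eigenvalues

# Synthetic tube running along axis 0: the grey level depends only on axes 1 and 2.
_, yy, xx = np.mgrid[0:9, 0:9, 0:9]
tube = np.exp(-((yy - 4) ** 2 + (xx - 4) ** 2) / 4.0)
axis, eigenvalues = principal_axis(tube)
```

For the synthetic tube, the returned axis is aligned with axis 0 (up to sign), so a tracing step would move from the current point to a neighbouring point along plus or minus this direction.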
8. CONCLUSIONS

This chapter has focused on the use of fuzzy set methods in the shape analysis of structures in images with low signal-to-noise ratio (and thereby low contrast). I have described a theoretical framework, including the reverse fuzzy distance transform (RFDT), region growing by RFDT, as well as the concept of centers of maximal fuzzy balls (CMFB), which are applied to fuzzy object representations of relevant structures. All are generalizations of methods well known for crisp objects to a fuzzy setting. These generalizations allow us to achieve a more robust analysis, especially when the input data are of low contrast and low resolution (i.e., when crisp segmentation of the image into object and background is difficult). The output of region growing by RFDT is a crisp segmentation of a fuzzy object. Hence, we are still somewhere between a crisp and a truly fuzzy setting. For example, when addressing the relation between different regions we consider crisp relations. It may be of interest to take one step further and use partial belongings to a region and thereby obtain a fuzzy decomposition into regions where points belong to a region up to a certain membership. Ideas like this, including also fuzzy adjacencies, are described by Bloch (2005).

The second part of the chapter applies the methods to cryo-electron tomographic data of macromolecules. To show various aspects of the methods and to illustrate artefacts imposed by the image acquisition technique (and thereby further emphasize the need for robust methods tailored for this specific application), I use phantoms constructed from structures deposited in the Protein Data Bank, simulated data of the constructed phantoms, and experimental data. I believe that basing validation of methods on datasets with different resolutions and image quality is very relevant, especially as there is no way to create true ground truth datasets when cryo-electron tomography is used.
One final remark regards the computation time required for the presented methods. In this chapter no optimization with respect to computational efficiency has been done. Of course, this aspect needs to be considered before using the methods in large-scale projects. For the computation of the FDT
and the RFDT, sorted priority queues are preferable (Saha et al., 2002). Region growing by RFDT can be implemented in a manner similar to watershed segmentation (Vincent & Soille, 1991) to thereby obtain a computationally more efficient algorithm. The reduction of CMFBs is still an open question. However, suggestions have been made for a solution in a crisp setting (Borgefors & Nyström, 1997).
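The use of a sorted priority queue for the FDT can be sketched as follows. The sketch assumes the definition commonly attributed to Saha et al. (2002), in which a step between two neighbouring points costs the mean of their membership values times the step length, and the FDT value of a point is the smallest accumulated cost to a point outside the support. For brevity it uses a 2D grid with unit steps in a 4-neighbourhood rather than the weighted $\langle 3, 4, 5 \rangle$ distance used in this chapter; the function name is hypothetical.

```python
import heapq

def fuzzy_distance_transform(mu):
    """FDT of a 2D membership grid, propagated with a priority queue (sketch)."""
    rows, cols = len(mu), len(mu[0])
    inf = float("inf")
    fdt = [[0.0 if mu[r][c] == 0 else inf for c in range(cols)] for r in range(rows)]
    # All points outside the support start the propagation at distance 0.
    heap = [(0.0, r, c) for r in range(rows) for c in range(cols) if mu[r][c] == 0]
    heapq.heapify(heap)
    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > fdt[r][c]:
            continue  # outdated queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Unit step weighted by the mean membership of its two end points.
                candidate = dist + 0.5 * (mu[r][c] + mu[nr][nc])
                if candidate < fdt[nr][nc]:
                    fdt[nr][nc] = candidate
                    heapq.heappush(heap, (candidate, nr, nc))
    return fdt

fdt = fuzzy_distance_transform([[0, 0.5, 1.0, 0.5, 0]])
```

Each point is finalized the first time it is popped with its current value, so every point is processed in order of increasing fuzzy distance, which is the reason a sorted queue is preferable to repeated raster scans.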
ACKNOWLEDGMENTS

Joakim Lindblad and Robin Strand, Centre for Image Analysis, Uppsala, Sweden; and Nataša Sladoje, Faculty of Engineering, Novi Sad, Serbia; are acknowledged for scientific support regarding the theoretical parts of the manuscript. Ulf Skoglund, Lars-Göran Öfverstedt, and Lars Norlén, Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden, are acknowledged for scientific support regarding the electron tomography application. Hans Rullgård, Department of Mathematics, Stockholm University, Stockholm, Sweden, is gratefully acknowledged for implementing and providing the TEM simulator. The experimental cryo-ET data sets on the IgG antibody have been provided by Sara Sandin, Division of Structural Studies, MRC Laboratory of Molecular Biology, Cambridge, United Kingdom (formerly at the Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden). The work presented here is part of a project conducted with Magnus Gedda, Centre for Image Analysis, Uppsala, Sweden, who is acknowledged for various valuable contributions. The author received financial support from the Swedish Research Council (Project 621-2005-5540) and the Visualisation Research Programme by the Knowledge Foundation, Vårdalstiftelsen (the Foundation for Health Care Sciences and Allergy Research), the Foundation for Strategic Research, VINNOVA, and Invest in Sweden Agency.
REFERENCES

Al-Amoudi, A., Castaño Diez, D., Betts, M. J., & Frangakis, A. S. (2007). The molecular architecture of cadherins in native epidermal desmosomes. Nature, 450(7171), 832–837.
Arcelli, C., & Sanniti di Baja, G. (1988). Finding local maxima in a pseudo-Euclidean distance transform. Computer Vision, Graphics, and Image Processing, 43(3), 361–367.
Baker, M. L., Yu, Z., Chiu, W., & Bajaj, C. (2006). Automated segmentation of molecular subunits in electron cryomicroscopy density maps. Journal of Structural Biology, 156, 432–441.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., et al. (2000). The protein data bank. Nucleic Acids Research, 28(1), 235–242.
Beucher, S., & Lantuéjoul, C. (1979). Use of watersheds in contour detection. In International workshop on image processing: Real-time edge and motion detection/estimation.
Birmanns, S., & Wriggers, W. (2007). Multi-resolution anchor-point registration of biomolecular assemblies and their components. Journal of Structural Biology, 157, 271–280.
Bloch, I. (2000). Geodesic balls in a fuzzy set and fuzzy geodesic mathematical morphology. Pattern Recognition, 33, 897–905.
Bloch, I. (2005). Fuzzy spatial relationships for image processing and interpretation: A review. Image and Vision Computing, 23, 89–110.
Blum, H. (1967). A transformation for extracting new descriptions of shape. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form (pp. 362–380). Cambridge, MA: MIT Press.
Boggon, T. J., Murray, J., Chappuis-Flament, S., Wong, E., Gumbiner, B. M., & Shapiro, L. (2002). C-cadherin ectodomain structure and implications for cell adhesion mechanisms. Science, 296(5571), 1308–1313.
Bogomolny, A. (1987). On the perimeter and area of fuzzy sets. Fuzzy Sets and Systems, 23, 257–269.
Bongini, L., Fanelli, D., Svensson, S., Gedda, M., Piazza, F., & Skoglund, U. (2007). Resolving the geometry of biomolecules imaged by cryo electron tomography. Journal of Microscopy, 228, 174–184.
Borgefors, G. (1986). Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34, 344–371.
Borgefors, G. (1996). On digital distance transforms in three dimensions. Computer Vision and Image Understanding, 64(3), 368–376.
Borgefors, G., & Nyström, I. (1997). Efficient shape representation by minimizing the set of centres of maximal discs/spheres. Pattern Recognition Letters, 18, 465–472.
Chanussot, J., Nyström, I., & Sladoje, N. (2005). Shape signatures of fuzzy star-shaped sets based on distance from the centroid. Pattern Recognition Letters, 26, 735–746.
Crowther, R. A., DeRosier, D. J., & Klug, A. (1970). The reconstruction of three-dimensional structure from projections and its application to electron microscopy. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 317(1530), 319–340.
Fanelli, D., & Öktem, O. (2008). Electron tomography: A short overview with an emphasis on the absorption potential model for the forward problem. Inverse Problems, 24, 013001.
Garduño, E., Wong-Barnum, M., Volkmann, N., & Ellisman, M. H. (2008). Segmentation of electron tomographic data sets using fuzzy set theory principles. Journal of Structural Biology, 162, 368–379.
Gedda, M., Skoglund, U., Öfverstedt, L.-G., & Svensson, S. (2008). Image processing system for localising macromolecules in electron tomography data (submitted for publication).
Gilbert, P. (1972). Iterative methods for the three-dimensional reconstruction of an object from projections. Journal of Theoretical Biology, 36(1), 105–117.
Gordon, R., Bender, R., & Herman, G. T. (1970). Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. Journal of Theoretical Biology, 29(3), 471–481.
Harris, L. J., Larson, S. B., Hasel, K. W., & McPherson, A. (1997). Refined structure of an intact IgG2a monoclonal antibody. Biochemistry, 36, 1581–1597.
He, W., Cowin, P., & Stokes, D. L. (2003). Untangling desmosomal knots with electron tomography. Science, 302(5642), 109–113.
Levi, G., & Montanari, U. (1970). A grey-weighted skeleton. Information and Control, 17, 62–91.
Meyer, F. (1994). Topographic distance and watershed lines. Signal Processing, 38, 113–125.
Meyer, F., & Beucher, S. (1990). Morphological segmentation. Journal of Visual Communication and Image Representation, 1(1), 21–46.
Narasimha, R., Aganj, I., Bennett, A. E., Borgnia, M. J., Zabransky, D., Sapiro, G., et al. (2008). Evaluation of denoising algorithms for biological electron tomography. Journal of Structural Biology, 164(1), 7–17.
Nguyen, H. (2008). Shape-driven three-dimensional watersnake segmentation of biological membranes in electron tomography. IEEE Transactions on Medical Imaging, 27(5), 616–628.
Nyström, I., & Borgefors, G. (1995). Synthesising objects and scenes using the reverse distance transformation in 2D and 3D. In C. Braccini, L. D. Floriani, & G. Vernazza (Eds.), Proceedings of ICIAP'95: Image Analysis and Processing (pp. 441–446). Berlin: Springer-Verlag.
Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.
Philipp-Foliguet, S., Vieira, M. B., & De Albuquerque Araújo, A. (2001). Segmentation into fuzzy regions using topographic distance. In Proceedings of the XIV Brazilian Symposium on Computer Graphics and Image Processing (pp. 282–288). Washington, DC: IEEE Computer Society.
Pittet, J.-J., Henn, C., Engel, A., & Heymann, J. B. (1999). Visualizing 3D data obtained from microscopy on the internet. Journal of Structural Biology, 125, 123–132.
Rosenfeld, A. (1979). Fuzzy digital topology. Information and Control, 40(1), 76–87.
Rosenfeld, A. (1984). The fuzzy geometry of image subsets. Pattern Recognition Letters, 2, 311–317.
Rosenfeld, A., & Pfaltz, J. L. (1966). Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13(4), 471–494.
Rullgård, H., Öktem, O., & Skoglund, U. (2007). A component-wise iterated relative entropy regularization method with updated prior and regularization parameters. Inverse Problems, 23, 2121–2139.
Rutovitz, D. (1968). Data structures for operations on digital images. In G. C. Cheng, D. K. Pollock, & A. Rosenfeld (Eds.), Pictorial Pattern Recognition (pp. 105–133). Washington, DC: Thompson.
Saha, P. K., Wehrli, F. W., & Gomberg, B. R. (2002). Fuzzy distance transform: Theory, algorithms, and applications. Computer Vision and Image Understanding, 86, 171–190.
Sandin, S., Öfverstedt, L.-G., Wikström, A.-C., Wrange, O., & Skoglund, U. (2004). Structure and flexibility of individual immunoglobulin G molecules in solution. Structure, 12(3), 409–415.
Sanniti di Baja, G. (1994). Well-shaped, stable, and reversible skeletons from the (3, 4)-distance transform. Journal of Visual Communication and Image Representation, 5, 107–115.
Sanniti di Baja, G., & Svensson, S. (2002). A new shape descriptor for surfaces in 3D images. Pattern Recognition Letters, 23(6), 703–711. (Special issue on Discrete Geometry for Computer Imagery).
Sethian, J. A. (1999). Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press.
Skoglund, U., Öfverstedt, L.-G., Burnett, R. M., & Bricogne, G. (1996). Maximum-entropy three-dimensional reconstruction with deconvolution of the contrast transfer function: A test application with adenovirus. Journal of Structural Biology, 117, 173–188.
Sladoje, N., & Lindblad, J. (2007). Representation and reconstruction of fuzzy discs by moments. Fuzzy Sets and Systems, 158(5), 517–534.
Smedby, Ö., Svensson, S., & Löfstrand, T. (1999). Greyscale connectivity concept for visualizing MRA and CTA volumes. In S. K. Mun & Y. Kim (Eds.), Medical imaging 1999: Image display (Vol. 3658 of Proceedings of SPIE, pp. 212–219). SPIE-The International Society for Optical Engineering.
Soille, P. (1994). Generalized geodesy via geodesic time. Pattern Recognition Letters, 15, 1235–1240.
Soille, P. (1999). Morphological Image Analysis. Berlin: Springer-Verlag.
Svensson, S. (2007a). Centres of maximal balls extracted from a fuzzy distance transform. In G. J. F. Banon, J. Barrera, U. d. M. Braga-Neto, & N. S. T. Hirata (Eds.), Proceedings of 8th international symposium on mathematical morphology, Vol. 2 (pp. 19–20). Available from http://urlib.net/dpi.inpe.br/ismm@80/2007/06.13.10.08.
Svensson, S. (2007b). A decomposition scheme for 3D fuzzy objects based on fuzzy distance information. Pattern Recognition Letters, 28(2), 224–232.
Svensson, S. (2008). Aspects on the reverse fuzzy distance transform. Pattern Recognition Letters, 29(7), 888–896.
Svensson, S., Gedda, M., Fanelli, D., Skoglund, U., Öfverstedt, L.-G., & Sandin, S. (2006). Using a fuzzy framework for delineation and decomposition of immunoglobulin G in cryo electron tomographic images. In Y. Y. Tang, S. P. Wang, G. Lorette, D. S. Yeung, & H. Yan (Eds.), Proceedings The 18th International Conference on Pattern Recognition: Vol. 4 (pp. 520–523).
Svensson, S., & Sanniti di Baja, G. (2002). Using distance transforms to decompose 3D discrete objects. Image and Vision Computing, 20(8), 529–540.
Udupa, J. K., & Saha, P. K. (2003). Fuzzy connectedness and image segmentation. Proceedings of the IEEE, 91(10), 1649–1669.
Udupa, J. K., & Samarasekera, S. (1996). Fuzzy connectedness and object definition: Theory, algorithms, and applications in image segmentation. Graphical Models and Image Processing, 58, 246–261.
Vincent, L. (1993). Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms. IEEE Transactions on Image Processing, 2(2), 176–201.
Vincent, L., & Soille, P. (1991). Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6), 583–597.
Volkmann, N. (2002). A novel three-dimensional variant of the watershed transform for segmentation of electron density maps. Journal of Structural Biology, 138, 123–129.
Wriggers, W., Milligan, R. A., & McCammon, J. A. (1999). Situs: A package for docking crystal structures into low-resolution maps from electron microscopy. Journal of Structural Biology, 125, 185–195.
Yu, X., & Egelman, E. H. (1997). The RecA hexamer is a structural homologue of ring helicases. Nature Structural & Molecular Biology, 4(2), 101–104.
Yu, Z., & Bajaj, C. (2005). Automatic ultrastructure segmentation of reconstructed CryoEM maps of icosahedral viruses. IEEE Transactions on Image Processing, 14(9), 1324–1337.
Yu, Z., & Bajaj, C. (2006). Computational approaches for automatic structural analysis of large bio-molecular complexes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1, 1–15.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Chapter 5
Anchors of Morphological Operators and Algebraic Openings
M. Van Droogenbroeck
Contents
1. Introduction
   1.1. Terminology and Scope
   1.2. Toward Openings
   1.3. Anchors
2. Morphological Anchors
   2.1. Set and Function Operators
   2.2. Theory of Morphological Anchors
   2.3. Local Existence of Anchors
   2.4. Algorithmic Properties of Morphological Anchors
3. Anchors of Algebraic Openings
   3.1. Spatial and Shift-Invariant Openings
   3.2. Granulometries
4. Conclusions
References
1. INTRODUCTION

Over the years mathematical morphology, a theory initiated by Matheron (1975) and Serra (1982), has grown into a major theory in the field of nonlinear image processing. Tools of mathematical morphology, such as morphological filters, the watershed transform, and connectivity operators, are now widely available in commercial image processing software packages and the theory itself has considerably expanded over the past decade (Najman & Talbot, 2008). This expansion includes new operators, algorithms, methodologies, and concepts that have led mathematical morphology to become part of the mainstream of image analysis and image-processing technologies.

University of Liège, Department of Electrical Engineering and Computer Science, Montefiore, Sart Tilman, Liège, Belgium

Advances in Imaging and Electron Physics, Volume 158, ISSN 1076-5670, DOI: 10.1016/S1076-5670(09)00010-X. Copyright © 2009 Elsevier Inc. All rights reserved.
The growth in popularity is due not only to the theoretical work of some pioneers but also to the development of powerful tools for image processing such as granulometries (Matheron, 1975), pattern spectrum analysis-based techniques (Maragos, 1989) that provide insights into shapes, and transforms like the watershed (Beucher & Lantuéjoul, 1979; Vincent & Soille, 1991) or connected operators (Salembier & Serra, 1995) that help to segment an image. All these operators have been studied intensively and tractable algorithms have been found to implement them effectively, that is, in real time on an ordinary desktop computer.

Historically, mathematical morphology is considered a theory that is concerned with the processing of images, using operators based on topological and geometrical properties. According to Heijmans (1994), the first books on mathematical morphology discuss a number of mappings on subsets of the Euclidean plane, which have in common that they are based on set-theoretical operations (union, intersection, complementation), as well as translations. More recently researchers have extended morphological operators to arbitrary complete lattices, a step that has paved the way to more general algebraic frameworks.

The geometrical interpretation of mathematical morphology relates to the use of a probe, which is a set called the structuring element. The basic idea in binary morphology is to probe an image with the shape of the structuring element and draw conclusions on how this shape fits or misses the regions of that image. Consequently, there are as many interpretations of an image as structuring elements, although one often falls back on a small subset of structuring elements, such as lines, squares, balls, or hexagons. In geometrically motivated approaches to mathematical morphology, the focus clearly lies on the shape and size of the structuring element.

Algebraic approaches do not refer to geometrical or topological concepts.
They concentrate on the properties of operators. Consequently, algebraic approaches embrace larger classes of operators and functions. But in both approaches the goal is to characterize operators to help design solutions useful for image processing applications.
1.1. Terminology and Scope
Let $\mathbb{R}$ and $\mathbb{Z}$ be the sets of all real and integer numbers, respectively. In this chapter, we consider transformations and functions defined on a space $\mathcal{E}$, which is the continuous Euclidean space $\mathbb{R}^n$ or the discrete space $\mathbb{Z}^n$, where n ≥ 1 is an integer. Elements of $\mathcal{E}$ are written p, x, . . ., while subsets of $\mathcal{E}$ are denoted by upper-case letters X, Y, . . . Subsets of $\mathbb{R}^2$ are also called binary sets or images because two colors suffice to draw them (elements of X are usually drawn in black, and elements that do not belong to X in white). Next we introduce an order on $\mathcal{P}(\mathcal{E})$, the power set comprising all subsets of $\mathcal{E}$, that results from the usual notion of inclusion. X is smaller than or equal to Y if
Anchors of Morphological Operators and Algebraic Openings
and only if X ⊆ Y. Note that we choose to include the case of equality, in contrast to the strict inclusion ⊂. The power set $\mathcal{P}(\mathcal{E})$ with the inclusion ordering is a complete lattice (Heijmans, 1994). This chapter also deals with images. Images are modeled as functions (denoted f, g, . . .) that map a non-empty subset $E$ of $\mathcal{E}$ into $R$, where $R$ is a set of binary values {0, 1}, a discrete set of grey-scale values {0, . . . , 255}, or a closed interval [0, 255], defining, respectively, binary, grey-scale, or continuous images. The ordering relation ≤ is given by “f ≤ g if and only if f(x) ≤ g(x) for every x ∈ E”. It can be shown that the space Fun(E, R) of all such functions, together with ≤, forms a complete lattice, which means, among other things, that every subset of the lattice has an infimum and a supremum. The frameworks of $\mathcal{P}(\mathcal{E})$ and ⊆, or Fun(E, R) and ≤, are equivalent as long as we deal with complete lattices. Most results can thus be transposed from one framework to the other. In the following, we arbitrarily decide to restrict functions to single-valued grey-scale images. Also, for convenience, we use the single term operators to refer to operations that map sets of $\mathcal{E}$ into $\mathcal{E}$ or handle Fun(E, R). In addition, we restrict the scope of this chapter to operators that map sets to sets, or functions to functions. This implies that both the input and the output lattices are the same, or equivalently, that there is only one complete lattice under consideration. Operators are denoted by Greek letters ε, δ, γ, ψ, . . .
1.1.1. The Notion of Idempotence
Linear filters and operators are common in many engineering fields that process signals. As an analogy, remember that every computer with a sound card contains a hardware module that prefilters the acquired signal before sampling to prevent aliasing. Likewise, digital signals are converted to analog signals by means of a low-pass filter. The concept of a reference filter called an ideal filter is often used to characterize linear filters.
Ideal filters have a binary transmittance with only two values: 0 or 1. Either they retain a frequency or they drop it. Ideal filters stabilize the result after a single pass; further applications do not modify the result any more in the spectral domain (nor in the spatial domain!). Unfortunately, it can be shown that because practical linear filters have a finite kernel in the spatial domain, they cannot be ideal. It might be impossible to build ideal filters, but they nevertheless serve as a reference because it is pointless to repeat the filtering process. The notion of frequency is irrelevant in nonlinear image processing. Nonlinear operators operate in the initial domain, either globally or locally. An important class of nonlinear operators computes rank statistics inside a moving window. For example, the median operator that selects the median from a collection of values taken inside a moving local window centered on x and that allocates the median value to ψ( f )(x) is known to be efficient for the removal of salt-and-pepper noise (Figure 1). However, in some cases, the median filter oscillates as shown in Figure 2.
FIGURE 1 Effect of a median filter. (a) Original grey-scale image, (b) original image corrupted by noise, and (c) filtered image.
FIGURE 2 Successive applications of a three-pixel wide median filter on a binary image may result in oscillations.
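The oscillation described for Figure 2 can be reproduced with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: the signal is one-dimensional, the borders are treated as periodic, and the helper name `median3_periodic` is hypothetical rather than taken from the chapter.

```python
import numpy as np

def median3_periodic(f):
    """Three-sample median filter on a 1-D signal (periodic borders assumed)."""
    # Stack the left neighbor, the sample itself, and the right neighbor.
    window = np.stack([np.roll(f, 1), f, np.roll(f, -1)])
    return np.median(window, axis=0).astype(f.dtype)

f = np.array([0, 1, 0, 1, 0, 1])       # alternating binary signal
once = median3_periodic(f)             # every sample flips
twice = median3_periodic(once)         # the original signal comes back

root = np.array([0, 0, 1, 1, 1, 0])    # a signal left unchanged by the filter
```

On the alternating signal each pass swaps zeros and ones, so the filter never settles, whereas `root` is invariant under the same filter.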
Oscillations may not be common in practice, but they still cast doubt on the significance of the output function. Therefore, the behavior of median operators has been characterized in terms of root signals. Technically, a root signal f of an operator ψ, which is sometimes called a fixed point, is a function that is invariant under the application of that operator at every location: ∀x ∈ E, ψ(f)(x) = f(x). The existence of root signals is not restricted to the median operator. Let us consider a simple example of a one-dimensional constant function g(x) = k and a linear filter with an impulse response h(x). The filtered signal r(x) is the convolution of g(x) by h(x):
$$r(x) = \int_{-\infty}^{+\infty} g(x - t)\, h(t)\, dt = k \int_{-\infty}^{+\infty} h(t)\, dt = k H(0),$$
where H(0) is the Fourier transform of h(t) evaluated at zero frequency. In the case of an ideal filter, H(0) = 1 so that r(x) = k = g(x). This illustrates that, up to a constant, root signals for linear operators typically include constant-valued or straight-line signals. For nonlinear operators, there is a property somewhat similar to that of an ideal filter for linear processing. Mathematical morphology uses a property called idempotence.
Definition 1. Consider an operator ψ on a complete lattice (L, ≤), and let X be an arbitrary element of L. The operator ψ is idempotent if and only if
$$\psi(\psi(X)) = \psi(X) \tag{1}$$
for any X of L. In the following, we use this notation or the operator composition: ψψ = ψ.
In contrast to the case of ideal filters, it is possible to implement idempotent operators. Therefore, the property is part of the design of filters and not just a goal; idempotence is chosen as one of the compulsory properties of the operator. This explains why many idempotent operators have been proposed: algebraic filters, morphological openings, attribute openings, . . . Although idempotence may be one of the requirements in the design of an operator, it does not suffice. For example, an operator ψ that maps every function to g(x) = −∞ is idempotent but useless. Hereafter, we present additional properties to complete the algebraic framework and elaborate on the formal definition of openings.
1.2. Toward Openings
By definition, a notion of order exists on a complete lattice (L, ≤). The property of increasingness guarantees that an order between objects in the lattice is preserved (remember that we deal only with objects that belong to a single lattice, that is, X, ψ(X) ∈ L, and there is only one notion of order ≤). That is,
Definition 2. The lattice operator ψ is called increasing if X ≤ Y implies ψ(X) ≤ ψ(Y) for every X, Y ∈ L.
Let ψ be an operator on L, and let X be an element of L for which ψ(X) = X. Then X is called invariant under ψ or, alternatively, a fixed point of ψ. The set of all elements invariant under ψ is denoted by Inv(ψ) and is called the invariance domain of ψ. Tarski's fixed-point theorem specifies that the invariance domain Inv(ψ) of an increasing operator ψ on a complete lattice is nonempty. Increasingness builds a bridge between ordering relations before and after the operator. But, as X and ψ(X) are defined on the same lattice, one can also compare X to ψ(X), leading us to define additional properties:
Definition 3. Let X, Y be two sets (or functions) of a lattice (L, ≤). An operator ψ on L is called
• extensive if ψ(X) ≥ X for every X ∈ L;
• anti-extensive if ψ(X) ≤ X for every X ∈ L.
In more practical terms, increasingness tells us if an order in the source lattice is preserved in the destination lattice, idempotence if the application
FIGURE 3 Illustrations of increasingness and anti-extensivity. (a) and (b) Original grey-scale image f (x) and after processing with an opening operator γ ; (c) and (d) Similar displays for a whiter image g(x).
of the operator stabilizes the results, and extensivity if the result is smaller or larger than its source. Figure 3 illustrates these remarks: f(x) ≤ g(x) implies γ(f) ≤ γ(g), γ(f(x)) ≤ f(x), and γ(g(x)) ≤ g(x). When an operator ψ is both increasing and idempotent, it is called an algebraic filter. Regarding the extensivity property, there are two types of algebraic filters: anti-extensive and extensive algebraic filters are called algebraic openings and algebraic closings, respectively. Openings and closings share the common properties of increasingness and idempotence, but are dual with respect to extensivity. Thanks to this duality, we can limit the scope of this chapter to openings; handling closings brings similar results. Figure 4 shows the effect of an algebraic opening $Q_n$ that rounds grey-scale values down to the nearest lower multiple of an integer n; this operator is called quantization in signal processing. It is increasing, anti-extensive, and idempotent, but it should be noted that if no value f(x) is a multiple of n, then
FIGURE 4 Quantization operator. (a) Original grey-scale image, (b) image rounded to the closest inferior multiple of 10, and (c) image rounded to the closest inferior multiple of 50.
f(x) ≠ ψ(f(x)) for all x ∈ E. In other words, quantization may produce values that are not present in the original image and thus have questionable statistical significance. Note that the definition of order is a pointwise property: f(x) is compared with g(x) or γ(f(x)) but not with the value at a different location (called a pixel in image analysis). In practice, however, neighboring pixels share some common physical significance that, for example, rank operators explore. A rank operator of rank k within a discrete sliding window centered at a given location x is obtained by sorting in ascending order the values falling inside the window and by selecting as output value for x the kth value in the sorted list. Some of the best-known rank operators are the local minimum and maximum operators. In mathematical morphology, these operators are referred to as erosion and dilation, respectively, and the window itself is termed a structuring element or a structuring set. These filters are also referred to as min- or max-filters in the literature. The interaction between neighboring pixels introduced by rank operators is why their characterization becomes more challenging. Consider a complete lattice (L, ≤). To elaborate on the notion of neighborhood, we propose the definition of a property called spatiality.
Definition 4. An operator ψ on (L, ≤) is said to be spatial if for every location x ∈ E and for every function f, there exists y ∈ E such that
$$\psi(f(x)) = f(y), \tag{2}$$
and at least one y is different from x. The trivial case of x = y, for every x ∈ E, is thus excluded. As explained previously, the quantization operator is not spatial because it does not consider the neighborhood of x.
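The three properties claimed for the quantization operator $Q_n$ can be checked numerically on a sample image. The following is a minimal sketch, assuming integer grey-scale values stored in a NumPy array; the function name `quantize` and the random test images are illustrative assumptions, and a check on one sample is of course not a proof.

```python
import numpy as np

def quantize(f, n):
    """Q_n: round each value down to the nearest lower multiple of n."""
    return (f // n) * n

rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(64, 64))
# Build g >= f pointwise by adding a non-negative offset (clipped at 255).
g = np.minimum(f + rng.integers(0, 20, size=f.shape), 255)

n = 10
is_increasing = bool(np.all(quantize(f, n) <= quantize(g, n)))   # f <= g implies Q(f) <= Q(g)
is_anti_extensive = bool(np.all(quantize(f, n) <= f))            # Q(f) <= f
is_idempotent = np.array_equal(quantize(quantize(f, n), n), quantize(f, n))
```

Note that `quantize` reads only the value at x itself, which is why this operator is not spatial in the sense of Definition 4.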
Assuming that an image is the result of an observation, the smaller the choice of the neighborhood for finding y, the higher the physical correlation between pixels will be. On purpose, there is no notion of distance between x and y in the definition of spatiality, although one hopes that operators with a reasonable physical significance should restrict the search for y to a close neighborhood of x. Spatiality constrains operators to select values in the neighborhood of a pixel. But the underlying question that is twofold remains: (1) Does an operator drive any input to a root signal (this is called the convergence property), and (2) if not, do oscillations propagate? Root signals have been studied with a particular emphasis on median filters (Arce & Gallagher, 1982; Arce & McLoughlin, 1987; Astola, Heinonen, & Neuvo, 1987; Eberly, Longbotham, & Aragon, 1991; Eckhardt, 2003; Gallagher & Wise, 1981). The convergence property is of no particular interest for idempotent operators, as ψ(ψ( f )) = ψ( f ), so that the question becomes that of determining the subset of locations x ∈ E such that f (x) = ψ( f (x)). In different terms, the study of the invariance domain Inv (ψ) is a key to a better understanding of ψ; indeed, characterizing locations for a function f with respect to ψ can help implement the operator (as shown in Van Droogenbroeck and Buckley, 2005).
1.3. Anchors
To analyze the behavior of some operators, we introduce the concept of anchors. We now define this concept, which can be seen as an extension of that of roots. An anchor is essentially a version of the root notion where the domain of definition is reduced to a subset of it.
Definition 5. Given a signal f and an operator ψ on a complete lattice (L, ≤), the pair comprising a location x in the domain of definition of f and the value f(x) is an anchor for f with respect to ψ if
$$\psi(f)(x) = f(x). \tag{3}$$
In marketing terms, one would say “The right value at the right place.” The set of anchors is denoted $A_\psi(f)$. Note that Definition 5 differs from the initial definition provided in Van Droogenbroeck and Buckley (2005) to emphasize the role of both the location x and the value of f(x). We provide an illustration in Figure 5. In this particular case, there is no evidence that anchors should always exist. Take a grey-scale image whose values f(x) are all odd; then $Q_2(f)$ has no anchor, although f and $Q_2(f)$ look identical. The existence of anchors is an open issue. It is also interesting to determine whether an order between operators implies a similar inclusion order between anchor sets. In general, $\gamma_1 \le \gamma_2$ does not guarantee an inclusion of their respective anchor sets. However, drawings (d) and (e) of
FIGURE 5 Quantization operator and anchors whose locations are drawn in black in (d) and (e). (a) Original image, (b) image rounded to the closest inferior multiple of 10, (c) image rounded to the closest inferior multiple of 50, (d) and (e), respectively, anchors of (b) and (c).
Figure 5 suggest that $A_{Q_{50}}(f) \subseteq A_{Q_{10}}(f)$, which is true in this case because, in addition to $Q_{50}(f) \le Q_{10}(f)$, $Q_{50}(f)$ “absorbs” $Q_{10}(f)$; more precisely, it is required that $Q_{50}(f) = Q_{50}(Q_{10}(f))$ (see Theorem 8). Figure 6 shows anchors of two other common operators: morphological erosions and openings (detailed further in Section 2). Papers dealing with roots, convergence, or invariance domains focus either on the operator itself or on the entire signal. Anchors characterize a function locally, but they also help in finding algorithms, or interpreting existing algorithms. Van Droogenbroeck and Buckley (2005) presented algorithms applicable to morphological operators based on linear structuring elements and showed how they offer an alternative to implementations like the one of van Herk (1992). In this chapter, we use an algebraic framework, with an eye on the geometrical notions, to expose the notion of anchors. The remainder of this chapter is organized as follows. Section 2 recalls several definitions and
FIGURE 6 Illustration of anchors [marked in black in (c) and (e)]. (a) Original image, (b) an image eroded by a 3 × 3 square structuring element, (c) anchor locations of $\varepsilon_B(f)$, (d) an image opened by a 3 × 3 square structuring element, and (e) anchor locations of $\gamma_B(f)$.
details theoretical results valid for morphological operators; anchors related to morphological operators are called morphological anchors. This section rephrases many results presented in Van Droogenbroeck and Buckley (2005). Section 3 extends the notion of anchors to the framework of algebraic operators. In particular, we present the concept of algebraic anchors, which applies to algebraic openings and closings. The major contribution is the proof that, although some operators may have no anchors (remember the case of the quantization operator $Q_2$ applied to an image filled with odd grey-scale values), classes of openings and closings other than their morphological counterparts have anchors, too.
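The two observations made about quantization anchors in Section 1.3 (no anchor for $Q_2$ on an all-odd image, and the inclusion of the anchors of $Q_{50}$ in those of $Q_{10}$) can be illustrated as follows. This is a sketch under the assumption of integer-valued images; `quantize` and `anchor_mask` are hypothetical helper names, and the anchor set is represented by a boolean mask of locations.

```python
import numpy as np

def quantize(f, n):
    """Q_n: round each value down to the nearest lower multiple of n."""
    return (f // n) * n

def anchor_mask(psi, f):
    """Boolean mask of the locations x where psi(f)(x) == f(x) (Definition 5)."""
    return psi(f) == f

rng = np.random.default_rng(1)

# An image with odd values only: Q_2 changes every pixel, so there is no anchor.
odd = 2 * rng.integers(0, 128, size=(32, 32)) + 1
q2_has_anchor = bool(anchor_mask(lambda h: quantize(h, 2), odd).any())

# On a generic image, every anchor of Q_50 is also an anchor of Q_10.
f = rng.integers(0, 256, size=(64, 64))
a10 = anchor_mask(lambda h: quantize(h, 10), f)
a50 = anchor_mask(lambda h: quantize(h, 50), f)
inclusion_holds = bool(np.all(a10[a50]))

# Absorption property quoted in the text: Q_50(f) == Q_50(Q_10(f)).
absorbs = np.array_equal(quantize(f, 50), quantize(quantize(f, 10), 50))
```

The inclusion holds here simply because every multiple of 50 is also a multiple of 10; for two arbitrary ordered openings no such inclusion should be expected.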
2. MORPHOLOGICAL ANCHORS After a brief reminder on basic morphological operators, we emphasize the role of anchors in the context of erosions and openings by discussing their
existence and density. It is shown that anchors are intimately related to morphological openings and closings (their duals), and that the existence of anchors is guaranteed for openings. Furthermore, it is possible to derive properties useful for the implementation of erosions and openings. Section 3 generalizes a few results in the case of algebraic openings.
2.1. Set and Function Operators
If $\mathcal{E}$ is the continuous Euclidean space $\mathbb{R}^n$ or the discrete space $\mathbb{Z}^n$, then the translation of x by b is given by x + b. To translate a given set $X \subseteq \mathcal{E}$ by a vector $b \in \mathcal{E}$, it is sufficient to translate all the elements of X by b: $X_b$ is defined by $X_b = \{x + b \mid x \in X\}$. Due to the commutativity of +, $X_b$ is equivalent to $b_X$, where $b_X$ is the translate of b by all elements of X. Let us consider two subsets X and B of $\mathcal{E}$. The erosion and dilation of X by the set B are respectively defined as
$$X \ominus B = \bigcap_{b \in B} X_{-b} = \{p \in \mathcal{E} \mid B_p \subseteq X\}, \tag{4}$$
$$X \oplus B = \bigcup_{b \in B} X_b = \bigcup_{x \in X} B_x = \{x + b \mid x \in X,\ b \in B\}. \tag{5}$$
For X ⊕ B, X and B are interchangeable, but not for the erosion, where it is required that $B_p$ be contained within X. Note that there are as many erosions as sets B. As B serves to highlight some geometrical characteristics of X, it is called a structuring element or structuring set. Although the window shape might be arbitrary, it is common practice in applied image analysis to use linear, rectangular, or circular structuring elements. If B contains the origin o,
$$X \ominus B = \bigcap_{b \in B} X_{-b} = \Bigl(\bigcap_{b \in B \setminus \{o\}} X_{-b}\Bigr) \cap X, \tag{6}$$
which is included in X. Therefore, if o ∈ B, the erosion and dilation are, respectively, anti-extensive and extensive. In addition, both operators are increasing but not idempotent. Because erosions and dilations are, respectively, anti-extensive and extensive (when the structuring element contains the origin), the cascade of an erosion and a dilation suggests itself. This set, denoted by X ∘ B, is called the opening of X by B and is defined by
$$X \circ B = (X \ominus B) \oplus B. \tag{7}$$
Similarly, the closing of X by B is the dilation of X followed by the erosion, both with the same structuring element. It is denoted by X • B and defined by $X \bullet B = (X \oplus B) \ominus B$. Dilations and erosions are closely related although
FIGURE 7 Opening and closing with a ball B.
not inverse operators. A precise relation between them is expressed by the duality principle (Serra, 1982) that states that
$$X \ominus B = (X^c \oplus \check{B})^c \quad \text{or} \quad X \oplus B = (X^c \ominus \check{B})^c, \tag{8}$$
where the complement of X, denoted $X^c$, is defined as $X^c = \{p \in \mathcal{E} \mid p \notin X\}$, and the symmetric or transposed set of $B \subseteq \mathcal{E}$ is the set $\check{B}$ defined as $\check{B} = \{-b \mid b \in B\}$. Therefore, all statements concerning erosions and openings have an equivalent form for dilations and closings and vice versa. When B contains the origin, X ⊖ B is the set of locations p that satisfy $B_p \subseteq X$. When a dilation is applied to this set, the resulting set sums $p_B$-like contributions, which are equivalent to $B_p$. So X ∘ B is the union of the translates $B_p$ that fit into X:
$$X \circ B = \bigcup \{B_p \mid B_p \subseteq X\}. \tag{9}$$
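Equations (4), (5), (7), and (9) translate almost literally into operations on finite sets of integer points. The sketch below is an illustration only: the function names, the rectangular test set, and the 2 × 2 structuring element are assumptions made for the example, and the erosion is restricted to finite X and non-empty B.

```python
def translate(X, b):
    """X_b = {x + b | x in X} for sets of integer 2-D points."""
    return {(x[0] + b[0], x[1] + b[1]) for x in X}

def dilate(X, B):
    """X (+) B = {x + b | x in X, b in B}, as in Eq. (5)."""
    return {(x[0] + b[0], x[1] + b[1]) for x in X for b in B}

def erode(X, B):
    """X (-) B = {p | B_p is included in X}, as in Eq. (4)."""
    b0 = next(iter(B))
    # Any valid p satisfies p + b0 in X, so p = x - b0 for some x in X.
    candidates = {(x[0] - b0[0], x[1] - b0[1]) for x in X}
    return {p for p in candidates if translate(B, p) <= X}

def opening(X, B):
    """X o B = (X (-) B) (+) B, as in Eq. (7)."""
    return dilate(erode(X, B), B)

# Hypothetical example: a 4 x 3 rectangle plus one isolated point.
rectangle = {(i, j) for i in range(4) for j in range(3)}
X = rectangle | {(6, 0)}
B = {(0, 0), (0, 1), (1, 0), (1, 1)}      # 2 x 2 square containing the origin

opened = opening(X, B)
# Geometrical interpretation, Eq. (9): union of the translates B_p that fit into X.
union_of_fits = set().union(*(translate(B, p) for p in erode(X, B)))
# Same structuring element with a displaced origin.
opened_shifted = opening(X, translate(B, (5, -3)))
```

In this example the isolated point is removed because no translate of B containing it fits into X, and moving the origin of B leaves the opening unchanged.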
In addition, it can be shown that X ∘ B is identical to $X \circ B_p$, so that the opening does not depend on the position of the origin when choosing B. The interpretation of X ∘ B as the union $\bigcup \{B_p \mid B_p \subseteq X\}$ is referred to as the geometrical interpretation of the morphological opening. A similar interpretation holds for the closing: the closing is the complement of the union of all the translates $B_p$ contained in $X^c$. Figure 7 illustrates an opening and a closing with a ball. The geometrical interpretation suffices to prove that if X ∘ B is not empty, then there are at least #(B) anchors, where #(B) denotes the cardinality or area of B. The existence of anchors for X ⊖ B is less trivial; assume that X is a
chessboard and B = {p}, where p is located at the distance of one square of the chessboard. In this case, by Eq. (4), $X \ominus B = X_{-p}$ and $X \cap X_{-p} = \emptyset$; $A_{X \ominus B}(X)$ is empty. To the contrary, if o ∈ B and X ⊖ B is not empty, then the erosion of X by B has anchors. In the following, we define operators on grey-scale images and then discuss the details of anchors related to erosions and openings.
Previous definitions can be extended to binary and grey-scale images. If f is a function and $b \in \mathcal{E}$, then the spatial translate of f by b is defined by $f_b(x) = f(x - b)$. The spatial translate is also called horizontal translate. The vertical translate, used later in this chapter, of a function f by a value v is defined by $f^v(x) = f(x) + v$. The vertical translate shifts the function values in the grey-scale domain. The erosion of a function f by a structuring element B is denoted by $\varepsilon_B(f)(x)$ and is defined as the infimum of the translations of f by the elements −b, where b ∈ B:
$$\varepsilon_B(f)(x) = \bigwedge_{b \in B} f_{-b}(x) = \bigwedge_{b \in B} f(x + b). \tag{10}$$
Likewise, we define the dilation of f by B, $\delta_B(f)(x)$, as
$$\delta_B(f)(x) = \bigvee_{b \in B} f_b(x) = \bigvee_{b \in B} f(x - b). \tag{11}$$
Note that we consider so-called flat structuring elements; more general definitions using non-flat structuring elements exist, but they are not considered here. Just as for sets, the morphological opening $\gamma_B(f)$ and closing $\phi_B(f)$ are defined as compositions of erosion and dilation operators:
$$\gamma_B(f) = \delta_B(\varepsilon_B(f)), \tag{12}$$
$$\phi_B(f) = \varepsilon_B(\delta_B(f)). \tag{13}$$
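Equations (10) to (13) can be sketched directly with NumPy for a flat structuring element given as a list of offsets. The code below is a minimal illustration, not an efficient implementation: it assumes a two-dimensional image with periodic borders (so that every translate is defined), and the function names and the 3 × 3 test element are choices made for this example.

```python
import numpy as np

def erosion(f, B):
    """eps_B(f)(x) = min over b in B of f(x + b), Eq. (10); periodic borders assumed."""
    return np.min([np.roll(f, (-b[0], -b[1]), axis=(0, 1)) for b in B], axis=0)

def dilation(f, B):
    """delta_B(f)(x) = max over b in B of f(x - b), Eq. (11); periodic borders assumed."""
    return np.max([np.roll(f, (b[0], b[1]), axis=(0, 1)) for b in B], axis=0)

def opening(f, B):
    """gamma_B(f) = delta_B(eps_B(f)), Eq. (12)."""
    return dilation(erosion(f, B), B)

def closing(f, B):
    """phi_B(f) = eps_B(delta_B(f)), Eq. (13)."""
    return erosion(dilation(f, B), B)

# 3 x 3 square structuring element containing the origin (an example choice).
B = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(32, 32))

opened = opening(f, B)
closed = closing(f, B)
```

With the origin in B, the erosion and the opening stay below f while the dilation and the closing stay above it, and applying `opening` twice gives the same result as applying it once.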
Figure 8 shows the effects of several morphological operators on an image. Again, $\varepsilon_B(f)$ and $\delta_B(f)$, and $\gamma_B(f)$ and $\phi_B(f)$, are duals of each other (Serra, 1982), which is interpreted as stating that they process the foreground and the background symmetrically. If, by convention, we choose to represent low values with dark pixels in an image (background) and large values with white pixels (foreground), erosions enlarge dark areas and shrink the foreground. From all the previous definitions, it can be seen that erosions, dilations, openings, and closings are spatial operators, as defined previously. They use values taken in the neighborhood. Heijmans (1994) and other authors have shown that set operators can be extended to function operators and hence the entire apparatus of
FIGURE 8 Original image (a), erosion (b), dilation (c), opening (d), and closing (e), with a 15 × 15 square.
morphology on sets is applicable in the grey-scale case as well. The underlying idea is to slice a function f into a family of increasing sets obtained by thresholding f. Without further details, consider a complete lattice Fun(E, R). We associate a series of threshold sets to f as defined by (Figure 9)
$$X(t) = \{x \in \mathcal{E} \mid f(x) \ge t\}. \tag{14}$$
Note that X(t) is decreasing in t and that these sets obey the continuity condition
$$X(t) = \bigcap_{s < t} X(s). \tag{15}$$
[…] > 0 which have no lower bound, we have previously restricted the range of grey-scale values to a finite set (which means that it is countable) or a closed interval. Likewise, we must deal with
finite structuring elements to be able to count the number of anchors. Both finiteness assumptions are used in the following. Consequently, there is at least one global minimum, and at least one anchor point. That is,
Theorem 1. For a finite structuring element B, the set of anchors of a morphological opening is always non-empty:
$$A_{\gamma_B}(f) \neq \emptyset. \tag{16}$$
We provide an improved formal statement on the number of anchors for openings later. Note that the position of the origin in B has no influence on the set of anchors of $\gamma_B(f)$. This originates from the corresponding property of the operator itself, that is, $\gamma_B(f) = \gamma_{B_p}(f)$ for any p (on an infinite domain). Similar properties do not hold for erosions. In fact, the set of anchors of a morphological erosion may be empty, and the location of the origin plays a significant role; a basic property states that $X \ominus B_p = (X \ominus B)_{-p}$. Figure 10 shows two erosions with the same structuring element, translated in one of the two cases. Note that the choice of the origin in the middle of B is no guarantee that the number of anchors is larger. Again, based on the interpretation of openings in terms of threshold sets, larger structuring elements are less likely to lead to large sets of anchors. Indeed, large structuring elements do not fit into higher threshold sets, so that at higher grey-scale levels there are fewer anchors. Figure 11 shows the evolution of the cardinality of $A_{\gamma_B}(f)$ as the size of B increases.
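The contrast between openings and erosions with respect to the origin can be observed numerically. The sketch below assumes periodic image borders (which stand in for the infinite domain mentioned above), uses hypothetical helper names, and takes a checkerboard and a random image as arbitrary test data.

```python
import numpy as np

def erosion(f, B):
    """eps_B(f)(x) = min over b in B of f(x + b); periodic borders assumed."""
    return np.min([np.roll(f, (-b[0], -b[1]), axis=(0, 1)) for b in B], axis=0)

def dilation(f, B):
    """delta_B(f)(x) = max over b in B of f(x - b); periodic borders assumed."""
    return np.max([np.roll(f, (b[0], b[1]), axis=(0, 1)) for b in B], axis=0)

def opening(f, B):
    return dilation(erosion(f, B), B)

# Chessboard with a one-point structuring element that excludes the origin:
# the erosion shifts the board by one square, so no location is an anchor.
checker = np.indices((8, 8)).sum(axis=0) % 2
erosion_anchor_count = int((erosion(checker, [(0, 1)]) == checker).sum())

# Same 3 x 3 square, once centered (B) and once translated by p (B_p).
B = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
p = (1, 1)
B_p = [(b[0] + p[0], b[1] + p[1]) for b in B]

rng = np.random.default_rng(3)
f = rng.integers(0, 256, size=(32, 32))

same_opening = np.array_equal(opening(f, B), opening(f, B_p))
# For functions, the set property X (-) B_p = (X (-) B)_{-p} becomes a shift by -p.
shifted_erosion = np.array_equal(
    erosion(f, B_p), np.roll(erosion(f, B), (-p[0], -p[1]), axis=(0, 1)))

anchors_centered = int((erosion(f, B) == f).sum())
anchors_translated = int((erosion(f, B_p) == f).sum())
```

The two anchor counts for the erosion generally differ, and neither choice of origin is guaranteed to give the larger one, whereas the opening and therefore its anchor set are identical for B and $B_p$.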
2.3. Local Existence of Anchors
Because $\delta_B(f)$ is defined as $\bigvee_{b \in B} f(x - b)$, the dilation is a spatial operator. So the supremum (or maximum for real images) is reached at a given location p such that $\delta_B(f)(x) = f(p)$, where $p = x - b_0$. But if $b_0 \in B$, then $p \in \check{B}_x$; $\check{B}_x$ is the symmetric of B translated by x. Up to a translation, $\check{B}_x = x + \check{B}$ defines the neighborhood where the supremum for x can be found. Intuitively, there are as many anchor candidates for $\delta_B(f)$ as disjoint sets like $\check{B}_x$. Similar arguments lead to a relation valid for erosions. The following proposition gives the respective neighborhoods:
Proposition 1. If B is finite and x is any point in the domain of definition of f, then
$$\delta_B(f)(x) = f(p) \tag{17}$$
$$\varepsilon_B(f)(x) = f(q) \tag{18}$$
for some $p \in \check{B}_x$, and some $q \in B_x$.
FIGURE 10 Original image (a), erosion by B (b), erosion by $B_p$ (c), and their anchor sets marked in $\mathcal{E}$, respectively, (d) and (e). B is an 11 × 11 centered square and p = (5, 5).
We can combine Eqs. (17) and (18) to find the neighborhoods of openings and closings. From Eq. (12), we have $\gamma_B(f) = \delta_B(\varepsilon_B(f))$. Therefore, $\gamma_B(f)(x) = \varepsilon_B(f)(p)$ with $p \in \check{B}_x$. Similarly, $\varepsilon_B(f)(p) = f(q)$ with $q \in B_p$. So we have $\gamma_B(f)(x) = f(q)$, and $q \in (\check{B} \oplus B)_x = (B \oplus \check{B})_x$. For the closing, the neighborhood is identical. This can be summarized as
Proposition 2. If B is finite and x is any point in the domain of definition of f, then
$$\gamma_B(f)(x) = f(p) \tag{19}$$
$$\phi_B(f)(x) = f(q) \tag{20}$$
for some $p, q \in (B \oplus \check{B})_x$.
As mentioned previously, the openings and closings are insensitive to the location of the origin of the structuring element. Let us consider $B_r$ instead
[Figure 11 plot: curves labeled “Boat” and “Theoretical lower bound”; horizontal axis: size n, from 10 to 100; vertical axis: logarithmic scale, from 0.001 to 100.]
FIGURE 11 Percentage of opening anchors with respect to a size parameter n; B is an n × n square structuring element and the percentage is the ratio of the cardinality of $A_{\gamma_B}(f)$ to the image size. The figure also displays the lower bound established later in the chapter.
of B and compute the corresponding neighborhood. As $\check{B_r} = (\check{B})_{-r}$, this neighborhood becomes $(B_r \oplus \check{B_r})_x = (B \oplus \check{B})_{x+r-r} = (B \oplus \check{B})_x$. Also, note that $B \oplus \check{B}$ always contains the origin, which means that $x \in (B \oplus \check{B})_x$ in all cases. To the contrary, if B does not contain the origin, the neighborhood of x for a dilation (that is, $\check{B}_x$) does not contain x, nor does $B_x$ for the erosion.
Let us now consider that o ∈ B. Then the dilation is extensive: $f(x) \le \delta_B(f)(x)$. If f is bounded, then there exists $r \in \mathcal{E}$ such that f(r) is the upper bound of f. As r belongs to its own neighborhood and f(r) is an upper bound, $\delta_B(f)(r) \le f(r)$ too. This means that (r, f(r)) is an anchor with respect to the dilation: $\delta_B(f)(r) = f(r)$. In other words, the (r, f(r)) pair of an upper bound is an anchor for the dilation when the structuring element B contains the origin. In contrast to the cases of dilations and erosions, the number of anchors for the opening is not limited by the number of lower or upper bounds. To get a better lower bound of the cardinality of anchors, we establish a relationship between erosion anchors and openings. By definition and according to Eq. (18),
$$\varepsilon_B(f)(x) = \bigwedge_{b \in B} f(x + b) = \bigwedge_{q \in B_x} f(q). \tag{21}$$
As B is finite, there exists $q \in B_x$ such that
$$\varepsilon_B(f)(x) = f(q). \tag{22}$$
Next we show that (q, f(q)) is an anchor for the opening. As before, note that $q \in B_x$ implies $x \in (\check{B})_q$. Now
$$\gamma_B(f)(q) = \bigvee_{r \in (\check{B})_q} \varepsilon_B(f)(r) \tag{23}$$
$$\ge \varepsilon_B(f)(x) \tag{24}$$
$$= f(q). \tag{25}$$
As before, we use the anti-extensivity property of an opening, that is, $\gamma_B(f) \le f$. This proves that $\gamma_B(f)(q) = f(q)$ and therefore (q, f(q)) is an anchor for the opening. The following theorem establishes a formal link between erosion and opening anchors.
Theorem 2. If B is finite and x is any location in the domain of definition of f, then
$$\varepsilon_B(f)(x) = \gamma_B(f)(p) \tag{26}$$
for some $p \in B_x$. Moreover, (p, f(p)) is an anchor for $\gamma_B(f)$, that is,
$$\gamma_B(f)(p) = f(p). \tag{27}$$
The density of anchors for the opening is thus related to the size of $B_x$. It is also true that for each $(B \oplus \check{B})_x$-like neighborhood, there is an anchor for $\gamma_B(f)$. To prove this result, remember that
$$\gamma_B(f)(x) = \varepsilon_B(f)(p) = f(q) \tag{28}$$
for some $p \in \check{B}_x$ and $q \in B_p$. Next, we want to prove that (q, f(q)) is an anchor. By definition, $\gamma_B(f)(q)$ may be written as
$$\gamma_B(f)(q) = \bigvee_{r \in (\check{B})_q} \varepsilon_B(f)(r). \tag{29}$$
However, $r \in (\check{B})_q$ is equivalent to $q \in B_r$; since $q \in B_p$, the location p is one of the candidates r in the supremum. Then, according to Eq. (28),
$$\gamma_B(f)(q) = \bigvee_{r \in (\check{B})_q} \varepsilon_B(f)(r) \ge \varepsilon_B(f)(p) \tag{30}$$
and, as $\varepsilon_B(f)(p) = f(q)$,
$$\gamma_B(f)(q) \ge \varepsilon_B(f)(p) = f(q). \tag{31}$$
But openings are anti-extensive, which means that $\gamma_B(f)(q) \le f(q)$. This proves that (q, f(q)) is an opening anchor:
Theorem 3. If B is finite and x is any point in the domain of definition of f, then
$$\gamma_B(f)(x) = \gamma_B(f)(q) = f(q) \tag{32}$$
for some $q \in (B \oplus \check{B})_x$.
Theorems 2 and 3 lead to bounds for the number of anchors because they establish the existence of anchors locally. Intuitively, regions with a constant grey-scale value contain more anchor points; in such a neighborhood all points will be anchors. But the number of anchors is also related to the size of the structuring element. Theorem 3 specifies that at least one opening anchor exists for each region of type $(B \oplus \check{B})_x$. Surprisingly, it is Theorem 2, which links erosion to opening, that provides the tightest lower bound for the density of opening anchors:
$$\frac{1}{\#(B)}. \tag{33}$$
This limit is the minimum proportion of opening anchors contained in an image; it is plotted on Figure 11. It is reachable only if E can be tiled by translations of B. Where such tiling is not possible, for example, when B is a disk, this bound is conservative. Note also that the number of opening anchors is expected to decrease when the size of B increases. This phenomenon is illustrated in Figure 12, where opening anchors have been overwritten in black.
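The local existence result of Theorem 2 and the density bound of Eq. (33) can be checked on a small example. The sketch below makes several assumptions that are not part of the chapter: the signal is one-dimensional and periodic, B is a linear structuring element {0, . . . , n − 1}, and the helper names are hypothetical.

```python
import numpy as np

def erosion(f, n):
    """eps_B(f)(x) = min of f(x + b) for b in {0, ..., n-1}; periodic signal assumed."""
    return np.min([np.roll(f, -b) for b in range(n)], axis=0)

def dilation(f, n):
    """delta_B(f)(x) = max of f(x - b) for b in {0, ..., n-1}; periodic signal assumed."""
    return np.max([np.roll(f, b) for b in range(n)], axis=0)

def opening(f, n):
    return dilation(erosion(f, n), n)

rng = np.random.default_rng(2)
f = rng.integers(0, 256, size=600)

densities = {}
every_window_has_anchor = True
for n in (3, 5, 10, 25):
    anchors = opening(f, n) == f              # anchor locations of the opening
    densities[n] = anchors.mean()             # proportion of anchors in the signal
    # Theorem 2: each window B_x = {x, ..., x + n - 1} holds at least one anchor.
    window_has_anchor = np.max([np.roll(anchors, -b) for b in range(n)], axis=0)
    every_window_has_anchor &= bool(window_has_anchor.all())
```

Since every window of n consecutive samples contains an anchor, two successive anchors are never more than n samples apart, which is why each measured density is at least 1/n, that is, 1/#(B).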
FIGURE 12 Density of opening anchors for increasing sizes of the structuring element. From left to right, and top to bottom: original (a) and openings with a square structuring element $B$ (of size 3 × 3, 11 × 11, and 21 × 21, respectively). Panels: (a) $f$; (b) $\gamma_{3\times 3}(f)$; (c) $\gamma_{11\times 11}(f)$; (d) $\gamma_{21\times 21}(f)$.

2.4. Algorithmic Properties of Morphological Anchors

In addition to providing a weak bound for the number of anchors, Theorem 3 has an important practical consequence. It shows that all the information needed to compute $\gamma_B(f)$ is contained in its opening anchors. In other words, from a theoretical point of view, it is possible to reconstruct $\gamma_B(f)(x)$ from a subset of $A_{\gamma_B}(f)$. The only pending question is how to determine this subset of $A_{\gamma_B}(f)$. Should an algorithm be able to detect the location of opening anchors that influence their neighborhood, it would provide the opening for each $x$ immediately. Unfortunately, unless $f(x)$ has been processed previously and information on anchors has been collected, there is no way to locate anchor points. But with an appropriate scanning order and a linear structuring element, it is possible to retain some information about $f$ to locate anchor points effectively. Such an algorithm has been proposed by Van Droogenbroeck and Buckley (2005). Figure 13 shows the computation times of such an algorithm for a very large image and a linear structuring element $L$ whose length varies. For Figure 13, one image was built by tiling pieces of a natural image; the other was filled randomly to represent the worst case.

An interesting characteristic of this algorithm is that the computation times decrease with the size of the structuring element. To explain this behavior, remember that the number of anchors also decreases with the size of $B$. Because the algorithm is based on anchors, there are fewer anchors to be found. Once an anchor is found, its value is propagated efficiently in its neighborhood.

So far we have worked on the opening, but we can use Theorem 2 and anchors for a different algorithm to compute the erosion. Because the set of erosion anchors may be empty, we cannot rely on erosion anchors to develop an algorithm to compute the erosion. However, it is known (Heijmans, 1994) that the erosion of $f$ is equal to the erosion of $\gamma_B(f)$:
\[ \varepsilon_B(f) = \varepsilon_B(\gamma_B(f)) \]

for any function $f$ and $B$. The conclusion is that the computation of erosions should be based on opening anchors rather than on erosion anchors. Computation times of such an algorithm for several erosions are displayed in Figure 14, alongside those of the opening. The algorithm for the erosion is slower for two reasons: anchors are to be propagated in a smaller neighborhood, and the propagation process is more complicated than in the case of the opening. However, this shows that opening anchors are also useful for the computation of erosions.

FIGURE 13 Computation times on two images (of identical size). Curves: opening by anchors (random image) and opening by anchors (natural image). Horizontal axis: length of the structuring element (in pixels); vertical axis: computation time [s].

FIGURE 14 Computation times of two algorithms that use opening anchors to compute the erosion and the morphological opening. Curves: erosion by anchors (natural image) and opening by anchors (natural image). Horizontal axis: length of the structuring element (in pixels); vertical axis: computation time [s].
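The identity between the erosion of $f$ and the erosion of its opening can be checked numerically. The sketch below assumes a one-dimensional signal, a linear structuring element $\{0, \ldots, n-1\}$, and windows clipped at the border; it is a brute-force check, not the anchor-based algorithm whose timings are reported in Figure 14.

```python
import random

def erode(f, n):
    """Erosion by the segment {0, ..., n-1}, clipped at the border."""
    return [min(f[x:x + n]) for x in range(len(f))]

def dilate(g, n):
    """Adjunct dilation by the same segment, clipped at the border."""
    return [max(g[max(0, x - n + 1):x + 1]) for x in range(len(g))]

random.seed(1)
f = [random.randint(0, 255) for _ in range(200)]  # hypothetical test signal
n = 7

opened = dilate(erode(f, n), n)           # gamma_B(f)
same = erode(f, n) == erode(opened, n)    # erosion of f vs. erosion of gamma_B(f)
print(same)
```

Because the opening retains exactly the values that the erosion needs, an erosion algorithm may work from opening anchors even when no erosion anchor exists.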
Note that the relative position of the computation-time curves is unusual. Openings are defined as the cascade of an erosion followed by a dilation, so slower computation of openings would be expected. Figure 14 contradicts this belief.

To close the discussion on morphological anchors, let us examine the impact of the shape of $B$ on the implementation. The shape of $B$ is usually not arbitrary: typical shapes include lines, rectangles, circles, hexagons, and so on. If $B$ is constrained to contain the origin or to be symmetric, we can derive useful properties for implementations. Suppose, for example, that $(p, f(p))$ is an anchor with respect to the erosion $\varepsilon_B(f)$ and that $B$ contains the origin $o$. Then the dilation is extensive ($\delta_B(f) \ge f$) and therefore

\[ f(p) = \varepsilon_B(f)(p) \le \delta_B(\varepsilon_B(f))(p) = \gamma_B(f)(p). \tag{34} \]

But openings are anti-extensive ($\gamma_B(f) \le f$), so that $\gamma_B(f)(p) = f(p)$. In other words, an anchor for $\varepsilon_B(f)$ is always an anchor for $\gamma_B(f)$ when $B$ contains the origin, as stated below.

Theorem 4. If $o \in B$ and $(p, f(p))$ is an anchor for the erosion $\varepsilon_B(f)$, then $(p, f(p)) \in A_{\gamma_B}(f)$.

Another interesting case occurs when $B$ is symmetric (that is, when $B = \check{B}$). This covers $B$ being a rectangle, a circle, a hexagon, and so on (many software packages propose only morphological operations with symmetric structuring elements to facilitate handling border effects). Anchors of operations with $B$ and $\check{B}$ then coincide, and it is equivalent to scan images in one order or in the reverse order.
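Theorem 4 can be checked on a small example. The sketch below assumes a one-dimensional signal and a centered segment $B = \{-2, \ldots, 2\}$, which contains the origin and is symmetric; border windows are clipped, and the helper names are illustrative.

```python
import random

def erode(f, offsets):
    """Erosion: minimum of f(x + b) over the offsets b that stay inside the signal."""
    n = len(f)
    return [min(f[x + b] for b in offsets if 0 <= x + b < n) for x in range(n)]

def dilate(g, offsets):
    """Adjunct dilation: maximum of g(x - b) over the offsets b that stay inside."""
    n = len(g)
    return [max(g[x - b] for b in offsets if 0 <= x - b < n) for x in range(n)]

def anchors(f, g):
    """Locations where the processed signal g keeps the value of f."""
    return {x for x in range(len(f)) if f[x] == g[x]}

B = [-2, -1, 0, 1, 2]  # contains the origin (offset 0), as Theorem 4 requires
random.seed(2)
f = [random.randint(0, 50) for _ in range(300)]  # hypothetical test signal

eroded = erode(f, B)
opened = dilate(eroded, B)
erosion_anchors = anchors(f, eroded)
opening_anchors = anchors(f, opened)
print(len(erosion_anchors), len(opening_anchors), erosion_anchors <= opening_anchors)
```

The subset test mirrors the statement of the theorem; the converse inclusion does not hold in general, which is why opening anchors are usually more numerous.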
3. ANCHORS OF ALGEBRAIC OPENINGS

The existence of anchors has been proven for morphological openings. The question is whether the existence of anchors still holds for other types of openings, or even for any algebraic opening. From a theoretical perspective, an operator is called an algebraic opening if it is increasing, anti-extensive, and idempotent. Therefore, algebraic openings include but are not limited to morphological openings. Known algebraic openings are area openings (Vincent, 1992), openings by reconstruction (Salembier & Serra, 1995), attribute openings (Breen & Jones, 1996), and so on. The family of algebraic openings is also extensible, as there exist properties, like the one given hereafter, that can be used to engineer new openings.
Proposition 3. If $\gamma_i$ is an algebraic opening for every $i \in I$, then the supremum $\bigvee_{i \in I} \gamma_i$ is an algebraic opening as well.

Attribute openings are most easily understood in the binary case. Unlike morphological openings, attribute openings preserve the shape of a set $X$, because they simply test whether or not a connected component satisfies some increasing criterion $\Gamma$, called an attribute. An example of a valid attribute consists of preserving a set $X$ if its area is greater than $\lambda$ and removing it otherwise. This is, in fact, the surface area opening. More formally, the attribute opening $\gamma_\Gamma$ of a connected set $X$ preserves this set if it satisfies the criterion $\Gamma$:

\[ \gamma_\Gamma(X) = \begin{cases} X, & \text{if } X \text{ satisfies } \Gamma, \\ \emptyset, & \text{otherwise.} \end{cases} \tag{35} \]
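As an illustration of Eq. (35), a binary area opening can be sketched in one dimension, where the connected components are runs of 1s and the area is the run length. This is a minimal sketch: the function name is hypothetical, and the strict comparison follows the wording above (conventions that keep components of area at least $\lambda$ also exist).

```python
def area_opening_1d(x, lam):
    """Keep each run of 1s (connected component) whose length is greater than lam."""
    out = [0] * len(x)
    start = None
    for i, v in enumerate(list(x) + [0]):  # the sentinel 0 closes a trailing run
        if v and start is None:
            start = i                       # a new component begins
        elif not v and start is not None:
            if i - start > lam:             # criterion Gamma: area > lam
                out[start:i] = [1] * (i - start)
            start = None
    return out

X = [1, 1, 0, 1, 1, 1, 0, 1]
Y = area_opening_1d(X, 2)
print(Y)  # only the run of length 3 survives
```

Each surviving component is kept unchanged, which is the shape-preserving behavior that distinguishes attribute openings from morphological openings.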
The definition of attribute openings can be extended to nonconnected sets by considering the union of all their connected components. Since the attribute is increasing, attribute openings can be directly generalized to grey-scale images using the threshold superposition principle. Such openings always have anchors.

But do all openings have anchors? The reason we fail to prove that all openings have anchors is as follows. Let us consider an algebraic opening $\gamma$. Since $\gamma$ is anti-extensive (as it is an opening), $\gamma$ is upper bounded by the identity operator: $\gamma \le I$. Assume now that $A_\gamma(f) = \emptyset$; then $\gamma < I$. Remember that $\gamma$ is also increasing; it follows that $\gamma\gamma \le \gamma I$. If we had instead $\gamma\gamma < \gamma I$ (this property is not true!), then the idempotence property $\gamma\gamma = \gamma$ would give $\gamma < \gamma$, which is impossible, and anchors would exist in all cases. But $\gamma\gamma \le \gamma I$ and not $\gamma\gamma < \gamma I$: increasingness does not preserve a strict order, which leaves the operator enough freedom for some functions to have no anchors. The properties of an algebraic opening are therefore not sufficient to guarantee the existence of anchors. We need to introduce additional requirements on an algebraic opening to ensure the existence of anchors.

Openings that explicitly refer to the threshold value can have no anchor. Remember the case of the quantization operator $Q_2$ applied to an odd image. Obviously, if $f(x) = 3$, $Q_2(f)(x) = 2$; there is no anchor. Similarly, consider an operator $\psi(f)(x) = x \wedge f(x)$. This operator is an opening, but if $g(x) = x + 1$, $\psi(g)(x) = x$; again, there is no anchor. This time the opening does not refer to threshold levels but explicitly to the location, and not the relative location.

Two constraints are considered hereafter. The first constraint, spatiality, relates to the usual notion of neighborhood as used in the section on morphological anchors, and the second constraint, shift-invariance, relates to the ordering of function values.
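The two counterexamples above can be reproduced in a few lines. The sketch assumes that $Q_2$ maps each value to the largest even integer not exceeding it (consistent with $Q_2(3) = 2$) and that the domain is $\{0, 1, \ldots\}$; both are assumptions made for illustration.

```python
def q2(f):
    """Quantization to even levels: largest even integer not exceeding each value."""
    return [2 * (v // 2) for v in f]

def psi(f):
    """psi(f)(x) = min(x, f(x)): the result depends on the absolute location x."""
    return [min(x, v) for x, v in enumerate(f)]

def anchors(f, g):
    """Locations where the processed signal g keeps the value of f."""
    return [x for x in range(len(f)) if f[x] == g[x]]

odd_image = [3, 7, 5, 9, 1]          # hypothetical image with odd values only
g = [x + 1 for x in range(6)]        # g(x) = x + 1

for op, f in ((q2, odd_image), (psi, g)):
    out = op(f)
    assert all(o <= v for o, v in zip(out, f))  # anti-extensive on this input
    assert op(out) == out                       # idempotent on this input
    print(op.__name__, out, anchors(f, out))    # the anchor list is empty
```

Both operators behave like openings on these inputs, yet neither leaves a single value unchanged.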
Definition 6. An operator $\varphi$ is shift-invariant if, for every function $f$, it is equivalent to apply $\varphi$ and then translate the result vertically by $v$ ($v \in \mathbb{R}$), or to apply $\varphi$ to the vertical translate $f^v$ (see the previous definition of a vertical translate). In formal terms, for every function $f$ and every real value $v$:

\[ \varphi(f^v)(x) = \varphi(f + v)(x) = \varphi(f)(x) + v. \tag{36} \]
3.1. Spatial and Shift-Invariant Openings

Section 2 showed that the minima of a function automatically provide anchors for every morphological opening. A simple example suffices to show that this property does not necessarily hold for any opening. Let us reconsider the previous examples and a constant function $\bar{k}$, defined as $\bar{k}(x) = k$ for all $x \in E$. If $\psi(f) = Q_2(f)$, then $\psi(\bar{3}) = \bar{2}$. In addition, for $\psi(f)(x) = x \wedge f(x)$, we have $\psi(\bar{3}) \ne \bar{3}$. Therefore, the processing of a constant function by an algebraic opening can produce a nonconstant function or a constant that takes a different value. If entropy is understood here as the cardinality of grey-scale values after processing, then, contrary to what morphological operators suggest, the entropy of an algebraic opening may increase.

Obviously, these situations do not occur for spatial openings. Morphological openings are a particular case of spatial openings, denoted $\xi$ hereafter. We have proven that the minimum values of a function are anchors with respect to a morphological opening. Let us denote by $\min f$ the minimum of a lower bounded function $f$, and assume that the minimum is reached for $p \in E$. Because $\xi$ is an opening, $\xi(f) \le f$ for any function $f$. In particular, $\xi(f)(p) \le f(p) = \min f$. By definition of spatiality, for every location, including $p$, there exists a location $q$ such that $\xi(f)(p) = f(q)$. But such a value is lower bounded by $\min f$. Therefore, $\xi(f)(p) \ge \min f$, and $\xi(f)(p) = f(p) = \min f$.

Theorem 5. Consider a spatial opening $\xi$. Then all global minima of $f$ are anchors for $\xi$.

This theorem can also be rephrased in the following terms: provided the set of grey-scale values of a function processed by an opening is a subset of the original set of grey-scale values, there are anchors. Indirectly, it also proves the existence of anchors for any spatial opening; to some extent, it generalizes Theorem 1.

Let us now consider the shift-invariance property. From a practical point of view, shift-invariance means that the operator can handle offsets, or equivalently that offsets have no impact on the results except that the result is shifted by the same offset. This is an acceptable theoretical assumption, but in practice images are defined on a finite set of integer values (typically {0, . . . , 255}); handling an offset requires redefining the range of grey-scale values to maintain the full dynamic range.
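Both properties discussed in this section can be observed on a morphological opening, which is spatial and shift-invariant. The sketch below assumes a one-dimensional signal, a linear structuring element of length `n`, and clipped border windows; the quantization operator is included, under the assumption $Q_2(f) = 2\lfloor f/2 \rfloor$, to show an opening that is not shift-invariant.

```python
import random

def opening(f, n):
    """Opening by the segment {0, ..., n-1}: erosion, then adjunct dilation."""
    eroded = [min(f[x:x + n]) for x in range(len(f))]
    return [max(eroded[max(0, x - n + 1):x + 1]) for x in range(len(f))]

def q2(h):
    """Quantization to even levels (assumed definition of Q2)."""
    return [2 * (val // 2) for val in h]

random.seed(3)
f = [random.randint(0, 9) for _ in range(100)]  # hypothetical test signal
n, v = 4, 17                                     # window length and vertical offset

g = opening(f, n)

# Theorem 5: every global minimum of f is an anchor of the opening.
minima = [x for x, val in enumerate(f) if val == min(f)]
minima_are_anchors = all(g[x] == f[x] for x in minima)

# Eq. (36): opening the shifted signal equals shifting the opened signal.
shift_invariant = opening([val + v for val in f], n) == [val + v for val in g]

# Q2 fails Eq. (36) already for an offset of 1.
q2_shift_invariant = q2([val + 1 for val in f]) == [val + 1 for val in q2(f)]

print(minima_are_anchors, shift_invariant, q2_shift_invariant)
```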
Consider a shift-invariant opening $\varphi$. Imagine, for a moment, that there is no anchor with respect to $\varphi$. Since $\varphi$ is anti-extensive (as it is an opening), $\varphi(f)(x) \le f(x)$ becomes

\[ \varphi(f)(x) < f(x) \tag{37} \]

for every $x \in E$. In other words, there exists $\lambda > 0$ such that

\[ \varphi(f)(x) + \lambda \le f(x). \tag{38} \]

By increasingness, $\varphi(\varphi(f) + \lambda) \le \varphi(f)$. Using the shift-invariance property and then idempotence, $\varphi(\varphi(f) + \lambda) = \varphi(\varphi(f)) + \lambda = \varphi(f) + \lambda \le \varphi(f)$, which is equivalent to $\lambda \le 0$. But this conclusion is incompatible with our initial statement on $\lambda$. Therefore,

Theorem 6. Every shift-invariant opening $\varphi$ has one or more anchors. For every function $f$,

\[ A_\varphi(f) \ne \emptyset. \tag{39} \]
A subsequent question is whether the minimum is an anchor, regardless of the type of opening. Let us build a constant function filled with the minimum value $\tau_{\min}$ of $f$; this function is denoted $\bar{\tau}_{\min}$. Since an anchor does exist for $\bar{\tau}_{\min}$, at least some of the values of $\bar{\tau}_{\min}$ are anchors, though not necessarily all of them (see the previous discussion of $\psi(f)(x) = x \wedge f(x)$). Through increasingness, $\bar{\tau}_{\min} \le f$ implies $\gamma(\bar{\tau}_{\min}) \le \gamma(f)$, where $\gamma$ is an algebraic opening. Anti-extensivity implies that $\gamma(f) \le f$. We can conclude that there exists $p \in E$ such that $\gamma(\bar{\tau}_{\min})(p) = \bar{\tau}_{\min}(p) \le \gamma(f)(p) \le f(p)$. So, if $f(p) = \tau_{\min}$, then $\tau_{\min} = \gamma(f)(p)$. Therefore,

Theorem 7. If the set of anchors with respect to an algebraic opening is always non-empty, then at least one global minimum of a function $f$ is an anchor for that opening.

This theorem applies to morphological, spatial, and shift-invariant openings, but in the first two cases we have proven that all minima are anchors. Note, however, that anchors should always exist for this property to be true. Neither the quantization operator $Q_2$ nor $\psi(f)(x) = x \wedge f(x)$ meets this requirement.
3.2. Granulometries

In practice, one uses openings that filter images with several different degrees of smoothness. For example, one opening is intended to maintain many details; another opening filters the image to obtain a background image. When the openings are ordered, we have a granulometry.
Definition 7. A granulometry on $\mathrm{Fun}(E)$ is a one-parameter family of openings $\{\gamma_r \mid r > 0\}$ such that

\[ \gamma_s \le \gamma_r \quad \text{if } s \ge r. \tag{40} \]
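If $\gamma_s \le \gamma_r$, then $\gamma_s\gamma_r \ge \gamma_s\gamma_s = \gamma_s$. Also, $\gamma_r \le I$ implies that $\gamma_s\gamma_r \le \gamma_s$, so that $\gamma_s\gamma_r = \gamma_s$. The identity $\gamma_r\gamma_s = \gamma_s$ is proved analogously. It follows that the operators of a granulometry also satisfy the semigroup property:

\[ \gamma_r\gamma_s = \gamma_s\gamma_r = \gamma_s, \quad s \ge r. \tag{41} \]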
As a result, anchor sets are ordered like the openings of a granulometry, as stated below.

Theorem 8. Anchor sets of a granulometry $\{\gamma_r \mid r > 0\}$ on $\mathrm{Fun}(E)$ are ordered, for $s \ge r$, according to

\[ A_{\gamma_s}(f) \subseteq A_{\gamma_r}(f). \tag{42} \]
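Equations (40) to (42) can be observed on openings by linear segments of increasing length, which form a granulometry. The sketch below is a brute-force check under the assumptions of a one-dimensional signal and clipped border windows; it is not the opening-tree algorithm of Vincent (1994).

```python
import random

def opening(f, n):
    """Opening by the segment {0, ..., n-1}: erosion, then adjunct dilation."""
    eroded = [min(f[x:x + n]) for x in range(len(f))]
    return [max(eroded[max(0, x - n + 1):x + 1]) for x in range(len(f))]

def anchors(f, n):
    """Set of opening anchors of f for a segment of length n."""
    g = opening(f, n)
    return {x for x in range(len(f)) if g[x] == f[x]}

random.seed(4)
f = [random.randint(0, 255) for _ in range(400)]  # hypothetical test signal
r, s = 3, 8                                        # segment lengths with s >= r

g_r, g_s = opening(f, r), opening(f, s)
ordered = all(a <= b for a, b in zip(g_s, g_r))          # Eq. (40)
semigroup = opening(g_s, r) == g_s == opening(g_r, s)    # Eq. (41)
nested = anchors(f, s) <= anchors(f, r)                  # Eq. (42)
print(ordered, semigroup, nested)
```

The nesting of anchor sets means that anchors found for a large segment can be reused when processing with any smaller segment of the family.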
There is a similar statement for morphological openings. Suppose $B$ contains $A$ (that is, $A \subseteq B$) and $B \circ A = B$; then, according to Haralick, Sternberg, and Zhuang (1987),

\[ \gamma_B(f) \le \gamma_A(f). \tag{43} \]

For example, $B$ is a circle and $A$ is a diameter, or $B$ is a square and $A$ is one side of the square. Note that $A \subseteq B$ is not sufficient to guarantee that $\gamma_B(f) \le \gamma_A(f)$. Applying Theorem 8, we obtain

Corollary 1. For any function $f$, if $A \subseteq B$, $B \circ A = B$, and $A$, $B$ are both finite, then

\[ A_{\gamma_B}(f) \subseteq A_{\gamma_A}(f). \tag{44} \]
This result is essential for morphological granulometries. It tells us that if we order a family of morphological openings, anchor sets will be ordered (reversely) as well. In fact, Vincent (1994) developed an algorithm, based on the concept of opening trees, that relies on this property.
4. CONCLUSIONS

Anchors are features that characterize an operator and a function. This chapter has discussed the properties of an opening and shown how they relate to anchors. First, we have established properties valid for morphological operators. Anchors then depend on the size and shape of the chosen structuring element. For example, it has been proven that anchors always exist for openings and that global minima are anchors. The concept of a structuring element is no longer explicitly present for algebraic openings. It also appears that some algebraic openings have no anchor for some functions. However, with additional constraints on the openings (that is, spatiality or shift-invariance), the framework is sufficient to ensure the existence of anchors for any function $f$. In addition, it has been proven that the existence of anchors then implies that some global minima are anchors. This is an interesting property that could lead to new algorithms in the future.
REFERENCES

Arce, G., & Gallagher, N. (1982). State description for the root-signal set of median filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 30(6), 894–902.
Arce, G., & McLoughlin, M. (1987). Theoretical analysis of the max/median filter. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(1), 60–69.
Astola, J., Heinonen, P., & Neuvo, Y. (1987). On root structures of median and median-type filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(8), 1199–1201.
Beucher, S., & Lantuéjoul, C. (1979). Use of watersheds in contour detection. In International workshop on image processing, Rennes, CCETT/IRISA (pp. 2.1–2.12).
Breen, E. J., & Jones, R. (1996). Attribute openings, thinnings, and granulometries. Computer Vision and Image Understanding, 64(3), 377–389.
Eberly, D., Longbotham, H., & Aragon, J. (1991). Complete classification of roots to one-dimensional median and rank-order filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 39(1), 197–200.
Eckhardt, U. (2003). Root images of median filters. Journal of Mathematical Imaging and Vision, 19(1), 63–70.
Gallagher, N., & Wise, G. (1981). A theoretical analysis of the properties of median filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 29(6), 1136–1141.
Haralick, R., Sternberg, S., & Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4), 532–550.
Heijmans, H. (1994). Morphological image operators. In Advances in electronics and electron physics series. Boston: Academic Press.
Maragos, P. (1989). Pattern spectrum and multiscale shape representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 701–716.
Matheron, G. (1975). Random sets and integral geometry. New York: Wiley.
Najman, L., & Talbot, H. (2008). Morphologie mathématique 1: approches déterministes. Paris: Hermes Science Publications.
Salembier, P., & Serra, J. (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing, 4(8), 1153–1160.
Serra, J. (1982). Image analysis and mathematical morphology. New York: Academic Press.
Van Droogenbroeck, M. (1994). On the implementation of morphological operations. In J. Serra, & P. Soille (Eds.), Mathematical morphology and its applications to image processing (pp. 241–248). Dordrecht: Kluwer Academic Publishers.
Van Droogenbroeck, M., & Buckley, M. (2005). Morphological erosions and openings: fast algorithms based on anchors. Journal of Mathematical Imaging and Vision, 22(2–3), 121–142. (Special Issue on Mathematical Morphology after 40 Years).
van Herk, M. (1992). A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels. Pattern Recognition Letters, 13(7), 517–521.
Vincent, L. (1992). Morphological area openings and closings for greyscale images. In Proc. Shape in Picture ’92, NATO Workshop, Driebergen, The Netherlands: Springer-Verlag.
Vincent, L. (1994). Fast grayscale granulometry algorithms. In J. Serra, & P. Soille (Eds.), Mathematical morphology and its applications to image processing (pp. 265–272). Dordrecht: Kluwer Academic Publishers.
Vincent, L., & Soille, P. (1991). Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6), 583–598.
Chapter 6

Temporal Filtering Technique Using Time Lenses for Optical Transmission Systems

Dong Yang, Shiva Kumar, and Hao Wang

Contents
1. Introduction
2. Configuration of a Time-Lens–based Optical Signal Processing System
3. Wavelength Division Demultiplexer
4. Dispersion Compensator
5. Optical Implementation of Orthogonal Frequency-Division Multiplexing Using Time Lenses
6. Conclusions
Acknowledgment
Appendix A
Appendix B
Appendix C
References
Department of Electrical and Computer Engineering, McMaster University, Canada. Advances in Imaging and Electron Physics, Volume 158, ISSN 1076-5670, DOI: 10.1016/S1076-5670(09)00011-1. Copyright © 2009 Elsevier Inc. All rights reserved.

1. INTRODUCTION

The analogy between spatial diffraction and temporal dispersion has been known for years (Kolner, 1994; Papoulis, 1994; Van Howe & Xu, 2006). In the spatial domain, an optical wave propagating in free space diverges due to diffraction. As an analog, in the temporal domain, an optical pulse propagating in a dispersive medium broadens due to dispersion. This space-time duality can also be extended to lenses. The conventional space lens produces quadratic phase modulation on the transverse profile of the input beam, and its analog, the time lens, simply applies a quadratic phase modulation to a temporal optical waveform. On this basis, several temporal analogs to spatial systems based on thin lenses have been created, and many real-time optical signal processing applications, including temporal imaging, pulse compression, and temporal filtering based on time-lens systems, have been proposed (Azana & Muriel, 2000; Berger, Levit, Atkins, & Fischer, 2000; Kolner & Nazarathy, 1989; Lohmann & Mendlovic, 1992).

The temporal filtering technique was first proposed by Lohmann and Mendlovic (1992). In their pioneering work, a temporal filter was introduced in a 4-f configuration consisting of time lenses. In the spatial domain, a conventional lens produces the Fourier transform (FT) at the back focal plane of an optical signal placed at the front focal plane, which is known as a 2-f configuration or 2-f subsystem. The spatial filter placed at the back focal plane modifies the signal spectrum, and a subsequent 2-f subsystem provides the Fourier transform of the modified signal spectrum, which returns the signal to the spatial domain with spatial inversion. There exists an exact analogy between spatial filtering and temporal filtering techniques (Lohmann & Mendlovic, 1992). In the case of temporal filtering, the spatial lens is replaced by a time lens (which is nothing but a phase modulator), and spatial diffraction is substituted with second-order dispersion.

Yang, Kumar, and Wang (2008) discuss a modified 4-f system consisting of two time lenses (Figure 1). In this 4-f configuration, the subsystem T1 provides the FT or inverse FT of the input signal. The signs of chirp and dispersion coefficients are reversed in the 2-f subsystem T2 after the temporal filter. In contrast, Lohmann and Mendlovic (1992) proposed the 4-f configuration in which the signs of the dispersion coefficients of the 2-f subsystems are identical.

FIGURE 1 Scheme of a typical 4-f system. PM1 = phase modulator 1; PM2 = phase modulator 2. (Based on Yang et al. (2008).) Labels: time lens 1 between two fibers of length $f_1$ with dispersion coefficients $\beta_{21}^{(1)}$ and $\beta_{22}^{(1)}$; temporal filter; time lens 2 between two fibers of length $f_2$ with dispersion coefficients $\beta_{21}^{(2)}$ and $\beta_{22}^{(2)}$.
This implies that the second 2-f subsystem (Lohmann & Mendlovic, 1992) provides the FT of the modified signal spectrum, leading to a time-reversed bit pattern, whereas in the approach used by Yang et al. (2008), it provides the inverse Fourier transform (IFT) so that the output bit pattern is not reversed in time. The approach proposed by Yang et al. (2008) has no spatial analog since the sign of spatial diffraction cannot be changed (Goodman, 1996, chap. 4).

Based on the 4-f configuration consisting of time lenses, three applications have been numerically implemented. One of them is a tunable wavelength division demultiplexer (Yang et al., 2008). The wavelength division multiplexing (WDM) demultiplexer is realized using the temporal bandpass filter in a 4-f system. The temporal filter is realized using an amplitude modulator, and the channel to be demultiplexed can be dynamically changed by changing the input voltage to the amplitude modulator. The passband of the temporal filter is chosen to be at the central frequency of the desired channel with a suitable bandwidth such that only the signal carried by the channel to be demultiplexed passes through it with the least attenuation. The wavelength division multiplexed signal passes through the 2-f subsystem T1 (see Figure 1) and its FT in the time domain is obtained. After the temporal filter, the signals in any other undesired channels are blocked, and the signal in the desired channel then passes through the 2-f subsystem T2, which finally produces the demultiplexed signal as its original bit sequence by the exact IFT.

Another application of the 4-f temporal filtering scheme is a higher-order dispersion compensator (Yang et al., 2008). The temporal filter in this application is realized by a phase modulator. To compensate for fiber dispersion, the time domain transfer function of the phase modulator has the same form as the frequency domain transfer function of the fiber, but the signs of the dispersion coefficients are opposite. At the temporal filter, the Fourier transformed input signal is multiplied by the time domain transfer function of the phase modulator so that the fiber dispersion–induced phase shift is canceled out.
Finally, an implementation of orthogonal frequency-division multiplexing (OFDM) in the optical domain using Fourier transforming properties of time lenses is discussed (Kumar & Yang, 2008). The first 2-f subsystem provides the IFT of the input signal that carries the information. This input signal is obtained by the optical/electrical time division multiplexing of several channels. The kernel of the IFT is of the form $\exp(i2\pi f t)$ and therefore, the IFT operation can be imagined as the multiplication of the signal samples of several channels by the optical subcarriers. These subcarriers are orthogonal and therefore, the original signal can be obtained by a Fourier transformer at the receiver, which acts as a demultiplexer. The temporal filter between T1 and T2 is characterized as the transfer function of a fiber-optic link with nonlinearity. The second 2-f subsystem provides the FT of the output from the fiber link so that the convolution of the input optical signal and the fiber transfer function in the time domain is converted into the product of the FT of both, but still in the time domain. The output of the Fourier transformer is the product of the original input signal (input signal of T1) and the phase correction due to the fiber transfer function. The photodetector responds only to the intensity of the optical signal and therefore, the deleterious effects introduced by the fiber that appear as the phase correction are removed by the photodetector. This occurs because the fiber transfer function due to dispersive effects is of the type $\exp[i\beta(f)]$. In the absence of fiber nonlinearity, optical subcarriers are orthogonal and the Fourier transformer at the receiver demultiplexes the subcarriers without introducing any penalty. However, strong nonlinear effects can destroy the orthogonality of the optical subcarriers. Nevertheless, the simulation in Kumar and Yang (2008) shows that the time-lens–based optical OFDM scheme has good tolerance to the fiber nonlinearity.
2. CONFIGURATION OF A TIME-LENS–BASED OPTICAL SIGNAL PROCESSING SYSTEM

Figure 1 shows the modified 4-f system based on time lenses (Yang et al., 2008). This 4-f system consists of two cascaded 2-f subsystems T1 and T2, each of which contains a time lens and two segments of single-mode fibers (SMFs) that are symmetrically placed on both sides of the time lens. A time lens is a temporal analog of a space lens and it is realized by an electro-optic phase modulator that can generate quadratic phase modulation. The signal spectrum can be modified using a temporal filter. The temporal filter can be realized using an amplitude and/or phase modulator. The transfer function of the temporal filter can be changed by changing the input voltage to the amplitude and/or phase modulator. For the proper operation of this system, the two phase modulators and the temporal filter should be properly synchronized. If $T_k$ is the absolute time at the phase modulator $k$, $k = 1, 2$, then they are related by

\[ T_2 = T_1 + f_1/\upsilon_{g1} + f_2/\upsilon_{g2}, \tag{1} \]
where $\upsilon_{g1}$ and $\upsilon_{g2}$ are the group speeds of the fibers after time lens 1 and just before time lens 2, respectively, and $f_1$ and $f_2$ are the fiber lengths shown in Figure 1. This delay between the driving voltages of the phase modulators can be achieved using a microwave delay line. Similarly, the absolute time $T_f$ at the temporal filter is related to $T_1$ by

\[ T_f = T_1 + f_1/\upsilon_{g1}, \tag{2} \]
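The synchronization delays of Eqs. (1) and (2) are simple to evaluate. The sketch below uses hypothetical values (1 km and 2 km of fiber, a group index of 1.468) that are not taken from the chapter; only the two formulas are.

```python
def modulator_timings(t1, f1, f2, vg1, vg2):
    """Return (T_f, T_2) from Eqs. (2) and (1); lengths in m, speeds in m/s, times in s."""
    t_filter = t1 + f1 / vg1        # Eq. (2): time at the temporal filter
    t2 = t_filter + f2 / vg2        # Eq. (1): time at phase modulator 2
    return t_filter, t2

# Hypothetical values, for illustration only.
C_VACUUM = 299_792_458.0            # speed of light in vacuum, m/s
vg = C_VACUUM / 1.468               # assumed group speed in both fibers
t_f, t_2 = modulator_timings(0.0, 1_000.0, 2_000.0, vg, vg)
print(f"filter delay {t_f * 1e6:.3f} us, PM2 delay {t_2 * 1e6:.3f} us")
```

With kilometer-scale fibers the delays are of the order of microseconds, which is the quantity a microwave delay line would have to reproduce between the driving voltages.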
where $f_1$ is the length of the SMF in T1. Propagation of the optical signal in a system consisting of a dispersive SMF and the time lens (such as the 2-f subsystem T1 in Figure 1) results in the FT of the input signal at the length $2f_1$. We use the following definitions of FT pairs:

\[ \tilde{U}(\omega) = F\{u(t)\} = \int_{-\infty}^{+\infty} u(t) \exp(i\omega t)\, dt, \tag{3} \]

\[ u(t) = F^{-1}\{\tilde{U}(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{U}(\omega) \exp(-i\omega t)\, d\omega, \tag{4} \]
where $u(t)$ is a temporal function and $\tilde{U}(\omega)$ is its FT. The time lens is implemented using an optical phase modulator whose time domain transfer function is given by

\[ h_j(t) = \exp\left(iC_j t^2\right), \quad j = 1, 2, \tag{5} \]

where $C_j$ is the chirp coefficient of the phase modulator $j$ in the 2-f subsystem $T_j$, $j = 1, 2$, and $t$ is the time variable in a reference frame that moves with an optical pulse. Using Eq. (5) in Eq. (3), we obtain the FT of the transfer function $h_j(t)$:

\[ \tilde{H}_j(\omega) = \sqrt{i\pi/C_j}\, \exp\left(-i\frac{\omega^2}{4C_j}\right), \quad j = 1, 2. \tag{6} \]
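For completeness, Eq. (6) follows from Eqs. (3) and (5) by completing the square. The short derivation below is a sketch that relies on the standard Fresnel integral $\int_{-\infty}^{+\infty} \exp(iC\tau^2)\, d\tau = \sqrt{i\pi/C}$ (principal branch of the square root); the substitution variable $\tau$ is introduced here for illustration.

```latex
\begin{align*}
\tilde{H}_j(\omega)
  &= \int_{-\infty}^{+\infty} \exp\left(iC_j t^2 + i\omega t\right) dt \\
  &= \exp\left(-i\frac{\omega^2}{4C_j}\right)
     \int_{-\infty}^{+\infty}
     \exp\left[iC_j\left(t + \frac{\omega}{2C_j}\right)^2\right] dt
     && \text{(completing the square)} \\
  &= \exp\left(-i\frac{\omega^2}{4C_j}\right)
     \int_{-\infty}^{+\infty} \exp\left(iC_j \tau^2\right) d\tau
     && \left(\tau = t + \frac{\omega}{2C_j}\right) \\
  &= \sqrt{\frac{i\pi}{C_j}}\,
     \exp\left(-i\frac{\omega^2}{4C_j}\right).
\end{align*}
```

The quadratic spectral phase obtained here has the same form as the transfer function of a dispersive fiber, which is what allows a fiber segment and a time lens to be combined into a Fourier transformer.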
Suppose that the field envelope of the input signal is $u(t, 0)$, and the corresponding FT is $\tilde{U}(\omega, 0)$. Let $\beta_{21}^{(k)}$ and $\beta_{22}^{(k)}$ be the dispersion coefficients of the first and second fibers in the subsystem $T_k$, $k = 1, 2$, respectively, and $f_k$ be the length of the SMF in $T_k$, $k = 1, 2$. Before the time lens of T1 ($z = f_1^-$), the temporal signal is (Agrawal, 1997)

\[ u(t, f_1^-) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{U}(\omega, 0) \exp\left(\frac{i}{2}\beta_{21}^{(1)} f_1 \omega^2 - i\omega t\right) d\omega. \tag{7} \]

The behavior of the time lens is described as follows:

\[ u(t, f_1^+) = u(t, f_1^-)\, h_1(t). \tag{8} \]
Because the product in the time domain becomes a convolution in the spectral domain, taking the FT of Eq. (8), we obtain

\[ \tilde{U}(\omega, f_1^+) = F\{u(t, f_1^-)\, h_1(t)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{U}(\omega - \omega', f_1^-)\, \tilde{H}_1(\omega')\, d\omega', \tag{9} \]

where $\tilde{U}(\omega, f_1^-)$ is the FT of $u(t, f_1^-)$. Hence, at the end of the first 2-f subsystem T1, we obtain

\[ u(t, 2f_1) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{U}(\omega, f_1^+) \exp\left(\frac{i}{2}\beta_{22}^{(1)} f_1 \omega^2 - i\omega t\right) d\omega. \tag{10} \]

Substituting Eq. (9) into Eq. (10), by choosing the focal length as (Lohmann & Mendlovic, 1992)

\[ f_1 = \frac{1}{2\beta_{22}^{(1)} C_1}, \tag{11} \]
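and after some algebra (see Appendix A), we obtain the following:

\[ u(t, 2f_1) = \frac{\sqrt{i\pi/C_1}}{2\pi \beta_{22}^{(1)} f_1}\, \tilde{U}\left(\frac{t}{\beta_{22}^{(1)} f_1}, 0\right) \exp(-i\phi_{\mathrm{res}}), \tag{12} \]

where the residual phase term is given by

\[ \phi_{\mathrm{res}} = \left[\frac{1}{\beta_{22}^{(1)} f_1} - \frac{\beta_{21}^{(1)} + \beta_{22}^{(1)}}{2\left(\beta_{22}^{(1)}\right)^2 f_1}\right] t^2. \tag{13} \]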
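Equations (11) to (13) translate into a few helper functions. The sketch below is illustrative: the function names and the numerical values ($\beta_2 = -21$ ps²/km, $f_1 = 1$ km, a 100 GHz frequency offset) are assumptions chosen for the example, not parameters from the chapter.

```python
import math

def chirp_for_focal_length(beta22, f1):
    """Chirp coefficient C1 satisfying Eq. (11): f1 = 1 / (2 * beta22 * C1)."""
    return 1.0 / (2.0 * beta22 * f1)

def time_of_frequency(omega, beta22, f1):
    """Time at which the spectral component omega appears in Eq. (12): t = beta22 * f1 * omega."""
    return beta22 * f1 * omega

def residual_phase(t, beta21, beta22, f1):
    """Residual phase phi_res of Eq. (13)."""
    return (1.0 / (beta22 * f1)
            - (beta21 + beta22) / (2.0 * beta22 ** 2 * f1)) * t ** 2

# Hypothetical values, for illustration only.
beta2 = -21.0e-27          # s^2/m, i.e. -21 ps^2/km
f1 = 1.0e3                 # m
c1 = chirp_for_focal_length(beta2, f1)
dt = time_of_frequency(2 * math.pi * 100e9, beta2, f1)   # 100 GHz offset
print(f"C1 = {c1:.3e} s^-2, time shift = {dt * 1e12:.2f} ps")
```

The mapping $t = \beta_{22}^{(1)} f_1 \omega$ is what lets a modulator acting in time select or reshape spectral components; when the two fibers of T1 have equal dispersion, `residual_phase` returns zero up to rounding.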
Let us consider the case when the temporal filter is absent. This can be divided into two cases.

Case 1: $\beta_{22}^{(1)} = \beta_{21}^{(1)} = \beta_2^{(1)}$

In this case, $\phi_{\mathrm{res}} = 0$ and

\[ u(t, 2f_1) = \frac{\sqrt{i\pi/C_1}}{2\pi \beta_2^{(1)} f_1}\, \tilde{U}\left(\frac{t}{\beta_2^{(1)} f_1}, 0\right). \tag{14} \]
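Choosing $\beta_{21}^{(2)} = \beta_{22}^{(2)} = \beta_2^{(2)}$ and $f_2 = 1\big/\left(2\beta_{22}^{(2)} C_2\right)$ in the 2-f subsystem T2 and noting that

\[ F\left\{\tilde{U}\left(t/\beta_2^{(1)} f_1, 0\right)\right\} = 2\pi \beta_2^{(1)} f_1\, u\left(-\beta_2^{(1)} f_1 \omega, 0\right), \tag{15} \]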
we finally obtain (see Appendix B)

\[ u(t, 2f_1 + 2f_2) = u\left(-\frac{\beta_2^{(1)} f_1}{\beta_2^{(2)} f_2}\, t, 0\right), \tag{16} \]
where $2f_1 + 2f_2$ is the total length of the time-lens system. For the 4-f configuration proposed by Lohmann and Mendlovic (1992), the magnification factor is defined as

\[ M = -\frac{C_2}{C_1} = -\frac{\beta_2^{(1)} f_1}{\beta_2^{(2)} f_2}. \tag{17} \]

If $\mathrm{sgn}\left(\beta_2^{(1)}\right) = \mathrm{sgn}\left(\beta_2^{(2)}\right)$, from Eq. (16) it follows that $M$ is negative. Defining a positive stretching factor $m = |M|$, Eqs. (17) and (16) can be rewritten as

\[ \beta_2^{(1)} f_1 = m \beta_2^{(2)} f_2 \quad \text{and} \quad C_2 = m C_1 \tag{18} \]

and

\[ u(t, 2f_1 + 2f_2) = u(-mt, 0), \tag{19} \]
which shows that T2 provides the scaled FT of its input and then leads to an inverted image of the signal input of the 4-f configuration (Lohmann & Mendlovic, 1992). If $M = -1$, T2 provides the exact FT and this leads to the reversal of the bit sequence within a frame, which requires additional signal processing in the optical/electrical domain to recover the original bit sequence.

In Yang et al. (2008), the 4-f system of Lohmann and Mendlovic (1992) is reconfigured. Suppose $\mathrm{sgn}\left(\beta_2^{(1)}\right) = -\mathrm{sgn}\left(\beta_2^{(2)}\right)$; Eqs. (17) and (16) can then be rewritten as

\[ \beta_2^{(1)} f_1 = -m \beta_2^{(2)} f_2 \quad \text{and} \quad C_2 = -m C_1 \tag{20} \]

and

\[ u(t, 2f_1 + 2f_2) = u(mt, 0), \tag{21} \]
which shows that the 2-f subsystem T2 provides the scaled IFT, so that the signal at the end of the 4-f system is not time-reversed. If $M = 1$, the output signal is identical to the input signal. We note that in spatial optical signal processing, it is not possible to obtain both direct and inverse Fourier transformation since the sign of diffraction cannot be changed (Goodman, 1996, chap. 4). Table 1 lists the signs of fiber dispersions and chirp coefficients of subsystems T1 and T2 to produce signals with and without time reversal.

TABLE 1 Comparison of fiber dispersions and phase coefficients for different time-lens systems

| Line | Case | PM1: $C_1$ | PM2: $C_2$ | T1: $\beta_{21}^{(1)}$ | T1: $\beta_{22}^{(1)}$ | T2: $\beta_{21}^{(2)}$ | T2: $\beta_{22}^{(2)}$ | Function |
|------|--------|---|---|---|---|---|---|------------------|
| 1 | Case 1 | + | + | + | + | + | + | Time reversal |
| 2 | Case 1 | − | − | − | − | − | − | Time reversal |
| 3 | Case 1 | + | − | + | + | − | − | No time reversal |
| 4 | Case 1 | − | + | − | − | + | + | No time reversal |
| 5 | Case 2 | + | − | − | + | − | + | No time reversal |
| 6 | Case 2 | − | + | + | − | + | − | No time reversal |

Based on Yang et al. (2008).

Case 2: $\beta_{22}^{(1)} = -\beta_{21}^{(1)}$

In this case,

$$\phi_{\mathrm{res}} = \exp\!\left(-i\,\frac{t^2}{\beta_{22}^{(1)} f_1}\right) \tag{22}$$
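and

$$u(t, 2f_1) = \frac{\sqrt{i\pi/C_1}}{2\pi \beta_{22}^{(1)} f_1}\, \tilde{U}\!\left(\frac{t}{\beta_{22}^{(1)} f_1}, 0\right) \exp\!\left(-i\,\frac{t^2}{\beta_{22}^{(1)} f_1}\right). \tag{23}$$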
If we configure the 2-f subsystem T2 with reversed parameters,

$$\beta_{21}^{(2)} f_2 = -\beta_{22}^{(1)} f_1, \qquad C_2 = -C_1, \qquad \beta_{22}^{(2)} f_2 = -\beta_{21}^{(1)} f_1, \tag{24}$$

then we find that the input signal of T1 can be exactly recovered at the end of T2 (see Appendix C),

$$u(t, 2f_1 + 2f_2) = u(t). \tag{25}$$
Lines 5 and 6 in Table 1 present two possible configurations of Case 2. The result of Eq. (25) has a simple physical explanation. When $\beta_{21}^{(2)} f_2 = -\beta_{22}^{(1)} f_1$, the accumulated dispersion of the second fiber of T1 is compensated by that of the first fiber of T2, leading to a unity transfer function for this fiber pair. Since the chirp coefficients $C_1$ and $C_2$ are of opposite sign, they then cancel each other too, making the transfer function from PM1 to PM2 (see Figure 1) unity. Finally, the accumulated dispersion of the first fiber of T1 is compensated by that of the second fiber of T2 when $\beta_{22}^{(2)} f_2 = -\beta_{21}^{(1)} f_1$. Thus, the total transfer function of the 4-f system is unity.

By inserting a temporal filter between the two 2-f subsystems (see Figure 1), we can implement various kinds of optical signal processing in the time domain. The important advantage of the time-lens-based temporal filtering technique is that the transfer function of the temporal filter can be dynamically altered by changing the input voltage to the amplitude/phase modulator; this technique could therefore have applications for switching and multiplexing in optical networks. Next, we provide two examples of optical signal processing based on this technique to show its potential advantages. One is a tunable wavelength division demultiplexer and the other is a higher-order fiber dispersion compensator.
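The pairwise cancellation described above can be checked numerically. The following is a minimal sketch, not the authors' simulation: it assumes that a fiber section multiplies the spectrum by exp(i β2 L ω²/2) and that a phase modulator multiplies the field by exp(i C t²). The function names, the grid, the bit pattern, and the parameter magnitudes (123 ps², C1 = 1/246 ps⁻²) are illustrative assumptions; only the sign pattern (line 5 of Table 1) and the relations of Eq. (24) come from the text. Because each element is undone by its mirror-image partner, the result does not depend on the sign convention chosen for the dispersion operator.

```python
import numpy as np

def disperse(u, omega, beta2_len):
    """Fiber section: multiply the spectrum by exp(i * (beta2*L) * omega^2 / 2)."""
    return np.fft.ifft(np.fft.fft(u) * np.exp(0.5j * beta2_len * omega**2))

def time_lens(u, t, chirp):
    """Phase modulator: multiply the field by the quadratic phase exp(i * C * t^2)."""
    return u * np.exp(1j * chirp * t**2)

# Time grid in ps and the matching angular-frequency grid in rad/ps.
n = 4096
t = np.linspace(-400.0, 400.0, n, endpoint=False)
omega = 2.0 * np.pi * np.fft.fftfreq(n, d=t[1] - t[0])

# Illustrative input (assumption): bit pattern 1011 in 25 ps slots,
# Gaussian pulses with a 12.5 ps power FWHM.
t0 = 12.5 / (2.0 * np.sqrt(np.log(2.0)))
bits = [1, 0, 1, 1]
u_in = sum(b * np.exp(-(t - 25.0 * (k - 1.5))**2 / (2.0 * t0**2))
           for k, b in enumerate(bits)).astype(complex)

# Case 2, sign pattern of line 5 in Table 1; magnitudes are assumed values.
b21_1, b22_1 = -123.0, 123.0             # beta21^(1) f1, beta22^(1) f1 in ps^2
c1 = 1.0 / (2.0 * 123.0)                 # C1 in ps^-2
b21_2, c2, b22_2 = -b22_1, -c1, -b21_1   # reversed parameters, Eq. (24)

# T1: fiber, PM1, fiber.
u_mid = disperse(time_lens(disperse(u_in, omega, b21_1), t, c1), omega, b22_1)
# T2: fiber, PM2, fiber.
u_out = disperse(time_lens(disperse(u_mid, omega, b21_2), t, c2), omega, b22_2)

recovery_error = np.max(np.abs(u_out - u_in))
print(f"max |u_out - u_in| = {recovery_error:.2e}")
```

Under these assumptions the printed error should sit at the level of floating-point round-off, which is the discrete counterpart of Eq. (25).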
3. WAVELENGTH DIVISION DEMULTIPLEXER

For fiber-optic networks, tunable optical filters are desirable so that the center wavelengths of the channels to be added or dropped at a node can be dynamically changed. Tunable optical filters are typically implemented using directional couplers or Bragg gratings. Yang et al. (2008) discuss a temporal filtering technique for the implementation of a tunable optical filter. As an example, let us consider the case of a 2-channel WDM system at 40 Gb/s per channel with a channel separation of 200 GHz. Let the input to the 4-f system be the superposition of two channels as shown in Figure 2b. Here we ignore the impairments caused by the fiber-optic transmission and assume that the input to the 4-f system is the same as the transmitted multiplexed signal. In this example, we have simulated a random bit pattern consisting of 16 bits in each channel. The bit "1" is represented by a Gaussian pulse of width 12.5 ps. We assume that the bit rate is 40 Gb/s and therefore the signal bandwidth for each channel is approximately 80 GHz (return-to-zero (RZ) signal with a duty cycle of 0.5). Thus, the channel separation of 200 GHz is wide enough to avoid channel interference.

FIGURE 2 A WDM demultiplexer based on a 4-f time-lens system (M = 1). (a) Input signals from channel 1 and channel 2. (b) Multiplexed output signal. (c) Combined signals before and after the temporal filter. (d) Demultiplexed signal in channel 1. The signal panels plot power (mW) against time (ps). (Based on Yang et al. (2008).)

Assume that channel 1 is centered at the optical carrier frequency $\omega_0$ and channel 2 is centered at $\omega_0 + 2\Delta\omega$, where $2\Delta\omega$ is the channel separation. A band-pass filter that is also centered at $\omega_0$, with a double-sided bandwidth of $2\Delta\omega$, transmits channel 1 while blocking channel 2. In this case, the temporal band-pass filter is realized using an amplitude modulator, for instance an electroabsorption modulator (EAM), with a time-domain transfer function

$$H(t) = \begin{cases} 1, & \left|t/\beta_2^{(1)} f_1\right| \le \Delta\omega \\ 0, & \text{otherwise,} \end{cases} \tag{26}$$
where $\beta_{21}^{(1)} = \beta_{22}^{(1)} = \beta_2^{(1)}$ is the dispersion coefficient of the SMF in the 2-f subsystem T1 and $\Delta\omega/2\pi = 100$ GHz. In this section and in Section 4, we assume that $\beta_{21}^{(j)} = \beta_{22}^{(j)} = \beta_2^{(j)}$, $j = 1, 2$, and $M = 1$ unless otherwise specified. Considering the realistic implementation of the high-speed amplitude modulator, we need to choose the parameters of the 4-f system carefully. State-of-the-art EAMs operate up to a bit rate of 40 Gb/s (Choi et al., 2002; Fukano, Yamanaka, Tamura & Kondo, 2006), which implies that they can be turned on and off with a temporal separation of 25 ps. From Eq. (26), we see that the amplitude modulator should be turned on for a duration

$$\Delta T = 2\left|\beta_2^{(1)} f_1\right| \Delta\omega \tag{27}$$

and then turned off. Setting $\Delta T \ge 25$ ps, we find that

$$\left|\beta_2^{(1)} f_1\right| \ge 20\ \mathrm{ps}^2. \tag{28}$$
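The arithmetic behind Eqs. (27) and (28) can be reproduced in a few lines. This is only a sketch: the function and variable names are hypothetical, and the second calculation uses the DCM values quoted in the next paragraph (123 ps²/km over 1 km). Angular frequency is expressed in rad/ps, so that 100 GHz corresponds to 2π × 0.1 rad/ps.

```python
import math

def gate_duration_ps(beta2_f1_ps2, delta_omega_rad_per_ps):
    """Eq. (27): on-time of the amplitude modulator, 2 * |beta2 * f1| * delta_omega."""
    return 2.0 * abs(beta2_f1_ps2) * delta_omega_rad_per_ps

delta_omega = 2.0 * math.pi * 0.1   # delta_omega / 2pi = 100 GHz, in rad/ps
min_gate_ps = 25.0                  # shortest on/off interval of a 40 Gb/s EAM

# Eq. (28): smallest accumulated dispersion that keeps the gate at or above 25 ps.
min_beta2_f1 = min_gate_ps / (2.0 * delta_omega)

# Gate duration for the DCM quoted in the text: 123 ps^2/km over 1 km.
gate_dcm = gate_duration_ps(123.0 * 1.0, delta_omega)

print(f"minimum |beta2 f1| = {min_beta2_f1:.1f} ps^2")
print(f"gate duration with the DCM = {gate_dcm:.0f} ps")
```

The bound evaluates to about 19.9 ps², which the text rounds to 20 ps², and the DCM gives a gate of about 155 ps, comfortably above the 25 ps limit.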
Equation (28) can be satisfied using a dispersion compensation module (DCM) with dispersion coefficient $\beta_2^{(1)} = 123$ ps²/km and length $f_1 = 1$ km. From Eq. (27), we find $\Delta T = 155$ ps. To implement the IFT in T2, we use a standard SMF with dispersion coefficient $\beta_2^{(2)} = -21$ ps²/km and length $f_2 = 5.86$ km, which leads to $\beta_2^{(2)} f_2 = -\beta_2^{(1)} f_1$ and $M = 1$. Figure 2 shows a wavelength division demultiplexer based on a 4-f time-lens system. After the 2-f subsystem T1, we obtain the FT of the multiplexed signal in the time domain, given by

$$q(t, 2f_1^-) = \left[\tilde{U}_1(\omega) + \tilde{U}_2(\omega)\right]_{\omega = t/\beta_2^{(1)} f_1}, \tag{29}$$

where $\tilde{U}_1$ and $\tilde{U}_2$ are the spectra of the signals for channel 1 and channel 2, respectively. Then it passes through the temporal filter defined by Eq. (26), and the output is given by

$$q(t, 2f_1^+) = H(t)\left[\tilde{U}_1(\omega) + \tilde{U}_2(\omega)\right]_{\omega = t/\beta_2^{(1)} f_1} = \tilde{U}_1(\omega)\Big|_{\omega = t/\beta_2^{(1)} f_1}, \tag{30}$$
and as shown in Figure 2c, the signal from channel 2 is blocked. Thus, at the input end of the 2-f subsystem T2, only the signal from channel 1 is retained. Finally, we obtain the demultiplexed signal for channel 1 at the output of the 2-f subsystem T2 as shown in Figure 2d. According to Eq. (21), it is $q(t, 2f_1 + 2f_2) = u_1(t)$ for $m = 1$, which is identical to the original input of channel 1. As can be seen, the data in channel 1 can be successfully demultiplexed.

FIGURE 3 Input and output bit sequences of the WDM demultiplexer based on a 4-f time-lens system. (a) Input. (b) Output with time reversal, M = −1. (c) Output without time reversal, M = +1. Guard time $t_g = 0$ and $t_f = 400$ ps. All panels plot power (mW) against time (ps). (Based on Yang et al. (2008).)

For a practical implementation, the quadratic phase factor of Eq. (5) cannot increase indefinitely with time and therefore, a periodic time lens is introduced in Kumar (2007). It is given by

$$h_j(t) = \sum_{n=-\infty}^{+\infty} h_{0j}\left(t - n t_f\right), \tag{31}$$
where h 0 j (t) =
( exp iC j t 2 , otherwise
|t|