ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 144
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
HONORARY ASSOCIATE EDITORS
TOM MULVEY BENJAMIN KAZAN
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald’s Road, London WC1X 8RR, UK
This book is printed on acid-free paper.
Copyright © 2006, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2005 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2006 $35.00 Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.” For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com ISBN-13: 978-0-12-014786-1 ISBN-10: 0-12-014786-6 PRINTED IN THE UNITED STATES OF AMERICA
CONTENTS

CONTRIBUTORS
PREFACE
FUTURE CONTRIBUTIONS

Recent Progress in High Frequency Electron Cyclotron Resonance Ion Sources
DENIS HITZ
    Introduction
    I. Fundamental Aspects of ECRIS
        A. Production of Multiply Charged Ions
        B. Electron Confinement and Heating
        C. Ion Temperature and Confinement
        D. Plasma Effects
    II. Wave Coupling in an ECRIS
        A. rf Waves
        B. Examples of Utilized Couplings in ECRIS
        C. Conclusion
    III. VUV Diagnostics of ECRIS Plasmas
        A. Experimental Set-up and Data Processing
        B. Experimental Results
        C. Discussion
        D. Conclusion
    IV. Magnetic Confinement
        A. Scaling Laws
        B. Examples of Magnetic Systems
        C. Conclusion
    V. Design of Various Electron Cyclotron Resonance Ion Sources
        A. Main Parameters of a Well-Performing ECRIS
        B. All Permanent Magnet ECRIS
        C. Room Temperature ECRIS
        D. Compact Superconducting ECRIS
        E. Fully Superconducting ECRIS
        F. Discussion
        G. Conclusion
    VI. Industrial Applications
        A. Implantation
        B. Photon Lithography
        C. Conclusion
    VII. Conclusion
    References

Fixed Points of Lattice Transforms and Lattice Associative Memories
GERHARD RITTER AND PAUL GADER
    I. Introduction
    II. Pertinent Basic Properties of Lattices
    III. Matrices and Lattice-Ordered Groups
    IV. Lattice Dependence and Independence
    V. Lattice Associative Memories
    VI. Lattice Dependence and Fixed Points
    VII. Convex Sets and Polytopes in Rn
    VIII. Linear Subspaces and Orientation in Rn
    IX. The Shape of F(X)
    X. Remarks Concerning the Dimensionality of F(X)
    XI. Strong Lattice Independence
    XII. Pattern Reconstruction from Noisy Inputs
    XIII. Kernel Vectors
    XIV. Associative Memories Based on Dendritic Computing
    XV. Conclusion
    References

An Extension of Mathematical Morphology to Complex Signals
JEAN-F. RIVEST
    I. Introduction
    II. Order Relationship and Complementation
        A. Properties of the Order Relationship
        B. Order Relationship
        C. Complementation
        D. Umbra
        E. Maximum (∨) and Minimum (∧)
    III. Dilations and Erosions
    IV. Openings, Closings, and Morphological Filters
    V. Geodesy
        A. Structuring Element
        B. Dilations and Erosions
        C. Reconstructions
        D. Openings and Closings by Reconstruction
        E. Regional Maxima and Minima
        F. Domes and Lakes
    VI. Top Hats
    VII. Morphological Gradients
    VIII. Complex Watershed
    IX. Measurements
        A. Basic Measurements and Minkowski Functionals
        B. Granulometries and Pattern Spectra
        C. Power Granulometry
        D. Examples
    X. Conclusion
    References

Ranking Metrics and Evaluation Measures
JIE YU, JAUME AMORES, NICU SEBE, AND QI TIAN
    I. Introduction
        A. Similarity Estimation in Computer Vision
        B. Distance Metric as Similarity Measurement
    II. Distance Metric Analysis
        A. Maximum Likelihood Approach
        B. Distance Metric Analysis
        C. Generalized Distance Metric Analysis
    III. Boosting Distance Metrics for Similarity Estimation
        A. Motivation
        B. Boosted Distance Metrics
        C. Related Work
    IV. Experiments and Analysis
        A. Distance Metric Analysis in Stereo Matching
        B. Distance Metric Analysis in Motion Tracking
        C. Boosted Distance Metric on Benchmark Data Set
        D. Boosted Distance Metric in Image Retrieval
    V. Discussion and Conclusions
    References

INDEX
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
JAUME AMORES (291), IMESIA Research Group, INRIA, Rocquencourt, France
PAUL GADER (165), University of Florida, Gainesville, Florida 32611, USA
DENIS HITZ (1), Commissariat à l’Energie Atomique, Direction des Sciences de la Matière, Département de Recherche Fondamentale sur la Matière Condensée, Service des Basses Températures, CEA-Grenoble, F38054 Grenoble cedex 9, France
GERHARD RITTER (165), University of Florida, Gainesville, Florida 32611, USA
JEAN-F. RIVEST (243), Defense R&D Canada, Ottawa K1A 0Z4, Canada
NICU SEBE (291), Faculty of Science, The University of Amsterdam, Amsterdam, The Netherlands
QI TIAN (291), Department of Computer Science, The University of Texas at San Antonio, San Antonio, Texas 78249, USA
JIE YU (291), Department of Computer Science, The University of Texas at San Antonio, San Antonio, Texas 78249, USA
PREFACE
The four chapters that make up this volume range over ion sources, neural networks, mathematical morphology, and image retrieval. We begin with a masterly account of electron cyclotron resonance ion sources by D. Hitz, with which many heavy-ion facilities are equipped. But although the first sources of this kind came into operation some decades ago, there are aspects of their physics that remain imperfectly understood. This presentation, which has the status of a monograph on the subject, is intended at once to help designers of these sources to find good configurations, and also to advance theoretical knowledge about their performance. Next comes a chapter by G.X. Ritter (who has already contributed a most instructive account of image algebra to these pages) and P. Gader on lattice associative memories. This contains extensive discussion of artificial neural networks using lattice-based matrix operations. Readers from the world of mathematical morphology will find much to interest them here, as will students of neural networks. The meticulous and also very readable account of the mathematical background will be much appreciated. In the third contribution, J.-F. Rivest explores a very new aspect of mathematical morphology. From the way that this subject is usually presented, it seems to be applicable only to real signals, but is this necessarily true? J.-F. Rivest shows that mathematical morphology can be extended to complex signals, at least to those that are best analysed in terms of their (complex) envelope. The first problem is, of course, that there is no intuitive way of ordering complex quantities and J.-F. Rivest therefore begins by creating such an ordering. This enables him to extend the familiar morphological operators to the complex domain and in subsequent sections he examines all the usual themes of mathematical morphology: geodesy, morphological gradient, watersheds, and granulometry among others. 
The volume ends with a discussion of ranking metrics and evaluation measures by J. Yu, J. Amores, N. Sebe, and Q. Tian. The notion of similarity is essential for image retrieval and it is the various measures of similarity that are the subject of this contribution. As always, I thank all the contributors for the trouble they have taken to make their material accessible to a wide readership. Forthcoming contributions are listed in the following pages.

Peter W. Hawkes
FUTURE CONTRIBUTIONS
G. Abbate
    New developments in liquid–crystal-based photonic devices
S. Ando
    Gradient operators and edge and corner detection
A. Asif
    Applications of noncausal Gauss–Markov random processes in multidimensional image processing
C. Beeli
    Structure and microscopy of quasicrystals
V.T. Binh and V. Semet
    Cold cathodes
G. Borgefors
    Distance transforms
A. Buchau
    Boundary element or integral equation methods for static and time-dependent problems
B. Buchberger
    Gröbner bases
T. Cremer
    Neutron microscopy
H. Delingette
    Surface reconstruction based on simplex meshes
A.R. Faruqi
    Direct detection devices for electron microscopy
R.G. Forbes
    Liquid metal ion sources
C. Fredembach
    Eigenregions for image classification
S. Fürhapter
    Spiral phase contrast imaging
L. Godo and V. Torra
    Aggregation operators
A. Gölzhäuser
    Recent advances in electron holography with point sources
M.I. Herrera
    The development of electron microscopy in Spain
K. Ishizuka
    Contrast transfer and crystal images
J. Isenberg
    Imaging IR-techniques for the characterization of solar cells
K. Jensen
    Field-emission source mechanisms
L. Kipp
    Photon sieves
G. Kögel
    Positron microscopy
T. Kohashi
    Spin-polarized scanning electron microscopy
W. Krakow
    Sideband imaging
R. Leitgeb
    Fourier domain and time domain optical coherence tomography
B. Lencová
    Modern developments in electron optical calculations
H. Lichte (vol. 150)
    New developments in electron holography
Z. Liu
    Exploring third-order chromatic aberrations of electron lenses with computer algebra
W. Lodwick
    Interval analysis and fuzzy possibility theory
L. Macaire, N. Vandenbroucke and J.-G. Postaire
    Color spaces and segmentation
M. Matsuya
    Calculation of aberration coefficients using Lie algebra
S. McVitie
    Microscopy of magnetic specimens
S. Morfu and P. Marquié
    Nonlinear systems for image processing
M.A. O’Keefe
    Electron image simulation
D. Oulton and H. Owens
    Colorimetric imaging
N. Papamarkos and A. Kesidis
    The inverse Hough transform
R.F.W. Pease (vol. 150)
    Miniaturization
K.S. Pedersen, A. Lee and M. Nielsen
    The scale-space properties of natural images
I. Perfilieva
    Fuzzy transforms
E. Rau
    Energy analysers for electron microscopes
H. Rauch
    The wave-particle dualism
E. Recami
    Superluminal solutions to wave equations
P.E. Russell and C. Parish
    Cathodoluminescence in the scanning electron microscope
G. Schmahl
    X-ray microscopy
J. Serra (vol. 150)
    New aspects of mathematical morphology
R. Shimizu, T. Ikuta and Y. Takai
    Defocus image modulation processing in real time
S. Shirai
    CRT gun design methods
H. Snoussi
    Geometry of prior selection
T. Soma
    Focus-deflection systems and their applications
I. Talmon
    Study of complex fluids by transmission electron microscopy
G. Teschke and I. Daubechies
    Image restoration and wavelets
M.E. Testorf and M. Fiddy
    Imaging from scattered electromagnetic fields, investigations into an unsolved problem
M. Tonouchi
    Terahertz radiation imaging
N.M. Towghi
    Ip norm optimal filters
D. Tschumperlé and R. Deriche
    Multivalued diffusion PDEs for image regularization
E. Twerdowski
    Defocused acoustic transmission microscopy
Y. Uchikawa
    Electron gun optics
C. Vachier-Mammar and F. Meyer
    Watersheds
K. Vaeth and G. Rajeswaran
    Organic light-emitting arrays
M. van Droogenbroeck and M. Buckley
    Anchors in mathematical morphology
M. Wild and C. Rohwer
    Mathematics of vision
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 144
Recent Progress in High Frequency Electron Cyclotron Resonance Ion Sources DENIS HITZ Commissariat à l’Energie Atomique, Direction des Sciences de la Matière, Département de Recherche Fondamentale sur la Matière Condensée, Service des Basses Températures, CEA-Grenoble, F38054 Grenoble cedex 9, France
Introduction
I. Fundamental Aspects of ECRIS
    A. Production of Multiply Charged Ions
    B. Electron Confinement and Heating
        1. Magnetic Configuration
        2. Electron Confinement
        3. Electron Heating
    C. Ion Temperature and Confinement
        1. Ion Temperature
        2. Ion Confinement
    D. Plasma Effects
II. Wave Coupling in an ECRIS
    A. rf Waves
        1. Microscopic Description
        2. Macroscopic Description: Eigenmodes
    B. Examples of Utilized Couplings in ECRIS
        1. Coaxial Coupling
        2. Rectangular Coupling
        3. Utilization of the Double Frequency
        4. Application to Pulsed Regime
        5. Frequencies Above 20 GHz
        6. Multifrequency Transmission Line
    C. Conclusion
III. VUV Diagnostics of ECRIS Plasmas
    A. Experimental Set-Up and Data Processing
        1. Experimental Set-Up
        2. Data Processing
    B. Experimental Results
        1. Quadrumafios Source
        2. Caprice Source
    C. Discussion
    D. Conclusion
IV. Magnetic Confinement
    A. Scaling Laws
        1. Axial Magnetic Confinement: Injection Side
        2. Axial Magnetic Confinement: Extraction Side
        3. Axial Magnetic Confinement: Minimum-B
        4. Radial Magnetic Confinement
        5. Magnetic Scaling Laws
    B. Examples of Magnetic Systems
        1. Axial Magnetic Field
        2. Radial Magnetic Field
    C. Conclusion
V. Design of Various Electron Cyclotron Resonance Ion Sources
    A. Main Parameters of a Well-Performing ECRIS
    B. All Permanent Magnet ECRIS
        1. Microwave Coupling
        2. Source Design
    C. Room Temperature ECRIS
        1. Example of Source Design
        2. Usual Performances of Room Temperature ECRIS
    D. Compact Superconducting ECRIS
        1. Mirror Field
        2. Hexapolar Field
        3. Total Magnetic Field
        4. Cryogenic Aspect
        5. Mechanical Design
        6. Existing Hybrid ECRIS: Some Results
    E. Fully Superconducting ECRIS
    F. Discussion
        1. Microwave Power
        2. Pulsed ECRIS
        3. Charge State Distribution
        4. Ion Beam Shape
        5. Plasma Electrode Position
    G. Conclusion
VI. Industrial Applications
    A. Implantation
    B. Photon Lithography
        1. Orders of Magnitude
        2. Experimental Results
        3. Powerful ECR Light Source
        4. Other Applications of EUV Sources
        5. Conclusion
    C. Conclusion
VII. Conclusion
References

ISSN 1076-5670/05 DOI: 10.1016/S1076-5670(06)44001-5
Copyright 2006, Elsevier Inc. All rights reserved.
INTRODUCTION

For about 30 years, electron cyclotron resonance ion sources (ECRISs) have equipped a large number of heavy-ion facilities for atomic or nuclear physics and for other applications such as cancer treatment. Owing to their ion production efficiency, their long-term stability, and their long lifetimes, transient applications can be achieved with this type of ion source. For example, “Gesellschaft für Schwerionen Forschung” (GSI) in Darmstadt, Germany, uses a 14.5 GHz ECRIS (Hitz et al., 1996) for a large variety of applications, such as hadrontherapy with carbon ions or the discovery of new superheavy elements. The isotope $^{269}$110 (darmstadtium) was obtained by fusing $^{208}$Pb with $^{62}$Ni, and $^{271}$110 by fusing $^{208}$Pb with $^{64}$Ni (Hofman et al., 1996; Hofman, 1997). ECRISs are the outgrowth of the so-called open-ended mirror machines dedicated to controlled fusion research (Simonen, 1981). These machines were initially built for their confinement properties, but did not fulfill the Lawson criterion for fusion. As they were, however, well suited for the production of multiply charged ions, smaller open-ended machines were specially designed and optimized for their ion loss rates, i.e., for the ions being extracted. Geller (1965, 1970), Consolino et al. (1969), Stix (1969), and Postma (1970) proposed the use of ECR heating (ECRH) to produce intense multiple ionizations of heavy ions. As soon as these sources produced multiply charged ions, atomic physics data relevant to magnetic confinement fusion were obtained (see, for example, Bliman et al., 1981). Because they are optimized first for their ion losses, ECRISs are always subject to a fundamental compromise: they must have high losses and strong confinement at the same time! Even though the physics of open-ended mirror machines is now well understood, ECRISs are still incompletely understood, despite the many groups working on the subject, such as Ivanov and Wiesemann (2005). In an ECRIS, multiply charged ions are produced in a hot electron plasma confined by a magnetic structure. The physics governing the electron population is similar to that in a tokamak.
However, the ion population, as in an electron beam ion source (EBIS), is not directly controlled; it is controlled only indirectly, through cold plasma physics (the ion energy is only a few eV), atomic physics (ionization, excitation, recombination, etc.), and surface physics (plasma/wall interaction leading to coating, electron emission, and desorption), all of which are involved and make data interpretation more difficult. The complexity of ECR hot electron plasmas, which are far from thermodynamic equilibrium, does not allow precise evaluations or calculations to be performed for an ECRIS.

This article is written to help ECR ion source developers design new ion sources or improve existing machines. In Section I, fundamental aspects of ECRISs are presented, with a discussion of electron temperature and confinement and of ion confinement. Then, as microwaves play a key role in these machines, Section II presents major guidelines for microwave launching and coupling to the ECR plasma. Once an ECR plasma is created, understanding this plasma is important in ion sourcery. Section III is dedicated to plasma diagnostics, with an emphasis on the determination of electron and ion density and temperature by vacuum ultraviolet (VUV) spectroscopy. Section IV deals with the role of magnetic confinement and presents updated scaling laws. Section V presents different types of ECRISs designed according to the main parameters previously described. Even though ECRISs are mostly used for the ions they produce, this chapter gives only a little information about extraction aspects and techniques to improve this ion source. For more details, the reader can refer to the complete descriptions in ion source books (Geller, 1996; Brown, 2004). Finally, Section VI presents some industrial applications of ECRISs and of ECR plasmas in general.
I. FUNDAMENTAL ASPECTS OF ECRIS

ECRISs are small mirror machines in which hot electron plasma is created by injecting a radiofrequency wave: electrons are heated when crossing a |B|-surface where the wave frequency is equal to the electron Larmor frequency. The magnetic configuration of these ion sources is realized by the superposition of an axial and a radial magnetic field. Gas or vaporized metal atoms are injected into the source and are ionized step by step by colliding with hot electrons. Ions are then extracted from the source by applying a high voltage between the plasma chamber and an extraction nozzle maintained at the ground potential. The extracted ion current to be optimized corresponds to the losses of the magnetic trap and can be written as:

$$I_q \approx \frac{n_q \, q \, e \, V}{\tau_q}, \tag{1}$$
where $n_q$ is the density of ions of charge state $q$ (for simplicity, only one atomic species is considered), $\tau_q$ is the confinement time of these ions, $e$ is the elementary charge, and $V$ is the main plasma volume. Since their discovery, ECRISs have shown spectacular improvements in terms of extracted ion currents and ionized charge states, owing to a better understanding of the relevant plasma physics and to several technological developments. For example, xenon ions up to Xe$^{13+}$ were observed in 1973 (Apard et al., 1973); the 5 µA level shifted from Xe$^{17+}$ with TRIPLEMAFIOS (Bliman et al., 1978) up to Xe$^{35+}$ with GTS (Grenoble Test Source); the O$^{6+}$ intensity rose from 15 µA with SUPERMAFIOS up to 2 mA with GTS, which has the same coil technology (Hitz et al., 2004a), and even more than 2 mA with superconducting ECRISs (Leitner and Lyneis, 2005; Sun, 2006). Major parameters involved in this type of ion source are presented in the following sections. After a brief description of the production of multiply charged ions, a section is dedicated to electron confinement and heating, while a final section deals with ion physics.

A. Production of Multiply Charged Ions

To produce multiply charged ions inside a plasma, several criteria must be fulfilled at the same time. First, as ionization is done by electron impact, the electrons in the plasma must have enough energy to ionize the atoms up to the desired charge state. Electron impact ionization cross sections can be calculated by semiempirical formulas such as those proposed by Lotz (1968) and Phaneuf et al. (1987). Second, the cross section for multiple ionization in a single collision is very low; therefore, in an ECRIS, an ion A$^{q+}$ is obtained after successive ionizations of species A, i.e., via A$^{+}$, A$^{2+}$, ..., A$^{(q-1)+}$. This shows the importance of keeping these ions inside the plasma during the stripping phase. Finally, the confinement must not be perfect, as the ions extracted from the ion source are those that leave the confinement trap. The quantity of ions A$^{q+}$ is then determined by the balance between production, essentially defined by electron impact ionization, and losses, which arise mainly from ionization of A$^{q+}$ into A$^{(q+1)+}$ or from charge exchange with neutrals. To second order, losses can also arise from radiative or dielectronic recombination. Two types of ion sources use the electron impact ionization process: the EBIS, invented by Donets et al. (1969), and the ECRIS, invented by Geller et al. (Consolino et al., 1969). General information about EBIS, ECRIS, and other ion sources can be found in the book edited by Brown (2004), while the general history of ECRIS is described by Geller in several articles, such as Geller (1976) or Geller (1998). In the following, the rules governing ECRIS are presented. An ECRIS is basically a machine with a well-magnetized plasma chamber into which microwaves are injected.
The microwave frequency $\omega_{rf}$ is equal to the electron Larmor frequency on a closed magnetic surface inside the plasma chamber:

$$\omega_{ce} = \frac{e B_{ecr}}{m_e} = \omega_{rf}, \tag{2}$$

where $e$ is the electron charge, $m_e$ is the electron mass, and $B_{ecr}$ is the magnetic field value on the closed surface. When crossing this surface, those electrons that are in phase with the microwave electric field are accelerated and attain enough energy to ionize the atoms. Moreover, electron confinement is provided by a magnetic structure able to ensure a confinement time long enough to produce high charge states. The element to be ionized is injected either from a gas bottle or from a system able to produce a metallic vapor inside the plasma chamber (oven, laser, sputtering, etc.). Ions are produced inside the plasma and diffuse toward the extraction area, where an electric field drags them to the accelerator. An ECRIS is the result of many compromises that can be presented as follows. In such a multiply charged ion source, the main ionization process is step by step. To reach a charge state $q$, the ion of charge state $q-1$ has to stay in the trap long enough to have a good probability of being ionized: the ion confinement time $\tau_{q-1}$ has to be larger than the ionization time $\tau^{ionis}_{q-1 \to q}$. The condition $\tau_{q-1} > \tau^{ionis}_{q-1 \to q}$ is easier to fulfill for low charge states since they require short ion confinement times. However, maximizing the ion confinement time to get high charge states lowers the losses of the trap, i.e., the output current $I_q$ we are interested in. In other words, maximizing $I_q$ is the result of a compromise between two extreme situations: (1) $\tau_q$ large, giving the largest density $n_q$ in the trap through the ionization mechanism but, in turn, a small $I_q$; and (2) $\tau_q$ small, so as not to reduce the ion losses $I_q$ but, in turn, giving a small $n_q$ and therefore a small $I_q$ too!

B. Electron Confinement and Heating

1. Magnetic Configuration

An ECRIS is usually defined as a hot electron mirror machine characterized by two magnetic field maxima (mirror throats), one at each end of the ion source (the so-called injection and extraction sides), and a multipolar minimum-B structure. Most of today's ECRISs keep the initial design of R. Geller, which consists of a mirror field created by two sets of coils combined with a radial hexapolar field. The resulting magnetic flux tubes are shown in Figure 1 (when mapping the plasma center to the end of the source), as presented by Melin (1997). This design, by Geller and co-workers, is an outgrowth of open-ended mirror machines initially built for their confinement properties in fusion research.
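As a brief aside on Eq. (1) and the confinement-time compromise discussed at the end of Section I.A, the following minimal Python sketch evaluates the extracted current for a given ion density, charge state, plasma volume, and confinement time. The function name `extracted_current` and all numerical inputs are hypothetical, chosen only to show orders of magnitude in SI units; they are not values taken from this chapter.

```python
# Elementary charge in coulombs (exact SI value).
E_CHARGE = 1.602176634e-19


def extracted_current(n_q, q, volume, tau_q):
    """Evaluate Eq. (1): I_q ~ n_q * q * e * V / tau_q, in amperes.

    n_q    : density of ions of charge state q, in m^-3
    q      : charge state (dimensionless integer)
    volume : main plasma volume V, in m^3
    tau_q  : confinement time of the ions, in s
    """
    if tau_q <= 0:
        raise ValueError("tau_q must be positive")
    return n_q * q * E_CHARGE * volume / tau_q


if __name__ == "__main__":
    # Hypothetical, illustrative inputs (not measured values).
    n_q = 1e16      # m^-3
    q = 6
    volume = 1e-4   # m^3
    for tau_q in (0.5e-3, 1e-3, 2e-3):  # s
        current = extracted_current(n_q, q, volume, tau_q)
        print(f"tau_q = {tau_q:.1e} s -> I_q = {current * 1e3:.3f} mA")
```

With the density held fixed, doubling $\tau_q$ halves $I_q$. In a real source $n_q$ itself rises with the confinement time through step-by-step ionization, which this sketch does not model; that coupling is what makes the optimum a compromise.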
Geller defined semiempirical scaling laws, deduced from experiments and from general plasma understanding, that still serve as basic rules for ECRIS designers (Geller, 1996). This magnetic profile can be achieved with permanent magnets, copper coils, or superconducting coils. However, the radial magnetic field, which is usually hexapolar, is obtained with permanent magnets or superconducting coils. In a minimum-B structure, field lines are open, while iso-|B| lines are closed with an egg shape (Figure 1). As compared with a simple mirror machine, this configuration minimizes radial particle drift, and experiments performed with a pyrometric probe inserted inside the plasma showed that the hot electron plasma volume is roughly limited by this closed surface (Melin et al., 1990; Delaunay et al., 1991).
FIGURE 1. ECRIS magnetic flux tubes given by the sum of an axial mirror field and a radial hexapolar field (Melin, 1997).
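As a quick numerical illustration of the resonance condition, Eq. (2), the following minimal Python sketch converts a microwave frequency into the corresponding resonance field $B_{ecr}$. The function name `resonance_field` and the list of frequencies are illustrative choices, not taken from any ECRIS design code, and the non-relativistic electron mass is assumed.

```python
import math

# Physical constants in SI units
E_CHARGE = 1.602176634e-19      # elementary charge, C
M_ELECTRON = 9.1093837015e-31   # electron rest mass, kg


def resonance_field(f_rf_hz: float) -> float:
    """Return B_ecr in tesla from Eq. (2): omega_rf = e * B_ecr / m_e.

    Assumption: non-relativistic electron mass.
    """
    omega_rf = 2.0 * math.pi * f_rf_hz
    return omega_rf * M_ELECTRON / E_CHARGE


if __name__ == "__main__":
    # Frequencies (GHz) picked only as examples
    for f_ghz in (6.4, 10.0, 14.5, 18.0, 28.0):
        b_ecr = resonance_field(f_ghz * 1e9)
        print(f"{f_ghz:5.1f} GHz -> B_ecr = {b_ecr:.3f} T, "
              f"2 * B_ecr = {2.0 * b_ecr:.3f} T")
```

For 10 GHz this gives a resonance field of roughly 0.36 T, so a radial field at the chamber wall of at least twice the resonance field corresponds to about 0.7 T at that frequency.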
As magnetic confinement is a key parameter for the production of intense beams of any charge, considerable work has been done to understand and optimize it. During the 1980s, B. Jacquot and co-workers presented a new ECRIS (called Caprice) with a strong axial and radial confining field. They showed experimentally that the ECRIS output increased considerably as soon as the radial field reached, at the plasma chamber wall, a value of at least twice the resonance field. They also showed the importance of a high magnetic field at the injection throat. The final version of the 10 GHz Caprice source could deliver intense beams of highly charged ions (Jacquot et al., 1988). This first demonstration of the link between the radial magnetic field value and the resonance magnetic field was somewhat limited, as the radial field was not adjustable (the radial magnetic field of the Caprice ECRIS was produced by permanent magnets, where only the amount and type of magnetic material could be varied). Then, during the 1990s, T. Antaya, S. Gammino, and G. Ciavola performed experiments with the MSU superconducting ECRIS, whose magnetic fields were all adjustable. They also demonstrated the importance of the so-called “High-B” mode, where the magnetic field at injection is much higher than the magnetic field at extraction (Gammino et al., 1996). This latter experiment was performed at just one frequency (6.4 GHz) and had to be confirmed at several frequencies. Later, after the construction of the fully superconducting ion source SERSE by CEA-Grenoble and INFN/LNS-Catania, other measurements were performed at 14 and 18 GHz, showing the importance of the high-B mode concept (Gammino et al., 1999). However, the magnetic gradient at resonance defines the time the electrons stay around the resonance surface. Therefore, several new ECRISs use more
than two coils to define the axial mirror magnetic field. This feature will be shown in another section in which several ECRISs are presented.

2. Electron Confinement

In an ECRIS the electrons spiral around the magnetic field lines. They are axially confined because the magnetic field increases at each end of the field lines, while they are confined in the radial direction due to the conservation of their magnetic moment and total energy:

$$\mu_e = \frac{\tfrac{1}{2} m_e v_\perp^2}{|B|}, \qquad (3)$$

$$E_e = \frac{1}{2} m_e \left( v_\perp^2 + v_\parallel^2 \right), \qquad (4)$$

where $v_\perp$ and $v_\parallel$ are the perpendicular and parallel components of the electron velocity. When the magnetic field increases, $v_\perp$ also increases. As the total energy $E_e$ is constant, $v_\parallel$ decreases. When $E_e = \mu_e B$, the parallel velocity is zero and the electron is reflected. If, at the maximum magnetic field, the electron still has a nonzero parallel velocity, it is going to be lost: it is in the so-called loss cone, defined (in velocity space) by

$$E_e > \mu_e B_{max} \qquad (5)$$

or

$$v_\parallel^2 > v_\perp^2 (R - 1), \qquad (6)$$

where $R$ is defined as the mirror ratio:

$$R \equiv \frac{B_{max}}{B_{min}}. \qquad (7)$$
$B_{max}$ and $B_{min}$ are, respectively, the maximum and minimum magnetic fields. The loss cone equation can be written as

$$\mu > \mu_{LC} \equiv \sqrt{1 - \frac{1}{R}}, \qquad (8)$$

where $\mu \equiv v_\parallel / v$ denotes here the cosine of the electron pitch angle, as follows from Eq. (6) with $v^2 = v_\perp^2 + v_\parallel^2$. As they have greater mobility than ions, the electrons leave the plasma faster than the ions. The plasma therefore settles at a positive potential that limits the electron losses and facilitates the ion losses. This fact has to be taken into account in the total energy conservation law. Let us define the plasma potential $V_p$ as

$$V_p = \frac{m_e v_p^2}{2e}. \qquad (9)$$

Equation (8) then becomes

$$\mu^2 > \mu_{LC}^2 + \frac{v_p^2}{R v^2}. \qquad (10)$$

Low-energy electrons are confined by the plasma potential. Figure 2 shows the region of plasma confinement in $(v, \mu)$ space. In zone (a), i.e., for low-energy electrons, confinement is provided by the plasma potential. In zone (b), electrons are trapped in the magnetic mirror, with, however, some losses because of collisional processes. These two zones are separated by the velocity $v_p$ determined by the plasma potential $V_p$.

FIGURE 2. Region of confined plasma in a diagram (v, μ).

3. Electron Heating

In an ECRIS, electrons are heated by radiofrequency (rf) waves (frequency $\omega$ and wave vector $k$) that are injected in a direction parallel to the magnetic field. A resonance appears when the wave frequency is equal to the Larmor frequency of the electrons. Another section will present waves that can be used for the heating process. Electron cyclotron heating plays an important role in electron confinement. The electron moves back and forth along the magnetic field lines and receives kicks of transverse velocity from the rf electric field. These kicks, which
can be positive or negative, appear when the electron crosses the resonance surface. An example of electron heating is shown by Biri et al. (1995). At a resonance zone between wave and particle, there is a stochastic heating limited at high energy by adiabatic invariants (Jaeger et al., 1972), and the electron heating can be modeled as a diffusion process in velocity space. In an ECRIS, an electron mostly receives perpendicular energy by rf diffusion. It can also be assumed that the mean velocity of the electrons is similar to the wave phase velocity, since electrons that are too energetic go into the loss cone by angular elastic diffusion. For a whistler mode, the dispersion relation is

$$\left( \frac{kc}{\omega} \right)^2 = 1 - \frac{\omega_p^2}{\omega (\omega - \omega_c)}, \qquad (11)$$

where $\omega_p$ is the plasma frequency ($\omega_p^2 = n_e e^2 / \varepsilon_0 m_e$) and $c$ the velocity of light, which gives, when $\omega$ is close to the cyclotron resonance, an estimate of the wave vector

$$\left( \frac{kc}{\omega} \right)^2 \approx \frac{\omega_p^2}{\omega k v_m}. \qquad (12)$$

$v_m$ is the electron mean velocity, which could be replaced by the wave phase velocity as follows:

$$n_e E_e \approx n_c m_e c^2, \qquad (13)$$

where $E_e \approx m_e v_m^2$ is the electron mean energy, $n_e$ is the electron density, and $n_c$ is defined by $n_c = \varepsilon_0 m_e \omega^2 / e^2$. Equation (13) shows the relationship between electron density and electron energy and shows the necessity of operating at the highest frequency possible to get a dense and energetic plasma.

C. Ion Temperature and Confinement

1. Ion Temperature

As shown above, ions are produced by a step-by-step ionization process, and the ions must therefore stay a sufficient amount of time inside the plasma. As the electrons have a mean energy of several keV, they might transfer their energy to the ions. The rate of energy transfer between electrons and ions is given by (Huba, 1994)

$$\nu^{et}_{e \to i} = 3.2 \times 10^{-9} \, \frac{\ln \Lambda_{ei}}{A_i T_e^{3/2}} \sum_q q^2 n_i^q, \qquad (14)$$
where $\nu^{et}_{e \to i}$ is the energy transfer frequency (s$^{-1}$), $\ln \Lambda$ is the Coulomb logarithm, $A_i$ is the mass number of species $i$, and $n_i^q$ is the ion density of species $i$ having the charge $q$. This frequency, which is on the order of 1 Hz, gives an energy transfer time in the range of seconds, while the electron confinement time is in the millisecond range. Therefore, the ions are not heated by electrons and maintain an energy of a few eV. The lower the ion temperature, the better the confinement, as shown below. In addition, to enhance the ion intensity produced by an ECRIS, an additional gas is now commonly used, a technique known as the “gas mixing” method. The effect of gas mixing, extensively described by Drentje (2003), is now thought to be ion cooling, thereby giving better ion confinement.

2. Ion Confinement

The ion–ion collision frequency is given by Huba (1994):

$$\nu_{ij} = 6.8 \times 10^{-8} \, \frac{q^2 \ln \Lambda_{ij}}{A_i T_i^{3/2}} \sum_j \sum_{q'} q'^2 \sqrt{A_j} \, n_j^{q'}, \qquad (15)$$

with an effective charge given by

$$q_{eff} = \frac{1}{n_e} \sum_j \sum_q q^2 n_j^q. \qquad (16)$$

Equation (15) shows that the time between two ion collisions, $\tau_{ij} = 1/\nu_{ij}$, is on the order of $10^{-8}$ s for a typical 14.5 GHz ECRIS. Ions are almost unmagnetized, with their mean free path $\lambda$ being much shorter than their Larmor radius $\rho_i$. For a typical Ar$^{10+}$ ion produced by a 14.5 GHz ECRIS, $\lambda \approx 0.05$ mm and $\rho_i \approx 0.2$ mm. Due to the high ion–ion collision frequency, the energy equipartition terms between ions rapidly become negligible, and ions are Maxwellian with the same temperature $T_i$, regardless of ion charge or species. Ion confinement is governed by diffusion through thermal motion and by the ambipolar electric field that arises because of the high electron mobility. For each mechanism, it is possible to consider an ion confinement time given by (for a charge $q$) (Douysset et al., 2000)

$$\tau_q = 7.1 \times 10^{-20} \, L^2 q^2 \ln \Lambda \, \sqrt{A} \, \frac{n_e q_{eff}}{T_i^{5/2}} \qquad (17a)$$

and

$$\tau_q = 7.1 \times 10^{-20} \, L q \ln \Lambda \, \sqrt{A} \, \frac{n_e q_{eff}}{T_i^{3/2} E}, \qquad (17b)$$
where $L$ is a characteristic length (typically the half-length of the plasma) and $E$ is the electric field. Equation (17a) governs the random thermal motion, while Eq. (17b) reflects the ion diffusion under the electric field with an ion mobility given by

$$\mu_q = \frac{qe}{m \nu_{ij}}. \qquad (18)$$

Spectroscopic measurements showed that the ion confinement time is proportional to the ion charge rather than quadratic in the charge (Douysset et al., 2000), and it can be concluded that the ambipolar electric field mostly governs the ion confinement time.

D. Plasma Effects

In an ECRIS, it has been shown theoretically (Geller, 1996) and experimentally (Melin and Girard, 1997) that the electron distribution function (EDF) is not Maxwellian, although, for the sake of simplicity, it can be divided into three electron populations, each being Maxwellian: cold, warm, and hot electrons. However, the EDF can roughly be described by a single Maxwellian distribution, that of the warm electrons, which carry almost all the electron energy. Tail electrons, or high-energy electrons, exist in ECRIS, as proven by several measurements (Barué et al., 1992, 1994); however, they are too hot to contribute efficiently to the ionization process. As the colder electrons leave the plasma faster than the ions, a positive potential appears that accelerates the ions and retards the electrons, so that the total current at the walls is zero. This plasma potential can be compared with that of a sheath between a wall and a plasma (Chen, 1984). Several experiments have been performed to measure its value and to check its behavior under different source conditions (see, for example, Xie and Lyneis, 1994; Tarvainen et al., 2004). This plasma potential strongly depends upon the ion density charge state distribution (CSD). Indeed, the high charge of the ions partially compensates for their low mobility, so that ECRISs delivering very high charge state ions have a lower plasma potential.
An ion source with low plasma potential also has other advantages: plasma chamber sputtering is minimized and the ion beam has a better emittance. The magnetic confinement also leads to a compromise. Indeed, in an ECRIS, the plasma confinement is limited by the following condition for magnetohydrodynamic (MHD) stability:

$$\beta = \frac{P_{plasma}}{B^2 / 2\mu_0} < 1, \qquad (19)$$
where $P_{plasma}$ is the plasma kinetic pressure, and $B$ is the average value of the magnetic field in the region where the plasma lies. The electrons have a transverse energy with respect to the magnetic field lines. The kinetic pressure is written as

$$P_{plasma} \approx n_e k T_e. \qquad (20)$$
The limitation (19) is caused by a loss of equilibrium that jeopardizes the plasma confinement; it is often called the mirror-mode limit. It basically comes from the perturbations of the magnetic field lines caused by the diamagnetic effect of the plasma, i.e., the depression of the magnetic field caused by the charged particles (both ions and electrons, although here the ions may be neglected because they do not have significant energy). Assuming $\beta$ constant, the achievement of a high electron density $n_e$, and consequently of high ion densities $n_q$ of various charge states $q$ through the charge neutrality equation

$$n_e = \sum_q n_q q, \qquad (21)$$

is subject, through condition (19), to having a high magnetic field, which may also require a high rf frequency to fulfill condition (2). As a first conclusion, the expected ion losses of a magnetic trap, i.e., the extracted ion currents of an ECR ion source, would follow the scaling

$$I_q \approx \frac{n_q q e V}{\tau_q} \propto B^2. \qquad (22)$$

Hence, the scaling

$$I_q \propto \omega_{rf}^2, \qquad (23)$$

which is often quoted as the law of ECR ion source performance, is confusing. It is rather a consequence of the magnetic field dependence through condition (2). According to the scaling (22), it can be argued that the higher the magnetic field, the better the source performance. This is, nevertheless, only a potential gain: the magnetic trap can hold a higher electron density, which in turn can increase the ion densities, and then the extracted currents. But the question is how to increase the electron density $n_e$. A rough power balance of the ECR plasma in steady state can be written as

$$\frac{P_{rf}}{V} \approx \frac{n_e k T_e}{\tau_e}, \qquad (24)$$

where $P_{rf}$ is the absorbed rf power, $V$ the hot plasma volume, and $\tau_e$ the electron energy confinement time. The power spent to ionize the ions is
negligible as compared to that necessary to sustain a given density $n_e$ of electrons at an average energy of about $kT_e$. The electron energy confinement time $\tau_e$ has a dependence similar to the Spitzer collision time, i.e.,

$$\tau_e \approx \alpha \frac{T_e^{3/2}}{n_e}, \qquad (25)$$

where $\alpha$ is roughly constant, depending upon $q_{eff}$ and weakly upon the axial mirror ratio. Experiments show that electron energy essentially depends upon neutral pressure. Thus, to the extent that $T_e$ is about constant, ECR plasma behavior is governed by the law

$$\frac{n_e^2 V}{P_{rf}} \approx \alpha T_e^{1/2} \approx \text{constant}. \qquad (26)$$

Expression (26) is only usable as long as the ECRIS parameters vary between reasonable limits. It could typically be applied when comparing similar sources having close dimensions, frequency, magnetic fields, etc. The hot electron plasma is generally considered to be limited by the so-called resonance surface. To get a thinner ion beam extracted from an ECRIS, one possibility would be to increase the radial magnetic field. However, too large an increase would lead to a plasma volume reduction and would reduce the desired ion current. To compensate for any volume reduction, as well as to obtain the benefit of the increased magnetic field according to Eq. (19), it is necessary to increase the rf power so that the rf power density available in the reduced plasma volume allows the electron density $n_e$ to increase [according to Eq. (26)]. Thus, one may think that a strong radial magnetic field only presents advantages, as it increases the source brightness (a higher density in a smaller volume along the source axis). However, handling a higher rf power makes the source operation unstable (Hitz et al., 2000b). These instabilities are caused by the strong anisotropy of hot electrons in velocity space: the transverse velocity is much larger than the longitudinal one with respect to the magnetic field lines. In an ECRIS, a large number of electrons that can trigger this instability exist; these electrons increase their free energy. This instability more or less always exists in ECRIS and is characterized by low-frequency relaxations (at a few Hz) that can be seen on the extracted currents.
These relaxations themselves are the envelopes of a much faster phenomenon (at a few GHz) called the whistler microinstability, as studied by Garner et al. (1990). To compensate for this problem, an increase of the radial magnetic field must therefore be accompanied by an increase of the rf frequency. In that case, the plasma volume is kept almost constant, and increasing $P_{rf}$ would increase $n_e$ [according to Eq. (26)] up to the limit defined by Eq. (19); this will, however, also trigger the kinetic instability, although at a higher rf power level.
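The two constraints discussed above, the MHD limit of Eqs. (19) and (20) and the density scaling of Eq. (26), can be explored numerically. The sketch below is only illustrative: the function names and the sample plasma parameters (density, temperature, field, power, volume) are assumptions chosen for the example, not measured values.

```python
import math

MU_0 = 4.0e-7 * math.pi        # vacuum permeability, H/m
EV_TO_JOULE = 1.602176634e-19  # 1 eV expressed in joules


def plasma_beta(n_e_m3: float, kT_e_eV: float, b_tesla: float) -> float:
    """beta = n_e k T_e / (B^2 / 2 mu_0), i.e. Eq. (20) inserted in Eq. (19)."""
    kinetic_pressure = n_e_m3 * kT_e_eV * EV_TO_JOULE   # Pa
    magnetic_pressure = b_tesla ** 2 / (2.0 * MU_0)     # Pa
    return kinetic_pressure / magnetic_pressure


def scaled_density(n_e_ref: float, p_rf_ref: float, v_ref: float,
                   p_rf_new: float, v_new: float) -> float:
    """Rescale n_e with Eq. (26), n_e^2 V / P_rf = constant (T_e held fixed)."""
    return n_e_ref * math.sqrt((p_rf_new / p_rf_ref) * (v_ref / v_new))


if __name__ == "__main__":
    # Hypothetical operating point: 1e18 m^-3 (1e12 cm^-3), 10 keV, 0.5 T
    n_e, kT_e, b_avg = 1.0e18, 1.0e4, 0.5
    print(f"beta = {plasma_beta(n_e, kT_e, b_avg):.4f}")

    # Same source volume, rf power raised from 1 kW to 2 kW (hypothetical)
    n_e_new = scaled_density(n_e, 1000.0, 1.0, 2000.0, 1.0)
    print(f"n_e after doubling P_rf: {n_e_new:.3e} m^-3")
    print(f"beta at the new density: {plasma_beta(n_e_new, kT_e, b_avg):.4f}")
```

Because Eq. (26) gives $n_e \propto (P_{rf}/V)^{1/2}$ at fixed $T_e$, doubling the power density raises the density, and hence $\beta$, only by a factor of about 1.4.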
Any ECRIS designer keeps in mind that to get rid of this instability, a larger plasma volume ECRIS with higher magnetic fields would give the best performances. But, at this stage, another parameter also limits the source design: the number of multiply charged ions extracted from an ECRIS is directly related to the source cost. After this short presentation on electron and ion behavior in an ECRIS, the following section deals with one major parameter involved during any ECRIS design, i.e., principal aspects of microwave couplings.
II. WAVE COUPLING IN AN ECRIS

A. rf Waves

Despite the high level of performance now achieved by ECRIS, launching rf power in ECR plasma still remains empirical. Most ECRISs work in a domain where the rf vacuum wavelength ($\lambda_0$) is not very small compared to the plasma dimensions: for example, a 14 GHz room temperature ECRIS typically has a plasma chamber diameter of 70 mm for a vacuum wavelength of about 21 mm. Boundary condition effects dominate, and an ECRIS plasma chamber is usually considered as a multimode cavity with mode overlapping.

1. Microscopic Description

In the following, we consider a single particle approximation, as only electrons are sensitive to high frequency waves. As presented in the previous section, the hot plasma volume is considered as limited by the resonance surface. An electron moving along a magnetic field line in a mirror geometry with resonances on both sides will rapidly become trapped between the resonances, as it undergoes a perpendicular energy increase when crossing them. Injecting microwaves in a magnetic mirror such as an ECRIS leads to stochastic heating (Jaeger et al., 1972). During the interaction, when an electron gains the energy $\delta E = v \, \delta p$, the microwave loses the energy $\hbar\omega$. This can be written as

$$\delta E - \hbar\omega = 0 = v_\perp \delta p_\perp + v_\parallel \delta p_\parallel - \hbar\omega. \qquad (27)$$

When the electron gains an impulse $\delta p_\parallel$ parallel to the magnetic field line, the wave loses the same value, $\hbar k_\parallel = \hbar k$. Equation (27) becomes

$$v_\perp \delta p_\perp + v_\parallel \delta p_\parallel \left( 1 - \frac{v_\varphi}{v_\parallel} \right) = 0, \qquad (28)$$
where $v_\varphi = \omega / k$ is the wave phase velocity, and $v_\perp$ and $p_\perp$ are, respectively, the electron velocity and impulse perpendicular to the magnetic field line. If the electron velocity is much smaller than the phase velocity, the electron will mostly receive perpendicular energy with respect to the local magnetic field. However, if the electron velocity is much larger than the wave velocity, the rf interaction becomes an angular diffusion, as Eq. (28) becomes

$$v_\perp \delta p_\perp + v_\parallel \delta p_\parallel = 0. \qquad (29)$$
However, angular diffusion lets the electrons go into the loss cone. In an ECRIS, the electron velocity therefore stays in the range of the wave velocity, since above this velocity electrons are lost. In other words, during its back and forth movement along a magnetic field line, an electron receives a kick in perpendicular velocity when crossing the resonance zone only if there is an electric field $E_{rf}$ transverse to the magnetic field. An electric field parallel to the magnetic field would cause the electron to fall into the loss cone. In a first approximation, despite the magnetic field line shape in ECRIS, it can be assumed that "parallel" refers to the main axis, since the hot plasma is created close to this axis. Thus, it is desirable to have the rf wave launched parallel to the main axis, with an electric field perpendicular to the axis, rather than parallel. As an rf wave is launched into an ECRIS, its energy has to be coupled to a plasma wave that will carry the rf power to the resonance surface to heat the electrons. In most ECRIS, the vacuum wavelength $\lambda_0$ is shorter than the plasma dimensions. The plasma waves that may be involved are the following.

(a) In parallel or quasi-parallel propagation (wave vector k parallel to B), there are two possibilities, as presented in Figure 3.

1. The whistler wave (electric field circularly polarized, perpendicular to k), with $\omega_{rf} < \omega_{ce}$, absorbed at $\omega_{rf} \approx \omega_{ce}$ in a decreasing magnetic field. This is the wave one should work with. Figure 4 presents two possibilities to inject this wave in an ECRIS. Iso-|B| contours obtained by the superposition of an axial mirror magnetic field and a hexapolar radial field are shown, as well as a magnetic field line.

2. A high frequency wave, sometimes called a maser wave, with $\omega_{rf} > \omega_{ce}$,

where

$$\omega_{pe} = \left( \frac{n_e e^2}{\varepsilon_0 m_e} \right)^{1/2}, \qquad (30)$$

the plasma frequency being $f_{pe} \approx 9000 \sqrt{n_e}$ ($f_{pe}$ in Hz and $n_e$ in cm$^{-3}$);

$$\omega_{ce} = \frac{eB}{m_e}, \qquad (31)$$

the cyclotron frequency being $f_{ce} = 28 \times 10^{9} B$ ($f_{ce}$ in Hz and $B$ in T); and

$$\omega_R = \frac{(4\omega_{pe}^2 + \omega_{ce}^2)^{1/2} + \omega_{ce}}{2} \quad \text{(right wave)}. \qquad (32)$$

FIGURE 3. Plasma waves in parallel propagation (k ∥ B).

FIGURE 4. Possible axial and radial whistler wave injection in an ECRIS.

(b) In perpendicular or quasi-perpendicular propagation (wave vector k perpendicular to B) (Figure 5).

1. O mode, whose electric field is linearly polarized, perpendicular to k, and parallel to B.

2. X mode, whose electric field is elliptically polarized in the plane perpendicular to B,

where

$$\omega_{UH} = \left( \omega_{ce}^2 + \omega_{pe}^2 \right)^{1/2} \quad \text{(upper hybrid frequency)}, \qquad (33)$$

$$\omega_L = \frac{(4\omega_{pe}^2 + \omega_{ce}^2)^{1/2} - \omega_{ce}}{2} \quad \text{(left wave)}. \qquad (34)$$

FIGURE 5. Plasma waves in perpendicular propagation (k ⊥ B).

The waves experience resonances at ($\omega_{ce}$, $\omega_{UH}$) and cut-offs at ($\omega_{pe}$, $\omega_R$, $\omega_L$). Figure 5 shows that the limitation given by the formula

$$\omega_{pe} \leq \omega_{rf}, \qquad (35)$$

which leads to the well-known scaling often quoted for ECRIS performance,

$$(I_q)_{max} \propto \omega_{rf}^2, \qquad (36)$$

is only for the O mode, a priori not excited in ECRIS. Figure 6 presents how to inject these modes in an ECRIS. It is clear that such an injection needs an open magnetic structure for the radial magnetic field. Such an injection has been tested with the Grenoble Test Source (GTS), which showed that beam intensities extracted from the ion source are identical to those obtained with axial whistler mode launching (Hitz et al., 2000b). Similar behavior has more recently been noticed (Koivisto et al., 2005). Such perpendicular launching may be useful when the injection side of the plasma chamber is dedicated to another purpose, such as monocharged radioactive ion injection under the 1+ → n+ method.
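The characteristic frequencies of Eqs. (30) to (34), and the density at which the limit (35) is reached, are easy to tabulate for a given density and magnetic field. The following sketch assumes SI constants and a density given in cm$^{-3}$, as in the text; the function names and the sample values are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, C
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
EPS_0 = 8.8541878128e-12        # vacuum permittivity, F/m
TWO_PI = 2.0 * math.pi


def plasma_frequency(n_e_cm3: float) -> float:
    """f_pe in Hz from Eq. (30); the density is given in cm^-3."""
    n_e_m3 = n_e_cm3 * 1.0e6
    return math.sqrt(n_e_m3 * E_CHARGE ** 2 / (EPS_0 * M_ELECTRON)) / TWO_PI


def cyclotron_frequency(b_tesla: float) -> float:
    """f_ce in Hz from Eq. (31)."""
    return E_CHARGE * b_tesla / M_ELECTRON / TWO_PI


def characteristic_frequencies(n_e_cm3: float, b_tesla: float) -> dict:
    """Cut-offs (f_pe, f_R, f_L) and resonances (f_ce, f_UH), Eqs. (30)-(34)."""
    f_pe = plasma_frequency(n_e_cm3)
    f_ce = cyclotron_frequency(b_tesla)
    root = math.sqrt(4.0 * f_pe ** 2 + f_ce ** 2)
    return {
        "f_pe": f_pe,
        "f_ce": f_ce,
        "f_R": 0.5 * (root + f_ce),                # right-wave cut-off, Eq. (32)
        "f_L": 0.5 * (root - f_ce),                # left-wave cut-off, Eq. (34)
        "f_UH": math.sqrt(f_ce ** 2 + f_pe ** 2),  # upper hybrid, Eq. (33)
    }


def cutoff_density_cm3(f_rf_hz: float) -> float:
    """Density (cm^-3) at which f_pe equals f_rf, i.e. equality in Eq. (35)."""
    omega = TWO_PI * f_rf_hz
    return EPS_0 * M_ELECTRON * omega ** 2 / E_CHARGE ** 2 / 1.0e6


if __name__ == "__main__":
    # Hypothetical plasma: 1e12 cm^-3 in a 0.5 T field
    for name, value in characteristic_frequencies(1.0e12, 0.5).items():
        print(f"{name:5s} = {value / 1e9:7.3f} GHz")
    print(f"cut-off density at 14.5 GHz: {cutoff_density_cm3(14.5e9):.2e} cm^-3")
```

The cut-off density grows as the square of the rf frequency, which is the origin of the scaling (36).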
FIGURE 6. Quasiperpendicular injection with O or X mode.
Considering the whistler right wave, at a resonance, there is absorption of cyclotron waves in a weakening magnetic field, which has been called a magnetic beach, in analogy with the dissipation of water waves on a sloping ocean beach (Stix, 1992). Figure 7 gives an image of this effect in an ECRIS. At a cut-off, the wave may convert into other waves, be reflected, or be partially absorbed. The preferred wave to use is the whistler wave, having a right-handed circularly polarized (RHCP) electric field that rotates in the electron gyromagnetic direction. This wave carries the rf power from the launching waveguide to the cyclotron resonance, where it couples to the electrons. The fraction of the rf power not absorbed at the cyclotron resonance, owing to reflections either on cut-offs or on the chamber walls, may also be absorbed as an X wave at the upper hybrid resonance $\omega_{rf} = \omega_{UH}$.

FIGURE 7. Electron cyclotron heating by transverse acceleration of electrons in a magnetic field: “magnetic beach effect.”

2. Macroscopic Description: Eigenmodes

Waves in ECRIS may also be described macroscopically, i.e., by considering the plasma chamber as a multimode cavity where many electromagnetic modes exist and may be selected depending on the rf coupling system. Eigenmodes of a cylindrical cavity (ECRIS plasma chamber) are TEm,n,p or TMm,n,p, where the indices stand, respectively, for the azimuthal (Ψ), radial (r), and axial or parallel (z) coordinates. They are actually standing waves formed by waves that are oblique with respect to the parallel direction, as their dispersion relation in vacuum can always be written in the form

$$k_0^2 = k_\perp^2(m, n) + k_\parallel^2(p), \qquad (37)$$
where $k_0$ is the vacuum wave number. In an ideal situation, rf coupling in ECRIS is performed by injecting a wave that is coupled only to the right wave (parallel propagation, whistler wave), and is almost completely absorbed when crossing the resonance surface. In a real situation, a linearly polarized wave is injected in parallel propagation (along the main axis of the source), and may be reasonably coupled to the whistler wave. However, only some power of the right wave may be absorbed, and the power of the left wave is not absorbed at all. Therefore this excess of power not absorbed when crossing the resonance zone may be reflected on the walls of the plasma chamber. So there is a possible coupling to the cavity modes of the plasma chamber. As an example, let us consider the Caprice ECRIS (Jacquot et al., 1987), designed to be operated at a frequency of 10 GHz. The plasma chamber of this ion source is a multimode cavity 160 mm in length and 66 mm in diameter. Calculated frequencies for all modes in vacuum between 9.5 and 10.5 GHz for this cavity are shown in Table 1 (Lyneis, 1992). However, the rf coupling system may allow a specific mode to be selected to avoid the degeneracy of the electric field, in spite of the overlapping rf modes. One would prefer the TE1,n,p modes to the TMm,n,p modes.

B. Examples of Utilized Couplings in ECRIS

Most of the existing ECRISs have plasma chamber dimensions several times larger than the vacuum wavelength. Plasma volume is also small
TABLE 1
POSSIBLE MODES IN VACUUM FOR A CAPRICE ECRIS BETWEEN 9.5 GHz AND 10.5 GHz (Lyneis, 1992)

Mode        Frequency (GHz)     Mode        Frequency (GHz)
TE3.1.8     9.55394             TE2.2.3     10.0888
TE1.1.10    9.59107             TM1.2.0     10.1513
TM3.1.3     9.63604             TM1.2.1     10.1930
TM5.1.3     9.68456             TE0.2.1     10.1930
TM2.1.6     9.71339             TE2.1.10    10.2185
TE2.2.1     9.74641             TM2.1.7     10.2657
TM0.1.10    9.84895             TE3.1.9     10.2815
TE2.2.2     9.87620             TM3.1.5     10.3167
TM3.1.4     9.93960             TM1.2.2     10.3172
TM1.1.9     9.97518             TE0.2.2     10.3172
TE0.1.9     9.97518             TE5.1.5     10.3621
TE5.1.4     9.98665             TE2.2.4     10.3791
TE4.1.7     10.0400             TE1.1.11    10.4793
TE1.2.7     10.0556
with respect to that of the rf cavity. As the plasma is absorbent, many existing eigenmodes may overlap and be present simultaneously at the same frequency. However, these modes do not have the same weight, and therefore the electromagnetic field configuration in the cavity essentially depends on the rf coupling, which can select the eigenmode of interest. Most ECRISs now use either coaxial or rectangular waveguides; both methods are presented below, along with some other systems.

1. Coaxial Coupling

For the sake of compactness, B. Jacquot utilized coaxial rf launching in the Caprice type source, as presented in Figure 8 (Jacquot et al., 1987), and since then many compact ECRISs have utilized the same type of coupling (Sortais et al., 1995; Hasegawa et al., 1995; Drentje et al., 1995; Kutner et al., 1995; Efremov et al., 1995; Liu et al., 1995). However, it is likely that the rf mode structure looks like a TM0,m,n mode within the plasma chamber. For this mode the electric field is zero on the axis, where the main plasma is located (Figure 9). Moreover, by measurement of the diamagnetic effect in an axisymmetrical plasma as in Caprice, the plasma energy content nE has been determined. By MHD equilibrium, the pressure gradient is balanced by a Lorentz force due to the diamagnetic current j, which induces a magnetic field opposite to the main confinement magnetic field:

$$\nabla P = \mathbf{j} \times \mathbf{B}. \qquad (38)$$
FIGURE 8. Coaxial rf coupling in a Caprice type ECRIS.
A loop surrounding the plasma can measure any variation of the magnetic flux it encloses. This variation induces a voltage $V$:

$$\int V \, dt = -N\Phi \approx N \pi r^2 B \, \frac{1}{2} \, \frac{nkT}{B^2 / 2\mu_0}, \qquad (39)$$

where $N$ is the number of turns of the loop. The diamagnetism reflects the energy content. Diamagnetism measurements performed with the Caprice ECRIS show that the actual rf power absorbed by the plasma is about 20% of the input rf power (Figure 10). The rf coupling presented in Figure 8 has a major drawback in the rectangular-to-coaxial transition: standing waves can be created in this transition, preventing the normal propagation of the microwaves to the plasma and heating up the transition abnormally. In addition, when microwaves travel along the coaxial waveguide, they encounter a resonance point before entering the plasma chamber. If the local gas pressure is too high, some rf power can be absorbed there. Even if it is useful to have a compact rf launching in small ECRIS, this method of microwave coupling is somewhat tricky and presents poor efficiency.

FIGURE 9. TEM waves. Solid lines, electric lines; dashed lines, magnetic lines.

FIGURE 10. Efficiency of the coaxial coupling into a Caprice ECRIS.

To maintain the advantage of compact rf coupling, a careful excitation of the TE1,1 mode of the coaxial waveguide by using a (0, 0) phased rectangular waveguide bijunction is recommended. The bijunction field is progressively converted into a coaxial waveguide TE1,1 mode (Figure 11). In the plasma chamber this mode is coupled to the TE1,n,p modes. This coupling structure has been calculated from Collin (1960) and Masterman and Clarricoat (1971) and successfully tested in the Caprice source (Hitz et al., 1995). In Figure 11, the source axis (represented by a circle) remains free for the element to be ionized (gas or metal).
FIGURE 11. Excitation of the TE1,1 mode of the coaxial waveguide by using a (0, 0) phased rectangular waveguide bijunction.
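The diamagnetic loop relation, Eq. (39), can be evaluated, or inverted to recover the energy density $nkT$ from a measured loop signal, with a few lines of Python. This is a sketch under the assumptions of Eq. (39) itself, plus the added assumption of a uniform plasma filling the radius $r$; the function names, the loop geometry, and the plasma parameters below are invented for illustration and are not Caprice data.

```python
import math

MU_0 = 4.0e-7 * math.pi        # vacuum permeability, H/m
EV_TO_JOULE = 1.602176634e-19  # 1 eV expressed in joules


def loop_signal(n_turns: int, radius_m: float, b_tesla: float,
                energy_density_j_m3: float) -> float:
    """Time-integrated loop voltage (V s) from the right-hand side of Eq. (39)."""
    beta = energy_density_j_m3 / (b_tesla ** 2 / (2.0 * MU_0))
    return n_turns * math.pi * radius_m ** 2 * b_tesla * 0.5 * beta


def energy_density(volt_seconds: float, n_turns: int, radius_m: float,
                   b_tesla: float) -> float:
    """Invert Eq. (39): energy density nkT (J/m^3) from the integrated signal."""
    beta = 2.0 * volt_seconds / (n_turns * math.pi * radius_m ** 2 * b_tesla)
    return beta * b_tesla ** 2 / (2.0 * MU_0)


if __name__ == "__main__":
    # Hypothetical numbers: 1e18 m^-3 electrons at 5 keV, 0.4 T, r = 2 cm, 10 turns
    nkT = 1.0e18 * 5.0e3 * EV_TO_JOULE
    signal = loop_signal(10, 0.02, 0.4, nkT)
    print(f"energy density   : {nkT:.1f} J/m^3")
    print(f"integrated signal: {signal:.3e} V s")
    print(f"recovered nkT    : {energy_density(signal, 10, 0.02, 0.4):.1f} J/m^3")
```

Since the signal scales as $\mu_0 nkT / B$, a given energy content produces a smaller diamagnetic signal in a stronger confining field.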
2. Rectangular Coupling

Another way to launch microwaves into the plasma is to use fundamental rectangular waveguides up to the plasma chamber entrance. With these types of waveguides, the TE10 mode is used (Figure 12). Even if this type of coupling is more efficient than the coaxial one, a single waveguide coupling is not selective enough, and a double waveguide coupling system, sometimes called a “bijunction,” is recommended. To maintain accessibility to the main axis of the plasma chamber (for gas or metallic element injection systems), different rf couplings are possible. Figure 13 shows a TE1,n,p mode excitation by an E-plane (0, 0) phased bijunction: both input waveguides are in phase. These modes are expected to couple efficiently to the whistler waves on the source main axis. The two waveguides can easily be obtained by splitting the initial waveguide with a commercially available power divider. Figure 14 shows a TM0,n,p mode excitation by a (0, π) phased E-plane bijunction. However, these are not the best modes to be used in ECRIS because the electric field is zero on the main axis, where the hot plasma is. In addition, the main electric field component of these modes is axial, i.e., parallel to the magnetic field, and therefore it may increase electron losses. This TM0,n,p mode excitation is then less efficient than a coupling to a whistler mode.
FIGURE 12. TE10 mode commonly used in ECRIS (arrows present the electric field intensity and direction).

FIGURE 13. TE1,n,p mode excitation.

FIGURE 14. TM0,n,p mode excitation.
Figure 15 presents the excitation of the axisymmetric TE0,n,p mode by an H-plane (0, π) phased bijunction. The two (0, π) phased waveguides can be obtained by using a rectangular waveguide 3 dB directional coupler with a π/2 phase shifter in one of the output waveguides. These modes should also easily couple to the whistler wave; however, their transverse electric field is zero on the source main axis. This excitation system is then less efficient than the TE1,n,p modes. From this, it appears that the TE1,n,p mode excitation is the best one to use in ECRIS. However, a source designer also has to cope with the plasma shape, which is very often triangular, and also has to place (within a rather small space) one or more ovens and at least one gas pipe. To avoid any outgassing in the waveguides, they must be placed out of the so-called plasma star. Figure 16 presents the theoretical position of a bijunction with a plasma star shape given by the superposition of a hexapolar radial magnetic field and a mirror axial magnetic field. This figure shows how difficult it is for the designer to accommodate theory and shows that some compromise must be made.
FIGURE 15. TE0,n,p mode excitation.

FIGURE 16. TE1,n,p mode excitation with a possible plasma star shape.
An example is shown in Figure 17, which shows the injection system of a room temperature ECRIS called GTS-LHC. This ECRIS, designed by CEA-Grenoble for the CERN-LHC project, has two rectangular waveguides, each dedicated to one frequency (the source is going to be operated at both 14.5 and 18 GHz). As shown in Figure 17, these waveguides are not in bijunction, but are situated out of plasma impact.

3. Utilization of the Double Frequency

This technique is utilized with great efficiency by LBL-AECR, which uses 10 and 14 GHz at the same time (Xie, 1998). An electron that moves along a magnetic field line (in a helical motion whose guiding center trajectory is given by the field line), and which is going to be lost on the plasma chamber walls in spite of the first resonance crossings, may be saved by crossing
FIGURE 17. Injection system of the GTS-LHC ion source. Left: two rectangular waveguides, two micro ovens, and one gas pipe. Right: plasma star shape.

FIGURE 18. Iso-|B| contours in an ECRIS. Only closed surfaces are represented. When injecting two microwave frequencies (rf1 and rf2), two closed surfaces are resonance surfaces.
a second resonance zone. Figure 18 presents a typical magnetic field line around which an electron spirals inside an ECRIS plasma chamber. Iso-|B| contours obtained by the superposition of an axial mirror magnetic field and a hexapolar radial field are shown. When two microwaves are injected at the same time, for example, 10 and 14.5 GHz for LBL-AECR, the iso-|B| surfaces that act as resonance surfaces are 0.36 T and 0.52 T, respectively. However, it seems that multiple frequency heating leads to a decrease in ion production time and a shift of the charge state distribution to higher charges (Vondrasek et al., 2005; Koivisto et al., 2006).

4. Application to Pulsed Regime

An ECRIS connected to a synchrotron, for example, has to be operated in a pulsed regime. The ion source pulsation is given by the rf pulsation: when the rf power is stopped, electrons that are no longer confined escape from the magnetic well, and so do the ions for the sake of plasma neutrality. One way to empty the source more rapidly is to give parallel momentum to some electrons to accelerate the electron losses, and therefore the ion losses too, which increases the ion currents. Landau damping can be used to achieve this enhancement of electron losses. Figure 19 gives an example where 2.45 GHz is introduced radially in the plasma chamber through a small loop with a length of about half of the main wavelength. For a main rf frequency of 14 GHz, the loop length is about 1 cm. It gives a local electric field, which gives parallel
FIGURE 19. Utilization of Landau damping to accelerate electron losses.
momentum to the electrons. This additional frequency must be applied at the end of the main frequency pulse. Figure 20 presents the relative position in time of the main rf pulse and the auxiliary rf pulse used to enhance beam intensity at the end of the main rf pulse, when the afterglow pulse starts.

5. Frequencies Above 20 GHz

Below 20 GHz, microwaves are usually transmitted through rectangular waveguides, as this is the usual output from the transmitter. Above 20 GHz, however, high-power microwave sources are gyrotrons, whose output is either through circular waveguides or quasi-optical; for example, the 28 GHz output mode is TE02 in an oversized waveguide. The transmission line between the transmitter and the ion source plasma chamber must be carefully designed to avoid high power losses due to spurious mode conversion. Spurious modes are generated in regions of waveguide imperfections (tilt, bend, diameter change, high voltage insulator, vacuum window, etc.). Figure 21 presents a schematic diagram of the transmission line. Two possibilities are presented, depending on the mode launched into the ECRIS. An arc detector protects the gyrotron window against any possible spark. A bidirectional coupler measures both
FIGURE 20. Typical application of Landau damping during the afterglow period.
incident and reverse rf power. A mode filter damps any mode other than TE02. The TE02 mode is then converted into the TE01 mode, which has low ohmic losses during long distance transport. The mode converter has a small axisymmetric, periodic radius perturbation. The TE01 mode can then be converted into TE11 or into hybrid HE11 (via the TM11 mode), or can be kept and launched into the ion source after the insulated transmission line (since the ion source is at a positive high voltage) and the source vacuum window. The waveguide bend is a critical component of the transmission line because conversion between modes with similar phase velocities can occur in an oversized waveguide with axial curvature (Hunger, 1957). The dc break is usually a waveguide made of alternately stacked aluminum and Teflon rings, with a period of half a wavelength. An optimized TE01 bend used to couple 28 GHz to an ECRIS is presented in detail by Plaum et al. (2001), and some experimental results obtained with a 28 GHz ECRIS are shown in Gammino et al. (2001) and Hitz et al. (2002b, 2002d).
FIGURE 21. Schematic diagram of a 28 GHz transmission line with two coupling possibilities (arrows indicate the electric field).
Another solution would be to transport microwave power with a quasi-optical system made of mirrors, as commonly used in fusion research for frequencies above 30 GHz.

6. Multifrequency Transmission Line

As shown previously, multiple frequency heating can strongly enhance ion currents. The frequencies are either launched separately into the ECRIS (Hitz et al., 2002a) or launched at a single place in the ion source if the available space is limited. For example, a preliminary study of 18 + 28 GHz multiplexing into an ECRIS has been done (Kasparek et al., 2002). Both microwaves coming out of their generators go to a diplexer (Figure 22).
FIGURE 22. Sketch of a rectangular waveguide diplexer (Kasparek et al., 2002).
C. Conclusion

As usual, it is difficult to accommodate theory when designing a machine like an ECRIS, especially for microwave launching. For example, leaving room to put a waveguide in a radial direction leads to a weaker radial confinement. Fortunately, an ECRIS is a multimode cavity with a complex magnetic configuration, and there is always a place inside the cavity where the electrons can gain transverse energy. However, of the total incident rf power launched into this multimode cavity, almost half is lost in different processes:

1. losses in the waveguide and matching system,
2. losses through electron scattering into the loss cone,
3. rf power radiated through various ECRIS insulators,
4. reflected power,
5. losses on the walls.

The rf power absorbed by an ECRIS can be written as

$$ P_{\mathrm{abs}} \approx \frac{n_e k T_e V}{\tau_e}, \tag{40} $$
where τe is the electron confinement time and V is the hot plasma volume. To efficiently transport and couple microwaves up to an ECRIS, special care has to be taken first on the design of the rf line to minimize the losses. Above 1 kW of rf power, oversized waveguides are strongly recommended. However, the position of the waveguide(s) inside the ion source has to be carefully chosen to minimize any possible outgassing due to parasitic resonances. Once plasma is created, it is of major importance to diagnose it to obtain knowledge of electron and ion density and temperature. This is the purpose of the next section, which deals with one typical nonperturbative diagnostic.
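As a rough order-of-magnitude illustration of Eq. (40), the short Python sketch below evaluates the absorbed power for one set of plasma parameters. The function name and all numerical inputs (density, temperature, volume, confinement time) are hypothetical placeholders chosen for the example, not measurements reported in this chapter.

```python
EV_TO_JOULE = 1.602176634e-19  # energy of 1 eV in joules


def absorbed_power_w(ne_cm3: float, kte_ev: float, volume_cm3: float, tau_e_s: float) -> float:
    """Eq. (40): P_abs ~ n_e * kT_e * V / tau_e, returned in watts.

    ne_cm3     electron density (cm^-3)
    kte_ev     electron temperature expressed as kT_e (eV)
    volume_cm3 hot plasma volume (cm^3)
    tau_e_s    electron confinement time (s)
    """
    stored_energy_j = ne_cm3 * volume_cm3 * kte_ev * EV_TO_JOULE
    return stored_energy_j / tau_e_s


# Hypothetical inputs, for illustration only.
p_abs = absorbed_power_w(ne_cm3=1e12, kte_ev=1e4, volume_cm3=100.0, tau_e_s=1e-3)
print(f"P_abs = {p_abs:.1f} W")
```

With these placeholder values the estimate comes out at a few hundred watts at most, which shows how strongly the result depends on the assumed electron confinement time.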
III. VUV DIAGNOSTICS OF ECRIS PLASMAS

To enhance ECRIS performance, it is necessary to study the plasma created within this ion source. A probe inserted into the ECR plasma would certainly give some hints; however, a nonperturbative diagnostic is far more helpful. The main ECRIS parameters are electron density and temperature as well as ion densities and temperature. To determine the total electron density, microwave interferometry can be utilized. For example, a 63 GHz Gunn diode can be connected to the ECRIS plasma chamber (Hutchinson, 1990). Electron density ne is related to the microwave frequency f delivered by the Gunn diode, the plasma length d
crossed by the wave along one arm of the interferometer, and the dephasing ϕ between the two waves by the following formula:

$$ n_e\,(\mathrm{cm^{-3}}) = 118.4\,\frac{f\,(\mathrm{Hz})}{d\,(\mathrm{cm})}\,\varphi\,(\mathrm{rad}). \tag{41} $$
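The numerical coefficient in Eq. (41) can be cross-checked against physical constants. The Python sketch below assumes the usual low-density interferometry limit (electron density well below the cutoff density at the probing frequency, uniform plasma along d), in which n_e = 4π ε0 m_e c f ϕ / (e² d); that derivation is an assumption made here for the check, and the 1 rad phase shift and 10 cm path are hypothetical inputs.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
M_E = 9.1093837015e-31    # electron mass (kg)
Q_E = 1.602176634e-19     # elementary charge (C)
C = 299_792_458.0         # speed of light in vacuum (m/s)


def coefficient_from_constants() -> float:
    """Coefficient K in n_e[cm^-3] = K * f[Hz] * phi[rad] / d[cm] (low-density limit)."""
    k_si = 4.0 * math.pi * EPS0 * M_E * C / Q_E**2  # n_e in m^-3, d in m
    return k_si * 1e-4  # 1e-6 for m^-3 -> cm^-3, 1e2 for d in cm instead of m


def electron_density_cm3(f_hz: float, phase_rad: float, d_cm: float, coeff: float = 118.4) -> float:
    """Eq. (41): line-averaged electron density from the measured phase shift."""
    return coeff * f_hz * phase_rad / d_cm


print(f"coefficient from constants: {coefficient_from_constants():.1f}")
# Hypothetical measurement: 1 rad phase shift over a 10 cm path at 63 GHz.
print(f"n_e = {electron_density_cm3(63e9, 1.0, 10.0):.3e} cm^-3")
```

The coefficient computed from the constants agrees with the quoted 118.4 to better than 0.1%, which supports reading the density unit in Eq. (41) as cm⁻³.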
Another diagnostic is a retarding field analyzer installed at one end of the ion source to measure the plasma potential. This analyzer is, for example, composed of four electrodes on which different potentials are applied to repel energetic electrons and accelerate ions (Klein, 1995; Perret, 1998). The plasma potential is determined from the curve of current versus applied voltage, which presents an inflection point. Another method is to place the analyzer after a first beam selection, as done by Tarvainen et al. (2004). In this case, different charge states can be utilized for this diagnostic. Ions in an ECRIS can also be diagnosed by optical measurements (visible, UV, and X-ray). The latter approach gave interesting evaluations of the ECRIS ion confinement times (Douysset et al., 2000), which are important parameters in view of further ECRIS improvements. As previous measurements have been presented elsewhere (Barué et al., 1992; Girard, 1992), this chapter deals with only one optical diagnostic, which has been performed on two different ECRISs. The purpose of this diagnostic is to determine electron density and temperature and to diagnose the ions. So far only a few studies of ECR plasmas by VUV spectroscopy have been carried out (Pöffel et al., 1990; Petty et al., 1991; Kato et al., 1993; Vinogradov et al., 1994). The long-term goal of ECR plasma diagnostics is to predict extracted ion current intensities. For that purpose, ion confinement times should be compared with ECR plasma modeling under different source conditions. Different models of ion confinement time have been proposed (West, 1982; Pastukhov, 1987; Whaley and Getty, 1990), but none of them could really explain the evolution of this plasma parameter with the ion charge state.
However, Kato’s model (see below) and the corona model, which are linked to tokamak (de Michelis and Mattioli, 1981) and solar corona plasmas (Finkenthal et al., 1987), could be applied to ECR plasmas.

A. Experimental Set-Up and Data Processing

1. Experimental Set-Up

Two different types of ECRIS have been employed during the experiment. First, Quadrumafios (Girard et al., 1994) was built specifically to perform plasma diagnostics. One characteristic of this machine is that it has view ports through the radial quadrupolar magnetic structure made of permanent
FIGURE 23. Grazing incidence spectrometer connected to the 18 GHz Quadrumafios ECRIS.
magnets (Figure 23). However, the major drawback of this configuration is the shape of the magnetic field lines, which leads to a weak electron confinement. Thus, extracted ion intensities are rather low for an 18 GHz ECRIS. VUV measurements have also been performed on the plasma of a 10 GHz Caprice source (Hitz et al., 1992). This is a powerful ion source able to deliver high intensities of very high charge states (up to Xe³⁰⁺). But this type of source requires a very strong magnetic confinement, which does not allow radial view ports. The spectrometer has therefore been installed on the source axis, through the extraction region, as shown in Figure 24.
FIGURE 24. Grazing incidence spectrometer connected to the 10 GHz Caprice ECRIS.
For both sources, to avoid any perturbation coming from the plasma chamber (wall effects), slits are installed along the light path according to the geometry chosen for each source. The observed plasma volume multiplied by the solid angle is equal to 2.3 × 10⁻³ mm³·sr for the Quadrumafios source and 1.7 × 10⁻³ mm³·sr for Caprice. VUV photons are detected by a grazing incidence spectrometer having a 3 m radius of curvature and equipped with microchannel plates (MCPs) (Romand and Vodar, 1962). A 600 lines/mm holographic grating is placed tangentially to the Rowland circle to cover the range 10–100 nm with a resolution of 0.045 nm at 58.4 nm. The MCPs are placed at a fixed distance from the center of the grating on the Rowland circle, at an 8° angle relative to the tangent of the circle. The MCPs are funneled and coated with a 150 nm thick layer of MgF2 to enhance quantum efficiency at lower wavelengths. The relative calibration curve of the VUV spectrometer is determined by two different methods. First, the branching ratio method is utilized over the range of 30–900 nm with different elements that could be directly produced by the plasma of both ion sources (Hibst and Bukow, 1988; Bastert et al., 1992; Klose et al., 1993; Yang and Cunningham, 1994). To cover the lowest part of the wavelength range, the charge exchange method is employed (Bouchama and Druetta, 1989). For such a calibration, the Caprice ECRIS delivers multicharged ions, which are sent to a charge exchange experiment filled with a gas target, typically H2. Figure 25 shows the relative calibration curve of the VUV apparatus (spectrometer, grating, and detector). The overall error on the efficiency calibration curve is estimated at 30%. Each relative line intensity (Irel) is then deduced from the measured intensity (Imes) by the following formula:

$$ I_{\mathrm{rel}}(\lambda) = \frac{I_{\mathrm{mes}}(\lambda)}{\varepsilon(\lambda)}. \tag{42} $$
2. Data Processing

a. Line Intensity Ratio Method to Determine Electron Density and Temperature

Kato’s model is based on line intensity ratios of elements belonging to the beryllium isoelectronic sequence, such as OV (Kato et al., 1985, 1990) and MgVII (Finkenthal et al., 1987). These ion charge states have metastable states that are useful to determine electron density in hot low-density plasma. This is the case for ECR plasmas, where the neutral pressure inside the chamber is low (10⁻⁵–10⁻⁶ mbar). Figure 26 shows the Grotrian diagram and the corresponding lines for OV. In low-density plasma, collisional excitation is counterbalanced by spontaneous emission, at least for low-lying levels (de Michelis and Mattioli, 1981). As metastable states of OV have a long lifetime compared to other
FIGURE 25. Relative efficiency calibration curve of the VUV spectrometer: grating and detector.

FIGURE 26. OV Grotrian diagram with lines used for electron density and temperature determination.
excited levels, upper states of triplet multiplicity can be populated by electron collision from these metastable states, in particular the 2p² levels from which the 76 nm line group is emitted. However, the 2s2p ¹P level, from which the 63 nm line is emitted, is populated by electron collision from the ground state 2s² ¹S. Thus, the line intensity ratio 76/63 is proportional to the electron density. The theoretical line intensity ratio relative to electron density can be written as (Keenan, 1992)

$$ R = \frac{I_{im}}{I_{jf}} = \frac{Q_{f\to m}(T_e)}{Q_{f\to j}(T_e)}\,\frac{1}{1 + \dfrac{A_{mf}}{n_e\,Q_{m\to i}(T_e)}}, \tag{43} $$
where the electron collisional excitation rates Qf→m(Te), Qf→j(Te), and Qm→i(Te) are averaged over a Maxwellian electron distribution function (MEDF). The electron temperature can also be deduced from the line intensity ratio method by considering two excited levels relatively far from each other and populated by electron collision from the same level. For example, the 17.2/63 and 19.3/76 line ratios are particularly well suited to determine the electron temperature. The ratio 21.5/22 can also be used to determine both electron density and temperature, even though the energies of the two upper levels are almost the same. Indeed, excitation rate coefficients for the 2s3s ³S and 2s3d ¹D levels have a different electron temperature dependence (Kato et al., 1990). The theoretical line ratio is written as

$$ R = \frac{I_{ik}}{I_{jl}} = \frac{\gamma_{f\to i}}{\gamma_{f\to j}}\,e^{-\frac{E_i - E_j}{k T_e}}\,\frac{b_i}{b_j}, \tag{44} $$
where γf→i and γf→j are effective collision strengths and bi, bj are the branching ratios of levels i and j.

b. Corona Model to Determine Ion Densities and Ion Confinement Times of Every Charge State of Oxygen

Knowledge of electron density and temperature, together with atomic data relative to each process occurring in ECR plasmas, is necessary to determine the ion density of each charge state. Atomic data of oxygen are well known today, and a corona model adapted to our plasmas can be used (Itikawa et al., 1985; Bathia and Kastner, 1993). This model relies on the same hypotheses as Kato’s model and can be used for low-density, optically thin plasmas. From the line intensity measurement of each oxygen charge state, we can calculate the relative densities of the 1+ to 5+ ions in their ground state according to the following formula:

$$ n_g^{q+} = \frac{4\pi}{\Omega}\,\frac{I(\lambda_{ih})}{\varepsilon(\lambda_{ih})}\,\frac{1}{n_e\,Q_{g\to i}(T_e)}\,\frac{\sum_{i<j} A_{ij}}{A_{ih}}, \tag{45} $$
where ε(λih ) is the relative efficiency of the whole grating detector set at the specific wavelength.
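Returning to the line ratio method, Eq. (43) can be inverted for the electron density once a 76/63 line ratio has been measured. The Python sketch below assumes that Eq. (43) has the form R = (Qf→m/Qf→j) / (1 + Amf/(ne Qm→i)); the function names are introduced here for illustration, and the rate coefficients and transition probability are placeholder numbers, not OV atomic data.

```python
def line_ratio(ne_cm3: float, q_fm: float, q_fj: float, q_mi: float, a_mf: float) -> float:
    """Line intensity ratio R for a given electron density, following the assumed form of Eq. (43).

    q_fm, q_fj, q_mi  Maxwellian-averaged excitation rate coefficients (cm^3/s)
    a_mf              radiative decay rate of the metastable level (1/s)
    """
    return (q_fm / q_fj) / (1.0 + a_mf / (ne_cm3 * q_mi))


def density_from_ratio(ratio: float, q_fm: float, q_fj: float, q_mi: float, a_mf: float) -> float:
    """Invert the same expression: electron density (cm^-3) from a measured ratio."""
    r_max = q_fm / q_fj  # high-density limit of the ratio
    if not 0.0 < ratio < r_max:
        raise ValueError("ratio must lie between 0 and the high-density limit")
    return a_mf / (q_mi * (r_max / ratio - 1.0))


# Placeholder atomic data, for illustration only.
RATES = dict(q_fm=2e-9, q_fj=1e-8, q_mi=5e-8, a_mf=1e3)

r = line_ratio(1e11, **RATES)
print(f"R = {r:.4f}, recovered n_e = {density_from_ratio(r, **RATES):.3e} cm^-3")
```

In the low-density limit, where Amf is much larger than ne Qm→i, the ratio reduces to R ≈ (Qf→m Qm→i / (Qf→j Amf)) ne, which is the proportionality to electron density stated in the text.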
In this model, metastable states from which excited levels can be populated by electron impact are not taken into account. Actually, low-lying levels of the same multiplicity as the ground state (resonance lines) can be chosen. Nevertheless, ion densities can be determined by taking into account metastable states with an iterative method, where an approximate value of ion confinement time of one charge state is taken. Ionization cross sections of lower and upper charge states of the considered ion charge state are also known. With this method, the ratio of two metastable states 2s² 2p² ¹S and 2s² 2p² ¹D of OIII relative to its ground state is estimated to be about 20 and 5%, respectively (Berreby, 1997). We can note in Eq. (45) that knowledge of electron density and temperature is necessary. These values can be determined by the line intensity ratio method in the VUV range. However, the calibration curve of the spectrometer is only relative and not absolute. But, under plasma electro-neutrality, it is possible to deduce absolute ion densities and then ion confinement times. The procedure to obtain absolute ion densities from relative ones is to write two iterative formulas:

$$ n_{\mathrm{abs}}^{q+} = \frac{n_{\mathrm{rel}}^{q+}}{n_{\mathrm{rel}}^{1+}}\,n_{\mathrm{abs}}^{1+}, \tag{46} $$

$$ n_e^{\mathrm{tot}} = \sum_{q=1}^{5} q\,n_{\mathrm{abs}}^{q+} \quad \text{(plasma electro-neutrality condition)}. \tag{47} $$
The absolute ion density of O⁺ can then be determined, and consequently the density of any other oxygen charge state created in the ECR ion source. The O⁺ density is given by

$$ n_{\mathrm{abs}}^{1+} = \frac{n_e^{\mathrm{tot}}\,n_{\mathrm{rel}}^{1+}}{\sum_{q=1}^{5} q\,n_{\mathrm{rel}}^{q+}}. \tag{48} $$
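The normalization described by Eqs. (46) to (48) amounts to scaling the relative densities so that the charge-weighted sum equals the total electron density. A minimal Python sketch follows; the function name and the relative densities and electron density in the example are hypothetical values, not data from this study.

```python
def absolute_densities(n_rel: dict[int, float], ne_tot: float) -> dict[int, float]:
    """Convert relative ion densities to absolute ones using electro-neutrality.

    n_rel   relative density of each charge state q (keys 1..5 for oxygen here)
    ne_tot  total electron density (same unit as the returned densities)
    """
    # Eq. (48): absolute density of the 1+ charge state.
    weighted_sum = sum(q * n for q, n in n_rel.items())
    n1_abs = ne_tot * n_rel[1] / weighted_sum
    # Eq. (46): every other charge state scales with its relative density.
    return {q: n / n_rel[1] * n1_abs for q, n in n_rel.items()}


# Hypothetical relative densities for O1+ to O5+ and a hypothetical n_e.
n_rel_example = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.3, 5: 0.1}
n_abs = absolute_densities(n_rel_example, ne_tot=1e11)

for q, n in n_abs.items():
    print(f"O{q}+: {n:.3e} cm^-3")
# Eq. (47) is recovered by construction.
print(f"sum of q * n_abs = {sum(q * n for q, n in n_abs.items()):.3e} cm^-3")
```

Only charge states 1+ to 5+ enter the sum, so any contribution from higher charge states or impurities is attributed to these five ions.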
Line intensities utilized in this calculation are presented in Table 2.

TABLE 2
OXYGEN LINES USED FOR THE DETERMINATION OF ION DENSITIES

Ion     λ (nm)     Lower level            Upper level
OII     53.959     1s² 2s² 2p³ - ⁴S°      1s² 2s² 2p²(³P)3s - ⁴P
OIII    50.778     1s² 2s² 2p² - ³P       1s² 2s2p³ - ³S°
OIV     78.895     1s² 2s² 2p - ²P°       1s² 2s(¹S)2p² - ²D
OV      62.973     1s² 2s² - ¹S           1s² 2s(²P)2p - ¹P°
OVI     17.3008    1s² 2p - ²P°           1s² 3d - ²D
The ion confinement time of each charge state within the ECR plasma is then deduced from ion densities and ion extracted currents according to the formula (Keenan, 1992)

$$ \tau^{q+} = \frac{1}{2}\,\frac{n_{\mathrm{abs}}^{q+}}{i_{\mathrm{ext}}^{q+}}\,S L, \tag{49} $$
where S is the resonance zone area of the plasma, L is the plasma length, and $i_{\mathrm{ext}}^{q+}$ is the extracted ion current of charge q. In Eq. (49), the factor ½ reflects the fact that the magnetic configuration of an ECRIS has two symmetrical ends, while the ion beam is extracted on one side only.

B. Experimental Results

1. Quadrumafios Source

Because of the quadrupolar magnetic confinement, the emittance of such a source is rather poor and the overall beam line transmission is therefore low (about 30%). In this set of experiments, only injected rf power and gas pressure are changed.

a. rf Injected Power Dependence on Electron Density and Temperature

The total electron density is determined from the crossing of the 76/63 (nm) and 21.5/22 (nm) line intensity ratio curves in a (Te, ne) graph (Berreby, 1997). Figure 27 shows the evolution of the apparent electron density determined by two different methods: VUV spectroscopy and microwave interferometry. For an injected rf power larger than 120 W, Figure 27 shows a strong discrepancy between the two methods. Microwave interferometry gives a total electron density rising from 1.86 × 10¹¹ cm⁻³ up to 4.47 × 10¹¹ cm⁻³, while the density given by VUV measurements decreases from 1.16 × 10¹¹ cm⁻³ down to 4.7 × 10¹⁰ cm⁻³. There is about one order of magnitude between the two electron densities at large rf power. Figure 28 plots the evolution with injected rf power of the apparent electron temperature measured by VUV spectrometry and of the plasma potential determined by an electrostatic analyzer. Both parameters present similar behavior.

b. Ion Densities Evolution with the rf Power

With the calculation procedure presented above, it is possible to determine densities of oxygen ions from charges 1+ to 5+ at different rf power. In Figure 29, the evolution of extracted currents and of ion densities with microwave power can be compared.
FIGURE 27. Quadrumafios ECRIS: electron density measured by VUV spectroscopy and by microwave interferometry, as a function of rf power.

FIGURE 28. Evolution of apparent electron temperature and plasma potential as a function of the rf power.

FIGURE 29. Evolution of ion extracted currents (a) and ion densities (b) as a function of rf injected power.

FIGURE 30. Evolution of electron density determined by VUV spectroscopy and microwave interferometry as a function of oxygen pressure.
c. Pressure Dependence on Electron Density and Temperature

Figure 30 shows a strong discrepancy between the evolution with gas pressure of the apparent electron density and that of the density measured by microwave interferometry. The two curves converge at high oxygen pressure, other source parameters being constant. The apparent electron density rises from 2.4 × 10¹⁰ to 4.2 × 10¹¹ electrons/cm³, while the total electron density determined by microwave interferometry increases from 2.4 × 10¹¹ to 5.2 × 10¹¹ electrons/cm³. We also note a similar evolution with oxygen pressure of the apparent electron temperature determined by UV measurements and of the plasma potential (Figure 31).

d. Ion Densities and Confinement Time Measurements with Oxygen Pressure

Figure 32 shows the evolution of ion densities, ion extracted currents, and ion confinement times with the oxygen pressure injected in the source. Ion densities determined by VUV spectroscopy present the same behavior with injected gas pressure as the extracted currents. They present a maximum at about 4 × 10⁻⁶ mbar for O⁴⁺ and O⁵⁺, this pressure corresponding to source optimization for highly charged ions. Ion confinement times, which are on the order of a few milliseconds, increase with charge state and decrease with pressure for all ions.
FIGURE 31. Evolution of apparent electron temperature and plasma potential as a function of gas pressure.
2. Caprice Source

Similar measurements are performed with a higher-performance ion source. After optimization of the magnetic configuration and gas pressure of the source, VUV studies at various rf powers are performed.

a. Electron Density and Temperature Measurements with rf Injected Power

Figure 33 shows the evolution of the apparent electron density given by VUV spectroscopy, together with the O⁴⁺ current produced by Caprice. The apparent electron density measured by VUV spectroscopy is larger than that determined for Quadrumafios; however, its evolution is similar for both sources. The electron temperature also shows a similar dependence on rf power in both cases (Figure 34) (Druetta and Hitz, 1992).

b. Ion Densities and Confinement Time Evolution with the rf Power

Figure 35 shows the evolution of ion extracted currents, ion densities, and ion confinement times with rf power. For various charge states, there is a fast saturation of the ion density. However, ion confinement times are relatively constant and below 1 ms.
FIGURE 32. Evolution of ion intensities extracted from the Quadrumafios source (a), of ion density with oxygen pressure (b) and of ion confinement time for different oxygen charge states (c).

FIGURE 33. Evolution of electron density and O⁴⁺ intensity extracted from Caprice ECRIS with rf power.

FIGURE 34. Evolution of electron temperature and O⁴⁺ intensity extracted from Caprice with the rf power.
C. Discussion

VUV measurements were carried out orthogonally to the main magnetic field axis for the Quadrumafios source and axially for Caprice because of their magnetic structure. Indeed, radial magnetic confinement is achieved by permanent magnets in quadrupolar geometry for Quadrumafios and hexapolar for Caprice. This gives different plasma shapes more or less well confined
FIGURE 35. Evolution of the ion intensities extracted from Caprice source (a), of ion density (b), with rf power, and evolution of ion confinement time for different oxygen charge states (c).
within the chamber. The interpretation of the results obtained for both sources must therefore take this fundamental difference into account. Data concerning electron density and temperature can be correlated for Quadrumafios with other diagnostics such as a retarding field analyzer and a microwave interferometer. These results can be interpreted through the evolution of the electron distribution function (EDF) as a function of rf power and pressure at a fixed magnetic field configuration. Figure 27 and Figure 28 show that the EDF is almost Maxwellian at low rf power; therefore VUV spectroscopy can only diagnose a cold electron population. This is confirmed by the fact that the plasma potential, which cannot retain electrons having an energy higher than its value, has the same behavior as the electron temperature determined by VUV measurements. Then, when rf power increases, electrons are heated very rapidly and the cold electron population decreases down to saturation. The tail of the EDF then spreads out to high energy, as shown by bremsstrahlung and electron cyclotron emission measurements (Barué et al., 1994; Gaudart, 1995). Figure 30 and Figure 31 show that a similar interpretation of the EDF can be made when oxygen pressure inside the plasma chamber is high. Collisions between electrons and neutral atoms increase and the high-energy electron tail disappears. At low rf power and high oxygen pressure, the EDF is close to Maxwellian. Without any other diagnostics installed simultaneously on Caprice, it is nevertheless possible to obtain some plasma information with a VUV diagnostic. A cold electron population is also found inside the plasma of the Caprice source, but with a much larger density (about one order of magnitude) than in Quadrumafios. The electron population seen by VUV spectroscopy is the cold one, and it is not responsible for the ionization of high charge state ions. This result can be utilized for the calculation of ion densities of each oxygen charge state.
Indeed, all excitation rate coefficients by electron impact have been calculated with a Maxwellian distribution function, which is reasonable according to the considered energy range. Regarding results relative to ion confinement time measurements, note that their values with Caprice are smaller than with Quadrumafios by at least one order of magnitude. This can be explained by considering both plasma chamber size and plasma length, as ion confinement time is proportional to the square of plasma length (Melin and Girard, 1997). Ion confinement time is also proportional to the ion density and extracted current ratio, but these two quantities are much larger for Caprice than Quadrumafios. As multiply charged ions are created step by step by electron impact, ions must stay a long enough time to be ionized before leaving the plasma by diffusion. As could be expected, ion confinement time increases with charge state for different plasmas with Quadrumafios. However, the ion confinement times determined with Caprice remain nearly constant with
charge state. In any case, owing to its optimized hexapolar radial confinement, Caprice, which is a high performance ECRIS, offers a better compromise than Quadrumafios between ion confinement time and ionization time. It could be supposed that highly charged ions are created inside the plasma without being extracted, and that a dip at the top of the plasma potential would confine ions and permit them to be stripped during a rather long period. However, a rather simple experiment, done by spectroscopy, shows that if no ions are extracted from the source, they are not inside the plasma. The VUV spectrometer is able to observe simultaneously, on the microchannel plates, different line intensities corresponding to several charge states, these lines being correlated with the corresponding current intensities. VUV measurements showed that if there is no extracted O⁴⁺ on the Faraday cup, no photon is detected by the spectrometer (Figure 35). This observation, made with these two different ECRISs, may not hold if other devices are installed in the ion source. We will see, in a later section, that a small polarized disk could have a strong influence on ion confinement.

D. Conclusion

This study shows that a good ECR ion source relies on a compromise between ion confinement and ionization times. To deliver a large ion current of high charge state, it is necessary to drastically increase the electron density independently of the source emittance. A precise determination of the EDF of ECR plasma is nevertheless necessary, and an absolute intensity calibration of the VUV spectrometer should permit ion densities to be calculated without using the plasma electroneutrality condition. Indeed, this study does not take into account ionized impurities and high charge state ion intensities because their ion densities could not be calculated with a simple corona model.
As VUV measurements are limited to a low charge state population, other optical diagnostics such as X-ray spectroscopy are necessary to obtain a more precise estimate of electron and ion densities and temperature; however, such a simple diagnostic confirms the role of a large plasma chamber and good confinement on source performances. The next section focuses on magnetic confinement and related scaling laws that need to be fulfilled by an efficient ECRIS.
IV. MAGNETIC CONFINEMENT

Confinement in an ECRIS is essential. Indeed, production of multiply charged ions can be achieved only if plasma parameters such as electron density (ne)
and temperature (Te) correspond to the formation of the desired ion. These two parameters enter into the ionization rate ($\langle\sigma v\rangle^{\mathrm{ion}}_{q-1\to q}(T_e)$, to go from charge q − 1 to charge q) and into the ionization time ($\tau_q^{\mathrm{ion}}$); the control of ne and Te is achieved by magnetic confinement. Electron magnetic confinement can enhance electron density, as one electron, reflected by the magnetic field, will complete many round trips inside the magnetic configuration, enhancing the overall electron density and the probability of ionizing collisions (one electron can ionize several times). In addition, along its path in the magnetic configuration, the electron will cross the resonance surface several times and thus increase its energy. The simplest magnetic structure providing this confinement is a mirror machine consisting of two sets of coils. This structure seems well suited to an ion source: on one side of the axis, it is possible to inject the gas to be ionized and the microwave power, while on the other side are the lost ions, which follow the electrons that have gone into the loss cone. However, the confinement of a simple mirror machine is not sufficient to produce multiply charged ions. One additional condition must be fulfilled for plasma confinement in a magnetic structure: magnetic pressure surfaces (iso-|B|) must also be plasma iso-pressure surfaces, the relevant pressure being the electron kinetic pressure ne kTe for an ECR plasma, where the energy is only within the electron population. When an iso-pressure surface touches the walls, plasma is no longer confined. The confining magnetic structure must therefore have closed iso-|B| surfaces, nested in each other, with |B| increasing from inside to outside. In other words, magnetic field lines must have a concavity facing outside. Such a structure, called minimum-|B|, is obtained by the use of a radial multipolar magnetic field.
Most present ECRISs keep this initial design of a mirror field created by two or more sets of coils combined with a radial multipolar field that gives magnetic flux tubes nested in each other. This pioneering design by R. Geller and co-workers is an outgrowth of open-ended mirror machines initially built for their confinement properties in fusion research.

A. Scaling Laws

To improve ECRIS performance, Geller (1996) defined semiempirical scaling laws for plasma density and ion confinement. These laws, deduced from experiments and from general plasma understanding, still serve as basic rules for ECRIS designers. Furthermore, an ECRIS requires a sophisticated magnetic configuration to efficiently control the ionization process. A lot of work has been done by several laboratories to get a good understanding of magnetic confinement,
as presented earlier (page 7). Already at the beginning of ECRIS history, people wanted to build ion sources with large magnetic fields. For example, a cryogenic version of Supermafios was proposed by Claudet (1977). In this report, the use of superconducting wires doubled the hexapolar radial field as compared with water-cooled copper coils. However, one had to wait until the year 2000 to have a precise idea of a suitable magnetic configuration, thanks to various studies performed in Grenoble, Catania, East Lansing, RIKEN, etc. After this brief history of magnetic confinement, we will go into more detail and see what is required for an efficient ECRIS.

1. Axial Magnetic Confinement: Injection Side

The purpose of the axial mirror magnetic configuration is to efficiently confine the electrons. However, the purpose of an ion source is to deliver ions, so a compromise has to be made between confinement and ion production. Fortunately, ions are extracted on one side of the ion source, which means that confinement can be optimum on the other side. This fact leads to the so-called high-B mode. The necessity of a high magnetic field at the injection side, already shown by Gammino et al. (1996), has been confirmed with the fully superconducting ion source SERSE (Hitz et al., 2000a). Figure 36 shows the evolution of one typical charge state produced by an ECRIS (Xe20+) with the magnetic field at injection. The axial mirror ratio at this side of the source is defined as

$$R_{inj} = B_{inj} / B_{ecr},$$

where Binj is the magnetic field value at the mirror throat at the injection side of the source and Becr is the resonance magnetic field value (Becr = 0.50 T at 14 GHz, 0.64 T at 18 GHz, and 1 T at 28 GHz). Experiments performed at 14 GHz indicate that Rinj ≈ 4 is close to an optimum value. For other frequencies, data stop at a lower mirror ratio because of technological limitations (the maximum magnetic field at this side of the source is 2.7 T).
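The resonance fields quoted above follow from the cyclotron condition Becr = ωm/e. The short Python sketch below reproduces them and evaluates the injection mirror ratio; it assumes nonrelativistic electrons, and the function names and the 2.0 T example field are illustrative choices, not values from the experiments.

```python
import math

ELECTRON_MASS = 9.1093837e-31       # kg
ELEMENTARY_CHARGE = 1.60217663e-19  # C


def resonance_field(freq_ghz):
    """Return B_ecr in tesla for a microwave frequency given in GHz."""
    omega = 2.0 * math.pi * freq_ghz * 1e9  # angular frequency in rad/s
    return omega * ELECTRON_MASS / ELEMENTARY_CHARGE


def injection_mirror_ratio(b_inj_tesla, freq_ghz):
    """Return R_inj = B_inj / B_ecr for a given injection field and frequency."""
    return b_inj_tesla / resonance_field(freq_ghz)


if __name__ == "__main__":
    for freq in (14.0, 18.0, 28.0):
        print(f"{freq:4.1f} GHz: B_ecr = {resonance_field(freq):.2f} T")
    # Hypothetical example: an injection field of 2.0 T at 14 GHz.
    print(f"R_inj = {injection_mirror_ratio(2.0, 14.0):.1f}")
```

With these definitions, an injection field of about 2 T at 14 GHz corresponds to the mirror ratio of roughly 4 discussed in the text.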
Figure 36 shows another scaling law already described by Geller (1996): beam intensity is proportional to the square of the frequency. In this figure, for the maximum mirror ratio possible at 28 GHz (i.e., 2.6), beam intensities obtained at 18 and 28 GHz are in the ratio (28/18)^2. But Figure 36 also shows that this is not valid when comparing 14 GHz with other frequencies. As presented in a previous section (“fundamental aspects of ECRIS”), extracted current is proportional to the hot plasma volume, considered to be defined by the resonance surface. This means that extracted current is proportional to the resonance length. In addition, this current is also proportional to the square of the frequency as follows:

$$I_q \approx \frac{n_q \, q \, e \, V}{\tau_q}$$

and

$$I_q \propto \omega_{rf}^2 .$$
Thus, intensities plotted in Figure 36 have to be correlated with resonance length to check the frequency scaling law. Let us consider data obtained by SERSE ECRIS at three different frequencies, presented by Gammino et al. (2001) at mirror ratio = 2.6. Resonance lengths are of course different for all frequencies utilized. Table 3 lists the Xe20+ beam intensity at each frequency (I20+ ). This intensity is first divided by the corresponding resonance length (RL ) and the new value is divided by the square of the frequency. Data obtained and presented in the last column of Table 3 are not very far from each other. The difference probably arises from microwave coupling into the ion source. Table 3 reveals that 28 GHz seems to have the
FIGURE 36. Evolution of beam intensity with mirror ratio at the injection side (Gammino et al., 2001).
TABLE 3
Xe20+ Beam Intensity Obtained with Serse ECRIS (Gammino et al., 2001) and Scaling Laws

Frequency, f   Resonance length, RL (mm)   I20+ (µA, taken from Figure 36)   I20+/RL   (Itot/RL)/f^2
14.5 GHz       105                         35                                0.33      1.60 × 10^-3
18 GHz         135                         90                                0.67      2.06 × 10^-3
28 GHz         220                         215                               0.98      1.25 × 10^-3
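The normalization used in Table 3 can be retraced with a few lines of Python. This is only a sketch of the arithmetic: the input numbers are copied from the table, the function names are illustrative, and the result depends on rounding, so the last digits may differ slightly from the tabulated values.

```python
# (frequency in GHz, resonance length in mm, Xe20+ intensity in µA), from Table 3.
TABLE_3 = [
    (14.5, 105.0, 35.0),
    (18.0, 135.0, 90.0),
    (28.0, 220.0, 215.0),
]


def normalized_intensity(freq_ghz, res_length_mm, current_ua):
    """Return (I/RL, (I/RL)/f^2) for one row of the table."""
    per_length = current_ua / res_length_mm
    return per_length, per_length / freq_ghz ** 2


def measured_vs_f2(row_low, row_high):
    """Compare the raw intensity ratio of two rows with the squared frequency ratio."""
    (f_low, _, i_low), (f_high, _, i_high) = row_low, row_high
    return i_high / i_low, (f_high / f_low) ** 2


if __name__ == "__main__":
    for freq, length, current in TABLE_3:
        per_length, per_length_f2 = normalized_intensity(freq, length, current)
        print(f"{freq:4.1f} GHz: I/RL = {per_length:.2f}, (I/RL)/f^2 = {per_length_f2:.2e}")
    raw, f2 = measured_vs_f2(TABLE_3[1], TABLE_3[2])
    print(f"18 -> 28 GHz: intensity ratio {raw:.2f}, f^2 ratio {f2:.2f}")
    raw, f2 = measured_vs_f2(TABLE_3[0], TABLE_3[1])
    print(f"14.5 -> 18 GHz: intensity ratio {raw:.2f}, f^2 ratio {f2:.2f}")
```

The raw comparison shows why the resonance length has to be taken into account: the 18 and 28 GHz intensities follow the squared frequency ratio closely, while the 14.5 and 18 GHz pair does not.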
FIGURE 37. Serse injection flange. The diameter corresponds to the plasma chamber inner size.
poorest efficiency. In this case, microwave coupling is done in a TE01 mode with a circular waveguide, while 14.5 and 18 GHz are launched in a TE10 mode through rectangular waveguides. Consider the intensity obtained as a function of the rf power launched into the source: to obtain 90 µA, 1800 W was necessary at 18 GHz, while 4 kW was sent into the source at 28 GHz to obtain 215 µA of the same species (Xe20+). Both values correspond to about 50 nA/W. Such a calculation does not fit with the frequency scaling and indicates that the use of the TE01 mode may not be as efficient as the TE10 one. Figure 37 shows the injection flange of the 18–28 GHz Serse ECRIS. As usual, both waveguides are situated outside the “star impact,” which corresponds to the magnetic field lines followed by the electrons. Another option for microwave coupling at 28 GHz would be to use the TE11 or HE11 mode as shown in Figure 38; however, only a few ECRISs now use this frequency and all of them have a TE01 mode. One reason is the complexity of conversion from the axisymmetric TE01 mode to the asymmetric TE11 mode, which may be achieved by a waveguide section with a sinusoidally perturbed axial curvature in one plane. Another alternative would be to choose the HE11 hybrid mode for coupling into the ion source cavity. In this case, the TE01 mode is converted into the HE11 mode via the intermediate TM11 mode by a smooth bend and a corrugated wall waveguide.
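The figure of about 50 nA/W can be checked directly from the currents and powers quoted above. The sketch below does only that arithmetic; the function name is illustrative, and the squared frequency ratio is printed purely for comparison.

```python
def efficiency_na_per_watt(current_ua, power_w):
    """Extracted current per unit of launched rf power, in nA/W."""
    return current_ua * 1e3 / power_w


# Values quoted in the text for Xe20+ from Serse.
eff_18 = efficiency_na_per_watt(90.0, 1800.0)   # 18 GHz, TE10 coupling
eff_28 = efficiency_na_per_watt(215.0, 4000.0)  # 28 GHz, TE01 coupling

if __name__ == "__main__":
    print(f"18 GHz: {eff_18:.1f} nA/W")
    print(f"28 GHz: {eff_28:.1f} nA/W")
    print(f"efficiency ratio 28/18 GHz: {eff_28 / eff_18:.2f}")
    print(f"squared frequency ratio:    {(28.0 / 18.0) ** 2:.2f}")
```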
FIGURE 38. 28 GHz microwave coupling. Left: usual coupling. Right: more efficient possibility.
Nevertheless, the necessity of having a large mirror ratio Rinj at the injection side of the source is clearly indicated. From Rinj = 2.6 to Rinj = 3.9, the gain in intensity can go from 50% to more than 100% as shown in Figure 39.
FIGURE 39. Importance of a strong axial mirror ratio.

FIGURE 40. Evolution of Ar11+ intensity produced by Caprice ECRIS and comparison with magnetic scaling laws obtained with xenon beams. Year corresponds to the fabrication date of the ion source.
These data showing the benefit given by a high injection mirror ratio can be compared with previous data obtained with other ECRISs. A good example is the Caprice ECRIS. This ion source, designed by B. Jacquot et al. (1988) during the 1980s, was improved several times as a result of better magnetic confinement (Hitz, 1995). Figure 40 presents the evolution of Ar11+ produced by different models of the 10 GHz Caprice ECRIS whose magnetic field profile was improved. Minimafios 10 GHz, designed in the early 1980s, is also represented as a starting point of ECRIS development.

2. Axial Magnetic Confinement: Extraction Side

As ions are extracted at this side of the source, the magnetic field at the mirror throat has to be large enough for good confinement yet low enough to facilitate plasma losses. It has been found that the best compromise is to have a magnetic field value close to the radial magnetic field value, as shown in Figure 41 (Gammino et al., 2001). In this figure, the extraction magnetic field is also referred to the radial field because, for optimum operation of the ion source, the plasma electrode has to be put on the last closed magnetic surface (Hitz et al., 2002b).
FIGURE 41. Mirror ratio at the extraction side.
3. Axial Magnetic Confinement: Minimum-B

Until recently, the minimum-B value was set to get a reasonable resonance length, but there was no clear evidence of its role in ECRIS performance. Thanks to the flexibility of fully superconducting ECRISs, it has been possible to demonstrate the importance of this parameter (Hitz et al., 2002b, 2002d). Since then, other experiments have confirmed the following scaling rule (Arai et al., 2002; Nakagawa et al., 2004; Leitner et al., 2005). Actually, the role of the minimum magnetic field of the mirror is multiple:

1. It defines the resonance length.
2. It defines the magnetic field gradient at resonance. This controls the time spent by the electrons in the region where they get kicks of energy.
3. It defines the last closed magnetic surface, Blast, given by

$$B_{last}^2 = B_{min}^2 + B_{rad}^2,$$

where Brad is the radial magnetic field value at the chamber wall. Blast also defines the last closed magnetic isobar that confines the warm electrons in the plasma (Girard et al., 1995); the value of this surface must be as large as possible and still compatible with the rules defined by Bmin and Brad.

For example, Figure 42 presents the sensitivity of the performance of Serse to the minimum-B value for a typical ion, Xe20+. During this experiment, rf power
FIGURE 42. Serse 28 GHz: evolution of Xe20+ intensity with different minimum-B values.
was pulsed to study at the same time the so-called “afterglow mode.” At the end of the rf pulse, electrons are no longer confined and leave the trap. So do the ions, to preserve plasma electroneutrality. This leads to a beam intensity enhancement at the end of the rf pulse. In Figure 42, the rf is stopped at 100 ms, 0 being at the pulse start. Other source parameters (rf power, gas pressure, other magnetic fields) were kept constant. A plot representing beam intensity versus Bmin/Brad is shown in Figure 43. The ratio Bmin/Brad is chosen as it is related to the last closed magnetic surface, as presented previously. Usually, the radial magnetic field value, Brad, is defined either according to scaling laws as presented in the next section, or by technology if the radial field is made of permanent magnets. Once the optimum radial magnetic field is fixed, Figure 43 shows that only a limited range of values is allowed for Bmin. In this figure, two sets of experiments
FIGURE 43. Evolution of beam intensities at 18 and 28 GHz for two xenon charge states delivered by two fully superconducting ECRIS, Serse (Gammino et al., 2001), and Venus (Leitner et al., 2006; Leitner, 2006).
have been performed, first with Serse (Gammino et al., 2001) and recently with Venus (Leitner et al., 2006), both sources being tuned at 18 and 28 GHz. The range where minimum-B is optimal for the chosen radial field, shown in Figure 43, is also confirmed with the Venus ECRIS, where a systematic study was recently done (see Figure 44) (Leitner et al., 2006).

4. Radial Magnetic Confinement

As presented above, a radial magnetic confinement is necessary to produce high charge states. To reduce source cost and size, this confinement is usually achieved with permanent magnets. However, the drawback of such a system is that the magnetic field is fixed. During the 1980s, the Caprice ECRIS was designed and equipped with an exchangeable hexapole. Whatever the frequency, 10 or 14.5 GHz, Caprice showed that a larger radial magnetic field gave larger beam intensities (Hitz, 1995). Later, a fully superconducting ECRIS was built at Michigan State University, making it possible to clearly study the importance of the radial field (Gammino et al., 1996). Experiments at 6.4 GHz showed that a larger radial field gave better performance. More recently, in 2000, the Serse ECRIS was used for a set of experiments at several frequencies (14, 18, and 28 GHz). Figure 45 also shows a huge
FIGURE 44. Venus 18 GHz: evolution of Xe26+ vs. Bmin (Leitner et al., 2006).

FIGURE 45. Evolution of Xe27+ with the radial magnetic field at three different frequencies.
intensity improvement with the larger radial field. One curve indicates saturation and then a decrease in intensity with larger radial field. The main reason could be that the plasma diameter becomes too small, leading to a smaller ion
FIGURE 46. Same as Figure 45 with a radial mirror ratio.
lifetime. Such a negative effect of a too strong radial magnetic field has also been shown with another ECRIS (Drentje et al., 1999). As the radial field is related to plasma confinement, it must also be expressed as a magnetic mirror ratio, and Figure 46 indicates that an optimum value for the radial field is about twice the resonance magnetic field. Due to limitations in technology, experiments performed with Serse did not allow such a high radial field at 28 GHz (which would necessitate 2 T at the chamber wall). Nevertheless, such a scaling law has recently been confirmed by the Venus ECRIS (Leitner and Lyneis, 2005; Leitner et al., 2005).

5. Magnetic Scaling Laws

Updated magnetic scaling laws, summarized in Hitz et al. (2002d), are presented in Table 4. Other measurements carried out by other laboratories confirm the new scaling laws (Arai et al., 2002; Nakagawa et al., 2004; Leitner et al., 2005). For example, Table 4 lists the magnetic field profiles of four different ECRISs: the first and last versions of Caprice, the liquid-He-free source Ramses, and the new fully superconducting source Venus. In this table, Binj is the axial magnetic field at the injection throat; Bext is the axial magnetic field at the extraction throat; Becr is the magnetic field corresponding to the resonance (Becr = 0.5 T at 14 GHz, 0.64 T at 18 GHz, and 1 T at 28 GHz); Bmin is the minimum axial magnetic field; and Brad is the radial magnetic field value at the plasma chamber wall.
TABLE 4
Typical Magnetic Field Scaling Laws for Performing ECR Ion Sources and Comparison with the Magnetic Profile of Four Different ECRIS: the First 10 GHz Caprice (Hitz, 1995), the Last 14 GHz Caprice (Hitz et al., 1996), Venus (Leitner et al., 2005), and Ramses (Arai et al., 2002)

Ratio        Magnetic scaling laws   Caprice 10 GHz (1984)   Caprice 14 GHz (1995)   Ramses 18 GHz (2002)   Venus 28 GHz (2004)
Binj/Becr    ∼4                      1.4                     2.7                     4.7                    3.5
Bext/Brad    ∼0.9                    1.25                    1                       1.6                    1
Bmin/Brad    ∼0.30 to 0.45           0.75                    0.32                    0.39                   0.38
Brad/Becr    ∼2                      1.1                     2.1                     1.88                   2.1
Blast/Becr   2                       1.4                     2.2                     2.02                   2.15
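The last row of Table 4 can be cross-checked against the two rows above it, using the relation between Blast, Bmin, and Brad given earlier (Blast squared equals the sum of the squares of Bmin and Brad). The Python sketch below does this; the dictionary layout and function name are illustrative, the numbers are copied from the table, and small differences from the tabulated Blast/Becr values are to be expected from rounding.

```python
import math

# Per source: (B_min/B_rad, B_rad/B_ecr, tabulated B_last/B_ecr), from Table 4.
TABLE_4 = {
    "Caprice 10 GHz (1984)": (0.75, 1.1, 1.4),
    "Caprice 14 GHz (1995)": (0.32, 2.1, 2.2),
    "Ramses 18 GHz (2002)": (0.39, 1.88, 2.02),
    "Venus 28 GHz (2004)": (0.38, 2.1, 2.15),
}


def last_closed_surface_ratio(bmin_over_brad, brad_over_becr):
    """Return B_last/B_ecr from B_last^2 = B_min^2 + B_rad^2."""
    bmin_over_becr = bmin_over_brad * brad_over_becr
    return math.hypot(bmin_over_becr, brad_over_becr)


if __name__ == "__main__":
    for name, (bmin_brad, brad_becr, tabulated) in TABLE_4.items():
        computed = last_closed_surface_ratio(bmin_brad, brad_becr)
        print(f"{name}: computed {computed:.2f}, tabulated {tabulated:.2f}")
```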
B. Examples of Magnetic Systems

Apart from physics parameters, cost of fabrication is also an important factor. The cheapest ECRISs are those built totally with permanent magnets and operated at 2.45 GHz, and the most expensive ECRISs are fully superconducting. In between, most ECRISs use copper coils to produce the axial magnetic field and permanent magnets for the radial field, which is mostly hexapolar. This section shows some typical examples of magnetic systems.

1. Axial Magnetic Field

a. Permanent Magnets

For small but cheap ECRISs, the axial magnetic field can be produced by permanent magnets. Rare earth permanent magnets such as NdFeB can now give strong magnetic fields while keeping the source size small. Here also, a simple calculation can give a precise idea of the mirror. The use of an iron plug is also helpful to get a high-B mode profile, even if it is somewhat tricky to insert ferromagnetic material inside a magnetic system because of the huge forces encountered. Figure 47 presents the magnetic mirror of an all permanent magnet ECRIS. More details on this source can be found in Hitz et al. (2005a, 2005b) and Meyer et al. (2006). The axial mirror of this 12–14 GHz source is given by three magnet cylinders and one iron plug. Injection and extraction cylinders are made of 24 magnets whose magnetic orientation faces the source axis, while the central cylinder has a magnetic orientation parallel to the axis. Its purpose is to give a suitable value for Bmin. In addition, this central cylinder is movable to fine tune Bmin. Figure 48 shows the calculation of this magnetic structure, which fits well with the measured values presented in Figure 47. This new ECRIS called SOPHIE (SOurce de PHotons et d’Ions par résonance cyclotron Electronique,
FIGURE 47. Magnetic configuration of an all permanent magnet ECRIS in high-B mode configuration.

FIGURE 48. 12–14 GHz all permanent magnet ECRIS. Calculation of the magnetic system.
or photon and ion source by electron cyclotron resonance) also fits with the scaling laws as shown in Table 5. Such an ECRIS can easily be installed
TABLE 5
Magnetic Scaling Laws and Data Obtained with an All Permanent Magnet ECRIS

Ratio       Magnetic scaling laws   SOPHIE (from scaling laws)                              SOPHIE (obtained)
Binj/Becr   ∼4                      12.75 GHz → Binj ∼ 1.8 T; 14.5 GHz → Binj ∼ 2.1 T       Binj = 1.8 T
Bext/Brad   ∼0.9                    12.75 GHz → Bext ∼ 1.0 T; 14.5 GHz → Bext ∼ 1.14 T      Bext = 1.1 T
Bmin/Brad   ∼0.30 to 0.45           0.33 < Bmin < 0.50 T                                    Bmin = 0.43 T
on a high-voltage platform as the electrical consumption is limited to the rf generator and plasma chamber cooling unit.

b. Room Temperature Coils

For larger ECRISs, the most common way to achieve a mirror field is to use two sets of room temperature coils surrounded by an iron yoke. To control Bmin, for example, to tune the electron temperature through the magnetic gradient at resonance, a third coil is added between the mirror coils. Simple calculation codes such as Poisson/Superfish by the Los Alamos Accelerator Group (1987) give precise results for the design of such an ECRIS. Figure 49 presents magnetic field lines of the Grenoble Test Source (GTS-LHC) designed for the CERN Large Hadron Collider according to the former GTS ECRIS presented by Hitz et al. (2003, 2004a). The axial magnetic field of this 14 + 18 GHz ECRIS is achieved by three sets
FIGURE 49. Poisson Superfish magnetic calculation of a 14 GHz ECRIS. Magnetic field lines are represented and show the good efficiency of the iron yoke.
of coils surrounded by a thick iron yoke to achieve the magnetic field that is necessary to run the ion source either at 14 or 18 GHz or at both frequencies at the same time. In addition, the magnetic field at the injection side is reinforced by a thick iron plug that gives a high-B mode magnetic profile. Whatever the frequency, 14.5 or 18 GHz, the calculated magnetic field profile is in agreement with the semiempirical scaling laws, as shown in Table 6. Figure 50 shows the real magnetic profile measured without an iron plug, that is, in a classical mirror field, and with an iron plug placed at the injection side in a high-B mode configuration. Plotted data are for coil currents of 1200 A (injection), 300 A (central), and 1200 A (extraction), while calculated values are for maximum coil currents (1300 A, 300 A, and 1300 A). The two sets of data are in very good agreement.

TABLE 6
Axial Magnetic Field of the GTS-LHC ECRIS

Ratio       Magnetic scaling laws   GTS-LHC 14.5 GHz   GTS-LHC 18 GHz
Binj/Becr   ∼4                      5.2                4.2
Bext/Brad   ∼0.9                    1 (both frequencies)
Bmin/Brad   ∼0.30 to 0.45           0.33 to 0.41 (both frequencies)
FIGURE 50. GTS-LHC: axial magnetic field measured for coil current set at 1200 A (injection), 300 A (central) and 1200 A (extraction).

FIGURE 51. Magnetic system of GTS-LHC ECRIS: (a) calculation phase, (b) design according to calculation, and (c) final design.
Once the calculation is done, the design phase starts as shown in Figure 51. This figure also shows the final drawing, which is slightly different from the design phase because of technological constraints.

c. Superconducting Coils

When an ECRIS is going to be installed on a high-voltage platform and/or when the required beam intensities are high, the use of superconducting technology becomes necessary to reach high magnetic fields. This nevertheless leads to a more complicated technology, as electrical wires must work below their critical temperature. Superconducting wires can be classified into two classes: low-temperature superconducting (LTS) and high-temperature superconducting (HTS). Table 7 presents the main characteristics of superconducting wires. Most superconducting ECRISs now use NbTi wires as they fulfill the required magnetic fields, but they must work at 4 K. For example, typical current densities for the Serse magnetic system are 246 A/mm2 (hexapole) and ≈180 A/mm2 for mirror coils. Another
TABLE 7
Main Characteristics of Superconducting Materials

Class   Material                                      Critical temperature, Tc   Irreversible field (T)   Current density, Jc (A/mm2)
LTS     NbTi                                          10 K                       12 T (4.2 K)             4 × 10^3 (5 T, 4.2 K)
LTS     Nb3Sn                                         18 K                       27 T (4.2 K)             10^4 (5 T, 4.2 K)
LTS     MgB2                                          39 K                       15 T (4.2 K)             10^4 (5 T, 4.2 K)
HTS     YBa2Cu3O7−x                                   95 K                       >100 T (20 K)            10^6 (77 K)
HTS     BiSCCO: Bi2Sr2CaCu2O8−x (called Bi2212)       96 K                       >100 T (20 K)            10^3 (20 T, 20 K)
HTS     BiSCCO: Bi2Sr2Ca2Cu3O10−x (called Bi2223)     111 K                      ≈100 T (20 K)            500 (77 K)
ECRIS uses HTS BiSCCO-type wire at 20 K with a current density of 90 A/mm2 (Kanjilal et al., 2005). However, BiSCCO wire is made of sintered powder with crystalline defects and grain boundaries that are obstacles to high currents. The choice of the superconducting wire depends not only on the magnetic field to be reached, but also on the environment. If the ECRIS is situated not far from a cryogenic station, the easier choice is to use NbTi wires in a liquid helium (LHe) bath at 4 K. If LHe consumption is not too high, autonomous cryocoolers can be used instead. The cooling fluid has two main purposes: it absorbs thermal losses and it also absorbs perturbing energy to ensure wire stability. The choice of the working temperature then defines the cooling fluid and its thermodynamic state:

1. boiling He at atmospheric pressure for operation at 4.2 K,
2. supercritical He (P > 0.22 MPa) for forced cooling circulation at 4.2 K, or at T > 20 K (P = 5 MPa) for HTS wires,
3. superfluid He (T < Tλ = 2.17 K, P = 0.1 MPa) for cooling at 1.8 K,
4. boiling nitrogen at atmospheric pressure (T = 77 K, P = 0.1 MPa) for HTS wires.

Superconducting devices can also be LHe free at 4 K, as in the RIKEN ECRIS (Kurita et al., 2000). Figure 52 shows a magnetic mirror made of NbTi coils. An iron plug is added at each side of the source to reach the required magnetic field on the
FIGURE 52. Axial magnetic system with superconducting coils and iron plugs.
source axis, while keeping the coil current density as low as possible. Two central coils will tune Bmin and magnetic gradient at resonance as well. The resulting axial magnetic field is presented in Figure 53 and more details about this ECRIS are presented in the next section.
FIGURE 53. Typical axial magnetic field that could be obtained with superconducting wires.
Inserting several coils between injection and extraction coils also leads to a longer plasma, which is favorable for the production of highly charged ions. Figure 54 presents a future ECRIS under design for the RIKEN
FIGURE 54. RIKEN ECRIS project (Nakagawa et al., 2006).
TABLE 8
Various Operating Modes for SuSI-ECRIS Depending on the Number of Axial Coils That Are Energized

Mode      Plasma electrode position   Distance between mirror throats (mm)   Resonance length at 14.5 GHz (mm)   Resonance length at 18 GHz (mm)
Maximum   Before mirror throat        460                                    120                                 154
Middle    At mirror throat            408                                    98                                  126
Minimum   After mirror throat         340                                    78                                  102
radioactive beam factory project (Nakagawa et al., 2006). This new ion source has four central coils giving a plasma volume three to four times larger compared to the same source equipped with only one central coil (dashed line in Figure 54). Inserting several middle coils is also advantageous as it can give a flexible magnetic field, where it is possible not only to change the resonance zone volume, but also to tune the distance between the resonance zone and the plasma electrode, as proposed by Zavodsky et al. (2005, 2006). SuSI, the MSU ECRIS, can run in three different modes as shown in Table 8. This is a way to tune the electrode position without opening the ion source; however, this configuration makes two main changes at the same time, and it would be difficult to determine which parameter (plasma electrode position or resonance length) leads to a better ion beam. Moreover, as these three operating modes also change the magnetic gradient at resonance, a larger resonance length may not lead to the larger beam intensities that would be expected if no other parameters were changed. Furthermore, Suominen et al. (2004b) also showed that the plasma electrode position influences beam emittance. However, once rough tuning of the electrode position is done during source commissioning, the possibility of changing the position externally during source operation would certainly be greatly appreciated in order to match the ion beam to the accelerator beam line.

2. Radial Magnetic Field

Most ECRISs have a radial magnetic field given by a hexapolar system. A quadrupole would give a kind of rectangular ion beam, suitable for implantation but difficult to transport. Configurations with 8 or 12 poles are also possible, keeping in mind that a large number of poles gives a more cylindrical ion beam and a larger plasma diameter, but provides a weaker magnetic field as compared with a hexapole.

a. Permanent Magnets

One of the simplest systems is to use six bars made of permanent magnet, giving three north poles and three south poles
FIGURE 55. Caprice hexapole.
facing the source axis. Figure 55 is an image of the hexapole built for the Caprice ECRIS (Jacquot et al., 1988). It is composed of six elementary bars having the easy axis oriented toward the source axis. Six trapezoidal magnets are situated in between to reinforce the magnetic field value at each north or south pole. Such a system can give a radial field around 1 T at the plasma chamber wall (diameter 66 mm). The optimum configuration for such a system is given by the so-called Halbach array, which concentrates magnetic flux on one side of the array and cancels it on the other (Halbach, 1980). A representation is given in Figure 56, where 36 poles are utilized to create a strong magnetic field. In this figure, two possibilities are shown. In the first version, all magnets are identical, while the second version, which is more difficult to realize, concentrates more magnetic flux where necessary, that is, at the pole tips. Figure 57 presents the difference between both configurations. The 3% difference is negligible compared to the technical difficulty of machining such a nonsymmetric system. Consequently, a symmetric system has been chosen for the CERN-LHC ECRIS (Figure 58). Considering this Halbach array, the radial magnetic field obtained at the plasma chamber wall depends on the following:
FIGURE 56. GTS hexapoles: left: 36 identical magnets; right: larger magnetic flux concentration on six main poles (labeled 1, 7, 13, 19, 25, 31).

FIGURE 57. Radial magnetic field calculated for GTS hexapole for both possibilities presented in Figure 56.

FIGURE 58. GTS-LHC hexapole composed of 36 magnets.

FIGURE 59. Halbach hexapole: magnetic field that could be obtained with different numbers of magnets. The inner and outer diameters of the hexapole remain constant.
1. The number of magnets utilized for this configuration (6, 12, 24, 36, 48, etc.). Figure 58 presents an array made of 36 magnets. Actually, the more magnets, the larger the magnetic field, as shown in Figure 59. Nevertheless, above 36 poles, the gain in magnetic field is quite poor compared to the hexapole cost of fabrication.
2. The hexapole inner and outer diameters. If the outer diameter of the magnetic array remains constant, the magnetic field intensity becomes
FIGURE 60. Radial magnetic field given by a hexapolar system as a function of its inner radius with a fixed outer diameter (Sun et al., 2004).
maximum and then decreases when the inner diameter increases, as shown in Figure 60 (Katayose et al., 1995; Sun et al., 2004). From this closed configuration, improvements are still possible. For example, a so-called offset structure that has the same magnet configuration but a different easy axis for each magnet gives a stronger magnetic field (Suominen et al., 2004a). Moreover, adding small pieces made of high permeability material close to the six poles can strongly enhance the magnetic field in this region and therefore improve the extracted ion beam intensity (Koivisto et al., 2004; Suominen et al., 2006). However, the main parameter that has to be taken into account is magnet demagnetization. Careful calculations have to be performed to verify that no part of the multipole undergoes too strong a demagnetization due to other magnets or to the axial magnetic field. Figure 61 is an example of a rather complicated hexapole that will be inserted in the strong axial field of Figure 53. This hexapole is divided into six elementary hexapoles, themselves fabricated with 36 magnets having different coercivity and remanence. Additional explanations will be given in the next section. Figure 62 presents
FIGURE 61. Halbach type hexapole with different magnet grades (Hitz et al., 2005b; Sun, 2004).

FIGURE 62. Permanent magnet hexapole inserted inside axial magnetic field coils.
the overall magnetic system made of superconducting coils and permanent magnets for a long ECRIS to produce very high charge states.

Remarks. This presentation is done for a hexapolar system as this is the most common configuration utilized by today's ECRISs. However, another multipolarity can be chosen, such as an octupole (Bol et al., 1985) or a dodecapole (Alton and Smithe, 1994). A larger number of poles leads to a larger plasma volume and to more cylindrical beams, but at the expense of magnetic field strength.

b. Superconducting Coils

When the ion source is required to deliver intense beams of highly charged ions, it has been demonstrated that large plasmas can fulfill these requirements (Hitz et al., 2000b). But a large plasma can only be confined inside a large plasma chamber. In addition, beam intensity of highly charged ions increases with the square of the frequency. As the resonance field is related to the frequency, high radial magnetic fields are necessary. For example, 28 GHz involves 2 T as the radial field value at the chamber wall, and such a homogeneous field can be achieved only with superconducting wires. It is also possible to reach this magnetic field with a system made of permanent magnets and soft iron as presented in Suominen et al. (2004a), but only locally at the pole tip. Considering a hexapolar magnet made of six superconducting coils, each coil has a racetrack shape (Figure 63) whose length is greater than the length of the axial coils, to avoid excessive forces between the end parts of the racetrack coils and the axial coils. More details about this system can be found in Taylor et al. (2000), Gammino et al. (2002), and references therein.
FIGURE 63. Usual magnetic field coils of an ECRIS.

FIGURE 64. Long superconducting ECRIS for very high charge state production.
Such a configuration can be extended indefinitely to obtain long plasmas to produce very high charge states. Figure 64 presents a system with four intermediate coils giving a long and flat minimum-B profile for the axial magnetic field. To avoid excessive forces due to interaction between coils, an original design places the racetrack hexapolar coils outside the axial coils (Zhao et al., 2006). Thanks to this position, more room is available to install something other than a hexapole to create the radial field. Figure 65 shows a multipole made of 12 racetrack coils placed outside a long axial system composed of six coils. This configuration would give a more cylindrical ion beam than a hexapole.
FIGURE 65. Superconducting multipole, composed of twelve racetrack coils, and placed outside mirror coils equipped with four intermediate coils.
ELECTRON CYCLOTRON RESONANCE ION SOURCES
However, the main drawback of this system is that a larger current is needed for the multipole as compared to the configuration in Figure 64.
C. Conclusion
Once the rf frequency is chosen, the updated scaling laws presented in this section give a rough idea of the ion source size. It is clear that neither an all permanent magnet 28 GHz ECRIS nor a fully superconducting 2.45 GHz device is a suitable choice. However, improvements in permanent magnet technology and in superconducting wires may result in the design of very powerful ECRISs running at very high frequencies. The next section presents different ECRISs recently built according to the updated magnetic scaling laws.
V. DESIGN OF VARIOUS ELECTRON CYCLOTRON RESONANCE ION SOURCES
This section does not survey the ECRISs that are in use around the world. Many references present these machines, such as Wolf (1995), Geller (1996), and Brown (2004). In addition, the proceedings of the International Conferences on Ion Sources and of the ECRIS workshops provide many details about this type of ion source. The purpose of this section is to show how to build an ECRIS while taking into account all parameters described in the previous sections. After a rapid presentation of the main ECRIS parameters, different types of ECRIS are shown.
A. Main Parameters of a Well-Performing ECRIS
A source of multiply charged ions, based on electron cyclotron resonance along the whistler mode, fulfills the following condition:
$$B_{\mathrm{ecr}} = \omega\,\frac{m}{e}. \tag{50}$$
On the other hand, to remove many electrons from an atom by electron collisions, the ionizing electrons must be kept inside a magnetic bottle. Several magnetic field scaling laws, as presented before, indicate one typical magnetic profile of an efficient ECRIS to obtain high confinement at the injection side and high losses at the extraction side. An axial magnetic field profile in high-B mode superimposed on a radial magnetic field with a mirror ratio of 2 will certainly give a well-performing ECRIS.
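As a quick numerical illustration of Eq. (50), the following minimal sketch evaluates the resonance field for a few heating frequencies mentioned in this chapter, together with twice that value as a rough radial-field target. It assumes the non-relativistic electron mass and rounded physical constants; the function name and the list of frequencies are illustrative choices, not taken from the original text.

```python
import math

ELECTRON_MASS_KG = 9.1093837e-31      # electron rest mass (non-relativistic assumption)
ELEMENTARY_CHARGE_C = 1.60217663e-19  # elementary charge


def resonance_field_tesla(frequency_hz: float) -> float:
    """Return B_ecr = omega * m / e (Eq. 50) for a heating frequency in Hz."""
    omega = 2.0 * math.pi * frequency_hz  # angular frequency in rad/s
    return omega * ELECTRON_MASS_KG / ELEMENTARY_CHARGE_C


if __name__ == "__main__":
    # Frequencies chosen for illustration only.
    for f_ghz in (2.45, 14.0, 18.0, 28.0):
        b_ecr = resonance_field_tesla(f_ghz * 1e9)
        print(f"{f_ghz:5.2f} GHz: B_ecr = {b_ecr:.3f} T, 2 x B_ecr = {2.0 * b_ecr:.3f} T")
```

With these constants the resonance field comes out close to 0.5 T at 14 GHz and close to 1 T at 28 GHz, consistent with the values quoted in the text.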
In addition, electrons must be energetic, and the electron energy is related to the square of the heating frequency: the higher the frequency, the better the performance, provided the source has a confinement compatible with the frequency. A fourth condition is to keep the ions long enough for them to be stripped. This condition implies a large plasma chamber to obtain long free paths for the ions. Last, the extraction system must be appropriate to the ion source. This section will not provide an extensive discussion of extraction, as this important subject has been presented by Spädtke (2004) and Hollinger (2004). The following sections show how to integrate all these parameters into various ECRISs, starting with the simplest, that is, the all permanent magnet ECRIS, and finishing with large superconducting devices.
B. All Permanent Magnet ECRIS
Every ion source designer has, at least once, dreamed of an ion source having the lowest possible electrical consumption. Apart from the microwave generator and the extraction power supply, which still require power, this type of ion source now exists thanks to outstanding progress made by permanent magnet manufacturers. Today, rare earth permanent magnets made of sintered NdFeB powder can fulfill several conditions at the same time: they can withstand a reasonably high temperature (about 70 °C) while having high remanence and coercivity. For example, characteristics of two types of magnets are Br = 1.47 T with Hcj = 955 kA/m or Br = 1.08 T with Hcj = 2865 kA/m. Depending on the conditions of use of the magnet, the source designer has a wide range of magnet grades to choose from. To obtain a strong magnetic field, a high remanence (Br) will be chosen; if the magnet has to withstand strong demagnetizing fields created by other magnetic systems, a high coercivity (Hcj) will be chosen. For example, Figure 61 presents a hexapole made of several magnet grades chosen as a function of the demagnetizing field at each magnet point.
As usual for an ECRIS, the magnetic field configuration is achieved by the superimposition of an axial field and a radial field. The axial magnetic field is simply created by a cylinder, which is usually composed of 24 elementary magnets whose magnetization is oriented to the source axis. Figure 66 presents three different magnetic cylinders that are necessary to create an axial magnetic field with minimum-B. Injection and extraction cylinders give mirror throats while the purpose of the central cylinder is to slightly tune the minimum-B to fit with magnetic scaling laws. For such a configuration, the resulting axial magnetic field strongly depends on the inner cylinder diameter. A small diameter gives a high axial field, which
FIGURE 66. Three types of cylinder utilized for the axial magnetic field.
unfortunately contradicts the fact that a large plasma diameter is required to obtain long ion lifetimes. A compromise must then be made, with the usual plasma chamber diameter in the range of 50 mm, the largest today being achieved by LAPECR2 with 67 mm (Sun et al., 2006b). Figure 67 presents the axial magnetic field profile of SOPHIE, an ECRIS designed to be installed on a high-voltage platform (Hitz et al., 2005a). This magnetic field configuration is achieved by three cylinders as shown in Figure 66. The inner diameter of the injection and extraction cylinders is 56 mm, leading to a plasma chamber inner diameter of 50 mm. The central cylinder has a large inner diameter (136 mm) since it is placed around the hexapole. One characteristic of this magnet is that it can be moved between the injection and extraction cylinders, thereby changing the magnetic field values at the mirror throats and at minimum-B. Figure 67 shows two possible configurations depending on the position of the central cylinder. For good source optimization, one way to tune the magnetic field profile is to move the magnets relative to each other. This movable central cylinder makes it possible to tune the minimum-B to fit the scaling laws and gives the start of an asymmetric profile, as recommended by the scaling laws. In this figure, the straight line represents the resonant field for a 14 GHz ECRIS. Magnetic field scaling laws, as previously presented, indicate that, for a 14 GHz ion source, the optimum value at the injection mirror throat is about four times the magnetic field at resonance, which means about 2 T at 14 GHz. Consequently, axial magnetic field profiles such as those shown in Figure 67 are not really satisfactory, since only half of the optimum value is obtained and a high-B mode profile is not achieved. One solution is to enlarge the magnet size at injection. However, this method would considerably increase
FIGURE 67. Axial magnetic field profile of an all permanent magnet ion source (Hitz et al., 2005a; Meyer et al., 2006). Upper curve: the central cylinder is almost at the center to give a symmetric profile. Lower curve: the central cylinder is set close to the extraction cylinder to create an asymmetric profile.
the source cost and weight. Another solution is to add an iron plug as already installed in previous room temperature ECRISs such as AECR (Xie, 1998) and GTS (Hitz et al., 2002a). Even in a permanent magnet source, a small plug can be installed on the source axis where the hot plasma is, as already shown in Figure 47.
FIGURE 68. Permanent magnet ECRIS under high-B mode.
This plug considerably enhances the magnetic field to satisfy the high-B mode configuration. Figure 68 shows the resulting axial magnetic field, which allows very good confinement at the injection side (Binj ≈ 3.6Becr). For such an ion source, the radial magnetic field is given by a compact hexapole made of 24 elementary magnets, as shown in Figure 69. This classical structure is inserted between the injection, central, and extraction cylinders as presented in Figure 70. Figure 71 shows the calculated magnetic field, which is in good agreement with the measurement. At the plasma chamber wall, the magnetic field reaches 1.1 T, which is an optimum value for 14 GHz, according to scaling laws. The above figures show that the degrees of freedom to tune such an ECRIS are rather limited. For example, a change in position of the central cylinder slightly changes the minimum-B value; however, such a mechanical adjustment is difficult while the source is running. Therefore, most users of this type of ECRIS use a variable frequency transmitter such as a traveling wave tube (TWT). While changing the magnetic field profile is difficult, changing the resonance frequency is much easier.
FIGURE 69. Hexapole of SOPHIE made of 24 elementary magnets.
FIGURE 70. Whole magnetic system of a simple all permanent magnet ECRIS.
1. Microwave Coupling Once the magnetic field configuration and microwave frequency are chosen, the way microwaves are launched into the ECRIS is critical. Taking a look at Figure 72, for example, indicates that several resonance points must be crossed by microwaves before entering into the main plasma chamber.
FIGURE 71. Radial magnetic field profile for the all permanent magnet source Sophie.
FIGURE 72. Resonance points in an all permanent magnet ECRIS.
FIGURE 73. Microwave injection system with the rf window at the plasma chamber entrance. Upper figure: position of the waveguide and rf window inside the magnetic system. Lower figures: details of the injection system extremity.
Several solutions can then be investigated: 1. A microwave window set at the plasma chamber entrance: a waveguide (rectangular or circular), in TE10 or TE11 mode, is introduced up to the plasma chamber entrance. The rf window is placed at the end of the waveguide as shown in Figure 73. This system seems to be advantageous as all parasitic resonance points crossed by the waveguide are at atmospheric pressure and there is no rf power absorption. However, the main drawback of this system is to place an rf window, which is made of quartz or sapphire, close to the plasma. To avoid any bombardment by plasma particles, this rf window has to be protected by a large disk. In addition, this disk can be polarized to work as the well-known biased disk utilized by many ECRISs to improve source performances. But because of the room taken by the rf window, it is impossible to install an iron plug, which would be at the same place. A high-B mode is then unfeasible. 2. Coaxial rf coupling: even if the efficiency of such a coupling is rather weak (see Figure 10), its use has several advantages. As coaxial waveguides are small (inner diameter ≈6 mm and outer diameter ≈20 mm for a frequency range of 10–14 GHz), compact ECRISs can be built. As for Caprice ECRIS,
a quartz tube can be installed at the parasitic resonance zones to create a so-called “first stage plasma.” However, this method of microwave launching may lead to a total absorption of microwave power at parasitic resonances, and then beam intensities of multiply charged ions would be rather weak. 3. A rectangular waveguide and rf window at the source entrance: this is the most classical way to launch microwaves into an ECRIS. As it is far from the plasma, the rf window is well protected from any particle bombardment. But, as for a coaxial coupling, several parasitic resonances are under vacuum and may lead to power absorption inside the waveguide (Figure 74). One way to get rid of this difficulty is to keep a good vacuum inside the waveguide. As the power can increase the temperature of the waveguide, it is absolutely necessary to cool it down and avoid any outgassing inside the waveguide. In addition, one way to minimize any pressure rise inside the waveguide is to drill small holes in the waveguide sides.
FIGURE 74. All permanent magnet ECRIS with coaxial microwave coupling.
FIGURE 75. All permanent magnet Sophie: O6+ intensity vs. rf power.
If a rectangular waveguide is used for a TE10 coupling (as shown in Figure 12), the wavelength inside the waveguide λg is given by
$$\frac{1}{\lambda_0^{2}} = \frac{1}{\lambda_g^{2}} + \frac{1}{(2b)^{2}}, \tag{51}$$
where λ0 is the wavelength under vacuum at the considered frequency, which is at 14 GHz:
$$\lambda_0 = \frac{c}{f} = \frac{3 \times 10^{8}}{14 \times 10^{9}} = 21.43\ \text{mm} \tag{52}$$
and b is the size of the largest side of the waveguide. At 14 GHz, the dominant mode TE10 is transmitted by a WR62 waveguide, whose size is 15.8 × 7.9 mm. Then 2b = 31.6 mm and λg = 29.16 mm. Usually, holes are drilled in the smallest waveguide side and, to avoid any effect of one hole on the others, the distance between holes is λg/4. At 14 GHz, this distance is about 7.3 mm. For some technical reasons, it is sometimes necessary to pump down the waveguide through the largest waveguide side. As the electric field is maximum on this side, drilling holes may lead to rf power losses. One possibility is to cut narrow slits (about 2 × 30 mm) along this waveguide. This technique is used by the SOPHIE ECRIS, and Figure 75 shows good microwave coupling, since 100 µA of O6+ is produced with 100 W, while 800 W is necessary to deliver 600 µA (Hitz et al., 2005a).
2. Source Design
Once the microwave frequency and magnetic system are defined, the source designer has to introduce one of the most sensitive components of the source, that is, the plasma chamber. To protect the magnets and the plasma chamber itself from any temperature rise, this plasma chamber must be water-cooled. It is usually made of aluminum to benefit from the secondary electron emission of the oxidized plasma chamber wall. Several other techniques can be used to improve source performance, as extensively described by Drentje (2003). Figure 76 presents an entire source.
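The waveguide arithmetic of Eqs. (51) and (52) in the microwave coupling discussion can be checked with a short sketch such as the one below. It is only an illustration: the function names are hypothetical, and the speed of light is deliberately rounded to 3 × 10^8 m/s, as in Eq. (52), so that the numbers agree with those quoted in the text.

```python
import math

SPEED_OF_LIGHT_M_S = 3.0e8  # rounded value, as used in Eq. (52)


def free_space_wavelength_mm(frequency_hz: float) -> float:
    """Vacuum wavelength lambda_0 = c / f in millimetres (Eq. 52)."""
    return SPEED_OF_LIGHT_M_S / frequency_hz * 1e3


def guide_wavelength_mm(frequency_hz: float, broad_side_mm: float) -> float:
    """TE10 guide wavelength from 1/l0^2 = 1/lg^2 + 1/(2b)^2 (Eq. 51)."""
    lambda_0 = free_space_wavelength_mm(frequency_hz)
    cutoff = 2.0 * broad_side_mm  # TE10 cutoff wavelength, 2b
    if lambda_0 >= cutoff:
        raise ValueError("frequency is below the TE10 cutoff of this waveguide")
    inverse_square = 1.0 / lambda_0**2 - 1.0 / cutoff**2
    return 1.0 / math.sqrt(inverse_square)


if __name__ == "__main__":
    frequency = 14e9   # heating frequency in Hz
    broad_side = 15.8  # WR62 broad side b in mm, from the text
    lambda_g = guide_wavelength_mm(frequency, broad_side)
    print(f"lambda_0 = {free_space_wavelength_mm(frequency):.2f} mm")
    print(f"lambda_g = {lambda_g:.2f} mm")
    print(f"pumping-hole spacing lambda_g / 4 = {lambda_g / 4.0:.2f} mm")
```

Using the exact value of c instead of the rounded one changes the guide wavelength by a few hundredths of a millimetre, which is negligible for the hole spacing.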
FIGURE 76. All permanent magnet ECRIS.
In such an ECRIS, the plasma chamber diameter is so small that neutral pumping inside the plasma chamber is difficult to achieve. It can be done only through the injection side, but if an iron plug is installed, the pumping speed from this side of the source is reduced to almost zero, and the only pumping port is the plasma electrode. That is why two types of plasma electrode are possible, depending on the charge state to be obtained. To produce medium charge states such as Ar8+, rough pumping through the extraction hole is enough; however, extraction of high charge states is more critical, as charge exchange with neutral particles becomes important. In that case, very efficient pumping is necessary near the extraction area. One possible trick is to drill holes in the plasma electrode in regions without plasma impact, as shown in Figure 77 (Hitz et al., 2000b). The extraction system used in an all permanent magnet ECRIS can be similar to that described by Spädtke (2004). In a three electrode system as presented in Figure 78, there is no space charge compensation between plasma electrode and puller, and the ion beam can then easily become divergent. Adding a negatively biased electrode compensates for this drawback. However, the distance between both puller electrodes has to be as small as possible, as in this region, the ion beam is only partly space charge compensated. In addition, one major difficulty related to an all permanent magnet ECRIS is the fact that the magnetic field is always present, and the superposition of magnetic and electric fields easily leads to Paschen discharges. Therefore, a very good vacuum is required in the extraction region.
FIGURE 77. ECRIS plasma electrode. Left: for medium charge states. Right: pumping holes are added to minimize the charge exchange process and to deliver high charge states.
FIGURE 78. Three electrode extraction system.
Use of such an extraction system allows the ion source to deliver intense ion beams of medium charge states. Figure 79 shows the evolution of beam intensity as a function of the extraction voltage for the all permanent magnet source SOPHIE whose electrodes are rated at 25 kV. There are many permanent magnet ECRISs that are built for several purposes. One of them, specially dedicated to hadrontherapy, produces intense beams of C4+ and can run nonstop for several months while mastering carbon contamination (Kitagawa et al., 2002).
FIGURE 79. SOPHIE: beam intensity versus extraction voltage.
TABLE 9
BEAM INTENSITIES PRODUCED BY TWO KINDS OF ALL PERMANENT MAGNET ECRISS

| Source | Magnetic field (injection–minimum–extraction) | Plasma chamber size and microwave coupling | O6+ (µA) | Ar8+ (µA) | Ar12+ (µA) | Xe26+ (µA) |
|---|---|---|---|---|---|---|
| Supernanogan (Bieth et al., 2000, 2005) | 1.05 T–0.40 T–1.05 T | diameter 44 mm, coaxial | 150 | 300 | 24 | 5 |
| SOPHIE (Hitz et al., 2005a) | 1.8 T–0.43 T–0.84 T (high-B mode) | diameter 50 mm, rectangular | 630 | 500 | 35 | 24 |

The four rightmost columns give typical beam intensities.
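To relate the axial field values of Table 9 to the scaling laws discussed earlier, the following minimal sketch computes the mirror ratios and the injection field in units of the resonance field. It assumes a resonance field of about 0.5 T at 14 GHz; the class and constant names are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass

B_ECR_14GHZ_T = 0.5  # approximate resonance field at 14 GHz, in tesla (assumption)


@dataclass(frozen=True)
class AxialProfile:
    """Axial field values (tesla) at injection, minimum, and extraction."""

    name: str
    b_inj: float
    b_min: float
    b_ext: float

    def ratios(self) -> dict:
        """Return mirror ratios and the injection field in units of B_ecr."""
        return {
            "Binj/Bmin": self.b_inj / self.b_min,
            "Bext/Bmin": self.b_ext / self.b_min,
            "Binj/Becr": self.b_inj / B_ECR_14GHZ_T,
        }


# Field values copied from Table 9.
SUPERNANOGAN = AxialProfile("Supernanogan", 1.05, 0.40, 1.05)
SOPHIE = AxialProfile("SOPHIE", 1.8, 0.43, 0.84)

if __name__ == "__main__":
    for profile in (SUPERNANOGAN, SOPHIE):
        summary = ", ".join(f"{key} = {value:.2f}" for key, value in profile.ratios().items())
        print(f"{profile.name}: {summary}")
```

For SOPHIE the injection field corresponds to 3.6 times the assumed resonance field, which agrees with the value quoted for the high-B mode profile.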
Table 9 presents typical characteristics and beam intensities produced by two multipurpose all permanent magnet 14 GHz ECRISs, Supernanogan (Bieth et al., 2000) and Sophie (Meyer et al., 2006). TE10 coupling seems more efficient than coaxial TM0,m,n ones as larger intensities are obtained with rectangular coupling. Furthermore, as expected, the larger plasma chamber enhances the intensities of higher charge states. Usually designed for 2.45 GHz, all permanent magnet ECRISs can now follow magnetic scaling laws at higher frequencies up to 14 GHz. They are
FIGURE 80. Sophie: typical xenon beam intensities and comparison with Caprice.
a good alternative for high-voltage platforms as their electrical consumption is very low. Their performances are now comparable to those of room temperature ECRISs having a similar plasma chamber size. As an example, Figure 80 compares the intensities given by SOPHIE with those given by the well-known Caprice source (Hitz et al., 1995). Their small plasma chamber size allows the production of intense beams of medium charge states that are required by implanters. However, the production of higher charge states is somewhat limited as it requires a large plasma chamber to get long ion lifetimes. Thanks to its simplicity and reliability, this type of ion source is now successfully used by accelerators dedicated to cancer treatment (Muramatsu et al., 2006; Drentje et al., 2006).
C. Room Temperature ECRIS
When the required beam intensities and charge states are not compatible with the capabilities of all-permanent magnet ECRISs, room temperature ion sources are a good compromise between cost and performance. Room temperature means that the axial magnetic field is created by copper coils supplied by large
currents (up to 1300 A) or by a large number of pancakes. This requires large power supplies and efficient coil cooling, which is a disadvantage if the ion source has to be installed on a high-voltage platform. As for the all-permanent magnet ECRIS, we will not describe several room temperature ECRISs. What will be presented is a way to build an ECRIS at reasonable cost. Typically, the first things to be considered are the heating frequency and the magnetic fields. Physics says that higher frequencies give higher electron densities and hence larger beam intensities. In this frequency domain, microwave generators were designed for broadcasting; today, generators are available at 14, 14.5, and 18 GHz, more rarely at 10 and 16 GHz, with a maximum output power of 2 kW (it is also possible to find a 15 kW–18 GHz generator). Then, for a reasonable cost, the source designer would choose a plasma chamber size compatible with the available rf power.
1. Example of Source Design
To fit the magnetic scaling laws as closely as possible, three sets of coils are used to provide the axial mirror field. Coils can be surrounded by soft iron to obtain maximum efficiency. In a previous section, the magnetic design of a second version of GTS (Grenoble Test Source) (Hitz et al., 2002c) was presented (Figure 49). Injection and extraction coils are surrounded by a thick iron yoke to enhance the magnetic field given by each individual coil pancake. To get a high-B mode, an iron plug is installed on the source axis inside the plasma chamber. A third, central coil tunes the minimum-B. The resulting magnetic field profile allows this source to run at 14 and/or 18 GHz. For this source, each coil is made of three pancakes connected together as presented in Figure 81. Figure 82 shows different steps of source fabrication from a basic magnetic calculation up to the assembled source.
As the resulting magnetic field allows the ion source to run either at 14 or 18 GHz or both, the microwave injection system is made of two rectangular waveguides (see Figure 17). These water-cooled waveguides cross the iron plug before entering the plasma chamber. Two ovens are also installed to make metallic vapor at the plasma chamber entrance. These ovens can be removed from the source without breaking the vacuum. A movable biased disk is also installed at the plasma chamber entrance to enhance the source performance and to protect the iron plug from plasma bombardment as well (see Figure 17). The overall setup is presented in Figure 83, where the magnetic system is removed, and in Figure 84.
FIGURE 81. Coil configuration.
FIGURE 82. Magnetic design of GTS from design to assembly. (a) Overall design, (b) coil and iron yoke under assembly, and (c) source body with coil electrical connectors and cooling pipes.
2. Usual Performances of Room Temperature ECRIS Most room temperature ECRISs have more or less the same size, with some differences. For example, the well-known AECR (Xie, 1998) favors charge exchange reduction to get large intensities of high charge states. Caprice (Hitz,
FIGURE 83. GTS II without magnetic structure.
FIGURE 84. GTS II on its bench.
1995) emphasizes compactness while still producing fully stripped argon ions. Many ECRISs are derivatives of these two groups; for example, AECR served as the basis of several new ion sources in Europe and the United States. Another source, GTS as presented above, is a compromise between
different parameters that are often in contradiction: plasma chamber size, high magnetic fields, and high microwave frequency. Figure 85 and Figure 86 show that such a rather cheap machine can even produce fully stripped argon ions. The two CSDs presented in Figure 85 are obtained with and without the gas mixing method. It has been shown by Suominen et al. (2004b) that too much gas mixing may worsen beam emittance. Therefore, although the gas mixing technique is usually useful to increase beam intensities (Drentje, 2003), it is very important to minimize the buffer gas quantity by good confinement. Figure 86 shows that, when stopping the contribution of oxygen, Ar17+ intensity only falls from 4 µA down to 3.2 µA. Of course, as the plasma chamber is made
FIGURE 85. Production of Ar17+ by GTS with and without gas mixing technique (Hitz et al., 2004a).
FIGURE 86. Same as Figure 85 with a zoom on fully stripped argon ion.
of aluminum, it has already been oxidized, and the difference between both intensities most likely comes from ion cooling (Drentje et al., 2000). This experiment also shows the crucial role of the plasma chamber coating, which gives a lot of secondary electrons; to keep this effect constant in time, oxidized layers or liners may be useful, as shown by Schachter et al. (1998). The latest version of GTS is installed at CERN and will provide ion beams for the Large Hadron Collider. For this application, GTS works in pulsed operation, using the well-known afterglow mode (Melin et al., 1990). To do so, both the microwave and the biased disk are pulsed independently, as shown in Figure 87. The purpose of pulsing the biased disk is to let electrons (and consequently ions) rapidly go to the extraction zone of the plasma chamber. However, the role of this biased disk is not clearly understood. Some electrons are well trapped thanks to the extraction potential; and when a negative potential is set on the biased disk, which is on the other side of the source, this may increase the electron trap efficiency and increase the electron lifetime as well. However, applying a negative potential also leads to ion bombardment on the biased disk surface, and secondary electron emission may then appear. In that case, the biased disk material could be important, but it has been shown that this is not the case (Biri et al., 1999). A third possible phenomenon is the sputtering effect. This can be used to produce metallic elements, as will be shown. Figure 88 and Figure 89 may provide a possible explanation for the biased disk effect. Without changing any parameter other than the biased disk voltage, the evolution of the argon charge state distribution is shown. These
FIGURE 87. Afterglow mode: biased disk voltage can also be pulsed to influence the beam shape during the afterglow.
FIGURE 88. GTS I: argon charge state distributions at two different biased disk voltages, other source parameters being unchanged.
FIGURE 89. GTS I: evolution of argon CSD with biased disk voltage for a source tuning on Ar14+.
CSDs do not show any ion coming from the disk and, in that case, sputtering is negligible. On the other hand, secondary electrons would increase with disk voltage, and so would the ions. The third and most probable explanation is that the biased disk plays a key role in electron (and consequently ion) confinement
as suggested by Runkel et al. (2000). It would be very interesting to perform spectroscopic measurements with a source equipped with a biased disk to determine if some ions that are not extracted still exist inside the source. Sources that are presented in Section IV (Caprice and Quadrumafios) do not have any biased disk. For a tuning on Ar14+ , oxygen gas is added as buffer. Figure 89 shows that oxygen CSD and intensities are not strongly influenced by biased disk as for the main gas. Nevertheless, apart from electron confinement, this small biased disk can easily be utilized for metallic ion production. The use of heavy ion (Ar, Kr, Xe) plasma and a negatively biased metallic disk placed quite close to the plasma can give very high charge state metallic ions. The intensity of these ions is quite low, depending on the number of metallic atoms that are sputtered; however, the charge states of these ions can be very high. Figure 90 presents two different methods to produce refractory elements. RIKEN 18 GHz (Nakagawa et al., 1998) and GTS 18 GHz are almost identical ECRISs operated at the same frequency, which roughly give similar results. One method to produce a refractory element is to slowly move a small rod toward the plasma, as done with RIKEN ECRIS. This gives quite intense beams of medium charges. Another method utilized by GTS ECRIS is by metal sputtering with argon ions. Figure 90 shows that these methods are complementary. Another advantage that can be offered by an ECRIS is its ability to run in pulsed mode, when the ion source is connected to a synchrotron, for example. In this configuration, the ion source can be utilized in the so-called afterglow mode. During the first period, in which the plasma is created and sustained by microwaves, multiply charged ions are kept inside the plasma. In a second phase, in which the rf is switched off, electrons are no longer confined and can escape from the source. 
The ions escape at the same time, and this phenomenon is seen as a rapid increase of ion intensity during the afterglow, as shown in Figure 91. This figure shows the Bi24+ intensity produced by GTS at 14 GHz and a moderate power. Owing to ECRIS flexibility, the time between two pulses can be matched to the repetition rate of the synchrotron. Figure 92 presents a typical charge state distribution obtained during the afterglow. This CSD is obtained with the latest version of GTS, which is now installed at CERN for the LHC program. The source is tuned on Pb29+, whose intensity is above 200 µA for a rather moderate rf power (1 kW) at 14.5 GHz. To conclude this section on room temperature ECRISs, whatever the ion source size, it is important, when designing it, to follow the magnetic scaling laws. Table 10 presents the magnetic profile of the ion source taken as an example here.
FIGURE 90. Tantalum ions produced by two methods: oven (RIKEN ECRIS) and sputtering (GTS ECRIS).
FIGURE 91. GTS I: pulses of Bi24+ (14.5 GHz, 800 W).
FIGURE 92. GTS-LHC: production of Pb ions in afterglow mode. Charge state distribution is obtained when the source is tuned on 29+ at 14.5 GHz.
TABLE 10
MAGNETIC PROFILE OF GTS II AND COMPARISON WITH MAGNETIC SCALING LAWS

| Ratio | Magnetic scaling laws | GTS II (14 GHz) |
|---|---|---|
| Binj/Becr | ∼4 | 4.8 |
| Bext/Brad | ∼0.9 | 1 |
| Bmin/Brad | ∼0.30 to 0.45 | 0.33 to 0.41 |
| Brad/Becr | ∼2 | 2.37 |
| Blast/Becr | 2 | 2.5 |
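The comparison in Table 10 can be summarized numerically with a sketch such as the following, which reports how far each GTS II ratio lies from the scaling-law value. Comparing interval midpoints is an assumption made here for illustration only (the scaling-law entries are approximate targets, not strict limits), and all names are hypothetical.

```python
# Each entry is a (low, high) interval; single values are written as (x, x).
# Numbers are copied from Table 10.
SCALING_LAWS = {
    "Binj/Becr": (4.0, 4.0),
    "Bext/Brad": (0.9, 0.9),
    "Bmin/Brad": (0.30, 0.45),
    "Brad/Becr": (2.0, 2.0),
    "Blast/Becr": (2.0, 2.0),
}

GTS_II = {
    "Binj/Becr": (4.8, 4.8),
    "Bext/Brad": (1.0, 1.0),
    "Bmin/Brad": (0.33, 0.41),
    "Brad/Becr": (2.37, 2.37),
    "Blast/Becr": (2.5, 2.5),
}


def midpoint(interval):
    """Return the centre of a (low, high) interval."""
    low, high = interval
    return 0.5 * (low + high)


def deviation_percent(target, actual):
    """Signed deviation of the actual midpoint from the target midpoint, in percent."""
    return 100.0 * (midpoint(actual) - midpoint(target)) / midpoint(target)


def compare(targets, actuals):
    """Map each ratio name to its deviation from the scaling-law value."""
    return {name: deviation_percent(targets[name], actuals[name]) for name in targets}


if __name__ == "__main__":
    for name, deviation in compare(SCALING_LAWS, GTS_II).items():
        print(f"{name:>10s}: {deviation:+6.1f} % relative to the scaling law")
```

Positive deviations mean that the source provides more field than the approximate scaling-law value for that ratio.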
Nevertheless, remaining compatible with these laws means that an upper limit of the source size is situated for a plasma chamber diameter of about 80–100 mm. This limit comes from the permanent magnet hexapole which must give a radial magnetic field equal to twice the resonance field. There are two ways to obtain a larger plasma chamber diameter: first, iron pieces can be added to enhance the radial field as proposed by Koivisto et al. (2004) and Suominen et al. (2006). The other solution is to use superconducting coils as shown in the next section. Other ECRISs also recently show the importance of a very good confinement. For example, Caprice 14.5 GHz, whose magnetic profile is in agreement with magnetic scaling laws, produced 1 mA of O6+ with 1 kW of rf power (Hitz et al., 1996), while a more recent ion source produced exactly the same intensity with, however, a rather low confinement. And to compensate for
this drawback, several kW of rf power and a higher frequency were needed (Thuillier et al., 2005). Another limit of room temperature ECRISs comes from the large power supplies that are needed for copper coils. Basically, this type of source may need 150 kW of electrical power, which could be a problem when the ion source has to be installed on a high-voltage platform. To compensate for this drawback, one solution is to replace copper coils with superconducting ones.
D. Compact Superconducting ECRIS
The philosophy of this type of machine is to use superconducting coils for the axial mirror field and permanent magnets for the hexapolar field. This is a step between room temperature ECRISs and fully superconducting machines. As presented in Table 7, several types of superconducting wires are offered to the designer. The most common are NbTi and Nb3Sn wires, whose working temperature is less than 4 K. A newer material is of the HTS type (BiSCCO), whose critical temperature is about 100 K. Considering only the cooling aspect, this latter material is easier to use than NbTi or Nb3Sn, as only liquid nitrogen (LN2) is needed to cool down this type of coil, while liquid He (LHe) and LN2 are needed for LTS coils. However, several ECRISs now exist in which LTS coils are cooled down only by thermal conduction, that is, without LHe (Kurita et al., 2000). Considering the magnetic aspect, as LTS coils are much easier to fabricate than HTS ones, they offer a wider range of coil sizes and can therefore provide a larger axial magnetic field. Table 11 presents the axial magnetic field of an LTS system, SHIVA (Kurita et al., 2000), and an HTS one (Kanjilal et al., 2005, 2006). Another drawback of today's HTS coils is their limited inner diameter, r, which is not compatible with the necessity to tune the minimum-B. As presented in the previous section, at least one central coil is very useful for ion source optimization.
This central coil is placed around the permanent magnet hexapole (see, for example, Figure 93). Therefore, when using the HTS technique, the ion source designer has to either keep Bmin constant or find another way to tune the minimum-B. And here the choice is limited: either LTS or copper coils, both choices losing the advantages of HTS devices.

TABLE 11
MAGNETIC CHARACTERISTICS OF TWO ECRISs USING DIFFERENT TYPES OF SUPERCONDUCTING WIRES

Magnetic field            SHIVA (14 GHz)     PKDELIS (18 GHz)
                          LTS coils          HTS coils
Injection: Binj (T)       3                  1.8
Minimum: Bmin (T)         0.5                0.6
Extraction: Bext (T)      2                  1.5

HITZ

In the following, we consider a compact ECRIS dedicated to the production of very high charge states. To this end, a long plasma chamber is chosen. A proposed configuration is presented in Table 12 and compared with two other ECRISs, Serse (Ludwig et al., 1998) and GTS, presented above. This new ECRIS, called Carpe Diem (CARactérisée Pour Etre Dotée d’une Ionisation Efficace d’Ions Multichargés), was studied for installation on a high voltage platform (Hitz et al., 2005b). It is a long source with a rather large plasma chamber compatible with the available radial magnetic field. Table 12 presents its general characteristics compared with those of a room temperature source and a superconducting source. In the following, general aspects of this ion source are presented as a design example; more details can be found in Sun (2004) and Hitz et al. (2005b).

TABLE 12
CHARACTERISTICS OF CARPE DIEM AND COMPARISON WITH OTHER TYPES OF ECRIS

                            SERSE                    GTS                   Carpe Diem
Frequency                   14–18 GHz                14–18 GHz             18–24 GHz
Magnetic characteristics
  axial field               superconducting coils    copper coils          superconducting coils
  radial field              superconducting coils    permanent magnets     permanent magnets
Resonance length            14 GHz: 50 mm            14 GHz: 95 mm         18 GHz: 70–160 mm
                            18 GHz: 65 mm            18 GHz: 145 mm        24 GHz: 90–233 mm
Plasma chamber              L: 500 mm                L: 300 mm             L: 600 mm
                            ∅: 130 mm                ∅: 80 mm              ∅: 100 mm

1. Mirror Field

a. General Presentation

To obtain a long resonance zone and to have good control of the minimum-B, the axial magnetic field is created by four coils, as presented in Figure 93. This configuration allows an additional tuning, namely a possible control of the electron temperature. As presented in Figure 94, the magnetic field gradient at resonance is another key factor in an ECRIS. It defines the time spent by the electrons in the resonance zone. As they receive kicks of energy (positive or negative) each time they cross the resonance, the chances of accelerating them are larger for
FIGURE 93. Magnetic field configuration of Carpe Diem.

FIGURE 94. Bz distribution along the source axis.
weak gradients. Changing the Bmin value therefore changes this gradient. To do so, the two central coils are supplied with a current opposite to that of the two coils located at the injection and extraction sides. All coils are made of NbTi, the same conductor as used in Serse. They are surrounded by an iron yoke that also serves as the cryostat envelope. Soft iron plugs reinforce the axial field at the mirror throats. A plot of the axial magnetic field profile is shown in Figure 94 for the following current densities:
Solenoid A (injection): 96 A/mm²,
Solenoid B (M1): −70 A/mm²,
Solenoid B (M2): −70 A/mm²,
Solenoid C (extraction): 115 A/mm².

b. Safety Margin

To avoid any quench problem, the safety margin of the different coils has to be carefully calculated. As an example, working line conditions are estimated for the following coil current densities:

1. Solenoid A: 96 A/mm²,
2. Solenoids B (M1 and M2): −100 A/mm²,
3. Solenoid C: 115 A/mm².

To estimate the safety margin against a quench in the coils (with the above currents), a series of working lines can be drawn, as shown in Figure 95. For the axial magnetic field profile shown in Figure 94, the safety margin is at least 20% at 4.5 K. Under these conditions, the main parameters related to the axial magnetic field are presented in Table 13. The resonance length indicated in this table is a function of the minimum-B value.
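The dependence of the resonance length on the minimum-B can be illustrated with a deliberately crude model. The sketch below assumes a symmetric parabolic axial profile between two identical mirror throats; this profile, the function names, and the choice of 2.2 T for both throats are illustrative assumptions, not the computed Carpe Diem field, so the numbers are not expected to reproduce Table 13. Only the trend matters: raising Bmin toward the resonance field shortens the resonance zone.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge (C)
M_ELECTRON = 9.1093837015e-31   # electron rest mass (kg)


def b_ecr(freq_hz):
    """Non-relativistic ECR field (T) for a microwave frequency in Hz."""
    return 2.0 * math.pi * freq_hz * M_ELECTRON / E_CHARGE


def resonance_length_mm(b_min, b_throat, half_length_mm, freq_hz):
    """Distance between the two resonance points of a toy parabolic profile.

    Assumed profile: B(z) = b_min + (b_throat - b_min) * (z / half_length)**2,
    symmetric about the minimum (the real injection/extraction asymmetry is ignored).
    """
    b_res = b_ecr(freq_hz)
    if b_res <= b_min:
        return 0.0                      # resonance field below the minimum: no resonance
    if b_res >= b_throat:
        return 2.0 * half_length_mm     # resonance would sit at or beyond the throats
    z_res = half_length_mm * math.sqrt((b_res - b_min) / (b_throat - b_min))
    return 2.0 * z_res


# Hypothetical inputs: throats 640 mm apart, both set to 2.2 T, 18 GHz heating.
for b_min in (0.45, 0.50, 0.55):
    length = resonance_length_mm(b_min, 2.2, 320.0, 18e9)
    print(f"Bmin = {b_min:.2f} T -> resonance length = {length:.0f} mm")
```

A real design would replace the parabola with the computed Bz(z) of Figure 94 and locate the two crossings of the resonance field numerically.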
FIGURE 95. Working conditions for the three types of solenoids at two different temperatures.
TABLE 13
TYPICAL DATA OF THE AXIAL MAGNETIC FIELD OF CARPE DIEM

Source parameter                            Typical data
Binj maximum (T)                            3.2
Bext maximum (T)                            2.2
Distance between mirror throats (mm)        ∼640
Bmin (T), 18 GHz                            0.45∼0.55
Bmin (T), 24 GHz                            0.6∼0.8
Resonance length (mm), 24 GHz               245∼220        190∼87.5
Resonance length (mm), 18 GHz               162.5∼122.5    70
One safe solution proposed for the solenoids of Carpe Diem is presented in Table 14. On the other hand, as the four coils can be powered separately, it is necessary to examine different situations and check the electromagnetic forces encountered. Therefore, six different modes have been studied and are presented in Table 15. The resulting axial magnetic profiles are shown in Figure 96.

TABLE 14
NbTi SOLENOID CHARACTERISTICS

Solenoid                                   A            B (M1)       B (M2)       C
Radius of winding bore (mm)                225          225          225          225
Radial thickness of winding (mm)           83           55           55           70
Winding length (mm)                        160          45           45           85
Overall current density (A/mm²)            96           −100         −100         115
Bmax on winding (T)                        5.3 (0.3)    2.7 (0.1)    2.7 (0.1)    4.6 (0.2)
{contribution from external iron (T)}
TABLE 15
DIFFERENT OPERATING MODES OF THE FOUR COILS

Mode   Binj (T)   Bmin (T)   Bext (T)   Solenoid A   Solenoid B (A/mm²)    Solenoid C
                                        (A/mm²)      M1         M2         (A/mm²)
A      Max        0.45       1.6        96.00        −86.00     −86.00     80.00
B      Max        0.55       1.6        96.00        −78.00     −78.00     80.00
C      Max        0.60       2.0        96.00        −84.00     −84.00     98.00
D      Max        0.80       2.0        96.00        −63.50     −63.50     98.00
E      2.5        0.45       1.6        76.00        −70.50     −70.50     75.00
F      2.5        0.55       1.6        76.00        −60.50     −60.50     75.00
FIGURE 96. Axial magnetic profiles for six different coil currents.
Regarding operating mode A, an estimation of the unbalanced forces is presented in Table 16. The axial unbalanced force in this condition is 6.5 tons maximum, oriented toward the injection side. For mode D, Table 17 shows that this is an interesting mode, as the maximum force is 0.8 ton toward the injection side.

TABLE 16
UNBALANCED FORCES FOR MODE A

Solenoid      Fx (kN)     Fy (kN)     Axial force Fz (kN)
A             8.11        −2.08       −437.64
B (M1)        −1.03       1.62        248.52
B (M2)        0.07        0.43        −177.84
C             −1.97       3.17        301.81
Total         5.18        3.15        −65.16
TABLE 17
UNBALANCED FORCES FOR MODE D

Solenoid      Fx (kN)     Fy (kN)     Axial force Fz (kN)
A             8.09        −1.90       −336.74
B (M1)        0.54        0.94        167.14
B (M2)        0.03        0.30        −153.87
C             −2.95       4.63        315.40
Total         4.63        4.0         −8.06
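The totals of Tables 16 and 17 can be checked by summing the axial force on each solenoid. The short sketch below does this and converts the net force into tonnes-force for comparison with the figures quoted in the text. The dictionary layout and function names are illustrative, and reading a negative Fz as "toward the injection side" is an inference from the text rather than a convention stated in the tables.

```python
G = 9.80665  # standard gravity (m/s^2), used to express kN as tonnes-force

# Axial forces Fz (kN) on each solenoid, copied from Tables 16 and 17.
AXIAL_FORCES_KN = {
    "A": {"A": -437.64, "B (M1)": 248.52, "B (M2)": -177.84, "C": 301.81},
    "D": {"A": -336.74, "B (M1)": 167.14, "B (M2)": -153.87, "C": 315.40},
}


def net_axial_force_kn(mode):
    """Unbalanced axial force on the coil assembly for one operating mode."""
    return sum(AXIAL_FORCES_KN[mode].values())


def kn_to_tonnes_force(force_kn):
    """Convert kN to metric tonnes-force (1 tf = 9.80665 kN)."""
    return force_kn / G


for mode in ("A", "D"):
    net = net_axial_force_kn(mode)
    print(f"Mode {mode}: net Fz = {net:.2f} kN "
          f"({abs(kn_to_tonnes_force(net)):.2f} tonnes-force)")
```

The individual coil forces are several hundred kN each, so the small net value is the difference of large opposing terms; this is why the balance changes so strongly from one operating mode to another.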
Similar calculations for the other modes would show that only operating modes C, D, E, and F are safe and can ensure source operation without any quench.

2. Hexapolar Field

To reduce the ion source size, too large a cryostat must be avoided, and it is sometimes necessary to forgo a superconducting hexapolar structure, even if such a structure is useful for good source tuning. Therefore, a permanent magnet hexapole has been designed, taking into account all possible irreversible losses. In fact, because of the strong magnetic fields encountered in such ion sources, strong local demagnetizing fields may appear at some locations of the hexapole, acting at various inclinations, up to 90°, with respect to the magnetization direction. If the demagnetizing field strength is comparable to the magnet coercivity, irreversible losses occur. To overcome this difficulty, different magnet grades are used to ensure that the coercivity is always larger than the absolute value of the demagnetizing field. Therefore, the ratio Hz/Htot has to be investigated, where Hz is the axial magnetic field and Htot is the total magnetic field inside each hexapole magnet. The sketch of a possible hexapole is shown in Figure 97. Six different magnet grades are utilized, whose magnetic properties are presented in Table 18, and the hexapole itself is made of six elementary regions (H1, ..., H6).

a. Region H1

This zone is situated close to the injection solenoid and undergoes a strong demagnetization field. A high-coercivity magnet grade, such as material 4, is therefore necessary. In addition, the radial magnetic field at this source extremity cannot be too strong.
FIGURE 97. Hexapole for Carpe Diem.

TABLE 18
MAGNETIC PROPERTIES OF DIFFERENT GRADES OF NdFeB PERMANENT MAGNETS

                            Material 1   Material 2   Material 3   Material 4   Material 5
Remanence Br (T)            1.44         1.35         1.28         1.14         1.18
Coercivity HcB (kA/m)       1115         1040         990          885          915
Coercivity HcJ (kA/m)       1195         1430         1830         2865         2465
FIGURE 98. Shape of the H2 part of the hexapole.
b. Region H2

As this region covers the most important zone of the ion source, it is necessary to maximize the radial magnetic field as much as possible while avoiding any demagnetization. Depending on the Htot value, it is then necessary to use the following grades:

1. grade 1, where Htot < 13.5 kOe,
2. grade 2, where 13.5 kOe < Htot < 16 kOe,
3. grade 3, where 16 kOe < Htot < 20 kOe,
4. grade 5, where Htot > 20 kOe.
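The grade assignment for region H2 amounts to a threshold lookup on the local total field. The sketch below encodes the thresholds listed above and converts the intrinsic coercivities HcJ of Table 18 from kA/m to kOe, so that each grade can be compared with the field range it has to withstand. The function names are illustrative, and sending a field that falls exactly on a threshold to the higher-coercivity grade is an assumption, since the text leaves the boundaries open.

```python
import math

# 1 Oe = 1000 / (4*pi) A/m, hence 1 kOe is about 79.577 kA/m.
KA_PER_M_PER_KOE = 1000.0 / (4.0 * math.pi)

# Intrinsic coercivity HcJ (kA/m) of each material, from Table 18.
HCJ_KA_PER_M = {1: 1195.0, 2: 1430.0, 3: 1830.0, 4: 2865.0, 5: 2465.0}


def hcj_koe(grade):
    """Intrinsic coercivity of a grade expressed in kOe."""
    return HCJ_KA_PER_M[grade] / KA_PER_M_PER_KOE


def grade_for_h2(h_tot_koe):
    """Magnet grade for region H2 as a function of the local total field (kOe).

    Boundary values go to the higher-coercivity grade (assumption).
    """
    if h_tot_koe < 13.5:
        return 1
    if h_tot_koe < 16.0:
        return 2
    if h_tot_koe < 20.0:
        return 3
    return 5


for h_tot in (10.0, 14.0, 18.0, 25.0):
    grade = grade_for_h2(h_tot)
    print(f"Htot = {h_tot:4.1f} kOe -> grade {grade} (HcJ = {hcj_koe(grade):.1f} kOe)")
```

With this conversion, material 4 comes out at about 36 kOe, which matches the coercivity quoted for the grade used in region H4.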
The structure of H2 is presented in Figure 98. In this 36-pole configuration, the north poles are # 6, 18, and 30 and the south poles are # 0, 12, and 24. Fortunately, high-remanence material can be used for those poles, as the demagnetization fields there are rather low.

c. Region H3

In this part of the hexapole, only grade 5 can be used to overcome any possible demagnetization.

d. Region H4

As the H4 region is close to the injection solenoid, it still suffers from a very strong axial magnetic field induced by this solenoid. This part extends from z = 30 cm to z = 43 cm and radially from r = 16 cm to r = 19.5 cm. The z = 30 cm plane of the H4 part is the most sensitive: the Htot field in this plane gives the maximum demagnetization. Calculation indicates that, axially, only a 1.5 mm region undergoes a demagnetization
FIGURE 99. Shape of the H4 part of the hexapole.
field larger than 34 kOe. A grade with HcJ = 36 kOe and Hk0 = 34 kOe can therefore be used in H4. Figure 99 presents the H4 hexapole shape.

e. Region H5

In this region, the demagnetization field is larger at the outer surface and grade 4 is needed.

f. Region H6

The H6 block is very close to the extraction coil, which makes the demagnetization field in H6 very large. However, the maximum demagnetization field (as large as 29 kOe) exists only in some parts. The resulting structure design is shown in Figure 100.
FIGURE 100. Shape of the H6 part of the hexapole.
FIGURE 101. Angular distribution of radial magnetic field.
g. Region H7

A hexapole with the structure discussed above can give a radial magnetic field as high as 1.45 T at a radius of r = 52 mm; the 360° angular distribution of the radial magnetic field is given in Figure 101. Since different geometric sizes and different NdFeB grades have been adopted in the hexapole design, it is also important to know the radial magnetic field at the plasma chamber inner wall along the source axis. This distribution is shown in Figure 102 and presents a maximum radial field of 1.45 T.

3. Total Magnetic Field

From the superposition of the axial and radial magnetic fields, it is possible to draw the equal-B lines, which characterize the confinement properties of the whole structure. As shown in Figure 103, the last closed surface is at 1.5 T. At 18 GHz, the last closed surface is at more than twice the resonant field, which gives excellent confinement. At 24 GHz, the last closed surface is at approximately 1.7 times the resonance value, which still offers good confinement for the production of high charge states. As shown in Table 19, the magnetic field profile of Carpe Diem is in agreement with the scaling laws presented above.

4. Cryogenic Aspect

If the ion source is installed on an HV platform, the cryostat must be autonomous, since it is not possible to bring cryogenic fluids from the main storage through metallic insulated pipes. Two techniques are then possible:
FIGURE 102. Axial distribution of the radial magnetic field at plasma chamber inner wall.

FIGURE 103. Total magnetic field contour distribution (B 1.7 T).
1. Liquid circulation: LN2 and LHe dewars can be installed near the cryostat and have to be changed before becoming empty.
TABLE 19
MAGNETIC SCALING LAWS AND COMPARISON WITH CARPE DIEM

Magnetic scaling laws                       Carpe Diem, 18 GHz    Carpe Diem, 24 GHz
Binj/Becr      ∼4                           5                     3.7
Bext/Brad      ∼0.9                         1.4                   1.4
Bmin/Brad      ∼0.30 to 0.45                0.31 to 0.38          0.41 to 0.55
Brad/Becr      ∼2                           2.27                  1.69
Blast/Becr     2                            2.34                  1.74
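Most of the ratios in Table 19 can be recomputed from the field values quoted in the text, using the cold-electron resonance condition Becr = 2π f me / e. The sketch below does this for 18 and 24 GHz. The input fields are those given above (Binj = 3.2 T, radial field 1.45 T, last closed surface 1.5 T, and the Bmin ranges of Table 13); the function names are illustrative, and small differences from the tabulated values presumably reflect rounding of the inputs.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge (C)
M_ELECTRON = 9.1093837015e-31   # electron rest mass (kg)

# Carpe Diem field values quoted in the text and in Table 13 (tesla).
B_INJ = 3.2      # maximum injection field
B_RAD = 1.45     # radial field at the plasma chamber wall
B_LAST = 1.5     # last closed equal-B surface
B_MIN_RANGE = {18e9: (0.45, 0.55), 24e9: (0.6, 0.8)}   # minimum-B tuning range


def b_ecr(freq_hz):
    """Cold-electron (non-relativistic) resonance field in tesla."""
    return 2.0 * math.pi * freq_hz * M_ELECTRON / E_CHARGE


def mirror_ratios(freq_hz):
    """Scaling-law ratios of Table 19 for one operating frequency."""
    becr = b_ecr(freq_hz)
    b_min_low, b_min_high = B_MIN_RANGE[freq_hz]
    return {
        "Binj/Becr": B_INJ / becr,
        "Brad/Becr": B_RAD / becr,
        "Blast/Becr": B_LAST / becr,
        "Bmin/Brad": (b_min_low / B_RAD, b_min_high / B_RAD),
    }


for freq in (18e9, 24e9):
    ratios = mirror_ratios(freq)
    low, high = ratios["Bmin/Brad"]
    print(f"{freq / 1e9:.0f} GHz: Becr = {b_ecr(freq):.3f} T, "
          f"Binj/Becr = {ratios['Binj/Becr']:.2f}, "
          f"Brad/Becr = {ratios['Brad/Becr']:.2f}, "
          f"Blast/Becr = {ratios['Blast/Becr']:.2f}, "
          f"Bmin/Brad = {low:.2f} to {high:.2f}")
```

The same function makes the 24 GHz compromise visible: with a fixed permanent magnet hexapole, Brad/Becr drops below the value of about 2 recommended by the scaling laws as the frequency increases.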
TABLE 20
POSSIBLE COOLING TECHNIQUES

Cooling technique      Advantages                  Drawbacks
Liquid circulation     Simple                      High running cost
                       Low cost manufacturing      Ion source production stops for any dewar exchange
Cryocooler             Autonomous running          Option: redundancy with liquid circulation
                       Low running cost            High cost manufacturing
2. Cryocoolers: the cryostat operates autonomously until a failure of the cryocooler occurs.

Advantages and drawbacks of these two solutions are summarized in Table 20.

a. Cryostat Design: PID

Figure 104 presents a process integrated diagram (PID) of the cryostat. This cryostat uses liquid helium for coil cooling and liquid nitrogen for the thermal shields. To let users choose their own way of refrigeration, it can be operated either with tanks supplying the cryogenic fluids or with cryocoolers. Regarding the 4 K cryocoolers, two different technologies are possible:

1. Gifford-McMahon (GM) technology. It is simple and reliable, but the most powerful available unit of this type delivers 1.5 W at 4.2 K.
2. Gifford-McMahon associated with Joule-Thomson (JT) technology: a JT-type device delivering 3.5 W at 4.2 K is available, but its reliability has still to be proved.

For the 77 K cryocoolers, several GM-type devices are possible. The cryostat works with a natural gravity cooling loop. As for LHe, an open-circuit LN2 working facility is added for cooling down and cryocooler redundancy.
FIGURE 104. PID of the Carpe Diem cryostat.
Another cooling-down loop is necessary to cool the coils down from 300 K to about 100 K.

b. Cryostat Design: Thermal Aspect

The thermal budget of this type of cryostat is summarized in Table 21. As the overall thermal budget at 4.2 K exceeds the capacity of any available cryocooler (1.5 W), and as it is difficult to obtain a good estimate of the X-ray emission from the plasma, two cryocoolers can be installed to remain autonomous.

TABLE 21
CRYOSTAT THERMAL BUDGET (W)

Thermal contribution (W)                        4 K
Mechanical supporting
  Axial blockage (F = 20 kN/support)            0.28
  Radial supporting                             0.09
Pipes conduction                                0.05
Molecular conduction (P = 1 × 10⁻⁶ mbar)        0.05
Radiation                                       0.8
Current leads (5 with cold diodes)              0.17
LHe tank neck (+LN2)                            0.2
Radiation from plasma (X-rays)                  0.1
Total                                           1.74

80 K contributions (W): 4.2, 3.28, 0.061, 50, 40, 12.2. Total at 80 K: 109.7
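The conclusion that two cryocoolers are needed follows from simple bookkeeping on the 4 K column of Table 21. The sketch below sums the eight 4 K contributions and compares the total with the two unit capacities mentioned earlier (1.5 W for a GM unit, 3.5 W for a GM/JT unit). It assumes that the capacities of identical units simply add and that no design margin is applied, which is a simplification; the function names are illustrative.

```python
import math

# 4 K heat loads in watts, in the order printed in Table 21.
HEAT_LOADS_4K_W = [0.28, 0.09, 0.05, 0.05, 0.8, 0.17, 0.2, 0.1]


def total_load_w(loads_w):
    """Total heat load in watts."""
    return sum(loads_w)


def cryocoolers_needed(load_w, unit_capacity_w):
    """Smallest number of identical units whose summed capacity covers the load.

    Assumes capacities add linearly and ignores any safety margin.
    """
    return math.ceil(load_w / unit_capacity_w)


load = total_load_w(HEAT_LOADS_4K_W)
print(f"Total 4 K load: {load:.2f} W")
print(f"GM units (1.5 W each) needed: {cryocoolers_needed(load, 1.5)}")
print(f"GM/JT units (3.5 W each) needed: {cryocoolers_needed(load, 3.5)}")
```

Because the X-ray contribution is the least certain entry, rerunning the calculation with a larger value for the last element shows how quickly the margin of a two-unit GM solution is consumed.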
5. Mechanical Design

a. Source Body

Carpe Diem is designed along the lines of the well-performing GTS ECRIS; a general presentation is given in Figure 105.

b. Magnetic System

The hexapole, installed inside the cryostat warm bore, is held by two stainless steel pieces. At each side of the hexapole, these pieces also hold an iron plug to enhance the axial magnetic field (see Figure 105). A double-wall plasma chamber, made of aluminum, is cooled by pressurized water (5 bars < Pinlet < 10 bars); the cooling system also serves to cool the plasma electrode.

c. First Stage and Plasma Chamber

The so-called “first stage” serves several tasks: rf injection, gas feed, metal production, and pumping. Figure 106 shows the plasma chamber connected to the injection tee. For trouble-free maintenance, these components must be easily removable from the warm bore. The injection system is derived from that of GTS. If needed, a second iron plug can be installed to enhance the axial magnetic field at the injection side.

d. Cryogenic Components

Figure 107 presents a drawing of the mechanical assembly of Carpe Diem. The hexapole and plasma chamber are installed inside the large warm bore.
FIGURE 105. General shape of Carpe Diem.

FIGURE 106. Injection part and plasma chamber.

FIGURE 107. Drawing of Carpe Diem source + cryostat.
6. Existing Hybrid ECRIS: Some Results

The source presented above has not yet been built, but it illustrates the general philosophy behind the construction of this type of ECRIS. To date, two kinds of hybrid machines have been in operation: Shiva and Ramses, built by RIKEN (Nakagawa et al., 2002a), with a copy in Dubna (Efremov et al., 2006), and PKDELIS (Kanjilal et al., 2005, 2006). To obtain the axial magnetic field, the first two ion sources use NbTi wires cooled to 4 K without LHe, while the last one uses HTS technology.
TABLE 22
ION CURRENT FROM TWO ECRIS TYPES

              HTS source                  Permanent magnet source
              (Kanjilal et al., 2006)     (Hitz et al., 2005a)
Frequency     18–28 GHz                   13.5 GHz
Ar8+          732 µA                      500 µA
Xe21+         28 µA                       50 µA
As for the room temperature GTS, the RIKEN group also demonstrated the advantage of having a tunable minimum-B and a long resonance zone and of working in high-B mode (Nakagawa et al., 2002b). On the other hand, PKDELIS, which has a rather small resonance length and no central coil for independent minimum-B tuning, gives quite intense beams of low and medium charge states only. Nevertheless, a simple all-permanent magnet ECRIS can do almost the same thing, as shown in Table 22. This table also indicates that an ECRIS works better in a high-B mode at moderate frequency (as the permanent magnet ECRIS in this table) than at high frequency with moderate confinement (as the HTS ECRIS).

E. Fully Superconducting ECRIS

The ideal solution that may satisfy all conditions for a well-performing ECRIS in terms of charge state and intensity is to realize the magnetic configuration entirely with superconducting wires. The above section describes several possibilities for the axial magnetic field. In this section, the permanent magnet multipole is replaced by a superconducting magnet system. The hexapole, which is the most common multipole, is manufactured in a racetrack shape, as presented in Figures 63, 64, and 65. Because of their relatively high manufacturing cost, just a few superconducting devices are now in operation. The first ECRISs of this type were ISIS, built by Beuscher et al. (1986), and a 6.4 GHz machine built at MSU by Antaya et al. (1987), the latter still being in operation. During the 1990s, a more powerful ECRIS, Serse, was constructed at INFN/LNS-Catania (Ciavola and Gammino, 1992). This machine runs very well at 14 and 18 GHz and was used for pioneering tests at 28 GHz (Hitz et al., 2002b). The latest versions of this type of device are Venus (Leitner et al., 2006) and Secral (Zhao et al., 2005, 2006); both are now breaking world records for beam intensity.
For almost 20 years, design principles were almost identical, that is, a racetrack hexapole was inserted inside axial coils. Major progress was achieved as a result of better knowledge in superconducting wiring technology and in ion
source physics. Venus can be regarded as the outcome of this ion source design. Larger sources are nevertheless under design at RIKEN (Nakagawa et al., 2006) and through a European collaboration (Ciavola et al., 2006). However, one major drawback of such a design is the large forces that may act between the ends of the racetrack coils and the axial coils. Therefore, the distance between these coils must be large, which is why the hexapole is much longer than the distance between the two axial mirror throats. A way to minimize these forces is to set the hexapole outside the axial coils, and a source of this type (Secral) is now running very well at 18 GHz: although its magnetic configuration is rated for 28 GHz, recent source commissioning was performed at 18 GHz (Zhao et al., 2006; Sun et al., 2006a). But as the hexapolar coil current density is somewhat limited, this design implies a maximum source size. For example, the Secral plasma chamber size is similar to that of Serse, but smaller than that of Venus.

F. Discussion

Whatever the technology used, it is important to examine the ion source efficiency. The questions that arise at the beginning of any source design concern the match between the microwave frequency, the magnetic system, and the place where the ion source is going to be installed (high voltage platform or not). Generally speaking, up to 14 GHz, permanent magnet ECRISs are now quite powerful, and room temperature ECRISs can be used for more source flexibility and larger intensities of high charge states. The most common frequency range is between 14 and 18 GHz, and this is the domain of ECRISs made of permanent magnets and copper coils if the source is installed at ground potential. For use on a high voltage platform, superconducting coils are easier to manage, as their electrical consumption is quite small.
In that case, the choice of material has to be based on the desired source size and magnetic field profile. An LHe-free machine is ideal for such a situation, provided the time needed to cool down the device is not a problem for the user. A helium flow would cool down the source more rapidly, but leads to a more complicated system.

1. Microwave Power

Apart from the choice of the magnetic system, which also has an economic aspect, the second factor that has to be carefully studied is the microwave coupling into the plasma chamber. Figure 108 compares, for example, three different types of ECRIS: all-permanent magnet (Sophie), room temperature (GTS), and fully superconducting (Venus 18 GHz). All these ion sources have the same type of microwave coupling, which is done with a rectangular
FIGURE 108. Beam intensity evolution versus rf power for four different ECRISs. Data are taken from Zhao et al. (2006) and Sun et al. (2006a) for Secral, Lyneis et al. (2004) for Venus, Hitz et al. (2002c) for GTS, and Hitz et al. (2005a) for all-permanent magnet Sophie.
waveguide in the TE10 mode. This figure shows the evolution of the O6+ intensity with rf power. Sophie, whose plasma chamber volume is 0.3 liter, reaches saturation at about 500 W. GTS, whose plasma chamber volume is 1.5 liters, shows an intensity saturation at about 1.5 kW. Finally, Venus has a plasma chamber volume of 9 liters and does not show any saturation. This beam intensity evolution is better represented when the rf power per unit volume is considered, as done in Figure 109. This figure indicates that the O6+ intensity reaches a plateau above 1 kW/liter for GTS and 1.7 kW/liter for Sophie. This would mean that at least 5 to 9 kW of rf power are necessary for Secral or Venus to reach a plateau at about 4–5 mA! Unfortunately, this cannot be verified, as the maximum power available at this frequency is about 2.5 kW; going further would require installing several microwave transmitters (SECRAL is, however, already equipped with two microwave generators) and cooling the plasma chamber accordingly. But this simple way of thinking is not very accurate, as it is necessary to take into account the plasma size, where the microwaves are normally absorbed. It is also necessary to take into account all possible losses that may occur between the microwave generator and the main part of the plasma chamber. A possible way to measure the real rf power that enters the plasma chamber is to measure the temperature rise of the plasma chamber cooling water, as proposed by Higurashi et al. (2004).
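The cooling-water method rests on a steady-state heat balance, P = ṁ · cp · ΔT, where ṁ is the water mass flow and ΔT the temperature rise between chamber inlet and outlet. The sketch below is a minimal illustration of that balance, not the procedure of the cited work; the flow rate and temperature rise in the example are hypothetical, and the result is the total heat deposited in the chamber wall, which can only be equated with the coupled rf power if other heat sources and losses are negligible.

```python
WATER_CP_J_PER_KG_K = 4186.0    # specific heat of liquid water near room temperature
WATER_DENSITY_KG_PER_L = 1.0    # approximate density of water


def absorbed_power_w(flow_l_per_min, delta_t_k):
    """Heat carried away by the cooling water, P = m_dot * cp * dT (steady state)."""
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return mass_flow_kg_per_s * WATER_CP_J_PER_KG_K * delta_t_k


# Hypothetical reading: 6 L/min of water warming by 2.4 K across the chamber.
power = absorbed_power_w(6.0, 2.4)
print(f"Power removed by the cooling water: {power:.0f} W")
```

Comparing this figure with the forward power indicated by the microwave generator gives a rough estimate of the losses between the generator and the plasma chamber.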
FIGURE 109. Same as Figure 108; O6+ evolution is shown as a function of rf power per volume unit (SC stands for superconducting).
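The power-per-volume argument can be followed with a few lines of arithmetic. The sketch below uses only the chamber volumes and saturation powers quoted in the text for Sophie and GTS, and the 9 liter volume of Venus; the function names are illustrative, and extrapolating the GTS plateau density to a larger source is exactly the "simple way of thinking" that the text warns is not very accurate.

```python
# Chamber volumes (liters) and rf power at saturation (kW) quoted in the text.
SOURCES = {
    "Sophie": {"volume_l": 0.3, "saturation_kw": 0.5},
    "GTS": {"volume_l": 1.5, "saturation_kw": 1.5},
}
VENUS_VOLUME_L = 9.0


def power_density_kw_per_l(power_kw, volume_l):
    """rf power per unit plasma chamber volume."""
    return power_kw / volume_l


def power_for_density_kw(density_kw_per_l, volume_l):
    """rf power needed to reach a given power density in a given chamber."""
    return density_kw_per_l * volume_l


for name, data in SOURCES.items():
    density = power_density_kw_per_l(data["saturation_kw"], data["volume_l"])
    print(f"{name}: plateau at about {density:.2f} kW/liter")

# Naive extrapolation of the GTS plateau density to the Venus chamber.
gts = SOURCES["GTS"]
gts_density = power_density_kw_per_l(gts["saturation_kw"], gts["volume_l"])
print(f"Venus at the GTS density: {power_for_density_kw(gts_density, VENUS_VOLUME_L):.1f} kW")
```

The extrapolated value for Venus matches the upper end of the 5 to 9 kW range quoted above, which is far beyond the roughly 2.5 kW available at this frequency.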
However, all ion sources presented in Figures 108 and 109 are built according to the magnetic scaling laws. They differ not only in size, but also in the way microwaves are launched into the ion source. For example, Secral, Venus 18 GHz, GTS, and Sophie use a rectangular waveguide in the TE10 mode, while Venus 28 GHz uses a circular waveguide in the TE01 mode; this is because of the output mode of the 28 GHz gyrotron. As already presented in Figure 21, TE01 is circularly polarized and has a rather flat power distribution profile, while the TE11 or HE11 mode is linearly polarized and has a high power density Gaussian beam profile. Microwave coupling to the ECR plasma is likely to be more efficient with this latter mode. To obtain one of these modes, once conversion into TE01 is done, a 90° waveguide bend can convert it into TM11, and then a short straight waveguide can convert this latter mode into HE11, whose nearly Gaussian beam intensity distribution can be efficiently guided through corrugated waveguides over long distances. Another way is to convert the TE01 mode into TE11, then into HE11, as shown in Figure 110, where both the electric field lines and the rf power distribution are represented (Marcuvitz, 1951).

The crucial importance of an optimum magnetic confinement is also shown in Figure 111, where three kinds of sources are considered. Phoenix emphasizes a large rf power at high frequency and a rather weak confinement (Thuillier et al., 2005), GTS emphasizes a strong confinement with a moderate rf power at a lower frequency (Hitz et al., 2002a), and Venus combines a strong confinement, a high frequency, and greater power. In all
FIGURE 110. Conversion process from TE02 into HE11 mode.

FIGURE 111. Charge state distribution obtained by three different ECRISs: Phoenix 28 GHz (Thuillier et al., 2005), Venus 28 GHz (Leitner et al., 2005), and GTS 14 GHz (Hitz et al., 2003).
cases, source tuning is done on a charge state of about 24+ for Bi and Pb. As these elements are neighbors in the periodic table (A and Z are, respectively, 208 and 82 for Pb and 209 and 83 for Bi), the CSDs can be compared; the source parameters are indicated in Table 23. Generally speaking, one source
TABLE 23
SOURCE PARAMETERS FOR THE PRODUCTION OF MEDIUM CHARGES OF METALLIC ELEMENTS

                                        Phoenix      GTS
Frequency                               28 GHz       14.5 GHz
rf power                                1.5 kW       0.8 kW
Extraction voltage                      40 kV        20 kV
Injection mirror ratio (Binj/Becr)      1.6          5
Radial mirror ratio (Brad/Becr)         1.5          2.2
emphasizes high power at high frequency launched into a weakly magnetized plasma, while the second source follows the magnetic scaling laws. Figure 111 shows that two different kinds of sources can achieve almost similar performances for medium charge states. Up to a certain limit, more rf power can compensate for a weaker confinement. Regarding higher charge states, things are totally different, as good confinement is needed. To reach these high charge states while maintaining a moderate confinement, the use of much higher frequencies (above 50 GHz) and greater rf power becomes necessary. In contrast, when confinement is optimum, large intensities and high charge states are achievable, as presented in Figure 111 (Leitner et al., 2005).

Similar behavior can be noted for lighter elements. For example, three ECRISs have been tested for O6+: first Caprice 14.5 GHz (Hitz et al., 1996), then Venus 18–28 GHz (Leitner et al., 2005; Leitner, 2006), and Phoenix 28 GHz (Thuillier et al., 2005). A 1 mA level of O6+ is reached with two different methods: 1 kW was necessary to produce 1.1 mA at 20 kV extraction with a very good confinement (Caprice), while Phoenix needs 1.8 kW to obtain 1 mA at 60 kV. In addition, Figure 112 shows that Phoenix needs more He than Caprice: the ratio I(O6+)/I(He+ + O4+) is 1.35 for Caprice and 0.63 for Phoenix. If the purpose of the mixing gas is to decrease the ion temperature (Drentje et al., 2000), the greater power launched into Phoenix may lead to a small increase in ion temperature. Indeed, the electron-ion energy equipartition frequency, as given by Huba (1994), is

\[
\nu^{eq}_{e \to i} = 3.2 \times 10^{-9} \, \frac{\ln \Lambda_{ei}}{T_e^{3/2}} \sum_{i} \sum_{q} \frac{q^{2} \, n_i^{q}}{A_i} \quad (\mathrm{s^{-1},\ eV,\ cm^{-3}}), \tag{53}
\]

where $A_i$ is the mass number of ion species $i$ and $n_i^{q}$ is the density of ions of charge $q$ of species $i$. This formula indicates that the time needed to heat the ions is about 1 s; however, too large a power per unit volume starts to heat the ions faster, and a larger quantity of a lighter element such as He then becomes necessary to carry
FIGURE 112. Oxygen charge state distribution given by three different ECRISs tuned on O6+. Charge 4+ is mixed with He+ and charge 8+ is mainly mixed with He2+.
out ion energy. As for the fully superconducting ECRIS Venus, it reaches the intensity level obtained by GTS (Hitz et al., 2002c); this is a result of its optimized confinement. In addition, the ratio I(O6+)/I(He+ + O4+) is 1.54, which means that the buffer gas quantity is smaller with Venus than with the other sources: 5.5 kW was needed to reach 2 mA, which nevertheless corresponds to a rather small microwave power per unit volume.

During source design, another important parameter is the cooling capacity of the plasma chamber. As part of the energy content of the plasma is diffused toward the chamber wall, a large rf power requires very efficient cooling. Generally, below 2 kW of incident rf power, basic plasma chamber cooling is sufficient: this can be done by a water flow between the two walls of a plasma chamber that is usually made of an axisymmetrical double wall. As an example, let us consider the plasma chamber cooling system of GTS. This double-walled chamber is made of aluminum; its inner and outer walls are 2 mm thick, and there is fluid circulation between these walls. The cooling water enters the lower half of the chamber, flows along the whole chamber length, and comes back through the upper half of the chamber, as shown in Figure 113. The input water pressure is about 2 bars. A simulation has been done to estimate the fluid temperature for a total microwave power of 4 kW. As in any ECRIS, this power is distributed in the well-known three-branch star-shaped impact, rotated by 60° between the injection and extraction regions.
FIGURE 113. Cross section of GTS plasma chamber.
Figure 114 presents two parts of the plasma chamber (injection and extraction) showing a star-shaped impact (Vallcorba, 2003). It shows that with 4 kW, the water temperature can reach 500 K; the temperature of the outer wall can be as high as 480 K. Under 2 bars of pressure, the boiling temperature of water is 394 K. This means that with such microwave power, the
FIGURE 114. Plasma chamber temperature for 2 kW input microwave power and water P = 2 bars.
cooling fluid is in two phases. In addition, this situation could be very critical, since permanent magnets are situated very close to this wall. Such a plasma chamber configuration is not recommended above 2 kW, where turbulent flow is recommended; to date, optimized ECRIS cooling systems can support up to 6 kW of incident power.

2. Pulsed ECRIS

Depending on the type of accelerator to which the ion source is connected, an ECRIS may be required to work in a pulsed mode. Two main solutions are possible. First, the axial magnetic field can be pulsed, as tried by Mühle et al. (1995). In this experiment, a pulsed coil was added to a standard ECRIS at the extraction side of the source. Despite the large forces involved during the coil pulsation, this system showed an enhancement in beam intensity thanks to the strong reduction of confinement at the extraction side. A pulsed ECRIS can also be obtained by pulsing the microwaves. This is a simpler method, if the microwave generator can be quickly pulsed. Running an ECRIS in pulsed mode was, at the beginning of the ECRIS story, a way to avoid plasma chamber overheating [see, for example, Figure 115, presented by Geller et al. (1980)]. Figure 115 is a good demonstration of the step-by-step ionization process: when the rf is switched on, low charge states appear immediately; their intensity then decreases as these low charge state ions are progressively stripped. Several milliseconds are then necessary to obtain nitrogen nuclei. Pulses such as those presented above can be obtained with all ECRISs. Furthermore, depending on the source conditions, mostly gas pressure and rf power, the
FIGURE 115. Micromafios worked in pulsed mode to overcome plasma chamber cooling difficulty. This figure clearly shows the time evolution of different charge states.
FIGURE 116. Xe20+ pulse given by Serse 28 GHz.
FIGURE 117. Xe27+ pulse given by Serse 28 GHz.
pulse shape can vary. It can be smooth, as in Figure 116, where the beam intensity increases steadily up to saturation. Or it can present two bumps, one at the beginning and one at the end of the rf pulse, as shown in Figure 117. When the rf power is switched on, Xe27+ takes some time to reach its maximum intensity; this rise time depends on the source conditions. The current then drops because too much power is injected into the source. Ions are either well trapped in the machine or stripped to other charge states. At the end of the rf pulse, microwave power no longer confines the electrons, which are then suddenly released. To maintain electroneutrality, ions also escape from the source in the so-called afterglow mode. This afterglow mode is now commonly used by pulsed accelerators as a way to enhance source performance. Usually, synchrotrons take a
FIGURE 118. Lead charge state distributions during afterglow given by Phoenix 28 GHz (Sortais et al., 2004) and GTS 14 GHz.
small fraction of the beam in time: for example, the CERN LHC will take the Pb27+ beam from the ion source in about 200 µs at a 5 or 10 Hz repetition rate. For this particular situation, the ion “sourcery” is mainly dedicated to beam optimization during this short period of time. Similar remarks can be made for the charge state distribution (CSD) in afterglow mode as for cw mode. Figure 118 also compares two kinds of ECRISs, Phoenix 28 GHz and GTS II. For an optimization on charge ∼28+, because of limitations due to plasma instabilities resulting from the large rf power injected into the source, Phoenix presents a charge state distribution that peaks on charges 24–25+ at most; however, thanks to its confinement, the charge state distribution of GTS is centered on the desired charge. In addition, this latter CSD is much narrower than that of Phoenix, because of a narrower electron density distribution function. This fact is very important since, usually, an ion source has to produce one charge state at a time.

However, it is important to note that for any operation in pulsed mode, all source components must be adapted to this operation. First, as previously stated, the time response of the microwave generator must be fast, with typical rise and fall times of 20 µs. A rapid rise time is needed to produce low or medium charges as fast as possible (the first bump in Figure 117) and a rapid fall time is needed during the afterglow. In addition, during pulses, fast changes in ion current occur and the power supply must accept these quick
FIGURE 119. Upper figure: O6+ pulse during afterglow (microwave power is stopped at 2 ms). Lower figure: output voltage provided by a standard high-voltage power supply.
changes. For example, Figure 119 presents an afterglow profile when O6+ is extracted from GTS-ECRIS. In this example, the microwave power is switched off at 2 ms on the horizontal scale. The O6+ intensity then rises from 300 µA to about 700 µA. Even if this afterglow peak is obtained at low rf power, there is a fast change in the total extracted current. This leads to a rapid increase in the drain current that has to be supported by the high-voltage supply. Figure 119 also shows that a standard high-voltage supply may not be able to handle this sudden increase correctly. In this case, there could be a voltage fluctuation of up to 10%. However, if the ion beam is taken by the accelerator in only 200 µs, this imperfection of the high-voltage supply is not a problem, since the output voltage is stable during the required time. But to analyze the beam profile during the whole afterglow, this difficulty must be overcome. Indeed, a change in output voltage leads to a change in beam position as the beam travels through the analyzing magnet situated after the ion source. The afterglow profile of O6+ then has to be corrected for the response of the high-voltage supply.
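To give a feeling for why a voltage fluctuation moves the beam in the analyzing magnet, the short sketch below evaluates the non-relativistic magnetic rigidity, Bρ = sqrt(2mV/q), of an O6+ beam and its change for a 10% voltage sag. The 20 kV extraction voltage and the rounded mass of 16 u are assumptions chosen for illustration; they are not taken from Figure 119.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
AMU = 1.66053906660e-27     # atomic mass unit, kg


def magnetic_rigidity(mass_amu: float, charge_state: int, voltage_v: float) -> float:
    """Non-relativistic magnetic rigidity B*rho (T m) after acceleration through voltage_v."""
    mass_kg = mass_amu * AMU
    charge_c = charge_state * E_CHARGE
    return math.sqrt(2.0 * mass_kg * voltage_v / charge_c)


def relative_rigidity_change(voltage_change: float) -> float:
    """Fractional change of B*rho for a fractional change of the extraction voltage."""
    return math.sqrt(1.0 + voltage_change) - 1.0


if __name__ == "__main__":
    nominal_v = 20e3  # assumed extraction voltage (V), for illustration only
    nominal = magnetic_rigidity(16.0, 6, nominal_v)        # O6+, mass rounded to 16 u
    sagging = magnetic_rigidity(16.0, 6, 0.9 * nominal_v)  # same beam during a 10% sag
    print(f"B*rho at nominal voltage: {nominal * 1e3:.2f} mT m")
    print(f"B*rho during a 10% sag:   {sagging * 1e3:.2f} mT m")
    print(f"relative change: {relative_rigidity_change(-0.10) * 100:.1f} %")
```

Because the rigidity scales as the square root of the voltage, a 10% voltage drop changes Bρ by about 5%, which for a fixed magnet setting is enough to shift the beam on the detector behind the analyzing magnet.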
Another feature that has to be pointed out for operation in pulsed mode is that several ionization processes may be involved. As shown in the above figures, the main ionization proceeds step by step: to obtain O6+, it is first necessary to produce O+, then O2+, etc. Another way to obtain O6+, for example, is to use the shake-off process, in which an electron collision with a neutral atom immediately gives the required charge state. This process has, however, a much lower cross section (about $10^{-3}$ to $10^{-4}$ times) than the step-by-step one. To compensate for this drawback, an ion source using this process would need several kilowatts of rf power at a very high frequency. As shown by Eq. (13), there is a duality between electron density and electron energy: a very dense plasma could be weakly energetic and a weakly dense plasma could be very energetic. A compromise is to run the ion source at as high a microwave frequency as possible, compatible with magnetic confinement. For example, when an ion source is equipped only with axial confinement, a large amount of rf power can be launched into the source within a short time to compensate for the lack of radial confinement. High electron densities are effectively achieved at 37 GHz with 100 kW of rf power. However, as confinement is weak, this ion source mostly relies on the shake-off process and only low charge states are obtained. For example, Skalyga and Zorin (2005) presented intense beams of nitrogen ions (N+, N2+, and N3+) with a CSD peaked on N2+. To reach higher charge states, with a CSD peaked on N3+, they plan to build a source at 75 GHz and 400 kW of rf power within 10 µs.

3. Charge State Distribution

Figure 120 presents typical argon charge state distributions of the GTS-ECRIS at 18 GHz. It is important to notice that, for an optimization on Ar8+, 27% of the total extracted beam is effectively on this charge state, while Ar17+ represents only 0.5% of the whole ion beam, even for an optimization on this charge. This is by far the major drawback of this type of ion source compared, for example, with an Electron Beam Ion Source (EBIS). In an EBIS, the electron energy is well defined since electrons come from a gun, while in an ECRIS, electrons are not monoenergetic and one always talks about an electron distribution function (EDF). In an ECRIS, cold, warm, and hot electrons are present at the same time. Considering the example of argon ions, ionization potentials are about 150 eV for Ar7+ and about 4 keV for Ar16+. The main changes between the CSDs presented in Figure 120 come from a decrease in injected neutral argon and an increase in rf power (by a factor of two) between Ar8+ and Ar17+. Increasing the rf power raises the electron density, but reducing the argon quantity obviously reduces all argon ion intensities. Even if the EDF is not Maxwellian, it nevertheless presents a maximum, which seems to be optimized when the ion source is tuned on Ar8+: this charge state is at the maximum of the CSD.
FIGURE 120. Charge state distributions given by GTS 18 GHz for different optimizations. The arrow indicates the optimized charge and the percentage is the relative intensity of this charge among all ions extracted from the plasma.
Things are different for Ar12+ and become worse for higher charge states: the maximum of the CSD hardly reaches 13+ or 14+ and intensities rapidly decrease, since the argon valve is almost closed to produce as much Ar17+ as possible. Tricks are then used: mixing with a lighter gas, LaB6 cathodes (Xie, 1998), etc. As shown previously, adding another element such as oxygen leads to the production of high currents of unwanted ions (O+, O2+, etc.), which increases the overall extracted beam current. Nevertheless, using an electron gun would necessitate placing this gun inside the plasma chamber in such a
way that electrons would follow the magnetic field lines and would not immediately go to the wall. One possible way to inject cold electrons is to install a helicon at the plasma chamber entrance on the source axis. This cathodeless gun would avoid the limited lifetime of any cathode (about 5000 hours according to Chang and Sze, 1997). Softer methods use liners covering the plasma chamber wall, these tubes being made of aluminum (Nakagawa et al., 1996), quartz, MgO, or metal dielectric (Schachter et al., 1998), that is, any material with high secondary electron emission. This raises the question of the lifetime of this technique in long-term operation. In addition, for ion sources dedicated to carbon therapy, such a liner is of no use at all, as it would be rapidly covered with carbon neutrals.

Apart from these useful methods to enhance ECRIS efficiency, another possibility would be to optimize the energy of the electrons that are already inside the ion source (from the wall and from all ionizations). Several bremsstrahlung measurements showed that many electrons may reach energies of several hundred keV, or even MeV. This is, for example, the case when the ion source presents a minimum-B magnetic field (flat or not) close to the resonance value (Alton and Smithe, 1995; Lyneis et al., 2006) or when its magnetic gradient at resonance is weak. These electrons do not contribute efficiently to the ionization process since, to produce Ar17+ for example, the optimum electron energy is about 15 keV. Therefore, it would be useful to slow down these electrons, which are very well confined. One possible technique is to place obstacles in their way. Of course, such electron temperature limiters, as shown in Figure 121, must be placed experimentally in such a way that they intercept some hot electrons without melting. This figure presents two types of limiter for an ECRIS equipped with a hexapole, but one can easily imagine several limiters with different sizes
FIGURE 121. Electron temperature limiters.
placed at the plasma edge. In that case, hot electrons impinging on these limiters may also produce slow secondary electrons.

4. Ion Beam Shape

Among all the ECRISs running around the world, more than 90% use a hexapole for radial confinement. As a consequence, the resulting plasma shape gives a three-branch star at both ends of the source, as shown in Figure 122. While this is not a problem at the injection side, this shape is a problem for ion extraction. Indeed, as a first approximation, it could be thought that, since the extraction aperture lies inside the star, the ion beam would be homogeneous. But Figure 123 shows that this is not the case. Beam images were taken with a CCD camera connected to the beam line of a 10 GHz Caprice ECRIS (Hitz et al., 2004a). It is now established that multiply charged ions are produced on the source axis, where the hottest plasma is, while medium charges are mostly off axis. This may explain the difference in beam profile between O5+ and O7+. O5+ is more influenced by the nonsymmetric radial magnetic field and gives a beam with three components correlated with the three branches of the magnetic field lines.

As the hexapolar configuration for the radial field is, up to now, the best compromise in terms of both particle drift inside the plasma and magnetic field strength, it is of primary importance to compensate for this beam defect. Two solutions are possible, either before or after the extraction electrode. Finding a solution after the extraction electrode means installing, for example, an active correction achieved by two hexapole lenses, horizontal and vertical, as proposed by Spädtke et al. (2005).
FIGURE 122. Magnetic field lines and plasma shape.
FIGURE 123. Beam images of various ions extracted from a 10 GHz Caprice ECRIS.
Finding a solution before the extraction electrode means giving the plasma a cylindrical shape. This can only be achieved by minimizing the influence of the hexapole in the vicinity of the extraction. Attempts have been made using a plasma electrode partially made of iron. This small iron piece short-circuited the part of the hexapole situated in the extraction region. This was not dramatic for the plasma confinement and gave encouraging results (Xie, 1998; Drake and Schmelzbach, 2002). For example, experiments performed at the 10 GHz Caprice source installed at the Paul Scherrer Institut showed that a small iron piece inserted inside the source dramatically reduced the beam emittance of Ne6+ (Figure 124). However, instead of making a magnetic short circuit, another solution would consist of having a radial magnetic system with a hexapolar shape where strong confinement is needed (at the source center) and a dodecapolar shape near the extraction. With permanent magnets assembled in a Halbach configuration, this seems not to be a problem: the 24 or 36 poles constituting a hexapole can also make a dodecapole (Figure 125). Of course, things are more complicated for a fully superconducting device, and the simpler way is to optimize the extraction system to compensate for any beam inhomogeneity.
FIGURE 124. Paul Scherrer Institut Caprice 10 GHz. Emittance pattern of a 20Ne6+ beam at an extraction voltage of 4 kV. Left: normal magnetic configuration. Right: the hexapole field at the source extraction is attenuated by an iron ring.
FIGURE 125. From hexapole to dodecapole.
5. Plasma Electrode Position

It has been experimentally shown that the optimum plasma electrode position inside the plasma chamber is not exactly the same for extracting highly charged ions like Ar17+ as for medium charges like Ar8+. Jacquot and Pontonnier (1991) noted that “an optimization of medium charge states requires that the extraction orifice penetrates the harmonic resonator.” More recently, Higurashi et al. (2003) presented similar results where, to produce medium charge states, the plasma electrode must be pushed closer to the plasma than for highly charged ions. These authors suppose that the plasma electrode position affects not only the plasma confinement but also the plasma density and the beam extraction conditions. As ion production in an ECRIS is mainly achieved by a step-by-step process, it is natural to think that very high charge states require better confinement than medium charge states. As this confinement is related to the last closed
magnetic surface $B_{\mathrm{last}}$, it is also natural to place the plasma electrode where $B_{\mathrm{last}}$ is maximum to get intense beams of higher charge states. On the other hand, it is also recognized that higher charge states are situated closer to the source axis than medium charges. As the plasma electrode aperture is fixed, moving it closer to the plasma means that more magnetic flux tubes can be intercepted by the electrode aperture (see Figure 122), and more medium charges can then be captured.

As it is difficult to move the plasma electrode while the source is running, one solution consists of using a rather large aperture compatible with the low energy beam transport (LEBT) (pullers, lenses, etc.). For example, if a plasma electrode with an aperture of 12 mm diameter (or more) is employed, intense beams of medium charges could be obtained. Furthermore, this large aperture is also advantageous for very high charges (even if a 6 mm diameter would be enough), as it facilitates the efficient pumping necessary to minimize charge exchange with neutrals. If the LEBT only permits small plasma electrode apertures, then another solution has to be found to optimize this electrode position without opening the ion source. As the difference in position could be less than 10 mm between tunings on Ar8+ and Ar18+, an elegant way is to keep the plasma chamber in place and slide the whole magnetic system along the plasma chamber, as proposed by Drentje et al. (1995). However, this solution by itself keeps the distance between plasma electrode and puller constant; and, as emittances differ between charge states, it would also be useful to tune this distance. Movable extraction pullers are available on the market, but are not very satisfactory because they may seize up when hot. Therefore, to have this additional tuning, instead of moving the puller, it may be better to move the entire ion source with respect to the extraction tank, where the puller is generally fixed. Moving an ion source by a few mm seems not to be difficult, as the only external connection that has to be considered is the waveguide: even if they have a high VSWR, flexible waveguides could be used for this purpose.

G. Conclusion

Magnetic confinement, heating frequency, microwave coupling, and plasma chamber cooling are the four major parameters of an efficient ECRIS that have to be advantageously combined in any source design. Table 24 gives some solutions that could be considered to get an ion source as flexible as possible. If high-power microwave transmitters are available, magnetic confinement can be slightly reduced while still reaching a good electron density, but use of
TABLE 24. Some solutions for a well-performing ECRIS

| ECRIS parameter | Solution |
|---|---|
| Magnetic confinement | $B_{\mathrm{inj}}/B_{\mathrm{ecr}} \sim 4$; $B_{\mathrm{ext}}/B_{\mathrm{rad}} \sim 1$; $B_{\mathrm{min}}/B_{\mathrm{rad}} \sim 0.30$ to $0.45$; $B_{\mathrm{rad}}/B_{\mathrm{ecr}} \sim 2$; $B_{\mathrm{last}}/B_{\mathrm{ecr}}$ 2. Magnetic gradient at resonance: strong, to avoid too hot electrons. Resonance zone: as long and large as possible, compatible with the above parameters. |
| Microwave launching | Electric field maximum on source axis. Rectangular waveguide: TE10, TE1,n,p. Circular waveguide: TE11, HE11, or quasi-optical coupling. |
| Beam extraction | Plasma electrode position: movable magnetic system (axial and radial) for tuning either on medium or high charge states. Distance between plasma electrode and puller: whole source on rolling carriage to move it from fixed extraction tank. |
this method leads to several other difficulties such as instabilities, plasma chamber cooling, and X-ray protection. By far, very good confinement is recommended for any source purpose, in either continuous wave or pulsed mode. Magnetic confinement now being well understood, the major problem that remains to be solved in ECRIS lies in the shape of the EDF. Moreover, work still has to be carried out to optimize the microwave coupling efficiency, as presented by Gammino et al. (2006).

To finish these general considerations on source design, Table 25 presents a list of present-day ECRISs ranked according to their production of Xe20+. This species has been chosen since it is the charge state most commonly tested on the various sources: it is a medium charge, needing good confinement, but it does not need efficient pumping to avoid charge exchange. This table shows, for example, the large technical step required to gain a factor of ten in intensity (from Sophie to Secral). After these general considerations on source design, the last section presents some possible uses of ECRIS apart from nuclear or atomic physics.
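The ratios of Table 24 become field values once a heating frequency is chosen. The sketch below is only an illustration: it assumes the cold, non-relativistic resonance condition B_ecr = 2πf·m_e/e and treats the "∼" entries of the table as exact factors. The function names are hypothetical, and B_last is left out because its entry in the table is incomplete.

```python
import math

E_CHARGE = 1.602176634e-19        # elementary charge, C
ELECTRON_MASS = 9.1093837015e-31  # electron rest mass, kg


def ecr_field(frequency_hz: float) -> float:
    """Resonance field B_ecr (T) for a given microwave frequency (non-relativistic)."""
    return 2.0 * math.pi * frequency_hz * ELECTRON_MASS / E_CHARGE


def table24_fields(frequency_hz: float) -> dict:
    """Field values (T) implied by the Table 24 ratios, taken here as exact factors."""
    b_ecr = ecr_field(frequency_hz)
    b_rad = 2.0 * b_ecr                 # B_rad / B_ecr ~ 2
    return {
        "B_ecr": b_ecr,
        "B_inj": 4.0 * b_ecr,           # B_inj / B_ecr ~ 4
        "B_rad": b_rad,
        "B_ext": 1.0 * b_rad,           # B_ext / B_rad ~ 1
        "B_min_low": 0.30 * b_rad,      # B_min / B_rad ~ 0.30 ...
        "B_min_high": 0.45 * b_rad,     # ... to 0.45
    }


if __name__ == "__main__":
    for f_ghz in (14.0, 18.0, 28.0):
        fields = table24_fields(f_ghz * 1e9)
        summary = ", ".join(f"{name} = {value:.2f} T" for name, value in fields.items())
        print(f"{f_ghz:.0f} GHz: {summary}")
```

At 28 GHz the resonance field is close to 1 T, so these ratios call for about 4 T at injection and about 2 T radially, which is consistent with the 28 GHz entries of Table 25 being fully superconducting sources.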
TABLE 25. Top 15 high-frequency ECRISs classified by their performance on one medium charge state. Some sources may be missing in this classification (e.g., AECR-U), probably because no intensities were published for this charge.

| Source name | Year | Source type | Xe20+ intensity (µA) |
|---|---|---|---|
| Secral 18 GHz (Sun, 2006) | 2006 | fully SC | 480 |
| Serse 28 GHz (Hitz et al., 2000a) | 2000 | fully SC | 390 |
| Venus 28 GHz (Leitner et al., 2005) | 2005 | fully SC | 320 |
| GTS 18 GHz (Hitz et al., 2004a) | 2003 | RT coils + PM | 310 |
| RIKEN 18 GHz (Nakagawa et al., 1998) | 1996 | RT coils + PM | 300 |
| GTS 14 GHz (Hitz et al., 2004a) | 2003 | RT coils + PM | 240 |
| Venus 18 GHz (Leitner and Lyneis, 2005) | 2004 | fully SC | 164 |
| LECR3 14 + 10 GHz (Sun et al., 2005) | 2004 | RT coils + PM | 160 |
| Serse 18 GHz (Gammino et al., 1999) | 1997 | fully SC | 137 |
| PECRIS V 18 GHz (Müller et al., 2002) | 2002 | RT coils + PM | 95 |
| Caprice 14 GHz (Hitz et al., 1995) | 1995 | RT coils + PM | 80 |
| Shiva 14 GHz (Kurita et al., 2000) | 2000 | LTSC coils + PM | 56 |
| Sophie 13.5 GHz (Meyer et al., 2006) | 2004 | fully PM | 50 |
| Caprice 10 GHz (Jacquot et al., 1988) | 1988 | RT coils + PM | 35 |
| PKDELIS 18 GHz (Kanjilal et al., 2005) | 2004 | HTSC coils + PM | 28 (Xe21+) |
VI. INDUSTRIAL APPLICATIONS

As ECRIS can be widely used for various applications, this section is not going to discuss all possible uses. First some aspects of ion implantation are presented, and then a possible use of ECR plasma in microelectronics is shown.
A. Implantation

It is well known that ion implantation is used for surface modification or surface diagnostics. The ion energy range for this application is rather wide: it can be almost zero, when a multiply charged ion approaches a surface without penetrating it, or it can be several hundred keV. Interactions of multiply charged ions with surfaces have been extensively studied (Delaunay et al., 1985; Meyer et al., 1987). For example, when a very slow multiply charged ion arrives close to a metal surface, it takes a great number of electrons from this surface in a very short time (about $10^{-16}$ s). As a metal surface is an infinite reservoir of electrons, the multiply charged ion can capture as many electrons as needed for its complete neutralization. These electrons are captured in highly excited states while the inner shells remain empty. These highly excited hollow atoms then return to their ground state by Auger cascade, that is, through spontaneous ionizations (Briand et al., 1990; Briand, 1996). After a large number of captures and self-ionizations, a multiply charged ion can extract a large number of electrons from the metallic surface: for example, an ion of charge 80+ can extract up to 280 electrons.

When approaching a surface, a singly charged ion interacts at the surface: it exchanges one electron with one atom of this surface, as in a chemical reaction, whereas a multiply charged ion interacts with the surface without touching it and takes a large number of electrons. If the surface is metallic, as it is an infinite electron reservoir, holes are immediately refilled and there is no permanent modification of the surface. However, when the surface is dielectric (insulator or semiconductor), a multiply charged ion extracts a large number of bound electrons in a very short time. During this short period, the ionized atoms of the surface stay in place and permanently modify the surface structure, either by Coulomb explosion (a large number of atoms are ejected from the surface) or by spontaneous readjustment of the surface.

Coulomb explosion, also called potential sputtering, is a very selective process, as the number of atoms extracted from the surface depends on their binding energy within this surface. For example, SiO2 can easily be sputtered, whereas this process is more difficult for pure silicon: it is very easy to rapidly clean an oxidized silicon wafer with this method, as sputtering stops by itself as soon as the SiO2 has disappeared. Regarding spontaneous readjustment of the atoms on the surface, it could lead either to a local amorphous crystalline diamond surface, to the creation of diamond nanocrystals on a graphite surface (Meguro et al., 2001), or to local surface reconstruction. For example, during interaction between multiply charged ions and Si, dots of some tens of nm can appear on the surface. This method could be an alternative to electron bombardment, where electron beam
size defines the etching resolution, while a fully stripped ion having a size of $10^{-6}$ nm can lead to an etching of some nm (Briand et al., 1996).

Another example that could be considered is aluminum surface treatment and stabilization. Destabilization of an oxide layer under a nitrogen atmosphere is difficult since the nitrogen molecule, with its triple bond, is very stable. The dissociation energy of N2 is 9.75 eV, while it is 5.18 eV for O2. Any atmosphere containing both oxygen and nitrogen will lead to a preferential dissociation of oxygen, whatever the temperature. The opposite phenomenon can occur only for atmospheres with a very low percentage of oxygen and/or with high nitrogen pressures. Once nitrogen is dissociated, a nitrogenation reaction (and/or deoxidation) can be achieved. A nitrogenized surface is very stable against oxidation. Implanting atomic nitrogen (hence dissociated) on an AlN surface would enable atomic nitrogen to be close to oxides so that it could be dissociated. In addition, this implantation may increase the mesh size and decrease binding forces between species on surfaces. With energies below 100 keV, it is possible to implant nitrogen in surfaces to a depth of about 300 nm. As an ECRIS can easily produce N5+ or N6+, a source biased at only 20 kV can do the job and create a very good AlN layer. In addition, as an ECRIS can produce all nitrogen charge states at the same time, it is easy to imagine, for this type of surface treatment, using the whole beam exiting the source (without any mass separator). In this way, with the ion source biased at +20 kV, it is possible to implant nitrogen ions at 20 keV with N+, 100 keV with N5+, and 140 keV with N7+, giving a large and efficient layer of AlN.

Another example of ion implantation is the fabrication of integrated optical systems by implantation of multiply charged ions in silicon. Ion implantation is in fact a general method to create optical waveguides in various materials. It consists of creating a large layer whose refractive index is smaller than that of the material. Acting as an optical barrier, this layer can confine the light between the surface and the layer itself. This technique is often used to fabricate waveguides with weakly charged ions (H+, He+, B+, N+, O+, O5+, C4+, etc.) in several materials such as Si, GaAs, etc. As a first step in producing a laser microsource with multiply charged ions, it is possible to implant Geq+ in SiO2. Then, implantation of Er ions would give some active properties to Ge:SiO2, as Er ion emission coincides with the transmission window at 1.5 µm. Finally, Yb implantation in this system would enlarge absorption cross sections as well as amplification performance. Figure 126 gives an example of Ta19+ implantation in glass: a 75 nm-thick Ta layer is formed at a depth of about 75 nm. A similar layer can be formed with singly charged Ta ions but, in that case, a 400 kV implanter is needed. Use of multiply charged ions would considerably reduce implanter size.
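The implantation energies quoted above all follow from the same relation: an ion of charge state q accelerated through a potential V gains an energy E = qV, so that E in keV equals q times V in kV. A minimal sketch reproducing the nitrogen and tantalum figures (the function names are hypothetical):

```python
def implantation_energy_kev(charge_state: int, voltage_kv: float) -> float:
    """Kinetic energy (keV) gained by an ion of given charge state through voltage_kv."""
    return charge_state * voltage_kv


def voltage_for_energy_kv(energy_kev: float, charge_state: int) -> float:
    """Accelerating voltage (kV) needed to reach energy_kev with a given charge state."""
    return energy_kev / charge_state


if __name__ == "__main__":
    # Nitrogen beam extracted at 20 kV without mass separation.
    for q in (1, 5, 7):
        print(f"N{q}+ at 20 kV: {implantation_energy_kev(q, 20.0):.0f} keV")

    # Ta19+ at 20 kV (Figure 126) and the voltage a singly charged ion would need.
    ta_energy = implantation_energy_kev(19, 20.0)
    print(f"Ta19+ at 20 kV: {ta_energy:.0f} keV")
    print(f"Ta+ needs {voltage_for_energy_kv(ta_energy, 1):.0f} kV for the same energy")
```

The last line shows why a singly charged beam calls for an implanter in the 400 kV class, whereas the multiply charged beam reaches the same energy with a 20 kV platform.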
FIGURE 126. Glass sample irradiated by Ta19+ at 20 kV extraction, i.e., 380 keV.
Apart from the use of ions delivered by an ECR ion source, it is also possible to use the photons that are also produced by this machine. The next section presents a new concept of a photon lithography machine.

B. Photon Lithography

Lithography is the fabrication step that determines the size of the components used in microelectronics. There are several reasons to decrease the size of transistors and interconnections. This increases the number of transistors per component (memory, processor, etc.). It also increases the operating speed while decreasing power consumption. Lithography can now achieve a node size of about 90 nm and semidense lines of 60 nm (a node corresponds to the half pitch). In lithography, steppers are termed 248, 193, 157, and 193i. These numbers give the wavelengths used, in nanometers. 248 has been in industrial production for several years. 193 is the best one up to now, while 157 is a technology under development, which may be replaced by 193i. This latter technique uses water, which gives n(193) = 1.44. This means that the wavelength is 1.44 times smaller, that is, 134 nm. This 193i method may give a node
size of about 45 nm. The question now is how to get smaller nodes such as 32, 22, or 16 nm. There is no excimer laser with a wavelength smaller than 130 nm. X-rays have been investigated; however, there were technical problems during mask fabrication. Electron lithography (called e-beam) can give a resolution down to 10 nm; however, it is a very slow technique, hardly suited to industry. It actually requires several hours to several days per wafer, while the optical technique delivers 80–100 wafers per hour. Another technique has to be found to reach nodes of 32 nm and below.

EUV lithography remains an optical technique requiring a photon source, optical systems, masks, etc.; 13.5 nm can be chosen since mirrors exist at this wavelength. However, compared with other optical techniques, EUV lithography at 13.5 nm presents a major difference, as it requires the equipment to be installed under vacuum. So far, most EUV photon sources considered are either laser-produced plasmas (LPP), gas discharge plasmas (GDP), or synchrotrons. The following main properties are required for this type of source:

1. Great light power: more than 100 W at 13.5 nm is required at the source output.
2. Small source spot size.
3. Efficiency: it needs a low driving power to minimize costs. It also must minimize the thermal effects to extend optics lifetime.
4. Very good stability.
5. High repetition rate.
6. Last but not least, the source must be clean.

LPP, GDP, and synchrotrons work in pulsed mode, which is why a high repetition rate is required to achieve a high production volume. Moreover, LPP and GDP also produce debris that could affect the optics and wafer cleanliness. More information about plasma sources for EUV lithography can be found in Banine and Moors (2004), Stamm (2004), Silverman (2005), and Bakshi (2006). Most EUV photons at 13.5 nm come from the decay of excited states of multiply charged ions such as O6+, Xe10+, or Sn7+ to Sn10+. Because it has several charge states able to produce 13.5 nm light by decay of excited states, Sn is favored in EUV source research, but its drawback is that it may produce a lot of debris when a laser is shot on a tin target.

In the following, the pros and cons offered by ECR plasma in this domain of the EUV light source are given. After a presentation of some orders of magnitude, some results are shown.
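As a quick check of the wavelengths discussed in this section, the sketch below evaluates the effective wavelength of the 193i technique in water (λ/n) and the photon energy hc/λ for the stepper wavelengths and for 13.5 nm. The constant hc ≈ 1239.84 eV nm is standard; the function names are hypothetical.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm


def wavelength_in_medium_nm(vacuum_wavelength_nm: float, refractive_index: float) -> float:
    """Effective wavelength (nm) inside a medium of given refractive index."""
    return vacuum_wavelength_nm / refractive_index


def photon_energy_ev(vacuum_wavelength_nm: float) -> float:
    """Photon energy (eV) for a given vacuum wavelength (nm)."""
    return HC_EV_NM / vacuum_wavelength_nm


if __name__ == "__main__":
    print(f"193i in water (n = 1.44): {wavelength_in_medium_nm(193.0, 1.44):.0f} nm")
    for wavelength in (248.0, 193.0, 157.0, 13.5):
        print(f"{wavelength:6.1f} nm -> {photon_energy_ev(wavelength):6.2f} eV")
```

The 13.5 nm photon carries roughly fifteen to twenty times the energy of the photons used in the excimer-based steppers, which is why its production relies on excited states of multiply charged ions rather than on a laser transition.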
1. Orders of Magnitude

The energy corresponding to 13.5 nm is about 100 eV; therefore an EUV source has to produce electrons having this energy, since excitations come from electron impact, which is the case in ECR plasmas. Moreover, ions are not necessarily hot, and strong ion confinement is not necessary to produce intense EUV light. Even if many publications showed that a tokamak produces EUV light, such an expensive machine cannot be used as a tool for lithography. In Section IV, we have shown that ECR plasma produces EUV light. Figure 127 presents, for example, light emitted by the oxygen plasma of the Caprice ECRIS. This shows that high oxygen charge states are able to produce light at 13.5 nm.

Let us consider, as an example, the O VI line at 15 nm [$1s^2(^1S)\,2s$ – $1s^2(^1S)\,3p$]. The power emitted in this line can be estimated in the following way: the excited state is mostly produced by electron collision. As an order of magnitude, it can be supposed that this state decays only through this line (which is, of course, not correct). Then the power $P$ emitted at 15 nm (i.e., photon energy $h\nu$) by a plasma having a volume $V$, an electron density $n_e$, an ion
FIGURE 127. Multiply charged ions extracted from Caprice source and corresponding EUV light.
density of O VI $n_i$, and a temperature $T_e$ is

$$P = n_e\, n_i\, \langle \sigma v \rangle_{\mathrm{ex}}\, (h\nu)\, V. \tag{54}$$
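As a numerical sketch of Eq. (54), the snippet below evaluates it for the order-of-magnitude values used in this estimate ($n_e \approx n_i \approx 10^{11}$ cm$^{-3}$, $\langle \sigma v \rangle_{\mathrm{ex}} \approx 3 \times 10^{-10}$ cm$^3$/s, $V \approx 1$ L, $h\nu \approx 100$ eV). The function name is hypothetical and the result carries no more precision than its inputs.

```python
EV_TO_J = 1.602176634e-19  # joules per electronvolt


def line_power_w(n_e_cm3: float, n_i_cm3: float, rate_cm3_s: float,
                 photon_energy_ev: float, volume_cm3: float) -> float:
    """Power (W) radiated in one line according to Eq. (54): P = n_e n_i <sigma v> (h nu) V."""
    excitations_per_s = n_e_cm3 * n_i_cm3 * rate_cm3_s * volume_cm3
    return excitations_per_s * photon_energy_ev * EV_TO_J


if __name__ == "__main__":
    power = line_power_w(
        n_e_cm3=1e11,            # electron density
        n_i_cm3=1e11,            # O VI ion density, taken equal to n_e
        rate_cm3_s=3e-10,        # excitation rate coefficient <sigma v>
        photon_energy_ev=100.0,  # photon energy, rounded
        volume_cm3=1000.0,       # plasma volume of 1 L
    )
    print(f"P = {power * 1e3:.0f} mW")
```

Since the power scales with the product $n_e n_i$, a tenfold increase in both densities would raise this estimate by a factor of one hundred.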
Excitation cross sections and excitation rates can be found in several tables. Taking into account the experimental data for a typical ECRIS, $n_e \sim 10^{11}$ cm$^{-3}$; $n_i \sim 10^{11}$ cm$^{-3}$; $T_e \sim 40$–$100$ eV; $V \sim 1$ L; $h\nu \sim 100$ eV; $\langle \sigma v \rangle \sim 3 \times 10^{-10}$ cm$^3$/s, one obtains $P \sim 50$ mW. This rough estimate gives a starting point for the power that can be emitted by ECR plasma. Even if some approximations have been made (e.g., $n_e = n_i$), such a plasma may be used as an alternative, as shown by Hahto et al. (2005).

2. Experimental Results

To verify the above estimate, an existing ECR ion source has been used to measure the light emitted by its plasma. The source used for this experiment is Sophie, as described previously. A sketch of the experiment is shown in Figure 128 and Figure 129. Part of the light emitted by the ECR plasma goes through a collimator, is then reflected by a MoSi multilayer mirror, and arrives at a diode after passing through a final Zr filter that selects only the 13.5 nm wavelength. Of course, in such a simple experiment, only a small fraction of the light arrives at the detector. Even if Snq+ ions are known to be much more efficient for the production of 13.5 nm light, Xe10+ ions are employed in this experiment for the sake of simplicity. Figure 130 (Hitz et al., 2004b) presents the power emitted by this source as a function of rf power. It also shows the Xe10+ evolution. Both curves
FIGURE 128. Principle of an ECR photon source.
FIGURE 129. Detection of EUV light with Sophie: upper half-view of the experiment.
FIGURE 130. Power emitted by a small permanent magnet ECRIS at 13.5 nm.
are correlated, which shows that the detected light comes from the decay of excited Xe$^{10+}$ ions. Such a very compact machine cannot be used in the microchip
industry; nevertheless, its low cost may allow its use in other applications such as metrology or EUV microscopy. Compared with other types of EUV light sources, an ECR plasma presents major advantages: it is filamentless and can therefore run without interruption or maintenance for several months. In addition, it works in continuous mode and can therefore increase the fabrication rate. One drawback, however, is its plasma size, since lithography tools now require small spots to irradiate wafers in stepper machines. But instead of sending a small homogeneous EUV spot onto a wafer and sweeping it over the whole wafer, why not consider a larger (still homogeneous) EUV beam able to irradiate the whole wafer at once? Let us now see how to produce intense light power at 13.5 nm, still in a small spot, while keeping in mind that a change of species may allow the use of other wavelengths if necessary.

3. Powerful ECR Light Source

EUV lithography manufacturers plan to process about 100 wafers per hour, with a wafer diameter of 300 mm (12 in). The lithography technique known as “step and scan” scans the image of a mask pattern across a portion of the wafer and then moves to a new location to repeat the process. A rate of 100 wafers per hour means 36 s per wafer; since the time needed for all positioning steps is about 27 s, 9 s remain to irradiate the wafer. If the photoresist used for printing the mask pattern onto the wafer has a sensitivity of 5 mJ/cm$^2$ and if the printed surface is 82% of the wafer area, the required photon energy is 2.9 J per wafer delivered in 9 s, that is, about 0.32 W. Taking into account all losses due to mirror and mask reflectivity, the required power at the so-called intermediate focus is about 115 W. For the sake of simplicity, let us take the required output power at the intermediate focus to be 100 W, that is, 100 J/s. The energy of one photon at 13.5 nm is $hc/\lambda = 1.5 \times 10^{-17}$ J.
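The dose arithmetic of this paragraph can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the function names are illustrative, the wafer is treated as a full disc 30 cm in diameter, and the inputs are the figures quoted in the text (100 wafers per hour, 27 s of positioning, 82% printed area, 5 mJ/cm$^2$ resist sensitivity).

```python
import math

PLANCK_J_S = 6.62607015e-34      # Planck constant (exact SI value)
LIGHT_SPEED_M_S = 299_792_458.0  # speed of light (exact SI value)


def exposure_budget(wafers_per_hour, positioning_s, wafer_diameter_cm,
                    printed_fraction, resist_mj_cm2):
    """Return (exposure time in s, energy per wafer in J, mean power in W)."""
    time_per_wafer_s = 3600.0 / wafers_per_hour
    exposure_s = time_per_wafer_s - positioning_s
    printed_area_cm2 = printed_fraction * math.pi * (wafer_diameter_cm / 2.0) ** 2
    energy_j = printed_area_cm2 * resist_mj_cm2 * 1e-3  # mJ -> J
    return exposure_s, energy_j, energy_j / exposure_s


def photon_energy_j(wavelength_m):
    """Photon energy h*c/lambda in joules."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m


exposure_s, energy_j, power_w = exposure_budget(100, 27.0, 30.0, 0.82, 5.0)
e_photon = photon_energy_j(13.5e-9)
print(f"{exposure_s:.0f} s exposure, {energy_j:.2f} J per wafer, {power_w:.3f} W")
print(f"photon energy at 13.5 nm: {e_photon:.2e} J")
```

The result is close to 2.9 J per wafer and roughly 0.32 W at the wafer, with a photon energy just under $1.5 \times 10^{-17}$ J; the much larger figure at the intermediate focus comes from the optical losses, which this sketch does not model.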
Within 1 s, the source must therefore deliver $100/(1.5 \times 10^{-17}) = 6.7 \times 10^{18}$ photons. A priori, only LPP and GDP sources can produce such a number of photons, since they work at very high density. Let us see, however, how an ECR plasma can reach this photon yield. In the permanent magnet source described above, operated at 14.5 GHz, the electron density is $n_e \sim 10^{11}$ cm$^{-3}$. By plasma electroneutrality, the ion densities $n_{i,q+}$ satisfy $n_e = \sum_q q\, n_{i,q+}$. The density of Xe$^{10+}$ is about 10% of $n_e$, i.e., $10^{10}$ ions/cm$^3$. Sophie's plasma volume is about 50 cm$^3$, so such a source could contain $5 \times 10^{11}$ Xe$^{10+}$ ions. This value is very far from the $\sim 10^{19}$ required photons! But the power emitted by the ion source has been measured to be about 100 mW (in $2\pi$ sr), which means that Sophie delivers about $6 \times 10^{15}$ photons/s.
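The photon budget above can be checked with a short calculation. The sketch below is only illustrative: the helper names are assumptions, the photon energy is the rounded $1.5 \times 10^{-17}$ J used in the text, and the last quantity (photons emitted per ion per second) is simply the ratio of the two numbers quoted for Sophie.

```python
PHOTON_ENERGY_J = 1.5e-17  # energy of one 13.5 nm photon, rounded as in the text


def photons_per_second(power_w, photon_energy_j=PHOTON_ENERGY_J):
    """Photon rate corresponding to a given optical power."""
    return power_w / photon_energy_j


def ion_inventory(n_e_cm3, ion_fraction, volume_cm3):
    """Number of ions of one charge state held in the plasma volume."""
    return n_e_cm3 * ion_fraction * volume_cm3


required_rate = photons_per_second(100.0)  # 100 W at the intermediate focus
measured_rate = photons_per_second(0.1)    # 100 mW measured with Sophie
xe10_ions = ion_inventory(n_e_cm3=1e11, ion_fraction=0.10, volume_cm3=50.0)
photons_per_ion_s = measured_rate / xe10_ions

print(f"required: {required_rate:.1e} photons/s")
print(f"measured: {measured_rate:.1e} photons/s")
print(f"Xe10+ ions in the plasma: {xe10_ions:.1e}")
print(f"photons per ion per second: {photons_per_ion_s:.1e}")
```

Each ion therefore has to be excited on the order of $10^4$ times per second to account for the measured power, far more often than the ion inventory alone would suggest.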
FIGURE 131. Argon ions produced by GTS ECRIS.
In fact, the difference between the number of Xe$^{10+}$ ions and the number of emitted photons arises because ions have long lifetimes in an ECR plasma. They cross the plasma several times and can therefore be used many times to produce photons by electron impact excitation. As shown by Sasaki (2003), the 13.5 nm line of Xe$^{10+}$ originates from the 5p–4d transition, whose decay rate is about $10^{11}$ s$^{-1}$. During its lifetime (about 10 ms), one ion can undergo many excitation/deexcitation cycles. This is a major advantage of ECR plasmas over other processes such as LPP or GDP. In addition, working at lower densities considerably reduces debris: it is well known that an ECRIS, if correctly designed, produces nothing other than the desired ions. Figure 131 is a clear demonstration of the cleanliness of an ECR plasma. Working with metallic elements such as Sn is not a problem: several techniques are now widely used to produce metal vapor, depending on the element (oven, sputtering, MIVOC).

Now comes the question of the source repetition rate. LPP and GDP sources must work at a high repetition rate, as shown in Figure 132. During each short shot, these sources deliver a great number of photons. However, between two discharges there is some time during which no photons are produced at all, even though the decay time of laser-produced plasmas is longer than the shot itself. An ECR plasma does not have this drawback, as it can work in continuous mode.
FIGURE 132. Difference between LPP or GDP and ECR plasma in terms of photon production.
Thus, even if such a plasma produces fewer photons than LPP and GDP in 1 ns, it can produce the same quantity of photons per second, while also minimizing thermal effects. Several factors therefore contribute to the ability of ECR plasmas to produce intense photon power:

1. ECR plasmas work in continuous wave.
2. As the electron and ion densities rise with the heating frequency $f$, each density scaling as $f^2$, the number of photons increases with the fourth power of the rf frequency, and a powerful photon source could operate at 50 GHz or more. A plasma having the same size as Sophie's, but created at 50 GHz instead of 13.5 GHz, leads to a gain of about 200 ($\approx (50/13.5)^4$) in the product of the densities, and hence to about 20 W of power at 13.5 nm, while using Xe$^{10+}$ only.
3. The electron density rises with the rf power, up to a limit set by plasma instabilities (Hitz et al., 2000b; Girard et al., 2004). Only 200 W of microwave power was necessary to produce 100 mW of light power at 13.5 nm with Sophie. Today, microwave generators can deliver more than 10 kW. A gain of just 20 in electron density would then lead to 400 W of photon power. Apart from plasma instabilities, the upper microwave power limit would also be set by the cooling capacity of the plasma chamber (a large-diameter chamber would be cooled more efficiently than a small one). At this stage, the required photon power could be achieved by a “standard” ECR plasma, but not the required photon source size. Fortunately, several parameters can still be adjusted to fulfill all requirements.
4. As the ions that preferentially give a 13.5 nm line are medium charged, it is not necessary to obtain confinement as strong as for fully stripped Ar or Kr. A radial magnetic field of 2.5–3 T at the plasma chamber wall would certainly be enough at 50 GHz. Such a radial magnetic field configuration would nevertheless lead to a rather large plasma diameter. Therefore, to meet the photon source size requirement, soft iron plugs could be added locally inside the plasma chamber, as already done in some ECRISs (Caprice, AECR-U, GTS).
5. Unlike an ECRIS, an ECR photon source must not let ions leave the plasma. Typically, an ECRIS has an ion leak of about $10^{14}$ ions/s, and all escaping ions are lost for photon production. An ideal axial magnetic confinement profile would therefore be a high-B mode on both sides of the source.
6. During the experiment presented above, one ion gave about $10^4$ photons per second ($6 \times 10^{15}$ photons/s from $5 \times 10^{11}$ ions). This value can be enhanced if the electron energy distribution function is centered on the energy needed to produce a high excitation rate by electron–ion collisions.
7. External and internal electron sources (e.g., helicon type) can be added.
8. O$^{q+}$, Xe$^{q+}$, and Sn$^{q+}$ are known to yield photons at the desired wavelength. With an ECR plasma, it is not difficult to use these elements at the same time, and one can imagine a photon source with as many gas and/or metal inputs as wanted.
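Two simple relations underlie the frequency arguments in the list above and can be checked numerically: the $f^2$ scaling of each density (hence $f^4$ for the photon yield) and the electron cyclotron resonance condition $B = 2\pi f m_e / e$. The sketch below is illustrative only: the function names are assumptions, the resonance formula is the non-relativistic one, and the 13.5 GHz reference frequency is the value quoted in the scaling item (with the 14.5 GHz operating frequency quoted earlier for the same source, the $f^4$ factor would be about 140 rather than about 190).

```python
import math

ELECTRON_MASS_KG = 9.1093837015e-31    # CODATA electron rest mass
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge (exact SI value)


def resonance_field_t(frequency_ghz):
    """Field (T) at which the electron cyclotron frequency equals f (non-relativistic)."""
    omega = 2.0 * math.pi * frequency_ghz * 1e9
    return omega * ELECTRON_MASS_KG / ELEMENTARY_CHARGE_C


def density_gain(f_new_ghz, f_ref_ghz):
    """Gain in electron (or ion) density, assuming n scales as f squared."""
    return (f_new_ghz / f_ref_ghz) ** 2


def photon_gain(f_new_ghz, f_ref_ghz):
    """Gain in photon yield, assuming it scales as n_e * n_i, i.e. as f to the fourth."""
    return density_gain(f_new_ghz, f_ref_ghz) ** 2


gain = photon_gain(50.0, 13.5)
print(f"photon gain 13.5 -> 50 GHz: {gain:.0f}")
print(f"scaled power from 100 mW: {0.1 * gain:.0f} W")
for f_ghz in (14.5, 37.0, 50.0):
    print(f"B_ecr at {f_ghz} GHz: {resonance_field_t(f_ghz):.2f} T")
```

Under these assumptions the resonance field at 50 GHz is close to 1.8 T, so a radial field of 2.5–3 T at the chamber wall corresponds to roughly 1.4 to 1.7 times the resonance value.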
Even if not all of the above proposals can be combined, a powerful photon source can be realized with the ECR technique: the challenge would be to find a good compromise between electron temperature, density, and photon source size. Figure 133 presents a possible scheme of an EUV source for lithography. As the source etendue must be small (about 1 mm in diameter), this dense plasma could be achieved at 37 GHz. In that case, $B_{min}$ must be close to the resonance value, for example, 1.3 T. This large value of $B_{min}$ allows a high-B mode axial magnetic field on each side of the source. To add to this small plasma more species that give photons at 13.5 nm, a second resonance is added (50 GHz). As already shown by several ECRISs, such multiple-frequency heating works very well, as it decreases the electron losses and hence the ion losses. Moreover, the charge states involved are rather low (e.g., Xe$^{10+}$), which means that one would not be limited by any charge exchange process. In addition, as ECR plasmas work with gases, it is easy to install as many gas bottles and/or ovens as necessary to create a multiple-species plasma. A helicon source could be installed to provide the cold electrons that are necessary for the excitation process. Of course, Figure 133 is not to scale, but it shows that a compact, fully superconducting machine could be a good alternative for the lithography industry. Regarding the technology involved, almost everything has been proven. Microwave transmitter technology already exists, as pulsed gyrotrons can deliver up to 200 kW for 100 ms at several frequencies (53, 60, 70 GHz, etc.). There should be no problem in making them work in continuous wave mode with an output power of 30–40 kW. Magnet technology can fulfill the required
FIGURE 133. Principle of a simple ECR EUV photon source (not to scale).
magnetic fields: as shown in Table 7, superconducting Nb$_3$Sn would be the most suitable candidate to achieve about 6 T at both mirror throats (injection and extraction), 1.3 T for $B_{min}$, and about 4 T at the chamber wall for the radial magnetic field. As the ions needed are medium charged, it would not be difficult to minimize the hot electron tail responsible for high doses of X-ray emission: under normal ECRIS conditions, such very high-frequency machines are suited to the production of very high charge states (e.g., fully stripped Xe), but increasing the local pressure would shift the charge state distribution toward Xe$^{10+}$. The major difficulty may arise from plasma chamber cooling, as several tens of kW of rf power would be injected in a normal ECRIS. However, here again, no long ion lifetime is needed to produce Xe$^{10+}$, so microwave grids can be inserted inside the plasma chamber to artificially reduce the plasma chamber size and hence the amount of microwave power required.

4. Other Applications of EUV Sources

Compact EUV sources are also needed to diagnose integrated circuits. Indeed, modern circuits now have gate dielectric thicknesses below 2 nm and use high-permittivity materials, called “high-k.” Moreover, transitions between different bands of these new dielectric materials are situated in the far
UV domain. As some process parameters, such as temperature, may lead to a change in the optical gap, on-line control of the different layers is becoming necessary. One commonly used diagnostic technique is ellipsometry, and a compact all-permanent-magnet ECR source can deliver enough photons in the EUV range to equip new types of ellipsometers at 60 nm. Other possible uses of small EUV sources include, for example, outgassing of resists or EUV microscopes. In these domains as well (ellipsometry, microscopy, etc.), an EUV source based on an ECR plasma can work in cw mode, which is a great advantage over other techniques, which are all pulsed.

5. Conclusion

To conclude this section, ECR plasma is also a good candidate as a photon source for EUV lithography. To minimize technical problems, mainly arising from plasma chamber cooling, 10 kW could for now be the maximum rf power. Based on current knowledge of ECRISs, an ideal photon source would operate at 37–50 GHz, with a rather large plasma chamber diameter (up to 200 mm) making it possible to install the collector optics.

C. Conclusion

An entire article could easily be written on the various applications of ECR plasmas for industry. However, this section is intentionally short, since this article is dedicated to ion sources and possible ways to improve ECRIS performance. Apart from ion implantation and plasma processing, new possible uses of ECR plasmas have recently arisen. X-rays emitted by this type of plasma can be used for food sterilization, for example, and another wavelength range of photons produced by ECR plasmas may also be useful in producing future integrated circuits.
VII. CONCLUSION

Based on this discussion, any source designer will note that ECRISs are simple and complicated at the same time. They are simple since it is only necessary to heat electrons and confine them to produce many collisions with neutrals. Things become more complicated because it is always necessary to find the best compromise between confinement and losses: an ion source has to deliver ions. Even though much work has been done, experimentally and theoretically, to understand ECRISs, ECR plasma is still not very well understood. For example, we have seen that a very simple biased tube or disk may
drastically change the ratio between ion trapping and ion production. Gas mixing is not completely understood. Microwave coupling, still very basic, may be improved, but a well-performing ECRIS is essentially a multimode cavity, and any attempt to force a single mode to be launched remains problematic. ECR plasma spectroscopy performed with two entirely different ECRISs illustrates the well-known principle “What You See Is What You Get.” This nondestructive diagnostic is a powerful tool for enhancing source performance and should be installed on many ECRISs to determine whether the ions are really produced in the plasma. Magnetic confinement is now well understood, and many high-performance ECRISs follow well-established magnetic scaling laws. Whatever the budget, a source designer can find an appropriate solution. Nonetheless, even as new techniques appear in magnet technology, it is necessary to keep in mind that the choice of the ECRIS magnetic configuration results from a compromise between cost, required magnetic field value, and use. It is not necessary to use high- or low-temperature superconducting technology to provide ion beams that could be produced by all-permanent-magnet machines. However, all-permanent-magnet ECRISs would certainly not deliver Pb$^{50+}$ as provided by large superconducting devices. The main difficulty now lies in the microwave coupling and in the optimization of the electron population in order to increase the ion source efficiency.

Finally, ECR plasmas are now widely used as ion sources, and other uses have become possible thanks to the great progress made in magnet and microwave technology. New, smaller integrated circuits may sooner or later appear as a result of lithography apparatus equipped with an ECR plasma source.
ACKNOWLEDGMENTS

This article could not have been written without the outstanding work done over decades by the worldwide ECRIS community, to which I give my warmest thanks. It is very difficult to list all the people who directly or indirectly contributed to this chapter, but I would like to give special thanks to Richard Geller, Bernard Jacquot, Samuel Bliman, and Serge Dousson, my first team, who gave me the ECRIS virus. I would also like to thank my other colleagues, first of all at CEA-Grenoble, but also in Argonne, Berkeley, Berlin, Bucharest, Caen, Catania, Chiba, College Station, Darmstadt, Debrecen, Dubna, East Lansing, Frankfurt, Geneva, Groningen, Jyväskylä, Lanzhou, Oak Ridge, Moscow, Nizhny Novgorod, Paris, Osaka, Pasadena, Reno, Riken, and Villigen. I also dedicate this contribution to my tender wife,
Marie-Hélène, for her strong support, to my sunbeams Pernelle, Brivaël, and Ombéline, and also to my first son Valentin-Maël.

“There are few easy successes and definitive defeats.”
Marcel Proust, “À la recherche du temps perdu”
REFERENCES

Alton, G.D., Smithe, D.N. (1994). Design studies for an advanced ECR ion source. Rev. Sci. Instrum. 65, 775–787. Alton, G.D., Smithe, D.N. (1995). A single-frequency ECR ion source with a large uniformly distributed resonant plasma volume. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 100–104. Antaya, T., Blosser, H.G., Moskalik, J.M., Nolen, J.A., Zeller, A.F. (1987). The advanced superconducting ECR project at NSCL. In: Proceedings of the International Conference on ECR Ion Sources and their Applications, NSCL Report #MSUCP-47, pp. 312–322. Apard, P., Bliman, S., Geller, R., Jacquot, B., Jacquot, C. (1973). Production of multiply charged xenon ions. Physics Letters A 44, 432–434. Arai, H., Imanaka, M., Lee, S.M., Higurashi, Y., Nakagawa, T., Kidera, M., Kageyama, T., Kase, M., Yano, Y., Aihara, T. (2002). Effect of minimum strength of mirror magnetic field (Bmin) on production of highly charged heavy ions from RIKEN liquid-He free superconducting electron cyclotron resonance ion source (RAMSES). Nucl. Instrum. Meth. A 491, 9–14. Bakshi, V. (2006). EUV Sources for Lithography. The International Society for Optical Engineering Press, Monograph PM149. Banine, V., Moors, R. (2004). Plasma sources for EUV lithography exposure tools. J. Phys. D: Appl. Phys. 37, 3207–3212. Barué, C., Briand, P., Girard, A., Melin, G., Briffod, G. (1992). Hot electron studies in the Minimafios ECR ion source. Rev. Sci. Instrum. 63, 2844–2846. Barué, C., Lamoureux, M., Briand, P., Girard, A., Melin, G. (1994). Investigation of hot electrons in electron cyclotron resonance ion sources. J. Appl. Phys. 76, 2662–2670. Bastert, A., Bukow, H.H., von Buttlar, H. (1992). Intensity calibration of vacuum UV spectrometer by using the line ratio method. Applied Optics 31, 6597–6599. Bathia, A.K., Kastner, S.O. (1993).
Collision strengths and transition rates for OIII. Atomic Data and Nuclear Data Tables 54, 133–164.
Berreby, R. (1997). Diagnostic de plasmas créés dans des sources d’ions multichargés à résonance cyclotronique électronique par spectroscopie V.U.V. Ph.D. thesis, Université Pierre et Marie Curie, Paris (in French). Beuscher, H., Krauss-Vogt, W., Bräutigam, W., Reich, J., Wucherer, P. (1986). Early performance of the superconducting ISIS ECR source at Jülich. In: Proceedings of the 11th International Conference on Cyclotrons and their Application (Tokyo), pp. 713–716. Bieth, C., Bouly, J.L., Curdy, J.C., Kantas, S., Sortais, P., Sole, P., VieuxRochaz, J.L. (2000). Electron cyclotron resonance ion source for high currents of mono and multicharged ion and general purpose unlimited lifetime application on implantation devices. Rev. Sci. Instrum. 71, 899– 901. Bieth, C., Kantas, S., Tasset, O. (2005). ECR ions source dedicated to hadron and proton therapies. In: XXXIV European Cyclotron Progress Meeting (ECPM 2005, October 6–8, 2005, Belgrade, Serbia and Montenegro). Biri, S., Valek, A., Vámosi, J. (1995). Status of ECRIS building and recent results on trap modeling. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 207–211. Biri, S., Nakagawa, T., Kidera, M., Kenez, L., Valek, A., Yano, Y. (1999). Highly charged ion production using an electrode in biased and floating mode. In: Proceedings of the 14th International Workshop on ECR Sources (CERN, Geneva), pp. 81–85. Bliman, S., Dousson, S., Fremion, L., Geller, R. (1978). Source d’ions multichargés pour vapeurs métalliques. Nucl. Instrum. Meth. 148, 213– 216. Bliman, S., Aubert, J., Geller, R., Jacquot, B., Van Houtte, D. (1981). Electron capture at keV energies of multiply charged ions of carbon and argon with molecular deuterium. Phys. Rev. A 23, 1703–1707. Bol, J.L., Jongen, Y., Lacroix, M., Mathy, F., Ryckewaert, G. (1985). 
Operational results and developments of the ECR sources and the injector into CYCLONE. IEEE Transactions on Nuclear Science N.S. 32 (5), 1817– 1819. Bouchama, T., Druetta, M. (1989). Sensitivity calibration of a VUV spectrometer using cross section measurements. Nucl. Instrum. Meth. Phys. Res. B 40, 1252–1254. Briand, J.P. (1996). Pierre Auger and new trends in the study of the Auger effects. Comments on Atomic and Molecular Physics 33, 1–9. Briand, J.P., de Billy, L., Charles, P., Essabaa, S., Briand, P., Geller, R., Desclaux, J.P., Bliman, S., Ristori, C. (1990). Production of hollow atoms by the excitation of highly charged ions in interaction with a metallic surface. Phys. Rev. Lett. 65, 159–162.
Briand, J.P., Thuriez, S., Giardino, G., Borsoni, G., Froment, M., Edrieff, M., Sébenne, C. (1996). Observation of hollow atoms or ions above insulator and metal surfaces. Phys. Rev. Let. 77, 1452–1455. Brown, I. (2004). The Physics and Technology of Ion Sources. Brown, I. (Ed.), Wiley-VCH Verlag GmbH&Co, Weinheim. Chang, C.Y., Sze, S.M. (1997). ULSI Technology. McGraw-Hill. Chen, F. (1984). Introduction to Plasma Physics and Controlled Fusion, 2nd ed. Plenum Press, New York. Ciavola, G., Gammino, S. (1992). A superconducting electron cyclotron resonance source for the L.N.S. Rev. Sci. Instrum. 63, 2881–2882. Ciavola, G., Gammino, S., Celona, L., Torrisi, L., Passarello, S., Andóo, L., Cavenago, M., Galatà, A., Spädtke, P., Tinschert, K., Lang, R., Iannucci , R., Leroy, R., Barué, C., Hitz, D., Seyfert, P., Koivisto, H., Suominen, P., Tarvainen, O., Beijers, H., Brandenburg, S., Vanrooyen, D., Hill, C., Küchler, D., Homeyer, H., Röhrich, J., Schachter, L., Dobrescu, S. (2006). Multipurpose superconducting electron cyclotron resonance ion source, the European roadmap to third-generation electron cyclotron resonance ions sources. Rev. Sci. Instrum. 77, 03A303 (5 pages). Claudet, G. (1977). Source d’ions multichargés type Supermafios en version cryogénique. Etude de faisabilité. Note CEA/DTCE/SBT 433/77, in French. Collin, R.E. (1960). Field Theory of Guided Waves. McGraw-Hill Inc. Consolino, J., Geller, R., Leroy, C. (1969). Faisceau intense d’ions à très faible tension d’extraction. In: Proceedings of the 1st International Conference on Ion Sources (Saclay, France), pp. 537–547. Delaunay, M., Fehringer, M., Geller, R., Hitz, D., Varga, P., Winter, H. (1985). Electron emission from a metal surface bombarded by slow highly charged ions. Phys. Rev. B 35, 4232–4235. Delaunay, M., Jacquot, B., Pontonnier, M. (1991). Exploration axiale du plasma RCE d’une source d’ions multichargés CAPRICE. Nucl. Inst. Meth. A 305, 223–231. de Michelis, C., Mattioli, M. (1981). 
Soft X-ray spectroscopic diagnostics of laboratory plasmas. Nuclear Fusion 21, 677–754. Donets, E.D., Illushchenko, V.I., Alpert, V.A. (1969). Ultra high vacuum electron beam source of highly stripped ions. In: Proceedings of the 1st International Conference on Ion Sources (Saclay, France), pp. 635–642. Douysset, G., Khodja, H., Girard, A., Briand, J.P. (2000). Highly charged ion densities and ion confinement properties in an electron cyclotron resonance ion sources. Phys. Rev. E 61, 3015–3022. Drake, S., Schmelzbach, P. (2002). Emittance measurements at the PSI ECR heavy ion source. Private communication.
Drentje, A.G. (2003). Techniques and mechanisms applied in electron cyclotron resonance sources for highly charged ions. Rev. Sci. Instrum. 74, 2631–2645. Drentje, A.G., Kremers, H.R., Sijbring (1995). The new ECRIS3 for the AGOR cyclotron. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 148–151. Drentje, A.G., Barzangy, F., Kremers, H.R., Meyer, D., Mulder, J., Sijbring, J. (1999). Can an hexapole magnet of an ECR ion source be too strong? In: Proceedings of the 14th International Workshop on ECR Ion Sources (Geneva), pp. 94–97. Drentje, A., Girard, A., Hitz, D., Melin, G. (2000). Role of low charge state ions in electron cyclotron resonance ion source plasmas. Rev. Sci. Instrum. 71, 623–626. Drentje, A.G., Muramatsu, M., Kitagawa, A. (2006). Optimizing C4+ and C5+ beams of the Kei2 electron cyclotron resonance ion source using a special gas-mixing technique. Rev. Sci. Instrum. 77, 03B701 (3 pages). Druetta, M., Hitz, D. (1992). VUV diagnostics of the plasma of an ECR ion source. J. Optics 23, 259–262. Efremov, A.A., Kutner, V.B., Lebedev, A.N., Loginov, V.N., Yazviskiy, N.Yu., Zhao, H.W. (1995). Preliminary results of DECRIS-14-2. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INSJ-182, pp. 228–231. Efremov, A., Bekhterev, V., Bogomolov, S., Dmitriev, S., Lebedev, A., Leporis, M., Nikiforov, A., Paschenko, S., Yakovlev, B., Yazvitsky, N., Datskov, V., Drobin, V., Seleznev, V., Tsvineva, G., Shiskov, Yu.A. (2006). Status of the ion source DECRIS-SC. Rev. Sci. Instrum. 77, 03A320 (3 pages). Finkenthal, M., Yu, T.L., Lippmann, S., Huang, L.K., Moos, H.W., Stratton, B.C., Bathia, A.K., Bengston, R.D., Hodge, W.L., Phillips, P.E., Porter, P.E., Price, T.R., Rhodes, T.R., Richards, B., Ritz, C.P., Rowan, W.L. (1987). 
A comparison of the C III, O V, F VI, and Ne VII line emission from a laboratory plasma with theoretical prediction and astrophysical observations. The Astrophysical Journal 313, 920–925. Gammino, S., Ciavola, G., Antaya, T., Harrison, K. (1996). Volume scaling and magnetic field scaling on SC-ECRIS at MSU-NSCL. Rev. Sci. Instrum. 67, 155–160. Gammino, S., Ciavola, G., Celona, L., Castro, M., Chines, F. (1999). 18 GHz upgrading of the superconducting electron cyclotron resonance ion source SERSE. Rev. Sci. Instrum. 70, 3577–3582.
Gammino, S., Ciavola, G., Celona, L., Hitz, D., Girard, A., Melin, G. (2001). Operation of the SERSE superconducting electron cyclotron resonance ion source at 28 GHz. Rev. Sci. Instrum. 72, 4090–4097. Gammino, S., Ciavola, G., Celona, L., Ando, L., Menna, M., Torrisi, L., Hitz, D., Girard, A., Melin, G., Seyfert, P. (2002). Gyroserse, a new superconducting ECRIS. In: Proceedings of the 15th International Workshop on ECR Ion Sources (University of Jyväskylä, Finland), JYFL Report 4/2002, pp. 17–20. Gammino, S., Celona, L., Ciavola, G., Consoli, F., Mascali, D., Barbarino, S., Maimone, F. (2006). Intense heavy ion beam production with ECR sources. In: 2006 Linear Accelerator Conference (August 21–25, Knoxville, TN, USA). Garner, R.C., Mauel, M.E., Hokin, S.A., Post, R.S., Smatlak, D.L. (1990). Whistler instability in an electron-cyclotron-resonance-heated, mirrorconfined plasma. Physics of Fluids B 2, 242–245. Gaudart, G. (1995). Etude de la population électronique énergétique d’une source d’ions à resonance cyclotron des electrons. Ph.D. thesis, Université Joseph Fourier-Grenoble I (in French). Geller, R. (1965). Extension de la loi de Child Langmuir à un faisceau de particules animé d’une vitesse dirigée. Rapport CEA, R2898, 1–23 and references therein. Geller, R. (1970). New high intensity ion source with very low extraction voltage. Appl. Phys. Lett. 16, 401–404. Geller, R. (1976). Electron cyclotron resonance multiply charged ion sources. IEEE Trans. Nucl. Sci. 23, 904–912. Geller, R., Jacquot, B., Pauthenet, R. (1980). Micromafios, source d’ions multichargés basée sur la résonance cyclotron électronique. Revue de Physique Appliquée 15, 995–999. Geller, R. (1996). Electron Cyclotron Resonance Ion Sources and ECR Plasmas. Institute for Physics Publishing, Bristol. Geller, R. (1998). Electron cyclotron resonance ion sources: Historical review and future propects. Rev. Sci. Instrum. 69, 1302–1310. Girard, A. (1992). Plasma diagnosis related to ion sources. Rev. 
Sci. Instrum. 63, 2676–2682. Girard, A., Briand, P., Gaudart, G., Klein, J.P., Bourg, F., Debernardi, J., Mathonnet, J.M., Melin, G., Su, Y. (1994). The Quadrumafios electron cyclotron resonance ion source: Presentation and analysis of the results. Rev. Sci. Instrum. 65, 1714–1717. Girard, A., Klein, J.P., Gaudart, G., Perret, C. (1995). Electron cyclotron resonance ion sources: Experiments and theory. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J182, pp. 164–169.
Girard, A., Hitz, D., Melin, G., Serebrenikov, K. (2004). Electron cyclotron resonance plasmas and electron cyclotron resonance ion sources: Physics and technology. Rev. Sci. Instrum. 75, 1381–1388 (and references therein). Hahto, S.K., Leung, K.N., Reijonen, J., Ji, Q., Schneider, D., Bruch, R., Kondagari, S., Merabet, H. (2005). Permanent magnet microwave source for generation of EUV light. In: Proceedings of the 16th International Workshop on ECR Ion Sources, AIP Conference Proceedings, vol. 749, pp. 179–182. Halbach, K. (1980). Design of permanent multipole magnets with oriented rare earth cobalt material. Nucl. Instrum. Meth. 169, 1–10. Hasegawa, T., Matsuda, T., Kaminisi, K., Hagiwara, H., Koba, K., Miyaaji, M., Hattori, T., Osada, E. (1995). Ion irradiation system using Nanogan ECR ion source at Miyazaki University. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 64–66. Hibst, R., Bukow, H.H. (1988). Measurement of some selected line ratios in the EUV domain. Nucl. Instrum. Meth. B 31, 284–289. Higurashi, Y., Nakagawa, T., Kidera, M., Kageyama, T., Aihara, T., Kase, M., Yano, Y. (2003). Optimization of magnetic field configuration for the production of Ar ions from RIKEN 18 GHz ECR ion source. Nucl. Instrum. Meth. A 510, 206–210. Higurashi, Y., Nakagawa, T., Kidera, M., Aihara, T., Kase, M., Yano, Y. (2004). Effect of the plasma electrode position on the beam intensity and emittance of the RIKEN 18 GHz ECRIS. In: Proceedings of the 16th International Workshop on ECR Ion Sources, AIP Conference Proceedings, vol. 749, pp. 71–74. Hitz, D. (1995). Evolution of the Caprice concept. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 161–163. Hitz, D., Druetta, M., Khardi, S. (1992).
Spectroscopic study of the plasma of the 10 GHz Caprice source. Rev. Sci. Instrum. 63, 2889–2891. Hitz, D., Bourg, F., Ludwig, P., Melin, G., Pontonnier, M., Nguyen, T.K. (1995). The new 1.2 T Caprice source: Presentation and results. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 126–130. Hitz, D., Bourg, F., Delaunay, M., Ludwig, P., Melin, G., Pontonnier, M., NGuyen, T.K. (1996). The new 1.2 T–14.5 GHz Caprice source of multiply charged ions: Results with metallic elements. Rev. Sci. Instrum. 67, 883– 885.
Hitz, D., Girard, A., Melin, G., Debernardi, J., Mathonnet, J.M., Ciavola, G., Gammino, S., Celona, L., Marletta, S., Messina, E. (2000a). First results of the Serse source operation at 28 GHz. In: Workshop on Production of Intense Beams of Highly Charged Ions (Catania, Italy), Conference Proceedings from Italian Physical Society, vol. 72, pp. 13–17. Hitz, D., Melin, G., Girard, A. (2000b). Fundamental aspects of electron cyclotron resonance ion sources: From classical to large superconducting devices. Rev. Sci. Instrum. 71, 839–845. Hitz, D., Cormier, D., Mathonnet, J.M. (2002a). A new room temperature ECR ion source for accelerator facilities. In: Proceedings of the 8th European Particle Conference (EPAC) (Paris), pp. 1718–1720. Hitz, D., Girard, A., Melin, G., Ciavola, G., Gammino, S., Celona, L. (2002b). Results and interpretation of high frequency experiments at 28 GHz in ECR ion sources, future propects. Rev. Sci. Instrum. 73, 509–512. Hitz, D., Cormier, D., Mathonnet, J.M., Girard, A., Melin, G., Lansaque, F., Serebrenikov, K., Sun, L.T. (2002c). Grenoble Test Source (GTS): A multipurpose room temperature ECRIS. In: Proceedings of the 15th International Workshop on ECR Ion Sources (Jyväskylä, Finland, June 12– 14), JYFL Research Report N◦ 4/2002, pp. 53–55. Hitz, D., Girard, A., Melin, G., Gammino, S., Ciavola, G., Celona, L. (2002d). The comparison of 18 and 28 GHz behavior of SERSE: Some conclusions. In: Proceedings of the 15th International Workshop on ECR Ion Sources (University of Jyväskylä, Finland), JYFL Report 4/2002, pp. 100–103. Hitz, D., Cormier, D., Girard, A., Melin, G., Mathonnet, J.M., Sun, L.T. (2003). Multiply charged ion production with ECR ion sources: State of the art and prospects. Nucl. Instrum. Meth. B 205, 168–172. Hitz, D., Girard, A., Serebrenikov, K., Melin, G., Cormier, D., Chartier, J., Sun, L.T., Briand, J.P., Benhachoum, M. (2004a). 
Production of highly charged ion beams with the Grenoble test electron cyclotron resonance ion source. Rev. Sci. Instrum. 75, 1403–1406. Hitz, D., Delaunay, M., Quesnel, E., Vannuffel, E., Michallon, P., Girard, A., Guillemet, L., Robic, J.Y. (2004b). All-permanent magnet ECR plasma for EUV light. In: Third International Symposium on EUV Light (Miyazaki, Japan). Hitz, D., Delaunay, M., Girard, A., Guillemet, L., Mathonnet, J.M., Chartier, J., Meyer, F.W. (2005a). An all-permanent magnet ECR ion source for the ORNL MIRF upgrade project. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley, CA, USA, September 26–30, 2004), AIP Conference Proceedings, vol. 49, pp. 123–126. Hitz, D., Girard, A., Guillaume, D., Guillemet, L., Seyfert, P., Poncet, J.M., Sun, L.T. (2005b). Design study of a hybrid ECRIS. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley, CA, USA,
ELECTRON CYCLOTRON RESONANCE ION SOURCES
157
September 26–30, 2004), AIP Conference Proceedings, vol. 49, pp. 157– 160. Hofman, S. (1997). Heavy and superheavy neuclei. Zeitschrift für Physik A 358, 125–129. Hofman, S., Ninov, V., Hessberger, F.P., Ambruster, P., Folger, H., Münzenberg, G., Schött, H.J., Popeko, A.G., Yeremin, A.V., Saro, S., Janik, R., Leino, M. (1996). The new element 111. Zeitschrift für Physik A 350, 281– 282. Hollinger, R. (2004). In: Brown, I. (Ed.), The Physics and Technology of Ion Sources. Wiley-VCH Verlag GmbH&Co, Weinheim, pp. 61–106. Huba, J.D. (1994). NRL Plasma Formulary. Office of Naval Research, Washington, DC, 20375. Hunger, H.G. (1957). Normal mode bends for circular electric waves. Bell System Technical Journal, 1292–1297. Hutchinson, I.H. (1990). Principles of Plasma Diagnostics. Cambridge University Press. Ivanov, A.A., Wiesemann, K. (2005). Ion confinement in electron cyclotron resonance ion sources (ECRIS): Importance of nonlinear plasma-wave interaction. IEEE Transactions on Plasma Science 33, 1743–1762. Itikawa, Y., Hara, S., Kato, T., Nakazaki, S., Pindzola, M.S., Crandall, D.H. (1985). Electron impact cross sections and rate coefficients for excitations of carbon and oxygen ions. Atomic Data and Nuclear Data Tables 33, 149– 193. Jacquot, B., Pontonnier, M. (1991). The new 10 GHz Caprice source— magnetic structures and performances. In: Proceedings of the 10th International Workshop on ECR Ion Sources, ORNL Report #CONF-9011136, pp. 133–155. Jacquot, B., Bourg, F., Geller, R. (1987). Source d’ions lourds multichargés caprice 10 GHz pour tous les élements métalliques et gazeux. Nucl. Instrum. Meth. A 254, 13–21. Jacquot, B., Briand, P., Bourg, F. (1988). Source d’ions lourds Caprice 10 GHz 2ωce . Nucl. Instrum. Meth. A 269, 1–6. Jaeger, F., Lichterberg, A.J., Lieberman, M.A. (1972). Theory of electron cyclotron resonance heating. Plasma Physics 14, 1073–1100. 
Kanjilal, D., Rodrigues, G.O., Kumar, P., Safvan, C.P., Rao, U.K., Mandal, A., Roy, A., Bieth, C., Kantas, S., Sortais, P. (2005). First high temperature superconducting ECRIS. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley), AIP Conference Proceedings, vol. 749, pp. 19–22. Kanjilal, D., Rodrigues, G., Kumar, P., Mandal, A., Roy, A., Bieth, C., Kantas, S., Sortais, P. (2006). Performance of first high temperature superconducting ECRIS. Rev. Sci. Instrum. 77, 03A317 (3 pages).
158
HITZ
Kasparek, W., Kumri´c, H., Müller, G.A., Plaum, B., Girard, A., Hitz, D., Melin, G. (2002). Development of transmission lines at frequencies 18– 37 GHz for application with ECR ion sources. In: Proceedings of the 27th International Conference on Infrared and Millimeters Wave (San Diego). Katayose, T., Hattori, T., Yamada, S., Kitagawa, K., Sekiguchi, M. (1995). Design of the HiECR(MK-3) ion source. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 281–283. Kato, T., Masai, K., Sato, K. (1985). Effect of inner dubshell ionization on emission lines of O IV ions. Physics Letters A 108, 259–262. Kato, T., Lang, J., Berrington, K.A. (1990). Intensity ratios of emission lines from O V ions for temperature and density diagnostics, and recommended excitation rate coefficients. Atomic Data and Nuclear Data Tables 44, 133– 187. Kato, Y., Satomi, N., Nishikawa, M., Watanabe, K. (1993). Electron temperature measurement using the line intensity ratio on the CTCC spheromak. Plasma Phys. and Control. Fusion 35, 1513–1528. Keenan, F.P. (1992). Line ratio diagnostics for astrophysical plasmas. In: Silver, E.H., Kahn, S.M. (Eds.), Proceedings from the 10th International Colloquium on U.V. and X-Ray Spectroscopy of Astrophysical and Laboratory Plasmas (Berkeley). Cambridge University Press, pp. 44–58. Kitagawa, A., Muramatsu, M., Sasaki, M., Yamada, S., Jincho, K., Sakuma, T., Sasaki, N., Takahashi, H., Takagugi, W., Yamamoto, M., Biri, S., Sudlitz, K., Drentje, A. (2002). Trial for extension of the range of ion species at HIMAC. In: Proceedings of the 15th International Workshop on ECR Ion Sources (University of Jyväskylä, Finland), JYFL Report 4/2002, pp. 70–73. Klein, J.P. (1995). Ph.D. thesis, Université Pierre et Marie Curie-Paris VI (in French). Klose, J.Z., Deters, T.M., Fuhr, J.R., Wiese, W.L. (1993). Atomic branching ratio for carbon-like ions. J. 
of Quanitative Spectroscopy and Radiative Transfer 50, 1–6. Koivisto, H., Suominen, P., Tarvainen, O., Hitz, D. (2004). A modified permanent magnet structure for a stronger multipole magnetic field. Rev. Sci. Instrum. 75, 1479–1481. Koivisto, H., Suominen, P., Tarvainen, O., Ärje, J., Lammentausta, E., Lappalainen, P., Kalvas, T., Ropponen, T., Frondelius, P. (2005). Recent ECRIS related research and development work at JYFL. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley), AIP Conference Proceedings, vol. 749, pp. 27–30.
ELECTRON CYCLOTRON RESONANCE ION SOURCES
159
Koivisto, H., Suominen, P., Tarvainen, O., Virtanen, A., Parkkinen, A. (2006). Electron cyclotron resonance ion source related development work for heavy irradiation tests. Rev. Sci. Instrum. 77, 03A316 (3 pages). Kurita, T., Nakagawa, T., Kawaguchi, T., Lee, S.M. (2000). Design of electron cyclotron resonance ion source using liquid-helium-free superconducting solenoid coils. Rev. Sci. Instrum. 71, 909–911. Kutner, V.B., Zhao, H.W., Efremov, A.A. (1995). Design study for an advanced ECRIS with new radial multipolar magnet. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INSJ-182, pp. 223–227. Leitner, D. (2006). Private communication. Leitner, D., Lyneis, C.M. (2005). High intensity high charge state ECR ion sources. In: Proceedings of 2005 Particle Accelerator Conference (Knoxville, TN), pp. 179–183. Leitner, D., Lyneis, C.M., Abbott, S.R., Dwinell, R.D., Collins, D. (2005). First results of the superconducting ECR ion source Venus with 28 GHz. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley, CA, USA, September 26–30, 2004), AIP Conference Proceedings, vol. 49, pp. 3–9. Leitner, D., Lyneis, C.M., Loew, T., Todd, D.S., Virostek, S., Tarvainen, O. (2006). Status of the 28 GHz superconducting electron cyclotron resonance ion source VENUS. Rev. Sci. Instrum. 77, 03A302 (6 pages). Liu, Z., Zhang, W., Guo, X., Yuan, P., Zhou, S., Wei, B. (1995). Status report on HIRFL ECR2 ion source. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 235–236. Los Alamos Accelerator Code Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA (1987). http://laacg1.lanl.gov/laacg. Lotz, W. (1968). Electron-impact ionization cross-sections and ionization rate coefficients for atoms and ions from hydrogen to calcium. 
Zeitschrift für Physik A 216, 241–247. Ludwig, P., Bourg, F., Briand, P., Girard, A., Melin, G., Guillaume, D., Seyfert, P., LaGrassa, A., Ciavola, G., DiBartolo, G., Gammino, S., Cafici, M., Castro, M., Chines, F., Marletta, S., Passarello, S. (1998). Preliminary results of the 14 GHz SERSE superconducting electron cyclotron resonance ion source working in high-B mode. Rev. Sci. Instrum. 69, 653–655. Lyneis, C.M. (1992). ECR ion sources for accelerators. Report LBL-32649, UC413. Lyneis, C.M., Leitner, D., Abbott, S.R., Dwinell, R.D., Leitner, M., Silver, C.S. (2004). Results with the superconducting electron cyclotron resonance ion source VENUS. Rev. Sci. Instrum. 75, 1389–1393.
160
HITZ
Lyneis, C.M., Leitner, D., Todd, D., Virostek, S., Loew, T., Heinen, A., Tarvainen, O. (2006). Measurments of bremstrahlung production and Xray cryostat heating in VENUS. Rev. Sci. Instrum. 77, 03A342 (5 pages). Marcuvitz, N. (1951). Waveguide Handbook. McGraw-Hill Book Company Inc. Masterman, P.H., Clarricoat, P.J.B. (1971). Computer field matching solution of waveguide transverse discontinuities. In: Proceedings of the Institution of Electrical Engineers-London, vol. 118, pp. 51–55. Meguro, T., Hida, A., Suzuki, M., Koguchi, Y., Takai, H., Yamamoto, Y., Maeda, K., Aoyagi, Y. (2001). Creation of nanodiamonds by single impacts of highly charged ions upon graphite. Appl. Phys. Let. 79, 3866–3868. Melin, G. (1997). ECR ion sources: Present status and prospects. Physica Scripta T 71, 14–22. Melin, G., Girard, A. (1997). ECR ion sources. In: Shafroth, S.M., Austin, J.C. (Eds.), Accelerator-Based Atomic Physics Techniques and Applications. American Institute of Physics, New York, pp. 33–66. Melin, G., Bourg, F., Briand, P., Debernardi, J., Delaunay, M., Geller, R., Jacquot, B., Ludwig, P., N’Guyen, T.K., Pin, L., Rocco, J.C., Zadworny, F. (1990). Some particular aspects of the physics of the ECR sources for multicharged ions. Rev. Sci. Instrum. 61, 236–238. Meyer, F.W., Havener, C.C., Snowdon, S., Overbury, H., Zehner, D.M., Heiland, W. (1987). Phys. Rev. A 35, 3176–3179. Meyer, F.W., Bannister, M.E., Dowling, D., Hale, J.W., Havener, C.C., Johnson, J.W., Juras, R.C., Krause, H.F., Mendez, A.J., Sinclair, J., Tatum, A., Vane, C.R., Bahati Musafiri, E., Fogle, M., Rejoub, R., Vergara, L., Hitz, D., Delaunay, M., Girard, A., Guillemet, L., Chartier, J. (2006). The ORNL multicharged ion research facility upgrade project. Nucl. Instrum. Meth. B 242, 71–78. Mühle, C., Ratzinger, U., Jöst, G., Leible, K., Schennach, S., Wolf, B. (1995). PUMA-ECR ion source operation. 
In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 29– 33. Müller, L., Albers, B., Heinen, A., Kahnt, M., Nowak, L., Ortjohann, H.W., Täschner, A., Vitt, C., Wolosin, S., Andrä, H.J. (2002). The new Münster 18 GHz plateau ECRIS. In: Proceedings of the 15th International Workshop on ECR Ion Sources (University of Jyväskylä, Finland), JYFL Report 4/2002, pp. 35–41. Muramatsu, M., Kitagawa, A., Ogawa, H., Iwata, Y., Yamamoto, K., Yamada, S., Ogawa, H., Fujimoto, T., Yoshida, Y., Drentje, A.G. (2006). Improvement of the Kei2 source for a new carbon therapy facility. Rev. Sci. Instrum. 77, 03A307 (1 page).
ELECTRON CYCLOTRON RESONANCE ION SOURCES
161
Nakagawa, T., Arje, J., Miyazawa, Y., Hemmi, M., Kase, M., Kageyama, T., Kamigaito, O., Chiba, T., Inabe, N., Goto, A., Yano, Y. (1996). Development of RIKEN 18 GHz ECRIS. In: Proceedings of the 5th European Particle Accelerator Conference (IOP Bristol), pp. 536–538. Nakagawa, T., Arje, J., Miyazawa, Y., Hemmi, M., Chiba, T., Inabe, N., Kase, M., Kageyama, T., Kamigaito, O., Kidera, M., Goto, A., Yano, Y. (1998). Production of intense beams of highly charged metallic ions from RIKEN 18 GHz electron cyclotron resonance ion source. Rev. Sci. Instrum. 69, 637–639. Nakagawa, T., Kurita, T., Kidera, M., Imanaka, M., Higurashi, Y., Tsukada, M., Lee, S.M., Kase, M., Yano, Y. (2002a). Intense beam production from RIKEN 18 GHz ECRIS and liquid-He free SC-ECRISs. Rev. Sci. Instrum. 73, 513–515. Nakagawa, T., Kurita, T., Imanaka, M., Arai, H., Kidera, M., Higurashi, Y., Lee, S.M., Kase, M., Yano, Y. (2002b). Recent progress of liquid-He free SC-ECRISs. In: Proceedings of the 15th International Workshop on ECR Ion Sources (University of Jyväskylä, Finland), JYFL Report 4/2002, pp. 25–28. Nakagawa, T., Aihara, T., Higurashi, Y., Kidera, M., Kase, M., Yano, Y., Arai, I., Arai, H., Imanaka, M., Lee, S.M., Arzumanyan, G., Shirkov, G. (2004). Electron cyclotron resonance ion source developments in RIKEN. Rev. Sci. Instrum. 75, 1394–1398. Nakagawa, T., Higurashi, Y., Kidera, M., Aihara, T., Kase, M., Goto, A., Yano, Y. (2006). Effect of magnetic field configuration on the beam intensity from electron cyclotron resonance ion source and RIKEN superconducting electron cyclotron resonance ion source. Rev. Sci. Instrum. 77, 03A304 (4 pages). Pastukhov, V.P. (1987). Classical longitudinal plasma losses from open adiabatic traps. Review of Plasma Physics 13, 203–259. Perret, C. (1998). Caractérisation de la population électronique dans un plasma de source d’ions à resonance cyclotronique électronique. Ph.D. thesis, Université Joseph Fourier-Grenoble (in French). 
Petty, C.C., Goodman, D.L., Smatlak, D.L., Smith, D.K. (1991). Confinement of multiply charged ions in an electron resonance heated mirror plasma. Phys. Fluids B 3, 705–714. Phaneuf, R.A., Janev, R.K., Pindzola, M.S. (1987). Collisions of carbon and oxygen ions with electrons, H, H2 and He. Atomic Data for Fusion, ORNL Report 6090/V5 (635 pages). Plaum, B., Wagner, D., Kasparek, W., Thumm, M. (2001). Optimization of oversized waveguide components using a genetic algorithm. Fusion Engineering and Design 53, 499–503.
162
HITZ
Pöffel, W., Schartner, K.H., Mank, G., Salzborn, E. (1990). VUV spectroscopy for plasma diagnostics of an ECR ion source. Rev. Sci. Instrum. 61, 613–615. Postma, H. (1970). Multiply charged heavy ions produced by energetic plasmas. Phys. Let. A 31, 196–197. Romand, J., Vodar, B. (1962). Un monochromateur à réseau concave en incidence tangentielle pour l’ultra violet lointin. Journal of Modern Optics 9, 371–381. Runkel, S., Hohn, O., Stiebing, K.E., Schempp, A., Schmidt-Böcking, H. (2000). Time resolved experiments at the Frankfurt 14 GHz electron cyclotron resonance ion source. Rev. Sci. Instrum. 71, 912–914. Sasaki, A. (2003). Theoretical EUV spectrum of near Pd-like Xe. J. Plasma Fusion Res. 79, 315–317. Schachter, L., Dobrescu, S., Badescu-Singureanu, A.I. (1998). High secondary electron emission for an enhanced electron density in electron cyclotron resonance plasma. Rev. Sci. Instrum. 69, 706–708. Silverman, P.J. (2005). Extreme ultraviolet lithography: Overview and development status. J. Microlith., Microfab., Microsyst. 4, 011006, 1–5. Simonen, T.C. (1981). Experimental progress in magnetic-mirror fusion research. Proceedings of the IEEE 69, 935–957. Skalyga, V., Zorin, V. (2005). Multicharged ion trap generation in plasma confined in a cusp magnetic trap at quasigasdynamic regime. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley, CA, USA, September 26–30, 2004), AIP Conference Proceedings, vol. 49, pp. 112–115. Sortais, P., Bieth, C., Foury, P., Lescene, N., Leroy, R., Mandin, J., Marry, C., Pacquet, J.Y., Robert, E., Villari, C.C. (1995). Developments of compact permanent magnet ECRIS. In: Proceedings of the 12th International Workshop on ECR Ion Sources (RIKEN, Institute for Nuclear Study, University of Tokyo, Tanashi, Tokyo 188, Japan), INS-J-182, pp. 44–52. Sortais, P., Bouly, J.L., Curdy, J.C., Lamy, T., Sole, P., Thuillier, T., VieuxRochaz, J.L., Voulot, D. (2004). 
ECRIS development for stable and radioactive pulsed beams. Rev. Sci. Instrum. 75, 1610–1612. Spädtke, P. (2004). In: Brown, I. (Ed.), The Physics and Technology of Ion Sources. Wiley-VCH Verlag GmbH&Co, Weinheim, pp. 41–60. Spädtke, P., Tinschert, K., Lang, R., Iannucci, R. (2005). Use of simulations based on experimental data. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley, CA, USA, September 26–30, 2004), AIP Conference Proceedings, vol. 49, pp. 47–54. Stamm, U. (2004). Extreme ultraviolet light sources for the use in semiconductor lithography – state of the art and future developments. J. Phys. D: Appl. Phys. 37, 3244–3253.
ELECTRON CYCLOTRON RESONANCE ION SOURCES
163
Stix, T.H. (1969). Negatively charged open-ended plasma to strip and confine heavy ions. Phys. Rev. Lett. 23, 1093–1097. Stix, T.H. (1992). In: Waves in Plasmas. American Institute of Physics, New York. Sun, L.T. (2004). Design and experimental study of highly charged ECR ion sources. Ph.D. thesis, Institute of Modern Physics, Graduate school of the Chinese Academy of Science, China (in Chinese & English). Sun, L.T. (2006). Private communication. Sun, L.T., Zhao, H.W., Zhang, Z.M., Hitz, D. (2004). The design of a high charge state all permanent ECR ion source. Rev. Sci. Instrum. 75, 1514– 1516. Sun, L.T., Zhao, H.W., Zhang, Z.M., Wei, B., Zhang, X.Z., Guo, X.H., Ma, X.W., Cao, Y., He, W., Zhao, H.Y. (2005). Brief review of multiple charge state ECR ion sources in Lanzhou. Nucl. Instrum. Meth. in Phys. Res. B 235, 524–529. Sun, L.T., Zhao, H.W., Xuezhen, Z., Zimin, Z., Xiaohong, G., Wei, H., Jinyu, L., Yucheng, F., Yun, C., Hui, W., Baohua, M., Xixia, L., Huanyu, Z., Yong, S., Wang, L., Jie, L., Pin, Y., Mingtao, S., Wenlong, Z., Baowen, W., Xie, D.Z. (2006a). Report of Training and Commissioning Results of SECRAL, 2005 Annual Report of IMP & HIRFL. Atomic Energy Press, China. Sun, L.T., Zhao, H.W., Zhang, Z.M., Wang, H., Ma, B.H., Li, X.X., Ma, X.W., Song, M.T., Zhan, W.L. (2006b). A latest developed all permanent magnet ECRIS for atomic physics research at IMP. Rev. Sci. Instrum. 77, 03A319 (3 pages). Suominen, P., Tarvainen, O., Koivisto, H., Hitz, D. (2004a). Optimization of the Halbach-type magnetic multipole for an electron cyclotron resonance ion source. Rev. Sci. Instrum. 75, 59–63. Suominen, P., Tarvainen, O., Koivisto, H. (2004b). The effects of gas mixing and plasma electrode position on the emittance of an electron cyclotron resonance ion source. Rev. Sci. Instrum. 75, 1517–1519. Suominen, P., Tarvainen, O., Koivisto, H. (2006). First results with a modified multipole structure electron cyclotron resonance ion source. Rev. Sci. Instrum. 
77, 03A332 (3 pages). Tarvainen, O., Suominen, P., Koivisto, H. (2004). A new plasma potential measurement for plasma sources. Rev. Sci. Instrum. 75, 3138–3145. Taylor, C., Caspi, S., Leitner, M., Lundgren, S., Lyneis, C., Wutte, D., Wang, S.T., Chen, J.Y. (2000). Magnet system for an ECR ion source. IEEE Transactions on Applied Superconductivity 10, 224–227. Thuillier, T., Curdy, J.C., Lamy, T., Lachaize, A., Ponton, A., Sole, P., Sortais, P., Vieux Rochaz, J.L. (2005). High current beam transport with Phoenix 28 GHz: Experiment and simulation. In: Proceedings of the 16th International Workshop on ECR Ion Sources, AIP Conference Proceedings, vol. 749, pp. 41–46.
164
HITZ
Vallcorba, R. (2003). Private communication. Vinogradov, I.P., Jettkant, B., Meyer, D., Wiesemann, K. (1994). Spectroscopic density determination of nitrogen species inan ECR discharge. J. Phys. D: Appl. Phys. 27, 1207–1213. Vondrasek, R., Scott, R., Pardo, R., Koivisto, H., Tarvainen, O., Suominen, P., Edgell, D.H. (2005). ECRIS operation with multiple frequencies. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley), AIPS Conference Proceedings, vol. 749, pp. 31–34. West, H.I. Jr. (1982). Calculations of ion charge state distribution in ECR ion sources. Lawrence Livermore National Laboratory Report UCRL 53391. Whaley, D.R., Getty, W.D. (1990). Ion temperature effects on ion charge state distributions of an electron cyclotron resonant ion source. Phys. Fluids B 2, 1195–1197. Wolf, B. (1995). Handbook of Ion Sources. Wolf, B. (Ed.). CRC Press, ISBN 0-8493-2502-1. Xie, Z.Q. (1998). Production of highly charged ion beams from electron cyclotron resonance ion sources. Rev. Sci. Instrum. 69, 625–630. Xie, Z.Q., Lyneis, C.M. (1994). Plasma potentials and performance of the advanced electron cyclotron resonance ion source. Rev. Sci. Instrum. 65, 2947–2952. Yang, F., Cunningham, A.J. (1994). Ionic EUV branching ratio measurements using electron impact excitation. J. Quantitative Spectroscopy and Radiative Transfer 49, 53–64. Zavodsky, P. (2005). Design of SuSi—Superconducting source for ions at NSCL/MSU. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley), AIPS Conference Proceedings, vol. 749, pp. 131– 134. Zavodsky, P., Arend, B., Cole, D., DeKamp, J., Machicoane, G., Marti, F., Miller, P., Moskalik, J., Ottarson, J., Vincent, J., Zeller, A., Kazarinov, N.Y. (2006). Status report on the design and construction of the superconducting source for ions at the National Superconducting Laboratory/Michigan State University. Rev. Sci. Instrum. 77, 03A334 (4 pages). 
Zhao, H.W., Zhang, Z.M., Sun, L.T., Cao, Y., He, W., Zhang, X.Z., Guo, X.H., Ma, L., Yuan, P. (2005). Recent development of IMP ECR ion sources. In: Proceedings of the 16th International Workshop on ECR Ion Sources (Berkeley), AIPS Conference Proceedings, vol. 749, pp. 10–14. Zhao, H.W., Sun, L.T., Zhang, X.Z., Zhang, Z.M., Guo, X.H., He, W., Yuan, P., Song, M.T., Li, J.Y., Feng, Y.C., Cao, Y., Li, X.X., Zhan, W.L., Wei, B.W. (2006). Advanced superconducting electron cyclotron resonance ion source SECRAL: Design, construction and the first test results. Rev. Sci. Instrum. 77, 03A333 (4 pages).
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 144
Fixed Points of Lattice Transforms and Lattice Associative Memories
GERHARD RITTER AND PAUL GADER
University of Florida, Gainesville, Florida 32611, USA
I. Introduction
II. Pertinent Basic Properties of Lattices
III. Matrices and Lattice-Ordered Groups
IV. Lattice Dependence and Independence
V. Lattice Associative Memories
VI. Lattice Dependence and Fixed Points
VII. Convex Sets and Polytopes in Rⁿ
VIII. Linear Subspaces and Orientation in Rⁿ
IX. The Shape of F(X)
X. Remarks Concerning the Dimensionality of F(X)
XI. Strong Lattice Independence
XII. Pattern Reconstruction from Noisy Inputs
XIII. Kernel Vectors
XIV. Associative Memories Based on Dendritic Computing
XV. Conclusion
References
ISSN 1076-5670/05 DOI: 10.1016/S1076-5670(06)44002-7 Copyright 2006, Elsevier Inc. All rights reserved.

I. INTRODUCTION

The past decade has seen the emergence of a variety of novel neural network models based on lattice algebraic operations. Many of these models have been successfully employed to solve real-world problems. Among the different models based on lattice algebra are the following: morphological associative memories (Ritter et al., 1998, 1999, 2003b; Graña and Raducanu, 2001; Sussner, 2003), shared-weight neural networks (Khabou et al., 2000; Won et al., 1997), regularization neural networks (Gader et al., 1994), hybrid morphological-rank-linear neural networks (Pessoa and Maragos, 2000), min–max neural networks (Simpson, 1992, 1993; Zhang et al., 1996), morphological perceptrons (Ritter and Urcid, 2003; Ritter et al., 2003a), fuzzy lattice networks (Petridis and Kaburlasos, 1998; Kaburlasos, 2003), and adaptive logic networks that combine linear functions by tree expressions of maximum and minimum operations (Armstrong and Thomas, 1996). In this article we restrict our discussion to matrix-based memories operating on lattices.

The concept of an associative memory is a fairly intuitive one: Associative memory seems to be one of the primary functions of the brain. We easily associate the face of a friend with the friend's name, or a name with a telephone number. Artificial neural networks (ANNs) that are capable of storing several types of patterns and corresponding associations are referred to as associative memories. Such memories retrieve stored associations when presented with corresponding input patterns. An associative memory is said to be robust in the presence of noise if, when presented with a corrupted version of a prototype input pattern, it is still capable of retrieving the correct association.

Advances in mathematical theory related to associative memories played an important historical role in the revitalization of ANN research. Early research concerned with ANNs almost came to a standstill in the 1970s. A widely publicized book by Minsky and Papert (1969) showed the limitations of the highly touted neural network model known as a perceptron. Probably as much as any other single factor, the efforts of J.J. Hopfield during the early 1980s brought about a profound change in the perception of ANNs within the scientific community. Hopfield was a well-known physicist at the California Institute of Technology, and his scientific credentials lent renewed credibility to the field of ANNs, which had been badly tarnished by the hype of the mid-1960s. The applications considered in Hopfield's early papers include associative or content-addressable memories (Hopfield, 1982, 1984; Hopfield and Tank, 1986).

It is, however, important to note that some significant work on associative memories did occur in the 1970s. In 1972, T. Kohonen proposed a correlation matrix model for associative memories. The model was trained, using the outer vector product rule (also known as the Hebb rule), to learn an association between input and output patterns (Kohonen, 1972, 1987).
James Anderson (1972) published a closely related paper at the same time, although he and Kohonen worked independently.

Although all the above-named early researchers in artificial neural networks have, justifiably, received accolades from their peers, credit must be given to Karl Steinbuch, the German pioneer of ANNs. Steinbuch (1961a) introduced the first associative memory, called the "Lernmatrix" (learn matrix), in 1961. This was followed by the world's first monograph on artificial neural networks (Steinbuch, 1961b), which was revised and expanded three times (Steinbuch, 1963, 1965, 1972). Because Steinbuch's publications were in German, his work did not become widely known outside of Germany. He tried to remedy this situation with an English publication (Steinbuch and Piske, 1963). Nonetheless, he was never afforded adequate attention by the international ANN research community. Consequently, many German researchers consider him the forgotten pioneer of artificial neural networks.
The associative memories discussed in this article mirror the structure of the matrix correlation memories that resulted from the work of Steinbuch, Hopfield, and Kohonen. However, since the matrix operations are lattice based, the properties and behavior of these memories are drastically different from those of the correlation memories based on linear algebra.
II. PERTINENT BASIC PROPERTIES OF LATTICES

The concept of lattices was formed with a view to generalize and unify certain relationships between subsets of a set, between substructures of an algebraic structure such as groups, and between geometric structures such as topological spaces. Formally, a lattice is a partially ordered set L any two of whose elements x, y have a greatest lower bound, denoted by x ∧ y, and a least upper bound, denoted by x ∨ y. A prime example of a lattice is the real number system, which is also the focus of this chapter. The real numbers R together with the relation of less than or equal (≤) is a totally ordered set; that is, given any pair x, y ∈ R, then either x ≤ y or y ≤ x. If x ∨ y = max{x, y} and x ∧ y = min{x, y} ∀x, y ∈ R, then R together with the operations of ∨ and ∧ is a lattice that we denote by (R, ∨, ∧). Since R is totally ordered, (R, ∨, ∧) is a distributive lattice, which means that the distributive properties

\[ x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z), \tag{1} \]

\[ x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z) \tag{2} \]

hold for (R, ∨, ∧). However, (R, ∨, ∧) is not a complete lattice as there is no smallest or largest number. A complete lattice can be obtained by extending the real numbers to include the symbols −∞ and ∞ by setting R±∞ = R ∪ {−∞, ∞} and defining −∞ < x < ∞ ∀x ∈ R and −∞ ≤ x ≤ ∞ ∀x ∈ {−∞, ∞}. The extended structure (R±∞, ∨, ∧) is now a complete lattice as well as a distributive lattice. For distributive lattices the more general laws

\[ a \wedge \Bigl( \bigvee_{i=1}^{n} b_i \Bigr) = \bigvee_{i=1}^{n} (a \wedge b_i), \tag{3} \]

\[ a \vee \Bigl( \bigwedge_{i=1}^{n} b_i \Bigr) = \bigwedge_{i=1}^{n} (a \vee b_i), \tag{4} \]

and

\[ \bigvee_{j=1}^{k} \Bigl( \bigwedge_{i=1}^{n} x_{i,j} \Bigr) \le \bigwedge_{i=1}^{n} \Bigl( \bigvee_{j=1}^{k} x_{i,j} \Bigr) \tag{5} \]
hold (Birkhoff, 1984). The last inequality is known as the minimax principle. In addition to being a distributive lattice, the set of real numbers is also a group under addition, which satisfies the useful properties

P1  x ≥ y ⇒ z + x ≥ z + y
P2  x ≥ y ⇒ z + x + w ≥ z + y + w
P3  z + (x ∨ y) = (z + x) ∨ (z + y) and z + (x ∧ y) = (z + x) ∧ (z + y),

where x, y, z, w ∈ R. These properties exhibit the interplay between the lattice and group operations. A lattice that is also a group and satisfies property P2 is called a lattice-ordered group or ℓ-group. The two equalities

\[ a + \bigvee_{\alpha} x_{\alpha} = \bigvee_{\alpha} (a + x_{\alpha}) \quad \text{and} \quad a + \bigwedge_{\alpha} x_{\alpha} = \bigwedge_{\alpha} (a + x_{\alpha}) \tag{6} \]

are true in any ℓ-group. In fact, as the following theorem shows, a more general relationship holds (Birkhoff, 1984).

Theorem 2.1. The following equalities hold in any ℓ-group:

\[ \bigvee_{i} x_i + \bigvee_{j} y_j = \bigvee_{i} \bigvee_{j} (x_i + y_j) \tag{7} \]

and

\[ \bigwedge_{i} x_i + \bigwedge_{j} y_j = \bigwedge_{i} \bigwedge_{j} (x_i + y_j). \tag{8} \]
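The identities stated so far are easy to spot-check numerically on the lattice of real numbers. The following Python sketch is illustrative only: it interprets ∨ as max and ∧ as min on randomly drawn integers and tests Eqs. (3) to (5) together with Eqs. (7) and (8). The function names `minimax_sides` and `check_lattice_identities`, the seed, and the value ranges are assumptions made for this illustration; a passing check is supporting evidence, not a proof.

```python
import random


def minimax_sides(x):
    """Return both sides of Eq. (5) for a rectangular array x[i][j].

    Left side:  maximum over j of (minimum over i of x[i][j]).
    Right side: minimum over i of (maximum over j of x[i][j]).
    """
    n, k = len(x), len(x[0])
    lhs = max(min(x[i][j] for i in range(n)) for j in range(k))
    rhs = min(max(x[i][j] for j in range(k)) for i in range(n))
    return lhs, rhs


def check_lattice_identities(seed=0, trials=200):
    """Spot-check Eqs. (3)-(5), (7), and (8) on random integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, k = rng.randint(1, 5), rng.randint(1, 5)
        a = rng.randint(-9, 9)
        b = [rng.randint(-9, 9) for _ in range(n)]
        y = [rng.randint(-9, 9) for _ in range(k)]
        x = [[rng.randint(-9, 9) for _ in range(k)] for _ in range(n)]
        # Eq. (3): a ∧ (∨ b_i) = ∨ (a ∧ b_i)
        if min(a, max(b)) != max(min(a, bi) for bi in b):
            return False
        # Eq. (4): a ∨ (∧ b_i) = ∧ (a ∨ b_i)
        if max(a, min(b)) != min(max(a, bi) for bi in b):
            return False
        # Eq. (5): the minimax principle
        lhs, rhs = minimax_sides(x)
        if lhs > rhs:
            return False
        # Eq. (7): (∨ b_i) + (∨ y_j) = ∨∨ (b_i + y_j)
        if max(b) + max(y) != max(bi + yj for bi in b for yj in y):
            return False
        # Eq. (8): (∧ b_i) + (∧ y_j) = ∧∧ (b_i + y_j)
        if min(b) + min(y) != min(bi + yj for bi in b for yj in y):
            return False
    return True
```

For the 2 × 2 array with rows (0, 1) and (1, 0), `minimax_sides` returns the pair (0, 1), which shows that the inequality in Eq. (5) can be strict.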
In particular, Eqs. (6), (7), and (8) hold in the ℓ-group (R, ∨, ∧, +).

It is often convenient to deal with only one of the operations ∨ or ∧. Every partially ordered set F for which the operation x ∨ y (or x ∧ y) is associative and is defined for each pair x, y ∈ F is called a semilattice whenever x ∨ x = x. For example, (R, ∨) is a semilattice with dual (R, ∧). If R−∞ = R ∪ {−∞} and R∞ = R ∪ {∞}, then (R−∞, ∨) is a semilattice with dual (R∞, ∧). Also, since r ∨ (−∞) = (−∞) ∨ r = r ∀r ∈ R−∞, (R−∞, ∨) is a monoid with null element −∞. Similarly, (R∞, ∧) is a monoid with null element ∞. If a semilattice is also a group, then it is called a semilattice-ordered group or s-group. If (F, ∨, +) and (F, ∧, +∗) are s-groups and the equation a ∨ (b ∧ a) = a ∧ (b ∨ a) = a is satisfied ∀a, b ∈ F, then we say that F is an s-group with duality. If the operations + and +∗ coincide, then the operation + is called self-dual and F is an s-group with self-duality. Obviously, (R, ∨, +) and (R, ∧, +) are s-groups with self-duality. If (F, +) and (F, +∗) are only semigroups, then F is an s-semigroup with duality. Since (R, ∨, +) and (R, ∧, +) satisfy the left-hand and right-hand equations of Property P3, respectively, it follows that each of the structures is also a semiring. These two semirings are in a one-to-one correspondence implied by the self-dual relationship r∗ = −r, where r ∈ R. That is, r∗ is the dual of r as (r∗)∗ = r and r ∧ u = (r∗ ∨ u∗)∗ ∀r, u ∈ R. These two isomorphic s-semirings are subalgebras of the ℓ-group (R, ∨, ∧, +). By defining operations of addition + and +∗ in the semilattices (R−∞, ∨) and (R∞, ∧), respectively, so that (R−∞, ∨, +) and (R∞, ∧, +∗) are isomorphic, we can again merge these two algebras into one coherent algebra (R±∞, ∨, ∧, +, +∗). Defining

\[ a + (-\infty) = (-\infty) + a = -\infty \quad \forall a \in \mathbb{R}_{-\infty} \tag{9} \]
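turns $(\mathbb{R}_{-\infty}, \vee, +)$ into an s-semigroup since $-\infty$ has no inverse under addition. On the other hand, the structure $(\mathbb{R}_{-\infty}, \vee, +)$ now shares many of the arithmetic properties of the ring $(\mathbb{R}, +, \times)$ if we view $\vee$ as replacing addition and addition as replacing multiplication. For example, we have

$$\begin{array}{ll}
a + (-\infty) = (-\infty) + a = -\infty & \qquad a \times 0 = 0 \times a = 0 \\
a \vee (-\infty) = (-\infty) \vee a = a & \qquad a + 0 = 0 + a = a \\
a + 0 = 0 + a = a & \qquad a \times 1 = 1 \times a = a,
\end{array}$$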
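where the left column illustrates the laws governing the zero and unit for the structure $(\mathbb{R}_{-\infty}, \vee, +)$ and the right column the corresponding laws for the ring $(\mathbb{R}, +, \times)$. An addition $+^*$ for the s-semigroup $(\mathbb{R}_{\infty}, \wedge, +^*)$ is defined in a similar fashion by setting $a +^* b = a + b$ $\forall a, b \in \mathbb{R}$ and

$$a +^* \infty = \infty +^* a = \infty \quad \forall a \in \mathbb{R}_{\infty}. \tag{10}$$

Again, the null element $\infty$ of $(\mathbb{R}_{\infty}, \wedge, +^*)$ has no additive inverse. To merge the two s-semigroups $\mathbb{R}_{-\infty}$ and $\mathbb{R}_{\infty}$ into one coherent algebraic structure, the operations $+$ and $+^*$ need to be extended to the symbols $-\infty$ and $\infty$. This is achieved by setting

$$a + \infty = \infty + a = \infty \quad \forall a \in \mathbb{R}_{\infty} \tag{11}$$

$$a +^* (-\infty) = (-\infty) +^* a = -\infty \quad \forall a \in \mathbb{R}_{-\infty} \tag{12}$$

and

$$-\infty + \infty = \infty + (-\infty) = -\infty \tag{13}$$

$$-\infty +^* \infty = \infty +^* (-\infty) = \infty. \tag{14}$$

The resultant structure $(\mathbb{R}_{\pm\infty}, \vee, \wedge, +, +^*)$ is a distributive lattice and is called a bounded $\ell$-group or blog. Conjugation of an element $r \in \mathbb{R}_{\pm\infty}$ is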
defined by r ∗ = −r if r ∈ R, r ∗ = ∞ if r = −∞, and r ∗ = −∞ if r = ∞. As before, (r ∗ )∗ = r and r ∧ u = (r ∗ ∨ u∗ )∗ ∀r, u ∈ R±∞ . Observe also that the operation +∗ in R∞ is the same as the addition + extended to R∞ . For this reason we shall use the representation (R∞ , ∧, +) for the s-semigroup R∞ .
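The extended additions of Eqs. (9) through (14) and the conjugation just defined can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `add`, `add_star`, and `conj` are hypothetical, and IEEE infinities (`math.inf`) are assumed as stand-ins for the symbols $\infty$ and $-\infty$. Python's own `inf + (-inf)` yields `nan`, so the two mixed cases have to be handled explicitly.

```python
import math

INF = math.inf  # assumption: IEEE infinity stands in for the symbol "infinity"

def add(a, b):
    """Extended addition + on R_{+-inf}: -inf absorbs everything, Eqs. (9) and (13)."""
    if -INF in (a, b):
        return -INF
    return a + b  # finite sums, and a + inf = inf as in Eq. (11)

def add_star(a, b):
    """Dual addition +*: inf absorbs everything, Eqs. (10) and (14)."""
    if INF in (a, b):
        return INF
    return a + b  # finite sums, and a +* (-inf) = -inf as in Eq. (12)

def conj(r):
    """Conjugation r* = -r, which also swaps inf and -inf."""
    return -r

print(add(INF, -INF), add_star(INF, -INF))
```

The two operations differ only on the pair $\{\infty, -\infty\}$, which is why the text can later use the single symbol $+$ whenever at least one operand is real.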
III. MATRICES AND LATTICE-ORDERED GROUPS

In recent years, lattice-based matrix operations have found widespread applications in the engineering sciences. In these applications, the usual matrix operations of addition and multiplication are replaced by corresponding lattice operations. For example, given the blog $(\mathbb{R}_{\pm\infty}, \vee, \wedge, +, +^*)$ and two $m \times n$ matrices $A = (a_{ij})$, $B = (b_{ij})$ with entries in $\mathbb{R}_{\pm\infty}$, the pointwise maximum $A \vee B$ of $A$ and $B$ is the $m \times n$ matrix $C$ defined by $A \vee B = C$, where $c_{ij} = a_{ij} \vee b_{ij}$. If $A$ is $m \times p$ and $B$ is $p \times n$, then the max product of $A$ and $B$ is the matrix $C = A \vee B$, where $c_{ij} = \bigvee_{k=1}^{p} (a_{ik} + b_{kj})$. Observe that this product is analogous to the usual matrix product $c_{ij} = \sum_{k=1}^{p} (a_{ik} \times b_{kj})$, with the symbols $\sum$ and $\times$ replaced by $\bigvee$ and $+$, respectively. Since $\bigvee$ replaces $\sum$ in our definition, the pointwise maximum can be thought of as matrix addition.

Example 3.1. An illustration of the max product of a $5 \times 4$ and a $4 \times 3$ matrix with entries from $\mathbb{R}$ is the following:

$$\begin{bmatrix} 1 & 6 & -2 & 2 \\ 7 & -5 & 10 & -4 \\ 8 & 4 & 11 & 9 \\ -3 & 2 & 1 & -7 \\ -1 & 1 & 0 & 5 \end{bmatrix} \vee \begin{bmatrix} 2 & 6 & -2 \\ 7 & -5 & 10 \\ 8 & 4 & 11 \\ -1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 13 & 7 & 16 \\ 18 & 14 & 21 \\ 19 & 15 & 22 \\ 9 & 5 & 12 \\ 8 & 6 & 11 \end{bmatrix}.$$

The min product of $A$ and $B$ is the matrix $C = A \wedge B$ defined by $c_{ij} = \bigwedge_{k=1}^{p} (a_{ik} +^* b_{kj})$. Similarly, the pointwise minimum $A \wedge B$ of two matrices of the same size is defined as $A \wedge B = C$, where $c_{ij} = a_{ij} \wedge b_{ij}$. The two matrix products are collectively referred to as minimax products. While lattice theory and lattice-ordered groups have only marginal connections to the computational aspects of linear algebra, problems formulated using the minimax products take on the flavor of problems in linear algebra.
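As a quick check of the two minimax products, the following sketch recomputes Example 3.1. The helper names `max_product` and `min_product` are hypothetical, and the sketch assumes real (finite) entries only, so that $+$ and $+^*$ coincide and ordinary Python addition can be used.

```python
def max_product(A, B):
    """Max product of an m x p and a p x n matrix: c_ij = max_k (a_ik + b_kj)."""
    cols = list(zip(*B))  # columns of B
    return [[max(a + b for a, b in zip(row, col)) for col in cols] for row in A]

def min_product(A, B):
    """Min product: c_ij = min_k (a_ik + b_kj), for finite entries."""
    cols = list(zip(*B))
    return [[min(a + b for a, b in zip(row, col)) for col in cols] for row in A]

# Matrices of Example 3.1.
A = [[1, 6, -2, 2],
     [7, -5, 10, -4],
     [8, 4, 11, 9],
     [-3, 2, 1, -7],
     [-1, 1, 0, 5]]
B = [[2, 6, -2],
     [7, -5, 10],
     [8, 4, 11],
     [-1, 1, 0]]

for row in max_product(A, B):
    print(row)
```

Structurally this is the usual triple loop of matrix multiplication with the sum replaced by a maximum (or minimum) and the product replaced by a sum.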
By allowing for the minimax matrix products to take on the character of the familiar matrix products, concepts analogous to those in linear algebra such as solutions to systems of equations, linear dependence and independence, rank, seminorms, eigenvalues and eigenvectors, spectral inequalities, and invertible and equivalent matrices can be formulated. For example, the notion of the conjugate of a matrix A = (aij ) with entries in R±∞ is the matrix A∗ = (bij ),
where $b_{ij} = [a_{ji}]^*$ and $[a_{ji}]^*$ is the dual of $a_{ji}$ defined earlier. Conjugation of matrices then leads to the duality relationships $A \wedge B = (A^* \vee B^*)^*$ for the pointwise operations and $A \wedge B = (B^* \vee A^*)^*$ for the minimax products of appropriately sized matrices. Originally, many of these concepts were developed primarily to help solve operations research and image processing types of problems (Shimbel, 1954; Cuninghame-Green, 1960, 1962, 1979; Giffler, 1960; Peteanu, 1967; Benzaken, 1968; Backhouse and Carré, 1975; Carré, 1971; Davidson, 1989; Ritter, 1992; Ritter and Sussner, 1992; Ritter and Wilson, 2001). Our interest in these notions is due to their applicability in the field of associative memories since these can be viewed as transforms $\mathbb{R}^n \to \mathbb{R}^m$ (or $\mathbb{R}^n_{\pm\infty} \to \mathbb{R}^m_{\pm\infty}$) of the lattice space $(\mathbb{R}^n, \vee, \wedge)$ to the lattice space $(\mathbb{R}^m, \vee, \wedge)$. Closely associated with these transforms is the concept of diagonal dominance.

Definition 3.1. An $n \times n$ matrix $A$ is said to be diagonally max dominant if and only if it satisfies the condition

$$a_{jj} - a_{ij} = \bigvee_{k=1}^{n} (a_{jk} - a_{ik}) \quad \forall i = 1, \ldots, n. \tag{15}$$

Similarly, $A$ is said to be diagonally min dominant if and only if $A$ satisfies the condition

$$a_{jj} - a_{ij} = \bigwedge_{k=1}^{n} (a_{jk} - a_{ik}) \quad \forall i = 1, \ldots, n. \tag{16}$$
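Definition 3.1 translates directly into a test on the entries of a matrix. The sketch below uses hypothetical function names and assumes that conditions (15) and (16) are meant to hold for every index $j$ as well as every $i$, which the definition does not state explicitly. Exact equality is used, so integer entries are assumed.

```python
def is_diag_max_dominant(A):
    """Check a_jj - a_ij == max_k (a_jk - a_ik) for all i, j (Eq. 15)."""
    n = len(A)
    return all(
        A[j][j] - A[i][j] == max(A[j][k] - A[i][k] for k in range(n))
        for i in range(n) for j in range(n)
    )

def is_diag_min_dominant(A):
    """Check a_jj - a_ij == min_k (a_jk - a_ik) for all i, j (Eq. 16)."""
    n = len(A)
    return all(
        A[j][j] - A[i][j] == min(A[j][k] - A[i][k] for k in range(n))
        for i in range(n) for j in range(n)
    )

A = [[0, -1], [-2, 0]]   # illustrative matrix, not taken from the text
B = [[0, 1], [2, 0]]     # its conjugate: b_ij = -a_ji
print(is_diag_max_dominant(A), is_diag_min_dominant(B))
```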
With regard to the lattice spaces under discussion, we note that the lattice space $(\mathbb{R}^n, \vee, \wedge)$ is an $\ell$-group while the lattice space $(\mathbb{R}^n_{\pm\infty}, \vee, \wedge)$ is a blog. In these spaces, vector addition is the group operation and the maximum and minimum of two vectors are defined as the pointwise coordinate maximum and minimum, respectively. Thus, if $\mathbf{x} = (x_1, x_2, \ldots, x_n)' \in \mathbb{R}^n_{\pm\infty}$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)' \in \mathbb{R}^n_{\pm\infty}$, then $\mathbf{z} = \mathbf{x} \vee \mathbf{y}$ is defined by $z_i = x_i \vee y_i$ for $i = 1, 2, \ldots, n$, and similarly for $\mathbf{z} = \mathbf{x} \wedge \mathbf{y}$. In contrast to $(\mathbb{R}, \vee, \wedge)$ or $(\mathbb{R}_{\pm\infty}, \vee, \wedge)$, the lattice spaces $(\mathbb{R}^n, \vee, \wedge)$ and $(\mathbb{R}^n_{\pm\infty}, \vee, \wedge)$ are not totally ordered.
IV. LATTICE DEPENDENCE AND INDEPENDENCE

The close resemblance between the ring $(\mathbb{R}, +, \times)$ and the s-semigroup $(\mathbb{R}_{-\infty}, \vee, +)$ [or $(\mathbb{R}_{\infty}, \wedge, +)$] observed earlier extends to the vector space $(\mathbb{R}^n, +)$ and the lattice space $(\mathbb{R}^n_{-\infty}, \vee)$ [or $(\mathbb{R}^n_{\infty}, \wedge)$] if we replace the notion of scalar multiplication in the former by scalar addition in the latter. More
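precisely, if $\mathbf{x} = (x_1, \ldots, x_n)' \in \mathbb{R}^n_{-\infty}$ and $a \in \mathbb{R}_{-\infty}$, then scalar addition is defined as $a + \mathbf{x} = (a + x_1, \ldots, a + x_n)'$. The scalar addition $a + \mathbf{x}$ is equivalent to the vector addition $\mathbf{a} + \mathbf{x}$, where $\mathbf{a} = (a_1, \ldots, a_n)'$ is the vector given by $a_i = a$ for $i = 1, \ldots, n$. This equivalence allows the use of various well-established properties of $\ell$-groups and vector lattices. In linear algebra we have the property that for a nonzero vector $\mathbf{x} \in \mathbb{R}^n$, $a \cdot \mathbf{x} = \mathbf{0}$ if and only if $a = 0$, and $a \cdot \mathbf{0} = \mathbf{0}$ $\forall a \in \mathbb{R}$. In the extended lattice space structure $(\mathbb{R}^n_{\pm\infty}, \vee, \wedge, +, +^*)$ it is necessary to deal with two null vectors, namely $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_n)'$, where $\omega_i = -\infty$ for $i = 1, \ldots, n$, and $\boldsymbol{\omega}^*$, where $\omega_i^* = \infty$ for $i = 1, \ldots, n$. In analogy with linear algebra we have the property that for a nonnull vector $\mathbf{x} \in \mathbb{R}^n_{\pm\infty}$, $a + \mathbf{x} = \boldsymbol{\omega}$ if and only if $a = -\infty$ and $a +^* \mathbf{x} = \boldsymbol{\omega}^*$ if and only if $a = \infty$. Similarly, $a + \boldsymbol{\omega} = \boldsymbol{\omega}$ and $a +^* \boldsymbol{\omega}^* = \boldsymbol{\omega}^*$ $\forall a \in \mathbb{R}_{\pm\infty}$. However, since our application domain is pattern recognition, which deals with real valued vectors, we restrict our discussion to sets of vectors $X = \{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n_{\pm\infty}$ for which $\mathbf{x}^\xi \in \mathbb{R}^n$ for $\xi = 1, \ldots, k$. With this restriction the operation of scalar addition is self-dual as $a + \mathbf{x}^\xi = a +^* \mathbf{x}^\xi$ $\forall a \in \mathbb{R}_{\pm\infty}$ and for all $\xi = 1, \ldots, k$. Note that under these conditions it is still possible to have $a + \mathbf{x}^\xi \in \mathbb{R}^n_{\pm\infty}$ with $a + \mathbf{x}^\xi \notin \mathbb{R}^n$.

Definition 4.1. If $\{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$, then a linear minimax combination of vectors from the set $\{\mathbf{x}^1, \ldots, \mathbf{x}^k\}$ is any vector $\mathbf{x} \in \mathbb{R}^n_{\pm\infty}$ of form

$$\mathbf{x} = G(\mathbf{x}^1, \ldots, \mathbf{x}^k) = \bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi), \tag{17}$$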
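where $J$ is a finite set of indices and $a_{\xi j} \in \mathbb{R}_{\pm\infty}$ $\forall j \in J$ and $\forall \xi = 1, \ldots, k$. The expression $G(\mathbf{x}^1, \ldots, \mathbf{x}^k) = \bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$ is called a linear minimax sum. The similarity with linear sums $\sum_{\xi=1}^{k} a_\xi \mathbf{x}^\xi$ in the vector space $(\mathbb{R}^n, +)$ is obvious; in the vector lattice space $(\mathbb{R}^n_{\pm\infty}, \vee, \wedge)$ the symbols $\bigvee$ (or $\bigwedge$) replace the symbol $\sum$ and scalar addition replaces scalar multiplication. Also, if every scalar in the linear sum is zero, then the linear sum is the zero vector. Similarly, according to Eq. (17), if for every $j \in J$ there exists an index $\xi \in \{1, \ldots, k\}$ such that $a_{\xi j} = -\infty$, then $\mathbf{x} = G(\mathbf{x}^1, \ldots, \mathbf{x}^k)$ is the null vector $\boldsymbol{\omega}$, and if for some $j \in J$ the scalar $a_{\xi j} = \infty$ $\forall \xi = 1, \ldots, k$, then $\mathbf{x} = \boldsymbol{\omega}^*$, the null vector of $(\mathbb{R}^n_{\infty}, \wedge, +)$. The set of all linear minimax sums of vectors from $\{\mathbf{x}^1, \ldots, \mathbf{x}^k\}$ is the linear minimax span of the vectors and will be denoted by $LMS(\mathbf{x}^1, \ldots, \mathbf{x}^k)$. The subset of all real valued vectors in $LMS(\mathbf{x}^1, \ldots, \mathbf{x}^k)$ is denoted by $LMS_{\mathbb{R}}(\mathbf{x}^1, \ldots, \mathbf{x}^k)$. Hence $LMS_{\mathbb{R}}(\mathbf{x}^1, \ldots, \mathbf{x}^k) = LMS(\mathbf{x}^1, \ldots, \mathbf{x}^k) \cap \mathbb{R}^n$ or, equivalently, $LMS_{\mathbb{R}}(\mathbf{x}^1, \ldots, \mathbf{x}^k) = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} = G(\mathbf{x}^1, \ldots, \mathbf{x}^k)\}$.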
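A linear minimax sum in the form of Eq. (17) can be evaluated mechanically. The following sketch is illustrative only: the function name `minimax_sum`, the coefficient layout (`coeffs[j][xi]` holds $a_{\xi j}$), and the sample vectors are assumptions, and the pattern vectors are taken to be real so that ordinary floating-point addition with `math.inf` never produces `nan`.

```python
import math

INF = math.inf

def minimax_sum(coeffs, vectors):
    """Evaluate max over j of min over xi of (a_{xi j} + x^xi), componentwise.

    coeffs[j][xi] is the scalar a_{xi j}; vectors[xi] is the real vector x^xi.
    """
    n = len(vectors[0])
    terms = []
    for row in coeffs:
        # Scalar addition a + x for each pattern, then the componentwise minimum.
        shifted = [[a + xi for xi in x] for a, x in zip(row, vectors)]
        terms.append([min(v[i] for v in shifted) for i in range(n)])
    # Componentwise maximum over the index set J.
    return [max(t[i] for t in terms) for i in range(n)]

x1, x2 = [2, 0, 2], [0, 1, 1]        # hypothetical real patterns
# (2 + x1) v (3 + x2): an infinite coefficient removes a vector from a minimum.
print(minimax_sum([[2, INF], [INF, 3]], [x1, x2]))
# A coefficient of -inf in every inner minimum yields the null vector omega.
print(minimax_sum([[-INF, 0], [1, -INF]], [x1, x2]))
```

Setting a coefficient to $\infty$ is the device that lets a vector drop out of an inner minimum, which is how an expression involving only some of the vectors is brought into the canonical form.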
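Linear minimax sums provide for the definitions of lattice dependence and lattice independence that are in close analogy with these concepts as defined in linear algebra.

Definition 4.2. Suppose $X = \{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$. A vector $\mathbf{x} \in \mathbb{R}^n$ is lattice dependent on $X$ if and only if $\mathbf{x} = G(\mathbf{x}^1, \ldots, \mathbf{x}^k)$ for some linear minimax sum of vectors from $X$. The vector $\mathbf{x}$ is said to be lattice independent of $X$ if and only if it is not lattice dependent on $X$. The set $X$ is said to be lattice independent if and only if $\forall \lambda \in \{1, \ldots, k\}$, $\mathbf{x}^\lambda$ is lattice independent of $X \setminus \{\mathbf{x}^\lambda\} = \{\mathbf{x}^\xi \in X : \xi \neq \lambda\}$.

A consequence of Eq. (17) is that any finite expression involving the symbols $\vee$, $\wedge$, and vectors of form $a + \mathbf{x}^\xi$, where $\mathbf{x}^\xi \in X$ and $a \in \mathbb{R}_{\pm\infty}$, is a linear minimax sum. For example, the expression $(2 + \mathbf{x}^\gamma) \vee (-4 + \mathbf{x}^\lambda)$ is given by $\bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$ if we set $J = \{1, 2\}$ and

$$a_{\xi 1} = \begin{cases} 2 & \text{if } \xi = \gamma \\ \infty & \text{if } \xi \neq \gamma \end{cases} \quad \text{and} \quad a_{\xi 2} = \begin{cases} -4 & \text{if } \xi = \lambda \\ \infty & \text{if } \xi \neq \lambda. \end{cases}$$

More generally, we have

Theorem 4.1. If $\mathbf{x} = \bigvee_{\gamma \in A} (a_\gamma + \mathbf{x}^\gamma)$ and $\mathbf{y} = \bigwedge_{\lambda \in B} (b_\lambda + \mathbf{x}^\lambda)$, where $A, B \subset \{1, \ldots, k\}$, then there exist finite indexing sets $J$ and $I$ such that

$$(1)\ \mathbf{x} = \bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi) \quad \text{and} \quad (2)\ \mathbf{y} = \bigvee_{i \in I} \bigwedge_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi). \tag{18}$$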
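Proof. To prove the first equality, simply set $J = A$ and

$$a_{\xi j} = \begin{cases} a_j & \text{if } \xi = j \\ \infty & \text{if } \xi \neq j. \end{cases}$$

The second equality follows trivially when setting $I = \{1\}$ and

$$b_{\xi 1} = \begin{cases} b_\gamma & \text{if } \xi = \gamma \\ \infty & \text{if } \xi \neq \gamma. \end{cases}$$

The next set of equations is a direct consequence of property P3:

$$\left[\bigvee_{\xi=1}^{k} (a_\xi + \mathbf{x}^\xi)\right] \vee \left[\bigvee_{\xi=1}^{k} (b_\xi + \mathbf{x}^\xi)\right] = \bigvee_{\xi=1}^{k} \left[(a_\xi \vee b_\xi) + \mathbf{x}^\xi\right] \tag{19}$$

$$\left[\bigwedge_{\xi=1}^{k} (a_\xi + \mathbf{x}^\xi)\right] \wedge \left[\bigwedge_{\xi=1}^{k} (b_\xi + \mathbf{x}^\xi)\right] = \bigwedge_{\xi=1}^{k} \left[(a_\xi \wedge b_\xi) + \mathbf{x}^\xi\right] \tag{20}$$

$$\left[\bigvee_{\xi=1}^{k} (a_\xi + \mathbf{x}^\xi)\right] \wedge \left[\bigvee_{\xi=1}^{k} (b_\xi + \mathbf{x}^\xi)\right] = \bigvee_{\xi=1}^{k} \left[(a_\xi \wedge b_\xi) + \mathbf{x}^\xi\right] \tag{21}$$

$$\left[\bigwedge_{\xi=1}^{k} (a_\xi + \mathbf{x}^\xi)\right] \vee \left[\bigwedge_{\xi=1}^{k} (b_\xi + \mathbf{x}^\xi)\right] = \bigwedge_{\xi=1}^{k} \left[(a_\xi \vee b_\xi) + \mathbf{x}^\xi\right]. \tag{22}$$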
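These equations can be generalized to include formal linear minimax sums.

Theorem 4.2. If $\mathbf{x} = G_1(\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^k)$ and $\mathbf{y} = G_2(\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^k)$, then $\mathbf{x} \wedge \mathbf{y}$ and $\mathbf{x} \vee \mathbf{y}$ are also linear minimax sums.

Proof. Let $\mathbf{x} = \bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$ and $\mathbf{y} = \bigvee_{i \in I} \bigwedge_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)$. For each $j \in J$ and $i \in I$ define $\mathbf{u}^j = \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$ and $\mathbf{v}^i = \bigwedge_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)$. Using Eqs. (18.1) and (18.2), we obtain

$$\begin{aligned}
\mathbf{x} \wedge \mathbf{y} &= \left(\bigvee_{j \in J} \mathbf{u}^j\right) \wedge \left(\bigvee_{i \in I} \mathbf{v}^i\right) = \bigvee_{j \in J} \bigvee_{i \in I} \left(\mathbf{u}^j \wedge \mathbf{v}^i\right) \\
&= \bigvee_{j \in J} \bigvee_{i \in I} \left[\bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi) \wedge \bigwedge_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)\right] \\
&= \bigvee_{j \in J} \bigvee_{i \in I} \bigwedge_{\xi=1}^{k} \left[(a_{\xi j} \wedge b_{\xi i}) + \mathbf{x}^\xi\right] \\
&= \bigvee_{\ell \in J \times I} \bigwedge_{\xi=1}^{k} (c_{\xi \ell} + \mathbf{x}^\xi),
\end{aligned}$$

where $\ell = (j, i) \in J \times I$ and $c_{\xi \ell} = a_{\xi j} \wedge b_{\xi i}$. Similarly,

$$\begin{aligned}
\mathbf{x} \vee \mathbf{y} &= \left(\bigvee_{j \in J} \mathbf{u}^j\right) \vee \left(\bigvee_{i \in I} \mathbf{v}^i\right) = \bigvee_{j \in J} \bigvee_{i \in I} \left(\mathbf{u}^j \vee \mathbf{v}^i\right) \\
&= \bigvee_{j \in J} \bigvee_{i \in I} \left[\bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi) \vee \bigwedge_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)\right] \\
&= \bigvee_{j \in J} \bigvee_{i \in I} \bigwedge_{\xi=1}^{k} \left[(a_{\xi j} \vee b_{\xi i}) + \mathbf{x}^\xi\right] \\
&= \bigvee_{\ell \in J \times I} \bigwedge_{\xi=1}^{k} (c_{\xi \ell} + \mathbf{x}^\xi),
\end{aligned}$$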
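where $\ell = (j, i) \in J \times I$ and $c_{\xi \ell} = a_{\xi j} \vee b_{\xi i}$.

As a direct consequence of Theorems 4.1 and 4.2 we have the following corollary.

Corollary 4.1. If $\mathbf{x} = G(\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^k)$ and $\mathbf{y} = \bigwedge_{i \in I} \bigvee_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)$, then $\mathbf{x} \wedge \mathbf{y}$ and $\mathbf{x} \vee \mathbf{y}$ are linear minimax sums.

Proof. It follows from Theorem 4.1 that for each $i \in I$, there exists a linear minimax sum

$$G_i(\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^k) = \bigvee_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi).$$

By repeated application of Theorem 4.2, $\mathbf{y} = \bigwedge_{i \in I} G_i(\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^k)$ is equal to a linear minimax sum. Again by Theorem 4.2 we now have that $\mathbf{x} \wedge \mathbf{y}$ and $\mathbf{x} \vee \mathbf{y}$ are linear minimax sums.

A consequence of Corollary 4.1 is that either expression $\bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$ or $\bigwedge_{i \in I} \bigvee_{\xi=1}^{k} (b_{\xi i} + \mathbf{x}^\xi)$ can be chosen for defining a canonical linear minimax sum.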
V. LATTICE ASSOCIATIVE MEMORIES

In classical pattern recognition, patterns are viewed as column vectors in Euclidean space. Each component of a pattern vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)' \in \mathbb{R}^n$ corresponds to one of the pattern's features. The numerical value $x_i$ of a pattern feature can represent a variety of objects or physical features such as signal strength, curvature, a probability value, mean mass, and so on. One goal in the theory of associative memories is for the memory to recall a stored pattern $\mathbf{y} \in \mathbb{R}^m$ when presented with a pattern $\mathbf{x} \in \mathbb{R}^n$, where the pattern association expresses some desired pattern correlation. More precisely, suppose $X = \{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$ and $Y = \{\mathbf{y}^1, \ldots, \mathbf{y}^k\} \subset \mathbb{R}^m$ are two sets of pattern vectors with desired association given by the diagonal
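$\{(\mathbf{x}^\xi, \mathbf{y}^\xi) : \xi = 1, \ldots, k\}$ of $X \times Y$. The goal is to store these pattern pairs in some memory $M$ such that for $\xi = 1, \ldots, k$, $M$ recalls $\mathbf{y}^\xi$ when presented with the pattern $\mathbf{x}^\xi$. If such a memory $M$ exists, then we shall express this association symbolically by $\mathbf{x}^\xi \to M \to \mathbf{y}^\xi$. Additionally, it is generally desirable for $M$ to be able to recall $\mathbf{y}^\xi$ even when presented with a somewhat corrupted version of $\mathbf{x}^\xi$. The matrix correlation memories resulting from the work of Kohonen and Hopfield that were mentioned in the Introduction were the earliest ANN approaches for solving this particular problem. These approaches start out with an $m \times n$ matrix $M$ defined in terms of the sum of outer products of the associated pattern vectors, namely

$$M = \sum_{\xi=1}^{k} \mathbf{y}^\xi \cdot (\mathbf{x}^\xi)'. \tag{23}$$

It follows that the $(i, j)$th entry of $M$ is given by $m_{ij} = \sum_{\xi=1}^{k} y_i^\xi x_j^\xi$. Furthermore, if the input patterns $\mathbf{x}^1, \ldots, \mathbf{x}^k$ are orthonormal, that is if

$$(\mathbf{x}^j)' \cdot \mathbf{x}^i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \tag{24}$$

then $M \cdot \mathbf{x}^\xi = \mathbf{y}^\xi [(\mathbf{x}^\xi)' \cdot \mathbf{x}^\xi] + \sum_{\gamma \neq \xi} \mathbf{y}^\gamma [(\mathbf{x}^\gamma)' \cdot \mathbf{x}^\xi] = \mathbf{y}^\xi$. Thus, we have perfect recall of the output patterns $\mathbf{y}^1, \ldots, \mathbf{y}^k$. If $\mathbf{x}^1, \ldots, \mathbf{x}^k$ are not orthonormal (as in most realistic cases), then filtering processes using activation functions become necessary to retrieve the desired output pattern.

Lattice-based associative memories are surprisingly similar to these classical linear correlation memories. With each pair of pattern associations $(X, Y)$ we associate two canonical lattice-based $m \times n$ memories $W_{XY}$ and $M_{XY}$ defined by

$$W_{XY} = \bigwedge_{\xi=1}^{k} \left[\mathbf{y}^\xi + (\mathbf{x}^\xi)^*\right] \quad \text{and} \quad M_{XY} = \bigvee_{\xi=1}^{k} \left[\mathbf{y}^\xi + (\mathbf{x}^\xi)^*\right], \tag{25}$$

where the minimax outer product is defined as

$$\mathbf{y} + \mathbf{x}^* = \begin{pmatrix} y_1 - x_1 & \cdots & y_1 - x_n \\ \vdots & \ddots & \vdots \\ y_m - x_1 & \cdots & y_m - x_n \end{pmatrix}.$$

Accordingly, $\mathbf{y} + \mathbf{x}^* = \mathbf{y} \vee \mathbf{x}^* = \mathbf{y} \wedge \mathbf{x}^*$, where $\vee$ and $\wedge$ here denote the max and min products, and the $(i, j)$th entries of $W_{XY}$ and $M_{XY}$ are given by

$$w_{ij} = \bigwedge_{\xi=1}^{k} \left(y_i^\xi - x_j^\xi\right) \quad \text{and} \quad m_{ij} = \bigvee_{\xi=1}^{k} \left(y_i^\xi - x_j^\xi\right), \tag{26}$$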
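respectively. The canonical memories $W_{XY}$ and $M_{XY}$ are also known as morphological memories.

Example 5.1. Let

$$\mathbf{x}^1 = \begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix},\ \mathbf{y}^1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{x}^2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},\ \mathbf{y}^2 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mathbf{x}^3 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},\ \text{and}\ \mathbf{y}^3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

According to Eq. (25), the memories $W_{XY}$ and $M_{XY}$ are given by

$$W_{XY} = \begin{pmatrix} -1 & 1 & -1 \\ -1 & 1 & -1 \end{pmatrix} \wedge \begin{pmatrix} 0 & -1 & -1 \\ 0 & -1 & -1 \end{pmatrix} \wedge \begin{pmatrix} -1 & -2 & 1 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} -1 & -2 & -1 \\ -1 & -1 & -1 \end{pmatrix}.$$

Using the maximum instead of the minimum of the outer products yields the memory

$$M_{XY} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}.$$

It is easily verified that $W_{XY} \vee \mathbf{x}^\xi = \mathbf{y}^\xi = M_{XY} \wedge \mathbf{x}^\xi$ holds for $\xi = 1, 2, 3$. For example,

$$W_{XY} \vee \mathbf{x}^1 = \begin{pmatrix} -1 & -2 & -1 \\ -1 & -1 & -1 \end{pmatrix} \vee \begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \mathbf{y}^1$$

and

$$M_{XY} \wedge \mathbf{x}^2 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \mathbf{y}^2.$$

Using either one of the slightly distorted versions $\tilde{\mathbf{x}} = (2, 0, 1)'$ or $\tilde{\mathbf{x}} = (1, 0, 2)'$ of $\mathbf{x}^1$, $W_{XY} \vee \tilde{\mathbf{x}} = \mathbf{y}^1 = M_{XY} \wedge \tilde{\mathbf{x}}$ are again obtained. However, when using the vector $\tilde{\mathbf{x}} = (1, 0, 1)'$, which could represent a distorted version of either $\mathbf{x}^1$ or $\mathbf{x}^2$, $W_{XY} \vee \tilde{\mathbf{x}} = \mathbf{y}^2$ and $M_{XY} \wedge \tilde{\mathbf{x}} = \mathbf{y}^1$ are obtained.

The above example raises two obvious questions. The first concerns the existence of perfect recall memories. Specifically, for what set of vector pairs $\{(\mathbf{x}^1, \mathbf{y}^1), \ldots, (\mathbf{x}^k, \mathbf{y}^k)\}$ will $W_{XY}$ or $M_{XY}$ provide perfect recall? Once this question has been answered, the next logical question is to inquire as to the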
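amount of distortion or noise $W_{XY}$ or $M_{XY}$ can tolerate for perfect recall; that is, if $\tilde{\mathbf{x}}^\xi$ denotes a distorted version of $\mathbf{x}^\xi$, what are the conditions or bounds on $\tilde{\mathbf{x}}^\xi$ to ensure that $W_{XY} \vee \tilde{\mathbf{x}}^\xi = \mathbf{y}^\xi$ or $M_{XY} \wedge \tilde{\mathbf{x}}^\xi = \mathbf{y}^\xi$? The next set of theorems addresses the first question. The following theorems and their corollaries were established in Ritter et al. (1998).

Theorem 5.1. Let $(X, Y)$ denote the associated sets of pattern vector pairs. Whenever there exist perfect recall memories $A$ and $B$ such that $A \vee \mathbf{x}^\xi = \mathbf{y}^\xi$ and $B \wedge \mathbf{x}^\xi = \mathbf{y}^\xi$ for $\xi = 1, \ldots, k$, then

$$A \le W_{XY} \le M_{XY} \le B \quad \text{and} \quad \forall \xi \quad W_{XY} \vee \mathbf{x}^\xi = \mathbf{y}^\xi = M_{XY} \wedge \mathbf{x}^\xi. \tag{27}$$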
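The construction of the canonical memories in Eqs. (25) and (26), and the recall behavior reported in Example 5.1, can be reproduced with a short sketch. The function names are hypothetical, the pattern values are those read from Example 5.1, and only real-valued patterns are assumed.

```python
def memories(X, Y):
    """Return (W_XY, M_XY): entrywise min and max over xi of y_i^xi - x_j^xi (Eq. 26)."""
    m, n = len(Y[0]), len(X[0])
    pairs = list(zip(X, Y))
    W = [[min(y[i] - x[j] for x, y in pairs) for j in range(n)] for i in range(m)]
    M = [[max(y[i] - x[j] for x, y in pairs) for j in range(n)] for i in range(m)]
    return W, M

def max_recall(W, x):
    """Max product of a memory with a pattern: (W v x)_i = max_j (w_ij + x_j)."""
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def min_recall(M, x):
    """Min product of a memory with a pattern: (M ^ x)_i = min_j (m_ij + x_j)."""
    return [min(m + xj for m, xj in zip(row, x)) for row in M]

# Pattern pairs of Example 5.1.
X = [[2, 0, 2], [0, 1, 1], [1, 2, -1]]
Y = [[1, 1], [0, 0], [0, 1]]
W, M = memories(X, Y)
print(W, M)

for x, y in zip(X, Y):
    print(max_recall(W, x) == y, min_recall(M, x) == y)

# The ambiguous distorted input discussed in the example.
noisy = [1, 0, 1]
print(max_recall(W, noisy), min_recall(M, noisy))
```

Note that recall is a single max or min product, with no iteration and no activation function.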
In this theorem we use the notion that a matrix $A$ is less than or equal to a matrix $B$ of the same dimension, denoted by $A \le B$, and $A$ is strictly less than $B$, denoted by $A < B$, if and only if for each corresponding entry of these matrices we have that $a_{ij} \le b_{ij}$ and $a_{ij} < b_{ij}$, respectively. In this sense, $W_{XY}$ is the least upper bound of all perfect recall memories involving the $\vee$ operation and $M_{XY}$ is the greatest lower bound of all perfect recall memories involving the $\wedge$ operation. Furthermore, if there exist perfect recall memories, then the canonical memories are also perfect recall memories. The next theorem and its corollaries answer the existence question of perfect recall memories.

Theorem 5.2. $W_{XY}$ is a perfect recall memory for the pattern association $(\mathbf{x}^\lambda, \mathbf{y}^\lambda)$ if and only if each row of the matrix $[\mathbf{y}^\lambda + (\mathbf{x}^\lambda)^*] - W_{XY}$ contains a zero entry. Similarly, $M_{XY}$ is a perfect recall memory for the pattern pair $(\mathbf{x}^\lambda, \mathbf{y}^\lambda)$ if and only if each row of the matrix $M_{XY} - [\mathbf{y}^\lambda + (\mathbf{x}^\lambda)^*]$ contains a zero entry.

As an immediate corollary of this theorem we have the following.

Corollary 5.1. $W_{XY}$ is a perfect recall memory for the association $(X, Y)$ if and only if for each $\xi = 1, \ldots, k$, each row of the matrix $[\mathbf{y}^\xi + (\mathbf{x}^\xi)^*] - W_{XY}$ contains a zero entry. Similarly, $M_{XY}$ is a perfect recall memory for $(X, Y)$ if and only if for each $\xi = 1, \ldots, k$, each row of the matrix $M_{XY} - [\mathbf{y}^\xi + (\mathbf{x}^\xi)^*]$ contains a zero entry.

If $X = Y$ (i.e., $\forall \xi$, $\mathbf{x}^\xi = \mathbf{y}^\xi$), then the morphological autoassociative memories $W_{XX}$ and $M_{XX}$ are obtained. Since

$$w_{ii} = \bigwedge_{\xi=1}^{k} \left(x_i^\xi - x_i^\xi\right) = 0 = \bigvee_{\xi=1}^{k} \left(x_i^\xi - x_i^\xi\right) = m_{ii}, \tag{28}$$
the diagonals of the memories $W_{XX}$ and $M_{XX}$ consist entirely of zeros. Hence, as a trivial consequence of Corollary 5.1 we have the following.

Corollary 5.2. $W_{XX} \vee \mathbf{x}^\xi = \mathbf{x}^\xi = M_{XX} \wedge \mathbf{x}^\xi$ for each $\xi = 1, \ldots, k$.

Note that there is no restriction on the dimension or the number of patterns. As a consequence, lattice-based autoassociative memories do not exhibit the well-known weakness of linear correlation matrix encoding that requires orthogonality of the encoded vectors to guarantee perfect recall. The information storage capacity (number of bits that can be stored and recalled associatively) in a lattice-based autoassociative memory exceeds the respective number of the linear matrix associative memories that were calculated by Palm (1980) and Willshaw et al. (1969). The lattice-based autoassociative memories do not restrict the domain of the vectors to be encoded in any way. In the real number case, the capacity of the memory $W_{XX}$ (or $M_{XX}$) can be as large as desired. That is, if $k$ denotes the number of distinct patterns of length $n$ to be encoded, then $k$ is allowed to be any positive integer, no matter how large. In fact, as we shall show in the next section, $W_{XX}$ and $M_{XX}$ actually store an infinite number of patterns for any finite set of autoassociations $(X, X)$. Of course, in the binary case, the limit is $k = 2^n$ as this is the maximum number of distinct binary patterns of length $n$. In comparison, McEliece et al. (1987) showed that the asymptotic limit capacity of the Hopfield associative memory is $n/(2 \log n)$ if with high probability the unique fundamental memory is to be recovered, except for a vanishingly small fraction of fundamental memories. Additionally, unlike the Hopfield network, which is a recurrent neural network that uses an activation function at each step and requires a large number of iterations for convergence, the lattice-based model provides the final result in one pass through the network without any significant amount of training.
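Corollary 5.2 places no bound on the number of stored patterns, which is easy to observe numerically. The sketch below stores many more patterns than there are vector components; the random integer patterns, the seed, and the function names are all illustrative assumptions and not data from the text.

```python
import random

def auto_memories(X):
    """Autoassociative memories: w_ij = min_xi (x_i - x_j), m_ij = max_xi (x_i - x_j)."""
    n = len(X[0])
    W = [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]
    M = [[max(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]
    return W, M

def max_recall(W, x):
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def min_recall(M, x):
    return [min(m + xj for m, xj in zip(row, x)) for row in M]

random.seed(0)
n, k = 3, 50  # 50 patterns of length 3: far more patterns than dimensions
X = [[random.randint(-9, 9) for _ in range(n)] for _ in range(k)]
W, M = auto_memories(X)

zero_diagonal = all(W[i][i] == 0 == M[i][i] for i in range(n))
perfect = all(max_recall(W, x) == x == min_recall(M, x) for x in X)
print(zero_diagonal, perfect)
```

A linear correlation memory of the form of Eq. (23) could not recall 50 non-orthogonal patterns of length 3 in this way without additional filtering.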
Another question relates to the robustness of these memories in the presence of noise. As we shall demonstrate in Section XII, the memories $W_{XX}$ and $M_{XX}$ are extremely robust in the presence of certain types of noise. This robustness is a consequence of the conclusions of the next theorem.

Lemma 5.1. If $i, j, \ell \in \{1, \ldots, n\}$, then $w_{ij} + w_{j\ell} \le w_{i\ell}$ and $m_{ij} + m_{j\ell} \ge m_{i\ell}$.

Proof.

$$w_{ij} = \bigwedge_{\xi=1}^{k} \left(x_i^\xi - x_j^\xi\right) \le x_i^\gamma - x_j^\gamma \quad \forall \gamma = 1, \ldots, k \tag{29}$$

and

$$w_{j\ell} = \bigwedge_{\xi=1}^{k} \left(x_j^\xi - x_\ell^\xi\right) \le x_j^\gamma - x_\ell^\gamma \quad \forall \gamma = 1, \ldots, k. \tag{30}$$

Therefore,

$$w_{ij} + w_{j\ell} \le \left(x_i^\gamma - x_j^\gamma\right) + \left(x_j^\gamma - x_\ell^\gamma\right) = x_i^\gamma - x_\ell^\gamma \quad \forall \gamma = 1, \ldots, k. \tag{31}$$

Hence, $w_{ij} + w_{j\ell} \le \bigwedge_{\xi=1}^{k} (x_i^\xi - x_\ell^\xi) = w_{i\ell}$. The inequality $m_{ij} + m_{j\ell} \ge m_{i\ell}$ is proven in a similar manner.
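Theorem 5.3. $W_{XX}$ is diagonally max dominant and $M_{XX}$ is diagonally min dominant.

Proof. By Lemma 5.1 we have that $w_{j\ell} - w_{i\ell} \le -w_{ij} = w_{jj} - w_{ij}$ for all $\ell = 1, \ldots, n$ and for all $i = 1, \ldots, n$. Therefore, $\bigvee_{\ell=1}^{n} (w_{j\ell} - w_{i\ell}) = w_{jj} - w_{ij}$ for all $i = 1, \ldots, n$, since the bound is attained at $\ell = j$. The proof that $M_{XX}$ is diagonally min dominant is similar.

For heteroassociative memories (i.e., $X \neq Y$) perfect recall is not guaranteed even for uncorrupted input patterns as the conditions required by Corollary 5.1 will in many cases not be satisfied.

Example 5.2. If

$$\mathbf{x}^1 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \mathbf{y}^1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{x}^2 = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix},\ \mathbf{y}^2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{x}^3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\ \mathbf{y}^3 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

is the desired pattern association, then the canonical association memory $W_{XY}$ for $(X, Y) = (\{\mathbf{x}^1, \mathbf{x}^2, \mathbf{x}^3\}, \{\mathbf{y}^1, \mathbf{y}^2, \mathbf{y}^3\})$ is given by

$$W_{XY} = \begin{pmatrix} 0 & -2 & -4 \\ 0 & -1 & -3 \end{pmatrix}.$$

For input $\mathbf{x}^1$ we obtain

$$W_{XY} \vee \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \neq \mathbf{y}^1.$$

Since the matrix

$$\left[\mathbf{y}^1 + (\mathbf{x}^1)^*\right] - W_{XY} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & -2 & -4 \\ 0 & -1 & -3 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 5 \\ 0 & 1 & 3 \end{pmatrix}$$

has no zero entry in row one, the association $(\mathbf{x}^1, \mathbf{y}^1)$ violates the conditions of Theorem 5.2.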
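The row criterion of Theorem 5.2 can be compared against direct recall for the data of Example 5.2. In the sketch below the function names are hypothetical and the pattern values are those read from the example.

```python
def w_memory(X, Y):
    """Canonical memory W_XY with w_ij = min over xi of (y_i^xi - x_j^xi)."""
    m, n = len(Y[0]), len(X[0])
    return [[min(y[i] - x[j] for x, y in zip(X, Y)) for j in range(n)]
            for i in range(m)]

def max_recall(W, x):
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def rows_contain_zero(W, x, y):
    """Theorem 5.2 test: every row of [y + x*] - W_XY contains a zero entry."""
    return all(
        any(yi - xj - wij == 0 for xj, wij in zip(x, row))
        for yi, row in zip(y, W)
    )

# Pattern pairs of Example 5.2.
X = [[0, 0, 0], [0, 2, 4], [1, 1, 1]]
Y = [[1, 0], [0, 1], [1, 1]]
W = w_memory(X, Y)
print(W)

criterion = [rows_contain_zero(W, x, y) for x, y in zip(X, Y)]
recalled = [max_recall(W, x) == y for x, y in zip(X, Y)]
print(criterion, recalled)
```

For these patterns the criterion and the direct recall test agree pair by pair: only the first association fails.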
VI. LATTICE DEPENDENCE AND FIXED POINTS

The goal of this section is to provide an algebraic characterization of the set of fixed points of $W_{XX}$ and $M_{XX}$, where $X = \{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$. Given a transform $T : \mathbb{R}^n \to \mathbb{R}^n$, then $\mathbf{x} \in \mathbb{R}^n$ is called a fixed point of $T$ if and only if $T(\mathbf{x}) = \mathbf{x}$. By Corollary 5.2, $W_{XX} \vee \mathbf{x} = \mathbf{x} = M_{XX} \wedge \mathbf{x}$ for every $\mathbf{x} \in X$. Therefore $X$ is a subset of the fixed point set of $W_{XX}$ and $M_{XX}$. In fact, as the next theorem shows, the transforms $W_{XX}$ and $M_{XX}$ share the same fixed point set.

Theorem 6.1. If $\mathbf{x} \in \mathbb{R}^n$, then $W_{XX} \vee \mathbf{x} = \mathbf{x}$ if and only if $M_{XX} \wedge \mathbf{x} = \mathbf{x}$.
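Proof. Suppose $W_{XX} \vee \mathbf{x} = \mathbf{x}$. It follows that for $i = 1, \ldots, n$, $x_i = (W_{XX} \vee \mathbf{x})_i = \bigvee_{j=1}^{n} (w_{ij} + x_j)$. Hence, $x_i \ge w_{ij} + x_j$ for $j = 1, \ldots, n$. Since this holds for all $i = 1, \ldots, n$, we have that

$$x_i \ge w_{ij} + x_j \quad \forall i, j \in \{1, \ldots, n\}. \tag{32}$$

Now $M_{XX} \wedge \mathbf{x} \le \mathbf{x}$. If equality does not hold, then $\exists j \in \{1, \ldots, n\}$ such that $(M_{XX} \wedge \mathbf{x})_j < x_j$. Hence, $\bigwedge_{k=1}^{n} (m_{jk} + x_k) < x_j$. Also, for some $i \in \{1, \ldots, n\}$ we must have $m_{ji} + x_i = \bigwedge_{k=1}^{n} (m_{jk} + x_k)$ and, therefore, $m_{ji} + x_i < x_j$. Using duality we now obtain

$$x_i - x_j < -m_{ji} = -\bigvee_{\xi=1}^{k} \left(x_j^\xi - x_i^\xi\right) = -\bigvee_{\xi=1}^{k} -\left(x_i^\xi - x_j^\xi\right) = \bigwedge_{\xi=1}^{k} \left(x_i^\xi - x_j^\xi\right) = w_{ij}.$$

Hence, $x_i < w_{ij} + x_j$, which contradicts the inequality (32). Therefore, the equality $M_{XX} \wedge \mathbf{x} = \mathbf{x}$ must hold. The case $M_{XX} \wedge \mathbf{x} = \mathbf{x} \Rightarrow W_{XX} \vee \mathbf{x} = \mathbf{x}$ is proven in an analogous fashion.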
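The duality $w_{ij} = -m_{ji}$ used in the proof, and the conclusion of Theorem 6.1, can be spot-checked on a small grid of integer vectors. This is only a finite numerical check under assumed data (the three input patterns of Example 5.1 reused as an autoassociative set, and a hypothetical test grid), not a substitute for the proof.

```python
from itertools import product

def auto_memories(X):
    """Autoassociative memories W_XX (entrywise min) and M_XX (entrywise max)."""
    n = len(X[0])
    W = [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]
    M = [[max(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]
    return W, M

def max_recall(W, x):
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def min_recall(M, x):
    return [min(m + xj for m, xj in zip(row, x)) for row in M]

X = [[2, 0, 2], [0, 1, 1], [1, 2, -1]]  # assumed pattern set
W, M = auto_memories(X)
n = len(W)

# Duality between the two memories: m_ij = -w_ji.
dual = all(M[i][j] == -W[j][i] for i in range(n) for j in range(n))

# Compare the two fixed-point tests on every integer vector in {-2,...,2}^3.
grid = [list(v) for v in product(range(-2, 3), repeat=n)]
same_fixed_points = all(
    (max_recall(W, v) == v) == (min_recall(M, v) == v) for v in grid
)
n_fixed = sum(max_recall(W, v) == v for v in grid)
print(dual, same_fixed_points, n_fixed)
```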
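Observe that the null vector $\boldsymbol{\omega} \in \mathbb{R}^n_{-\infty}$ is also a fixed point of $W_{XX}$ since $(W_{XX} \vee \boldsymbol{\omega})_i = \bigvee_{j=1}^{n} [w_{ij} + \omega_j] = -\infty = \omega_i$ $\forall i = 1, \ldots, n$. Similarly, the null vector $\boldsymbol{\omega}^*$ of $\mathbb{R}^n_{\infty}$ is also a fixed point. To avoid these special cases and since pattern recognition is concerned with real valued vectors, we restrict our discussion to fixed points $\mathbf{x} \in \mathbb{R}^n$. Hence, for the remainder of our discussion we shall let $F(X) = \{\mathbf{x} \in \mathbb{R}^n : W_{XX} \vee \mathbf{x} = \mathbf{x}\}$ denote the fixed point set of $W_{XX}$ and $M_{XX}$.

Theorem 6.2. If $\mathbf{x}, \mathbf{y} \in F(X)$, then $(a + \mathbf{x}) \in F(X)$, $(a + \mathbf{x}) \vee (b + \mathbf{y}) \in F(X)$, and $(c + \mathbf{x}) \wedge (d + \mathbf{y}) \in F(X)$ $\forall a, b, c, d \in \mathbb{R}$.

Proof. Suppose $\mathbf{x} \in F(X)$ and $a \in \mathbb{R}$. Then for $i = 1, \ldots, n$,

$$\begin{aligned}
\left[W_{XX} \vee (a + \mathbf{x})\right]_i &= \bigvee_{j=1}^{n} \left[w_{ij} + (a + x_j)\right] = \bigvee_{j=1}^{n} \left[a + (w_{ij} + x_j)\right] \\
&= a + \bigvee_{j=1}^{n} (w_{ij} + x_j) && \text{by Eq. (6)} \\
&= a + [W_{XX} \vee \mathbf{x}]_i = a + x_i && \text{since } \mathbf{x} \in F(X).
\end{aligned}$$

Therefore, $a + \mathbf{x} \in F(X)$. Next let $\mathbf{z} = (a + \mathbf{x}) \vee (b + \mathbf{y})$ and $i \in \{1, 2, \ldots, n\}$. Using property P3 and Eq. (6) we obtain

$$\begin{aligned}
(W_{XX} \vee \mathbf{z})_i &= \bigvee_{j=1}^{n} (w_{ij} + z_j) = \bigvee_{j=1}^{n} \left(w_{ij} + \left[(a + x_j) \vee (b + y_j)\right]\right) \\
&= \bigvee_{j=1}^{n} \left\{\left[w_{ij} + (a + x_j)\right] \vee \left[w_{ij} + (b + y_j)\right]\right\} && \text{by P3} \\
&= \left[\bigvee_{j=1}^{n} \left(w_{ij} + (a + x_j)\right)\right] \vee \left[\bigvee_{j=1}^{n} \left(w_{ij} + (b + y_j)\right)\right] \\
&= \left[a + \bigvee_{j=1}^{n} (w_{ij} + x_j)\right] \vee \left[b + \bigvee_{j=1}^{n} (w_{ij} + y_j)\right] && \text{by Eq. (6)} \\
&= \left[a + (W_{XX} \vee \mathbf{x})_i\right] \vee \left[b + (W_{XX} \vee \mathbf{y})_i\right] \\
&= (a + x_i) \vee (b + y_i) = z_i.
\end{aligned}$$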
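Since $i$ was arbitrary, $(W_{XX} \vee \mathbf{z})_i = z_i$ $\forall i = 1, 2, \ldots, n$. Hence, $W_{XX} \vee \mathbf{z} = \mathbf{z}$. An analogous argument shows that $(c + \mathbf{x}) \wedge (d + \mathbf{y})$ is also a fixed point.

It should be obvious that the theorem also holds for $a, b, c, d \in \mathbb{R}_{\pm\infty}$. For instance, if $a = \infty$, then $(a + \mathbf{x}) \vee (b + \mathbf{y})$ is the null vector of $\mathbb{R}^n_{\infty}$, which by our earlier observation is a fixed point of $W_{XX}$. It follows that the theory can be easily extended to include the two null vectors of $(\mathbb{R}^n_{\pm\infty}, \vee, \wedge)$. A direct consequence of Theorem 6.2 is that $[F(X), \vee, \wedge]$ is a sublattice of $(\mathbb{R}^n, \vee, \wedge)$. The next corollary is another consequence of the theorem.

Corollary 6.1. If $\mathbf{x} \in \mathbb{R}^n$, then $\mathbf{x}$ is a fixed point of $W_{XX}$ if and only if $\mathbf{x}$ is lattice dependent on $X$.

Proof. Suppose that $\mathbf{x} = (x_1, \ldots, x_n)'$ is a fixed point of $W_{XX}$. For each $j = 1, \ldots, n$ and $\xi = 1, \ldots, k$, define $a_{\xi j} = (x_j - x_j^\xi)$. Then

$$\begin{aligned}
G(\mathbf{x}^1, \ldots, \mathbf{x}^k) &= \bigvee_{j=1}^{n} \bigwedge_{\xi=1}^{k} \left(a_{\xi j} + \mathbf{x}^\xi\right) = \bigvee_{j=1}^{n} \bigwedge_{\xi=1}^{k} \left[\left(x_j - x_j^\xi\right) + \mathbf{x}^\xi\right] \\
&= \bigvee_{j=1}^{n} \left[x_j + \bigwedge_{\xi=1}^{k} \left(-x_j^\xi + \mathbf{x}^\xi\right)\right] \\
&= \left[x_1 + \bigwedge_{\xi=1}^{k} \left(-x_1^\xi + \mathbf{x}^\xi\right)\right] \vee \cdots \vee \left[x_n + \bigwedge_{\xi=1}^{k} \left(-x_n^\xi + \mathbf{x}^\xi\right)\right] \\
&= \left[x_1 + \bigwedge_{\xi=1}^{k} \begin{pmatrix} x_1^\xi - x_1^\xi \\ x_2^\xi - x_1^\xi \\ \vdots \\ x_n^\xi - x_1^\xi \end{pmatrix}\right] \vee \cdots \vee \left[x_n + \bigwedge_{\xi=1}^{k} \begin{pmatrix} x_1^\xi - x_n^\xi \\ x_2^\xi - x_n^\xi \\ \vdots \\ x_n^\xi - x_n^\xi \end{pmatrix}\right] \\
&= \begin{pmatrix} w_{11} + x_1 \\ w_{21} + x_1 \\ \vdots \\ w_{n1} + x_1 \end{pmatrix} \vee \begin{pmatrix} w_{12} + x_2 \\ w_{22} + x_2 \\ \vdots \\ w_{n2} + x_2 \end{pmatrix} \vee \cdots \vee \begin{pmatrix} w_{1n} + x_n \\ w_{2n} + x_n \\ \vdots \\ w_{nn} + x_n \end{pmatrix} \\
&= W_{XX} \vee \mathbf{x} = \mathbf{x}.
\end{aligned}$$

Thus, $\mathbf{x}$ is lattice dependent on $X$. Conversely, if $\mathbf{x} \in \mathbb{R}^n$ is lattice dependent on $X$, then $\mathbf{x} = G(\mathbf{x}^1, \ldots, \mathbf{x}^k) = \bigvee_{j \in J} \bigwedge_{\xi=1}^{k} (a_{\xi j} + \mathbf{x}^\xi)$. But each $\mathbf{x}^\xi$ is a fixed point of $W_{XX}$. Hence, by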
Theorem 6.2 every finite linear minimax combination is also a fixed point of $W_{XX}$. Therefore $\mathbf{x}$ is a fixed point of $W_{XX}$.

Another easy corollary is the following.

Corollary 6.2. If $X = \{\mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$ and $Y = \{\mathbf{y}^1, \ldots, \mathbf{y}^k\} \subset \mathbb{R}^n$, where $\mathbf{y}^\xi = a_\xi + \mathbf{x}^\xi$ and $a_\xi \in \mathbb{R}$ for $\xi = 1, \ldots, k$, then $X$ is lattice independent if and only if $Y$ is lattice independent. Furthermore, $F(X) = F(Y)$.

Proof. Let $\lambda \in \{1, \ldots, k\}$ be arbitrarily chosen. Set $X_\lambda = X \setminus \{\mathbf{x}^\lambda\}$, $Y_\lambda = Y \setminus \{\mathbf{y}^\lambda\}$, and let $w_{ij}$ and $\tilde{w}_{ij}$ denote the $(i, j)$th entry of $W_{X_\lambda X_\lambda}$ and $W_{Y_\lambda Y_\lambda}$, respectively. Then

$$\tilde{w}_{ij} = \bigwedge_{\xi \neq \lambda} \left(y_i^\xi - y_j^\xi\right) = \bigwedge_{\xi \neq \lambda} \left(a_\xi + x_i^\xi - a_\xi - x_j^\xi\right) = \bigwedge_{\xi \neq \lambda} \left(x_i^\xi - x_j^\xi\right) = w_{ij}$$

for $i, j \in \{1, \ldots, n\}$. Therefore,

$$W_{X_\lambda X_\lambda} = W_{Y_\lambda Y_\lambda}. \tag{33}$$

Now if $X$ is lattice independent, then $\mathbf{x}^\lambda$ is lattice independent of $X_\lambda$. But then $\mathbf{y}^\lambda \notin F(X_\lambda)$. For if $\mathbf{y}^\lambda \in F(X_\lambda)$, then by Theorem 6.2, $-a_\lambda + \mathbf{y}^\lambda = \mathbf{x}^\lambda \in F(X_\lambda)$, which contradicts Corollary 6.1. Thus, $\mathbf{y}^\lambda \neq W_{Y_\lambda Y_\lambda} \vee \mathbf{y}^\lambda = W_{X_\lambda X_\lambda} \vee \mathbf{y}^\lambda$, where the last equality follows from Eq. (33). Hence, $\mathbf{y}^\lambda$ is lattice independent of $Y_\lambda$. Since $\lambda$ was arbitrary, this means that $Y$ is lattice independent. The converse is proven in a likewise fashion. Using an argument identical to the establishment of Eq. (33) proves that $W_{XX} = W_{YY}$. Therefore, $F(X) = F(Y)$.

A consequence of Corollary 6.1 is that $F(X) = LMS_{\mathbb{R}}(\mathbf{x}^1, \ldots, \mathbf{x}^k)$. This establishes the goal of providing an algebraic characterization of the fixed point set of $W_{XX}$ and $M_{XX}$. Our next goal is to provide a geometric characterization of $F(X)$. This characterization requires some knowledge of high-dimensional geometry. The geometry of more than three dimensions is a relatively modern branch of mathematics, with rigorous development going no further back than the first part of the nineteenth century. As this branch of mathematics is seldom a part of the standard curriculum in engineering science, we devote the next two sections to an introduction of several notions from $n$-dimensional geometry. We shall keep the number of definitions and geometric objects as small as possible, but there is a minimum vocabulary that is essential for understanding the geometry of $F(X)$.
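The closure properties of Theorem 6.2 and the shift invariance of Corollary 6.2 can be illustrated numerically. The pattern set, the shifts, and the function names in the sketch below are illustrative assumptions; integer data is used so that equality tests are exact.

```python
def w_auto(X):
    """Autoassociative memory W_XX with w_ij = min over xi of (x_i^xi - x_j^xi)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]

def max_recall(W, x):
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def is_fixed(W, x):
    return max_recall(W, x) == x

X = [[2, 0, 2], [0, 1, 1], [1, 2, -1]]  # assumed pattern set
W = w_auto(X)
x, y = X[0], X[1]

shifted = [7 + xi for xi in x]                          # a + x with a = 7
z = [max(0 + xi, 2 + yi) for xi, yi in zip(x, y)]       # (0 + x) v (2 + y)
u = [min(0 + xi, 1 + yi) for xi, yi in zip(x, y)]       # (0 + x) ^ (1 + y)
print(z, u, is_fixed(W, shifted), is_fixed(W, z), is_fixed(W, u))

# Corollary 6.2: shifting each pattern by its own scalar leaves W_XX unchanged.
shifts = [4, -7, 10]
Y = [[a + xi for xi in pattern] for a, pattern in zip(shifts, X)]
print(w_auto(Y) == W)
```

Here `z` and `u` are not scalar shifts of any stored pattern, yet both are recalled unchanged, which is the sense in which the memory stores infinitely many patterns.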
VII. CONVEX SETS AND POLYTOPES IN $\mathbb{R}^n$

The algebraic classification of $F(X)$ established in the preceding section conveys little information as to what $F(X)$ looks like, nor does it expose explicitly the geometric properties of $F(X)$. The aim of this and the next section is to introduce the main objects that are needed to provide a concise geometric description of $F(X)$. For this reason the two sections consist mostly of definitions and examples.

Given two sets $X, Y \subset \mathbb{R}^n$, the join of $X$ and $Y$ will be denoted by $\langle X, Y \rangle$ and is defined as

$$\langle X, Y \rangle = \left\{\mathbf{z} \in \mathbb{R}^n : \mathbf{z} = \alpha \mathbf{x} + \beta \mathbf{y},\ \mathbf{x} \in X,\ \mathbf{y} \in Y,\ \alpha, \beta \ge 0,\ \alpha + \beta = 1\right\}.$$

A consequence of this definition is that the join operation is commutative and associative, and that

$$\langle X_1, X_2, \ldots, X_n \rangle = \left\{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} = \sum_{i=1}^{n} \lambda_i \mathbf{x}^i,\ \sum_{i=1}^{n} \lambda_i = 1,\ \lambda_i \ge 0,\ \mathbf{x}^i \in X_i\right\},$$

where $\langle X_1, X_2, \ldots, X_n \rangle$ is defined inductively as $\langle \langle X_1, X_2, \ldots, X_{n-1} \rangle, X_n \rangle$. If $X = \{\mathbf{x}\}$ is a one-point set, then we simply write $\langle \mathbf{x}, Y \rangle$ instead of $\langle \{\mathbf{x}\}, Y \rangle$. Similarly, if both $X = \{\mathbf{x}\}$ and $Y = \{\mathbf{y}\}$ are one-point sets, then we let $\langle \mathbf{x}, \mathbf{y} \rangle$ denote the join of $X$ and $Y$. Note that in this case $\langle \mathbf{x}, \mathbf{y} \rangle$ is simply a line segment in $\mathbb{R}^n$.

If $X = \{\mathbf{x}\}$ is a one-point set and $Y \subset \mathbb{R}^n$, then $\langle \mathbf{x}, Y \rangle$ is called a cone with vertex $\mathbf{x}$ and base $Y$ if and only if $\langle \mathbf{x}, \mathbf{y}^1 \rangle \cap \langle \mathbf{x}, \mathbf{y}^2 \rangle = \mathbf{x}$ $\forall \mathbf{y}^1, \mathbf{y}^2 \in Y$ with $\mathbf{y}^1 \neq \mathbf{y}^2$. It follows that the join $\langle \mathbf{x}, Y \rangle$ is not always a cone, but depends on the properties of the set $\langle \mathbf{x}, Y \rangle$. A simplex in $\mathbb{R}^n$ is a special type of cone that requires the notion of affine independence. A set of vectors $X = \{\mathbf{x}^0, \mathbf{x}^1, \ldots, \mathbf{x}^k\} \subset \mathbb{R}^n$ is said to be affinely independent if and only if the set of vectors $\{\mathbf{x}^i - \mathbf{x}^0 : i = 1, \ldots, k\}$ is linearly independent. A $k$-simplex $\sigma^k \subset \mathbb{R}^n$ is defined as the repeated join $\sigma^k = \langle \mathbf{x}^0, \ldots, \mathbf{x}^k \rangle$ of $k + 1$ affinely independent points. Thus, a zero simplex $\sigma^0$ is simply a point; a 1-simplex $\sigma^1$ is a line segment; a 2-simplex $\sigma^2$ is a triangle that can be viewed as a cone with the base a line segment, namely $\langle \mathbf{x}^0, \langle \mathbf{x}^1, \mathbf{x}^2 \rangle \rangle$; a 3-simplex $\sigma^3 = \langle \mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \mathbf{x}^3 \rangle$ is a tetrahedron (Figure 1), and so on. If $\sigma^k = \langle \mathbf{x}^0, \ldots, \mathbf{x}^k \rangle$, then the points $\mathbf{x}^0, \ldots, \mathbf{x}^k$ are called the vertices of $\sigma^k$. The join of any two vertices $\langle \mathbf{x}^i, \mathbf{x}^j \rangle$ forms an edge of $\sigma^k$, while the join $\langle \mathbf{x}^{i_1}, \ldots, \mathbf{x}^{i_m} \rangle$ of any subcollection of $m$ vertices is called an $m$-dimensional face of $\sigma^k$. Also, since the vertices constitute an independent set of points, the repeated join is equivalent to the repeated cone, and $\sigma^k$ cannot be contained in any subspace of dimension less than $k$.

Recall that a set $X \subset \mathbb{R}^n$ is convex if and only if for each pair of points $\mathbf{x}$ and $\mathbf{y}$ in $X$, the line segment $\langle \mathbf{x}, \mathbf{y} \rangle$ is a subset of $X$. For a set $X$ that is not
FIGURE 1. Three polyhedra: (1) a tetrahedron or 3-simplex; (2) a parallelepiped; (3) a prism. The 3-simplex (1) is not a prism; both (2) and (3) are prisms, but (3) is not a parallelepiped.
necessarily convex, the convex hull of $X$, denoted by $C(X)$, is the intersection of all convex sets that contain $X$. Another well-known fact is that $X$ is convex if and only if $X = C(X)$. The convex hull of a finite number of points is called a convex polytope. For example, if $X = \{x^0, \ldots, x^k\} \subset \mathbb{R}^n$ is a set of affinely independent points, then its convex hull $C(X)$ is the $k$-simplex $\langle x^0, \ldots, x^k \rangle$. Hence, simplexes are special types of polytopes. When restricted to $\mathbb{R}^3$ and $\mathbb{R}^2$, a convex polytope is usually referred to as a convex polyhedron and a convex polygon, respectively.

Prisms are a special type of polyhedra. Suppose $P \subset \mathbb{R}^3$ is an $n$-gon. If a line $L \subset \mathbb{R}^3$ traverses the boundary of $P$ without altering its direction, then it describes a prismatic surface. The three-dimensional region bounded by the prismatic surface and containing $P$ is a prismatic beam. Whenever $L$ passes through a vertex of $P$, it becomes an edge of the prismatic beam, and when it passes through an edge of $P$, it becomes a face of the prismatic beam. Conversely, an $n$-gon can be interpreted as the compact region obtained from the intersection of a prismatic beam with a plane that cuts all the edges of the prismatic surface. If a second plane is parallel to the first, then it also intersects the prismatic beam and provides for a congruent $n$-gon. The section of the prismatic beam bounded by the two $n$-gons is called a prism. The prism thus obtained has $3n$ edges. It also follows from this description that a prism is a polyhedron with two congruent polygonal faces whose remaining faces are all parallelograms. Two types of such prisms are shown in Figure 1. A parallelepiped is a prism whose faces are all parallelograms. The volume $V$ of such a prism is given by the formula $V = |a \cdot (b \times c)| = |\det(a, b, c)|$, where the vectors $a$, $b$, and $c$ are as indicated in Figure 1. In $\mathbb{R}^n$, a parallelepiped is a polytope spanned by $n$ vectors $v^1, \ldots, v^n$, with $\operatorname{span}(v^1, \ldots, v^n) = \{\sum_{i=1}^{n} t_i v^i : 0 \le t_i \le 1\}$. The volume is given by $V = |\det(v^1, \ldots, v^n)|$. Parallelepipeds of dimension $n > 3$ are more commonly referred to as parallelotopes.

All of the above concepts generalize to higher dimensions ($n > 3$). A prismatic hypersurface in $\mathbb{R}^n$ results when a line traverses the boundary of an $(n-1)$-dimensional polytope in $\mathbb{R}^n$ without changing its direction. The $n$-dimensional
region bounded by the prismatic hypersurface will again be called a prismatic beam. A section of the prismatic beam between two congruent $(n-1)$-dimensional polyhedra will be a polytope of dimension $n$. Such a polytope results when each of two parallel hyperplanes cuts the prismatic beam in all its edges. If only one hyperplane cuts the prismatic beam in all its edges, then the intersection is an $(n-1)$-dimensional polyhedron. These concepts will be made more exact in the next section, where we introduce the notions of hyperplanes, directions, and parallelism for linear subspaces of $\mathbb{R}^n$.
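The determinant formula for the volume of a parallelotope stated earlier in this section can be checked numerically. The following Python sketch is not part of the original text; the function names and the sample vectors are illustrative assumptions. It evaluates $|\det(v^1, \ldots, v^n)|$ by Gaussian elimination and, for $n = 3$, compares the result with the triple product $|a \cdot (b \times c)|$.

```python
def det(rows):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in r] for r in rows]
    n = len(a)
    d = 1.0
    for c in range(n):
        # choose the row with the largest pivot in column c
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def parallelotope_volume(vectors):
    """Volume |det(v^1, ..., v^n)| of the parallelotope spanned by n vectors in R^n."""
    return abs(det(vectors))


def triple_product(a, b, c):
    """Scalar triple product a . (b x c) in R^3."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(x * y for x, y in zip(a, cross))


# Hypothetical sample vectors, chosen only for illustration.
a, b, c = (2, 0, 0), (1, 3, 0), (0, 1, 4)
volume_det = parallelotope_volume([a, b, c])
volume_triple = abs(triple_product(a, b, c))
```

For these sample vectors both routes give the same volume, as the formula $V = |a \cdot (b \times c)| = |\det(a, b, c)|$ requires.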
VIII. LINEAR SUBSPACES AND ORIENTATION IN $\mathbb{R}^n$

If $x, y \in \mathbb{R}^n$ are distinct points, then the set $L(x, y) = \{z \in \mathbb{R}^n : z = \lambda x + (1 - \lambda)y, \lambda \in \mathbb{R}\}$ is a line passing through $x$ and $y$. A nonempty set $E \subset \mathbb{R}^n$ is called a linear subspace of $\mathbb{R}^n$ if for any two points $x, y \in E$, $L(x, y) \subset E$. According to this definition, linear subspaces are convex. Lines and planes in $\mathbb{R}^n$ are obvious examples of linear subspaces of $\mathbb{R}^n$. A hyperplane is a maximal linear subspace of $\mathbb{R}^n$ in the sense that it is not a proper subset of any linear subspace except for $\mathbb{R}^n$ itself. Linear subspaces should not be confused with vector subspaces of $\mathbb{R}^n$, since a line or plane may not pass through the origin. Nevertheless, there is a close connection between vector subspaces and linear subspaces: A nonempty set $E \subset \mathbb{R}^n$ is a linear subspace of $\mathbb{R}^n$ if and only if for any $y \in E$, $E - y$ is a vector subspace of $\mathbb{R}^n$ (Critescu, 1977). Here $E - y = \{z \in \mathbb{R}^n : z = x - y, x \in E\}$. The set $E - y$ is a shift of $E$ that contains the origin and is parallel to $E$.

There are several linear subspaces of $\mathbb{R}^n$ that play a vital role in describing the set $F(X)$. One obvious set of such subspaces consists of lines of the form $L(x) = \{y \in \mathbb{R}^n : y = a + x, a \in \mathbb{R}\}$, where $x \in \mathbb{R}^n$ is fixed. Another set consists of specific types of hyperplanes. Recall that a hyperplane $E$ in $\mathbb{R}^n$ can also be defined as the set of all points $x \in \mathbb{R}^n$ that satisfy an equation of the form

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b, \tag{34}$$

where the $a_i$'s and $b$ are constants and not all the $a_i$'s are zero. It follows from Eq. (34) that $E$ is an $(n-1)$-dimensional linear subspace of $\mathbb{R}^n$. A typical example is the set of all $x \in \mathbb{R}^n$ satisfying $x_n = 0$. In this case $a_1 = a_2 = \cdots = a_{n-1} = 0 = b$ and $a_n = 1$.

It will be convenient to associate an orientation with a given hyperplane. For this purpose we will let $e^1, \ldots, e^n$ denote the canonical orthonormal basis for $\mathbb{R}^n$; that is, each $e^j$ is defined by

$$e^j_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$
Directions in $\mathbb{R}^n$ will be specified in terms of unit vectors emanating from the origin $0$ with endpoints lying on the $(n-1)$-dimensional unit sphere

$$S^{n-1} = \Big\{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i^2 = 1\Big\}.$$

Thus, a direction $v$ is uniquely determined by a system of directional cosines that determines the coordinates of $v$ on $S^{n-1}$. This system is given by $\cos\theta_i = e^i \cdot v = v_i$ with $\sum_{i=1}^{n} \cos^2\theta_i = 1$. An oriented hyperplane $E(v)$ is simply a hyperplane $E$ with an associated directional unit vector $v$ that is normal (perpendicular) to $E$. We note that since the vector $-v$ points in the opposite direction of $v$ (i.e., is the antipode of $v$ on $S^{n-1}$), each hyperplane can be endowed with one of two possible directions. Given two oriented hyperplanes $E_1(v)$ and $E_2(w)$, $E_1$ and $E_2$ are said to be parallel whenever $v = \pm w$ or, equivalently, whenever $v \cdot w = \pm 1$. If $v \cdot w \ne \pm 1$, then $E_1 \cap E_2$ is a linear subspace of dimension $n - 2$. A special case occurs when $v \cdot w = 0$. In this case $E_1$ and $E_2$ are said to be perpendicular.

An oriented hyperplane $E(v)$ separates $\mathbb{R}^n$ into two open half-spaces $H^+(v)$ and $H^-(v)$ that are bounded by $E(v)$. If $E$ is given by Eq. (34), then $E$ can also be expressed in terms of the function

$$f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n - b = 0. \tag{35}$$
We shall use the convention of identifying $H^+(v)$ and $H^-(v)$ with the half-spaces $\{x \in \mathbb{R}^n : f(x) > 0\}$ and $\{x \in \mathbb{R}^n : f(x) < 0\}$, respectively. The closure of $H^+(v)$ is the set $\bar{H}^+(v) = \{x \in \mathbb{R}^n : f(x) \ge 0\}$. Similarly, $\bar{H}^-(v) = \{x \in \mathbb{R}^n : f(x) \le 0\}$. Therefore, $\bar{H}^+(v) \cap \bar{H}^-(v) = E(v)$. To reduce the notational burden we shall use $E$ instead of $E(v)$ and $H^+$ and $H^-$ instead of $H^+(v)$ and $H^-(v)$, respectively, whenever the associated orientation $v$ is obvious from the discussion.

Closed half-spaces play a key role in specifying convex polytopes. This is due to the fact that a bounded set $X$ is a convex polytope if and only if it is the nonempty intersection of a finite number of closed half-spaces (Hadwiger, 1957; Eggleston, 1963; Valentine, 1964). As a consequence, an $n$-dimensional polytope may be explicitly specified as the set of solutions to a system of linear inequalities $Mx \le b$ that provides for the position of the vertices. Here $M$ is a real-valued $m \times n$ matrix and $b$ a real-valued $m$-vector.

Suppose $X$ is a convex subset of $\mathbb{R}^n$. An oriented hyperplane $E(v)$ is said to cut $X$ if and only if $X \cap H^+(v) \ne \emptyset \ne X \cap H^-(v)$. If $E(v)$ intersects the closure of $X$ but does not cut $X$, then $E(v)$ is said to be a support hyperplane of $X$. A point $x$ is said to be an extreme point of the convex set $X$ if and only if $x \in X$ and there are no two points $y, z \in X$ with $y \ne x \ne z$ such that
$x \in \langle y, z \rangle$. In particular, if $X$ is a convex polytope, then $x \in X$ is an extreme point if and only if $x$ is a vertex of $X$. The following example will aid in illuminating this concept.

Example 8.1. Let $B^n$ denote the $n$-dimensional unit ball defined by $B^n = \{x \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 \le 1\}$. Denoting the boundary of $B^n$ by $\partial B^n$, we have $\partial B^n = S^{n-1}$. If $E_1(v)$ and $E_2(v)$ are the two hyperplanes tangent to $S^{n-1}$ at the points $v$ and $-v$, respectively, then $E_1(v)$ and $E_2(v)$ are support hyperplanes of $B^n$. The two points $v$ and $-v$ are extreme points of $B^n$. Figure 2 illustrates this concept for $n = 2$. It is also easy to ascertain that $B^n \subset \bar{H}_1^+(v) \cap \bar{H}_2^-(v)$ and $B^n = \bigcap_{v \in S^{n-1}} [\bar{H}_1^+(v) \cap \bar{H}_2^-(v)]$.

Example 8.1 illustrates the fact that the convex set $B^n$ is completely specified in terms of its support hyperplanes. More generally we have that if a closed set $X$ has a nonempty interior, and if at every boundary point of $X$ there exists a support hyperplane for $X$, then $X$ is convex (Hadwiger, 1957; Eggleston, 1963).

In addition to the canonical basis vectors, the directional vectors pertinent to the results presented in this article are the vectors $e$ and $v(i, j)$ defined by

$$e = \frac{e^1 + \cdots + e^n}{\|e^1 + \cdots + e^n\|} \quad \text{and} \quad v(i, j) = \frac{e^i - e^j}{\|e^i - e^j\|}, \tag{36}$$

where $i < j$, $1 \le i < n$, and $1 < j \le n$.
FIGURE 2. For $n = 2$, the $(n-1)$-dimensional hyperplanes $E_1(v)$ and $E_2(v)$ are the lines given by the equations $x_2 = x_1 - 1$ and $x_2 = x_1 + 1$, respectively, where the orientation $v$ is given by the vector $v = (1/\sqrt{2}, -1/\sqrt{2})$. The lightly shaded area between the lines corresponds to $H_1^+(v) \cap H_2^-(v)$, while $B^2$ is the darker shaded disk. Note that every point of $\partial B^2$ is an extreme point of $B^2$.
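In view of the definition of $v(i, j)$ in Eq. (36) we have that for $l = 1, \ldots, n$,

$$v_l(i, j) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } l = i \\[4pt] \dfrac{-1}{\sqrt{2}} & \text{if } l = j \\[4pt] 0 & \text{if } i \ne l \ne j \end{cases}$$

and, hence,

$$v(i, j) \cdot v(i, j) = 1,$$
$$v(i, j) \cdot v(i, s) = \tfrac{1}{2} \quad \text{if } j \ne s,$$
$$v(i, j) \cdot v(j, s) = -\tfrac{1}{2} \quad \text{since } j < s,$$
$$v(i, j) \cdot v(r, s) = 0 \quad \text{if } \{i, j\} \cap \{r, s\} = \emptyset.$$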
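The inner products listed above are easy to confirm numerically. The short Python sketch below is an illustration only and not part of the original text; the helper names `v` and `dot` and the choice $n = 4$ are assumptions made for the example. It builds $v(i, j)$ directly from Eq. (36) using 1-based indices, as in the text.

```python
import math


def v(i, j, n):
    """Unit vector (e^i - e^j) / ||e^i - e^j|| in R^n for 1 <= i < j <= n (1-based)."""
    if not 1 <= i < j <= n:
        raise ValueError("indices must satisfy 1 <= i < j <= n")
    s = 1.0 / math.sqrt(2.0)  # ||e^i - e^j|| = sqrt(2)
    out = [0.0] * n
    out[i - 1] = s
    out[j - 1] = -s
    return out


def dot(a, b):
    """Euclidean inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


n = 4  # illustrative dimension
same = dot(v(1, 2, n), v(1, 2, n))        # identical index pairs
shared_first = dot(v(1, 2, n), v(1, 3, n))  # common first index, j != s
chained = dot(v(1, 2, n), v(2, 3, n))     # second index of one equals first of the other
disjoint = dot(v(1, 2, n), v(3, 4, n))    # disjoint index sets
```

Up to floating-point rounding, the four values agree with $1$, $\tfrac12$, $-\tfrac12$, and $0$, respectively.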
The next theorem follows from the above listed properties of the directional vectors $v(i, j)$.

Theorem 8.1. Let $E_1 = E[v(i, j)]$ and $E_2 = E[v(r, s)]$ be two oriented hyperplanes in $\mathbb{R}^n$. If $(i, j) \ne (r, s)$, then $E_1 \cap E_2$ is an $(n-2)$-dimensional linear subspace of $\mathbb{R}^n$. Furthermore, if $\{i, j\} \cap \{r, s\} = \emptyset$, then $E_1$ and $E_2$ are perpendicular, and if $(i, j) = (r, s)$, then $E_1$ and $E_2$ are parallel.

The next four properties are relevant to our discussion. These properties are a direct consequence of the aforementioned definitions and Theorem 8.1.
7.1. $E[v(i, j)]$ is parallel to $E_0[v(i, j)] = \{x \in \mathbb{R}^n : x_i - x_j = 0\}$.
7.2. There exist $\sum_{i=1}^{n-1} i = [n(n-1)]/2$ distinct hyperplanes of type $E_0[v(i, j)]$ in $\mathbb{R}^n$.
7.3. Each $E_0[v(i, j)]$ contains the line $L(0)$ and $\bigcap_{i=1}^{n-1} E_0[v(i, i+1)] = L(0)$.
7.4. If $l \notin \{i, j\}$ and $r \in \mathbb{R}$, then $E_0[v(i, j)]$ is perpendicular to the hyperplane $E_r(e^l) = \{x \in \mathbb{R}^n : x_l = r\}$ since $v(i, j) \cdot e^l = 0$.

The above listed properties can be readily visualized in lower dimensions ($n \le 3$). For example, if $n = 3$, then the plane $E_0[v(1, 2)] = \{x \in \mathbb{R}^3 : x_1 - x_2 = 0\}$ is perpendicular to the plane $E_0(e^3) = \{x \in \mathbb{R}^3 : x_3 = 0\}$ and intersects $E_0(e^3)$ in the line $\{x \in \mathbb{R}^3 : x_1 = x_2, x_3 = 0\}$ (see also Figure 3). Furthermore, $E_0[v(1, 2)] \cap E_0[v(2, 3)] = \{x \in \mathbb{R}^3 : x_1 = x_2 = x_3\} = L(0)$. Property 7.1 implies that $E[v(i, j)] = \{x \in \mathbb{R}^n : x_j = x_i + b\}$ for some constant $b$ and, hence, $E[v(i, j)] \cap E_0(e^j) = \{x \in E_0(e^j) : x_i + b = 0\}$.

For $l \in \{1, \ldots, n\}$, we define $\Lambda(l) \equiv \{1, \ldots, n\} \setminus \{l\}$. If $x \in E_r(e^l)$ is a given point, then for each $i \in \Lambda(l)$ there exists an $(n-2)$-dimensional linear subspace $S_r(i, l, x)$ defined by $S_r(i, l, x) = \{y \in E_r(e^l) : y_i = x_i\}$. This
FIGURE 3. The $(x_1, x_2)$-plane corresponds to the surface $E_0(e^3)$ containing the point $x$. The surfaces $E[v(1, 3)]$, $E[v(2, 3)]$, and $E[v(1, 2)]$ are all in general position with respect to $E_0(e^3)$ but only $E[v(1, 2)]$ is perpendicular to $E_0(e^3)$. The intersections $E[v(1, 3)] \cap E_0(e^3) = S_0(1, 3, x)$ and $E[v(2, 3)] \cap E_0(e^3) = S_0(2, 3, x)$ are as indicated. Note also that $E[v(1, 3)] \cap E[v(2, 3)] \cap E[v(1, 2)] = L(x)$.
subspace is the intersection of $E_r(e^l)$ with a hyperplane of type $E[v(i, l)]$ if $i < l$, or of type $E[v(l, i)]$ whenever $l < i$. Specifically, if $1 \le i < l \le n$, then the hyperplane $E[v(i, l)] = \{y \in \mathbb{R}^n : y_i + b = y_l\}$, where $b = r - x_i$, satisfies the equation

$$E_r(e^l) \cap E[v(i, l)] = \{y \in E_r(e^l) : y_i + b = r\} = S_r(i, l, x). \tag{37}$$

Similarly, for $1 \le l < i \le n$ we obtain

$$E_r(e^l) \cap E[v(l, i)] = \{y \in E_r(e^l) : y_i + b = r\} = S_r(i, l, x). \tag{38}$$
Figure 3 provides an example of these surfaces and their intersections when the dimension is $n = 3$. In this figure, $x \in E_0(e^3)$. To reduce notation, we define

$$E(i, l, x) = \begin{cases} E[v(i, l)] & \text{whenever } 1 \le i < l \le n \\ E[v(l, i)] & \text{whenever } 1 \le l < i \le n. \end{cases} \tag{39}$$

If $x, y \in \mathbb{R}^{n-1}$ and $x \le y$, then the hyperbox determined by $x$ and $y$ is the set $\{z \in \mathbb{R}^{n-1} : x_i \le z_i \le y_i, i = 1, \ldots, n-1\}$. Thus, the hyperbox is convex, and if $x < y$, then the hyperbox is an $(n-1)$-dimensional parallelotope. Suppose $x, y \in E_0(e^l)$, $x \le y$, and $B_l(x, y)$ denotes the hyperbox determined by $x$ and $y$. For each $i \in \Lambda(l)$, $S_0(i, l, x)$ separates $E_0(e^l)$ into two open regions. Assuming that $x \ne y$, there exists at least one $i$ such that $x_i < y_i$.
FIGURE 4. The hyperbox (rectangle) $B_3$ determined by $x$ and $y$. Here $B_3$ is shown as a subset of $E_0(e^3)$, bounded by the lines $S_0(i, 3, x)$ and $S_0(i, 3, y)$, where $i \in \{1, 2\}$.
For each such $i$, let $U(i, y)$ denote the open region determined by $S_0(i, l, x)$ that contains $y$ and let $H(i, y)$ denote the closure of $U(i, y)$. If for some $i$ neither of the open regions determined by $S_0(i, l, x)$ contains $y$, then $y_i = x_i$. In this case choose one of the two open regions arbitrarily and let $H(i, y)$ denote its closure. Similarly let $H(i, x)$ be the closure of the open region determined by $S_0(i, l, y)$ that contains $x$. With these definitions we have that

$$B_l(x, y) = \bigcap_{i \in \Lambda(l)} [H(i, x) \cap H(i, y)].$$

Thus, $B_l(x, y)$ corresponds to the compact region bounded by the surfaces $\bigcup_{i \in \Lambda(l)} [S_0(i, l, x) \cup S_0(i, l, y)]$. Figure 4 provides an example of a hyperbox $B_3(x, y)$ in $E_0(e^3) \subset \mathbb{R}^3$.

Orientations for hyperplanes are given in terms of a unique (up to inverse) orthogonal vector. For lower-dimensional linear subspaces, directions are assigned in terms of a set of vectors. A $k$-dimensional linear subspace $E^k \subset \mathbb{R}^n$ will also be referred to as a $k$-dimensional plane. For $0 < k \le n - 1$ we say that a directional vector $v \in S^{n-1}$ is a parallel direction for $E^k$, or a parallel direction affiliated with $E^k$, if for every point $x \in E^k$, $(x + v) \in E^k$. The set of parallel directions that can be affiliated with $E^k$ induces a $(k-1)$-dimensional subsphere $S$ on $S^{n-1}$ as illustrated in Example 8.2. This set is empty in case $k = 0$. A zero-dimensional linear subspace $E^0$ is simply a point, which has no direction. Two planes $E^k$ and $E^l$ ($k \le l$) are called parallel if every parallel direction affiliated with $E^k$ is also a parallel direction affiliated with $E^l$. Equivalently, if $S = S^{k-1} \cap S^{l-1}$, where $S^{k-1}$ and $S^{l-1}$ denote the subspheres of $S^{n-1}$ induced by the parallel directions affiliated with $E^k$ and $E^l$, respectively, then $E^k$ and $E^l$ are parallel if and only if $S = S^{k-1}$. Hence for parallel linear subspaces, $S^{k-1} \subset S^{l-1}$. Obviously, all lines of the form $L(x) = \{y \in \mathbb{R}^n : y = a + x, a \in \mathbb{R}\}$ are parallel
FIGURE 5. The subsphere $S \subset S^2$ induced by the parallel directions affiliated with $E(v)$. Note that the vector $u = (1/\sqrt{2}, 1/\sqrt{2}, 0) \in S$ is a parallel direction affiliated with $E(v)$ and $v$ is orthogonal to $u$ as shown.
one-dimensional linear subspaces with affiliated parallel directions $e$ and $-e$. If every parallel direction of $E^k$ is orthogonal to every parallel direction affiliated with $E^l$, then $E^k$ and $E^l$ are said to be orthogonal. Note that the definition of orthogonality given here is not synonymous with the notion of perpendicularity. For example, if $E_1$ and $E_2$ are two hyperplanes that are not parallel, then $E_1 \cap E_2$ is an $(n-2)$-dimensional linear subspace that is parallel to both $E_1$ and $E_2$. Therefore, within the confines of the definitions of perpendicular and orthogonal given here, two hyperplanes can be perpendicular but never orthogonal. In general, however, we have that if $E^k$ and $E^l$ are two planes with $k \le l$, then $E^k \cap E^l$ is either empty or a linear space of dimension less than or equal to $k$.

Example 8.2. Let $n = 3$, $k = 2$, and $v = v(1, 2)$. The sphere $S \subset S^2$ induced by the set of parallel directions affiliated with $E(v)$ is given by $S = \{x \in \mathbb{R}^3 : x_1 = x_2 \text{ and } x_1^2 + x_2^2 + x_3^2 = 1\}$ as shown in Figure 5. The vector $v$ is orthogonal to every vector emanating from the origin $0$ and having endpoint on $S$. If $E(v)$ passes through the origin, then $E(v) = \{x \in \mathbb{R}^3 : x_1 = x_2\}$ and $E(v) \cap S^2 = S$. If $a, b \in \mathbb{R}$, then every line of the form $L = \{x \in \mathbb{R}^3 : x_2 = a - x_1, x_3 = b\}$ is a one-dimensional subspace orthogonal to $E(v)$. The planes $E(v)$ and $E_0(e^3)$ are perpendicular but not orthogonal linear subspaces of $\mathbb{R}^3$.
IX. THE SHAPE OF $F(X)$

Suppose that $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is lattice independent and $n > 1$. If $k = 1$, then $F(X) = L(x^1)$ and $F(X)$ is simply a line with one affiliated parallel direction, namely $e$ if we ignore the opposite direction $-e$. Thus,
unless otherwise stated, we assume that $k > 1$. Because of the common parallel direction $e$ affiliated with each line $L(x)$, where $x \in \mathbb{R}^n$, we have that $L(x) \cap E_0(e^l) \ne \emptyset$. We denote the point of intersection by $x(l)$. That is, $\{x(l)\} = L(x) \cap E_0(e^l)$. Now if $x \in F(X)$, then by Theorem 6.2, $L(x) \subset F(X)$. Hence $E_0(e^l)$ separates $F(X)$. By setting $F_l(X) = F(X) \cap E_0(e^l)$, we have

$$F_l(X) = \{y \in E_0(e^l) : \exists x \in F(X) \text{ with } x(l) = y\}. \tag{40}$$

Since $x(l) = a + x$ for some $a \in \mathbb{R}$, it follows that

$$F(X) = \bigcup_{a \in \mathbb{R}} [a + F_l(X)], \tag{41}$$

where $a + F_l(X) = \{y \in \mathbb{R}^n : y = a + x, x \in F_l(X)\}$. According to Eq. (41), $F(X)$ is simply the union of all translates of $F_l(X)$ in the direction parallel to $e$. Therefore, by knowing the shape of $F_l(X)$, we also know the shape of $F(X)$.

Theorem 9.1. $F_l(X)$ is bounded.
Proof. Let $u^l = \bigvee_{\xi=1}^{k} x^\xi(l)$, $m^l = \bigwedge_{\xi=1}^{k} x^\xi(l)$, and let $B_l$ denote the hyperbox determined by $u^l$ and $m^l$. That is, $B_l = B_l(m^l, u^l)$. Figure 4 provides an illustration of this box for the case $X \subset \mathbb{R}^3$ and $l = 3$.

Let $X(l) = \{x^1(l), \ldots, x^k(l)\}$ and for $\xi = 1, \ldots, k$ define $a_\xi = -x^\xi_l$ so that $x^\xi(l) = a_\xi + x^\xi$. Thus, if $w(l)_{ij}$ and $m(l)_{ij}$ denote the $(i, j)$th entry of $W_{X(l)X(l)}$ and $M_{X(l)X(l)}$, respectively, then

$$w_{ij} = \bigwedge_{\xi=1}^{k} (x_i^\xi - x_j^\xi) = \bigwedge_{\xi=1}^{k} [(a_\xi + x_i^\xi) - (a_\xi + x_j^\xi)] = \bigwedge_{\xi=1}^{k} [x_i^\xi(l) - x_j^\xi(l)] = w(l)_{ij}$$

and

$$m_{ij} = \bigvee_{\xi=1}^{k} (x_i^\xi - x_j^\xi) = \bigvee_{\xi=1}^{k} [(a_\xi + x_i^\xi) - (a_\xi + x_j^\xi)] = \bigvee_{\xi=1}^{k} [x_i^\xi(l) - x_j^\xi(l)] = m(l)_{ij}.$$

Therefore, $W_{XX} = W_{X(l)X(l)}$ and $M_{XX} = M_{X(l)X(l)}$. Now, if $x \in F_l(X)$, then for $i = 1, \ldots, n$

$$x_i = [W_{XX} \vee x]_i = [W_{X(l)X(l)} \vee x]_i = \bigvee_{j=1}^{n} [w(l)_{ij} + x_j] \ge w(l)_{il} + x_l = \bigwedge_{\xi=1}^{k} [x_i^\xi(l) - x_l^\xi(l)] + x_l = \bigwedge_{\xi=1}^{k} x_i^\xi(l),$$

since $x_l = x_l^\xi(l) = 0$. Similarly,

$$x_i = [M_{XX} \wedge x]_i = [M_{X(l)X(l)} \wedge x]_i = \bigwedge_{j=1}^{n} [m(l)_{ij} + x_j] \le m(l)_{il} + x_l = \bigvee_{\xi=1}^{k} [x_i^\xi(l) - x_l^\xi(l)] + x_l = \bigvee_{\xi=1}^{k} x_i^\xi(l).$$

It follows that $\bigwedge_{\xi=1}^{k} x^\xi(l) \le x \le \bigvee_{\xi=1}^{k} x^\xi(l)$ and, therefore, $x \in B_l$ and $F_l(X) \subset B_l$.
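The two facts used in the proof, namely that $W_{XX}$ and $M_{XX}$ are unchanged when every pattern is shifted onto $E_0(e^l)$ and that a fixed point in $E_0(e^l)$ lies between $m^l$ and $u^l$, can be illustrated with a short Python sketch. The code is not part of the original text: the function names are hypothetical, indices are 0-based, and the two sample vectors are the ones that appear in Example 9.1.

```python
def anchor(x, l):
    """Return x(l) = x - x_l, the point where the line L(x) meets E_0(e^l) (l is 0-based)."""
    return [xi - x[l] for xi in x]


def w_matrix(X):
    """Entries w_ij = min over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]


def m_matrix(X):
    """Entries m_ij = max over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[max(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]


def max_product(W, x):
    """[W v x]_i = max_j (w_ij + x_j)."""
    return [max(W[i][j] + x[j] for j in range(len(x))) for i in range(len(x))]


def min_product(M, x):
    """[M ^ x]_i = min_j (m_ij + x_j)."""
    return [min(M[i][j] + x[j] for j in range(len(x))) for i in range(len(x))]


X = [[8, 8, 7], [9, 8, 5]]           # the two vectors of Example 9.1
l = 2                                 # third coordinate, 0-based
Xl = [anchor(x, l) for x in X]        # the shifted set X(l)
m_l = [min(col) for col in zip(*Xl)]  # componentwise minimum m^l
u_l = [max(col) for col in zip(*Xl)]  # componentwise maximum u^l

W, M = w_matrix(X), m_matrix(X)
same_W = (W == w_matrix(Xl))          # W_XX = W_X(l)X(l)
same_M = (M == m_matrix(Xl))          # M_XX = M_X(l)X(l)

x = [2, 1.5, 0]                       # a sample point of E_0(e^3), chosen for illustration
is_fixed = max_product(W, x) == x and min_product(M, x) == x
in_box = all(lo <= xi <= hi for lo, xi, hi in zip(m_l, x, u_l))
```

For this sample point both `is_fixed` and `in_box` hold, in agreement with $F_l(X) \subset B_l$.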
It is obvious from the proof that $F_l(X)$ is bounded in $E_r(e^l)$ for any $r \in \mathbb{R}$ and not just $r = 0$. The hyperbox $B_n$ determined by $u^n = \bigvee_{\xi=1}^{k} x^\xi(n)$ and $m^n = \bigwedge_{\xi=1}^{k} x^\xi(n)$ is bounded by the $(n-2)$-dimensional planes $S_0(i, n, m^n)$ and $S_0(i, n, u^n)$, where $i = 1, \ldots, n-1$. If the dimension $n = 3$, then the hyperbox $B_3$ reduces to a rectangle or a line segment as shown in Figure 4. To obtain further notational reduction we set $S_0(i, m^n) = S_0(i, n, m^n)$, where $i = 1, \ldots, n-1$, and denote the hyperplane $E[v(i, n)]$ containing the point $m^n$ by $E(i, m^n)$. Thus, $S_0(i, m^n) = E(i, m^n) \cap E_0(e^n)$. Similarly, we have $S_0(i, u^n) = E(i, u^n) \cap E_0(e^n)$, where $E(i, u^n)$ corresponds to the oriented hyperplane $E[v(i, n)]$ containing the point $u^n$. A consequence of these definitions and Theorem 9.1 is that the hyperplanes $E(i, m^n)$ and $E(i, u^n)$ are support hyperplanes of $F(X)$. In particular, if $H_X(i, m^n)$ and $H_X(i, u^n)$ denote the closed half-spaces of $\mathbb{R}^n$ determined by $E(i, m^n)$ and $E(i, u^n)$, respectively, that contain $F(X)$, then

$$F(X) \subset \bigcup_{a \in \mathbb{R}} (a + B_n) = \bigcap_{i=1}^{n-1} [H_X(i, u^n) \cap H_X(i, m^n)]. \tag{42}$$

As the next example shows, generally $F(X) \ne \bigcup_{a \in \mathbb{R}} (a + B_n)$.
Example 9.1. If $x^1 = (8, 8, 7)$ and $x^2 = (9, 8, 5)$, then $x^1(3) = (1, 1, 0)$, $x^2(3) = (4, 3, 0)$, $u^3 = x^2(3)$, and $m^3 = x^1(3)$. The hyperbox $B_3$, determined by $u^3$ and $m^3$, is given by $B_3 = \{x \in \mathbb{R}^3 : 1 \le x_1 \le 4, 1 \le x_2 \le 3, x_3 = 0\}$. The point $x \in B_3$ given by $x = (2, 2.5, 0)$ is not a fixed point of

$$W_{XX} = \begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 1 \\ -4 & -3 & 0 \end{pmatrix}$$

since $W_{XX} \vee x = (2.5, 2.5, 0)$. Therefore, $F(X) \ne \bigcup_{a \in \mathbb{R}} (a + B_3)$.

The reason that Eq. (42) does not provide for an equality is that not all possible half-spaces containing $F(X)$ are included in the intersection specified by the equation. The remaining half-spaces needed to obtain equality are derived from support hyperplanes that are perpendicular to $E_0(e^n)$. Note that according to Theorem 8.1, the hyperplanes $E(i, u^n)$ and $E(i, m^n)$ all intersect $E_0(e^n)$ but are not perpendicular to $E_0(e^n)$. For $l \ne n$, let $E(i, u^l)$ and $E(i, m^l)$ denote the hyperplanes containing $u^l$ and $m^l$, respectively, as defined by Eq. (39); i.e., $E(i, u^l) = E(i, l, u^l)$ and $E(i, m^l) = E(i, l, m^l)$. Now, if $i = n$, then $E(i, u^l) = E[v(l, n)]$ and, hence, corresponds to either $E(l, m^n)$ or $E(l, u^n)$ as it is a support hyperplane of $F(X)$ parallel to both of these planes. If $i \ne n$, then by Property 7.4 both of the planes $E(i, u^l)$ and $E(i, m^l)$ are perpendicular to $E_0(e^n)$. It is these additional hyperplanes that are needed to completely specify the shape of $F(X)$. This situation can be nicely illustrated in low dimensions as shown in Figure 6 of Example 10.1. In this example $n = 3$, $X = \{x^1, x^2\}$, and Figure 6 shows the intersection of $E_0(e^3)$ with all the pertinent planes $E(i, u^l)$ and $E(i, m^l)$ for $i, l \in \{1, 2, 3\}$ with $i \ne l$.

Let $H_X(i, u^l)$ and $H_X(i, m^l)$ denote the half-spaces containing $F(X)$ determined by the support hyperplanes $E(i, u^l)$ and $E(i, m^l)$, respectively. Thus, if $B$ denotes the set

$$B = \bigcap_{l=1}^{n} \Big[\bigcup_{a \in \mathbb{R}} (a + B_l)\Big] \cap E_0(e^n), \tag{43}$$

then

$$B = \bigcap_{l=1}^{n} \bigcap_{i \in \Lambda(l)} [H_X(i, u^l) \cap H_X(i, m^l)] \cap E_0(e^n). \tag{44}$$

A direct consequence of the next theorem is that $F(X) = \bigcup_{a \in \mathbb{R}} (a + B)$.

Theorem 9.2. $F_n(X) = B$.
Proof. Suppose $y \in F_n(X)$. If $\alpha = -y_l$, where $y_l$ denotes the $l$th coordinate of $y$, then $\alpha + y \in F(X) \cap E_0(e^l) = F_l(X)$. By Theorem 9.1, $\alpha + y \in B_l$ and, therefore, $L(y) \subset \bigcup_{a \in \mathbb{R}} (a + B_l)$ $\forall l = 1, \ldots, n$, which means that $L(y) \subset \bigcap_{l=1}^{n} [\bigcup_{a \in \mathbb{R}} (a + B_l)]$. Hence,

$$\{y\} = L(y) \cap E_0(e^n) \subset \bigcap_{l=1}^{n} \Big[\bigcup_{a \in \mathbb{R}} (a + B_l)\Big] \cap E_0(e^n).$$

Thus, $y \in B$. This shows that $F_n(X) \subset B$.

Next, suppose $y \in B$. Set $y(l) = -y_l + y$, $v^l = y_l + m^l$, and $w^l = y_l + u^l$. Since $y \in B$ and $y(l) \in E_0(e^l)$, $m^l \le y(l) \le u^l$ with $m^l_l = y_l(l) = u^l_l = 0$ for $l = 1, \ldots, n$. Also, $v^l = y_l + m^l \le y_l + y(l) \le y_l + u^l = w^l$. Thus, since $y_l + y(l) = y$, we now have

$$v^l \le y \le w^l \quad \forall l = 1, \ldots, n. \tag{45}$$

If $v = \bigvee_{l=1}^{n} v^l$ and $w = \bigwedge_{l=1}^{n} w^l$, then it follows from Eq. (45) that $v \le y \le w$ and, therefore, $v_i \le y_i \le w_i$ for $i = 1, \ldots, n$. But $v^i_i = y_i + m^i_i = y_i = y_i + u^i_i = w^i_i$ and, hence, $v_i = \bigvee_{l=1}^{n} v^l_i = v^i_i = y_i = w^i_i = \bigwedge_{l=1}^{n} w^l_i = w_i$ $\forall i$. Therefore, $v = y = w$.

Now if we set $\alpha_{\xi,l} = y_l - x^\xi_l$, then

$$v = \bigvee_{l=1}^{n} v^l = \bigvee_{l=1}^{n} (y_l + m^l) = \bigvee_{l=1}^{n} \Big[y_l + \bigwedge_{\xi=1}^{k} x^\xi(l)\Big] = \bigvee_{l=1}^{n} \bigwedge_{\xi=1}^{k} (\alpha_{\xi,l} + x^\xi). \tag{46}$$

Equation (46) shows that $v$ and, hence, $y$ is a linear minimax sum over $X$. It now follows from Corollary 6.1 that $y \in F(X)$. But since $y_n = 0$, $y \in F_n(X)$ and, therefore, $B \subset F_n(X)$. Obviously, an argument similar to the one establishing Eq. (46) could have been used to show that $w$ is a linear minimax sum over $X$.

Since $B_l$ is a convex parallelotope or hyperbox, $\mathcal{B}_l = \bigcup_{a \in \mathbb{R}} (a + B_l)$ is a parallelotopic beam. Thus, if $P_l = \mathcal{B}_l \cap E_0(e^n)$, then $P_l$ is a convex parallelotope. Therefore, $B = \bigcap_{l=1}^{n} P_l$ is a convex polytope and since $F(X) = \bigcup_{a \in \mathbb{R}} (a + B)$, $F(X)$ is a convex prismatic beam. This provides a complete geometric description of the fixed point set $F(X)$. A consequence of Theorem 9.2 is the following.

Corollary 9.1. If $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$, then $F(X)$ is a convex sublattice of $(\mathbb{R}^n, \vee, \wedge)$.
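Theorem 9.2 can be illustrated on the data of Example 9.1 by comparing two membership tests on a grid of points of $E_0(e^3)$: membership in $B$, checked through the beam conditions $m^l \le y(l) \le u^l$ for every $l$, and the fixed-point condition $W_{XX} \vee y = y$. The Python sketch below is an illustration only; the function names, the 0-based indexing, and the grid are assumptions made for the example and do not appear in the original text.

```python
def w_matrix(X):
    """Entries w_ij = min over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]


def is_fixed_point(X, y):
    """True if [W_XX v y]_i = max_j (w_ij + y_j) equals y_i for every i."""
    W = w_matrix(X)
    n = len(y)
    return all(max(W[i][j] + y[j] for j in range(n)) == y[i] for i in range(n))


def in_beam(X, y, l):
    """True if the line L(y) meets the hyperbox B_l, i.e. m^l <= y(l) <= u^l."""
    for i in range(len(y)):
        lo = min(x[i] - x[l] for x in X)  # i-th coordinate of m^l
        hi = max(x[i] - x[l] for x in X)  # i-th coordinate of u^l
        if not lo <= y[i] - y[l] <= hi:
            return False
    return True


def in_B(X, y):
    """True if y lies in E_0(e^n) and in every beam generated by B_1, ..., B_n."""
    n = len(y)
    return y[n - 1] == 0 and all(in_beam(X, y, l) for l in range(n))


X = [[8, 8, 7], [9, 8, 5]]  # the two vectors of Example 9.1

# Half-integer grid in the plane x_3 = 0; these values are exact in floating point.
grid = [[a / 2, b / 2, 0] for a in range(0, 11) for b in range(0, 9)]
agree = all(in_B(X, y) == is_fixed_point(X, y) for y in grid)
```

On this grid the two tests agree at every point. In particular, the point $(2, 2.5, 0)$ of Example 9.1 fails both tests, while $(2, 1.5, 0)$ passes both.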
X. REMARKS CONCERNING THE DIMENSIONALITY OF $F(X)$

The dimension of $F(X)$, denoted by $\dim F(X)$, is a pertinent geometric property that was not discussed in the preceding section. Since $F(X) = \bigcup_{a \in \mathbb{R}} (a + B)$, where $B = \bigcap_{l=1}^{n} P_l$ and each $P_l$ is a convex polytope derived from the hyperbox $B_l$, $\dim F(X)$ is directly linked to the dimensions of the hyperboxes $B_l$. Each hyperbox $B_l$ is determined by the points $m^l$ and $u^l$. Since it is possible that $m^l \le u^l$, with strict inequality $m^l < u^l$ not holding, we may encounter cases where $B_l \subset \mathbb{R}^n$ is not $(n-1)$-dimensional (see, for example, Figure 4). This raises several questions concerning the dimensionality of $F(X)$ when $X \subset \mathbb{R}^n$ consists of $k$ lattice-independent points. For example, if $X$ and $Y$ are lattice-independent subsets of $\mathbb{R}^n$, with each consisting of $k$ vectors, is $\dim F(X) = \dim F(Y)$? Can $\dim F(X)$ be less than or greater than $k$? To provide some answers to these questions, it will be instructive to visualize how the support hyperplanes of $F(X)$ and their corresponding half-spaces containing $F(X)$ intersect $E_0(e^n)$, as this intersection defines $B$.

Example 10.1. If $x^1$ and $x^2$ are as in Example 9.1, then $x^1(3) = (1, 1, 0)$, $x^2(3) = (4, 3, 0)$, $x^1(2) = (0, 0, -1)$, $x^2(2) = (1, 0, -3)$, $x^1(1) = (0, 0, -1)$, $x^2(1) = (0, -1, -4)$, and $m^3 = x^1(3)$, $u^3 = x^2(3)$, $m^1 = x^2(1)$, $u^1 = x^1(1)$, while $m^2 = (0, 0, -3)$ and $u^2 = (1, 0, -1)$. The hyperboxes $B_l$, where $l = 1, 2, 3$, are shown in Figure 6. Since $L(x^1) = L(u^1) = L(m^3)$, $L(x^2) = L(u^3) = L(m^1)$, and $[L(m^2) \cup L(u^2)] \cap [L(x^1) \cup L(x^2)] = \emptyset$, we have that the set $A$, defined by

$$A = \bigcup_{l=1}^{3} \big\{[L(u^l) \cup L(m^l)] \cap E_0(e^3)\big\},$$

is given by $A = \{m^3, u^3, m^2(3), u^2(3)\}$, where $m^2(3) = -m^2_3 + m^2$ and $u^2(3) = -u^2_3 + u^2$. Observe that $S_0(1, u^2)$ contains $u^2$ and $x^2(2)$. Hence, the corresponding support hyperplane $E(1, u^2)$, having the property that $E(1, u^2) \cap E_0(e^2) = S_0(1, u^2)$, contains the points $u^2(3)$ and $x^2(3)$ $(= u^3)$ as it contains the lines $L(x^2)$ and $L(u^2)$. Similarly, the support hyperplane $E(1, m^2)$, containing $m^2$ and $x^1(2)$, contains the points $m^2(3)$ and $x^1(3)$ $(= m^3)$ as illustrated in Figure 6. It is also not difficult to determine that no point outside the shaded region defining $B$ remains fixed under the action of $W_{XX}$, while every point in the shaded region is a point of $F(X)$. Additionally, Figure 6 shows that $B = C(A)$ and that every point of $A$ is an extreme point of the convex set $B$.
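The points listed in Example 10.1 can be recomputed mechanically. The following Python sketch is illustrative and not part of the original text; the function names and the 0-based indexing are assumptions. It shifts each pattern onto $E_0(e^l)$, takes the componentwise minimum $m^l$ and maximum $u^l$, and then intersects the lines $L(m^l)$ and $L(u^l)$ with $E_0(e^n)$ to obtain the set $A$.

```python
def anchor(x, l):
    """Return x(l) = x - x_l, the point where the line L(x) meets E_0(e^l) (l is 0-based)."""
    return [xi - x[l] for xi in x]


def anchored_bounds(X, l):
    """Return (m^l, u^l): componentwise min and max of the shifted patterns x(l)."""
    Xl = [anchor(x, l) for x in X]
    m = [min(col) for col in zip(*Xl)]
    u = [max(col) for col in zip(*Xl)]
    return m, u


def set_A(X):
    """Intersections of the lines L(m^l) and L(u^l), l = 1..n, with E_0(e^n)."""
    n = len(X[0])
    A = set()
    for l in range(n):
        m, u = anchored_bounds(X, l)
        A.add(tuple(anchor(m, n - 1)))
        A.add(tuple(anchor(u, n - 1)))
    return A


X = [[8, 8, 7], [9, 8, 5]]     # x^1 and x^2 of Examples 9.1 and 10.1
m2, u2 = anchored_bounds(X, 1)  # l = 2 in the 1-based notation of the text
A = set_A(X)
```

For this input the sketch yields $m^2 = (0, 0, -3)$ and $u^2 = (1, 0, -1)$, and a set $A$ consisting of the four points $(1, 1, 0)$, $(4, 3, 0)$, $(3, 3, 0)$, and $(2, 1, 0)$, which matches $\{m^3, u^3, m^2(3), u^2(3)\}$.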
FIGURE 6. (1–3) The individual hyperboxes (rectangles) $B_l \subset E_0(e^l)$ for $l = 3, 2$, and $1$, respectively. The convex set $B$ shown in (4) corresponds to the intersection of $E_0(e^3)$ with the three rectangular beams $\mathcal{B}_l = \bigcup_{a \in \mathbb{R}} (a + B_l) = \{y \in \mathbb{R}^3 : y = a + x, x \in B_l, a \in \mathbb{R}\}$; i.e., $B = (\bigcap_{l=1}^{3} \mathcal{B}_l) \cap E_0(e^3) = [\bigcup_{a \in \mathbb{R}} (a + B_2)] \cap E_0(e^3)$.
Note that in the above example we have $B = C(A)$. This brings up the question as to whether or not we always have that $B = C(A)$, where

$$A = \bigcup_{l=1}^{n} \big\{[L(u^l) \cup L(m^l)] \cap E_0(e^n)\big\}. \tag{47}$$
The next example answers this question in the negative.

Example 10.2. Suppose $x^1 = (2, 3, 4, 1)$ and $x^2 = (4, 3, 4.5, 2)$. Then

$$x^1(4) = (1, 2, 3, 0), \qquad x^2(4) = (2, 1, 2.5, 0),$$
$$x^1(3) = (-2, -1, 0, -3), \qquad x^2(3) = (-0.5, -1.5, 0, -2.5),$$
$$x^1(2) = (-1, 0, 1, -2), \qquad x^2(2) = (1, 0, 1.5, -1),$$
$$x^1(1) = (0, 1, 2, -1), \qquad x^2(1) = (0, -1, 0.5, -2).$$

Thus, $u^l = x^l(l)$ for $l = 1, 2$ and $m^1 = x^2(1)$, while $m^2 = x^1(2)$. Therefore, $u^1(4) = m^2(4) = x^1(4)$ and $u^2(4) = m^1(4) = x^2(4)$. Also, since

$$m^3 = (-2, -1.5, 0, -3) \quad \text{and} \quad u^3 = (-0.5, -1, 0, -2.5),$$

we obtain

$$m^3(4) = (1, 1.5, 3, 0) \quad \text{and} \quad u^3(4) = (2, 1.5, 2.5, 0).$$

Finally,

$$m^4 = (1, 1, 2.5, 0) \quad \text{and} \quad u^4 = (2, 2, 3, 0).$$

As a result, $A = \{m^2(4), u^2(4), m^3(4), u^3(4), m^4, u^4\}$. Note that the plane $x_3 = 3$ in $E_0(e^4)$ contains the points $u^4$, $m^3(4)$, and $m^2(4)$, while the plane $x_3 = 2.5$ contains the points $m^4$, $u^3(4)$, and $u^2(4)$. Let $\sigma_1^2 = \langle u^4, m^3(4), m^2(4) \rangle$ and $\sigma_2^2 = \langle m^4, u^3(4), u^2(4) \rangle$. Thus $\sigma_1^2$ and $\sigma_2^2$ are triangles in the planes $x_3 = 3$ and $x_3 = 2.5$, respectively. Now consider the point $v^1 = (2, 1.5, 3, 0)$ and the join $\langle \sigma_1^2, \sigma_2^2 \rangle$. Obviously $v^1 \notin \sigma_1^2 \cup \sigma_2^2$, but $v^1$ is a point in the plane $x_3 = 3$ (see Figure 7). We claim that $v^1 \notin \langle \sigma_1^2, \sigma_2^2 \rangle$. Suppose to the contrary that $v^1$ is a point in the join of the two simplexes. Then there must exist points $x \in \sigma_1^2$ and $y \in \sigma_2^2$ such that $v^1 \in \langle x, y \rangle$. However, the line $L = L(x, v^1)$ is a subset of the plane $x_3 = 3$. Since $L$ is unique, $\langle x, y \rangle \subset L$, which is impossible since $y$ is not a point in the plane $x_3 = 3$. This shows that $v^1 \notin \langle \sigma_1^2, \sigma_2^2 \rangle = \langle m^2(4), u^2(4), m^3(4), u^3(4), m^4, u^4 \rangle = C(A)$. On the
FIGURE 7. The rectangular box $B_4$ determined by $m^4$ and $u^4$, and the set $B$ as subsets of the three-dimensional hyperplane $E_0(e^4)$ in $\mathbb{R}^4$. The set $B \subset B_4$ is the dark shaded region contained in $B_4$. The points $v^1, v^2 \notin A$, but $v^1$ and $v^2 = (1, 1.5, 2.5, 0)$ (not shown in the figure) are vertices of the parallelepiped $B$.
other hand, $v^1 = m^3(4) \vee u^2(4)$, which means that $v^1 \in F(X) \cap E_0(e^4) = F_4(X)$. Therefore, $C(A) \ne B$.

The above example shows that the equality $C(A) = B$ does not hold in general. Since $B$ is a convex polytope, it is the convex hull of its vertices. If $D$ denotes the set of vertices of $B$, then the example shows that $A \subset D$. As can be inferred from Figure 7, $v^1$ is a vertex of $B$ and $D = A \cup \{v^1, v^2\}$, where $v^2 = (1, 1.5, 2.5, 0)$. Thus, $B$ is the convex hull of eight points, namely $B = C(D)$. Another interesting observation concerning Example 10.2 is that $B$ is three-dimensional, which means that $F(X) = \bigcup_{a \in \mathbb{R}} (a + B)$ is four-dimensional even though $X \subset \mathbb{R}^4$ contains only two independent points. Similarly, the dimension of $F(X)$ for the two-point set $X \subset \mathbb{R}^3$ in Example 10.1 is three. As the next theorem indicates, the dimensionality of $F(X)$ depends on the relative positions of the points of $X$ in the space $\mathbb{R}^n$. The dimension of $F(X)$ will be denoted by $\dim F(X)$.

Theorem 10.1. For each $k \in \{1, \ldots, n\}$ there exists a set $X \subset \mathbb{R}^n$ of $k$ lattice independent vectors such that $\dim F(X) = k$.

Proof. For $j = 0, 1, \ldots, k-1$, define $x^j \in \mathbb{R}^n$ by

$$x^j_i = \begin{cases} 1 & \text{if } i \le j \\ 0 & \text{if } i > j, \end{cases}$$

where $i = 1, \ldots, n$. Let $X = \{x^0, \ldots, x^{k-1}\}$. To show that $\dim F(X) = k$, observe that $x^0 = 0 = \bigwedge_{j=0}^{k-1} x^j$ and $x^{k-1} = \bigvee_{j=0}^{k-1} x^j$. Therefore, the hyperbox $B_n \subset E_0(e^n)$ determined by $x^0$ and $x^{k-1}$ is the standard $(k-1)$-dimensional unit hypercube $\{x \in \mathbb{R}^n : 0 \le x_i \le 1 \text{ if } 1 \le i \le k-1, \text{ and } x_i = 0 \text{ if } i > k-1\}$. Now since $B \subset B_n$, we have that

$$\dim F(X) = \dim \bigcup_{a \in \mathbb{R}} (a + B) \le \dim \bigcup_{a \in \mathbb{R}} (a + B_n) = k. \tag{48}$$

If $\sigma^{k-1}$ denotes the $(k-1)$-dimensional simplex $\langle x^0, x^1, \ldots, x^{k-1} \rangle$, then since $F(X)$ is convex we have that $\sigma^{k-1} \subset F(X) \cap E_0(e^n) = B$. Therefore,

$$k = \dim \bigcup_{a \in \mathbb{R}} (a + \sigma^{k-1}) \le \dim F(X). \tag{49}$$
It follows from Eqs. (48) and (49) that $k = \dim F(X)$.

It remains to be shown that $X$ is lattice independent. For $j = 0, 1, \ldots, k-1$, define $X_j = X \setminus \{x^j\}$, $W^j = W_{X_j X_j}$, and let $w^j_{im}$ denote the $(i, m)$ entry of $W^j$. Suppose $j = 0$. Then $w^0_{1,n} = \bigwedge_{j=1}^{k-1} (x^j_1 - x^j_n)$. But $x^j_1 - x^j_n = 1 - 0 = 1$ for $j = 1, \ldots, k-1$. Therefore,

$$(W^0 \vee x^0)_1 = \bigvee_{m=1}^{n} (w^0_{1,m} + x^0_m) = \bigvee_{m=1}^{n} w^0_{1,m} > 0 = x^0_1,$$

and $W^0 \vee x^0 \ne x^0$.

For the remainder of the proof let $\Lambda = \{1, \ldots, n\}$, $K = \{0, \ldots, k-1\}$, $K(j) = K \setminus \{j\}$, and $\Lambda(i) = \Lambda \setminus \{i\}$, where $j \in K$ and $i \in \Lambda$, respectively. Suppose $j \in K$ and $j > 0$. Then

$$w^j_{j+1,j} + x^j_j = w^j_{j+1,j} + 1 = \bigwedge_{l \in K(j)} (x^l_{j+1} - x^l_j) + 1.$$

Note that if $l < j$, then $x^l_{j+1} - x^l_j = 0 - 0 = 0$, and if $l > j$, then $x^l_{j+1} - x^l_j = 1 - 1 = 0$. Therefore, $x^l_{j+1} - x^l_j = 0$ for every $l \in K(j)$ and $w^j_{j+1,j} + x^j_j = 1$ whenever $1 \le j \le k-1$. As a consequence we have

$$(W^j \vee x^j)_{j+1} = \bigvee_{m=1}^{n} (w^j_{j+1,m} + x^j_m) = (w^j_{j+1,j} + x^j_j) \vee \bigvee_{m \in \Lambda(j)} (w^j_{j+1,m} + x^j_m) > 0 = x^j_{j+1},$$

which shows that $W^j \vee x^j \ne x^j$ whenever $1 \le j \le k-1$. As we already have shown that this is also true for $j = 0$, we have that $W^j \vee x^j \ne x^j$ for every $j \in K$. Therefore $x^j \notin \mathrm{Fix}(X_j)$ $\forall j \in K$. In view of Corollary 6.1, $x^j$ is lattice independent of $X \setminus \{x^j\}$ $\forall j \in K$. This proves that $X$ is lattice independent.

In view of Theorem 10.1 and Example 10.2, it may be reasonable to assume that if $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is lattice independent, then $k \le \dim F(X) \le n$. However, as the next example shows, this inequality need not hold.

Example 10.3. Straightforward computation verifies that the set $X = \{x^1, x^2, x^3, x^4\} \subset \mathbb{R}^3$, where $x^1 = (11, 4, 1)$, $x^2 = (8.5, 7.5, 2)$, $x^3 = (7, -2.5, -1)$, and $x^4 = (8, 5.5, 4)$, is lattice independent. This can also be ascertained using Corollary 6.2 and the set $X(3) = \{x^j(3) : j = 1, 2, 3, 4\}$ shown in Figure 8. Since $\dim F(X)$ cannot exceed the dimension of the space, we have that $\dim F(X) \le 3 < 4 = k$. In fact, it can be easily inferred from Figure 8 that $\dim F(X) = 3$ since $F(X) = \bigcup_{a \in \mathbb{R}} (a + F_3(X))$ and $\dim F_3(X) = 2$. This example shows that in contrast to linear independence, the number of lattice-independent points in $\mathbb{R}^n$ can exceed the dimension of $\mathbb{R}^n$.
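The "straightforward computation" mentioned in Example 10.3, as well as the independence argument in the proof of Theorem 10.1, can be sketched in Python. The code below is an illustration under stated assumptions rather than the authors' procedure: it uses the criterion applied in the proof above, namely that $x^j$ is not a fixed point of $W_{X_j X_j}$ for any $j$, and the function names (`is_lattice_independent`, `staircase`) are hypothetical.

```python
def w_matrix(X):
    """Entries w_ij = min over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]


def is_fixed_point(X, y):
    """True if [W_XX v y]_i = max_j (w_ij + y_j) equals y_i for every i."""
    W = w_matrix(X)
    n = len(y)
    return all(max(W[i][j] + y[j] for j in range(n)) == y[i] for i in range(n))


def is_lattice_independent(X):
    """Criterion from the proof of Theorem 10.1: no x^j is a fixed point of
    the matrix W built from the remaining patterns (X with x^j removed)."""
    for j, x in enumerate(X):
        rest = X[:j] + X[j + 1:]
        if rest and is_fixed_point(rest, x):
            return False
    return True


def staircase(n, k):
    """The vectors of Theorem 10.1: x^j_i = 1 if i <= j, else 0, for j = 0..k-1."""
    return [[1 if i <= j else 0 for i in range(1, n + 1)] for j in range(k)]


X_thm = staircase(4, 3)  # n = 4, k = 3 chosen for illustration
X_ex = [[11, 4, 1], [8.5, 7.5, 2], [7, -2.5, -1], [8, 5.5, 4]]  # Example 10.3

thm_independent = is_lattice_independent(X_thm)
ex_independent = is_lattice_independent(X_ex)
dependent_pair = is_lattice_independent([[0, 0], [1, 1]])  # second vector is a shift of the first
```

Both `thm_independent` and `ex_independent` evaluate to `True`, whereas the pair $(0, 0)$, $(1, 1)$ is reported as dependent because one vector is a translate of the other along $e$. The exact comparison with `==` is safe here only because all sample coordinates are multiples of $0.5$; data with general floating-point values would need a tolerance.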
FIGURE 8. The dark shaded region $F_3(X)$ determined by the four lattice independent points $x^1(3) = (10, 3, 0)$, $x^2(3) = (6.5, 5.5, 0)$, $x^3(3) = (8, -1.5, 0)$, and $x^4(3) = (4, 1.5, 0)$.
XI. STRONG LATTICE INDEPENDENCE

In the preceding section we observed that the cardinality of a set of lattice independent points in $\mathbb{R}^n$ can exceed $n$, the dimensionality of the space. In fact, as the examples in this and the preceding section demonstrate, it is possible for a set $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ of lattice independent points to have $k < \dim F(X)$ or $\dim F(X) < k$, depending on the relative positions of the points of $X$ in $\mathbb{R}^n$. This is in stark contrast to the spanning properties of linearly independent sets of points as well as affinely independent sets of points in $\mathbb{R}^n$. In view of this undesirable situation it is natural to ask if there exists some minimal set of lattice independent points that generates the same linear minimax span as a given set $X$. More precisely, given any set $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$, which may or may not be lattice independent, does there always exist a lattice independent set $B = \{v^1, \ldots, v^m\} \subset \mathbb{R}^n$ such that $LMS_{\mathbb{R}}(v^1, \ldots, v^m) = LMS_{\mathbb{R}}(x^1, \ldots, x^k)$ and $m \le \dim F(X) \le n$? The existence and properties of such a set $B$ are the focus of this section.

For $r \in \mathbb{R}$, with $r > 0$, let $E_r(e) = \{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = rn\}$. Since the hyperplane $E_r(e)$ has orientation $e$, $E_r(e)$ is orthogonal to $L(x)$ for any point $x \in \mathbb{R}^n$. Therefore, $E_r(e)$ cuts $F(X)$. Furthermore, $E_r(e)$ is perpendicular to every oriented hyperplane of type $E[v(i, j)]$ and intersects the standard canonical coordinate axes of $\mathbb{R}^n$. More precisely, for $j = 1, \ldots, n$, each point $z^j \in \mathbb{R}^n$ with coordinates

$$z^j_i = \begin{cases} rn & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

corresponds to the point of intersection of $E_r(e)$ with the $j$th axis. Figure 9 provides an illustration for the case $E_r(e) \subset \mathbb{R}^3$. The plane $E_r(e)$ is a convenient tool for proving the next theorem.
FIGURE 9. The portion of the plane $E_r(e) \subset \mathbb{R}^3$ representing the intersection of the plane with the positive octant of $\mathbb{R}^3$.
Theorem 11.1. If $X = \{x^1, x^2\} \subset \mathbb{R}^n$ is lattice independent, then there exist integers $j_1, j_2 \in \{1, \ldots, n\}$ such that, $\forall i \in \{1, \ldots, n\}$,

$$x_{j_1}^2 - x_i^2 \le x_{j_1}^1 - x_i^1 \quad \text{and} \quad x_{j_2}^1 - x_i^1 \le x_{j_2}^2 - x_i^2. \tag{50}$$
Proof. For $k = 1, 2$, let $\{y^k\} = L(x^k) \cap E_r(e)$. Thus, $y^k$ is of the form $y^k = a_k + x^k$ for some $a_k \in \mathbb{R}$. Observe that

$$y_j^k - y_i^k = x_j^k - x_i^k \quad \text{for } i, j \in \{1, \ldots, n\} \tag{51}$$

and $k = 1, 2$. Since $X$ is lattice independent, $y^1 \neq y^2$. Hence, there exists an integer $j \in \{1, \ldots, n\}$ such that either $y_j^1 < y_j^2$ or $y_j^2 < y_j^1$. Suppose without loss of generality that $y_j^2 < y_j^1$. In this case there must exist another integer $l \in \{1, \ldots, n\}$, with $l \neq j$, such that $y_l^1 < y_l^2$, for otherwise $y_i^2 \le y_i^1$ $\forall i = 1, \ldots, n$. But then

$$rn = \sum_{i=1}^{n} y_i^2 = \sum_{i \in \Lambda(j)} y_i^2 + y_j^2 < \sum_{i \in \Lambda(j)} y_i^1 + y_j^1 = rn,$$

which is impossible. [...] $y_{j_1}^2 - y_i^2$. This shows that

$$y_{j_1}^2 - y_i^2 \le y_{j_1}^1 - y_i^1 \quad \forall i \in \{1, \ldots, n\}.$$

An analogous argument shows that

$$y_{j_2}^1 - y_i^1 \le y_{j_2}^2 - y_i^2 \quad \forall i \in \{1, \ldots, n\}.$$
The conclusion now follows from Eq. (51).

If $x^k(j_1) = -x_{j_1}^k + x^k$ for $k = 1, 2$, then by Theorem 11.1

$$x^k(j_1)_i = x_i^k - x_{j_1}^k \le x_i^2 - x_{j_1}^2 = x^2(j_1)_i \quad \text{for } i = 1, \ldots, n.$$

Thus, $x^k(j_1) \le x^2(j_1)$. Therefore, $u^{j_1} = x^2(j_1)$ and $m^{j_1} = x^1(j_1)$. Similarly, $u^{j_2} = x^1(j_2)$ and $m^{j_2} = x^2(j_2)$. Hence, $L(u^{j_1}) = L(m^{j_2})$ and $L(u^{j_2}) = L(m^{j_1})$. It follows that if $X = \{x^1, x^2\} \subset \mathbb{R}^n$ is lattice independent, then the number of points in $A$ is bounded above by $2(n - 1)$. In view of Theorem 10.1 we now have

$$2 \le \mathrm{card}(A) \le 2(n - 1), \tag{52}$$
where card denotes the cardinality. From Example 10.1 we also know that $\mathrm{card}(A) < \mathrm{card}(D) = 8$, where $D$ denotes the set of vertices of the polytope $B$. This leads to the conjecture that in general, $\mathrm{card}(D) \le 2n$. In the specific case where $\mathrm{card}(X) = 2$, we have $2 \le \dim F(X) \le n$, with $\dim(X) = n$ when $n = 4$ for specific sets of two points. It is a bit surprising that an $n$-dimensional set of points can be lattice dependent on just two lattice independent points.

A fact more pertinent to this section than the observation regarding the dimensionality of $F(\{x^1, x^2\})$ is the conclusion of Theorem 11.1, namely, the inequality (50). This inequality is equivalent to the equations

$$x_{j_1}^2 - x_i^2 = \bigwedge_{\xi=1}^{2} \left( x_{j_1}^\xi - x_i^\xi \right) \quad \text{and} \quad x_{j_1}^1 - x_i^1 = \bigvee_{\xi=1}^{2} \left( x_{j_1}^\xi - x_i^\xi \right) \quad \forall i = 1, \ldots, n \tag{53}$$

and

$$x_{j_2}^1 - x_i^1 = \bigwedge_{\xi=1}^{2} \left( x_{j_2}^\xi - x_i^\xi \right) \quad \text{and} \quad x_{j_2}^2 - x_i^2 = \bigvee_{\xi=1}^{2} \left( x_{j_2}^\xi - x_i^\xi \right) \quad \forall i = 1, \ldots, n. \tag{54}$$
This set of equations is reminiscent of Eqs. (15) and (16) defining the notion of diagonally max and min dominance for square matrices, a notion that generalizes to sets of vectors.

Definition 11.1. A set of vectors $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is said to be max dominant if and only if for every $\lambda \in \{1, \ldots, k\}$ there exists an index $j_\lambda \in \{1, \ldots, n\}$ such that

$$x_{j_\lambda}^\lambda - x_i^\lambda = \bigvee_{\xi=1}^{k} \left( x_{j_\lambda}^\xi - x_i^\xi \right) \quad \forall i \in \{1, \ldots, n\}. \tag{55}$$

Similarly, $X$ is said to be min dominant if and only if for every $\lambda \in \{1, \ldots, k\}$ there exists an index $j_\lambda \in \{1, \ldots, n\}$ such that

$$x_{j_\lambda}^\lambda - x_i^\lambda = \bigwedge_{\xi=1}^{k} \left( x_{j_\lambda}^\xi - x_i^\xi \right) \quad \forall i \in \{1, \ldots, n\}. \tag{56}$$
The notion of max or min dominance is the key to the concept of strong lattice independence.

Definition 11.2. A set of lattice independent vectors $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is said to be strongly lattice independent if and only if $X$ is max dominant or min dominant (or both).

Equivalently, a set $\{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ of lattice independent vectors is strongly lattice independent if and only if for every $\lambda \in \{1, \ldots, k\}$ there exists an index $j_\lambda \in \{1, \ldots, n\}$ such that

$$x_{j_\lambda}^\xi - x_i^\xi \le x_{j_\lambda}^\lambda - x_i^\lambda \tag{57}$$

$\forall i \in \{1, \ldots, n\}$ and $\forall \xi \in \{1, \ldots, k\}$, or for every $\lambda \in \{1, \ldots, k\}$ there exists an index $j_\lambda \in \{1, \ldots, n\}$ such that

$$x_{j_\lambda}^\xi - x_i^\xi \ge x_{j_\lambda}^\lambda - x_i^\lambda \tag{58}$$
$\forall i \in \{1, \ldots, n\}$ and $\forall \xi \in \{1, \ldots, k\}$. By Theorem 11.1, any two lattice independent vectors are strongly lattice independent as they satisfy both the max and min dominant conditions. As another example we have the following.

Theorem 11.2. For each $k \in \{1, \ldots, n\}$ there exists a set of vectors $X \subset \mathbb{R}^n$ such that $\dim F(X) = k$ and $X$ is strongly lattice independent.

Proof. Let $X = \{x^0, \ldots, x^{k-1}\}$ denote the set of vectors constructed in the proof of Theorem 10.1. We have already shown that $X$ is lattice independent and that $\dim F(X) = k$. To prove strong lattice independence, we show that $X$ satisfies the max dominant condition. Again let $K = \{0, 1, \ldots, k-1\}$, $K(j) = K \setminus \{j\}$, and $\Lambda = \{1, \ldots, n\}$. Observe that $x_n^0 - x_i^0 = 0$ for $i = 1, \ldots, n$ and, for $j \in K(0)$,

$$x_n^j - x_i^j = \begin{cases} 0 & \text{if } i > j \\ -1 & \text{if } i \le j. \end{cases}$$

Therefore,

$$x_n^j - x_i^j \le x_n^0 - x_i^0 \quad \forall i \in \Lambda \text{ and } \forall j \in K. \tag{59}$$

If $j \in K(0)$, then

$$x_j^j - x_i^j = \begin{cases} 0 & \text{if } i \le j \\ 1 & \text{if } i > j, \end{cases}$$

and for $l < j$

$$x_j^l - x_i^l = \begin{cases} 0 & \text{if } l < i \\ -1 & \text{if } i \le l. \end{cases}$$

Therefore, $x_j^l - x_i^l \le x_j^j - x_i^j$ $\forall i \in \Lambda$ whenever $l < j$. Next suppose that $j < l$. If $i \le j < l$, then

$$x_j^l - x_i^l = 0 = x_j^j - x_i^j. \tag{60}$$

If $j < i \le l$, then

$$x_j^l - x_i^l = 1 - 1 = 0 < 1 = x_j^j - x_i^j. \tag{61}$$

If $l < i \le n$, then

$$x_j^l - x_i^l = 1 = x_j^j - x_i^j. \tag{62}$$

This proves that

$$x_j^l - x_i^l \le x_j^j - x_i^j \quad \forall i \in \Lambda,\ \forall j \in K(0), \text{ and } \forall l \in K. \tag{63}$$
Equations (59) and (63) imply that $X$ is max dominant and, hence, strongly lattice independent.

The indices satisfying the inequality (57) created in the proof of the theorem are distinct. This holds in general for any strongly lattice independent set of vectors.

Theorem 11.3. Let $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ be strongly lattice independent. If $X$ is max dominant (or min dominant) and for $\lambda \in \{1, \ldots, k\}$, $j_\lambda$ denotes the index satisfying inequality (57) [or (58)], then $j_\lambda \neq j_\xi$ whenever $\lambda \neq \xi$.

Proof. Suppose that $j_\lambda = j_\xi$ and $\lambda \neq \xi$. If $X$ is max dominant, then

$$x_{j_\xi}^\xi - x_i^\xi = x_{j_\lambda}^\xi - x_i^\xi \le x_{j_\lambda}^\lambda - x_i^\lambda = x_{j_\xi}^\lambda - x_i^\lambda \le x_{j_\xi}^\xi - x_i^\xi. \tag{64}$$

It follows that $x_{j_\lambda}^\lambda - x_i^\lambda = x_{j_\xi}^\xi - x_i^\xi$ $\forall i \in \{1, \ldots, n\}$. Let $a = x_{j_\lambda}^\lambda - x_{j_\lambda}^\xi = x_i^\lambda - x_i^\xi$ and observe that $a$ is a constant. This means that $a + x_i^\xi = x_i^\lambda$ $\forall i \in \{1, \ldots, n\}$. Therefore, $x^\lambda = a + x^\xi$. This contradicts the lattice independence of $X$. Thus, we must have $j_\lambda \neq j_\xi$ whenever $\lambda \neq \xi$. If $X$ is min dominant, then the proof proceeds in a similar manner with the inequalities in Eq. (64) reversed.

It follows from Example 10.3 that the number of lattice independent points in $\mathbb{R}^n$ can exceed the dimension of the space. The same does not hold true for strong lattice independence.

Corollary 11.1. If $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is strongly lattice independent, then $k \le n$.

Proof. Suppose $n < k$. Assuming that $X$ is max dominant, for each $\lambda = 1, \ldots, k$ there exists an index $j_\lambda \in \{1, \ldots, n\}$ such that $x_{j_\lambda}^\xi - x_i^\xi \le x_{j_\lambda}^\lambda - x_i^\lambda$ $\forall i \in \{1, \ldots, n\}$. But since $n < k$ and $1 \le j_\lambda \le n$ for each $\lambda \in \{1, \ldots, k\}$, we must have $j_\lambda = j_\xi$ for some $\lambda \neq \xi$. This contradicts Theorem 11.3. Hence $k \le n$. An analogous argument holds in case $X$ is min dominant.

It follows that the lattice independent set defined in Example 10.3 is not strongly lattice independent. Thus, lattice independence does not imply strong lattice independence (except for $k = 2$). However, as the next
example shows, there does exist a set of vectors $B = \{v^1, v^2, v^3\}$ such that $\mathrm{LMS}_{\mathbb{R}}(v^1, v^2, v^3) = \mathrm{LMS}_{\mathbb{R}}(x^1, x^2, x^3, x^4)$ or, equivalently, $F(B) = F(X)$, where the vectors $x^\xi$ are the vectors of Example 10.3.

Example 11.1. Let $X = \{x^1, x^2, x^3, x^4\} \subset \mathbb{R}^3$ be the set of vectors defined in Example 10.3 and let $B = \{v^1, v^2, v^3\}$ be given by $v^1 = (10, 0.5, 0)$, $v^2 = (6.5, 5.5, 0)$, and $v^3 = (4, -1.5, 0)$. In contrast to the set $X(3) = \{x^\xi(3) : \xi = 1, 2, 3, 4\}$, the points of $B$ are all extreme points of $F_3(X)$, as shown in Figure 10. By Corollary 6.2, $F(X) = F[X(3)] = \bigcup_{a \in \mathbb{R}} [a + F_3(X)]$. Since $v^\xi \in F_3(X)$ for $\xi = 1, 2, 3$, $F(B) \subset F(X)$. On the other hand,

$$x^1(3) = \left[(4.5 + v^3) \wedge v^2\right] \vee v^1, \quad x^2(3) = v^2, \quad x^3(3) = (-2 + v^1) \vee v^3, \quad x^4(3) = (-4 + v^2) \vee v^3.$$

Thus, $x^\xi(3) \in F(B)$ for $\xi \in \{1, 2, 3, 4\}$ and, hence, $F(X) = F[X(3)] \subset F(B)$. Therefore, $F(X) = F(B)$. It is easy to ascertain that $B$ is lattice independent. To show that $B$ is strongly lattice independent, set $j_\lambda = \lambda$ for $\lambda = 1, 2, 3$. It follows that $v_{j_\lambda}^\xi - v_i^\xi \le v_{j_\lambda}^\lambda - v_i^\lambda$ for $i = 1, 2, 3$ and $\xi = 1, 2, 3$. Thus, $B$ is max dominant and, therefore, strongly lattice independent. Similarly, the set $A = \{w^1, w^2, w^3\}$, where $w^1 = (4, 3, 0)$, $w^2 = x^3(3)$, and $w^3 = u^3 = \bigvee_{\xi=1}^{4} x^\xi(3)$, is lattice independent, min dominant, satisfies the equality $F(A) = F(X)$, and is a subset of the set of extreme points of $F_3(X)$.

Several additional conclusions are also a direct consequence of this example. First, the example shows that, in contrast to linear independence, the number of lattice independent points in $\mathbb{R}^n$ can exceed the dimension of $\mathbb{R}^n$. In fact, if we set $x^1 = (10, 3, 0, 0)$, $x^2 = (6.5, 5.5, 0, 0)$, $x^3 = (8, -1.5, 0, 0)$, and $x^4 = (4, 1.5, 0, 0)$, then $X = \{x^1, x^2, x^3, x^4\} \subset \mathbb{R}^4$ is lattice independent in $\mathbb{R}^4$ and $F_4(X)$ coincides with the set $F_3(X)$ shown in Figure 10. Thus, $F_4(X)$ is two-dimensional and, therefore, $\dim F(X) = \dim\{\bigcup_{a \in \mathbb{R}} [a + F_4(X)]\} = 3 < k = n = 4$.
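The dominance conditions of Definition 11.1 lend themselves to a direct numerical check. The sketch below is illustrative only: the function names are hypothetical, indices are 0-based (so the text's $j_\lambda = \lambda$ appears as 0, 1, 2), and the test is applied to the sets $B$ and $A$ of Example 11.1 and to the four points $X(3)$ of Example 10.3.

```python
import numpy as np

def dominance_indices(X, kind="max"):
    """For each pattern x^lam, return a 0-based index j with
    x^lam_j - x^lam_i = max over xi of (x^xi_j - x^xi_i) for all i
    (kind="max"), or the corresponding min (kind="min").
    Returns None for a pattern that has no such index.
    """
    X = np.asarray(X, dtype=float)
    # D[xi, j, i] = x^xi_j - x^xi_i
    D = X[:, :, None] - X[:, None, :]
    bound = D.max(axis=0) if kind == "max" else D.min(axis=0)
    result = []
    for lam in range(X.shape[0]):
        hits = [j for j in range(X.shape[1])
                if np.allclose(D[lam, j], bound[j])]
        result.append(hits[0] if hits else None)
    return result

def is_dominant(X, kind="max"):
    """True if every pattern has a dominance index of the given kind."""
    return all(j is not None for j in dominance_indices(X, kind))

B = np.array([[10, 0.5, 0], [6.5, 5.5, 0], [4, -1.5, 0]])   # v^1, v^2, v^3
A = np.array([[4, 3, 0], [8, -1.5, 0], [10, 5.5, 0]])       # w^1, w^2, w^3
X3 = np.array([[10, 3, 0], [6.5, 5.5, 0], [8, -1.5, 0], [4, 1.5, 0]])

print(dominance_indices(B, "max"))   # indices for B
print(dominance_indices(A, "min"))   # indices for A
print(is_dominant(X3, "max"), is_dominant(X3, "min"))
```

For $B$ and $A$ the sketch finds one index per vector, in agreement with the choice $j_\lambda = \lambda$ made in the example, while $X(3)$ is neither max nor min dominant, as Corollary 11.1 requires for four points in $\mathbb{R}^3$.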
This means that for a set $X$ of $k$ lattice independent points in $\mathbb{R}^n$ it is possible to have $k < \dim F(X)$ or $\dim F(X) < k$. Another interesting observation is the location of the strongly lattice independent points $v^1$, $v^2$, and $v^3$ or $w^1$, $w^2$, and $w^3$. The location of these sets of points indicates a scheme for deriving a set of strongly lattice independent points by selecting certain extreme points $\{v^1, \ldots, v^k\}$ from the set $F_n(X)$ with the property that $F(X) = F(\{v^1, \ldots, v^k\})$. As a final observation we note that the set $\{x^1(3), x^2(3), x^3(3)\}$ is affinely independent but not strongly lattice independent. Hence, affine independence does not imply strong lattice independence.

FIGURE 10. The dark shaded region corresponds to $F_3(X)$. The set $B = \{v^1, v^2, v^3\}$ is a subset of the set of extreme points of the convex set $F_3(X)$ as well as of $F(X)$. Furthermore, $v^1 \vee v^2 = \bigvee_{\xi=1}^{4} x^\xi(3) = u^3$ and $v^3 = \bigwedge_{\xi=1}^{4} x^\xi(3)$.

The major remaining goal of this section is to prove the converse, namely, that strong lattice independence implies affine independence. In the subsequent lemmas and theorem, we suppose that the set $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is strongly lattice independent and that for each $x^\lambda \in X$, $j_\lambda$ denotes the coordinate index satisfying inequality (57). However, it should be obvious from the ensuing discussion that the same results are obtained if one assumes that for each $x^\lambda \in X$, $j_\lambda$ denotes the coordinate index satisfying inequality (58). Also, to reduce notational complexity, let $E_\lambda = E_{x_{j_\lambda}^\lambda}(e^{j_\lambda})$.
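Lemma 11.1. If $\xi \neq \lambda$, then $\{x^\xi, x^\lambda\} \not\subset E_\xi \cap E_\lambda$.

Proof. Suppose $x^\xi, x^\lambda \in E_\xi \cap E_\lambda$. Since $x \in E_\xi \cap E_\lambda$ if and only if

$$x_i = \begin{cases} x_{j_\xi}^\xi & \text{if } i = j_\xi \\ x_{j_\lambda}^\lambda & \text{if } i = j_\lambda, \end{cases}$$

we have $x_{j_\xi}^\lambda = x_{j_\xi}^\xi$ and $x_{j_\lambda}^\lambda = x_{j_\lambda}^\xi$. But then

$$x_{j_\xi}^\xi - x_i^\xi \ge x_{j_\xi}^\lambda - x_i^\lambda = x_{j_\xi}^\xi - x_i^\lambda \quad \forall i = 1, 2, \ldots, n,$$

or $x_i^\lambda \ge x_i^\xi$ for $i = 1, 2, \ldots, n$. Similarly,

$$x_{j_\lambda}^\lambda - x_i^\lambda \ge x_{j_\lambda}^\xi - x_i^\xi = x_{j_\lambda}^\lambda - x_i^\xi \quad \forall i = 1, 2, \ldots, n,$$

or $x_i^\xi \ge x_i^\lambda$ for $i = 1, 2, \ldots, n$. Therefore, $x_i^\lambda = x_i^\xi$ for $i = 1, 2, \ldots, n$, and $x^\xi = x^\lambda$, which contradicts the fact that $\lambda \neq \xi$.

Lemma 11.2. Let $\lambda \in \{1, 2, \ldots, k\}$. If $x^\xi \in E_\lambda$ $\forall \xi \in \{1, 2, \ldots, k\} \setminus \{\lambda\}$, then $x^\lambda = \bigwedge_{\xi=1}^{k} x^\xi$.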
Proof. Since $x^\xi \in E_\lambda$ $\forall \xi \in \{1, 2, \ldots, k\}$, we have $x_{j_\lambda}^\xi = x_{j_\lambda}^\lambda$ $\forall \xi \in \{1, 2, \ldots, k\}$ and, hence,

$$x_{j_\lambda}^\lambda - x_i^\xi = x_{j_\lambda}^\xi - x_i^\xi \le x_{j_\lambda}^\lambda - x_i^\lambda \quad \forall i = 1, 2, \ldots, n, \text{ and } \forall \xi \in \{1, 2, \ldots, k\}.$$

But this implies that $x_i^\lambda \le x_i^\xi$ for $i = 1, 2, \ldots, n$ and $\forall \xi \in \{1, 2, \ldots, k\}$, which verifies the conclusion.

With these two lemmas and the fact that any nonempty subset of a strongly lattice independent set is also strongly lattice independent, we are now able to prove the main result of this section.

Theorem 11.4. If $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is strongly lattice independent, then $X$ is affinely independent.

Proof. Obviously, any two lattice independent points are strongly lattice independent as well as affinely independent. Next we show that if $k = 3$, then $x^1$, $x^2$, and $x^3$ cannot all lie on the same line $L$ determined by a given pair of points $x^\xi$ and $x^\lambda$, where $\xi, \lambda \in \{1, 2, 3\}$. Suppose the contrary, that all three points lie on $L$. Then one of the points must lie between the other two. Suppose, without loss of generality, that $x^3$ lies between $x^1$ and $x^2$. Let $Y = \{x^1, x^2\}$. Since $F(Y)$ is convex, the line segment joining $x^1$ and $x^2$ is a subset of $F(Y)$ and, hence, $x^3 \in F(Y)$. In view of Corollary 6.1, this contradicts the fact that $X$ is lattice independent.

The remainder of the proof proceeds by induction. We assume that the conclusion of the theorem holds for any set in $\mathbb{R}^n$ containing $k - 1$ strongly lattice independent vectors for some integer $k - 1 \ge 3$ and that $X = \{x^1, \ldots, x^k\}$ is strongly lattice independent. We divide the proof into two possible cases.

First, suppose that for some $\eta \in \{1, \ldots, k\}$, $X \subset E_\eta$. By the induction hypothesis, the set of vectors $Y = X \setminus \{x^\eta\}$ is affinely independent. Let $\sigma^{k-2}$ denote the $(k - 2)$-dimensional simplex spanned by the vectors of $Y$ and let $L(\sigma^{k-2})$ denote the $(k - 2)$-dimensional linear subspace of $\mathbb{R}^n$ spanned by $\sigma^{k-2}$, that is, by the vectors of $Y$. Assume that $x^\eta \in L(\sigma^{k-2})$ or, equivalently, that $x^\eta$ is an affine combination of the vectors of $Y$, so that $X$ is not affinely independent. Since $X \subset E_\eta$, we have $L(\sigma^{k-2}) \subset E_\eta$. Furthermore, by Lemma 11.2, $x^\eta = \bigwedge_{\xi=1}^{k} x^\xi$. Now let $F_\xi$ be a hyperplane parallel to $E_\xi$ and containing $x^\eta$ for $\xi \neq \eta$. Obviously $F_\xi \neq E_\xi$, for otherwise $\{x^\xi, x^\eta\} \subset E_\xi \cap E_\eta$, which contradicts Lemma 11.1. Let $L_\xi = F_\xi \cap L(\sigma^{k-2})$. Then $x^\eta \in L_\xi$ for $\xi \in \{1, 2, \ldots, k\} \setminus \{\eta\}$, and each $L_\xi$ is a $(k - 3)$-dimensional linear subspace of the $(k - 2)$-dimensional linear subspace $L(\sigma^{k-2})$. Thus, each $L_\xi$ separates $L(\sigma^{k-2})$. Since $x^\eta \in L_\xi \cap L_\gamma$ for any pair $\xi, \gamma \in \{1, 2, \ldots, k\} \setminus \{\eta\}$ and $L_\xi \neq L_\gamma$ whenever $\xi \neq \gamma$, $L_\xi \cap L_\gamma$ is of dimension less than or equal to $k - 4$. Inductively, $\dim \bigcap_{\xi \neq \eta} L_\xi = k - [(k - 1) + 2] = -1$, which is impossible since $x^\eta \in \bigcap_{\xi \neq \eta} L_\xi$ means that $\bigcap_{\xi \neq \eta} L_\xi \neq \emptyset$. Therefore, our assumption that $x^\eta \in L(\sigma^{k-2})$ cannot be true, which means that $X$ is affinely independent.

Next suppose that for every $\xi \in \{1, 2, \ldots, k\}$, $X \not\subset E_\xi$ and that $X = \{x^1, \ldots, x^k\}$ is not affinely independent. Under this assumption, for some index $\eta \in \{1, 2, \ldots, k\}$, $x^\eta$ is an affine combination of the vectors of $Y = X \setminus \{x^\eta\}$. Again, let $\sigma^{k-2}$ denote the $(k - 2)$-dimensional simplex spanned by the vectors of $Y$, let $L(\sigma^{k-2})$ denote the $(k - 2)$-dimensional linear subspace of $\mathbb{R}^n$ spanned by $\sigma^{k-2}$, and assume that $x^\eta \in L(\sigma^{k-2})$. Similarly, let $F_\xi$ be a hyperplane parallel to $E_\xi$ and containing $x^\eta$. Obviously, $F_\eta = E_\eta$, but it is also possible that $F_\xi = E_\xi$ for some $\xi \neq \eta$. As before, let $L_\xi = F_\xi \cap L(\sigma^{k-2})$ and observe that $x^\eta \in L_\xi$ for $\xi \in \{1, 2, \ldots, k\}$. Hence, $x^\eta \in \bigcap_{\xi=1}^{k} L_\xi \neq \emptyset$. Since $L(\sigma^{k-2}) \not\subset F_\xi$ for $\xi \in \{1, 2, \ldots, k\}$, $L_\xi$ is a $(k - 3)$-dimensional linear subspace of $L(\sigma^{k-2})$. It follows that $\dim \bigcap_{\xi=1}^{k} L_\xi = k - (k + 2) = -2$, which contradicts the fact that $\bigcap_{\xi=1}^{k} L_\xi \neq \emptyset$. Thus our assumption that $X$ is not affinely independent is false.

We observed earlier that if $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is lattice independent, then it is possible to have $k < \dim F(X)$ or $\dim F(X) < k$, depending on the relative geometric position of the points of $X$ in $\mathbb{R}^n$.
As the next corollary shows, this undesirable situation does not exist if $X$ is strongly lattice independent.

Corollary 11.2. If $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is strongly lattice independent, then $k \le \dim F(X) \le n$.

Proof. For $\lambda = 1, \ldots, k$, let $j_\lambda$ denote the coordinate index of $x^\lambda$ satisfying inequality (57). It follows from Corollary 6.2 that the set $Y = \{x^1(n), \ldots, x^k(n)\} \subset E_n(e)$ is lattice independent. Furthermore,

$$\begin{aligned} x_{j_\lambda}^\xi(n) - x_i^\xi(n) &= \left(-x_n^\xi + x_{j_\lambda}^\xi\right) - \left(-x_n^\xi + x_i^\xi\right) = x_{j_\lambda}^\xi - x_i^\xi \\ &\le x_{j_\lambda}^\lambda - x_i^\lambda = \left(-x_n^\lambda + x_{j_\lambda}^\lambda\right) - \left(-x_n^\lambda + x_i^\lambda\right) = x_{j_\lambda}^\lambda(n) - x_i^\lambda(n) \end{aligned}$$

$\forall i = 1, 2, \ldots, n$ and $\xi = 1, \ldots, k$. This proves that $Y$ is strongly lattice independent. Thus, by Theorem 11.4, $Y$ is affinely independent. Therefore, the convex hull of $Y$ is a $(k - 1)$-dimensional simplex $\sigma$. Since $F_n(X)$ is convex, $\sigma \subset F_n(X)$. Hence, $k - 1 = \dim(\sigma) \le \dim F_n(X)$ and, therefore,

$$k = \dim \bigcup_{a \in \mathbb{R}} (a + \sigma) \le \dim \bigcup_{a \in \mathbb{R}} \left[a + F_n(X)\right] = \dim F(X) \le n.$$

Although the corollary does not specify the dimension of $F(X)$, it does indicate the possible range of the dimension. The exact dimension depends on the location of the points of $X$ in $\mathbb{R}^n$. For example, suppose $X = \{x^1, x^2\} \subset \mathbb{R}^3$, with $x^1 \neq x^2$. We claim that $\dim F(X) = 2$ if and only if $X \subset E[v(i, j)]$ for some $(i, j)$ with $1 \le i < j \le 3$. To prove this, suppose $X \subset E[v(i, j)]$ and consider the set $Y = \{y^1, y^2\} \subset E_0(e^j)$, where $y^k = x^k(j) = -x_j^k + x^k$ for $k = 1, 2$. Assume without loss of generality that $j = 3$ and $i = 1$. Under these assumptions, $y_1^1 = y_1^2 = c$ for some $c \in \mathbb{R}$, $y_2^2 = a + y_2^1$ for some $a \in \mathbb{R}$, and $y_3^1 = y_3^2 = 0$. It follows that $B_3(y^1, y^2) = \langle (c, y_2^1, 0), (c, a + y_2^1, 0) \rangle = F_3(X)$. Thus, $\dim F_3(X) = 1$ and, hence, $\dim\{\bigcup_{a \in \mathbb{R}} [a + F_3(X)]\} = 2$. The remaining cases are analogous.

If $\{x^1, x^2\}$ is not a subset of $E[v(i, j)]$ for any $(i, j)$ with $1 \le i < j \le 3$, let $y^k = x^k(3) = -x_3^k + x^k$ for $k = 1, 2$. Then $y_1^1 \neq y_1^2$, for $X \not\subset E[v(1, 3)]$. Assume that $y_1^1 < y_1^2$ and let $c = y_1^2 - y_1^1$. We also have $y_2^1 \neq y_2^2$, for $X \not\subset E[v(2, 3)]$. Hence, $y_2^1 < y_2^2$ or $y_2^2 < y_2^1$. Suppose that $y_2^1 < y_2^2$. If $d = y_2^2 - y_2^1$, then $c \neq d$, for otherwise $y^2 = (c + y_1^1, c + y_2^1, 0)$, which means that $X \subset E[v(1, 2)]$. We now have two possibilities, namely either $d < c$ or $c < d$. Suppose that $d < c$. In this case $y_1^1 = y_1^2 - c < y_1^2 - d$ and, therefore, $(y_1^2 - d, y_2^2 - d, -d) \vee (y_1^1, y_2^1, 0) = (y_1^2 - d, y_2^1, 0) \in F_3(X)$, with $(y_1^2 - d, y_2^1, 0)$ not on the line determined by $y^1$ and $y^2$. Thus, $\sigma = \langle (y_1^2 - d, y_2^1, 0), y^1, y^2 \rangle$ is a 2-simplex with $\sigma \subset F_3(X)$. It follows that $\dim F(X) = 3$. All the remaining cases can be argued in a similar fashion. This proves our claim.
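Theorem 11.4 and the remarks around Example 11.1 can be illustrated with a small rank computation. This is a sketch only, using the standard characterization that $k$ points are affinely independent exactly when the $k - 1$ difference vectors from one of them are linearly independent; the function name is hypothetical.

```python
import numpy as np

def is_affinely_independent(X):
    """k points (rows of X) are affinely independent iff the k - 1
    difference vectors x^2 - x^1, ..., x^k - x^1 have rank k - 1."""
    X = np.asarray(X, dtype=float)
    if len(X) <= 1:
        return True
    diffs = X[1:] - X[0]
    return bool(np.linalg.matrix_rank(diffs) == len(X) - 1)

# Strongly lattice independent set B of Example 11.1.
B = np.array([[10, 0.5, 0], [6.5, 5.5, 0], [4, -1.5, 0]])
# x^1(3), x^2(3), x^3(3): affinely independent according to the text.
T3 = np.array([[10, 3, 0], [6.5, 5.5, 0], [8, -1.5, 0]])
# All four points of X(3) lie in the plane x_3 = 0.
X3 = np.array([[10, 3, 0], [6.5, 5.5, 0], [8, -1.5, 0], [4, 1.5, 0]])

print(is_affinely_independent(B))
print(is_affinely_independent(T3))
print(is_affinely_independent(X3))
```

The four coplanar points of $X(3)$ fail the test, which is consistent with Theorem 11.4: a set that is not affinely independent cannot be strongly lattice independent.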
XII. PATTERN RECONSTRUCTION FROM NOISY INPUTS

The theorems listed in Section V addressed the issue of pattern recall for lattice associative memories when the input patterns are not corrupted by noise or missing data. We now consider the case where inputs are distorted versions of the exemplar inputs imprinted on the memories $W_{XX}$ and $M_{XX}$.

Lattice-based associative memories are extremely robust in the presence of certain types of noise. Specifically, a distorted version $\tilde{x}^\gamma$ of a pattern $x^\gamma$ is said to have undergone an erosive change whenever $\tilde{x}^\gamma \le x^\gamma$ and a dilative change whenever $\tilde{x}^\gamma \ge x^\gamma$. To illustrate the behavior of these memories in the presence of erosive and dilative noise, we provide two visual examples. The first example illustrates the behavior for binary patterns, while the second example illustrates the behavior for real-valued patterns. In both cases the patterns were derived from digital images by converting the image arrays into vectors using the usual row-scanning order. More specifically, given an $m \times n$ image $p$, we define a corresponding $mn$-dimensional vector $x$ by

$$x_{n(i-1)+j} = p(i, j) \quad \text{for } i = 1, \ldots, m, \text{ and } j = 1, \ldots, n.$$

Thus, if $i = 1$, we obtain the coordinate entries $x_j = p(1, j)$ for $j = 1, \ldots, n$; if $i = 2$, we obtain $x_{n+j} = p(2, j)$ for $j = 1, \ldots, n$; and so on.

For our first example we used the 10 Boolean image patterns of size $18 \times 18$ shown in Figure 11. Using the row-scan conversion, we converted these images into binary pattern vectors by setting $x_{18(i-1)+j}^\xi = 1$ if $p^\xi(i, j)$ was a black pixel (i.e., a pixel of low intensity) and $x_{18(i-1)+j}^\xi = 0$ if $p^\xi(i, j)$ was a white pixel. For uncorrupted input, perfect recall is guaranteed if we use the memories $W_{XX}$ or $M_{XX}$ (Corollary 5.2). We first corrupted each pattern $x^\xi$ by randomly selecting 13% of its coordinates and using the following dilation procedure: If $x_i^\xi$ is one of the selected coordinates and $x_i^\xi = 0$, then set its new value to $x_i^\xi = 1$; otherwise keep its original value. The top row of Figure 12 illustrates the result of this assignment for the first 5 patterns. Presenting all 10 patterns corrupted in this manner to the memory $M_{XX}$ resulted in perfect recall. However, presenting $W_{XX}$ with these dilatively corrupted patterns resulted in complete recall failure.

Using the same methodology but changing $x_i^\xi = 1$ into $x_i^\xi = 0$ corresponds to erosive noise (or loss of data). The top row of Figure 13 shows 5 ($\xi = 6, \ldots, 10$) of the 10 corrupted patterns $x^\xi$. Presenting all 10 of the corrupted patterns to the memory $W_{XX}$ results in perfect recall, while $M_{XX}$ is unable to recall any one of the corrupted patterns.
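The row-scan conversion and the two corruption procedures described above can be sketched as follows. This is an illustration under assumptions: the random $18 \times 18$ array is only a stand-in for one of the Boolean images of Figure 11, the function names and the fixed seed are hypothetical, and no claim about recall is made here.

```python
import numpy as np

def row_scan(p):
    """Flatten an m x n image row by row, so that (with 1-based indices)
    x[n*(i-1)+j] = p(i, j)."""
    return np.asarray(p).reshape(-1)

def corrupt_binary(x, fraction, kind, rng):
    """Select a fraction of the coordinates at random. Dilation sets the
    selected 0-entries to 1; erosion sets the selected 1-entries to 0.
    Unselected coordinates keep their original values."""
    x = np.asarray(x).copy()
    count = int(round(fraction * x.size))
    chosen = rng.choice(x.size, size=count, replace=False)
    x[chosen] = 1 if kind == "dilate" else 0
    return x

rng = np.random.default_rng(0)           # fixed seed, illustrative only
p = rng.integers(0, 2, size=(18, 18))    # stand-in for an 18 x 18 Boolean image
x = row_scan(p)                          # 324-dimensional pattern vector
x_dilated = corrupt_binary(x, 0.13, "dilate", rng)
x_eroded = corrupt_binary(x, 0.13, "erode", rng)
```

By construction, `x_dilated >= x` and `x_eroded <= x` hold coordinatewise, which matches the definitions of dilative and erosive change given at the start of this section.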
FIGURE 11. The 10 image patterns $p^\xi$. The top row shows the first 5 patterns, $\xi = 1, \ldots, 5$, going from left to right. The bottom row represents the remaining 5 patterns, $\xi = 6, \ldots, 10$, again starting with the left-most pattern and going to the right.

FIGURE 12. The top row shows the first 5 of the 10 image patterns corrupted by additive (dilative) noise. The bottom row shows the output of the memory $M_{XX}$ when presented with the corresponding corrupted patterns shown in the top row.

FIGURE 13. The top row shows the last 5 patterns, $\xi = 6, \ldots, 10$, of the 10 patterns corrupted using erosive (or subtractive) noise. The bottom row shows the output of the memory $W_{XX}$ when presented with the corresponding patterns in the top row.
Another experiment consisted of nonrandom removal of data. Here we simply changed a set of nonzero coordinate values of our choosing into zero values. Figure 14 serves as an example of such nonrandom changes of the pattern representing the letter “X” ($\xi = 5$). In this case, even with 75% of the pattern missing, the memory $W_{XX}$ provides perfect recall as shown in the figure.

FIGURE 14. The top row shows the image pattern “X” ($\xi = 5$) corrupted by three types of nonrandom erosive noise. The bottom row shows the output of the memory $W_{XX}$ when presented with the corresponding pattern in the top row.

FIGURE 15. The top row shows the image pattern “X” ($\xi = 5$) corrupted by three types of nonrandom dilative noise. The bottom row shows the output of the memory $M_{XX}$ when presented with the corresponding pattern in the top row.

In most practical pattern recognition problems, patterns generally consist of positive real values although, occasionally, negative values are also used. In these problems, each vector coordinate represents some measured pattern feature value of an object such as height, roundness, smoothness, age, etc., expressed by a numerical value within a range of possible values for that particular feature. Thus, it is important to remember that patterns are vectors and, generally, not images. Hence, image processing-related techniques and representations such as rotations, translation, scaling, and smoothing have no direct meaning within this particular framework. To avoid misleading the reader we need to point out that pattern recognition applied to whole images is also an active area of research, in particular in such endeavors as face recognition. However, in this article we restrict our discussion to vectorial pattern recognition, with each vector coordinate representing a particular pattern feature as described above. The reason we are using images is for illustration purposes. Visual examples often convey a better understanding and aid in illuminating the strengths and weaknesses of lattice-based associative memories.

With this in mind, we will now take a look at the behavior of $W_{XX}$ and $M_{XX}$ when confronted with real-valued noisy patterns. For a visual example consider the three image patterns $p^1$, $p^2$, and $p^3$ shown in the top row of Figure 16. Each $p^\xi$ is a 256-gray scale $50 \times 50$ pixel image. Using the row-scan method, each pattern image $p^\xi$ was converted into its corresponding vector format $x^\xi = (x_1^\xi, \ldots, x_{2500}^\xi)$. Corrupting each pattern $x^\xi$ with 50% of randomly generated erosive and dilative noise within the range of $[0, 255]$ results in almost perfect recall when using the memories $W_{XX}$ and $M_{XX}$, respectively. In this experiment the normalized mean square error (NMSE) satisfies $\mathrm{NMSE} < 10^{-3}$, where the NMSE is computed for each $\xi$ as $\sum_j (\tilde{x}_j^\xi - x_j^\xi)^2 / \sum_j (x_j^\xi)^2$. In the three cases the NMSE was close to zero but not equal to zero and visually not detectable. The results of this experiment are shown in Figures 16 and 17. The small nonzero error is due to the set of fixed points of $W_{XX}$ and the location of $x^\xi$ in $\mathbb{R}^{2500}$ relative to $F(X)$, an issue to be discussed in Example 12.1.

FIGURE 16. The top row shows the three original patterns; the second row shows these patterns corrupted by 50% erosive noise in the range $[0, 255]$. The third row is the output of the memory $W_{XX}$ when presented with the corrupted patterns, while the bottom row shows the output of the memory $M_{XX}$ when presented with the erosively corrupted patterns.

FIGURE 17. The top row shows the three original patterns; the second row shows these patterns corrupted by 50% dilative noise in the range $[0, 255]$. The third row is the output of the memory $W_{XX}$ when presented with the corrupted patterns, while the bottom row shows the output of the memory $M_{XX}$ when presented with the corrupted patterns.

The robustness of $W_{XX}$ and $M_{XX}$ in the presence of erosive and dilative noise, respectively, is a consequence of the next theorem, which was proven in Ritter et al. (1998).

Theorem 12.1. Suppose that $\tilde{x}^\gamma$ denotes an eroded version of $x^\gamma$. The equation $W_{XX} \vee \tilde{x}^\gamma = x^\gamma$ holds if and only if for each row index $i \in \{1, \ldots, n\}$ there exists a column index $j_i \in \{1, \ldots, n\}$ such that

$$\tilde{x}_{j_i}^\gamma = x_{j_i}^\gamma \vee \bigvee_{\xi \neq \gamma} \left( x_i^\gamma - x_i^\xi + x_{j_i}^\xi \right). \tag{65}$$

Similarly, if $\tilde{x}^\gamma$ denotes a dilated version of $x^\gamma$, then the equation $M_{XX} \wedge \tilde{x}^\gamma = x^\gamma$ holds if and only if for each row index $i \in \{1, \ldots, n\}$ there exists a column index $j_i \in \{1, \ldots, n\}$ such that

$$\tilde{x}_{j_i}^\gamma = x_{j_i}^\gamma \wedge \bigwedge_{\xi \neq \gamma} \left( x_i^\gamma - x_i^\xi + x_{j_i}^\xi \right). \tag{66}$$
Although this theorem provides necessary and sufficient conditions for the bounds of corruption of the pattern $x^\gamma$ that guarantee perfect recall, it also implies that $W_{XX}$ will fail miserably if dilative noise not satisfying these bounds is present. The third row of Figure 17 provides a good example of such catastrophic failure. Our experiments have shown that insertion of only minute amounts of dilative noise often results in complete recall failure. Similar comments hold for the memory $M_{XX}$ and erosive noise, and the bottom row of Figure 16 illustrates the recall failure for $M_{XX}$ in the presence of erosive noise. Hence, neither memory $W_{XX}$ nor $M_{XX}$ is useful in the presence of random noise, which, generally, consists of both erosive as well as dilative noise. Figure 18 illustrates this failure when the memories $W_{XX}$ and $M_{XX}$ are presented with the randomly corrupted image patterns $x^1$, $x^2$, and $x^3$.

FIGURE 18. The top row shows the three original patterns; the second row shows these patterns corrupted by 50% random noise in the range $[-255, 255]$. The third row is the output of the memory $W_{XX}$ when presented with the corrupted patterns, while the bottom row shows the output of the memory $M_{XX}$ when presented with the corrupted patterns.

The following simple example serves to illuminate the inherent weaknesses of the memories $W_{XX}$ and $M_{XX}$.

Example 12.1. Let $x^1 = (3, 2)$, $x^2 = (5.5, 0.5)$, $x^3 = (6, 4.5)$, and $x^4 = (8, 3)$. If $X = \{x^1, x^2, x^3, x^4\}$, then $F(X) = F(\{x^1, x^2\}) = F(\{x^1, x^4\})$ (see Figure 19). Next let $\tilde{x}^1 = (0.5, 2)$ and $x = (1, 1.5)$ represent two distorted versions of $x^1$ containing only erosive noise, $\tilde{x}^2 = (5.8, 1)$ a (dilative) distorted version of $x^2$, $\tilde{x}^3 = (3, 4.5)$ an eroded version of $x^3$, and $\tilde{x}^4 = (8.5, 2)$ a randomly distorted version of $x^4$ containing both dilative and erosive noise. Figure 19 shows the orbits of the distorted patterns under the action of the transform

$$W_{XX} = \begin{pmatrix} 0 & 1 \\ -5 & 0 \end{pmatrix}.$$

Specifically, $W_{XX} \vee \tilde{x}^1 = x^1$ while $W_{XX} \vee x = (2.5, 1.5) < x^1$. Thus, an eroded version of a pattern $x^\xi$ may not be mapped to $x^\xi$ even though $W_{XX}$ seems to be robust in the presence of erosive noise. This explains the small NMSE previously mentioned in reference to Figure 16.
FIGURE 19. The orbits of points when using the transform $W_{XX}$: The set of fixed points of $W_{XX}$ is indicated by the shaded area, while the points at the tails of the arrows are transformed under the action of $W_{XX}$ to the points at the tips of the arrows.

Since $\tilde{x}^2 \in F(X)$, $W_{XX} \vee \tilde{x}^2 = \tilde{x}^2 = M_{XX} \wedge \tilde{x}^2$. Hence, even a minutely distorted version $\tilde{x}^2$ of $x^2$ will not be recovered by these lattice matrix memories whenever $\tilde{x}^2 \in F(X)$. The point $x^3$ is in the interior of $F(X)$. Thus no distorted version $\tilde{x}^3$ of $x^3$, no matter how small the distortion, will ever be correctly recalled. Figure 19 indicates this for the distorted version $\tilde{x}^3$. In this particular example, $W_{XX} \vee \tilde{x}^3 = (5.5, 4.5) \le x^3$ while $M_{XX} \wedge \tilde{x}^3 = x^1$. On the other hand, $W_{XX} \vee \tilde{x}^4 = (8.5, 3.5) > x^4$ while $M_{XX} \wedge \tilde{x}^4 = (7, 2) < x^4$. This illustrates the fact that the coordinate values of a point $x \notin F(X)$ increase under the application of $W_{XX}$ and decrease under the application of $M_{XX}$. This explains the intensity changes in Figures 16, 17, and 18, where the light images correspond to high intensity values and the dark images to low intensity values.

This simple example clearly demonstrates that the recall capabilities of the memories $W_{XX}$ and $M_{XX}$ are actually very poor in the presence of any type of noise, be it erosive, dilative, or random. It is, therefore, somewhat surprising to observe the robustness displayed by $W_{XX}$ in the presence of erosive noise (Figures 13, 14, and 16) and that of $M_{XX}$ in the presence of dilative noise (Figures 12, 15, and 17). The explanation for this phenomenon is actually fairly simple. Note that in Example 12.1 the transform $W_{XX}$ will map any point lying on the infinite line segments $L_1 = \{(x, 2) : x \le 3\}$, $L_2 = \{(5.5, y) : y \le 0.5\}$, and $L_4 = \{(8, y) : y \le 3\}$ to the point $x^1$, $x^2$, and $x^4$, respectively, while any point $x \notin L_\xi$ will not be mapped to $x^\xi$. Similarly, the points on the line segments $L_1' = \{(3, y) : y \ge 2\}$, $L_2' = \{(x, 0.5) : x \ge 5.5\}$, and $L_4' = \{(x, 3) : x \ge 8\}$ will be mapped by $M_{XX}$ to the points $x^1$, $x^2$, and $x^4$, respectively. This means that if a pattern point $x^\xi = (x_1^\xi, x_2^\xi)$ lies on the boundary of $F(X)$, then only one of its coordinates can be eroded (or dilated) for $W_{XX}$ (or $M_{XX}$) to be able to recover $x^\xi$ from the eroded (dilated) version. Furthermore, the eroded or dilated version cannot be an element of $F(X)$.

In contrast to the two-dimensional vector patterns in Example 12.1, in Figure 16 each vector pattern $x^\xi$ is 2500-dimensional. In analogy to the line segments $L_\xi \subset \mathbb{R}^2$ in Example 12.1, we now have infinite hyperplane sections of dimension 2499, which allow for far more vector coordinate distortions. But the problems encountered in the two-dimensional example also exist in the higher-dimensional spaces. If a distorted version is not a point in the correct hyperplane section or lies in the fixed point set generated by the exemplar patterns, then recovery cannot be achieved when using the lattice matrix memories. Furthermore, correct recall of randomly distorted versions of the exemplar patterns is generally not achievable by these memories.
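The numbers quoted in Example 12.1 can be reproduced with a few lines of code. The sketch assumes the entrywise definitions $w_{ij} = \bigwedge_\xi (x_i^\xi - x_j^\xi)$ and $m_{ij} = \bigvee_\xi (x_i^\xi - x_j^\xi)$ for the two memories, together with the max-plus and min-plus products for $\vee$ and $\wedge$; these definitions are not restated in this section, but they do yield the matrix $W_{XX}$ given in the example. All function names are illustrative.

```python
import numpy as np

def w_memory(X):
    """W[i, j] = min over patterns of (x_i - x_j); one pattern per row of X."""
    X = np.asarray(X, dtype=float)
    return (X[:, :, None] - X[:, None, :]).min(axis=0)

def m_memory(X):
    """M[i, j] = max over patterns of (x_i - x_j)."""
    X = np.asarray(X, dtype=float)
    return (X[:, :, None] - X[:, None, :]).max(axis=0)

def max_product(W, x):
    """(W v x)_i = max_j (w[i, j] + x[j])."""
    return (W + np.asarray(x, dtype=float)[None, :]).max(axis=1)

def min_product(M, x):
    """(M ^ x)_i = min_j (m[i, j] + x[j])."""
    return (M + np.asarray(x, dtype=float)[None, :]).min(axis=1)

# Exemplar patterns x^1, ..., x^4 of Example 12.1.
X = np.array([[3, 2], [5.5, 0.5], [6, 4.5], [8, 3]])
W, M = w_memory(X), m_memory(X)

# Distorted inputs taken from the example.
noisy = {
    "eroded x1 (0.5, 2)": (0.5, 2),
    "eroded x1 (1, 1.5)": (1, 1.5),
    "dilated x2 (5.8, 1)": (5.8, 1),
    "eroded x3 (3, 4.5)": (3, 4.5),
    "random x4 (8.5, 2)": (8.5, 2),
}
for name, x in noisy.items():
    print(name, max_product(W, x), min_product(M, x))
```

Only the first distorted input is mapped back to its exemplar by $W_{XX}$; the remaining outputs agree with the values stated in the example.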
XIII. KERNEL VECTORS

To solve the problem of randomly distorted input patterns, Ritter et al. (1998, 1999, 2003b) proposed a method based on the notion of kernel vectors. The kernel method does not apply directly to Example 12.1 since the requirement for the existence of kernel vectors is not met by the set $X = \{x^1, x^2, x^3, x^4\}$ given in the example. However, the concept of kernel vectors can be generalized to apply to any set of pattern vectors $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$. This generalization also exposes the inherent weakness of the kernel method.

The simplest generalization of the original notion of kernel vectors is as follows. Let $m = \bigwedge_{\xi=1}^{k} x^\xi$. For each coordinate $i = 1, \ldots, n$, choose an erosive noise bound $n_i \le m_i$ (in certain cases it may be desirable to set $n_i = -\infty$). The vector $n = (n_1, \ldots, n_n)$ is called an erosive bound for $X = \{x^1, \ldots, x^k\}$. Next, let $Y = \{y^1, \ldots, y^\ell\} \subset \mathbb{R}^n$ be any set of strongly lattice independent vectors satisfying the max dominant condition such that $F(Y) = F(X)$. Additionally, assume that $n \le y^\lambda$ $\forall \lambda = 1, \ldots, \ell$. This latter condition can be easily achieved by a translation $a_\lambda + y^\lambda$ for an appropriately chosen $a_\lambda \in \mathbb{R}$. The equation $y^\lambda_{j_\lambda} - y^\lambda_i = (a_\lambda + y^\lambda_{j_\lambda}) - (a_\lambda + y^\lambda_i)$ implies that strong lattice independence is preserved under these translations. A set of kernel vectors $Z = \{z^1, \ldots, z^\ell\}$ for $X$ is defined by setting

$$z^\lambda_i = \begin{cases} y^\lambda_{j_\lambda} & \text{if } i = j_\lambda \\ n_i & \text{if } i \ne j_\lambda. \end{cases} \tag{67}$$

As a consequence of this definition we obtain the following two properties:

$$z^\lambda \wedge z^\gamma = n \quad \forall \lambda \ne \gamma \tag{68}$$

$$W_{YY} \vee z^\lambda = y^\lambda \quad \forall \lambda = 1, \ldots, \ell. \tag{69}$$
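The construction in Eq. (67) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample vectors are those of the two-dimensional example revisited later in this section, and the zero-based dominant indices passed in `dominant` are an assumption inferred from the kernel vectors quoted there.

```python
def kernel_vectors(Y, dominant, n):
    """Eq. (67): keep y[j] at the dominant index j, use the erosive bound n elsewhere."""
    return [[y[i] if i == j else n[i] for i in range(len(n))]
            for y, j in zip(Y, dominant)]

def vec_min(a, b):
    """Coordinate-wise minimum (the lattice meet of two vectors)."""
    return [min(p, q) for p, q in zip(a, b)]

Y = [[3.0, 2.0], [5.5, 0.5]]   # y^1 = x^1, y^2 = x^2
dominant = [1, 0]              # assumed zero-based indices j_1, j_2
n = [-2.0, -1.0]               # erosive bound

Z = kernel_vectors(Y, dominant, n)
print(Z)                       # [[-2.0, 2.0], [5.5, -1.0]]
print(vec_min(Z[0], Z[1]))     # [-2.0, -1.0], i.e. the bound n as in Eq. (68)
```

The second print statement checks Eq. (68) for this pair: the meet of two distinct kernel vectors is the erosive bound itself.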
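Equation (68) follows from the definition of the vectors of the set $Z$ and Theorem 11.3. Note also that since $F(X) = F(Y)$, Eq. (69) is equivalent to $W_{XX} \vee z^\lambda = y^\lambda$ $\forall \lambda = 1, \ldots, \ell$. To prove that Eq. (69) holds, we show that given $\gamma \in \{1, \ldots, \ell\}$, there exists a $j \in \{1, \ldots, n\}$ such that

$$z^\gamma_j = y^\gamma_j \vee \bigvee_{\xi \ne \gamma} \left( y^\gamma_i - y^\xi_i + y^\xi_j \right) \tag{70}$$

for every $i = 1, \ldots, n$. Equation (69) then follows from Theorem 12.1. To prove Eq. (70), let $j = j_\gamma$. Then

$$\begin{aligned}
y^\gamma_j \vee \bigvee_{\xi \ne \gamma} \left( y^\gamma_i - y^\xi_i + y^\xi_j \right)
&= \bigvee_{\xi=1}^{k} \left( y^\gamma_i - y^\xi_i + y^\xi_j \right)
= y^\gamma_i + \bigvee_{\xi=1}^{k} \left( y^\xi_j - y^\xi_i \right) \\
&= y^\gamma_i - \bigwedge_{\xi=1}^{k} \left( y^\xi_i - y^\xi_j \right)
= y^\gamma_i - \left( y^\gamma_i - y^\gamma_j \right) = y^\gamma_j = z^\gamma_j .
\end{aligned}$$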
The set of vectors $Z = \{z^1, \ldots, z^\ell\}$ defined by Eq. (67) is called a set of kernel vectors for $X$ bounded by $n$. If $X = Y$, $0 \le \bigwedge_{\xi=1}^{k} x^\xi$, and $n = 0$, then $Z$ corresponds to the notion of a kernel for $X$ as proposed by Ritter et al. (2003b). Thus, the definition of kernel vectors given here generalizes the notion of kernel vectors defined by Ritter et al. (2003b). Kernel vectors provide for a new mini-max autoassociative memory $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by

$$T(x) = W_{XX} \vee (M_{ZZ} \wedge x) \quad \forall x \in \mathbb{R}^n. \tag{71}$$
Recall of randomly corrupted patterns or patterns with missing data can often be achieved by $T$ if $n$ is large. Employing our previous image patterns and using an erosive noise bound $n = 0$, we selected kernel vectors for boolean images as shown in Figures 20 and 21, and for gray-valued images as shown in Figure 22. The reason for choosing $n = 0$ is that image values are nonnegative. Hence, $0 \le \bigwedge_{\xi=1}^{k} x^\xi$ and random noise in images is also nonnegative. However, in many pattern recognition tasks some feature values (i.e., coordinate values of pattern vectors) may be negative. Examples of pattern features that may have negative values are slopes, temperatures, gradients, and curvature. In these cases, the erosive noise bound $n \ne 0$. The results portrayed by the images recalled in Figure 22 are somewhat misleading. Although to the human eye they appear to be identical to the exemplar pattern images, simple image subtraction proves otherwise. The recalled third image consisting of strawberries differs from the exemplar in
FIGURE 20. Recall of randomly corrupted boolean patterns. The top row shows the original exemplar patterns while the second row shows the image representation of the corresponding kernel vectors. The location of the nonzero coordinate of each kernel vector is indicated by the black pixel. The third row shows the exemplar patterns corrupted with 20% of random noise. The bottom row represents the output of the memory T when presented with the corresponding corrupted patterns in row three.
FIGURE 21. Recall of randomly corrupted boolean patterns. The top row shows the original exemplar patterns while the second row shows the image representation of the corresponding kernel vectors. The location of the nonzero coordinate of each kernel vector is indicated by the black pixel. The third row shows the exemplar patterns corrupted with 20% of random noise. The bottom row represents the output of the memory T when presented with the corresponding corrupted patterns in row three.
FIGURE 22. The top row shows the three original patterns; the second row shows the location of the nonzero valued kernel entries with the pixel size magnified for better visibility. The third row shows the exemplar patterns corrupted by 50% random noise in the range [−255, 255]. The bottom row shows the output of the memory T when presented with the corrupted patterns.
several locations. This is due to the fact that the corrupted image pattern was not located in the correct section of the hyperplane $E_r(e^{j_3})$, where $r = x^3_{j_3}$.

To obtain a better understanding of the behavior of $T$, we revisit Example 12.1. If $n = (-2, -1)$ and $Y = \{x^1, x^2\}$, then $Z = \{z^1, z^2\}$, where $z^1 = (-2, 2)$ and $z^2 = (5.5, -1)$. The set of fixed points $F(X)$ becomes a subset of $F(Z)$, the set of fixed points of $M_{ZZ}$, as illustrated in Figure 23. Although generally $T(x) \ne W_{XX} \vee x$, $T$ and $W_{XX}$ share the same set of fixed points and $T(x) = W_{XX} \vee x$ $\forall x \in F(Z)$. In particular, the points $\tilde{x}^1$, $x$, $\tilde{x}^2$, $\tilde{x}^3$, and $\tilde{x}^4$ remain fixed under the action of $M_{ZZ}$, but get mapped by $T$ to the same points as those in Example 12.1 under the action of $W_{XX}$. On the other hand, the points $y = (-2, 4.5)$ and $u = (8, -1)$ get mapped by $M_{ZZ}$ to $z^1$ and $z^2$, respectively. Thus, $T(y) = x^1$ and $T(u) = x^2$. If $y$ and $u$ are corrupted versions of $x^1$ and $x^2$, respectively, containing both erosive and dilative noise, then $T$ recovers $x^1$ and $x^2$ from these corrupted versions. In dimension $n > 2$, recall of randomly corrupted patterns or patterns with missing data can often be achieved by $T$ since $\mathbb{R}^n \setminus F(X)$ is connected and the various line segments of orbits of points shown in Figure 23 are replaced by hyperplane sections.
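The two-dimensional example can be replayed numerically. The sketch below assumes the usual definitions from the earlier sections of this chapter, namely that the entries of $W_{XX}$ are $\bigwedge_\xi (x^\xi_i - x^\xi_j)$, the entries of $M_{ZZ}$ are $\bigvee_\lambda (z^\lambda_i - z^\lambda_j)$, and that the products in Eq. (71) are the max-plus and min-plus matrix products. All function names are hypothetical.

```python
def w_matrix(X):
    """Min-memory: w_ij = min over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]

def m_matrix(X):
    """Max-memory: m_ij = max over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[max(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]

def max_product(A, x):
    """(A v x)_i = max_j (a_ij + x_j)."""
    return [max(a + v for a, v in zip(row, x)) for row in A]

def min_product(A, x):
    """(A ^ x)_i = min_j (a_ij + x_j)."""
    return [min(a + v for a, v in zip(row, x)) for row in A]

def make_T(X, Z):
    """Mini-max memory of Eq. (71): T(x) = W_XX v (M_ZZ ^ x)."""
    W, M = w_matrix(X), m_matrix(Z)
    return lambda x: max_product(W, min_product(M, x))

X = [[3.0, 2.0], [5.5, 0.5], [6.0, 4.5], [8.0, 3.0]]   # x^1 .. x^4
Z = [[-2.0, 2.0], [5.5, -1.0]]                         # kernel vectors z^1, z^2
T = make_T(X, Z)

print(T([-2.0, 4.5]))   # [3.0, 2.0]  -> x^1
print(T([8.0, -1.0]))   # [5.5, 0.5]  -> x^2
```

Both corrupted points are first pulled onto a kernel vector by $M_{ZZ}$ and then completed to the stored exemplar by $W_{XX}$, which is the two-stage behavior described in the text.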
FIGURE 23. The orbits of points when using the transform T. Specifically, $T(y) = T(z^1) = T(\tilde{x}^2) = x^1$ and $T(u) = T(z^2) = x^2$. The shaded area represents F(Z), which includes the slightly darker shaded area representing F(X).
Another problem, however, exists. The width of the set $F(Z)$, defined as the diameter of the set $E(e) \cap F(Z)$, depends not only on $n$, the erosive noise bound, but also on the choice of the strongly lattice independent set $Y$. The fixed point set $F(Z)$, which depends on the choice of $Y$, plays a critical role in the recall performance of the mini-max memory $T$. For example, the point $u = (8, -1)$ in the preceding example resembles an eroded version of $x^4 = (8, 3)$ more than a randomly distorted version of $x^2 = (5.5, 0.5)$. Yet, $T(u) = x^2$. If we had chosen $Y = \{y^1, y^2\}$ with $y^1 = x^1$ and $y^2 = x^4$ instead, then the width of $F(Z)$ would be larger than in the previous case, as shown in Figure 24. Furthermore, we would now have $z^2 = u$ and $T(z^2) = x^4$, as desired. Of course, the point $y = (-2, 4.5)$ also resembles an eroded version of $x^3 = (6, 4.5)$ more than it does a randomly corrupted version of $x^1 = (3, 2)$, but $T(y) = x^1$. However, since $\tilde{x}^4 \in F(Z)$ and $x^4_1 \ne \tilde{x}^4_1$, we have
FIGURE 24. The orbits of points when using the transforms T. Here $T(v) = T(z^2) = x^4$, while $T(y) = T(z^1) = T(\tilde{x}^1) = x^1$.
$T(\tilde{x}^4) \ne x^4$ in this as well as the preceding case. In fact, $\tilde{x}^4 = \alpha + x^4$ with $\alpha > 0$ is just a slightly brighter version (uniformly increased values) of $x^4$. This is similar to the example of the recall of the strawberry image from its corrupted version discussed earlier.

To make the width of $F(Z)$ independent of the choice of the strongly lattice independent set $Y$, we need a more general definition of kernel vectors than the one specified by Eq. (67). In the general definition, stated below, the user specifies a desirable erosive tolerance level $\varepsilon_i \ge 0$ for each coordinate $i = 1, \ldots, n$. If $Y = \{y^1, \ldots, y^\ell\}$ is a set of strongly lattice independent vectors such that $F(Y) = F(X)$, then the set of kernel vectors $Z = \{z^1, \ldots, z^\ell\}$ is defined by

$$z^\lambda_i = \begin{cases} y^\lambda_{j_\lambda} & \text{if } i = j_\lambda \\ y^\lambda_i - \varepsilon_i & \text{if } i \ne j_\lambda, \end{cases} \tag{72}$$

where $\lambda = 1, \ldots, \ell$ and $i = 1, \ldots, n$. Using the same argument as before shows that Eq. (69) is satisfied by this more general concept of kernel vectors. Moreover, the width of $F(Z)$ now depends only on the width of $F(X)$ and the choice of the erosive tolerance $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$.

Returning to our previous example and choosing $y^1 = x^1$, $y^2 = x^2$, and $\varepsilon = (7.5, 4)$, we obtain $z^1 = (-4.5, 2)$ and $z^2 = (5.5, -3.5)$. If instead we had chosen $y^1 = x^1$ and $y^2 = x^4$, then Eq. (72) gives $z^1 = (-4.5, 2)$ and $z^2 = (8, -1)$. In either case, $F(Z)$ is the band $-6.5 \le x_1 - x_2 \le 9$, so $E(e) \cap F(Z)$ remains the same, namely $\operatorname{diameter}[E(e) \cap F(Z)] = 7.75 \times \sqrt{2}$. This can be easily verified by choosing the line $E(e) = \{(x, y) : y = 7 - x\}$, which is perpendicular to $F(Z)$ and passes through the point $u$. For either set $Z$, we now have $T(y) = T(\tilde{x}^2) = (5.5, 4.5)$ and $T(u) = x^4$.

In any of the above variations of defining kernel vectors, it is necessary to have a set of strongly lattice independent vectors $Y$ such that $F(Y) = F(X)$ or, equivalently, $W_{XX} = W_{YY}$. Thus the question remains of how $Y$ is determined given an arbitrary set of pattern vectors $X$. The following example suggests an answer to this question.

Example 13.1.
(1) Let X = {x1 , x2 , x3 , x4 } ⊂ R3 , where x1 = (11, 4, 1) , x2 = (8.5, 7.5, 2) , x3 = (7, −2.5, −1) , and x4 = (8, 5.5, 4) . In Example 10.3 we observed that X is lattice independent. Since dim F (X) = 3, it follows from Corollary 11.2 that X is not strongly lattice independent. However, in view of Example 11.1, the extreme points v1 = (10, 0.5, 0) , v2 = (6.5, 5.5, 0) , and v3 = (4, −1.5, 0) , with location as indicated in Figure 10, are strongly lattice independent and WV V = WXX , where V = {v1 , v2 , v3 }. The following is a simple and fast algebraic method of obtaining these extreme points. Let W = {w1 , w2 , w3 } ⊂ R3 be the set of
column vectors of the matrix

$$W_{XX} = \begin{pmatrix} 0 & 1 & 4 \\ -9.5 & 0 & -1.5 \\ -10 & -5.5 & 0 \end{pmatrix}.$$

Hence, $w^j_i = w_{ij}$. A quick check shows that

$$w^k_j - w^k_i \le w^j_j - w^j_i \quad \text{for } i, k \in \{1, 2, 3\}.$$
This shows that the elements of $W$ satisfy inequality (57). Furthermore, $w^j(3) = v^j$ for $j = 1$, 2, and 3. It follows that the vectors of $W$ are extreme points of $F(X)$ and $F(W) = F(X)$. As the next example shows, this need not always be the case.

(2) Let $X = \{x^1, x^2\} \subset \mathbb{R}^4$, where $x^1 = (2, 3, 4, 1)$ and $x^2 = (4, 3, 4.5, 2)$ are as in Example 10.2. In this case we have

$$W_{XX} = \begin{pmatrix} 0 & -1 & -2 & 1 \\ -1 & 0 & -1.5 & 1 \\ 0.5 & 1 & 0 & 2.5 \\ -2 & -2 & -3 & 0 \end{pmatrix}$$

and $W \subset \mathbb{R}^4$ consists of four vectors. These vectors satisfy inequality (57) but are not lattice independent and, hence, not strongly lattice independent. However, the set $V = \{w^1, w^2\}$ is strongly lattice independent and $3 + w^2 = x^1$, while $4 + w^1 = x^2$. Hence, $W_{VV} = W_{XX}$.

The method of deriving the set of strongly lattice independent vectors from the matrix $W_{XX}$ in the above example provided the key ingredient in proving the next theorem (Ritter et al., 2006).

Theorem 13.1. Let $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ and let $W \subset \mathbb{R}^n$ be the set of vectors consisting of the columns of the matrix $W_{XX}$. If $V \subset W$ denotes the smallest set of vectors with the property that $W_{VV} = W_{XX}$, then the set $V$ is strongly lattice independent.

We derived an effective and computationally efficient algorithm for obtaining $V$ from $W_{XX}$. This algorithm is vital for deriving the kernel vectors discussed earlier in this section. Furthermore, since by Theorem 11.4 the set $V$ is affinely independent, this method of determining $V$ provides for a computationally effective method of computing endmembers in hyperspectral imagery (Ritter et al., 2006).
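Part (1) of Example 13.1 can be checked with a short script. The sketch below is not the efficient algorithm referred to above, which is not described in this section; it is a brute-force search over subsets of columns, shown only to illustrate the statement of Theorem 13.1. It assumes that the entries of $W_{XX}$ are $\bigwedge_\xi (x^\xi_i - x^\xi_j)$, and all function names are hypothetical.

```python
from itertools import combinations

def w_matrix(X):
    """Min-memory: w_ij = min over patterns of (x_i - x_j)."""
    n = len(X[0])
    return [[min(x[i] - x[j] for x in X) for j in range(n)] for i in range(n)]

def columns(W):
    """Return the columns of a square matrix as a list of vectors."""
    return [[row[j] for row in W] for j in range(len(W))]

def smallest_generating_subset(W):
    """Brute force: smallest set V of columns of W with W_VV == W."""
    cols = columns(W)
    for size in range(1, len(cols) + 1):
        for subset in combinations(cols, size):
            if w_matrix(list(subset)) == W:
                return list(subset)

X = [[11.0, 4.0, 1.0], [8.5, 7.5, 2.0], [7.0, -2.5, -1.0], [8.0, 5.5, 4.0]]
W = w_matrix(X)
V = smallest_generating_subset(W)

# Translate each selected column so that its last coordinate becomes 0.
shifted = [[c - v[-1] for c in v] for v in V]
print(W)        # [[0.0, 1.0, 4.0], [-9.5, 0.0, -1.5], [-10.0, -5.5, 0.0]]
print(shifted)  # [[10.0, 0.5, 0.0], [6.5, 5.5, 0.0], [4.0, -1.5, 0.0]]
```

For this data all three columns are needed, and after translation they coincide with the extreme points $v^1$, $v^2$, $v^3$ quoted in the example. The search is exponential in $n$ and is meant for small illustrations only.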
XIV. ASSOCIATIVE MEMORIES BASED ON DENDRITIC COMPUTING

From our discussion in the preceding section it is evident that exact recall of randomly corrupted patterns is generally not achievable. Major obstacles are exemplar patterns or corrupted patterns lying in the fixed point set of the matrix $W_{XX}$ and corrupted patterns $\tilde{x}^\xi$ that do not lie in the correct hyperplane sections $E$ that have the property that $T(E) = x^\xi$. Thus, a corrupted version $\tilde{x}^\xi$ of $x^\xi$ may be arbitrarily close in terms of distance to its exemplar $x^\xi$ and yet correct recall will not be achievable by the matrix memories $W_{XX}$, $M_{XX}$, or $T$. To overcome this problem and to investigate applications of neural networks based on dendritic computing, a new associative memory was proposed that was inspired by the morphology of cortical neurons (Ritter and Urcid, 2003; Ritter et al., 2004).

The various ANNs that are currently in vogue, such as radial basis function neural networks and support vector machines, have very little in common with actual biological neural networks. A major aim of this article is to introduce a model of an artificial neuron that bears a closer resemblance to neurons of the cerebral cortex than those found in the current literature. We will show that this model has greater computational capability and pattern discrimination power than single neurons found in current ANNs. Since our model mimics various biological processes, it will be useful to provide a brief background on the morphology of a biological neuron.

A typical neuron of the mammalian brain has two processes called, respectively, dendrites and axons. The axon is the principal fiber that forms toward its ends a multitude of branches, called the axonal tree. The tips of these branches, called nerve terminals or synaptic knobs, make contact with the dendritic structures of other neurons. These sites of contact are called synaptic sites. The synaptic sites of dendrites are the places where synapses take place.
Dendrites have many branches that create large and complicated trees and the number of synapses on a single neuron of the cortex typically ranges between 500 and 200,000. Figure 25 provides a simplified sketch of the processes of a biological neuron. It is also well known that there exist two types of synapses: excitatory synapses that play a role in exciting the postsynaptic cell to fire impulses and inhibitory synapses that try to prevent the neuron from firing impulses in response to excitatory synapses. The postsynaptic membranes of the dendrites will thus either accept or inhibit the received input from other neurons. It is worthwhile to note that dendrites make up the largest component in both surface area and volume of the brain. Part of this is due to the fact that dendrites span all cortical layers in all regions of the cerebral cortex (Eccles, 1977; Koch and Segev, 1989; Segev, 1998). Thus, when attempting
FIGURE 25. Simplified sketch of the processes of a biological neuron.
to model artificial brain networks, dendrites, which make up more than 50% of the neuron's membrane, cannot be ignored. This is especially true in light of the fact that some researchers have proposed that dendrites, and not neurons, are the elementary computing devices of the brain, capable of implementing such logical functions as AND, OR, and NOT (Eccles, 1977; Koch and Segev, 1989; Segev, 1998; Arbib, 1998; Holmes and Rall, 1992; McKenna et al., 1992; Mel, 1993; Rall and Segev, 1987; Shepherd, 1992).

To take advantage of recent advances in neurobiology and the biophysics of neural computation and to nudge the field of ANNs back to its original roots, we proposed a model of single neuron computation that takes into account the computation performed by dendrites (Ritter and Urcid, 2003). Extrapolating on this model, we constructed a single-layer, feedforward neural network based on dendritic computing within the lattice domain (Ritter et al., 2003a). In this model, a set of $n$ input neurons $N_1, \ldots, N_n$ provides information through its axonal arborization to the dendritic trees of a set of $m$ neurons $M_1, \ldots, M_m$. Explicitly, the state value of a neuron $N_i$ ($i = 1, \ldots, n$) propagates through its axonal tree all the way to the terminal branches that make contact with the neuron $M_j$ ($j = 1, \ldots, m$). The weight of an axonal branch of neuron $N_i$ terminating on the $k$th dendrite of $M_j$ is denoted by $w^\ell_{ijk}$, where the superscript $\ell \in \{0, 1\}$ distinguishes between excitatory ($\ell = 1$) and inhibitory ($\ell = 0$) input to the dendrite (see also Figure 26). The $k$th dendrite of $M_j$ will respond to the total input received from the neurons $N_1, \ldots, N_n$ and will either accept or inhibit the received input. The computation of the $k$th dendrite of $M_j$ is given by

$$\tau^j_k(x) = p_{jk} \bigwedge_{i \in I(k)} \bigwedge_{\ell \in L(i)} (-1)^{1-\ell} \left( x_i + w^\ell_{ijk} \right), \tag{73}$$
where x = (x1 , . . . , xn ) denotes the input value of the neurons N1 , . . . , Nn with xi representing the value of Ni , I (k) ⊆ {1, . . . , n} corresponds to the
FIGURE 26. Morphological perceptron with dendritic structure. Terminations of excitatory and inhibitory fibers are marked with • and ◦, respectively. $D_{jk}$ denotes dendrite $k$ of $M_j$ and $K_j$ its number of dendrites. Neuron $N_i$ can synapse $D_{jk}$ with excitatory or inhibitory fibers, e.g., weights $w^1_{1jk}$ and $w^0_{nj2}$, respectively, denote excitatory and inhibitory fibers from $N_1$ to $D_{jk}$ and from $N_n$ to $D_{j2}$.
set of all input neurons with terminal fibers that synapse on the $k$th dendrite of $M_j$, $L(i) \subseteq \{0, 1\}$ corresponds to the set of terminal fibers of $N_i$ that synapse on the $k$th dendrite of $M_j$, and $p_{jk} \in \{-1, 1\}$ denotes the excitatory ($p_{jk} = 1$) or inhibitory ($p_{jk} = -1$) response of the $k$th dendrite of $M_j$ to the received input. It follows from the formulation $L(i) \subseteq \{0, 1\}$ that the $i$th neuron $N_i$ can have at most two synapses on a given dendrite $k$. Also, if the value $\ell = 1$, then the input $(x_i + w^1_{ijk})$ is excitatory, and inhibitory for $\ell = 0$ since in this case we have $-(x_i + w^0_{ijk})$.

The value $\tau^j_k(x)$ is passed to the cell body and the state of $M_j$ is a function of the input received from all its dendrites. The total value received by $M_j$ is given by

$$\tau^j(x) = p_j \bigwedge_{k=1}^{K_j} \tau^j_k(x), \tag{74}$$

where $K_j$ denotes the total number of dendrites of $M_j$ and $p_j = \pm 1$ denotes the response of the cell body to the received dendritic input. Here again, $p_j = 1$ means that the input is accepted, while $p_j = -1$ means that the cell rejects the received input. The next state of $M_j$ is then determined by an activation function $f$, namely $y_j = f[\tau^j(x)]$. The total computation of $M_j$ is, therefore, given by

$$y_j(x) = f\left( p_j \bigwedge_{k=1}^{K_j} \left[ p_{jk} \bigwedge_{i \in I(k)} \bigwedge_{\ell \in L(i)} (-1)^{1-\ell} \left( x_i + w^\ell_{ijk} \right) \right] \right). \tag{75}$$
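Equations (73) to (75) translate almost literally into code. The following is a minimal sketch, assuming a dendrite is represented as a pair `(p_jk, synapses)` where `synapses` is a list of hypothetical `(i, ell, w)` triples; the 0/1 hard limiter and the sample weights are illustrative choices, not values from the text.

```python
def dendrite_response(x, p_jk, synapses):
    """Eq. (73): p_jk times the minimum of (-1)^(1-ell) * (x_i + w) over all synapses."""
    return p_jk * min((1 if ell == 1 else -1) * (x[i] + w) for i, ell, w in synapses)

def neuron_output(x, p_j, dendrites, f):
    """Eqs. (74)-(75): combine the dendrites by a minimum, then apply the activation f."""
    tau = p_j * min(dendrite_response(x, p, syn) for p, syn in dendrites)
    return f(tau)

def hard_limiter(z):
    """Illustrative activation: 1 if the total input is nonnegative, else 0."""
    return 1 if z >= 0 else 0

# One dendrite with an excitatory fiber (w = -1) and an inhibitory fiber (w = -3)
# from the same input neuron; together they accept the interval 1 <= x_0 <= 3.
dendrites = [(1, [(0, 1, -1.0), (0, 0, -3.0)])]

print(neuron_output([2.0], 1, dendrites, hard_limiter))  # 1 (inside the interval)
print(neuron_output([4.0], 1, dendrites, hard_limiter))  # 0 (outside the interval)
```

With one excitatory and one inhibitory fiber per input, a single dendrite carves out an interval (in higher dimensions, a box), which is the building block used by the associative memories below.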
Figure 26 provides a graphic representation of this model. It is also worth noting that at first glance Eqs. (73) and (74) appear to involve only addition and minimization. However, due to the duality $a \vee b = -(-a \wedge -b)$ and the fact that $(-1)^{1-\ell} = -1$ for $\ell = 0$ and $p_j, p_{jk} \in \{-1, 1\}$, the maximum operation is implicitly included in these equations.

A modification of the lattice-based perceptron with dendritic structure leads to a different autoassociative memory than the lattice matrix memories discussed in the preceding sections. For this new autoassociative memory we define a set of sensory (input) neurons $N_1, \ldots, N_n$ that receives input $x$ from the space $\mathbb{R}^n$, with $N_i$ receiving input $x_i$, the $i$th coordinate of $x$. If, as before, $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ represents the set of exemplar patterns, then the input neurons will propagate their input values $x_i$ to a set of $k$ hidden neurons $H_1, \ldots, H_k$, where each $H_j$ has exactly one dendrite. Every input neuron $N_i$ has exactly two axonal fibers terminating on the dendrite of $H_j$. The weights of the terminal fibers of $N_i$ terminating on the dendrite of $H_j$ are given by

$$w^\ell_{ij} = \begin{cases} -(x^j_i - \alpha) & \text{if } \ell = 1 \\ -(x^j_i + \alpha) & \text{if } \ell = 0, \end{cases} \tag{76}$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, k$. The parameter $\alpha > 0$ is a user-defined noise parameter that must satisfy the inequality $\alpha < \tfrac{1}{2} d_{\min}$, that is,

$$2\alpha < d_{\min} = \min\left\{ d(x^\xi, x^\gamma) : \xi < \gamma,\ \xi, \gamma \in \{1, \ldots, k\} \right\}, \tag{77}$$

where $d(x^\xi, x^\gamma)$ denotes the Chebyshev (checkerboard) distance between the patterns $x^\xi$ and $x^\gamma$ defined by $d(x^\xi, x^\gamma) = \max\{ |x^\xi_i - x^\gamma_i| : i = 1, \ldots, n \}$. For a given input $x \in \mathbb{R}^n$, the dendrite of the hidden unit $H_j$ computes

$$\tau^j(x) = \bigwedge_{i=1}^{n} \bigwedge_{\ell=0}^{1} (-1)^{1-\ell} \left( x_i + w^\ell_{ij} \right). \tag{78}$$

The state of the neuron $H_j$ is determined by the hard-limiter activation function

$$f(z) = \begin{cases} 0 & \text{if } z \ge 0 \\ -\infty & \text{if } z < 0. \end{cases} \tag{79}$$

The output of $H_j$ is given by $f[\tau^j(x)]$ and is passed along its axonal fibers to $n$ output neurons $M_1, \ldots, M_n$. The activation function defined by Eq. (79) is a
hard-limiter in the algebra $A = (\mathbb{R}_{-\infty}, \vee, +)$ since the zero of $A$ is $-\infty$ (for the operation $\vee$) and the unit of $A$ corresponds to 0. This mirrors the hard-limiter in the algebra $(\mathbb{R}, +, \times)$ defined by $f(z) = 0$ if $z < 0$ and $f(z) = 1$ if $z \ge 0$, since in this algebra the zero is 0 and the unit is 1.

Similar to the hidden layer neurons, each output neuron $M_h$, where $h = 1, \ldots, n$, has one dendrite. However, each hidden neuron $H_j$ has exactly one excitatory axonal fiber and no inhibitory fibers terminating on the dendrite of $M_h$. Figure 27 illustrates this dendritic network model. The excitatory fiber of $H_j$ terminating on $M_h$ has synaptic weight $v_{jh} = x^j_h$. The computation performed by $M_h$ is given by

$$\tau^h(s) = \bigvee_{j=1}^{k} (s_j + v_{jh}), \tag{80}$$

where $s_j$ denotes the output of $H_j$, namely $s_j = f[\tau^j(x)]$, with $f$ defined in Eq. (79). The activation function $g$ for each output neuron $M_h$ is the linear identity function $g(z) = z$. Each neuron $H_j$ will have the output value $s_j = 0$ if and only if $x$ is an element of the hypercube

$$B^j = \left\{ x \in \mathbb{R}^n : x^j_i - \alpha \le x_i \le x^j_i + \alpha \right\} \tag{81}$$
FIGURE 27. The topology of the dendritic model of an autoassociative memory. The network is fully connected; all axonal branches from input neurons synapse via two fibers on all hidden neurons, which in turn connect to all output nodes via excitatory fibers. Only a few connections are shown above.
FIGURE 28. The top row shows the original six pattern images; the second row shows these patterns corrupted by 75% of random noise. The third row illustrates the incorrect output of the matrix-based memory T when presented with the noisy input; the output appears to be shifted toward the high (white) pixel values. The fourth row is identical to the first row and represents the output of the dendritic memory when presented with the noisy input.
and $s_j = -\infty$ whenever $x \in \mathbb{R}^n \setminus B^j$. Thus, the output of this network will be $y = (y_1, \ldots, y_n) = (x^j_1, \ldots, x^j_n) = x^j$ if and only if $x \in B^j$. That is, whenever $x$ is a corrupted version of $x^j$ with each coordinate of $x$ not exceeding the allowable noise level $\alpha$, then $x$ will be associated with $x^j$. If the amount of noise exceeds the level $\alpha$, then the network rejects the input by yielding the output vector $(-\infty, \ldots, -\infty)$. Obviously, each uncorrupted pattern $x^\xi$ will be associated with $x^\xi$.

To illustrate the performance of this new autoassociative memory and compare it to the lattice-based matrix memories, we again use a visual example consisting of six image patterns with pixel values in the range [0, 255]. The top row of Figure 28 shows the six exemplar patterns $p^1, \ldots, p^6$. These pictorial patterns were converted into a set of (column) exemplar pattern vectors $X = \{x^1, \ldots, x^6\}$ and stored in the autoassociative memory $W_{XX}$. After extracting a set of kernel vectors $Z = \{z^1, \ldots, z^6\}$ bounded by $n = 0$ from $X$, the patterns were corrupted with 75% of random noise in the range of [−72, 72]. The third and fourth rows of Figure 28 show the recall performance of the memories $T(x) = W_{XX} \vee (M_{ZZ} \wedge x)$ and the dendritic autoassociative memory, respectively, when presented with the corrupted patterns.

A further improvement of this model is to make the noise parameter $\alpha$ dependent on each $\xi = 1, \ldots, k$ and each coordinate $i = 1, \ldots, n$. This will allow more freedom to tailor the net to a specific problem. In particular,
FIGURE 29. Boxes with variable noise parameters $\alpha_i(j)$. Each pattern in the box containing $x^j$ will be associated with $x^j$.
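the weights can be defined as follows:

$$w^\ell_{ij} = \begin{cases} -\left( x^j_i - \alpha^1_i(j) \right) & \text{if } \ell = 1 \\ -\left( x^j_i + \alpha^0_i(j) \right) & \text{if } \ell = 0. \end{cases} \tag{82}$$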
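The dendritic autoassociative memory with the variable weights of Eq. (82) can be sketched as follows. The function names, the two stored patterns, and the tolerance values are hypothetical; choosing all tolerances equal to a single $\alpha$ recovers the weights of Eq. (76).

```python
NEG_INF = float("-inf")

def hidden_state(x, pattern, alpha1, alpha0):
    """Eqs. (78)-(79) with the weights of Eq. (82): 0 if x lies in the box, else -inf."""
    tau = min(
        min(x[i] - (pattern[i] - alpha1[i]),       # ell = 1:  x_i + w^1_ij
            -(x[i] - (pattern[i] + alpha0[i])))    # ell = 0: -(x_i + w^0_ij)
        for i in range(len(x))
    )
    return 0.0 if tau >= 0 else NEG_INF

def recall(x, patterns, alpha1, alpha0):
    """Eq. (80): output h is the maximum over hidden units of s_j + x^j_h."""
    s = [hidden_state(x, p, a1, a0) for p, a1, a0 in zip(patterns, alpha1, alpha0)]
    return [max(sj + p[h] for sj, p in zip(s, patterns)) for h in range(len(x))]

patterns = [[0.0, 0.0], [10.0, 10.0]]
alpha1 = [[1.0, 1.0], [2.0, 2.0]]   # tolerated decrease per pattern and coordinate
alpha0 = [[2.0, 2.0], [1.0, 1.0]]   # tolerated increase per pattern and coordinate

print(recall([1.5, -0.5], patterns, alpha1, alpha0))  # [0.0, 0.0]
print(recall([8.5, 10.5], patterns, alpha1, alpha0))  # [10.0, 10.0]
print(recall([5.0, 5.0], patterns, alpha1, alpha0))   # [-inf, -inf] (rejected)
```

Because the two boxes here do not overlap, at most one hidden unit outputs 0 for any input, so the maximum in Eq. (80) returns either one stored pattern or the rejection vector.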
In this way, the hyperboxes are not centered at $x^\xi$, allowing for various schemes for creating large noise surrounds about a pattern. Care needs to be taken that the boxes do not overlap. Figure 29 shows an example of such generalized boxes that will associate each point in the box $B^j = \{ x \in \mathbb{R}^n : x^j_i - \alpha^1_i(j) \le x_i \le x^j_i + \alpha^0_i(j) \}$ with the pattern $x^j$. The figure also illustrates the superiority of this model over the lattice correlation memory through the elimination of problems associated with the set of fixed points of $W_{XX}$ and $M_{XX}$.

A simple modification of the dendritic autoassociative memory leads to a heteroassociative memory that takes advantage of the dendritic computing concept. The only modification appears at the weights of the axonal fibers of the hidden neurons and at the output layer. Specifically, suppose that the associated pattern sets are given by $X = \{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ and $Y = \{y^1, \ldots, y^k\} \subset \mathbb{R}^m$. In this case the topology of the network is the same as the one for the autoassociative memory shown in Figure 27 except that the output layer now consists of $m$ neurons, namely $M_1, \ldots, M_m$. As before, each output neuron $M_h$, where $h = 1, \ldots, m$, has one dendrite, and each hidden neuron $H_j$ has exactly one excitatory axonal fiber and no inhibitory fibers terminating on the dendrite of $M_h$. However, the excitatory fiber of $H_j$, terminating on $M_h$, now has synaptic weight $v_{jh} = y^j_h$. The computation performed by $M_h$ is given by Eq. (80) and the activation function $g$ for each output neuron $M_h$
FIGURE 30. The top row depicts the images $p^1$, $p^2$, $p^3$, which were converted into the prototype patterns of the set $X = \{x^1, x^2, x^3\}$, while the bottom row shows the images used to generate the corresponding association patterns from the set $Y = \{y^1, y^2, y^3\}$.
is the linear identity function $g(z) = z$. Hence, again, each neuron $H_j$ will have the output value $s_j = 0$ if and only if $x$ is an element of the hypercube $B^j$ defined by Eq. (81), and $s_j = -\infty$ whenever $x \in \mathbb{R}^n \setminus B^j$. Thus, the output of this network will be $y = (y_1, \ldots, y_m) = (y^j_1, \ldots, y^j_m) = y^j$ if and only if $x \in B^j$. That is, whenever $x$ is a corrupted version of $x^j$ with each coordinate of $x$ not exceeding the allowable noise level $\alpha$, then $x$ will be associated with $y^j$. If the amount of noise exceeds the level $\alpha$, then the network rejects the input by yielding the output vector $(-\infty, \ldots, -\infty)$. Obviously, each uncorrupted pattern $x^\xi$ will be associated with $y^\xi$.

To illustrate the performance of this associative memory, we use a visual example consisting of the associated image pairs $P = \{p^1, p^2, p^3\}$ and $Q = \{q^1, q^2, q^3\}$ shown in Figure 30. Each $p^\xi$ is a 50 × 50-pixel 256-gray scale image, whereas each $q^\xi$ is a 30 × 50-pixel 256-gray scale image, where $\xi = 1, 2, 3$. Using the standard row-scan method, each pattern image $p^\xi$ and $q^\xi$ was converted into an associative pair of pattern vectors $x^\xi = (x^\xi_1, \ldots, x^\xi_{2500})$ and $y^\xi = (y^\xi_1, \ldots, y^\xi_{1500})$. Thus, for this particular example, we have $X = \{x^1, x^2, x^3\} \subset \mathbb{R}^{2500}$ and $Y = \{y^1, y^2, y^3\} \subset \mathbb{R}^{1500}$. The images were then distorted by randomly corrupting 95% of the coordinates of each $x^\xi$ within a noise level $\alpha$ chosen to satisfy the inequality in Eq. (77). In this particular example, we set $\alpha = \tfrac{2}{5} d_{\min}$. Numerically, we have $d(x^1, x^2) = 198$, $d(x^1, x^3) = 211$, and $d(x^2, x^3) = 188$, where the range of pixel values (vector features) is [0, 255]. Hence, $d_{\min} = d(x^2, x^3) = 188$ and $\alpha = 75.2$. The top row of Figure 31 shows the images thus corrupted while the bottom row shows the perfect recall association achieved by the network. In this experiment, recall succeeds because the amount of distortion is controlled not to exceed the allowable level $\alpha$. When the noise is above $\alpha$,
FIGURE 31. The top row shows the prototype patterns $\tilde{x}^1$, $\tilde{x}^2$, $\tilde{x}^3$ corrupted with random noise within the range [−α, α] and the bottom row shows the perfect recall association achieved by the proposed model.
the memory described thus far fails to recognize the patterns and rejects them by outputting a vector $(-\infty, \ldots, -\infty)$. Failure of recall is illustrated in Figure 32, where the output vector components of $-\infty$ are conventionally depicted as white pixels.

The model can be refined to be more tolerant of random noise by using the idea of variable noise boxes discussed earlier [Eq. (82)]. Instead of choosing a single parameter $\alpha$ for all prototype patterns in the set $X$, we set one allowable noise parameter $\alpha_\xi > 0$ for each $\xi \in \{1, \ldots, k\}$, satisfying

$$\alpha_\xi < \frac{1}{2} \min\left\{ d(x^\xi, x^\gamma) : \gamma \in K(\xi) \right\}, \tag{83}$$

where $K(\xi) = \{1, \ldots, k\} \setminus \{\xi\}$. The model is similar to the one described above, with the exception that the weights of the axonal fibers of $N_i$ terminat
$M - B$ and therefore the order relationship is reversed. If $A = B$, then $M - A = M - B$. There are then two cases: $\Re(X) < \Re(Y)$, or $\Re(X) = \Re(Y)$ and $\Im(X) \le \Im(Y)$. When $\Re(X) < \Re(Y)$, $\Re(X^c) = -\Re(X)$, $\Re(Y^c) = -\Re(Y)$, and therefore $\Re(X^c) > \Re(Y^c)$ and the order relationship is reversed. If $\Re(X) = \Re(Y)$, then $\Im(X^c) = -\Im(X)$, $\Im(Y^c) = -\Im(Y)$, and therefore the order relationship is reversed as well. Property 10 holds because first $M - (M - A) = A$ and because $e^{j(\theta + 2\pi)} = e^{j\theta}$. Figure 2 illustrates the complement of complex number $X$. This complement procedure is quite different from Serra's complementation, illustrated on the real number $Y$ in the same figure.

D. Umbra

The umbra $U$ of a real-valued function $f$ is defined as the portion of space that is below the function, union with the function itself. For a single real-valued sample $P$, it is all the points of the real axis that are smaller than or equal to $P$. In image processing practice, the signals are limited between $\pm M$, or between 0 and $M$, depending on the application. The umbra of a complex sample $X$ is similarly defined using our order relationship:

$$U = \{ Y : Y \preceq X \}. \tag{16}$$

FIGURE 3. The umbra of a single complex number, X, on the complex plane.

Figure 3 illustrates graphically the umbra of a single sample on the complex plane. The umbra is the union of the following regions on the complex plane:
1. The interior of the circle centered on the origin and radius $A$.
2. The path on the circle centered on the origin and radius $A$, starting from $X$, passing by $\theta = \pi$, and ending at $X^*$, $X^*$ included if $\Im(X) > 0$.
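E. Maximum (∨) and Minimum (∧)

The max ∨ and the min ∧ operators are defined:

$$X \vee Y = \begin{cases} X & \text{if } Y \preceq X \\ Y & \text{otherwise} \end{cases} \tag{17}$$

$$X \wedge Y = \begin{cases} Y & \text{if } Y \preceq X \\ X & \text{otherwise.} \end{cases} \tag{18}$$

These operators are dimensionality preserving (Rivest et al., 1992):

$$\lambda X \wedge \lambda Y = \lambda (X \wedge Y) \tag{19}$$

$$\lambda X \vee \lambda Y = \lambda (X \vee Y). \tag{20}$$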
This property is important because it ensures that the output of these operators has the same physical units as the units of the input signals. This property enables us to predict the properties of the output, when the signals are attenuated or amplified. The ∨ and ∧ operators preserve the dimensionality of the input signal because they merely choose between two samples, based on their order relationship.
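A small sketch of Eqs. (17) to (20) follows. The order relationship itself is defined earlier in the chapter and is not restated in this section, so the comparison used here (amplitude first, then real part, then imaginary part) is an assumption reconstructed from the properties quoted above, and the gain λ is assumed to be a positive real number. Function names are hypothetical.

```python
def precedes(x, y):
    """True if x precedes or equals y under the assumed order:
    compare amplitude, then real part, then imaginary part."""
    return (abs(x), x.real, x.imag) <= (abs(y), y.real, y.imag)

def cmax(x, y):
    """Eq. (17): X if Y precedes X, otherwise Y."""
    return x if precedes(y, x) else y

def cmin(x, y):
    """Eq. (18): Y if Y precedes X, otherwise X."""
    return y if precedes(y, x) else x

X = 3 + 4j
Y = -3 + 4j            # same amplitude as X, smaller real part
gain = 2.0             # positive real gain (assumed form of lambda)

print(cmax(X, Y), cmin(X, Y))                      # (3+4j) (-3+4j)
print(cmax(gain * X, gain * Y) == gain * cmax(X, Y))  # True, as in Eq. (20)
print(cmin(gain * X, gain * Y) == gain * cmin(X, Y))  # True, as in Eq. (19)
```

Both operators return one of their two arguments unchanged, which is why the output keeps the physical units of the input. Comparing floating-point amplitudes for exact equality is fragile in general; the sample values are chosen so that the amplitudes are exact.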
III. DILATIONS AND EROSIONS

The dilation of a complex signal $F$ by a flat structuring element $B$ is denoted $\delta^C_B(F)$ and is defined as the maximum value of the translations of $F$ by the vectors $-b$ of $B$, $\mathrm{trans}_{-b}(F)$ (Serra, 1982):

$$\delta^C_B(F) = \bigvee_{b \in B} \mathrm{trans}_{-b}(F). \tag{21}$$

The erosion of a complex signal $F$ by the flat structuring element $B$ uses the minimum instead of the maximum:

$$\varepsilon^C_B(F) = \bigwedge_{b \in B} \mathrm{trans}_{-b}(F). \tag{22}$$

It should be noted that the dilation and the erosion, using our order relationship, do not create new signal values. As is the case with the scalar dilation and erosion, these transformations merely choose, among a certain number of samples, the one that is the output. These operators also preserve the dimensionality of the samples, exactly as they do for gray-tone images when we use flat structuring elements. This is because the operators ∨ and ∧ preserve the dimensionality, as mentioned in Eqs. (19) and (20):

$$\lambda \delta^C_B(F) = \delta^C_B(\lambda F) \tag{23}$$

$$\lambda \varepsilon^C_B(F) = \varepsilon^C_B(\lambda F). \tag{24}$$
They also commute under spatial or time scaling, like their relatives in graytone and binary image processing. For signals, the time axis is t. For images,
256
RIVEST
F IGURE 4.
Complex dilation on a test signal. Top: amplitude. Bottom: phase, in radians.
F IGURE 5.
Complex erosion on a test signal. Top: amplitude. Bottom: phase, in radians.
MATHEMATICAL MORPHOLOGY
257
F IGURE 6. Complex and classical dilation on a real chirped radar signal. Top graph: input. Second graph: classical dilation. Next graphs: structuring element diameters for 8, 16, and 32 samples.
F IGURE 7. Complex and classical erosion on a real chirped radar signal. Top graph: input. Second graph: classical erosion. Next graphs: structuring element diameters for 8, 16, and 32 samples.
the image plane coordinates are (x, y). In general, let K be the coordinates, and λK a uniform scaling λ over these coordinates. F(K) is a function of coordinates K, the dilation $\delta_B^C(F)(K)$ is also a function of K, and so is B(K):

\[ \delta_B^C(F)(\lambda K) = \delta^C_{B(\lambda K)}\big[F(\lambda K)\big], \tag{25} \]

\[ \varepsilon_B^C(F)(\lambda K) = \varepsilon^C_{B(\lambda K)}\big[F(\lambda K)\big], \tag{26} \]
where F(λK) and B(λK) denote the spatial or time scaling of signal F and structuring element B by the scaling factor λ. This is because such scaling is strictly a space/time operation. Figure 4 illustrates a dilation on a test signal, while Figure 5 illustrates an erosion. Figures 6 and 7 illustrate the dilation and the erosion on a radar signal used for ground mapping. The signal is chirped, that is, the frequency of the carrier wave at the beginning of the pulse is either higher or lower than the frequency at the end of the pulse. In this specific example, the size of the features in the signal tends to increase toward the end of the radar pulse because the carrier frequency becomes lower there.
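The flat complex dilation and erosion of Eqs. (21) and (22) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order relationship is approximated by comparing moduli first and phases second (the helper `order_key` is hypothetical and may differ in its tie-break from the order defined earlier in the chapter), the structuring element is a centered window of odd width, and windows are truncated at the signal borders.

```python
import cmath

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase); the exact
    # tie-break of the chapter's order relationship may differ.
    return (abs(z), cmath.phase(z))

def _flat_filter(signal, width, pick):
    # Slide a centered flat window over the signal; `pick` (max or min)
    # selects one existing sample per position. Windows are truncated
    # at the borders.
    half = width // 2
    return [pick(signal[max(0, i - half): i + half + 1], key=order_key)
            for i in range(len(signal))]

def complex_dilation(signal, width):
    # Eq. (21): keep the largest sample under the window.
    return _flat_filter(signal, width, max)

def complex_erosion(signal, width):
    # Eq. (22): keep the smallest sample under the window.
    return _flat_filter(signal, width, min)

signal = [1 + 0j, 2j, -3 + 0j, 0.5j, 1 + 1j]
dilated = complex_dilation(signal, 3)
eroded = complex_erosion(signal, 3)
```

Because each output sample is copied from the input, no new amplitude or phase appears, and multiplying the input by a positive gain multiplies the output by the same gain, in line with Eqs. (23) and (24).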
IV. OPENINGS, CLOSINGS, AND MORPHOLOGICAL FILTERS

Openings and closings are the basis of all morphological filters. Complex morphology is no exception. All the properties that are desirable for a filter are preserved in complex morphology. A complex opening $\gamma_B^C(f)$ of complex signal f by the structuring element B is defined the same way as a classical morphological opening; it is an erosion followed by a dilation:

\[ \gamma_B^C(f) = \delta^C_{\check{B}}\big[\varepsilon_B^C(f)\big], \tag{27} \]
where B is the structuring element and $\check{B}$ is the transposition of B. Figure 8 illustrates such an opening on a complex signal. Figure 9 illustrates the result of a classical opening on a real signal, the chirped radar signal used in Figure 6, along with the complex opening. There are significant differences between the classical opening and the complex opening on the real signal. The classical opening is unsuitable for processing this class of signals. It tends to increase the parasitic DC component in the signal, which is highly undesirable. This happens because openings widen the valleys in signals. For RF signals, the definition of a valley is based on the signal amplitude or, in the case of real signals, on the signal's absolute value. The complex opening removes any whole waveform that is smaller than the structuring element, regardless of its polarity.
FIGURE 8. Complex opening of a test signal. Top: amplitude. Bottom: phase, in radians.
FIGURE 9. Complex and classical opening on a real chirped radar signal. Top graph: input. Second graph: classical opening. Next graphs: structuring element diameters for 8, 16, and 32 samples.
The complex closing is defined:

\[ \varphi_B^C(f) = \varepsilon^C_{\check{B}}\big[\delta_B^C(f)\big]. \tag{28} \]
It is the dual of the opening and it therefore works the same way as the opening on the complement of the signal. Figure 10 illustrates a closing on a complex test signal, while Figure 11 illustrates it on the chirped signal, along with its classical counterpart. As expected, the classical closing has the same problems as the classical opening. The complex opening and the complex closing have the same properties as their classical counterparts; they are increasing and idempotent operators. The opening is antiextensive while the closing is extensive. The opening and the closing are the two operators upon which morphological filters are built. Complex morphological filters can be built from complex openings and complex closings in exactly the same way. Serra (1988) presents the most frequently used classical morphological filters. Their complex equivalents are relatively straightforward to develop. However, such an extension is beyond the scope of this article.
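The complex opening and closing of Eqs. (27) and (28) can be sketched as follows. The sketch assumes a symmetric window of odd width (so that the transposed structuring element equals the structuring element itself) and a hypothetical `order_key` that ranks samples by modulus, then by phase; neither choice is taken from the chapter.

```python
import cmath

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase).
    return (abs(z), cmath.phase(z))

def _flat_filter(signal, width, pick):
    # Centered flat window of odd width, truncated at the borders.
    half = width // 2
    return [pick(signal[max(0, i - half): i + half + 1], key=order_key)
            for i in range(len(signal))]

def complex_dilation(signal, width):
    return _flat_filter(signal, width, max)

def complex_erosion(signal, width):
    return _flat_filter(signal, width, min)

def complex_opening(signal, width):
    # Eq. (27): erosion followed by dilation.
    return complex_dilation(complex_erosion(signal, width), width)

def complex_closing(signal, width):
    # Eq. (28): dilation followed by erosion.
    return complex_erosion(complex_dilation(signal, width), width)

# A two-sample burst of alternating polarity, narrower than the window.
burst = [0j, 0j, 3 + 0j, -3 + 0j, 0j, 0j]
opened_burst = complex_opening(burst, 5)

test_signal = [0j, 3 + 0j, -3 + 0j, 0j, 0j, 1j, -1 + 0j, 1 + 0j, -1j, 1 + 0j, 0j]
opened = complex_opening(test_signal, 3)
closed = complex_closing(test_signal, 3)
```

In this example the burst is removed as a whole, positive and negative samples alike, which is the behavior attributed above to the complex opening.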
FIGURE 10. Complex closing of a test signal. Top: amplitude. Bottom: phase, in radians.
FIGURE 11. Complex and classical closing on a real chirped radar signal. Top graph: input. Second graph: classical closing. Next graphs: structuring element diameters for 8, 16, and 32 samples.
V. GEODESY

Morphological operators are usually defined on $\mathbb{R}^n$, that is, in the Euclidean space such as the image plane. There is, however, a class of operators called “geodesic operators” that is defined in spaces where the Euclidean distance is replaced by a geodesic distance. These operators have been defined on both binary and gray-tone images. This section extends the concept of geodesy to complex signals.

Lantuéjoul and Beucher (1981) first presented the concept of geodesy, applied to binary images. This concept was then extended to gray-tone images with success, especially with the development of the morphological approach to segmentation, where it became its very basis. Beucher (1990) and Meyer and Beucher (1990) present an excellent overview on geodesy applied to functions and gray-tone image segmentation.

Euclidean morphology operates on the whole image plane. The Euclidean distance is the length of the shortest path between two points in this space. In geodesy, this space is partitioned by sets that are collectively referred to as a “geodesic mask.” These sets have arbitrary shapes. For instance, geodesic
FIGURE 12. Geodesic distance between X and Y. By convention, the distance between X and Z is infinite.
masks can have holes and disjoint particles. The geodesic distance in such a space is defined as the length of the shortest path between two points, the path being included in the geodesic mask. By convention, if the path is not included in the mask, the distance between the points is infinite. Figure 12 illustrates the concept; d(X, Y) is finite, while d(X, Z) is infinite.

A. Structuring Element

In Euclidean morphology, we frequently use a flat disk of radius ρ as a structuring element. In geodesy, ρ becomes a geodesic radius. Consequently, the shape of the structuring element changes, depending on the geodesic mask.

B. Dilations and Erosions

Geodesic dilations and erosions are defined in the same fashion as their Euclidean counterparts [Eqs. (21) and (22)]. The difference resides in the translations, which become geodesic.
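The geodesic distance introduced at the start of this section, together with its convention for points that cannot be joined inside the mask, can be sketched with a breadth-first search. The sketch assumes a binary mask stored as a list of rows, 4-connected unit steps as the path metric, and a hypothetical function name `geodesic_distance`; the chapter itself does not prescribe a discretization.

```python
from collections import deque
import math

def geodesic_distance(mask, start, goal):
    # Length of the shortest 4-connected path from start to goal that stays
    # inside the mask; math.inf when no such path exists (the convention
    # illustrated by Figure 12).
    rows, cols = len(mask), len(mask[0])
    if not (mask[start[0]][start[1]] and mask[goal[0]][goal[1]]):
        return math.inf
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return math.inf

# Two mask particles: a U-shaped one on the left and a bar on the right.
mask = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]
d_xy = geodesic_distance(mask, (2, 0), (2, 2))  # must go around the hole
d_xz = geodesic_distance(mask, (2, 0), (2, 4))  # different particle
```

Here the two ends of the U are two columns apart on the grid, yet the geodesic path has to travel around the gap, and the point in the other particle is unreachable.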
In Euclidean morphology, the iteration of a dilation or an erosion is equivalent to a single operator with a larger structuring element. It is also the case in geodesy. This property is important in Euclidean morphology because it allows the decomposition of large structuring elements into smaller parts. In geodesy, we use this property for the same reasons. For instance, a Euclidean dilation with a disk of radius ρ can be implemented by iterating the dilation n times with a disk of radius e:

\[ \delta_\rho(X) = \underbrace{\delta_e\big(\ldots \delta_e\big(\delta_e(X)\big) \ldots\big)}_{\text{iterated } n \text{ times}}, \tag{29} \]
with n = ρ/e, ρ ≥ e. This result is similar for erosions. It also suggests the following implementation, for a geodesic dilation, $\delta_\rho^G(X)$, of binary image X into a geodesic mask, G, and a geodesic radius, ρ:

\[ \delta_\rho^G(X) = \lim_{e \to 0} \underbrace{\delta_e\big(\ldots \delta_e\big(\delta_e(X) \cap G\big) \cap G \ldots\big) \cap G}_{n \text{ times}} \tag{30} \]
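with n = ρ/e, ρ ≥ e, and X ⊆ G. The geodesic erosion is then

\[ \varepsilon_\rho^G(X) = \lim_{e \to 0} \underbrace{\varepsilon_e\big(\ldots \varepsilon_e\big(\varepsilon_e(X) \cup G\big) \cup G \ldots\big) \cup G}_{n \text{ times}} \tag{31} \]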
with n = ρ/e, ρ ≥ e, and X ⊇ G. The geodesic dilation with an infinitely small structuring element is identical to a Euclidean dilation followed by an intersection. By duality, the geodesic erosion with an infinitely small structuring element is identical to a Euclidean erosion followed by a union. For functions and complex signals, the definitions are almost identical:

\[ \delta_\rho^g(f) = \lim_{e \to 0} \underbrace{\delta_e\big(\ldots \delta_e\big(\delta_e(f) \wedge g\big) \wedge g \ldots\big) \wedge g}_{n \text{ times}} \tag{32} \]
with n = ρ/e and $f \preceq g$. The dilation is now complex and the intersection operator has been replaced by the complex minimum operator. $f \preceq g$ means that every point of complex function f is smaller (according to our order relationship) than or equal to its counterpart in function g. The geodesic erosion is

\[ \varepsilon_\rho^g(f) = \lim_{e \to 0} \underbrace{\varepsilon_e\big(\ldots \varepsilon_e\big(\varepsilon_e(f) \vee g\big) \vee g \ldots\big) \vee g}_{n \text{ times}} \tag{33} \]
with n = ρ/e and $f \succeq g$. The erosion is complex and the union operator has been replaced by the complex maximum operator.
FIGURE 13. Complex geodesic dilation of f(t) into g(t), a geodesic mask. f(t) is the function that features four impulses at samples 250, 500, 1000, and 2250. The top graph is the amplitude and the bottom one is the phase.
These two new geodesic operators have the same properties as their single-valued counterparts. This is because our relationship is a total order relationship, and because the complement of a complex-valued function, as we defined it, reverses that order relationship. Figure 13 shows a geodesic dilation on a complex signal. The behavior of the amplitude is identical to that of gray-scale geodesic dilations. A dilation of f is carried out until it is constrained by the geodesic mask, as shown in Figure 13. The phase behavior, especially between samples 2000 and 2500, is caused by the unconstrained dilation of f in the center of the interval. The value of the peak located at 2250 is propagated until it is intersected with the geodesic mask. Wherever δ(f) is constrained by the geodesic mask, the output of the transformation is g instead of f.

C. Reconstructions

We define the morphological reconstruction by dilation of mask g by marker function f, $R_f^\delta(g)$, as the geodesic dilation of f inside g, using a geodesic
disk whose radius ρ is infinite:

\[ R_f^\delta(g) = \delta_\infty^g(f). \tag{34} \]
The dual operator of this type of reconstruction is the reconstruction by erosion, $R_f^\varepsilon(g)$, defined as

\[ R_f^\varepsilon(g) = \varepsilon_\infty^g(f). \tag{35} \]
Equations (32) and (33) suggest we iterate geodesic dilations and erosions until idempotence, that is, until no change occurs from one iteration to the next. It is, however, preferable to use either recursive algorithms (Meyer, 1987), or Vincent’s (1990, 1993) fast sequential algorithms, which were designed for generic computer architectures. The extension of these algorithms to complex signals is simple. Figure 14 shows an example of complex reconstruction by dilation. A reconstruction by dilation preserves the geodesic mask maxima that have been marked by the function to be dilated. Reconstructions by erosion preserve minima in the same way. Reconstructions are invariant to scaling, amplitude, and shapes. These operators exclusively depend on the topological properties of objects.
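The iteration suggested by Eq. (32), and the reconstruction by dilation of Eq. (34) obtained by iterating until idempotence, can be sketched as follows. This is the naive scheme, not the recursive or sequential algorithms cited above. It assumes a three-sample elementary structuring element and a hypothetical `order_key` that ranks samples by modulus, then by phase.

```python
import cmath

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase).
    return (abs(z), cmath.phase(z))

def elementary_dilation(signal):
    # Complex dilation by the smallest symmetric window (three samples).
    return [max(signal[max(0, i - 1): i + 2], key=order_key)
            for i in range(len(signal))]

def geodesic_dilation(marker, mask, steps):
    # Eq. (32): each elementary dilation is followed by a pointwise
    # complex minimum with the mask. The marker must lie below the mask.
    current = list(marker)
    for _ in range(steps):
        dilated = elementary_dilation(current)
        current = [min(d, g, key=order_key) for d, g in zip(dilated, mask)]
    return current

def reconstruction_by_dilation(marker, mask):
    # Eq. (34): iterate unit geodesic dilations until nothing changes.
    current = list(marker)
    while True:
        following = geodesic_dilation(current, mask, 1)
        if following == current:
            return current
        current = following

mask = [1 + 0j, 3j, 2 + 0j, 0.5 + 0j, 4 + 0j, -2j, 1 + 0j]
marker = [0j, 3j, 0j, 0j, 0j, 0j, 0j]  # marks only the peak at index 1
one_step = geodesic_dilation(marker, mask, 1)
rebuilt = reconstruction_by_dilation(marker, mask)
```

In this example the marked peak (3j) is restored exactly, while the unmarked peak of modulus 4 is levelled to the height of the lowest pass that leads to it.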
FIGURE 14. Complex morphological reconstruction by dilation of mask g(t) by marking function f(t). f(t) is identical to f(t) in Figure 13.
D. Openings and Closings by Reconstruction

The morphological opening is an erosion, followed by a dilation:

\[ \gamma_B(f) = \delta_{\check{B}}\big[\varepsilon_B(f)\big], \tag{36} \]

where B is the structuring element and $\check{B}$ is the transposition of B. The opening by reconstruction replaces the dilation with a reconstruction by dilation:

\[ \gamma_B^R(f) = R^\delta_{[\varepsilon_B(f)]}(f). \tag{37} \]

The morphological closing is defined:

\[ \varphi_B(f) = \varepsilon_{\check{B}}\big[\delta_B(f)\big]. \tag{38} \]

The closing by reconstruction is

\[ \varphi_B^R(f) = R^\varepsilon_{[\delta_B(f)]}(f). \tag{39} \]
These operators have been defined on binary and gray-tone images. Their definition on complex signals is the same. Dilations, erosions, and reconstructions become complex transformations. Figures 15–17 show an example of
FIGURE 15. Input spectrogram. The horizontal axis is the time and the vertical axis is the frequency. The units are arbitrary.
FIGURE 16. Complex horizontal opening with a horizontal (constant frequency) structuring element on a spectrogram. All structures that had a duration shorter than the duration of the structuring element were deleted.
FIGURE 17. Complex horizontal opening by reconstruction with the same structuring element as in Figure 16. Note that structures that were connected with the large constant frequency portion of the signal were preserved.
such an opening on a spectrogram, which is a sequence of overlapped fast Fourier transforms (FFT). In that example, we used a horizontal structuring element. The complex morphological opening preserved the horizontal components, that is, the spectrogram sections where the signal had a constant frequency. The opening by reconstruction also preserved all the components that were connected to these constant-frequency sections. The structuring element length was 201 samples. It should be noted that the higher-frequency component at the bottom of the spectrogram has not been fully removed by the opening by reconstruction. This is because there is a low-amplitude path between the horizontal segment starting approximately at sample 1500 and the high-frequency component of the signal. This path is caused by the abrupt frequency transition of the low-frequency component of the signal at that time; such a transition widened the spectrum and spread the spectral power over the whole FFT. This clearly shows that the reconstruction is scale independent: the transition spans a single FFT, an event two orders of magnitude smaller than the scale of the structuring element.

The complex opening by reconstruction has the same properties as its gray-tone counterpart: idempotence, antiextensivity, increasingness, and duality with the complex closing by reconstruction. Openings and closings by reconstruction preserve shapes better than their morphological counterparts.

E. Regional Maxima and Minima

Beucher (1990) defined the regional maximum as a plateau, that is, a region of constant value, which can be accessed only through ascending paths. A regional minimum is a plateau accessible through descending paths only. The idea of ascending or descending paths depends on the order relationship used. Regional maxima and minima can be extended to complex signals by using the complex order relationship. Classically, these objects are detected using reconstructions.
For gray-tone pictures, the regional maxima of f, Rmax(f), is composed of all the points x such that

\[ \mathrm{Rmax}(f) = \big\{x : f(x) = R^\delta_{f-h}(f)\big\}, \quad h \to 0, \tag{40} \]

where h is a small value, tending toward 0. Similarly, regional minima are detected:

\[ \mathrm{Rmin}(f) = \big\{x : f(x) = R^\varepsilon_{f+h}(f)\big\}, \quad h \to 0. \tag{41} \]
These definitions have to be modified for complex morphology, because our order relationship is not the same as it is in gray-scale morphology. For gray-scale signals, f − h < f, h > 0. This is generally not true for complex signals.
FIGURE 18. Regional maxima on a Fourier transform. The detection function is drawn with asterisks. Only the amplitude of the Fourier transform is shown.
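We therefore propose the following definitions for the complex regional minima and maxima:

\[ \mathrm{Rmax}(f) = \big\{x : f(x) = R^\delta_{(1-\lambda)f}(f)\big\}, \quad \lambda \to 0, \tag{42} \]

where $\lambda \in \mathbb{R}^+$ and $f(x) \in \mathbb{C}^n$, and

\[ \mathrm{Rmin}(f) = \big\{x : f(x) = R^\varepsilon_{(1+\lambda)f}(f)\big\}, \quad \lambda \to 0. \tag{43} \]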
Figure 18 shows an example of regional maxima detection on a Fourier transform. For the purpose of clarity, only the amplitude of the Fourier transform is shown.

F. Domes and Lakes

The algorithms used to find regional maxima and minima are extended by using arbitrary h. In gray-scale morphology, these extensions are called “domes” and “lakes”:

\[ \mathrm{Dome}_h(f) = \big\{x : f(x) = R^\delta_{f-h}(f)\big\}, \tag{44} \]
where $h \in \mathbb{R}^+$, $f(x) \in \mathbb{R}^n$, and

\[ \mathrm{Lake}_h(f) = \big\{x : f(x) = R^\varepsilon_{f+h}(f)\big\}. \tag{45} \]
Regional maxima and minima are sensitive to noise. Noise introduces small extraneous maxima and minima in signals. If the noise level is known, dome detectors can alleviate this problem by merging together maxima that are separated by altitude variations smaller than h. Lake detectors perform in the same fashion on signal minima. Complex domes and lakes are an extension of complex regional maxima and minima:

\[ \mathrm{Dome}_\lambda(f) = \big\{x : f(x) = R^\delta_{(1-\lambda)f}(f)\big\}, \tag{46} \]

where $\lambda \in\, ]0, 1]$, $f(x) \in \mathbb{C}^n$, and

\[ \mathrm{Lake}_\lambda(f) = \big\{x : f(x) = R^\varepsilon_{(1+\lambda)f}(f)\big\}. \tag{47} \]
Figure 19 shows an example of a complex dome detector on the same Fourier transform used in Figure 18. The function f (x) was lowered by
FIGURE 19. The operation of a complex dome detector on a Fourier transform. Only the amplitude is shown.
FIGURE 20. Same as Figure 19. λ is changed and the domes become larger. Only the amplitude is shown.
1 − λ = 0.8 and this new function, 0.8f(x), was dilated into f(x). The detection was done by comparing the reconstruction result with f(x). It is displayed with asterisks. Figure 20 shows the effect of changing λ. By lowering f(x) further we merged smaller domes into larger ones. It is also possible to consider creating a hierarchy of domes; this hierarchy would depend on their relative amplitudes and the way they merge together as λ changes. Geodesic operators are controlled by topological parameters. The marker function that controls the reconstruction of objects is such a parameter. This example illustrates how the idea of connectedness is used to manipulate signals. In this article, the idea of connected paths has been extended from gray-tone functions to complex signals by modifying the usual order relationship. Minima, maxima, and dome and lake detectors are other types of operators that merge the ideas of signal intensity and topology. Openings and closings by reconstruction are a class of operators that combine the idea of topology with the idea of scale or shape through the use of a structuring element. Topological operators are very robust because they are scale and intensity invariant.
VI. TOP HATS

Openings and closings suppress objects that are smaller than the structuring element. These have an effect that is similar to low-pass filters. Top hats, invented by Meyer (1979) and also described in Serra (1982), are the morphological equivalent of high-pass filters. There are two types of top hat transformations: the white and the black top hat. The white top hat with structuring element B is defined:

\[ \mathrm{wTop}_f(B) = f - \gamma_f(B). \tag{48} \]
The opening removes small bright peaks in f; the difference between f and the opening then enhances the very peaks that were removed by the opening. The black top hat is defined:

\[ \mathrm{bTop}_f(B) = \varphi_f(B) - f. \tag{49} \]
This top hat enhances minima that are smaller than the structuring element B. It should be noted that these minima appear as maxima because of the order in which the subtraction is done. These definitions do not fundamentally change for complex signals. It is simply a matter of using complex openings and closings instead of the classical operators. These two transformations, coupled with thresholding, are often used as peak detectors. Figure 21 shows an example of a white top hat applied on the Fourier transform of a chirped radar signal. Figure 22 shows an example of a black top hat on the same signal. It should be noted that this operator creates new amplitudes and phases because of the arithmetic operation. As we can see in Figures 21 and 22, the operation is neither extensive nor antiextensive. The size of the structuring element in both examples is 11 samples in length. In Figure 21, all the amplitude peaks smaller than 11 samples were enhanced. In Figure 22, all the amplitude valleys were inverted and enhanced by the black top hat. Openings and closings by reconstruction are also used to create top hats. These are called reconstruction top hats.
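The two top hats of Eqs. (48) and (49), built on complex openings and closings, can be sketched as below. The sketch assumes a symmetric flat window of odd width and a hypothetical `order_key` ranking samples by modulus, then by phase; it is an illustration, not the implementation used for Figures 21 and 22.

```python
import cmath

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase).
    return (abs(z), cmath.phase(z))

def _flat_filter(signal, width, pick):
    # Centered flat window of odd width, truncated at the borders.
    half = width // 2
    return [pick(signal[max(0, i - half): i + half + 1], key=order_key)
            for i in range(len(signal))]

def complex_opening(signal, width):
    return _flat_filter(_flat_filter(signal, width, min), width, max)

def complex_closing(signal, width):
    return _flat_filter(_flat_filter(signal, width, max), width, min)

def white_top_hat(signal, width):
    # Eq. (48): input minus its complex opening.
    opened = complex_opening(signal, width)
    return [s - o for s, o in zip(signal, opened)]

def black_top_hat(signal, width):
    # Eq. (49): complex closing minus the input.
    closed = complex_closing(signal, width)
    return [c - s for c, s in zip(closed, signal)]

# A one-sample peak (5j) on a constant background of 1.
signal = [1 + 0j, 1 + 0j, 1 + 0j, 5j, 1 + 0j, 1 + 0j, 1 + 0j]
white = white_top_hat(signal, 3)
black = black_top_hat(signal, 3)
```

The peak survives in the white top hat as 5j − 1, a value that is not a sample of the input, which illustrates the remark above that the arithmetic difference creates new amplitudes and phases.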
VII. MORPHOLOGICAL GRADIENTS

Gradients are operators frequently used in segmentation because they enhance variations in signals. In images, these variations are assumed to be edges. Beucher (1990) invented the morphological gradient and described its various aspects. A survey of the various morphological gradients is available (Rivest et al., 1993). Morphological gradients are based on three different combinations of operators:
FIGURE 21. Complex white top hat on a Fourier transform. Top graph: input and output amplitudes. Middle graph: input phase in radians. Bottom graph: top hat phase.
FIGURE 22. Complex black top hat on a Fourier transform. Top graph: input and output amplitudes. Middle graph: input phase in radians. Bottom graph: top hat phase.
• The arithmetic difference between a dilation and an erosion with the same structuring element B:

\[ \mathrm{Mg}_B(f) = \delta_B(f) - \varepsilon_B(f). \tag{50} \]

This is the morphological gradient, also called the Beucher gradient.

• The difference between a dilation and the input function f, called the external gradient:

\[ \mathrm{Eg}_B(f) = \delta_B(f) - f. \tag{51} \]

• The difference between the input function f and an erosion, called the internal gradient:

\[ \mathrm{Ig}_B(f) = f - \varepsilon_B(f). \tag{52} \]
In digital images, the edges are either on the external contours of the shapes or on the internal contours or on both. The external gradient generates external edges, while the internal gradient generates internal ones. The morphological gradient generates both. Usually, the size of the structuring element is as small as possible. In the continuous case, when B is infinitesimally small, the distinction between the three basic types of gradients vanishes and Beucher (1990) demonstrated that it was the same as the gradient modulus $|\nabla f(x)|$, where

\[ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2} \right) \]

in the image plane. Figure 23 illustrates the morphological gradient on a one-dimensional profile. Complex morphological gradients use the same definitions. Figure 24
FIGURE 23. Illustration of the morphological gradient on a one-dimensional profile.
FIGURE 24. Beucher gradient on a complex signal. Top two graphs: amplitude and phase in radians of the test signals. Bottom two graphs: amplitude and phase in radians of the Beucher gradient.
shows the application of a complex Beucher gradient on a test signal, while Figure 25 shows a Beucher gradient on a chirped radar signal. The gradient indeed enhances amplitude variations, as seen on the third graph of Figure 24. However, it also creates amplitudes and phases that did not exist in the input signal. This is caused by the arithmetic operation. This gradient, like any derivative-based operator or high-pass filter, enhances small features in the signal. These features are, more often than one might wish, noise. The size of the structuring element may also be arbitrary. Gradients using large structuring elements are called “thick gradients.” The edges then tend to have the same thickness as the diameter of the structuring element.
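The three gradients of Eqs. (50) to (52) can be sketched with the complex dilation and erosion as building blocks. As in the earlier illustrations, the flat window is symmetric and of odd width, and `order_key` is a hypothetical stand-in for the order relationship (modulus first, phase second).

```python
import cmath

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase).
    return (abs(z), cmath.phase(z))

def _flat_filter(signal, width, pick):
    # Centered flat window of odd width, truncated at the borders.
    half = width // 2
    return [pick(signal[max(0, i - half): i + half + 1], key=order_key)
            for i in range(len(signal))]

def beucher_gradient(signal, width):
    # Eq. (50): dilation minus erosion.
    dilated = _flat_filter(signal, width, max)
    eroded = _flat_filter(signal, width, min)
    return [d - e for d, e in zip(dilated, eroded)]

def external_gradient(signal, width):
    # Eq. (51): dilation minus input.
    dilated = _flat_filter(signal, width, max)
    return [d - s for d, s in zip(dilated, signal)]

def internal_gradient(signal, width):
    # Eq. (52): input minus erosion.
    eroded = _flat_filter(signal, width, min)
    return [s - e for s, e in zip(signal, eroded)]

# A two-sample plateau of amplitude 2 on a zero background.
signal = [0j, 0j, 2 + 0j, 2 + 0j, 0j]
beucher = beucher_gradient(signal, 3)
external = external_gradient(signal, 3)
internal = internal_gradient(signal, 3)
```

On this plateau the external gradient responds just outside the edges and the internal gradient just inside them; their sum is the Beucher gradient.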
VIII. COMPLEX WATERSHED

The watershed transformation comes from an analogy between an image and a digital elevation model. A high intensity is analogous to a high altitude. Minima become lakes and maxima become mountains. Each “lake” in an image drains an area called a catchment basin. These areas are separated by watersheds. The watershed transformation works by flooding the image
FIGURE 25. Complex Beucher gradient on a chirped (real) signal.
through its minima until catchment basins collide. Such a collision occurs only at the divides. Whenever a collision occurs, a dam is erected to prevent mixing of water coming from different minima. At the end of the flooding process, only catchment basins separated by dams remain; the dams usually track the image crest-lines. This transformation, used in conjunction with morphological gradients, constitutes the basis of the morphological approach to segmentation (Beucher, 1982, 1990; Beucher and Lantuéjoul, 1979; Meyer and Beucher, 1990). The morphological approach to the segmentation problem involves the use of markers. Markers located inside the object boundaries and markers that designate the background are created. It is assumed that between the shape and the background markers lies the edge that outlines the object to be segmented. There are many ways to create those markers. Most of them use image processing and pattern recognition techniques. However, it is often preferable to generate the markers manually. The operator does so by selecting the objects to be segmented. The application then combines the operator’s natural intelligence and expertise with the mechanical and tedious aspects of finding the edges and performing the measurements. Vincent and Soille (1991) invented a fast algorithm that uses pixel queues. Meyer and Beucher modified it in two ways: first by flooding through markers
only and second by using hierarchical queues (Meyer, 1990; Meyer and Beucher, 1990). Regardless of the implementation, all these flooding simulations are based upon the order relationship in R. Higher-altitude pixels are flooded last. With the order relationship proposed in this article, it is possible to determine whether a complex pixel is “higher” than another one. Therefore, the extension of watershed algorithms through flooding simulations is done by a mere change of order relationships. Figure 26 shows an example of such a watershed, applied on a time-frequency representation of the chirped radar signal we used in other examples. The horizontal axis represents frequencies, while the vertical axis represents the time at which a windowed FFT has been computed on that signal. For the purpose of this illustration, we flooded the complex spectrogram using Meyer’s hierarchical queue. To generate the markers by which the complex image would be flooded, we assumed there would be only one frequency crest-line that was to be tracked over the spectrogram. This translates into the assumption that there would be only two catchment basins separated by
FIGURE 26. The complex watershed on a spectrogram. Left: the amplitude of the spectrogram, in gray scale, black being a high amplitude. Right: the complex watershed of this spectrogram. The horizontal axis is the frequency while the vertical axis is the time. The units are arbitrary because it is a series of FFTs.
that crest-line. We therefore positioned the markers on both sides of the image and used the watershed to grow the catchment basins at the locations we knew the frequency peaks were not located, that is, at frequency 0 and at normalized frequency π. The watershed transformation is therefore expanded with little difficulty to complex signals. The only modification needed was to change the order relationship used in the flooding simulations. All the present watershed algorithms use the usual order relationship on R. They can all be expanded to complex signals through that modification.
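The statement that a flooding simulation extends to complex data through a change of order relationship can be illustrated with a deliberately simplified sketch. It floods a one-dimensional complex signal from markers using Python's `heapq` as the priority queue; it is not Meyer's hierarchical-queue algorithm, it erects no explicit dam samples, and `order_key` (modulus first, phase second) and `marker_flooding` are hypothetical names introduced for this example.

```python
import cmath
import heapq

def order_key(z):
    # ASSUMPTION: total order approximated by (modulus, phase).
    return (abs(z), cmath.phase(z))

def marker_flooding(signal, markers):
    # markers maps a sample index to a positive basin label.
    # Samples are flooded in increasing order of order_key, so the only
    # place where complex values matter is the priority of the queue.
    labels = [0] * len(signal)
    heap = []
    counter = 0  # insertion counter, keeps the queue first-in first-out on ties
    for index, label in markers.items():
        labels[index] = label
        heapq.heappush(heap, (order_key(signal[index]), counter, index))
        counter += 1
    while heap:
        _, _, i = heapq.heappop(heap)
        for j in (i - 1, i + 1):
            if 0 <= j < len(signal) and labels[j] == 0:
                labels[j] = labels[i]  # the basin of i grows into j
                heapq.heappush(heap, (order_key(signal[j]), counter, j))
                counter += 1
    return labels

# One crest of modulus 3 (index 2) between two low-amplitude ends.
signal = [0.1 + 0j, 1j, -3 + 0j, 2 + 0j, 0.5j, 0.2 + 0j]
labels = marker_flooding(signal, {0: 1, 5: 2})
```

With one marker at each end, the two basins meet next to the crest, mirroring on a toy scale the two-marker setup used for Figure 26.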
IX. MEASUREMENTS

A. Basic Measurements and Minkowski Functionals

In morphology, measurements are done using the Minkowski functionals shown in Table 1. These measurements and their linear combinations, applied on a signal transformation, generate every possible measurement (Serra, 1982). These are the volume, the area, the norm, the perimeter, and the connectivity number. A complex signal is a three-dimensional object; it is described with three axes: the time, the real, and the imaginary axes. In practice, the most important measure is the volume because it yields measurements that always have a physical meaning. This is caused by the discrepancies between the physical units that describe signals; the time axis is expressed in seconds while the other axes are in volts. The other functionals, with the exception of the connectivity number, do not preserve physical units (Rivest et al., 1992).

TABLE 1
MINKOWSKI FUNCTIONALS (a)

Space dimension   k = 0                  k = 1                  k = 2                  k = 3
n = 0             Connectivity number
n = 1             Perimeter              Connectivity number
n = 2             Area                   Perimeter              Connectivity number
n = 3             Volume                 Area                   Norm                   Connectivity number

(a) Minkowski functionals $W_n^k$, where n is the space dimension and k = 0, 1, ..., n. These are the volume, the area, the norm, the perimeter, and the connectivity number.
With gray-tone images, the object from which a volume is measured is the gray-tone function and all the space below it; it is the umbra of the function. The umbra is dependent upon the order relationship used. For complex signal samples, as shown in Figure 3, it is a disk with its radius equal to the amplitude of the signal sample. Therefore, the umbra of a complex function f(t) is the juxtaposition of the circular disks of radii |f(t)|. At time t, the area of the umbra is $\pi |f(t)|^2$. Therefore, the volume, or the Lebesgue measure, M[f(t)] of the umbra of the complex function f(t) is the integral of all the superposed disks:

\[ M[f(t)] = \pi \int_{-\infty}^{\infty} |f(t)|^2 \, dt. \tag{53} \]
This measurement is also the energy of f(t) times π. For RF signals in particular, the integral

\[ \int_{-\infty}^{\infty} f(t) \, dt \tag{54} \]

tends to zero because there is usually no DC component in this type of signal. Although the average component of an image is an important feature, RF circuit designers strive to remove that component because it does not carry information and it is considered parasitic. This severely limits the usefulness of such a measurement because the DC component is in this case an artifact.

B. Granulometries and Pattern Spectra

Granulometry and pattern spectrum are two methodologies that are used with goals similar to spectral analysis. They make it possible to obtain size distributions of objects in pictures by using morphological openings and closings. The granulometry and pattern spectrum proposed in this article share the same axiomatic basis as the classical granulometry. The tools were modified by using complex openings and closings instead of the usual ones; the Lebesgue measure was replaced by power measurements.

Granulometries are an axiomatization of the sieving process. A size distribution of a powder can be obtained by passing it through progressively finer sieves and by weighing the amount of powder that remains in each sieve. Matheron (1975) formalized this process and discovered that a family of openings and closings of size λ was equivalent to a family of sieves. The Lebesgue measure is equivalent to the action of measuring masses.
A granulometry of function f with a family of structuring elements B scaled by λ is then defined:

\[ G(f, \lambda B) = M[\gamma_{\lambda B}(f)], \tag{55} \]

where $M[\gamma_{\lambda B}(f)]$ is the Lebesgue measure of the opening of f with structuring element B scaled by a real, positive factor λ. The Lebesgue measure for binary images is the surface of objects. For gray-tone images, it is the volume and for functions, it is

\[ M[f(t)] = \int_{-T/2}^{T/2} f(t) \, dt, \tag{56} \]

T → ∞ being the length of the integration interval. Maragos (1989) defined the pattern spectrum as

\[ PS(f, \lambda B) = -\frac{d}{d\lambda} M[\gamma_{\lambda B}(f)]. \tag{57} \]

The pattern spectrum measures what was removed by the openings, that is, the objects that were trapped in the sieves. In contrast, the granulometry measures what passed through the sieves. The pattern spectrum actually measures the difference between successive openings. Maragos also defined the pattern spectrum for negative sizes using closings instead of openings:

\[ PS(f, \lambda B) = \frac{d}{d\lambda} M[\varphi_{-\lambda B}(f)], \quad \lambda < 0. \tag{58} \]

This is the equivalent of performing the pattern spectrum on the complement of an image, because the closing is the dual of the opening. There are granulometries based on closings instead of openings as well. These are also the same as computing a granulometry on the complement of an image.
C. Power Granulometry

Classical granulometries use the Lebesgue measure to assess the amount of signal elements that survived the openings. RF signals are usually symmetrical with respect to the time axis, unless there is an undesirable DC component added to them. Complex morphological operators tend to preserve such symmetry. Consequently, the Lebesgue measure tends to zero for such signals. As λ increases, any deviation of the Lebesgue measure from zero is likely caused by artifacts such as border and sampling effects. This situation is corrected by replacing the Lebesgue measure with the signal power. The power granulometry is then defined:

\[ G^P(f, \lambda B) = \frac{1}{T} \int_{-T/2}^{T/2} \gamma^C_{\lambda B}[f(t)]^{*} \, \gamma^C_{\lambda B}[f(t)] \, dt = \frac{1}{T} \int_{-T/2}^{T/2} \big| \gamma^C_{\lambda B}[f(t)] \big|^2 \, dt, \tag{59} \]

where $\gamma^C_{\lambda B}[f(t)]$ is the complex opening with structuring element B scaled by a factor λ. $\gamma^C_{\lambda B}[f(t)]^{*}$ is the complex conjugate of $\gamma^C_{\lambda B}[f(t)]$.

Performing the Lebesgue measure on the difference between successive openings is the same as performing the derivative of the Lebesgue measure over successive openings:
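\[ \lim_{\epsilon \to 0} \frac{1}{\epsilon}\, M\left[\gamma_{\lambda B} f(t) - \gamma_{(\lambda+\epsilon)B} f(t)\right] = -\frac{d}{d\lambda} M\left[\gamma_{\lambda B} f(t)\right]. \tag{60} \]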
This is no longer the case for nonlinear measurements such as power. Therefore, the definition of the pattern spectrum in Eq. (57) can no longer be used. It is preferable to compute the power of the residues between successive openings. This yields the following definition of the power pattern spectrum:

\[ PS^{P}(f, \lambda B) = \lim_{\epsilon \to 0} \frac{1}{T}\int_{-T/2}^{T/2} \left|\gamma^{C}_{\lambda B} f(t) - \gamma^{C}_{(\lambda+\epsilon)B} f(t)\right|^{2} dt, \quad \lambda \ge 0. \tag{61} \]
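Following Maragos, the power pattern spectrum for negative sizes is

\[ PS^{P}(f, \lambda B) = \lim_{\epsilon \to 0} \frac{1}{T}\int_{-T/2}^{T/2} \left|\phi^{C}_{(-\lambda+\epsilon)B} f(t) - \phi^{C}_{-\lambda B} f(t)\right|^{2} dt, \quad \lambda \le 0, \tag{62} \]

where $\phi^{C}(\cdot)$ is the complex closing.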
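The definitions in Eqs. (59) and (61) can be sketched on sampled data as follows. This is only an illustration under stated assumptions: the complex order is taken to be "amplitude first, then real part, then imaginary part" (the precise order relationship is defined earlier in the chapter and is not reproduced here), the complex erosion and dilation are taken as the minimum and maximum under that order over a flat centered window, the limit over ε is replaced by the step between successive discrete sizes, and the power is the mean squared modulus. All names are hypothetical.

```python
def order_key(z):
    """Assumed total order on complex samples: amplitude, then real, then imaginary part."""
    return (abs(z), z.real, z.imag)


def complex_erode(signal, size):
    """Infimum under the assumed order over a centered window of odd length."""
    half, n = size // 2, len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)], key=order_key)
            for i in range(n)]


def complex_dilate(signal, size):
    """Supremum under the assumed order over a centered window of odd length."""
    half, n = size // 2, len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)], key=order_key)
            for i in range(n)]


def complex_open(signal, size):
    return complex_dilate(complex_erode(signal, size), size)


def power(signal):
    """Mean squared modulus, the discrete analogue of (1/T) * integral of |.|^2."""
    return sum(abs(z) ** 2 for z in signal) / len(signal)


def power_granulometry(signal, sizes):
    """Discrete counterpart of Eq. (59)."""
    return [power(complex_open(signal, s)) for s in sizes]


def power_pattern_spectrum(signal, sizes):
    """Discrete counterpart of Eq. (61): power of the residue between successive openings."""
    openings = [complex_open(signal, s) for s in sizes]
    return [power([a - b for a, b in zip(openings[k], openings[k + 1])])
            for k in range(len(sizes) - 1)]


# A short sign-alternating burst (zero mean) followed by an isolated sample.
signal = [complex(v) for v in (0, 0, 2, -2, 2, 0, 0, 0, 1, 0, 0)]
sizes = [1, 3, 5]
gp = power_granulometry(signal, sizes)
pps = power_pattern_spectrum(signal, sizes)
print(gp)
print(pps)
```

On this toy signal the power of the residue between sizes 1 and 3 is larger than the drop in the power granulometry between the same sizes, which illustrates why the power pattern spectrum is defined on the residues rather than as a derivative of the granulometric curve.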
D. Examples

To illustrate these concepts, the following signals have been used:

1. A simulated fixed-frequency radar pulse. The period of the sinusoid was 50 samples.
2. Another simulated fixed-frequency radar pulse. The period of the sinusoid was 25 samples.
3. A simulated signal composed of two tones. The period of the first tone was 50 samples and it was 25 for the second tone.
4. A real chirped radar signal, with the period of the carrier starting at 50 samples and ending at 85 samples.

These signals were analyzed with a power granulometry and a power pattern spectrum with a flat structuring element of diameter λ = {3, 5, 7, . . . , 41}. Figure 27 shows the simulated pulse with a period of 50 samples. The instantaneous power of the signal is periodic with a 25-sample period. Figure 28 shows the resulting power granulometry and power pattern spectrum. The main feature of the power pattern spectrum is its peak at 25 samples. This corresponds well with the period of the power fluctuations present in the signal. The rate of growth of this peak increases with λ. This reflects the shape of the sinusoid; the structuring element progressively brings the signal amplitude down to zero, which is where the power fluctuations are the largest.

FIGURE 27. Simulated radar pulse. Period = 50 samples.

FIGURE 28. Power granulometry and power pattern spectrum of a simulated pulse. Period = 50 samples.

Figure 29 shows the power granulometry and pattern spectrum of the simulated pulse featuring a carrier period of 25 samples. The granulometry and the pattern spectrum are scaled by a factor of two along the λ axis. The peak in the pattern spectrum has moved to a period of 13 samples. The remaining peak, at λ = 25 samples, was caused by sampling artifacts.

FIGURE 29. Power granulometry and power pattern spectrum of a simulated pulse. Period = 25 samples.

Figure 30 shows the measurements of a two-tone signal. It should be noted that even though this signal is the result of the linear combination of two signals, this is not visible in the pattern spectrum. This is to be expected, because the pattern spectrum shows the shapes in the time domain instead of the frequency domain. The successive peaks can be observed in Figure 31, which details a section of the pulse. These peaks and valleys do indeed exhibit the various sizes shown on the pattern spectrum.

FIGURE 30. Power granulometry and power pattern spectrum of a simulated pulse. Superposition of two tones (period: 20 and 50 samples).

FIGURE 31. Pulse fragment, superposition of two tones (period: 20 and 50 samples).

Figure 32 shows the measurements of a chirped radar signal. The difference with the other signals is striking. The chirped radar features wider peaks because the size of the instantaneous power variations varies more widely than in the case of the simulated signals.

FIGURE 32. Power granulometry and power pattern spectrum of a chirped radar pulse.

This section presented a new type of granulometry and pattern spectrum. These are more appropriate to the context of complex signal processing than the classical granulometry and pattern spectrum for the following reasons: first, complex signals are often represented using complex functions, and second, these signals are compared using their amplitude and power instead of the value of the signals themselves. The granulometry was modified in two ways. The Lebesgue integral was replaced by the power measurement. The morphological opening was also replaced with a complex opening. It turned out that this complex opening was more appropriate for purely real signals than the classical opening. The definition of the pattern spectrum was also modified. Instead of computing the first derivative of the granulometric curve relative to the structuring element size λ, the power measurement on the difference between adjacent openings was computed.

Granulometries and pattern spectra perform time-domain measurements of objects embedded in signals. The transformations used to obtain these measurements are nonlinear; therefore, the concept of linear superposition is not valid in this context. Two sinusoids cannot be separated in this fashion; the superposition of two pattern spectra does not directly correspond to the pattern spectrum of the two signals that featured these spectra. Granulometries and pattern spectra have similar goals to spectral analysis, yet they are complementary to this methodology.
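The power periods quoted for the first two examples can be checked with a standard trigonometric identity. As a worked sketch, assume a unit-amplitude carrier of period N samples (the symbols n and N and the unit amplitude are assumptions made for illustration):

```latex
\[
  \sin^{2}\!\left(\frac{2\pi n}{N}\right)
  = \frac{1}{2}\left[1 - \cos\!\left(\frac{2\pi n}{N/2}\right)\right]
\]
```

The instantaneous power therefore repeats every N/2 samples: 25 samples for N = 50, and 12.5 samples for N = 25, which is consistent with the pattern spectrum peaks reported near 25 and 13 samples.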
X. CONCLUSION

In this chapter, we expanded mathematical morphology to complex signals. This expansion was carried out by changing the order relationship used to compare two signal samples. This order relationship is a total order relationship, the same type as the one that orders gray-tone pixels. It is also compatible with the practice of RF and complex signal processing, for which energy and power considerations are of fundamental importance. It therefore has physical and practical significance. Moreover, the relationship gracefully degenerates into the same order relationship used in gray-tone morphology when complex signals become real and positive. Yet there is an inevitable degree of arbitrariness in this order relationship. For instance, it might be argued that the lexicographic ordering used when two samples feature the same amplitude should be reversed, therefore giving priority to the imaginary part instead of the real part as we suggested in this article. Indeed, there is no reason to prefer one over the other; a choice must be made.
A new complementation procedure was also invented because of the new order relationship. The usual gray-tone complement was not applicable to that class of signals. For instance, the negation, which is used on $\mathbb{R}^n$, does not reverse our order relationship. In the same way as the new order relationship, the complementation also degenerates into the usual gray-tone complement when the complex signals become real and positive. Because of this new total order relationship and this new complementation procedure, all the mathematical results that were applicable to gray-tone morphology are also applicable to complex signals. Fundamental properties such as extensivity and antiextensivity have to be assessed through the new order relationship. Duality is achieved by using the complex complement.

In this article, we created the basic morphological operators using the new order relationship. We then presented examples of complex openings and closings, followed by geodesic transformations. The extension of the classical transformations to complex transformations was straightforward, with the exception of the complex regional minima and maxima detectors in Section V.E. The additions and subtractions of constants were replaced by multiplications because this is more compatible with the order relationship. The shape of the umbra of a complex function is radically different from the shape of the umbra of a gray-tone function. Therefore, the implementation of the Lebesgue measure was changed in this article. It turned out that this measure is the same as the energy of the complex signal under analysis, normalized by a constant.

In this research, we also discovered that this new mathematical morphology is useful in processing real signals. It turns out that this kind of morphology is more appropriate to AC signals, that is, signals that are not characterized by their DC component, such as audio, ultrasound, or RF signals.
Classical mathematical morphology does not perform well on those signals because it tends to create undesirable artifacts such as increasing the energy of the output, regardless of whether an operator is extensive or not, and creating artificial DC components. This extension of mathematical morphology now enables us to apply it to the following examples:

• Complex signals, such as communication and radar signals that feature an in-phase and an in-quadrature component.
• Fourier transforms.
• Spectrograms and other time-frequency representations.

At the present time, RF signals are best understood using linear methods such as the Fourier transform and convolutions. This is perfectly justifiable; after all, RF signals propagate through linear media. However, nonlinearities do exist in most RF systems by design. When nonlinearities are accidental, Fourier analysis usually breaks down. Harmonics and intermodulation distortion make the spectrum of the signal unrecognizable. When nonlinearities exist by design, using linear tools to analyze and model the RF system is difficult. These nonlinearities are frequent: they exist whenever modulation is used and they exist in pattern recognition systems whenever information is irreversibly destroyed. Morphological tools show some promise in filtering, pattern recognition, and feature extraction of radar signals. Granulometries can be used with goals similar to those of Fourier analysis. Another application of the frequency representation of signals is the modeling of linear operators. Granulometries, by obtaining the size distribution of a signal, can also model the spatial component of nonlinear transformations.
REFERENCES

Astola, J., Haavisto, P., Neuvo, Y. (1990). Vector median filters. Proc. IEEE 78 (4), 678–689. April.
Barnett, V. (1976). The ordering of multivariate data. J. R. Statist. Soc. A 139 (Part 3), 318–355. April.
Beucher, S. (1982). Watersheds of functions and picture segmentation. In: ICASSP 82, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (Paris), pp. 1928–1931. May.
Beucher, S. (1990). Segmentation d’images et morphologie mathématique. Ph.D. thesis, Ecole des Mines de Paris. June.
Beucher, S., Lantuéjoul, C. (1979). Use of watersheds in contour detection. In: International Workshop on Image Processing: Real-Time Edge and Motion Detection/Estimation (Rennes. CCETT/IRISA). September.
Chanussot, J., Lambert, P. (1998). Total ordering based on space filling curves for multivalued morphology. In: Heijmans, H., Roerdink, J. (Eds.), Mathematical Morphology and Its Applications to Image and Signal Processing. Kluwer Academic Publishers, Amsterdam, pp. 51–58.
Comer, M.L., Delp, E.J. (1992). An empirical study of morphological operators in color image enhancement. In: Proceedings SPIE, Image Processing Algorithms and Techniques III, vol. 1657, pp. 314–325.
Comer, M.L., Delp, E.J. (1999). Morphological operations for color image processing. J. Elect. Imaging 8 (3), 279–289. July.
Heijmans, H. (1994). Morphological Image Operators. Advances in Electronics and Electron Physics. Academic Press, Boston.
Lantuéjoul, C., Beucher, S. (1981). On the use of geodesic metric in image analysis. J. Microsc. 121, 39–49.
Maragos, P. (1989). Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Anal. Mach. Intell. 11 (7), 701–716.
Marr, D. (1976). Early processing of visual information. Phil. Trans. R. Soc. London B 275 (942), 483–524.
Matheron, G. (1975). Random Sets and Integral Geometry. John Wiley & Sons, New York.
Meyer, F. (1979). Cytologie quantitative et morphologie mathématique. Ph.D. thesis, Ecole des Mines de Paris.
Meyer, F. (1987). Algorithmes séquentiels. In: Onzième Colloque GRETSI. June.
Meyer, F. (1990). Algorithmes ordonnés de ligne de partage des eaux. Technical Report N-06/90/MM, Centre de Morphologie Mathématique Ecole des Mines de Paris.
Meyer, F., Beucher, S. (1990). Morphological segmentation. J. Visual Commun. Image Represent. 1 (1), 21–46. September.
Rivest, J.-F. (2004). Morphological operators on complex signals. Signal Process. 84, 133. January.
Rivest, J.-F., Serra, J., Soille, P. (1992). Dimensionality and image analysis. J. Visual Commun. Image Represent. 3 (2). June.
Rivest, J.F., Soille, P., Beucher, S. (1993). Morphological gradients. J. Electron. Imaging 2 (4), 326–336. October.
Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, London.
Serra, J. (1987). Morphological optics. J. Microsc. 145 (Pt1), 1–22. January.
Serra, J. (1988). Introduction to morphological filters. In: Image Analysis and Mathematical Morphology; Theoretical Advances. Academic Press, London, pp. 101–114. Chapter 5.
Talbot, H., Evans, C., Jones, R. (1998). Complete ordering and multivariate mathematical morphology. In: Heijmans, H., Roerdink, J. (Eds.), Mathematical Morphology and Its Applications to Image and Signal Processing. Kluwer Academic Publishers, Amsterdam, pp. 27–34.
Vincent, L. (1990). Algorithmes morphologiques à base de files d’attente et de lacets. Extension aux graphes. Ph.D. thesis, Ecole des Mines de Paris. May.
Vincent, L. (1993). Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms. IEEE Transact. Image Process. 2 (2), 176–201. April.
Vincent, L., Soille, P. (1991). Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transact. Pattern Anal. Machine Intell. 13 (6), 583–598. June.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 144
Ranking Metrics and Evaluation Measures

JIE YU (a), JAUME AMORES (b), NICU SEBE (c), AND QI TIAN (a)

(a) Department of Computer Science, The University of Texas at San Antonio, San Antonio, Texas 78249, USA
(b) IMESIA Research Group, INRIA, Rocquencourt, France
(c) Faculty of Science, The University of Amsterdam, Amsterdam, The Netherlands
I. Introduction
   A. Similarity Estimation in Computer Vision
   B. Distance Metric as Similarity Measurement
II. Distance Metric Analysis
   A. Maximum Likelihood Approach
   B. Distance Metric Analysis
   C. Generalized Distance Metric Analysis
III. Boosting Distance Metrics for Similarity Estimation
   A. Motivation
   B. Boosted Distance Metrics
   C. Related Work
IV. Experiments and Analysis
   A. Distance Metric Analysis in Stereo Matching
   B. Distance Metric Analysis in Motion Tracking
   C. Boosted Distance Metric on Benchmark Data Set
   D. Boosted Distance Metric in Image Retrieval
V. Discussion and Conclusions
References
I. INTRODUCTION

Similarity has been a research topic in the field of psychology for decades; early researchers include Wallach (1958) and Tversky and Krantz (1977). Recently there has been a strong resurgence of interest in the topic. Similarity judgments are considered to be a valuable tool in the study of human perception and cognition, and play a central role in theories of human knowledge representation, behavior, and problem solving. Tversky (1977) describes the similarity concept as “an organizing principle by which individuals classify objects, form concepts, and make generalizations.”

ISSN 1076-5670/05 DOI: 10.1016/S1076-5670(06)44004-0
Copyright 2006, Elsevier Inc. All rights reserved.
A. Similarity Estimation in Computer Vision

Retrieval of images by similarity, that is, retrieving images that are similar to an already retrieved image (retrieval by example) or to a model or schema, is a relatively old idea. Some might date it to antiquity, but more seriously it appeared in specialized geographic information systems databases around 1980, in particular, in the “query by pictorial example” system of IMAID (Chang and Fu, 1980). From the start it was clear that retrieval by similarity called for specific definitions of what it means to be similar. In the mapping system, a satellite image was matched to existing map images from the point of view of similarity of road and river networks, easily extracted from images by edge detection. Apart from theoretical models (Aigrain, 1987), it was only in the beginning of the 1990s that researchers started to look at retrieval by similarity in large sets of heterogeneous images with no specific model of their semantic contents. The prototype systems of Kato (1992), followed by the availability of the QBIC commercial system using several types of similarities (Flickner et al., 1995), contributed to making this idea more and more popular.

Typically, a system for retrieval by similarity rests on three components:

• Extraction of features or image signatures from the images, and an efficient representation and storage strategy for this precomputed data.
• A set of similarity measures, each of which captures some perceptively meaningful definition of similarity and should be efficiently computable when matching an example with the whole database.
• A user interface for the choice of which definition of similarity should be applied for retrieval, presentation of retrieved images, and supporting relevance feedback.

The research in the area has made the following evident:

• A large number of meaningful types of similarity can be defined. Only some of these definitions are associated with efficient feature extraction mechanisms and (dis)similarity measures.
• Since there are many definitions of similarity and the discriminating power of each of the measures is likely to degrade significantly for large image databases, the user interaction and the feature storage strategy components of the systems will play an important role.
• Visual content-based retrieval is best used when combined with the traditional search, both at the user interface and at the system level. The basic reason for this is that content-based retrieval is not seen as a replacement of parametric (SQL), text, and keyword search. The key is to apply content-based retrieval where appropriate, which is typically where the use of text and keywords is suboptimal. Examples of such applications are where visual appearance (e.g., color, texture, and motion) is the primary attribute, as in stock photo/video, art, etc.

A concept of similarity is inherently present in stereo matching. In a stereo matching setup, shots of a given static scene are captured from different viewpoints and the resulting images differ slightly due to the effect of perspective projection. The following features distinguish stereo matching from image matching in general:

• The important differences in the stereo images result from the different viewpoints, and not, for example, from changes in the scene. We therefore seek a match between two images, as opposed to a match between an image and an abstract model (although the latter may be an important step in determining the former).
• Most of the significant changes will occur in the appearance of nearby objects and in occlusions. Additional changes in both geometry and photometry can be introduced in the film development and scanning steps, but can usually be avoided by careful processing. If the images are recorded at very different times, there may be significant lighting effects.
• Modeling based on stereo matching generally requires that, ultimately, dense grids of points are matched.

Ideally, we would like to find the correspondences (i.e., matched locations) of every individual pixel in both images of a stereo pair. However, it is obvious that the information content in the intensity value of a single pixel is too low for unambiguous matching. In practice, therefore, coherent collections of pixels are matched. Matching is complicated by several factors related to the geometry of the stereo images. Some areas that are visible in one image may be occluded in the other, for instance, and this can lead to incorrect matches.
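The idea of matching coherent collections of pixels rather than single pixels can be sketched as a window-based search along one scanline. The following is a minimal illustration, not the method evaluated later in the chapter; it assumes a rectified stereo pair (so that corresponding points lie on the same row), a non-negative disparity, and the sum of absolute differences as the window cost. The function names and the toy scanlines are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long windows."""
    return sum(abs(u - v) for u, v in zip(a, b))


def match_scanline(left, right, window, max_disparity):
    """For each window start i in `left`, return the shift d in [0, max_disparity]
    for which right[i - d : i - d + window] has the lowest SAD cost."""
    disparities = []
    for i in range(len(left) - window + 1):
        patch = left[i:i + window]
        best_d, best_cost = 0, float("inf")
        for d in range(max_disparity + 1):
            j = i - d
            if j < 0:  # candidate window would fall outside the right scanline
                break
            cost = sad(patch, right[j:j + window])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# The same intensity bump appears two samples further right in the left scanline.
right = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
left = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
disparities = match_scanline(left, right, window=3, max_disparity=3)
print(disparities)
```

Windows that cover the bump recover the shift of two samples, whereas windows over the flat background match equally well at several shifts, which is the ambiguity of low-information regions described above.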
Periodic structures in the scene can cause a stereo matcher to confuse a feature in one image with features from nearby parts of the structure in the other image, especially if the image features generated by these structures are close together compared with the disparity of the features. If there is a large amount of relief in the scene (e.g., a vertical obstruction that projects above the ground plane in an aerial view), the corresponding features may be reversed in their positions in the two stereo images.

Similarity is also present in a video sequence where motion is the main characterizing element. Here the frames differ slightly due to a change in the relative position of spatial entities in the sequence or to camera movement. Methods that compute an approximate estimation of motion follow two approaches. One method takes into account temporal changes of gray-level primitives, from one frame to the following one, and computes a dense flow, usually at every pixel of the image. The other method is based on the extraction of a set of sparse characteristic features of the objects, such as corners or salient points, and their tracking in subsequent frames. Once interframe correspondence is established, and constraints are formulated on object rigidity, motion components are obtained by solving a set of nonlinear equations (Aggarwal and Nandhakumar, 1988).

Gudivada and Raghavan (1995) listed different possible types of similarity for retrieval: color similarity, texture similarity, shape similarity, spatial similarity, etc. Some of these types can be considered as global image characteristics or only in some parts of the image, and can be considered independently of scale or angle, depending on whether we are interested in the scene represented by the image or in the image itself.

Representation of features of images, such as color, texture, shape, and motion, is a fundamental problem in visual information retrieval. Image analysis and pattern recognition algorithms provide the means to extract numeric descriptors that give a quantitative measure of these features. Computer vision enables object and motion identification by comparing extracted patterns with predefined models.

B. Distance Metric as Similarity Measurement

In many science and engineering fields, the similarity between two features is determined by computing the distance between them using a certain distance metric. In computer vision as well as some other applications, the Euclidean distance or SSD (L2, the sum of the squared differences) is one of the most widely used metrics. However, it has been suggested that it is not appropriate for many problems (Zakai, 1964).
From a maximum likelihood perspective, it is well known that the SSD is justified when the feature data distribution is Gaussian (Sebe et al., 2000), while the Manhattan distance or SAD (L1, the sum of the absolute differences), another commonly used metric, is justified when the feature data distribution is Exponential (double or two-sided exponential). Therefore, which metric to use can be determined if the underlying data distribution is known or well estimated. The common assumption is that the real distribution should fit either the Gaussian or the Exponential. However, in many applications this assumption is invalid. Finding a suitable distance metric becomes a challenging problem when the underlying distribution is unknown and could be neither Gaussian nor Exponential.

In content-based image retrieval (Smeulders et al., 2000), feature elements are extracted for different statistical properties associated with entire digital images, or perhaps with a specific region of interest. The heterogeneous sources suggest that the elements may be from different distributions. In previous work, most of the attention focused on extracting low-level feature elements such as color-histogram (Swain and Ballard, 1991), wavelet-based texture (Haralick et al., 1973; Smith and Chang, 1994), and shape (Mehtre et al., 1997), with little or no consideration of their distributions. The most commonly used method for calculating the similarity between two feature vectors is still to compare the Euclidean distance between them.

Although some work has been done to utilize the data model in similarity estimation for image retrieval (Tian et al., 2004; Amores et al., 2006; Sebe et al., 2000), the relation between the distribution model and the distance metric has not been fully studied yet. It has been justified that Gaussian, Exponential, and Cauchy distributions result in L2, L1, and Cauchy metrics, respectively. However, distance metrics that fit other distribution models have not been studied yet. The similarity estimation based on feature elements from unknown distributions is an even more difficult problem. Here, based on our previous work (Tian et al., 2004; Amores et al., 2006), we propose a guideline to learn a robust distance metric for accurate similarity estimation.

The rest of this article is organized as follows. Section II presents a distance metric analysis using the maximum likelihood approach. Section III describes the boosted distance metric. In Section IV we apply the new distance metrics to estimate the similarity in a stereo matching application, motion tracking in a video sequence, and content-based image retrieval. Discussion and conclusions are given in Section V.
II. DISTANCE METRIC ANALYSIS

A. Maximum Likelihood Approach

The additive model is a widely used model in computer vision regarding maximum likelihood estimation. Haralick and Shapiro (1993) consider this model in defining the M-estimate: “Any estimate μ defined by a minimization problem of the form $\min \sum_i f(x_i - \mu)$ is called an M-estimate.” Note that the operation “−” between the estimate and the real data implies an additive model. The variable μ is either the estimated mean of a distribution or, for simplicity, one of the samples from that distribution.

Maximum likelihood theory (Haralick and Shapiro, 1993) allows us to relate a data distribution to a distance metric. From the mathematical–statistical point of view, the problem of finding the right measure for the distance comes down to the maximization of the similarity probability. We use image retrieval as an example for illustration. Consider first two subsets of N images from the database (D): X ⊂ D, Y ⊂ D, which according to the ground truth are similar:

\[ X \equiv Y \quad \text{or} \quad x_i \equiv y_i, \quad i = 1, \ldots, N \tag{1} \]
where $x_i \in X$, $y_i \in Y$ represent the images from the corresponding subsets. Equation (1) can be rewritten as

\[ x_i = y_i + d_i, \quad i = 1, \ldots, N \tag{2} \]
where $d_i$ represents the “distance” image obtained as the difference between images $x_i$ and $y_i$. In this context the similarity probability between two sets of images X and Y can be defined as

\[ P(X, Y) = \prod_{i=1}^{N} p(x_i, y_i) \tag{3} \]
where p(x, y) is the probability density function between images x and y. Independence across images is assumed. We define

\[ f(x_i, y_i) = -\log p(x_i, y_i). \tag{4} \]
Then Eq. (3) becomes

\[ P(X, Y) = \prod_{i=1}^{N} \exp\left[-f(x_i, y_i)\right] \tag{5} \]
where the function f is the negative logarithm of the probability density function of images x and y. According to Eq. (5) we have to find the function f that maximizes the similarity probability. This is the maximum likelihood estimator for X, given Y (Haralick and Shapiro, 1993). In the above considerations, we are talking about images, but this notion can be extended to feature vectors associated with the images when we are working with image features, or can even be extended to pixel values in the images. Taking the logarithm of Eq. (5), we find that we have to minimize the expression

\[ \sum_{i=1}^{N} f(x_i, y_i). \tag{6} \]
In this case, according to Eq. (4), the function f does not depend individually on its two arguments, the query image $x_i$ and the predicted one $y_i$, but only on their difference. We thus have a local estimator and we can use $f(d_i)$ instead of $f(x_i, y_i)$, where $d_i = x_i - y_i$ and the operation “−” denotes a pixel-by-pixel difference between the images, or an equivalent operation in feature space. Therefore, minimizing Eq. (6) is equivalent to minimizing

\[ \sum_{i=1}^{N} f(d_i). \tag{7} \]
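To make Eq. (7) concrete, the sketch below writes out the metric that results from taking f as the negative log-density of three error models, with additive constants dropped. The Gaussian and two-sided Exponential cases give the sum of squared and absolute differences; the Cauchy form with scale parameter `a` is the usual negative logarithm of a Cauchy density and is stated here as an assumption, since this section does not write it out. Function names and the toy vectors are hypothetical.

```python
import math


def l2_metric(x, y):
    """Gaussian errors: f(d) is proportional to d^2, giving the SSD."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def l1_metric(x, y):
    """Two-sided Exponential errors: f(d) is proportional to |d|, giving the SAD."""
    return sum(abs(u - v) for u, v in zip(x, y))


def cauchy_metric(x, y, a=1.0):
    """Cauchy errors with assumed scale a: f(d) = log(1 + (d / a)^2)."""
    return sum(math.log(1.0 + ((u - v) / a) ** 2) for u, v in zip(x, y))


query = [0.0, 0.0, 0.0, 0.0]
cand_a = [2.0, 2.0, 2.0, 2.0]   # moderate error in every component
cand_b = [0.0, 0.0, 0.0, 5.0]   # exact match except for one large error

for name, metric in (("L2", l2_metric), ("L1", l1_metric), ("Cauchy", cauchy_metric)):
    print(name, metric(query, cand_a), metric(query, cand_b))
```

The three metrics rank the two candidates differently: L2 prefers the candidate with small errors everywhere, while L1 and the Cauchy metric prefer the candidate with a single large error, which is the practical consequence of the heavier tails of the corresponding distributions.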
Maximum likelihood estimation shows a direct relation between the data distribution and the comparison metric. It can be noted that the Gaussian model is related to the L2 metric, the Exponential model to the L1 metric, and the Cauchy model to the Cauchy metric (Tian et al., 2004; Sebe et al., 2000).

B. Distance Metric Analysis

The Gaussian, Exponential, and Cauchy distribution models result in the L2 metric, L1 metric, and Cauchy metric, respectively (Sebe et al., 2000). It is reasonable to assume that there may be other distance metrics that fit the unknown real distribution better. More accurate similarity estimation is expected if the metric could reflect the real distribution. We call this problem of finding the best distance metric distance metric analysis. It can be mathematically formulated as follows. Suppose we have observations

\[ x_i = \mu + d_i \tag{8} \]
where $d_i$, i = 1, . . . , N, are data components and μ is the distribution mean or a sample from the same class if it is considered as the center of a subclass from a locality point of view. In most cases μ is unknown and may be approximated for similarity estimation. For some function

\[ f(x, \mu) \ge 0 \tag{9} \]
which satisfies the condition f(μ, μ) = 0, μ can be estimated by $\hat{\mu}$, which minimizes

\[ \varepsilon = \sum_{i=1}^{N} f(x_i, \hat{\mu}). \tag{10} \]
This is equivalent to satisfying

\[ \frac{d}{d\hat{\mu}} \sum_{i=1}^{N} f(x_i, \hat{\mu}) = 0. \tag{11} \]
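As a worked check of Eq. (11), the condition can be solved by hand for two choices of f. The squared difference and the weighted relative error used below are the arithmetic and harmonic entries of Table 1; the intermediate steps are filled in here for illustration.

```latex
\[
  f(x_i,\hat{\mu}) = (x_i-\hat{\mu})^2:\qquad
  \frac{d}{d\hat{\mu}}\sum_{i=1}^{N}(x_i-\hat{\mu})^2
  = -2\sum_{i=1}^{N}(x_i-\hat{\mu}) = 0
  \;\Longrightarrow\;
  \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}x_i
\]
\[
  f(x_i,\hat{\mu}) = x_i\left(\frac{\hat{\mu}}{x_i}-1\right)^{2}:\qquad
  \frac{d}{d\hat{\mu}}\sum_{i=1}^{N}x_i\left(\frac{\hat{\mu}}{x_i}-1\right)^{2}
  = 2\sum_{i=1}^{N}\left(\frac{\hat{\mu}}{x_i}-1\right) = 0
  \;\Longrightarrow\;
  \hat{\mu} = \frac{N}{\sum_{i=1}^{N}\frac{1}{x_i}}
\]
```

The first case recovers the arithmetic mean and the second the harmonic mean.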
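TABLE 1
DISTANCE METRICS AND MEAN ESTIMATION FOR DIFFERENT DISTRIBUTIONS

| Mean | Distance metric | Mean estimation |
|---|---|---|
| Arithmetic | $\varepsilon = \sum_{i=1}^{N} (x_i - \hat{\mu})^2$ | $\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$ |
| Median | $\varepsilon = \sum_{i=1}^{N} \lvert x_i - \hat{\mu} \rvert$ | $\hat{\mu} = \mathrm{med}(x_1, \ldots, x_N)$ |
| Harmonic | $\varepsilon = \sum_{i=1}^{N} x_i \left( \frac{\hat{\mu}}{x_i} - 1 \right)^2$ | $\hat{\mu} = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}}$ |
| Geometric | $\varepsilon = \sum_{i=1}^{N} \left[ \log \frac{x_i}{\hat{\mu}} \right]^2$ | $\hat{\mu} = \left( \prod_{i=1}^{N} x_i \right)^{1/N}$ |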
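The correspondence in Table 1 can be checked numerically. The sketch below computes the four estimates for a small sample containing one gross outlier and verifies that each estimate is a local minimizer of its own cost ε. The sample values, the perturbation size, and all names are assumptions chosen for illustration.

```python
import math
from statistics import median


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Per-sample distance functions f(x, mu) from Table 1 (positive data assumed).
COSTS = {
    "arithmetic": lambda x, m: (x - m) ** 2,
    "median": lambda x, m: abs(x - m),
    "harmonic": lambda x, m: x * (m / x - 1.0) ** 2,
    "geometric": lambda x, m: math.log(x / m) ** 2,
}


def total_cost(name, xs, m):
    """epsilon of Eq. (10) for the named distance function."""
    return sum(COSTS[name](x, m) for x in xs)


xs = [8.0, 9.0, 10.0, 11.0, 100.0]  # four consistent samples and one outlier
estimates = {
    "arithmetic": arithmetic_mean(xs),
    "median": median(xs),
    "harmonic": harmonic_mean(xs),
    "geometric": geometric_mean(xs),
}
for name, m in estimates.items():
    print(f"{name:10s} estimate = {m:.3f}  cost = {total_cost(name, xs, m):.3f}")
```

With this sample the harmonic and geometric estimates stay much closer to the bulk of the data than the arithmetic mean does, which is the robustness property discussed in the text.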
For some specific distributions, the estimated mean $\hat{\mu} = g(x_1, x_2, \ldots, x_N)$ has a closed-form solution. The arithmetic mean, median, harmonic mean, and geometric mean in Table 1 are in that category. It is well known that the L2 metric (or SSD) corresponds to the arithmetic mean while the L1 metric (or SAD) corresponds to the median. However, no literature has discussed the distance metrics associated with the distribution models that imply the harmonic mean or the geometric mean. Those metrics in Table 1 are inferred using Eq. (11). Figure 1a illustrates the difference among the distance functions $f(x, \hat{\mu})$ for the arithmetic mean, median, harmonic mean, and geometric mean. For fair comparison the value of μ is set to 10 for all distributions. We found that in distributions associated with the harmonic and geometric estimations, the observations that are far from the correct estimate (μ) contribute less in producing $\hat{\mu}$ than they do for the arithmetic mean. In that case the estimated values are less sensitive to bad observations (i.e., observations with a large variance), and they are therefore more robust.

C. Generalized Distance Metric Analysis

The robust property of the harmonic and geometric distance metrics motivates us to generalize them and come up with new metrics that may fit the distribution better. Three families of distance metrics in Table 2 are derived from the generalized mean estimation using Eq. (10). The parameters p, q, r define the specific distance metrics and describe the corresponding distribution models that may not be explicitly formulated as Gaussian and Exponential.
FIGURE 1. The distance function $f(x, \mu)$ of (a) the arithmetic mean, median, harmonic mean, and geometric mean, (b) the first type generalized harmonic mean, (c) the second type generalized harmonic mean, and (d) the generalized geometric mean ($\mu$ is fixed and set to 10).
TABLE 2. Generalized Distance Metrics

Distance family | Distance metric | Estimation | Comments
Generalized harmonic mean (first type) | $\varepsilon = \sum_{i=1}^{N} (x_i)^p \left( \frac{\hat{\mu}}{x_i} - 1 \right)^2$ | $\hat{\mu} = \frac{\sum_{i=1}^{N} (x_i)^{p-1}}{\sum_{i=1}^{N} (x_i)^{p-2}}$ | p real number; p = 1: harmonic mean; p = 2: arithmetic mean
Generalized harmonic mean (second type) | $\varepsilon = \sum_{i=1}^{N} \left[ (x_i)^q - (\hat{\mu})^q \right]^2$ | $\hat{\mu} = \left( \frac{N}{\sum_{i=1}^{N} (x_i)^q} \right)^{-1/q}$ | q real number (q ≠ 0); q = −1: harmonic mean; q = 1: arithmetic mean
Generalized geometric mean | $\varepsilon = \sum_{i=1}^{N} \left[ (x_i)^r \log \frac{x_i}{\hat{\mu}} \right]^2$ | $\hat{\mu} = \left[ \prod_{i=1}^{N} (x_i)^{(x_i)^{2r}} \right]^{1 / \sum_{i=1}^{N} (x_i)^{2r}}$ | r real number; r = 0: geometric mean

In the generalized harmonic mean estimation, the first type is generalized from the distance metric representation, while the second type is generalized from the estimation representation. If p = 1 and q = −1, both types reduce to the ordinary harmonic mean, and if p = 2 and q = 1, both types reduce to the arithmetic mean. For the generalized geometric mean estimation, r = 0 gives the ordinary geometric mean. The generalized metrics therefore correspond to a wide range of mean estimations and distribution models. Figure 1b–d shows the distance metric function $f(x, \hat{\mu})$ corresponding to the first type and second type generalized harmonic mean, and the generalized geometric mean estimation, respectively. It should be noted that not all mean estimations have a closed-form solution as in Tables 1 and 2. In that case $\hat{\mu}$ can be estimated by numerical analysis, for example, a greedy search for the $\hat{\mu}$ that minimizes $\varepsilon$.
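The generalized estimators and the numerical fallback can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the observations are assumed strictly positive, and a golden-section search stands in for the greedy search mentioned in the text (it assumes the error is unimodal on the search interval).

```python
import math


def first_type_gh_mean(xs, p):
    """First-type generalized harmonic mean: sum(x^(p-1)) / sum(x^(p-2))."""
    return sum(x ** (p - 1) for x in xs) / sum(x ** (p - 2) for x in xs)


def second_type_gh_mean(xs, q):
    """Second-type generalized harmonic mean: (mean of x^q)^(1/q), q != 0."""
    if q == 0:
        raise ValueError("q must be nonzero")
    return (sum(x ** q for x in xs) / len(xs)) ** (1.0 / q)


def generalized_geometric_mean(xs, r):
    """Generalized geometric mean: exp of the x^(2r)-weighted mean of log x."""
    weights = [x ** (2 * r) for x in xs]
    weighted_logs = sum(w * math.log(x) for w, x in zip(weights, xs))
    return math.exp(weighted_logs / sum(weights))


def first_type_error(xs, mu, p):
    """First-type distance metric: sum of x^p * (mu/x - 1)^2."""
    return sum(x ** p * (mu / x - 1.0) ** 2 for x in xs)


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimizer of a unimodal function f on [lo, hi] (assumed stand-in
    for the greedy search; not taken from the source)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0
```

For example, with `xs = [2.0, 4.0, 8.0]`, calling `golden_section_min(lambda mu: first_type_error(xs, mu, 1), 2.0, 8.0)` should agree with `first_type_gh_mean(xs, 1)` up to the search tolerance, which gives a quick consistency check between a metric and its closed-form estimator.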
III. BOOSTING DISTANCE METRICS FOR SIMILARITY ESTIMATION

A. Motivation

As mentioned in Section I, the most commonly used distance metric is the Euclidean distance, which assumes that the data have a Gaussian isotropic distribution. When the feature space has a large number of dimensions, an isotropic assumption is often inappropriate. Besides, the feature elements are often extracted by different statistical approaches, so their distributions may not be the same, and a different distance metric may better reflect each distribution. Thus, an anisotropic and heterogeneous distance metric may be more suitable for estimating the similarity between features. The Mahalanobis distance $(x_i - y_i)^T W (x_i - y_i)$ is one of the traditional anisotropic distances. It tries to find the optimal estimation of the weight matrix W. It is worth noting that it assumes the underlying distribution is Gaussian, which is often not true. Furthermore, if n is the number of dimensions, the matrix W contains $n^2$ parameters to be estimated, and the estimate may not be robust when the training set is small compared to the number of dimensions. Classical techniques such as principal component analysis (PCA) (Jolliffe, 2002) or linear discriminant analysis (LDA) (Duda et al., 2001) may
be applied to reduce the dimensions. However, these methods cannot solve the problems of a small training set and they also assume a Gaussian distribution.

B. Boosted Distance Metrics

Based on the analysis in Section III.A, we propose a boosted distance metric for similarity estimation, in which the similarity function for a certain class of samples is estimated by a generalization of different distance metrics on selected feature elements. In particular, we use AdaBoost with decision stumps (Schapire and Singer, 1999) and our distance metric analysis to estimate the similarity. Given a training set with feature vectors $x_i$, the similarity estimation is done by training AdaBoost with differences d between vectors $x_i$ and $x_j$,¹ where each difference vector d has an associated label $l_d$:

$$l_d = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are from the same class} \\ 0 & \text{otherwise.} \end{cases} \qquad (12)$$

A weak classifier is defined by a distance metric m on a feature element f with estimated parameter(s) $\theta$, which could be as simple as the mean and/or a threshold. The label prediction of the weak classifier on feature difference d is $h_{m,f,\theta}(d) \in \{0, 1\}$. The boosted distance metric H(d) is learned by weighted training with different distance metrics on each feature element and by iteratively selecting the most important feature elements for similarity estimation. Consequently we derive a predicted similarity S(x, y) = H(x − y) that is optimal in a classification context. The brief algorithm is listed below. Please note that the resulting similarity S(x, y) may not be a true metric, but this is not necessarily a disadvantage. Indeed, nonmetric distances can be more accurate for comparing complex objects, as has been studied recently (Jacobs et al., 2000).
¹ The difference d between vectors $x_i$ and $x_j$ can be measured by different metrics, for example, Euclidean distance $d = |x_i - x_j|^2$ and Manhattan distance $d = |x_i - x_j|$.

Boosting Distance Metric Algorithm

Given:
  A pairwise difference vector set D and the corresponding labels L
  Number of iterations T
  Weak classifiers based on each distance metric m for each feature element f
Initialization: weight $w_{i,t=1} = 1/|D|$
Boosting: For t = 1, ..., T
  • Train the weak classifiers on the weighted sample set
  • Select the best weak classifier giving the smallest error rate
    $\varepsilon_t = \min_{m,f,\theta} \sum_i w_{i,t} \, |h_{m,f,\theta}(d_i) - l_i|$
  • Let $h_t = h_{m_t,f_t,\theta_t}$ with $m_t, f_t, \theta_t$ minimizing the error rate
  • Compute the weights of the classifiers ($\alpha_t$) based on the classification error rate $\varepsilon_t$
    Let $\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}$, $\alpha_t = \log\left(\frac{1}{\beta_t}\right)$
  • Update and normalize the weight for each sample
    $w_{i,t+1} = w_{i,t} \, \beta_t^{\,1 - |h_{t,i} - l_i|}$
    $w_{i,t+1} = w_{i,t+1} / \sum_i w_{i,t+1}$
end for t
Final prediction $H(d) = \sum_t \alpha_t h_t(d)$

The proposed method has three main advantages: (1) the similarity estimation uses only a small set of elements that is most useful for similarity estimation; (2) for each element the distance metric that best fits its distribution is learned; and (3) it adds effectiveness and robustness to the classifier when we have a small training set compared to the number of dimensions. Since the number of training iterations T is usually much smaller than the original data dimension, the boosted distance metric works as a nonlinear dimension reduction technique, similar to Viola and Jones (2004), which keeps the elements most important for similarity judgment. It could be very helpful in overcoming the small sample set problem. It is worth mentioning that the proposed method is general and can be plugged into many similarity estimation techniques, such as the widely used K-NN (Cover and Hart, 1968). Compared with other distance metrics proposed for K-NN, the boosted similarity is especially suitable when the training set is small. Two factors contribute to this. First, if N is the size of the original training set, this is augmented by using a new training set with $O(N^2)$ relations between vectors. This makes AdaBoost more robust against overfitting. Second, AdaBoost complements K-NN by providing an optimal similarity. Increasing the effectiveness for small training sets is necessary in many real classification
problems, and in particular it is necessary in applications such as retrieval where the user provides a small training set online.

C. Related Work

We note that there have been several works on estimating the distance to solve certain pattern recognition problems. Domeniconi et al. (2002) and Peng et al. (2001) propose specific estimations designed for the K-NN classifier. They obtain an anisotropic distance based on local neighborhoods that are narrower along relevant dimensions and more elongated along nonrelevant ones. Xing et al. (2003) propose estimating the matrix W of a Mahalanobis distance by solving a convex optimization problem. They apply the resulting distance to improve the K-means behavior. Bar-Hillel et al. (2003) also use a weight matrix W to estimate the distance by relevant component analysis (RCA). They improve the Gaussian mixture EM algorithm by applying the estimated distance along with equivalence constraints.

The work by Hertz et al. (2004) resembles the boosting part of our method, although it is conceptually different. They use AdaBoost to estimate a distance function in a product space (with pairs of vectors), while the weak classifier minimizes an error in the original feature space. Therefore, the weak classifier minimizes a different error than the one minimized by the strong classifier AdaBoost. In contrast, our framework utilizes AdaBoost with weak classifiers that minimize the same error as AdaBoost, and in the same space. Apart from this conceptual difference, Hertz et al. (2004) use Expectation–Maximization of a Gaussian mixture as a weak classifier, where they assume the data have a Gaussian mixture distribution and estimate several covariance matrices, which may not work well when the real distribution is not Gaussian or the training set is not much larger than the dimensionality of the data.
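The boosting procedure of Section III.B can be sketched in a few lines of Python. This is a simplified illustration under explicit assumptions rather than the authors' implementation: the difference metric is fixed to the element-wise absolute difference, the only weak classifier is a threshold stump that predicts "same class" when the difference on one feature element is at most theta, the error rate is clipped away from 0 and 1 to keep the logarithm finite, and all function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_differences(X, y):
    """Build difference vectors d = |x_i - x_j| and labels l_d as in Eq. (12)."""
    D, L = [], []
    for i, j in combinations(range(len(X)), 2):
        D.append([abs(a - b) for a, b in zip(X[i], X[j])])
        L.append(1 if y[i] == y[j] else 0)
    return D, L


def stump(d, f, theta):
    """Weak classifier: 1 (same class) if the difference on element f is small."""
    return 1 if d[f] <= theta else 0


def best_stump(D, L, w):
    """Exhaustively pick (error, f, theta) with the smallest weighted error."""
    best = None
    for f in range(len(D[0])):
        for theta in sorted({d[f] for d in D}):
            err = sum(wi for d, l, wi in zip(D, L, w) if stump(d, f, theta) != l)
            if best is None or err < best[0]:
                best = (err, f, theta)
    return best


def train_boosted_metric(D, L, T):
    """Return a list of (alpha_t, f_t, theta_t) after T boosting rounds."""
    w = [1.0 / len(D)] * len(D)
    model = []
    for _ in range(T):
        err, f, theta = best_stump(D, L, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # assumption: clip for stability
        beta = err / (1.0 - err)
        model.append((math.log(1.0 / beta), f, theta))
        # Correctly classified samples are multiplied by beta; others are kept.
        w = [wi * (beta if stump(d, f, theta) == l else 1.0)
             for d, l, wi in zip(D, L, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_similarity(model, x, y):
    """S(x, y) = H(d) = sum_t alpha_t * h_t(d) with d = |x - y|."""
    d = [abs(a - b) for a, b in zip(x, y)]
    return sum(alpha * stump(d, f, theta) for alpha, f, theta in model)
```

The exhaustive threshold search costs on the order of |D|² evaluations per feature element and round, which is acceptable for a sketch but would need sorting-based stump training on larger pair sets.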
IV. EXPERIMENTS AND ANALYSIS

A. Distance Metric Analysis in Stereo Matching

Stereo matching is the process of finding correspondences between entities in images with overlapping scene content. The images are typically taken from cameras at different viewpoints, which implies that the intensity of corresponding pixels may not be the same. In stereo datasets the ground truth for matching corresponding points may be provided by the laboratory where these images are taken. This ground truth is a result of mapping the world coordinates, in which the camera is moving, to the image coordinates, using the three-dimensional (3D) geometry
relations of the scene. In this case an automatic stereo matcher, which detects the corresponding point pairs registered in stereo images of the test set scenes, can be tested. For this stereo matcher it is possible to determine the best metric when comparing different image regions to find the similar ones. The optimum metric in this case will give the most accurate stereo matcher.

We use two standard stereo datasets (the Castle set and the Tower set) provided by Carnegie Mellon University. These datasets contain multiple images of static scenes with accurate information about object locations in 3D. The images are taken with a scientific camera in an indoor setting, the Calibrated Imaging Laboratory at CMU. The 3D locations are given in X–Y–Z coordinates with a simple text description (at best accurate to 0.3 mm), and the corresponding image coordinates (the ground truth) are provided for all 11 images taken for each scene. For each image there are 28 points provided as ground truth in the Castle set and 18 points in the Tower set. An example of two stereo images from the Castle dataset is given in Figure 2.

In each of the images we consider the points given by the ground truth, and we want to find the similarity estimation that ensures the best accuracy in finding the corresponding points according to the ground truth. We cannot use single-pixel information but have to use a region around each point, so we perform template matching. Our automatic stereo matcher matches a template defined around one point from an image with the templates around points in the other images to find similar ones. If the resulting points are equivalent to those provided by the ground truth we consider that we have a hit; otherwise, we have a miss. The accuracy is given by the number of hits divided by the number of possible hits (the number of corresponding point pairs).
Because the ground truth is provided with subpixel accuracy we consider that we have a hit when the corresponding point lies in the neighborhood of one pixel around the point provided by the ground truth.
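The accuracy measure just described can be written as a small Python helper. This is a sketch under one assumption that the text does not settle: the "neighborhood of one pixel" is interpreted as a Chebyshev distance of at most one pixel (each coordinate may differ by at most one). The function name and argument layout are hypothetical.

```python
def hit_rate(matched, ground_truth, tolerance=1.0):
    """Fraction of matched points lying within `tolerance` pixels of the
    ground truth point on both axes (assumed neighborhood definition)."""
    if len(matched) != len(ground_truth):
        raise ValueError("point lists must have the same length")
    hits = sum(
        1
        for (mr, mc), (gr, gc) in zip(matched, ground_truth)
        if max(abs(mr - gr), abs(mc - gc)) <= tolerance
    )
    return hits / len(ground_truth)
```

Matched points are integer pixel positions, while the ground truth may carry subpixel coordinates, which is why a tolerance is used instead of an exact comparison.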
FIGURE 2. A stereo image pair from the Castle dataset.
We investigate the influence of similarity measurement on the CMU stereo dataset and another stereo pair from the research literature. Our intention is to try distance measures other than SSD, that is, L2 (which is used in the original algorithms), in calculating the disparity map. The algorithm is as follows:

1. Obtain the ground truth similarity distance distribution. A template of size 5 × 5 is applied around each ground truth point (i.e., 28 points for each image), and the real distance is obtained by calculating the difference of pixel intensities within the template between sequential frames, that is, between frame 2 and frame 1, frame 3 and frame 2, and so on.
2. Obtain the estimated similarity using distance metric analysis:
   a. Given the 28 ground truth points in one frame, say frame k, template matching centered at a ground truth point is applied to find its corresponding point in frame k + 1.
      i. To find the corresponding point in frame k + 1, we search a band centered at the row coordinate of the pixel provided by the test frame k, with a height of seven pixels and a width equal to the image dimension. The template size is 5 × 5.
      ii. The corresponding point is the one that minimizes the distance defined by the distance metric. For example, the distance under the L1 metric is the summed absolute difference between the intensity of each pixel in the template and that in the searching area, that is, $\sum_{i=1}^{25} |x_{i,k+1} - x_{i,k}|$, and the distance under L2 is the summed squared difference between each pixel intensity in the template and that in the searching area, that is, $\sum_{i=1}^{25} (x_{i,k+1} - x_{i,k})^2$. For other distance metrics $\varepsilon$, see Tables 1 and 2.
   b. Apply the template centered at the ground truth point in frame k and its tracked point in frame k + 1 to calculate the pixel intensity difference as the estimated similarity measurement.
3. Apply the Chi-square test (Huber, 1981). The estimated distance and the real distance are compared using the Chi-square test. For a parameterized metric we choose the parameter value that minimizes the Chi-square test value. As a first attempt, the parameters p, q, and r are tested in the range of −5 to 5 with step size 0.1.

Figure 3 shows the real distance distribution and the estimated distance distribution for the distance metrics on the Castle dataset. Both the solid and dashed curves are sampled with 233 points at equal intervals. The Chi-square test value is shown for each metric in Table 3. The smaller the Chi-square test value, the closer the estimation is to the real distribution.
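Steps 2 and 3 can be sketched as follows. The code is a minimal pure-Python illustration, not the authors' matcher: images are assumed to be lists of rows of intensities, only the L1 and L2 distances spelled out in the text are implemented, the function names are hypothetical, and the Chi-square statistic is assumed to take the common Pearson form over histogram bins, since the text does not give its exact definition.

```python
def l1_distance(a, b):
    """Summed absolute difference (SAD) between two flattened templates."""
    return sum(abs(p - q) for p, q in zip(a, b))


def l2_distance(a, b):
    """Summed squared difference (SSD) between two flattened templates."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def window(img, r, c, half=2):
    """Flatten the (2*half+1) x (2*half+1) window centred at (r, c).
    The caller must ensure the window lies inside the image."""
    return [img[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def match_point(frame_k, frame_k1, r, c, distance, half=2, band_height=7):
    """Search a band of `band_height` rows centred on row r, across the full
    image width, and return (distance, row, col) of the best match."""
    rows, cols = len(frame_k1), len(frame_k1[0])
    template = window(frame_k, r, c, half)
    best = None
    for rr in range(r - band_height // 2, r + band_height // 2 + 1):
        if rr - half < 0 or rr + half >= rows:
            continue  # the template would leave the image
        for cc in range(half, cols - half):
            d = distance(window(frame_k1, rr, cc, half), template)
            if best is None or d < best[0]:
                best = (d, rr, cc)
    return best


def chi_square(observed, expected):
    """Pearson-style statistic between two histograms (assumed form);
    bins with zero expected count are skipped."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

Passing the distance as a function keeps the search loop unchanged when other metrics are substituted, so a parameter sweep only needs to rebuild the distance and recompute the Chi-square value for each candidate parameter.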
FIGURE 3. The real distance distribution (dashed line) vs. the estimated distance distribution (solid line) for the Castle dataset.
TABLE 3. Accuracy (Percent) of the Stereo Matcher on the Castle Set (a)

Distance metric | Chi-square test | Hit rate (%)
L1 | 0.0366 | 78.2
L2 | 0.0378 | 78.2
Cauchy (Sebe et al., 2000) | 0.0295 (a = 17) | 78.9 (a = 17)
Harmonic mean | 0.0273 | 78.2
Geometric mean | 0.0378 | 77.1
First type generalized harmonic mean (firstgh) | 0.0328 (p = 1.5) | 78.2 (p = 1.5)
Second type generalized harmonic mean (secondgh) | 0.0272 (q = 1.6) | 78.6 (q = 1.6)
Generalized geometric mean (gg) | 0.0239 (r = 1.5) | 80.4 (r = 1.5)
Best metric | gg (r = 1.5) | 80.4

(a) The best parameter is shown.
The generalized geometric mean metric has the best fit to the measured distance distribution. Therefore, the accuracy should be greatest when using the generalized geometric mean metric (Table 3). Indeed, the hit rate for the generalized geometric mean (r = 1.5) is 80.4%, the hit rate for the Cauchy metric is 78.9%, and the hit rates obtained with L1 and L2 are both 78.2%. The Cauchy metric thus performs better than both L1 and L2. It should be noted that the ordering by Chi-square test score is not exactly the same as the ordering by hit rate, although the winner is the same in both cases. This is because the ground truth is provided with subpixel accuracy during the data collection process, and we count a hit when the corresponding point lies in the neighborhood of one pixel around the point provided by the ground truth. The inconsistency introduced by this rounding may explain why the two measures do not rank the metrics in exactly the same order. Similar results are obtained for the Tower set and are not shown here.

To evaluate the performance of the stereo matching algorithm under difficult matching conditions, we also use the ROBOTS stereo pair (Lew et al., 1994). This stereo pair is more difficult because of varying levels of depth and occlusions (Figure 4). For this stereo pair, the ground truth consists of 1276 point pairs, with one-pixel accuracy. Consider a point in the left image given by the ground truth. The disparity map gives the displacement of the corresponding point position in the right image. The accuracy is given by the percentage of pixels in the test set that the algorithm matches correctly. Table 4 shows the accuracy of the algorithms when different distance metrics are used. Note that the accuracy is lower on the ROBOTS stereo pair, showing that the matching conditions are more difficult in this case. Here the second type of generalized harmonic mean with q = 4.1 gives the best result. The Cauchy metric still performs better than L1 and L2, and this observation is consistent with previous work (Sebe et al., 2000).
FIGURE 4. ROBOTS stereo pair.
TABLE 4. Accuracy (Percent) of the Stereo Matcher on the ROBOTS Stereo Pair (a)

Distance metric | Chi-square test | Hit rate (%)
L1 | 0.0399 | 61.20
L2 | 0.0481 | 59.60
Cauchy | 0.0392 (a = 1.3) | 62.80 (a = 1.3)
Harmonic mean | 0.0782 | 58.40
Geometric mean | 0.0319 | 54.50
First type generalized harmonic mean (firstgh) | 0.0340 (p = 4.7) | 60.40 (p = 4.7)
Second type generalized harmonic mean (secondgh) | 0.0201 (q = 4.1) | 65.60 (q = 4.1)
Generalized geometric mean (gg) | 0.0511 (r = −4.3) | 58.00 (r = −4.3)
Best metric | Second type gh (q = 4.1) | 65.60

(a) The best parameter is shown.
B. Distance Metric Analysis in Motion Tracking

In this experiment distance metric analysis is tested on a motion tracking application. We use a video sequence containing 19 images of a moving head against a static background (Tang et al., 1994). For each image in this video sequence, there are 14 points given as ground truth. The motion tracking algorithm between the test frame and another frame performs template matching to find the best match in a 5 × 5 template around a central pixel. In searching for the corresponding pixel, we examine a region with a width and height of 7 pixels centered at the position of the pixel in the test frame. The idea of this experiment is to trace moving facial expressions. Therefore, the ground truth points are provided around the lips and the eyes, which move throughout the sequence.

In Figure 5, we display the fit between the real data distribution and the four distance metrics. The real data distribution is calculated using the template around points in the ground truth dataset considering sequential frames. The best fit is the generalized geometric mean metric with r = 7.0.

Between the first frame and a later frame, the tracking distance represents the average template matching results. Figure 6 shows the average tracking distance of the different distance metrics. The generalized geometric mean metric with r = 7.0 performs best, while the Cauchy metric outperforms both L1 and L2.
FIGURE 5. The real data distribution (dashed line) vs. the estimated data distribution (solid line) for motion tracking.

FIGURE 6. Average tracking distance of the corresponding points in successive frames; for Cauchy, α = 7.1, and for generalized geometric mean, r = 7.0.
C. Boosted Distance Metric on Benchmark Data Set

In this section we compare the performance of our boosted distance metric with several well-known traditional approaches. The experiment is conducted on 15 benchmark datasets from UCI (Merz and Murphy, 1998). The traditional distance metrics we tested are Euclidean distance, Manhattan distance, RCA distance (Bar-Hillel et al., 2003), Mahalanobis distance with the same covariance matrix for all the classes (Mah), and Mahalanobis distance with a different covariance matrix for every class (Mah-C). The last three metrics are sensitive to small sample set problems. A diagonal matrix D can be estimated instead of the original weight matrix W to simplify that problem, which yields three further metrics: RCA-D, Mah-D, and Mah-CD, respectively. To make the comparison complete, we also test the original AdaBoost with decision stumps (d.s.) and with C4.5 (Quinlan, 1996). The AdaBoost C4.5 decision tree is implemented in the Matlab Classification Toolbox (Stork and Yom-Tov, 2004). To reduce the computational complexity, in our experiment the difference metric m is fixed as L1, that is, $d = |x_i - x_j|$. It can be easily extended to different metrics by feeding in differences d obtained with other metrics, such as Euclidean distance.

Due to space limitations, only the traditional distance metric that gives the best performance on each dataset is shown in Table 5. The smallest error rate for each dataset is marked with an asterisk. From the results in Table 5 we find that our boosted distance metric performs best on 12 out of 15 datasets. It provides results comparable to the best performance on two datasets. Only on one dataset was our method outperformed by the traditional distance metric. This indicates that our method can discover a distance metric that reflects the distribution and can select the feature elements that are discriminant in similarity estimation.

TABLE 5. Comparison of Traditional Distance Metric and AdaBoost on UCI Datasets: Error Rate (%)

Dataset | Traditional metric | AdaBoost + d.s. | AdaBoost + C4.5 | Boosted metric
ad | 17.31 (L1) | 12 | 11.42 | 8.88*
gender | 15.38 (L1) | 12.27 | 11.89 | 10.45*
mnist | 2.34 (RCA-D) | 2.22 | 2.14 | 1.6*
arrhythmia | 37.02 (RCA-D) | 31.39 | 29.94 | 25.78*
splice | 10.55 (Mah-D) | 5.94 | 4.84 | 4.73*
sonar | 26.1 (Mah-CD) | 25.95 | 25.81 | 25.67*
spectf | 31.16 (Mah-D) | 28.65 | 27.18 | 27.1*
ionosphere | 10.78* (RCA) | 19.92 | 19.92 | 16.27
wdbc | 6.83 (Mah-CD) | 5.81 | 5.37 | 4.67*
german | 38.74 (Mah-D) | 34.31 | 33.18 | 32.4*
vote1 | 9.07 (L1) | 6.37* | 6.37* | 6.86
credit | 19.18 (Mah-CD) | 17.97 | 17.21* | 18.33
wbc | 5.25 (RCA) | 5.7 | 5.34 | 3.79*
pima | 34.55 (Mah-CD) | 31.02 | 29.96 | 28.91*
liver | 41.11 (Mah) | 35.51 | 35.43 | 33.58*

D. Boosted Distance Metric in Image Retrieval

As discussed in Section III.B, the boosted distance metric performs an element selection that is highly discriminant for similarity estimation, and it does not suffer from the small sample set problem of LDA and other dimension reduction techniques. To evaluate the performance, we tested the boosted distance metric on image classification against some state-of-the-art dimension reduction techniques: PCA, LDA, nonparametric discriminant analysis (NDA) (Fukunaga and Mantock, 1997), and plain Euclidean distance in the original feature space. The two datasets we used are a subset of the MNIST dataset (LeCun et al., 1998), containing similar handwritten 1s and 7s (Figure 7a), and a gender recognition database, containing facial images from the AR database
(Martinez and Benavente, 1998) and the XM2VTS database (Matas et al., 1999) (Figure 7b). The dimension of the feature for both databases is 784, while the size of the training set is fixed at 200, which is small compared to the dimensionality of the feature. In such a circumstance, an appropriate distance metric is very important.

FIGURE 7. Example images from handwritten digits (a) and gender recognition (b).

For a fair comparison, a simple nearest-neighbor classifier is used in the reduced dimension space. Figure 8 shows the classification accuracy versus the projected dimension, which, for our boosted distance metric, is the number of training iterations T. Because of the small sample problem, the accuracy of LDA is poor, 50% and 49.9%, and is not shown in the figure. A simple regularization scheme can improve its performance, but it still remains much worse than the other techniques. It is clear that the traditional methods perform poorly because the training set is very small compared to the dimensionality of the data. Note that all traditional methods rely on estimating a covariance or scatter matrix with $n^2$ elements, where n is the number of dimensions. Empirical experience suggests that a training set of size greater than $3n^2$ is needed to obtain a robust estimation of $n^2$ parameters. In contrast, our boosted distance metric needs to estimate only a few parameters on each dimension, which provides robust performance on the small training set and makes it outperform the well-known techniques.
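To make the scale of the small sample problem concrete, the counts quoted above can be evaluated for the feature dimension used in this experiment. The snippet is only a worked arithmetic illustration; the function names are hypothetical, and the factor of 3 is the empirical rule of thumb stated in the text.

```python
def covariance_parameter_count(n_dims):
    """Number of entries in an n x n covariance or scatter matrix."""
    return n_dims * n_dims


def rule_of_thumb_training_size(n_dims, factor=3):
    """Training set size suggested by the 3 * n^2 rule of thumb."""
    return factor * covariance_parameter_count(n_dims)


n = 784            # feature dimension of both image datasets
training_size = 200  # training set size used in the experiment
shortfall = rule_of_thumb_training_size(n) / training_size
```

For n = 784 the matrix has 614,656 entries and the rule of thumb asks for 1,843,968 training samples, several thousand times more than the 200 that are available.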
FIGURE 8. Accuracy of classification on gender recognition (a) and handwritten digits (b).

V. DISCUSSION AND CONCLUSIONS

This work presents a comprehensive analysis of distance metrics and of boosting heterogeneous metrics for similarity estimation. Our main contribution is to provide a general guideline for designing a robust distance estimation that adapts to data distributions automatically. Novel distance metrics derived from the harmonic and geometric means and their generalized forms are presented and discussed. We examined the new metrics in several computer vision applications, and the estimation of similarity can be significantly improved by the proposed distance metric analysis.

The relationships between probabilistic data models, distance metrics, and ML estimators have been widely studied. The creative component of our work is to start from an estimator and perform reverse engineering to obtain a metric. In this context, the fact that some of the proposed metrics cannot be translated into a known probabilistic model is both a curse and a blessing. It is a curse because it is really not clear what the underlying probabilistic models are (they certainly do not come from any canonical family), and this is usually the point at which one starts. After all, the connection between the three quantities (metric, data model, and ML estimator) is probabilistic. It is a bit unsettling to have no idea of what these models are. It is a blessing because this is probably the reason why these metrics have not been previously proposed. Nevertheless, they seem to work very well according to the experimental results in this article.

In similarity estimation the feature elements are often from heterogeneous sources, so the assumption that the feature has a unified isotropic distribution is invalid. Unlike a traditional anisotropic distance metric, our proposed method does not make any assumptions about the feature distribution. Instead it learns the distance metric for each element to capture the underlying feature structure. Because the distance metric is trained on the observations of each element, the boosted distance does not suffer from the small sample set problem. Considering that not all feature elements are related to the similarity estimation, the boosting process in the proposed method provides a good generalization of the feature elements that are most important in a classification context. It also has a dimension reduction effect, which may be very useful when the original feature dimension is high. The automatic metric adaptation and element selection in our boosted distance metric bridge the gap between the high-level similarity concept and low-level features. The experimental results show that our proposed method is more effective and efficient than traditional distance metrics. In the future we would like to incorporate our new metric into state-of-the-art classification techniques and evaluate the improvement in performance.
ACKNOWLEDGMENTS

This work was supported in part by Army Research Grant (ARO) W911NF05-1-0404 and by the Center of Infrastructure Assurance and Security (CIAS), The University of Texas at San Antonio.
REFERENCES

Aggarwal, J.K., Nandhakumar, N. (1988). On the computation of motion from sequences of images—a review. Proc. IEEE 76 (8), 917–933.
Aigrain, P. (1987). Organizing image banks for visual access: Model and techniques. In: International Meeting for Optical Publishing and Storage, pp. 257–270.
Amores, J., Sebe, N., Radeva, P. (2006). Boosting the distance estimation: Application to the K-nearest neighbor classifier. Pattern Recog. Lett. 27 (3), 201–209.
Bar-Hillel, A., Hertz, T., Shental, N., Weinshall, D. (2003). Learning distance functions using equivalence relations. In: Proc. Int. Conf. Machine Learn., pp. 11–18.
Chang, N.S., Fu, K.S. (1980). Query by pictorial example. IEEE Transact. Software Eng. 6 (6), 519–524.
Cover, T.M., Hart, P.E. (1968). Nearest neighbor pattern classification. IEEE Transact. Inform. Theory IT-13, 21–27.
Domeniconi, C., Peng, J., Gunopulos, D. (2002). Locally adaptive metric nearest neighbor classification. IEEE Transact. Pattern Anal. Machine Intell. 24 (9), 1281–1285.
Duda, R., Hart, P., Stork, D. (2001). Pattern Classification, 2nd ed. John Wiley & Sons, New York.
Flickner, M., et al. (1995). Query by image and video content: The QBIC system. IEEE Comput. 28 (9), 23–32.
Fukunaga, K., Mantock, J. (1997). Nonparametric discriminant analysis. IEEE Transact. Pattern Anal. Machine Intell. 19 (2), 671–678.
Gudivada, V.N., Raghavan, V. (1995). Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Transact. Inform. Syst. 13 (2), 115–144.
Haralick, R., Shapiro, L. (1993). Computer and Robot Vision II. Addison-Wesley.
Haralick, R.M., et al. (1973). Texture features for image classification. IEEE Transact. Sys. Man Cyb. 3 (6), 610–621.
Hertz, T., Bar-Hillel, A., Weinshall, D. (2004). Learning distance functions for image retrieval. In: IEEE Proc. Comput. Vision Pattern Recog., pp. 570–577.
Huber, P.J. (1981). Robust Statistics. John Wiley & Sons, New York.
Jacobs, D.W., Weinshall, D., Gdalyahu, Y. (2000). Classification with nonmetric distances: Image retrieval and class representation. IEEE Transact. Pattern Anal. Machine Intell. 22 (6), 583–600.
Jolliffe, I.T. (2002). Principal Component Analysis, 2nd ed. Springer-Verlag, New York.
Kato, K. (1992). Database architecture for content-based image retrieval. In: Conf. Image Storage Retrieval Syst., vol. 1662. SPIE, pp. 112–123.
LeCun, Y., et al. (1998). MNIST database. http://yann.lecun.com/exdb/mnist/.
Lew, M.S., Huang, T.S., Wong, K. (1994). Learning and feature selection in stereo matching. IEEE Transact. Pattern Anal. Machine Intell. 16 (9), 869–882.
Martinez, A., Benavente, R. (1998). The AR face database. Tech. Rep. 24, Computer Vision Center.
Matas, J., et al. (1999). Comparison of face verification results on the XM2VTS database. In: Proc. Int. Conf. Pattern Recog., pp. 858–863.
Mehtre, B.M., et al. (1997). Shape measures for content based image retrieval: A comparison. Inform. Process. Manage. 33 (3), 319–337.
Merz, C., Murphy, P. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html.
Peng, J., et al. (2001). LDA/SVM driven nearest neighbor classification. IEEE Proc. Comput. Vision Pattern Recog., 940–942.
Quinlan, J.R. (1996). Bagging, boosting, and C4.5. In: Proc. Natl. Conf. Artificial Intell., pp. 725–730.
Schapire, R.E., Singer, Y. (1999). Improved boosting using confidence-rated predictions. Machine Learn. 37 (3), 297–336.
Sebe, N., Lew, M.S., Huijsmans, D.P. (2000). Toward improved ranking metrics. IEEE Transact. Pattern Anal. Machine Intell., 1132–1143.
Smeulders, A.W.M., et al. (2000). Content-based image retrieval at the end of the early years. IEEE Transact. Pattern Anal. Machine Intell. 22, 1349–1380.
Smith, J.R., Chang, S.F. (1994). Transform features for texture classification and discrimination in large image database. In: IEEE Int. Conf. Image Process.
Stork, D.G., Yom-Tov, E. (2004). Computer Manual in MATLAB to Accompany Pattern Classification. John Wiley & Sons, New York.
Swain, M., Ballard, D. (1991). Color indexing. Int. J. Comput. Vision 7 (1), 11–32.
Tang, L., et al. (1994). Performance evaluation of a facial feature tracking algorithm. In: Proc. NSF/ARPA Workshop: Perform. Methodol. Comput. Vision.
Tian, Q., Xue, Q., Yu, J., Sebe, N., Huang, T.S. (2004). Toward an improved error metric. In: IEEE Int. Conf. Image Process. (October 24–27, Singapore).
Tversky, A. (1977). Features of similarity. Psychol. Rev. 84 (4), 327–352.
Tversky, A., Krantz, D.H. (1977). The dimensional representation and the metric structure of similarity data. J. Math. Psychol. 7, 572–597.
Viola, P., Jones, M.J. (2004). Robust real-time face detection. Int. J. Comput. Vision 57 (2), 137–154.
Wallach, M. (1958). On psychological similarity. Psychol. Rev. 65 (2), 103–116.
Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Proc. NIPS, 505–512.
Zakai, M. (1964). General distance criteria. IEEE Transact. Inform. Theory, 94–95, January.
Index
A
AdaBoost, 301–302 distance metric comparison with, 311t Additive noise, 215f Affine independence, 210 Afterglow, 126 mode, 55, 94f period, 29 Algorithm(s) boosting distance metric, 302 of distance metric analysis, 305 recursive, 265 sequential, 265 All permanent magnet ECRIS, 76–88 beam intensities produced by, 87t microwave coupling in, 80–84 radial magnetic field for, 81f resonant points for, 81f source design of, 84–88 whole magnetic system of, 80f Angular diffusion, 16 Angular elastic diffusion, 10 Anisotropic distance metric, 300 ANNs. See Artificial neural networks Apparent electron temperature, 39f, 42f Arc detector, 28 Argon charge state distribution, 93, 94f Argon ions, 144f Arithmetic mean, 298 Artificial neural networks (ANNs), 166 Associative memories. See also Lattice associative memories auto, 178 auto, minimax, 222 based on dendritic computing, 228–237 Audio signals, 251 Auger cascade, 136 Automatic stereo matcher, 304 Averaged tracking distance, 309 Axial magnetic confinement, 49–53 minimum-B, 54–56 Axial magnetic field, 59–67
coils, 72f cylinder types for, 77f permanent magnets and, 59–61 room temperature coils, 61–63 superconducting coils, 63–67 typical data of, in Carpe Diem, 103t Axial magnetic profile of permanent magnetic ion source, 78f Axial magnetic system with superconducting coils and iron plugs, 65f Axiomatization, 279 Axons, 228
B Basic measurements, 278–279 Beam(s) EBIS, 3, 127 emittance, 67 images, 131f inhomogeneity, 131 LEBT, 133 prismatic, 186 rectangular, 199f xenon, 53 Beam intensity, 87f all magnetic ECRISs producing, 87t ECRISs obtaining, 50 enhancement, 55 evolution, 50f, 117f extracted ion, 71 ion, 71 typical xenon, 88f Benchmark data set, 310 Beryllium isoelectronic sequence, 34 Beucher gradient, 274f, 275 complex, 276f Biased disk voltage, 94f, 95f Binary imaging processing, 255 Biological neuron, 229f Black top hat, 273f Boolean image patterns, 214
Boosted distance metric, 301–303 on benchmark data set, 310 in image retrieval, 310–312
C Canonical association memory, 180 Canonical basis vectors, 189 Caprice hexapole, 68f Caprice source, 42–43 Carbon contamination, 86 therapy, 129 Carpe Diem, 101f drawing of, 114f ECRISs’ characteristics of, 100t general shape of, 113 magnetic field configuration of, 101f magnetic scaling laws and, 111t PID of, 112f typical data of, in axial magnetic field, 103t Castle dataset, 305 Catastrophic failure, 218 Catchment basin, 275 Cauchy metric, 297 Charge state distribution (CSD), 12, 119f, 125f argon, 93, 94f ECRISs and, 127–130 Charge state production, 74 Chirped radar signal, 283 Chi-square test value, 305 Classical dilation, 257f Classical mirror field, 62 Closings, 258–260 Coaxial coupling, 20–23 Coaxial rf coupling, 82–83 Coaxial waveguide, 23 Coil(s) currents, 104t magnetic field, 73f room temperature, 61–63 superconducting, 63–67, 73–75 usual magnetic field, 73f Collision(s) ionizing, 48 spitzer, time, 14 Collisional excitation, 34 Compact superconducting ECRIS, 99–115
cryogenic aspect, 109–112 existing hybrid, 114–115 hexapolar field for, 105–109 region H1, 105–106 region H2, 107 region H3, 107 region H4, 107–108 region H5, 108 region H6, 108 region H7, 109 mechanical design of, 113 mirror field of, 100–105 general presentation, 100–102 safety margin, 102–105 total magnetic field, 109 Complementation, 247–255 Complex Beucher gradient, 276f Complex black top hat, 273f Complex closing, 260 Complex dilation, 256f, 257f Complex domes, 270 Complex erosion, 256f Complex geodesic dilation, 263–264 Complex horizontal opening, 267f Complex morphological gradient, 274 Complex morphological reconstruction, 265, 265f Complex opening for test signal, 259f Complex signals, 245 Complex spectrogram, 277 Complex watershed, 275–278 transformation, 247 Complex white top hat, 273f Computer vision, 292–294 Confined plasma, 9f Confinement axial magnetic, 49–56 radial magnetic, 44 time evolution, 42 time measurements, 41 Constant amplitude signal, 248 Continuous wave, 145 Convex hull, 186 polygon, 186 polyhedron, 186 polytope, 186 sets, 185–187 Cooling techniques, 111t Corona model, 36
Correct hyperplane, 228 Corresponding halfspaces, 198 Cortical neurons, 228 Coulomb explosion, 136 Coulomb logarithm, 11 Coupling coaxial, 20–23 coaxial rf, 82–83 microwave, 52f, 80–84 rectangular, 24–26 usual, 52f utilized, 20–31 wave, 15–31 Cryogenic aspect, 109–112 cryostat design, PID, 111–112 cryostat design, thermal aspect, 112 Cryostat design, 111–112 Cryostat thermal budget, 112t CSD. See Charge state distribution Currents, 104t
D Dark shaded region, 200f Data distribution, 309f Demagnetization, 107 Dendrites, 228 Dendritic structure, 230f, 231 Desirable erosive tolerance level, 226 Destabilization, 137 Diagonally max dominant, 171, 180 Diagonally min dominant, 171, 180 Diamond nanocrystals, 136 Dielectronic recombination, 5 Diffusion angular, 16 angular elastic, 10 Dilation, 255–260, 257f classical, 257f complex, 256f, 257f Euclidean, 263 geodesic, 263–264 geodesy and, 262–264 mathematical morphology, 255–260 operators, 248 scalar, 255 Dilative change, 213–214 Dilative noise, 214, 218, 220 Dimensionality of F (X) remarks concerning, 198–202 theorems of, 201–202
Distance, 305 function, 299f geodesic, 262f Mahalanobis, 300 real, 305 Distance distribution, 306f Distance metric, 294–295, 298t AdaBoost comparison with, 311t anisotropic, 300 boosting for similarity estimation, 300–303 boosted, 301–303 motivation, 300–303 related, 301–303 heterogeneous, 300 Distance metric analysis, 295–300, 303–307 algorithm of, 305 generalized, 298–300 maximum likelihood approach, 295–298 in motion tracking, 308–309 Distributions different, 298f Dodecapole, 132f Dome, 269–271 complex, 270 complex detector, 270f detector, 270f Double frequency, 26–27
E EBIS. See Electron beam ion source ECR heating (ECRH), 3, 9–10 ECR light source, 143–147 ECR photo source, 141f ECR plasma diagnostics, 32 ECRH. See ECR heating ECRIS plasma electrodes, 86f ECRIS plasmas, 31–47 VUV diagnostics of, 31–47 discussion of, 44–47 experimental results, 38–43 experimental set-up/data processing, 32–38 ECRISs. See Electron cyclotron resonance ion sources EDF. See Electron distribution function Eigenmodes, 20 Electrode(s) ECRIS plasma, 86f negatively biased, 85 plasma, 67, 85, 132–133
Electron(s), 4, 6 bombardment, 136 heating, 9–10 losses, 28f neutrality, 124 optimum energy, 129 population, 46 temperature limiters, 129f velocity, 8 Electron beam ion source (EBIS), 3, 127 Electron confinement, 6–10 magnetic configuration and, 6–8 mechanism of, 8 Electron cyclotron resonance ion sources (ECRISs), 2. See also All permanent magnet ECRIS; Compact superconducting ECRIS; ECRIS plasmas all permanent magnet, 76–88 beam intensity obtaining of, 50 Carpe Diem characteristics with, 100t compact superconducting, 99–115 CSDs and, 127–130 design of various, 75–135 discussion, 116–135 charge state distribution, 127–130 ion beam shape, 130–131 microwave power, 116–123 plasma electrode position, 132–135 existing hybrid compact superconducting, 114–115 fully superconducting, 115–116 fundamental aspects of, 4–15 electron confinement/heating, 6–10 ion temperature/confinement, 10–12 multiply charged ions production, 5–6 plasma effects of, 12–15 fundamental compromise of, 3 high frequency, 135t industrial applications for, 135–149 implantation, 136–138 photon lithography, 138–148 ions in, 32 long superconducting, 74 magnetic, 87t magnetic characteristics of, using superconducting wires, 99t magnetic configuration of, 76 magnetic confinement in, 47–75 magnetic systems examples, 59–75 scaling laws of, 48–59
magnetic flux tubes in, 7f main parameters of well performing, 75–76 permanent magnetic, 79f, 142f permanent magnets and, 60f plasma electrodes, 86f plasmas, 31–47 possible axial/radial whistler, 17f power emitted by small magnet, 142f pulsed, 123–127 pulsed mode, 96 room temperature, 88–99 usual magnetic field coils of, 73f utilized couplings in, 20–31 coaxial coupling, 20–23 double frequency utilization, 26–27 frequencies above 20 GHz, 28–30 multifrequency transmission line, 30–31 pulsed regime application, 27–28 rectangular coupling, 24–26 wave coupling in, 15–31 macroscopic description of eigenmodes, 20 rf waves, 15–20 Electron density, 34, 38, 42, 145 data concerning, 46 evolution of, 44f Electron distribution function (EDF), 12 Electron temperature, 34, 38 apparent, 39f, 42f evolution of, 44f Energy, 8 LEBT, 133 optimum electron, 129 perpendicular, 16 Erosion, 255–260 complex, 256f geodesy and, 262–264 mathematical morphology, 255–260 operators, 248 Erosive bound, 221 Erosive change, 213 Erosive noise, 214, 216, 218 Erosively corrupted patterns, 217f Estimated data distribution, 309f Estimated distance, 305 distribution, 306f Euclidean dilation, 263 Euclidean distance, 248 Euclidean morphology, 263
Evaluation measures, 291–314. See also Ranking metrics discussion/conclusion to, 312–314 experiments/analysis with, 303–312 boosted distance metric in image retrieval, 310–312 boosted distance metric on benchmark data set, 310 distance metric analysis in motion tracking, 308–309 distance metric analysis/stereo matching, 303–307 similarity estimation in, 292–294 Excitation collisional, 34 mode, 24–26 Excitatory synapses, 228 Excitatory fibres, 230f Exhibitor axonal fibre, 232 Existing hybrid compact superconducting ECRIS, 114–115 Explosion, 136 Exponential, 294 Exponential distribution analysis, 297 Extracted ion beam intensity, 71 Extraction regions, 121 Extraction side, 53, 54f Extraction voltage, 87f Extreme point, 188
F Faraday cup, 47 Fast Fourier transforms (FFT), 268 FFT. See Fast Fourier transforms Fibre(s) excitatory, 230f exhibitor axonal, 232 inhibitory, 230f, 232 First stage plasma, 83 Fixed points, 181–187 Flooding simulations, 277 Fourier transform, 244, 268 Frequency double, 26–27 Larmor, 4 Fully superconducting ECRIS, 115–116 Function(s) EDF, 12 gray-tone, 244 hard-limiter activation, 231 MEDF, 36, 46
Minkowski, 278–279 probability density, 296
G Gain variations, 249–250 Gas discharge plasmas (GDP), 139 Gas mixing, 11, 92f Gaussian, 294–295 isotropic distribution, 300 metric analysis, 297 GDP. See Gas discharge plasmas Gender recognition, 312f, 313f Generalized distance metric analysis, 298–300 Generalized geometric mean, 299f Generalized harmonic mean, 299f Geodesic dilation, 263, 264 Geodesic distance, 262f Geodesic mask, 263–264 Geodesic operators, 271 Geodesic transformations, 287 Geodesy, 261–271 dilations and erosions and, 262–264 domes/lakes, 269–271 measurements, 278–286 openings/closings by reconstruction, 266–268 reconstructions of, 264–265 regional maxima/minima, 268–271 structuring element of, 262 top hats, 271 Geometric mean, 298 Geometry, 184 quadrupolar, 44 Gifford Mac Mahon (GM), 111 Granulometries, 279–281, 284f Gray-level primitives, 293 Gray-tone complement, 253 Gray-tone functions, 244 Gray-tone image segmentation, 261 Gray-tone samples, 253f Gray-tone signals, 244, 268 Gray-valued images, 222 Grazing incidence spectrometer, 33f Grenoble Test Source (GTS), 18, 69f, 89 on its bench, 91f without magnetic structure, 91f Grotrian diagram, 35f GTS. See Grenoble Test Source GTS hexapole, 69f
H Halbach array, 68 Halbach hexapole, 70, 72f Halfspaces, 198 Handwritten digits, 312f, 313f Hard-limiter activation function, 231 Harmonic mean, 298 Heterogeneous distance metric, 300 Hexapolar field, 105–109 region H1, 105–106 region H2, 107 region H3, 107 region H4, 107–108 region H5, 108 region H6, 108 region H7, 109 Hexapolar radial magnetic field, 25 Hexapolar system, 67 radial magnetic field and, 71f Hexapole, 132f caprice, 68f GTS, 69f Halbach, 70, 72f permanent magnet, 72f shape of h2 part of, 107f High intensity values, 220 High-charge state ions, 46 High-dimensional geometry, 184 High-temperature superconducting (HTS), 63 High-voltage supply, 126 Hopfield network, 179 Horizontal opening, 267f Hot electron mirror machine, 6 Hot electron plasma, 4 Hot plasma volume, 31, 49 Hyperbox, 191–192, 199f Hyperplane, 187, 189 correct, 228 oriented, 188, 195 support, 188 three dimensional, 200 Hyperspectral imagery, 227
I Idempotent operators, 260 Image(s) gray-valued, 222 hyperspectral, 227 processing, 252, 255
retrieval, 310–312 segmentation, 261 stereo, 293 Image patterns, 214 Implantation, 136–138 Independence affine, 210 lattice, 171–175, 184, 203 linear, 209 signal statistics, 249–250 strong lattice, 203–213 Index satisfying inequality, 208, 210 Individual hyperboxes, 199f Inhibitory fibres, 230f, 232 Inhibitory synapses, 228 Injection quasiperpendicular, 19f radioactive ion, 18 Injection side, 49–53 Injection system, 26f Input neuron, 231 Input spectrogram, 266f Ion(s), 18 argon, 144f beam shape, 130–131 confinement, 10–12 confinement times, 36, 43f current, 115t densities, 36, 37t, 40f, 42 densities evolution, 38–40 in ECRIS, 32 extracted currents, 40f high-charge state, 46 implantation, 137 intensities, 43f evolution of, 45f mobility, 12 multiply, 5–6, 140f stripped argon, 93f tantalum, 97f temperature, 10–12 Ion beam intensity, 71 Ionization, 37 Ionizing collisions, 48 Iron plugs, 65f
K Kato’s model, 34, 36 K-dimensional plane, 192 Kernel vectors, 221–227 Kinetic pressure, 13
L Lakes, 269–271 Landau damping, 27–29 Large hadron collider, 93 Large magnetic fields, 49 Larmor frequency, 4 Laser produced plasma (LPP), 139 Lattice(s) complete, 167 independence, 171–175, 184 independence points, 203 matrix memories, 220, 221 pertinent basic properties, 167–170 transforms, 165–238 Lattice associative memories, 165–238 auto, 179 theorems of, 178–181, 204–213 Lattice dependence, 171–175 theorems for, 181–187 Lattice-based perceptron, 231 Lattice-ordered group, 168, 170–171 LDA. See Linear discriminant analysis Lebesgue integral, 286 Lebesgue measure, 280 LEBT. See Low energy beam transport Lhe. See Liquid helium Line intensity ratio method, 34 Linear discriminant analysis (LDA), 300 Linear independence, 209 Linear minimax combination, 172 span, 172 sum, 172, 174 Linear subspaces, 187–193 Liquid helium (Lhe), 64 Lithography technique, 143. See also Photon lithography Logarithm, 11 Lorentz force, 21 Low energy beam transport (LEBT), 133 Low intensity values, 220 Low-temperature superconducting (LTS), 63 LPP. See Laser produced plasma LTS. See Low-temperature superconducting
M Macroscopic description, 20 Magnet(s). See All permanent magnet ECRIS; Axial magnetic field; Electron cyclotron resonance ion sources; Large magnetic fields; Permanent magnets; Strong radial magnetic field; Total magnetic field
Magnetic beach effect, 19f Magnetic configuration, 6–8, 60f, 76 Magnetic confinement, 47–75 axial, 49–53 minimum-B, 54–56 magnetic systems examples, 59–75 optimum, 118 radial, 44, 56–58 scaling laws of, 48–59 Magnetic field. See also Axial magnetic field; Radial magnetic field coils, 73f configuration, 101f hexapolar radial, 25 large, 49 lines, 13, 130f scaling laws for, 77–78 strong radial, 14 total, 109 usual, coils, 73f Magnetic flux tubes, 7f Magnetic scaling laws, 58–59 Carpe Diem and, 111t permanent magnets and, 61 typical, 59t Magnetic systems, 59–75 axial magnetic field, 59–67 radial magnetic field, 67–75 Magnetohydrodynamic (MHD), 12 Mahalanobis distance, 300 Maragos, 281 Mask, 263–264 Mathematical morphology, 243–288 complex watershed, 275–278 dilations/erosions, 255–260 geodesy, 261–271 dilations and erosions and, 262–264 domes/lakes, 269–271 measurements, 278–286 openings/closings by reconstruction, 266–268 regional maxima/minima, 268–271 structuring element of, 262 top hats, 271 morphological gradients, 272–275 openings/closings/morphological filters, 258–260
order relationship/complementation, 247–255 umbra, 253–255 Matrices, 170–171 Max dominant, 206 diagonally, 171, 180 Maximum likelihood approach, 295–298 estimation, 296–297 theory, 295 Maximum mirror ratio, 49 Maxwellian electron distribution function (MEDF), 36, 46 MCPs. See Microchannel plates Mean generalized geometric, 299f generalized harmonic, 299f geometric, 298 harmonic, 298 Measurement(s), 41, 278–286 basic, 278–279 examples of, 282–286 granulometries/pattern spectra, 279–280 mathematical morphology, 278–286 power granulometry, 280–281 similarity, 294–295 temperature, 42 MEDF. See Maxwellian electron distribution function Median, 298 Memory. See also Associative memories; Lattice associative memories canonical association, 180 minimax autoassociative, 222 morphological, 177 Metallic elements, 120t Metric. See also Distance metric; Ranking metrics anisotropic distance, 300 boosted distance, 301–303, 310–312 Cauchy, 297 comparison, 297 heterogeneous distance, 300 Metric analysis. See also Distance metric analysis Gaussian, 297 generalized distance, 298–300 MHD. See Magnetohydrodynamic Microchannel plates (MCPs), 34 Microinstability, 14 Microwave coupling, 52f, 80–84
Microwave injection system, 82f Microwave interferometry, 39f Microwave power, 116–127 Min dominant, 206 diagonally, 171, 180 Minimax autoassociative memory, 222 outer product, 176 principle, 168 products, 170 Minkowski functionals, 278–279 Mirror field, 100–105 classical, 62 of compact superconducting ECRIS, 100–105 general presentation of, 100–102 Mirror machine(s) hot electron, 6 open-ended, 3 simple, 48 Mirror throats, 6 Mirror-mode limit, 13 Mode excitation, 24–26 Morphological closing, 266 filters, 258–260 gradient, 274 gradients, 272–275 memories, 177 operators, 245, 246–247 Morphology Euclidean, 263 mathematical, 243–288 Motion tracking, 308–309 Multifrequency transmission line, 30–31 Multimode cavity, 31 Multiplication, 169 Multiply charged ions, 140f production of, 5–6
N Nanocrystals, 136 Nearest neighbor classifier, 311 Negatively biased electrode, 85 Nerve terminals, 228 Neuron(s) biological, 229f cortical, 228 input, 231 Neutral pressure, 14 Nitrogenation reaction, 137
Noise additive, 215f dilative, 214, 218, 220 erosive, 214, 216, 218 nonrandom erosive, 216 thermal floor, 245 Noisy inputs, 213–221, 233f Nonrandom erosive noise, 216 Nonrandom removal of data, 215 Nonzero coordinate, 223f Notational complexity, 210
O Open-ended mirror machine, 3 Openings, 258–260, 266–268 Operator(s) dilation, 248 erosion, 248 geodesic, 271 idempotent, 260 topological, 271 union, 263 Optimizations, 128f Optimum electron energy, 129 Optimum magnetic confinement, 118 Order relationship, 247–255 properties of, 248–250 Orientated hyperplane, 188 Orientation, 187 Orthogonal, 193 Oxygen, 36 charge states, 43f pressure, 41
P Parallel direction, 192 Parallel propagation, 16, 17f Parallelepiped, 186, 186f Parametric text, 292 Parasitic resonances, 31 Pattern(s) boolean image, 214 erosively corrupted, 217f power, spectrum, 281 randomly corrupted, 222 randomly corrupted boolean, 223f real-valued noisy, 216 reconstruction, 213–221 spectra, 279–280
Pattern recognition system, 245 techniques, 276 vectorial, 215 PCA. See Principal component analysis Perceptron, 231 Perfect recall, 176 Permanent magnet hexapole, 72f Permanent magnet ion source, 78f Permanent magnetic ECRIS, 79f, 142f Permanent magnetic ion source, 78f Permanent magnets, 59–61, 67–73, 106t axial magnetic field and, 59–61 ECRIS, 60f magnetic configuration of, 60f magnetic scaling laws attained by, 61t radial magnetic field, 67–73 Perpendicular energy, 16 Perpendicular propagation, 18 Pertinent geometric property, 198 Photon lithography, 138–148 experimental results of, 141–143 orders of magnitude, 140–141 powerful ECR light source, 143–147 Photon source, 141f ECR/EUV, 147f PID. See Process integrated diagram Pixel, 236 representation, 252 Plasma, 32, 86f confined, 9f cooling water, 117 dense/energetic, 10 diameter, 57 ECRIS, 31–47 effects, 12–15 electrode, 67, 85, 132–133 electro-neutrality, 37, 47, 55 first stage, 83 GDP, 139 hot electron, 4 hot, volume of, 31, 49 iso-pressure surfaces, 48 kinetic pressure of, 13 LPP, 139 particles, 82 potential, 9, 39 shape, 130f star shape, 26 waves, 17f, 18f
Plasma chamber, 34 cross section of, 122f entrance, 82 inner size, 51 temperature, 122f wall, 110f Pointwise minimum, 170 Poisson superfish magnetic calculation, 61f Polyhedra, 186f Polytope, 185–187 Power fluctuations, 282 Power granulometry, 280–281, 284f Power pattern spectrum, 281, 284 Powerful ECR light source, 143–147 Pressure, 48 dependence, 41 kinetic, 13 neutral, 14 oxygen, 41 Principal component analysis (PCA), 300 Prismatic beam, 186 Prismatic surface, 186 Probabilistic data models, 313 Probability density function, 296 Process integrated diagram (PID), 111–112 Propagation channel variations, 250 parallel/quasi-parallel, 16 perpendicular, 18 Pulse simulated, 284 simulated fixed frequency radar, 282 simulated radar, 282f Pulsed ECRIS, 123–127 Pulsed mode ECRISs, 96 Pulsed regime application, 27–28
Q Quadrumafios source ion densities evolution/rf power, 38–40 ion densities/confinement time measurement with oxygen pressure, 41 pressure dependence on electron density, 41 rf injected power dependence/electron density temperature, 38 Quadrupolar geometry, 44 Quasi-parallel propagation, 16 Quasiperpendicular injection, 19f
R Radar signal, 259f, 283 Radial magnetic confinement, 44, 56–58 Radial magnetic field, 67–75 for all magnetic ECRIS, 81f angular distribution of, 109f axial distribution of, at plasma chamber wall, 110f for GTS hexapole, 69f hexapolar, 25 hexapolar system and, 71f permanent magnets and, 67–73 superconducting coils and, 73–75 value, 7, 58 Radiative recombination, 5 radio frequency (rf), 243. See also rf power Radioactive ion injection, 18 Randomly corrupted boolean patterns, 223f Randomly corrupted patterns, 222 Ranking metrics, 291–314 discussion/conclusion to, 312–314 distance metric as similarity measurement, 294–295 experiments/analysis with, 303–312 boosted distance metric in image retrieval, 310–312 boosted distance metric on benchmark data set, 310 distance metric analysis in motion tracking, 308–309 distance metric analysis/stereo matching, 303–307 similarity estimation in, 292–294 RCHP. See Right-handed circularly polarized Real chirped radar signal, 259f Real data distribution, 309f Real distance, 305 distribution, 306f Real-valued noisy patterns, 216 Recombination dielectronic, 5 radiative, 5 Reconstruction(s), 266–268 complex morphological, 265, 265f of geodesy, 264–268 pattern, 213–221 Rectangular beam, 199f Rectangular coupling, 24–26 Rectangular waveguide diplexer, 30
Recursive algorithms, 265 Regional maxima, 268–271, 269f Regional minima, 268–271 Relative efficiency calibration curve, 35f Remanence, 71 Resonant points, 81f Retarded field analyzer, 32 RF. See Radio frequency rf injected power dependence, 38 rf power, 38–40, 42 rf waves, 15–20 Right-handed circularly polarized (RCHP), 20 Room temperature coils, 61–63 Room temperature ECRIS, 88–99 source design example by, 89 usual performances of, 90–99 Rotation invariance, 249 Row-scanning order, 214
S Scalar dilation, 255 Scaling laws, 48–59 axial magnetic confinement/extraction side, 53 axial magnetic confinement/injection side, 49–53 axial magnetic confinement/minimum-B, 54–56 magnetic, 58–59, 61t magnetic field, 77–78 radial magnetic confinement, 56–58 semiempirical, 6 Seismic signals, 251 Semiempirical scaling laws, 6 Semilattice, 168 Sequential algorithm, 265 Shape of F , 193–197 Signal(s), 245 constant amplitude, 248 gray-tone, 244, 268 seismic, 251 Signal statistics independence, 249–250 Similarity estimation, 297 Similarity measurement, 294–295 Simple mirror machine, 48 Simulated fixed frequency radar pulse, 282 Simulated pulse, 284 Simulated radar pulse, 282f
Sinusoidally perturbated axial curvature waveguide section, 51 Solenoids, 102f, 103t Source axis, 101 Spectrogram, 287 complex, 277 input, 266f Spectrometer, 33f Spitzer collision time, 14 Standard row scan method, 235 Standard stereo data sets, 304 Stereo images, 293 Stereo matcher accuracy, 306t, 308t automatic, 304 Stereo matching, 293, 303–307 Stochastic heating, 15 Stripped argon ion, 93f Strong axial mirror ratio, 52f Strong lattice independence, 203–213 Strong radial magnetic field, 14 Strongly lattice independent, 206 vectors, 226 Sublattice, 183 Subsphere, 193 Superconducting coils, 63–67 radial magnetic field and, 73–75 Superconducting materials, 64 Superconducting multipole, 74 Superconducting wires, 65f, 99t Support hyperplane, 188 Synapses, 228 Synaptic knobs, 228 Synaptic sites, 228 Synchrotrons, 124–125
T Tantalum ions, 97f Temperature measurements, 42 Thermal noise floor, 245 Thick iron yoke, 62 Three dimensional hyperplane, 200 Three electrode extraction system, 86f Top hat, 271 complex black, 273f complex white, 273f transformations, 272 Topological operators, 271 Topological parameters, 271 Total energy conservation law, 8
Total magnetic field, 109 Total magnetic field contour distribution, 110f Transform(s) Fourier, 244, 268 lattice, 165–238 Transformation(s) complex watershed, 247 geodesic, 287 top hat, 272 watershed, 247, 275 Transverse velocity, 14 Traveling Wave Tube (TWT), 79 Truncation, 237 TWT. See Traveling Wave Tube
U Ultrasounds, 251 Umbra, 253–255 Union operator, 263 Usual coupling, 52f Usual magnetic field coils, 73f Utilized couplings, 20–31 coaxial coupling, 20–23 in double frequency utilization, 26–27 frequencies above 20 GHz, 28–30 in multifrequency transmission line, 30–31 pulsed regime application in, 27–28 rectangular coupling, 24–26
V Vacuum ultraviolet (VUV), 4, 34, 35f, 39f, 46 Vector(s) canonical basis, 189 coordinate distortions, 221 kernel, 221–227 pattern recognition, 215 strongly lattice independent, 226
Velocity electron, 8 transverse, 14 wave phase, 16 Voltage, 126 biased disk, 94f, 95f extraction, 87f VUV. See Vacuum ultraviolet VUV spectrometer, 35f VUV spectrometry, 46 VUV spectroscopy, 39f
W Watershed, 247, 275–278 Wave(s) continuous, 145 phase velocity, 16 plasma, 17f, 18f rf, 15–20 TEM, 22f TWT, 79 whistler, 25 Wave coupling in ECRIS, 15–31 macroscopic description of eigenmodes, 20 rf waves, 15–20 Waveguide coaxial, 23 rectangular bijunction, 23 rectangular diplexer, 30 sinusoidally perturbated axial curvature, section, 51 Weak classifier, 303 Whistler microinstability, 14 Whistler wave, 25 White pixels, 236 White top hat, 273f
X Xenon beam intensities, 88f Xenon beams, 53