ADVANCES IN IMAGING AND ELECTRON PHYSICS
VOLUME 107
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
VOLUME 107
ACADEMIC PRESS
San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1999 by Academic Press

The chapter written by Jeffrey Wood appearing on page 309 is Crown copyright © 1995 and is published with the permission of DERA on behalf of the Controller of HMSO.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1998 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/99 $30.00

ACADEMIC PRESS
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.apnet.com

Academic Press
24-28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014749-1

Printed in the United States of America
99 00 01 02 03 QW 9 8 7 6 5 4 3 2 1
CONTENTS

Contributors
Preface

Magneto-Transport as a Probe of Electron Dynamics in Open Quantum Dots
J. P. BIRD, R. AKIS, D. K. FERRY, AND M. STOPA
I. Introduction
II. Magneto-Transport in Open Quantum Dots: Some Theoretical Considerations
III. Weak-Field Magneto-Transport in Open Quantum Dots: Low-Temperature Properties
IV. Weak-Field Magneto-Transport in Open Quantum Dots: High-Temperature Properties
V. High-Field Magneto-Transport in Open Quantum Dots
VI. Concluding Discussion
References

External Optical Feedback Effects in Distributed Feedback Semiconductor Lasers
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
I. Introduction
II. Distributed Feedback Laser Fundamentals
III. Experimentally Observed Effects
IV. Theories on Optical Feedback
V. External Optical Feedback Sensitivity
VI. Conclusion
References

Atomic Scale Strain and Composition Evaluation from High-Resolution Transmission Electron Microscopy Images
A. ROSENAUER AND D. GERTHSEN
I. Introduction
II. Strain-State Analysis
III. Composition Evaluation by Lattice Fringe Analysis
IV. Applications
V. Summary and Discussion of the Atomic Scale Analysis Methods
Appendix A: List of Variables

Hexagonal Sampling in Image Processing
R. C. STAUNTON
I. Introduction
II. Image Sampling on a Hexagonal Grid
III. Processor Architecture
IV. Binary Image Processing
V. Monochrome Image Processing
VI. Conclusions
References

The Group Representation Network: A General Approach to Invariant Pattern Classification
JEFFREY WOOD
I. Pattern Classification and the Invariance Problem
II. Group Representation Theory
III. Linear and Nonlinear Concomitants
IV. Adaptivity in Group Representation Networks
V. Practical Considerations and Simulations
VI. The Computational Power of the Group Representation Network Model
VII. The Group Representation Network and Other Invariant Classification Methods
VIII. Summary and Open Questions
Proof of Theorem III.1
References

Index
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the author’s contribution begins.

R. AKIS (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
MOHAMMAD F. ALAM (73), Electro-Optics Program, University of Dayton, Dayton, Ohio, 45469
J. P. BIRD (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
D. K. FERRY (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
D. GERTHSEN (121), Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
MOHAMMAD A. KARIM (73), Department of Electrical Engineering, University of Tennessee, Knoxville, Tennessee, 37996
A. ROSENAUER (121), Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
R. C. STAUNTON, Department of Engineering, University of Warwick, Coventry, CV4 7AL, United Kingdom
M. STOPA (1), Nanoelectronic Materials Laboratory, Frontier Research Program, RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-01, Japan
JEFFREY WOOD (309), ISIS Group, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom
PREFACE

In this volume, we find surveys on the study of quantum dots, distributed feedback lasers, image analysis techniques for electron micrographs, hexagonal sampling, and pattern recognition.

We begin with an account of a procedure that sheds light on the transmission properties of open quantum dots, which are quasi-zero-dimensional semiconductor structures. Application of a magnetic field sweeps successive dot states past the Fermi surface and this in turn provokes regular oscillations of the magneto-conductance of the dot at low temperatures. The effects observed for weak magnetic fields are different from those found with high fields. The physics of these new and still imperfectly understood phenomena is examined here by J.P. Bird, R. Akis, D.K. Ferry, and M. Stopa.

In the second chapter, M.F. Alam and M.A. Karim describe external optical feedback effects in distributed feedback semiconductor lasers. Such lasers are particularly useful in wavelength-division multiplexed optical communication systems but they suffer from a serious problem: they are highly sensitive to any light that may re-enter the cavity. It is these effects and the various remedies that are discussed in this contribution. The authors first describe this family of lasers and the effects that have been observed experimentally. The theory of these optical feedback effects is then presented and the sensitivity to external optical feedback is assessed.

We now turn to electron microscopy. Digital image processing is highly developed in this field, with numerous software packages available, ranging from all-purpose suites to those designed for specific tasks. Nevertheless, areas remain for which no suitable software has been developed. Now that electron microscopy has moved out of the purely qualitative phase (in which conclusions were based on visual scrutiny of electron micrographs) into the quantitative era, such gaps are rapidly being filled and the chapter by A. Rosenauer and D. Gerthsen is a good example of this development. The authors are interested in strain in semiconductor heteroepitaxial layers and the composition of such structures and have developed a software package (Digital Analysis of Lattice Images, or DALI) for analyzing high-resolution electron images. They first provide a full account of strain-state analysis and of composition evaluation by the analysis of lattice fringes. They then describe in detail applications to several semiconductor heterostructures, notably ZnCdSe/ZnSe and InGaAs/GaAs, and conclude with a discussion of atomic-scale analytical methods.

Next, an account of a topic that I have found very elusive. All the books on image processing mention the advantages of hexagonal sampling over the standard square-grid technique, but none provide a thorough discussion of hexagonal sampling with examples of the hardware requirements. I am therefore particularly pleased to include this chapter by R.C. Staunton, who has long advocated the use of this sampling pattern. After a general introduction, image sampling on a hexagonal grid is presented at length, from both the theoretical and practical viewpoints. This is followed by an account of the appropriate processor architecture. The fourth section deals with binary images, and covers connectivity, distances and the morphological operators; a comparison of skeletonization in hexagonal and rectangular grids is included and it will be recalled that the early studies of mathematical morphology all included discussion of the hexagonal case. The fifth section is devoted to monochrome images; in it, such topics as the Fourier transform, various geometric transforms, filters, and edge detectors are examined.

The final contribution, by J. Wood, is concerned with a central problem of pattern recognition: how can pattern classifiers be designed in such a way that they are insensitive to certain kinds of transformation of the input data? In particular, how can the classifier be made invariant under linear transformations that form a group? There are numerous practical problems in this category: classification of an object specified by 3-D coordinates when rigid-body motion in three dimensions is allowed, for example. In this chapter, J. Wood first recapitulates the necessary background knowledge of group representation theory. He then introduces concomitants and adaptivity in group representation networks. The following sections cover practical considerations, the computational power of the group approach and a comparison with other methods. This full account of these original ideas should be widely welcomed.

As always, I thank all the authors most sincerely for all the trouble they have taken over their surveys and conclude with a list of forthcoming contributions.
FORTHCOMING CONTRIBUTIONS
L. Alvarez Leon and J.-M. Morel (vol. 111) Mathematical models for natural images
I. Andreadis (vol. 110) Soft morphology
D. Antzoulatos Use of the hypermatrix
W. Bacsa (vol. 110) Interference scanning optical probe microscopy
M. Berz and colleagues (vol. 108) Modern map methods for particle optics
N. D. Black, R. Millar, M. Kunt, F. Ziliani, and M. Reid Second generation image coding
N. Bonnet Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors Distance transforms
A. van den Bos and A. Dekker Resolution
O. Bostanjoglo (vol. 110) High-speed electron microscopy
S. Boussakta and A. G. J. Holt (vol. 111) Number-theoretic transforms and image processing
J. A. Dayton Microwave tubes in space
E. R. Dougherty and Y. Chen Granulometric filters
J. M. H. Du Buf Gabor filters and texture analysis
D. van Dyck Very high resolution electron microscopy
R. G. Forbes Liquid metal ion sources
E. Forster and F. N. Chukhovsky X-ray optics
M. J. Fransen, T. L. van Rooy, and P. Kruit On the electron optical properties of the ZrO/W Schottky emitter
A. Fox The critical-voltage effect
M. Gabbouj Stack filtering
W. C. Henneberger The Aharonov-Bohm effect
M. I. Herrera and L. Brú The development of electron microscopy in Spain
K. Ishizuka Contrast transfer and crystal images
C. Jeffries Conservation laws in electromagnetics
M. Jourlin and J.-C. Pinoli Logarithmic image processing
E. Kasper Numerical methods in particle optics
A. Khursheed Scanning electron microscope design
G. Kogel Positron microscopy
K. Koike Spin-polarized SEM
P. V. Kolev and M. Jamal Deen (vol. 109) Development and applications of a new deep-level transient spectroscopy method and new averaging techniques
W. Krakow Sideband imaging
D. J. J. van de Laak-Tijssen, E. Coets, and T. Mulvey Memoir of J. B. Le Poole
L. J. Latecki Well-composed sets
W. Li Vector transformation
J.-M. Lina, B. Goulard, and P. Turcotte (vol. 109) Complex wavelets
C. Mattiussi The finite volume, finite element, and finite difference methods
S. Mikoshiba and F. L. Curzon Plasma displays
R. L. Morris Electronic tools in parapsychology
J. G. Nagy Restoration of images with space-variant blur
P. D. Nellist and S. J. Pennycook Z-contrast in the STEM and its applications
G. Nemes Phase-space treatment of photon beams
M. A. O’Keefe Electron image simulation
B. Olstad Representation of image operators
M. Omote and S. Sakoda (vol. 110) Aharonov-Bohm scattering
C. Passow Geometric methods of treating energy transport phenomena
E. Petajan HDTV
F. A. Ponce Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais Scattering and recoil imaging and spectrometry
H. Rauch The wave-particle dualism
D. Saldin Electron holography
G. E. Sarty (vol. 111) Reconstruction from non-Cartesian grids
G. Schmahl X-ray microscopy
J. P. F. Sellschop Accelerator mass spectroscopy
S. Shirai Cathode-ray tube gun design methods
M. Shnaider and A. P. Paplinski (vol. 110) Vector coding and wavelets
T. Soma Focus-deflection systems and their applications
I. Talmon Study of complex fluids by transmission electron microscopy
S. Tari (vol. 111) Shape skeletons and greyscale images
J. Toulouse New developments in ferroelectrics
T. Tsutsui and Z. Dechun Organic electroluminescence, materials and devices
Y. Uchikawa Electron gun optics
J. S. Villarrubia Mathematical morphology and scanned probe microscopy
L. Vincent Morphology on graphs
J. B. Wilburn Generalized ranked-order filters
C. D. Wright and E. W. Hill Magnetic force microscopy
T. Yang (vol. 109) Fuzzy cellular neural networks
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 107
Magneto-Transport as a Probe of Electron Dynamics in Open Quantum Dots

J. P. BIRD, R. AKIS, and D. K. FERRY
Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona 85287, U.S.

M. STOPA†
Nanoelectronic Materials Laboratory, Frontier Research Program, RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-01, JAPAN
I. Introduction
II. Magneto-Transport in Open Quantum Dots: Some Theoretical Considerations
III. Weak-Field Magneto-Transport in Open Quantum Dots: Low-Temperature Properties
A. Magneto-Conductance Fluctuations in Open Quantum Dots
B. Probing Wavefunction Scarring at Zero Magnetic Field
C. Zero-Field Magneto-Resistance Peak
IV. Weak-Field Magneto-Transport in Open Quantum Dots: High-Temperature Properties
A. Temperature-Dependent Characteristics of the Magneto-Conductance Fluctuations
B. Phase-Breaking in Open Quantum Dots
C. Zero-Field Magneto-Resistance Peak: A Probe of Quantum Chaos?
V. High-Field Magneto-Transport in Open Quantum Dots
A. Aharonov-Bohm Magneto-Resistance Oscillations
B. Giant Backscattering Resonances
C. Time-Dependent Magneto-Transport
VI. Concluding Discussion
References
In this review, we discuss the use of magneto-transport studies to probe electron dynamics in open quantum dots, which are quasi-zero-dimensional semiconductor structures in which electrical current flow is confined on length scales that approach the size of the electron itself. The transmission properties of these structures are strongly regulated by their quantum mechanical lead openings, which inject electrons into the dot in a highly collimated beam. This beam in turn only couples favorably to a small set of states within the dot and, at temperatures where electron phase coherence is maintained over long distances, interference of these states becomes the dominant process in the resulting electrical behavior. A powerful experimental tool for probing the interference is provided by the application of a weak magnetic field, which shifts the phase of the electron wavefunction and sweeps successive dot states past the Fermi surface. The resulting fluctuations in the local density of states are thought to be reflected directly in the magneto-conductance of the dot, which exhibits a series of regular oscillations at low temperatures. Numerical simulations reveal the oscillations to be correlated to the recurrence of wavefunction scarring within the dots, the details of which are produced by a small number of semiclassical orbits. These orbits appear to be highly stable, a property that is thought to arise from the role of the quantum point contact leads, and the discrete quantization within the cavity itself. In contrast to previous suggestions, we therefore conclude that chaotic scattering is suppressed in these structures, in which current flow occurs instead in a highly spatially nonuniform manner. Application of a magnetic field also allows an estimate of the phase breaking time of the electrons to be obtained, and the influence of temperature, environmental coupling, and disorder on this parameter will be considered. The origin of a zero-field peak in the magneto-resistance will also be discussed, and will be argued to provide a signature of energy averaging in these dots. At sufficiently high magnetic fields, the transport properties of the quantum dots are dramatically modified by the formation of well-defined edge states, which may be selectively confined within the dot. A striking observation in this regime is a resonant breakdown of the quantum Hall effect, which is correlated to the depopulation of Landau levels in the dot. Numerical simulations, in which the self-consistent evolution of the quantum dot profile with magnetic field is properly accounted for, suggest this breakdown results from a sudden increase in backscattering via trapped edge states, whose widths swell significantly as a Landau level depopulates and charge is redistributed within the dot. In this regard, the resonances may be considered as resulting from van Hove-like singularities in the coupling between different Landau levels.

†Current address: Walter Schottky Institut, Technische Universität München, D-85748 Garching, Germany.

Volume 107 ISBN 0-12-014749-1. ADVANCES IN IMAGING AND ELECTRON PHYSICS. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION
A fundamental issue in quantum mechanics concerns the manner in which the discrete level spectrum of an isolated system is modified when it is coupled to some external, macroscopic measuring environment. From an experimental perspective, an ideal system for the study of this issue is provided by semiconductor quantum dots, which are quasi-zero-dimensional semiconductor structures in which the flow of electrical current is confined on length scales comparable to the size of the electron itself (Ferry and Goodnick, 1997). The key components of a quantum dot are illustrated schematically in Fig. 1, in which the basic idea is that current flow between the macroscopic source and drain reservoirs is forced to occur via a central cavity. The strong confinement of motion that this cavity generates quantizes the electronic energy spectrum into a discrete ladder of states, while the coupling between the dot and its environment may be tuned directly in experiment by suitable adjustment of quantum point contact leads.

FIGURE 1. Schematic diagram illustrating the key features of a quantum dot. A central scattering region is connected to the source and drain by means of one-dimensional (1-D) leads, each of which supports a small number (N) of propagating modes.

In most situations of interest here, quantum dots are realized using the split-gate technique, the basic principles of which are illustrated schematically in Fig. 2 (Thornton et al., 1986). According to this approach, metal gates with a fine-line pattern defined by electron beam lithography are first deposited on the surface of a GaAs/AlGaAs heterojunction. Application of a suitable negative bias to the gates depletes the regions of electron gas from directly underneath them, forming a dot whose lead openings are defined by means of quantum point contacts (van Wees et al., 1988; Wharam et al., 1988a). Use of electron beam lithography to define the gates allows the realization of submicron sized dots, whose size is therefore comparable to the spatial extent of short-range potential fluctuations in the underlying two-dimensional (2-D) electron gas (Nixon and Davies, 1990). These fluctuations are associated with the statistical distribution of donors in the AlGaAs layer and recent studies suggest that minimum-energy considerations cause their ionization to order into a quasi-lattice structure (Stopa, 1996). In the presence of the resulting weak disorder, electronic motion within the dots should therefore be predominantly ballistic in nature, with large-angle scattering events being restricted to their confining walls (Richter et al., 1996).

FIGURE 2. Realization of quantum dots using the split-gate technique. (a) SEM micrograph of a 1-μm quantum dot, defined on a GaAs/AlGaAs heterojunction. The darker regions correspond to the semiconductor substrate, while the lighter ones are Ti-Au gates; (b) schematic diagram illustrating the depletion of a two-dimensional (2-D) electron gas through the application of a negative bias to suitable surface gates; the solid line indicates the shape of the conducting electron channel formed between the gates; (c) schematic diagram illustrating the depletion edge induced around the surface gates of a split-gate quantum wire.

With a sufficiently negative bias applied to the quantum point contact leads of the dot, electron transmission through them may occur only by tunneling and the transport behavior is dominated by the Coulomb blockade effect (Grabert and Devoret, 1991; McEuen et al., 1991; Ashoori et al., 1992; Waugh et al., 1995; van der Vaart et al., 1995; Yacoby et al., 1995; Tarucha et al., 1996). However, we focus here on the behavior exhibited by open quantum dots, whose point contacts are instead configured to allow electron transmission via a finite number of transversely quantized modes (Marcus et al., 1992; Chang et al., 1994; Bird et al., 1994a; Keller et al., 1994; Persson et al., 1995). In these open dots, the Coulomb blockade effect is suppressed and electron transport instead provides a natural connection to the study of quantum chaos (Jalabert et al., 1990). In particular, since open dots are energetically coupled to their external environment, it has been suggested that their discrete level spectrum should be obscured by lifetime broadening effects (Jalabert et al., 1990). The problem with these arguments, however, is that they ignore the few-mode nature of the point contact leads, which act as quantum mechanical filters between the dot and its environment, greatly restricting the excitation of phase space within the central dot. The filtering action itself arises from the need for incoming electrons to match their transverse momentum component to one of the strongly quantized values within the input contact. Consequently, electrons are injected into the dot in a highly directed, or collimated, beam (Akis et al., 1996a, 1996b, 1997a), which is only thought to couple efficiently to a small set of preferred dot states (Zozoulenko et al., 1997). At suitably low temperatures, where phase coherence of the electrons is maintained over long distances, subsequent interference of these eigenstates then becomes the dominant factor in determining the resulting electrical behavior (Marcus et al., 1993a, 1993b; Bird et al., 1995a, 1998a; Clarke et al., 1995). In other words, rather than obscuring the discrete level spectrum of the dot, the introduction of environmental coupling by means of quantum point contacts is instead thought to filter the effective density of states that contributes to transport. Transport measurements are therefore expected to be ideally suited as an experimental probe of this filtering effect.
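As a rough illustration of the mode-matching argument above, the following sketch counts the transverse modes of a lead and estimates the semiclassical injection angle of each one. It assumes an idealized hard-wall lead of width W, for which the transverse wavevectors are nπ/W; the lead width and Fermi wavelength used in the usage note are illustrative values rather than parameters of the devices discussed here, and all function names are hypothetical. The conductance helper uses the standard two-terminal Landauer expression G = (2e²/h) Σ Tₙ with spin degeneracy assumed.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge (C)
PLANCK_H = 6.62607015e-34   # Planck constant (J s)


def propagating_modes(width_m, fermi_wavelength_m):
    """Number of transverse modes below the Fermi energy.

    Assumption: hard-wall lead, so mode n propagates when
    n * pi / W < k_F, i.e. n < 2 W / lambda_F.
    """
    return math.floor(2.0 * width_m / fermi_wavelength_m)


def injection_angles_deg(width_m, fermi_wavelength_m):
    """Semiclassical injection angle of each mode, sin(theta_n) = k_n / k_F."""
    n_modes = propagating_modes(width_m, fermi_wavelength_m)
    ratio = fermi_wavelength_m / (2.0 * width_m)  # equals (pi / W) / k_F
    return [math.degrees(math.asin(n * ratio)) for n in range(1, n_modes + 1)]


def landauer_conductance(transmissions):
    """Two-terminal conductance G = (2 e^2 / h) * sum of mode transmissions."""
    return 2.0 * E_CHARGE ** 2 / PLANCK_H * sum(transmissions)
```

For example, `propagating_modes(90e-9, 40e-9)` counts the modes of a hypothetical 90 nm wide lead with a 40 nm Fermi wavelength, and `injection_angles_deg` shows that the lowest modes are injected close to the lead axis, which is the semiclassical picture of collimation. Smooth, saddle-shaped point contact potentials would change the numbers but not the qualitative trend.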
Motivated by the preceding, we discuss here the use of magneto-transport studies to probe the dynamics of electrons in open quantum dots. At sufficiently weak magnetic fields, the main effect of the applied field is to modulate the intrinsic motion of electrons, by shifting the phase of their wavefunction. Important information is thus obtained on the nature of electrical current flow through these strongly quantized structures, and on the factors that limit the phase coherence of electrons trapped within them. At higher magnetic fields, however, well-defined Landau levels ultimately form, giving rise to a dramatic redistribution of charge within the quantum dot (Stopa et al., 1996). In this edge state regime, the dot may be considered as an artificially engineered atom, whose “level transitions” may be studied directly in experiment (van der Vaart et al., 1994a).

The organization of this chapter is as follows. In Section II, we discuss some of the theoretical concepts that will be important when interpreting the results of the magneto-transport studies. Because our main interest lies in probing the intrinsic dynamics of electrons in the dots, in Section III we begin by discussing the results of our weak-field studies and focus on the behavior obtained at low temperatures, where the influence of electron dephasing may be reasonably neglected. The importance of dephasing is then considered in Section IV, in which we present the results of studies performed at higher temperatures. In Section V, we discuss the behavior observed at high magnetic fields, where novel resonant scattering effects are found to dominate the magneto-transport. Finally, we present our conclusions in Section VI.
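To give a feeling for the field scale on which the phase shift mentioned above becomes significant, the sketch below evaluates the magnetic phase 2πBA/Φ₀ accumulated around a closed path of area A, where Φ₀ = h/e is the flux quantum, and the field increment that adds one flux quantum through that area. The choice of area (a square of side 1 μm, ignoring any depletion at the gate edges) is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge (C)
PLANCK_H = 6.62607015e-34   # Planck constant (J s)


def flux_quantum():
    """Single-electron flux quantum h/e, in webers."""
    return PLANCK_H / E_CHARGE


def enclosed_phase(b_tesla, area_m2):
    """Magnetic (Aharonov-Bohm) phase, in radians, for a loop of given area."""
    return 2.0 * math.pi * b_tesla * area_m2 / flux_quantum()


def field_period(area_m2):
    """Field increment that threads one extra flux quantum through the area."""
    return flux_quantum() / area_m2


if __name__ == "__main__":
    area = (1.0e-6) ** 2  # assumed: square dot of side 1 micron
    print(f"field per flux quantum: {field_period(area) * 1e3:.2f} mT")
```

Under these assumptions a few millitesla already change the enclosed phase by 2π, which is why fields far too weak to form Landau levels can still modulate the interference. Orbits enclosing a smaller area require correspondingly larger field increments.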
II. MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: SOME THEORETICAL CONSIDERATIONS
The importance of electron transport measurements as a spectroscopic probe of open quantum dots derives from the connection between conductance and the density of states. In order to illustrate this connection, in Fig. 3 we show the computed energy levels of an isolated dot and the evolution of these levels with magnetic field. In the same figure, we also show the conductance contour plot that is obtained for the same dot, when four propagating modes are present in its point contact leads. While a number of subtle modifications may be resolved, it is nonetheless apparent that the basic details of this contour reflect the underlying energy level structure of 15.5
15.5
14.0
14.0 -0.1
0.1
MAGNETIC HELD (T)
-0.1 0.1 MAGNETIC FIELD (T)
FIGURE3. Conductance measurements of open quantum dots provide a spectroscopic probe of their discrete level spectrum. The energy level spectrum of an isolated (0.3 pm) dot is shown on the left-hand side, while on the right-hand side we show the corresponding conductance contour plot, obtained with four modes now propagating in the dot leads. Lighter regions correspond to higher conductance. (See also Plate 1.)
MAGNETO-TRANSPORT AS A PROBE OF ELECTRON DYNAMICS
7
the isolated dot. This correspondence between conductance and the density of states may easily be understood by recalling that the former quantity is expected to be proportional to the transmission probability of electrons through the dot (Landauer, 1957; Biittiker et ul., 1985). As this probability should in turn be proportional to the density of final states that may be accessed in transport, we thus arrive at the connection between conductance and the density of states (Ferry et al., 1998). At sufficiently low temperatures, where electron phase coherence is maintained over long distances, we therefore expect that transport measurements should provide an experimental probe of the discrete level structure of open quantum dots. Transport measurements are also expected to provide an important tool that may be used to clarify the correct semiclassical description of electron dynamics in open quantum dots. The connection here arises via periodicorbit theory, according to which the density of states of any quantum system may be decomposed in terms of contributions from closed, semiclassical orbits (Gutzwiller 1971, 1990; Berry and Tabor, 1976, 1977; Jalabert et ul., 1990, Nakamura 1993, Casati and Chirikov, 1995; Brack and Bhaduri, 19973. The important point t o note is that the introduction of environmental coupling by means of quantum point contacts is thought to filter the effective density of states that contributes to the conductance [Akis et a/., 1996a; Ferry et al., 19981. Consequently, it is expected that the appropriate semiclassical description of these devices will be one in which a small number of the intrinsic periodic orbits of the dot are coupled and participate in transport. As already mentioned, the filtering action of the leads is thought to arise from their ability to collimate electrons into a highly directed beam (Fig. 4). From the perspective of the semiclassical motion of
FIGURE 4. Quantum mechanical wavefunction simulation showing electrons emerging in a highly collimated beam from a quantum point contact. The gate geometry is taken to be the same as the asymmetric pattern shown in Fig. 2(a). In the left-hand figure, only one occupied mode is present in the quantum point contact, while in the right-hand one seven modes are supported. (See also Plate 2.)
J. P. BIRD, R. AKIS, D. K. FERRY, AND M. STOPA
the electrons, it may be imagined that this beam is only able to favorably couple to those few orbits whose momentum components it closely matches (Akis et al., 1996a; Zozoulenko et al., 1997). At sufficiently high magnetic fields, well-defined Landau levels form in the dot, whose transport properties are dramatically modified by a redistribution of charge into compressible and incompressible regions of electron gas (Beenakker, 1990; Chang, 1990; McEuen et al., 1991; Chklovskii et al., 1992; Stopa et al., 1996). The compressible regions are characterized by noninteger filling factor and consequently exhibit metallic-like screening. The incompressible regions, on the other hand, correspond to the special situation of integer filling factor, and their screening properties are very poor. Because the incompressible regions are usually much narrower than their compressible counterparts, this spatial redistribution of charge strongly modifies the shape of the dot from its zero-field form. Its walls, in particular, develop a series of broad (compressible) terraces, which are separated from each other by much narrower (incompressible) regions, in which the confining profile suddenly changes (Stopa et al., 1996).
III. WEAK-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: LOW-TEMPERATURE PROPERTIES

In this section, we consider the transport properties of open dots at low temperatures, where experiment has shown that the wave-like nature of the electrons is maintained over time scales up to a hundred times longer than the ballistic transit time across the dot (Bird et al., 1995a; Clarke et al., 1995). To illustrate the behavior typically observed in this regime, we show in Fig. 5 the results of a magneto-resistance measurement, performed over a wide range of magnetic field. At the highest fields shown here, the cyclotron radius of the electrons

$$ r_c = \frac{\hbar k_F}{eB} \qquad (1) $$

(where $k_F$ is the Fermi wavevector and B is the magnetic field) is significantly smaller than the size of the dot and current flow occurs via well-defined edge states (Büttiker, 1988). The edge states result from the intersection of successive Landau levels with the Fermi surface, and may be thought of as analogous to classical skipping orbits. For suitable gate voltages, Aharonov-Bohm oscillations are observed in this regime and are understood to result from electron tunneling via edge states trapped in the dot (Fig. 5, expanded section) (van Wees, 1989; Bird et al., 1994b). In this section, we focus on the
[Figure 5 plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 5. Typical magneto-resistance trace, measured in a 1-μm split-gate quantum dot at 0.01 K. The noise-like features are in fact highly reproducible, and an expanded view of the structure at high magnetic fields reveals highly periodic oscillations.
TABLE 1
BASIC PROPERTIES OF THE SEMICONDUCTOR WAFERS USED IN THIS STUDY. TRANSPORT CHARACTERISTICS OBTAINED AT TEMPERATURES OF A FEW DEGREES KELVIN

Wafer mobility (μ)    Carrier density (n_s)    Mean free path (l)    Fermi wavelength (λ_F)
m²/Vs                 ×10¹⁵ m⁻²                μm                    nm

36-78                 4.5-5.0                  4-9                   ~35
behavior observed at weaker magnetic fields, where the cyclotron radius is much larger than, or comparable to, the dimensions of the dot (Tables 1 and 2). In this regime, the low-temperature magneto-resistance is typically dominated by dense fluctuations, and closer inspection of these reveals a highly periodic nature. As can be seen from Fig. 6, this periodicity is quite distinct from that of the Aharonov-Bohm oscillations seen in the edge state regime. Another feature that may be observed at weak fields is a central magneto-resistance peak (Fig. 7) (Marcus et al., 1992; Chang et al., 1994; Bird et al., 1995b), and at first glance this appears reminiscent of that which arises due to weak localization in disordered thin films (Bergmann, 1983, 1984).
A. Magneto-Conductance Fluctuations in Open Quantum Dots
The regular nature of the magneto-conductance fluctuations, observed in the weak-field regime, is confirmed by the results of Fourier analysis, which typically reveals the presence of just a few distinct frequencies (Fig. 8) (Bird et al., 1996a, 1997a). The fluctuations themselves are thought to arise from an interference effect, involving electron partial waves that propagate through the dot via a small number of periodic orbits (Akis et al., 1996a).
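As a consistency check on the weak-field criterion used above (cyclotron radius comparable to or larger than the dot) and on the entries of Tables 1 and 2, the derived quantities can be recomputed from the carrier density and mobility. The following is only a sketch under stated assumptions: the GaAs effective mass of 0.067 electron masses, the mid-range density of 4.75 × 10¹⁵ m⁻², and the square-dot level-spacing formula are not given in the text, and all function names are illustrative.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant (J s)
E_CHARGE = 1.602176634e-19      # elementary charge (C)
K_B = 1.380649e-23              # Boltzmann constant (J/K)
M_EFF = 0.067 * 9.1093837e-31   # ASSUMPTION: GaAs effective mass, 0.067 m_e (kg)


def fermi_wavevector(n_s):
    """Fermi wavevector (1/m) of a spin-degenerate 2D electron gas."""
    return math.sqrt(2.0 * math.pi * n_s)


def fermi_wavelength(n_s):
    """Fermi wavelength (m)."""
    return 2.0 * math.pi / fermi_wavevector(n_s)


def mean_free_path(mobility, n_s):
    """Mean free path (m), l = v_F * tau = hbar * k_F * mobility / e."""
    return HBAR * fermi_wavevector(n_s) * mobility / E_CHARGE


def field_for_orbit_diameter(n_s, dot_size):
    """Field (T) at which 2 r_c = L, using r_c = hbar k_F / (e B)."""
    return 2.0 * HBAR * fermi_wavevector(n_s) / (E_CHARGE * dot_size)


def level_spacing_kelvin(dot_size):
    """ASSUMPTION: mean spacing 2*pi*hbar^2 / (m* L^2) for a square dot, in kelvin."""
    return 2.0 * math.pi * HBAR ** 2 / (M_EFF * dot_size ** 2) / K_B


n_s = 4.75e15  # ASSUMPTION: mid-range carrier density from Table 1 (1/m^2)
for size_um in (1.8, 0.8, 0.5):
    L = size_um * 1e-6
    print(size_um,
          round(L / fermi_wavelength(n_s)),
          round(field_for_orbit_diameter(n_s, L), 2),
          round(level_spacing_kelvin(L), 2))
```

Under these assumptions the printed values should fall close to the tabulated entries, which suggests that the tables follow from the simple two-dimensional electron gas relations rather than from independent fits.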
TABLE 2
TYPICAL PARAMETERS FOR THE QUANTUM DOTS^a

Gate size    Effective dot size (L)    L/λ_F    B (2r_c = L)    Δ/k_B
μm           μm                                 T               K

2            1.8                       ~50      0.13            0.03
1            0.8                       ~25      0.29            0.13
0.6          0.5                       ~15      0.45            0.33
0.4          0.2-0.3                   ~5       0.7-0.9         ~1

^a The effective size of the dots was inferred from the observation of Aharonov-Bohm oscillations in the edge state regime (Bird et al., 1994b).
[Figure 6 plots; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 6. Comparison of the magneto-resistance over different ranges of magnetic field. In the lower plot, the strikingly periodic oscillations result from tunneling via edge states that are trapped in the dot. The basic periodicity of these oscillations is clearly different from those shown in the upper figure, which spans the same increment of magnetic field (0.5 T). The experiment was performed at a temperature of 0.01 K.
As such, the simple periodicity of the fluctuations noted here suggests this interference is dominated by contributions from a small number of these semiclassical orbits. This characteristic is fully consistent with the results of electron transport simulations, which reveal the wavefunction within the dot to be strongly scarred by the remnants of a few closed orbits (Akis et al., 1996a, 1996b, 1997b). While the details of this scarring vary sensitively with magnetic field, we nonetheless find that certain scars may recur quite regularly, with a basic periodicity that is in good agreement with that of the measured fluctuations. Fourier analysis of the fluctuations shown in Fig. 8,
[Figure 7 plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 7. Low-temperature magneto-resistance of a 1-μm GaAs/AlGaAs split-gate quantum dot. The inset shows an expanded view of the region around zero magnetic field, where a central magneto-resistance peak is apparent. The experiment was performed at a temperature of 0.01 K.
for example, reveals a fundamental magnetic frequency of approximately 9 T⁻¹. This in turn corresponds very closely to the magnetic field period over which a diamond-like scar is found to recur in the simulations (ΔB = 0.11 T, Fig. 8) (Akis et al., 1996a). With a small number of orbits involved in transport, electron interference occurs when the orbits return to their initial point, in this case the input point contact (Berry, 1984). Because the experiment here is performed in the presence of a weak magnetic field, this typically causes electrons to precess around the cavity, so that many rotations of the dot are required before orbit closure may be achieved (Bird et al., 1997b). At sufficiently low temperatures, electrons undergo multiple traversals of this basic orbit, while
[Figure 8 plot; horizontal axis: FREQUENCY (1/TESLA)]
FIGURE 8. The well-defined periodicity observed in the weak field magneto-conductance fluctuations is found to be correlated to the recurrence of well-defined wavefunction scars within the dot. In this figure we show the behavior observed in a 0.4-μm split-gate dot, which reveals fluctuations with a fundamental frequency of 9 T⁻¹. This frequency content does not change significantly as the dot lead openings are varied, and corresponds closely to the field scale over which a diamond scar recurs. Lighter regions in these probability density plots correspond to regions of enhanced probability density. The experiment was performed at a temperature of 0.01 K. (See also Plate 3.)
maintaining phase coherence, and it is this highly recursive process that builds up the scar. Given such considerations, it is clear that a crucial requirement for the observation of well-defined scarring is that electrons remain coherently trapped within the dot over very long time scales. This notion is confirmed by the results of temperature dependent studies, which, as we will discuss in Section IV, may be considered as probing the time dependent evolution of the scars. We have already mentioned that periodic orbit theory reveals the underlying connection between quantum states and their associated semiclassical orbits. In this regard, the simple periodicity of the fluctuations noted here may also be considered to demonstrate that the transport properties of open dots are dominated by a small set of preferred quantum states. The selection of these states is thought to result from the collimating action of the input contact, in support of which we note that corresponding calculations in which the point contacts are replaced with uniform tunnel barriers reveal much weaker scarring behavior (Akis et al., 1996a). This conclusion is also supported by the results of an independent study, which has shown how quantum point contacts might be used to preferentially excite discrete states within open dots (Zozoulenko et al., 1997). The situation here is similar to that encountered in resonant tunneling diodes, in which only those states of the well, which couple effectively to the barriers, are excited in tunneling to exhibit clearly defined scars (Fromhold et al., 1995).

1. Stability of the Dominant Orbits

Further experiments reveal a number of important properties of the orbits that dominate interference in open dots, the first of which is that these orbits appear to be highly stable. This characteristic is revealed by studies in which the magneto-conductance fluctuations are measured at a series of different gate voltages (Bird et al., 1996a, 1997a, 1997b).
These studies reveal that the main effect of varying gate voltage is to modulate the amplitude of the dominant Fourier peaks, while leaving their frequency values unaffected (Fig. 8). The stability of the selected orbits is further suggested by the results of numerical calculations, which reveal essentially the same scarred features when the number of modes in the leads is varied over a similar range to that considered in experiment (Fig. 9).
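The Fourier analysis referred to above can be illustrated with a small numerical sketch. The traces below are hypothetical stand-ins for measured data: each contains the same 9 T⁻¹ fundamental (and a weaker second harmonic), with only the amplitude changing to mimic a change in gate voltage. The function names and the direct-projection estimate of the spectrum are illustrative choices, not the analysis used in the cited experiments.

```python
import math


def power_spectrum(fields, signal, freqs):
    """Power of `signal` at each magnetic frequency (1/T), by direct projection."""
    mean = sum(signal) / len(signal)
    spectrum = []
    for f in freqs:
        re = sum((s - mean) * math.cos(2.0 * math.pi * f * b)
                 for b, s in zip(fields, signal))
        im = sum((s - mean) * math.sin(2.0 * math.pi * f * b)
                 for b, s in zip(fields, signal))
        spectrum.append((re * re + im * im) / len(signal) ** 2)
    return spectrum


def peak_frequency(fields, signal, freqs):
    """Frequency (1/T) at which the power spectrum is largest."""
    spectrum = power_spectrum(fields, signal, freqs)
    return freqs[spectrum.index(max(spectrum))]


# HYPOTHETICAL data: a 1 T field sweep sampled every 2 mT.
fields = [0.002 * i for i in range(500)]
freqs = [0.5 * k for k in range(1, 61)]          # 0.5 to 30 T^-1

# The amplitude plays the role of the gate voltage; the frequencies are fixed.
traces = {
    amp: [amp * math.cos(2.0 * math.pi * 9.0 * b)
          + 0.3 * amp * math.cos(2.0 * math.pi * 18.0 * b)
          for b in fields]
    for amp in (0.2, 0.5, 1.0)
}
peaks = {amp: peak_frequency(fields, trace, freqs) for amp, trace in traces.items()}
print(peaks)
```

In this toy model the peak position is the same for every amplitude, and its inverse, about 0.11 T, is the field period quoted for the recurrence of the diamond scar.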
2. Size-Dependent Scaling

The periodic nature of the fluctuations is more clearly resolved in smaller dots, in which the increased importance of energy quantization presumably results in a smaller number of dot states being excited in transport. In order
FIGURE 9. Diamond scars formed in quantum dots with different numbers of modes present in the quantum point contact leads. The dot size here is 0.3 μm, which corresponds to the effective size of the experimental dot studied in Fig. 8. (See also Plate 4.)
to quantify the scaling behavior of the fluctuations, we compute their averaged Fourier spectra by summing over traces obtained at a number of different gate voltages. A fundamental peak typically survives this averaging and its amplitude is found to increase by more than two orders of magnitude when the dot size is reduced in experiment (Fig. 10). The frequency of this peak appears to show a linear scaling with dot length, rather than area, a surprising observation that is nonetheless confirmed by the results of numerical simulations (Bird et al., 1997b; Holmberg et al., 1998). In order to account for this unexpected scaling, we note that the basic periodicity of the fluctuations should reflect the rate at which the magnetic flux enclosed by the dominant orbits varies. In the presence of a weak magnetic field, these orbits precess around the dot so that many traversals are required before orbit closure may be achieved (Akis et al., 1996a). For the periodicity of the fluctuations, the relevant magnetic flux is, therefore, that enclosed between multiple traversals, which in turn may be modified due to flux cancellation effects (Beenakker and van Houten, 1988). Indeed, from studies of the conductance fluctuations in disordered systems, it is known that the flux enclosed between different trajectories may scale in proportion to their length (Ferry and Goodnick, 1997; Ferry et al., 1997). While the length-dependent scaling observed here appears consistent with these arguments, it is alternatively possible that it somehow reflects the nonergodic sampling of phase space in these strongly scarred dots, and further studies are required to resolve this issue.

3. Universality of the Scarring Behavior

While we have thus far restricted our discussion to the results of experiments performed on dots with the same gate geometry as that shown in Fig.
2, we emphasize that the selective excitation of stable orbits appears to be a generic property of all dots whose environmental coupling is provided by
[Figure 10 plot; horizontal axis: FREQUENCY (1/TESLA)]
FIGURE 10. Power spectrum obtained by averaging the spectra of the fluctuations, measured in the same 0.4-μm dot at a number of different gate voltages. First and second harmonics remain resolved. Upper inset: Amplitude of the averaged fundamental peak as a function of dot size. Lower inset: Frequency of the averaged fundamental peak as a function of dot size. The experiment was performed at a temperature of 0.01 K.
means of few-mode quantum point contacts. In particular, studies performed on dots with chaotic gate geometries, and with different lead opening orientations, have also been found to reveal a highly regular nature to their weak field fluctuations (Fig. 11) (Marcus et al., 1992; Okubo et al., 1997a). As the associated wavefunction scarring implies a highly nonuniform sampling of phase space within these dots, an important conclusion is therefore that chaotic scattering is suppressed in open quantum dots when their discrete quantum mechanical nature becomes sufficiently resolved.
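The averaging of Fourier spectra over gate voltage, used in the discussion of size-dependent scaling above, can be sketched in the same spirit. Everything in the model below is hypothetical: each synthetic trace carries a stable 9 T⁻¹ component with a random amplitude and phase, one additional component whose frequency differs from trace to trace, and Gaussian noise. The sketch only illustrates why a component common to all traces survives the averaging while trace-specific structure is diluted.

```python
import math
import random


def power_at(fields, signal, freq):
    """Power of `signal` at one magnetic frequency (1/T), by direct projection."""
    re = sum(s * math.cos(2.0 * math.pi * freq * b) for b, s in zip(fields, signal))
    im = sum(s * math.sin(2.0 * math.pi * freq * b) for b, s in zip(fields, signal))
    return (re * re + im * im) / len(signal) ** 2


def averaged_spectrum(fields, traces, freqs):
    """Average the power spectra of several traces, one per gate voltage."""
    return [sum(power_at(fields, t, f) for t in traces) / len(traces)
            for f in freqs]


def synthetic_trace(fields, rng):
    """HYPOTHETICAL trace: stable 9 T^-1 term, a gate-dependent term, and noise."""
    amp = rng.uniform(0.5, 1.0)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    f_other = rng.uniform(15.0, 40.0)            # differs for every gate voltage
    phase_other = rng.uniform(0.0, 2.0 * math.pi)
    return [amp * math.cos(2.0 * math.pi * 9.0 * b + phase)
            + 0.3 * math.cos(2.0 * math.pi * f_other * b + phase_other)
            + rng.gauss(0.0, 0.2)
            for b in fields]


rng = random.Random(0)                            # fixed seed for repeatability
fields = [0.002 * i for i in range(500)]          # 1 T sweep, 2 mT steps
freqs = [float(k) for k in range(1, 41)]          # 1 to 40 T^-1
traces = [synthetic_trace(fields, rng) for _ in range(8)]

spectrum = averaged_spectrum(fields, traces, freqs)
fundamental = freqs[spectrum.index(max(spectrum))]
print(fundamental)
```

Averaging the power spectra, rather than the traces themselves, is the relevant design choice here: the random phases would cancel the common component in a direct average of the traces, whereas the power at 9 T⁻¹ adds constructively.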
B. Probing Wavefunction Scarring at Zero Magnetic Field

As magneto-transport studies reveal a highly selective nature to electron transport in open dots, a natural question now arises as to whether this
[Figure 11 plot; horizontal axis: MAGNETIC FREQUENCY (PER TESLA)]
FIGURE 11. Observation of scarring effects in a stadium-shaped dot. The gate geometry studied in experiment is shown in the upper left inset and the spacer bar indicates 1 μm. The periodic nature of the weak field magneto-conductance is shown in the lower right inset, while in the main figure the frequency content of the fluctuations, obtained in experiment and in simulations, is compared. An example of scarring for this dot is shown in the lower left inset. The experiment was performed at a temperature of 0.01 K.
selectivity is intrinsic to these structures or if it is related instead to the application of the magnetic field. In order to resolve this issue, we note that Fig. 3 implies that Fermi energy variations should provide an equally effective probe of the discrete eigenspectrum of open dots. In experiment, we mimic this variation by modulating the voltage applied to the gates of the dot at zero magnetic field. At liquid helium temperatures, this modulation gives rise to a series of smooth plateaus in the resistance, reminiscent of those exhibited by pairs of quantum point contacts aligned in series (Fig. 12) (Wharam et al., 1988b). As the temperature is lowered, however, the phase
[Figure 12 plot; horizontal axis: GATE VOLTAGE (VOLTS)]
FIGURE 12. The resistance-gate voltage characteristic of a 1-μm split-gate dot, measured at two different temperatures (indicated).
coherent lifetime of the electrons increases (Bird et al., 1995a) and the plateaus become disrupted by the growth of reproducible fluctuations (Fig. 12). After subtracting the quantized background from the total resistance variation, we thus obtain a series of highly regular conductance oscillations, reminiscent of those seen in the magneto-transport studies (Fig. 13) (Bird et al., 1998b). These oscillations, too, are found to be a generic feature of these devices, and suggest that the selective nature to electron transport inferred previously is indeed an intrinsic property of these devices. As for the origin of the gate-voltage induced oscillations, these are thought to arise from an associated modulation of the size of the dot. In particular, an analysis of the high field Aharonov-Bohm oscillations (Bird et al., 1994b) indicates that the magnitude of this modulation may be as much as several tens of percent, at least in the smallest dots studied. Varying the size of the dot in this manner,
[Figure 13 plot; horizontal axis: GATE VOLTAGE (VOLTS)]
FIGURE 13. Gate voltage induced conductance variation, measured in a 0.4-μm split-gate dot at 0.01 K. Upper curve: variation of dot resistance with gate voltage. Lower curve: corresponding conductance oscillations, obtained after subtracting a monotonic background from the original data. The dot geometry employed here is shown in the inset.
it should be possible to sweep its eigenstates past the Fermi surface, generating fluctuations in the local density of states that should, in turn, be reflected directly in the measured conductance. In our earlier magneto-transport studies, an important feature was found to be the existence of a correlation between the periodicity of the magneto-conductance fluctuations and the recurrence of specific wavefunction scars. In order to consider the possibility of a similar correlation here, we have performed electron transport simulations in which the influence of the gate voltage variation on the confining profile of the dot is properly accounted
for (Vasileska et al., 1998). In this way, we obtain conductance oscillations as a function of gate-voltage whose frequency content agrees excellently with that obtained in experiment (Fig. 14). We also find that the wavefunction may be strongly scarred at zero-field, and that specific scars may recur quite regularly with gate voltage, in good correspondence with the periodicity of the associated conductance oscillations (Figs. 15 and 16). Consequently, we conclude that the selective nature to electron transport inferred from the magneto-transport studies is indeed intrinsic to open dots. Finally, in this section, we note that by combining the results of studies in which magnetic field and gate voltage are independently varied, it should
[Figure 14 plot; horizontal axis: GATE VOLTAGE (VOLTS)]
FIGURE 14. Comparison of the gate voltage induced conductance oscillations, measured in experiment at 0.01 K and computed using self-consistent numerical simulations. The dot geometry employed here is shown in the inset.
FIGURE 15. Self-consistently computed wavefunction plots, obtained from simulations of the split-gate dot geometry shown in Fig. 14. The plots were obtained at three different gate voltages and the darker regions correspond to enhanced probability density. A typical dot profile is shown in the upper left figure. (See also Plate 5.)
be possible to construct conductance contour plots that provide information on the coupling modified density of states of the dots (Chan et al., 1995; Persson et al., 1995; Berggren et al., 1996). An example of one such contour is plotted in Fig. 17, in which the color scale indicates the measured variation of dot conductance with magnetic field and gate voltage. While a direct comparison to the form of Fig. 3 (in which the variation of magneto-conductance with energy is instead computed) cannot be made here, for now we are at least encouraged by observation of the well-defined striations that run through the experimental plot. Similar striations are also resolved in the numerical contours (Figs. 3 and 18), and have previously been shown to correspond to points of constant scarring (Akis et al., 1998b). Before a more detailed comparison of experiment and theory can be made, however, it will first be necessary to compute contour plots in which the influence of the gate voltage variation on the self-consistent profile of the dots is properly considered.
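The background subtraction used earlier in this section to isolate the gate-voltage induced oscillations can be sketched as follows. The text describes the removal of a quantized or monotonic background; the straight-line least-squares fit used here is a deliberately simple stand-in, and the gate sweep, oscillation frequency (25 V⁻¹), and amplitudes are hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subtract_background(x, y):
    """Remove a fitted straight-line background (ASSUMED form) from y."""
    slope, intercept = linear_fit(x, y)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# HYPOTHETICAL sweep: monotonic rise plus a regular oscillation in gate voltage.
gate = [-1.0 + 0.001 * i for i in range(400)]     # volts
conductance = [2.0 + 3.0 * (v + 1.0) + 0.2 * math.sin(2.0 * math.pi * 25.0 * v)
               for v in gate]

slope, intercept = linear_fit(gate, conductance)
oscillations = subtract_background(gate, conductance)
print(round(slope, 2), round(max(abs(o) for o in oscillations), 2))
```

Because the oscillation is not exactly orthogonal to a linear trend over a finite sweep, the fitted slope differs slightly from the true one and a small residual tilt remains. A smoother background model would reduce this, at the cost of possibly absorbing slow oscillations.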
[Figure 16 plot; horizontal axis: FREQUENCY (V⁻¹)]
FIGURE 16. The Fourier spectrum of the experimental conductance oscillations shown in Fig. 14 reveals a peak at the gate voltage frequency that corresponds to the recurring scars shown in Fig. 15 (see arrow). The inset shows the original, experimental oscillations.
C. Zero-Field Magneto-Resistance Peak
Another feature that may be observed in the magneto-resistance of open dots is a central peak at zero-field (Fig. 19) (Marcus et al., 1992; Berry et al., 1994a, 1994b; Chang et al., 1994; Bird et al., 1995a; Keller et al., 1996), which has previously been attributed to a weak localization effect in which electrons are backscattered through a series of collisions with the confining walls of the dot (Baranger et al., 1993). The study of this peak is often difficult at low temperatures, due to the obscuring effect of the surrounding fluctuations (Figs. 19 and 20), and previous studies have emphasized the importance of using a suitable energy average to resolve it (Baranger et al., 1993; Keller et al., 1996). In what follows, we will argue that the observation
[Figure 17 contour plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 17. Experimentally determined conductance contour plot, obtained for a 0.4-μm split-gate quantum dot. The color scale ranges from red to blue, indicating low to high conductance, respectively. (See also Plate 6.)
of a well-defined peak in such experiments reflects the influence of energy averaging on the discrete spectrum of the dot. In particular, we will show how distinct peak lineshapes may be obtained, depending on the range of energy over which this average is taken. This in turn leads us to a very different interpretation of these peaks, in which the role played by closed semiclassical orbits is once again emphasized (Akis et al., 1998b). We
[Figure 18 contour plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 18. Numerically determined conductance contour plot, for a 0.3-μm quantum dot. The well-defined lines indicated by the arrows correspond to lines of constant scarring. (See also Plate 7.)
postpone a discussion of this issue for now, however, until the role of electron dephasing in these structures has been considered.
IV. WEAK-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: HIGH-TEMPERATURE PROPERTIES

While we have suggested that the transport properties of open quantum dots are strongly influenced by electron interference, we have thus far neglected the fact that the wave-like nature of the electrons is not preserved indefinitely in condensed matter systems (Ferry and Goodnick, 1997). While this approach should be quite reasonable at low temperatures, where the
[Figure 19 plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 19. Weak field magneto-resistance of a 0.4-μm split-gate quantum dot, measured at a temperature of 0.01 K and a series of different gate voltages. Successive curves are offset by 5-kΩ increments.
phase breaking time of the electrons may be very long, it is expected that the increased importance of dephasing should result in a suppression of interference at higher temperatures. In this section, we therefore consider the manner in which this suppression arises.
A. Temperature Dependence Characteristics of the Magneto-Conductance Fluctuations

Reminiscent of the universal conductance fluctuations observed in disordered quantum wires (Lee et al., 1987), the fluctuations in open dots are found to be
[Figure 20 plot; horizontal axis: MAGNETIC FIELD (TESLA); curves labeled 0.13 K, 0.36 K, and 1.4 K]
FIGURE 20. Temperature dependence of the magneto-resistance of a 1-μm split-gate quantum dot, measured in the region near zero magnetic field.
strongly suppressed on raising the temperature to above a degree Kelvin (Fig. 21). The quantitative manner in which this quenching arises is found to be very different to that exhibited by disordered wires, however, and is thought to reflect the different nature of electron dynamics in these systems. In particular, disordered wires are known to exhibit a power law scaling of their fluctuation amplitude, which results from the broad distribution of diffusive electron trajectories that contribute to transport (Lee et al., 1987; Bird et al., 1990, 1991). Here, the fluctuations are exponentially suppressed with temperature (Fig. 22), however, and we have argued that this behavior is consistent with the notion that a stable set of orbits dominates the transport behavior (Bird et al., 1994a). In particular, as the phase-breaking
[Figure 21 plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 21. Temperature dependence of the magneto-resistance of a 1-μm split-gate quantum dot. Successive curves are offset by 10-kΩ increments.
time shortens at higher temperatures, the number of electrons that are able to propagate coherently along these orbits should decrease exponentially. As the fluctuations are thought to result from interference between electrons that propagate coherently along the stable orbits, it thus seems reasonable that this exponential reduction in transmission probability should be reflected directly in the amplitude of the fluctuations. Indeed, similar considerations are known to govern the temperature dependent decay of the Aharonov-Bohm oscillations in disordered rings, which also result from a geometrically defined interference effect and which are found to reveal a similar exponential decay to that shown here (Milliken et al., 1987; Chang et al., 1988; Kurdak et al., 1992). The exponential temperature scaling of the fluctuations is also confirmed by numerical simulations (Akis et al., 1996b), in which dephasing is accounted for phenomenologically by the introduction of an imaginary
[Figure 22 plot (left panel); horizontal axis: TEMPERATURE (KELVIN)]
FIGURE 22. Left: The root mean square (rms) amplitude of the conductance fluctuations decreases exponentially with increasing temperature in experiment. Right: The experimental temperature variation is thought to reflect a similar exponential quenching of the wavefunction scarring, which is induced as the electron dephasing rate increases. In this figure the computed wavefunction in a 0.3-μm dot is shown for a number of different phase breaking times (τ_φ). (See also Plate 8.)
potential (Wang et al., 1993). These latter studies reveal that the exponential suppression of the fluctuations appears to be associated with a simultaneous disruption of wavefunction scarring due to phase-breaking scattering. This notion is illustrated in Fig. 22, in which we consider the influence of electron dephasing on a representative scar. Note how reducing the phase-breaking time to 500 ps has little effect on the initial scar, which is computed at absolute zero and for no dephasing. On reducing the phase-breaking time to 50 ps, however, the scar is almost completely destroyed, while for a coherent lifetime of just 1 ps only the collimation of incoming electrons may be resolved. As we will show, experimental studies of phase breaking in open dots yield typical lifetimes of the order of a few hundred ps at low temperatures, which decrease by more than an order of magnitude on warming to a degree Kelvin (Bird et al., 1995a; Clarke et al., 1995). In this regard, the temperature-dependent decay of the fluctuations in Fig. 21 appears consistent with a disruption of scarring, similar to that shown in Fig. 22. The wavefunction plots of Fig. 22 reveal an important general property of the scarring, namely that it results from interference of coherent electrons that are trapped in the dot for very long time scales. In particular, when viewed in order of increasing phase coherence, the series of plots in Fig. 22 may be considered to show the time dependent growth of scarring, subsequent to the initial injection of electrons into the dot. From this perspective, the important feature to note is that, while the direct transit time across the dot is no more than a few picoseconds, a well-defined scar is only built up once electrons have been coherently trapped for at least a hundred times longer than this! The implication is, therefore, that temperature dependent transport studies may be used to probe the temporal evolution of the scars.
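The time scales in this argument can be checked with a short calculation. The sketch below estimates the ballistic transit time across a 0.3-μm dot and then, assuming a simple exponential survival law exp(-t/τ_φ) for phase coherence, the fraction of electrons still coherent after a hundred transits for the three phase-breaking times considered in Fig. 22. The effective mass (0.067 electron masses), the carrier density (4.5 × 10¹⁵ m⁻², the lower end of the Table 1 range), and the survival law itself are assumptions made for illustration.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant (J s)
M_EFF = 0.067 * 9.1093837e-31   # ASSUMPTION: GaAs effective mass (kg)


def fermi_velocity(n_s):
    """Fermi velocity (m/s) of a spin-degenerate 2D electron gas."""
    k_f = math.sqrt(2.0 * math.pi * n_s)
    return HBAR * k_f / M_EFF


def transit_time(dot_size, n_s):
    """Ballistic time (s) to cross a dot of the given size once."""
    return dot_size / fermi_velocity(n_s)


def coherent_fraction(elapsed, tau_phi):
    """ASSUMED survival law: fraction of electrons still phase coherent."""
    return math.exp(-elapsed / tau_phi)


n_s = 4.5e15                                    # carrier density (1/m^2)
t_transit = transit_time(0.3e-6, n_s)           # single crossing of a 0.3-um dot
t_trapped = 100.0 * t_transit                   # "a hundred times longer"

fractions = {tau_ps: coherent_fraction(t_trapped, tau_ps * 1e-12)
             for tau_ps in (500, 50, 1)}
print(t_transit, fractions)
```

With these inputs the transit time comes out at about a picosecond, and the coherent fraction after a hundred transits is large for τ_φ = 500 ps, roughly a tenth for 50 ps, and negligible for 1 ps, in line with the qualitative trend described for the scars.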
B. Phase Breaking in Open Quantum Dots
An important parameter for characterizing electron interference in open quantum dots is provided by the phase-breaking time (τ_φ), the average time scale over which the quantum mechanical phase of the electrons is preserved. In this section, we describe an experimental technique for determining the phase-breaking time, and consider some of the physical factors that limit its value. The approach that we employ exploits the magnetically induced increase in the average periodicity of the fluctuations, which is observed at high fields as skipping orbits begin to form (Fig. 23) (Bird et al., 1991, 1995a, 1995c, 1995d; Geim et al., 1992; Brown et al., 1993; Ferry et al., 1995). In this regime, fluctuations are thought to arise from interference between different skipping orbits, whose coupling is predominantly generated by
[Figure 23 plot; horizontal axis: MAGNETIC FIELD (TESLA)]
FIGURE 23. Conductance fluctuations measured in a 1-μm split-gate quantum dot at two distinct temperatures (the higher temperature trace has been shifted upwards by 0.75 e²/h for clarity). In both cases, the traces were obtained by subtracting a smoothed polynomial fit from the raw magneto-conductance data. The form of the background did not change significantly over the temperature range shown, and its average resistance was of order 16 kΩ.
scattering in the point contact leads. To compute the characteristic magnetic flux enclosed between these orbits, we consider the area that a single orbit encloses as it skips coherently along the walls of the dot (Fig. 24) (Ferry et al., 1995)

$$ A_\phi \approx N \frac{\pi r_c^2}{2}, \qquad N = \frac{v_F \tau_\phi}{\pi r_c}, \qquad (2) $$

where N is the number of bounces the electron makes before losing phase coherence and $v_F$ is the Fermi velocity. Given this definition, we obtain a simple expression relating the average period of the fluctuations to the magnetic field
$$ B_c(B) = \frac{\phi_0}{A_\phi} = \frac{h}{e A_\phi} = \frac{8 \pi^2 m^*}{h k_F^2 \tau_\phi} B, \qquad (3) $$
where $m^*$ is the effective mass of the electron and we have exploited the relations $v_F = \hbar k_F/m^*$, $r_c = \hbar k_F/eB$ (Eq. (1)) and $\hbar = h/2\pi$; $B_c$ is the correlation field that arises in the fluctuation correlation function (Lee et al., 1987)
MAGNETO-TRANSPORT AS A PROBE OF ELECTRON DYNAMICS
FIGURE 24. Upper figure: schematic diagram illustrating, from left to right, electron trajectories in the weak, intermediate, and strong magnetic fields, respectively. Middle figure: the quantum mechanical probability density, calculated in a 0.8-μm dot at a magnetic field of 1.55 T. For reasons of clarity, only one edge state is shown in this picture. Lower figure: in the skipping orbit regime, fluctuations are thought to result from interference between different orbits, associated with different Landau levels. The characteristic area enclosed between these orbits is estimated by computing that enclosed between a single, average orbit and the walls of the dot.
where the angled brackets indicate an average over a suitable range of magnetic field. According to Eq. (3), when the phase-breaking time is independent of magnetic field, the average period of fluctuation should increase linearly with the applied field. Such behavior is indeed found to be typical of experiment (Fig. 25, inset) (Bird et al., 1995a; Okubo et al., 1997b), and from the slope of the resulting straight-line fit we are able to obtain an estimate for the phase-breaking time. Due to the approximations involved in computing the effective area for interference in the
FIGURE 25. Main figure: Experimentally determined variation of the phase-breaking time with temperature in two different quantum dots (solid circles: 1 μm; open circles: 0.6 μm). The markers on the upper axis indicate the temperature where the thermal smearing ($k_B T$) becomes comparable to the average level spacing ($\Delta$) in the dots, while the dashed lines indicate a 1/T variation. Inset: Experimentally determined variation of the fluctuation correlation field with magnetic field. At intermediate fields, an approximately linear variation may be resolved. (Main-panel horizontal axis: temperature in kelvin; inset horizontal axis: magnetic field in tesla.)
skipping orbit regime, the value of this estimate is only expected to be correct to within a numerical factor of order unity. The technique should nonetheless allow the evolution of the phase-breaking time with external parameters, such as temperature or gate voltage, to be determined accurately.
1. Phase Breaking at Finite Temperatures
In the main panel of Fig. 25, we show the measured variation of the phase-breaking time with temperature in two different quantum dots. At temperatures of order a degree Kelvin, the phase-breaking time varies roughly inversely with temperature in both dots, similar to the behavior
found in an independent study (Clarke et al., 1995). While the origin of this inverse variation remains undetermined, we note that it is reminiscent of that obtained for electron-electron scattering in two-dimensional (2-D) disordered systems (Altshuler et al., 1982; Fukuyama and Abrahams, 1983; Takane, 1998). As the temperature is lowered, however, both dots show a tendency to saturate, which sets in at a higher temperature in the smaller dot. We have suggested that saturation is related to a crossover from 2-D to zero-dimensional phase-breaking behavior, which occurs when the discrete levels of the dot become thermally resolved (Bird et al., 1995a, 1997c). In support of this argument, we note that the transition between the two regimes of phase-breaking behavior appears to occur when the thermal energy becomes comparable to the average level spacing in the dots (the markers on the upper axis of Fig. 25)
where $L^2$ is the effective area of the dot. Further evidence for the influence of dimensionality on phase breaking is provided by the results of nonequilibrium studies, in which the transport properties of the dot are measured in the presence of a superimposed dc source-drain bias (Fig. 26) (Linke et al., 1997a, 1997b). As the magnitude of this voltage is varied, the enhanced probability for electron-electron scattering is expected to quench phase coherence (Yacoby et al., 1991), and experiment is found to reveal a monotonic suppression of the dot resistance (Fig. 26). An analysis of the resulting lineshape variation allows an estimate for the bias dependence of the phase-breaking time to be obtained, the results of which are shown in Fig. 26 (Linke et al., 1997a). As can be seen from this figure, the phase-breaking time deduced in this manner is independent of the voltage bias until the corresponding excitation energy (the electron charge multiplied by the bias voltage) becomes comparable to the average level spacing in the dot. This threshold is therefore tentatively associated with a process in which electrons transition between dot levels once the bias voltage becomes sufficiently high (Bird et al., 1997c; Linke et al., 1997a).
2. Environmental Coupling and Electron Dephasing
In Fig. 27, we study the influence of dot lead opening on phase coherence in three different quantum dots. For relatively low values of dot resistance, the phase-breaking time takes a value of roughly 40 ps in all three dots. As the dot leads are narrowed by increasing the negative voltage applied to their gates, however, a clear increase in $\tau_\phi$ is observed in each dot. This appears to set in at roughly the same resistance in each case (13 kΩ). Further
FIGURE 26. Variation of the phase-breaking time with dc source-drain bias voltage, measured in a 0.6-μm dot at 10 mK. The arrow indicates the crossover voltage ($\Delta/e$). Inset: zero-bias resistance peak, generated in the 0.6-μm dot by sweeping dc bias at zero magnetic field. For a more detailed discussion of such nonequilibrium measurements, we refer the reader to Linke et al. (1997a). (Horizontal axes: dc bias voltage in μV.)
increasing the dot resistance beyond this transition leads to no additional change in phase coherence, and the overall impression is one of a step-like transition (Bird et al., 1998a). We do note from this figure, however, that the value of the phase-breaking time in this high resistance regime shows considerable variations from one dot to another, which do not appear to follow any well-defined scaling with dot size. The influence of environmental coupling on the behavior of mesoscopic devices has been widely considered in the literature. For the open dots of interest here, we expect that the phase-breaking rate should be proportional to the number of available states that electrons may access during transport. In this regard, the variation shown in Fig. 27 suggests that the phase space available for scattering in the dot is significantly reduced when the quantum point contact leads are narrowed. For the origin of this effect, we note from Fig. 4 that reducing the width of the leads is expected to suppress flaring of the incoming electron beam, and so should reduce the excitation of phase space within the dot. While further theoretical studies are required to
FIGURE 27. Variation of the phase-breaking time with lead opening, measured in three different quantum dots. Solid circles: 1-μm dot. Open circles: 0.4-μm dot. Inset: 0.6-μm dot. Lines are intended to guide the eye and additional error bars are omitted for clarity. (Horizontal axis: resistance in kΩ.)
confirm such a mechanism, another independent study has also argued for an increase in phase coherence in small dots when their environmental coupling is reduced (Bøggild et al., 1998).
3. Device-Dependent Variations in Phase Coherence
An intriguing feature of Fig. 27 is the absence of a well-defined scaling of the phase-breaking time with dot size. This finding is confirmed by the results of other studies, which reveal considerable variations in phase coherence in the regime of few-mode coupling. For example, in Fig. 28 we show conductance fluctuations measured in two lithographically identical dots, which were patterned on Hall bars with similar characteristics. While the zero-field resistance was adjusted to be roughly the same in both devices, a striking difference is nonetheless apparent in the amplitude of the resulting
FIGURE 28. Conductance fluctuations in two 1-μm dots, patterned on nominally identical Hall bars ($R$ = 20 kΩ, $n_s = 4.4 \times 10^{15}$ m⁻² and $\mu$ = 40 m²/Vs). The lower curve ($\tau_\phi$ = 30 ps) has been shifted downwards from the upper one ($\tau_\phi$ = 200 ps) by 0.6 e²/h. (Horizontal axis: magnetic field in tesla.)
fluctuations. The difference in amplitude suggests a considerable difference in phase-breaking time between the two dots. This poor correlation of phase coherence to the average properties of the host substrate is further illustrated in Fig. 29, in which we compare the lead opening dependence of the phase-breaking time, measured in three different dots. Note how the two dots fabricated in the lower quality material exhibit no noticeable variation in phase coherence with lead opening, while the higher mobility one shows an order of magnitude change. While a very rough correlation to wafer mobility can be resolved here, it is nonetheless clear that the magnitude of the phase-breaking time in the high resistance regime does not appear to be distributed in simple accordance with the mobilities of the bulk wafers (Bird et al., 1998a; Huibers et al., 1998). The poor correlation of the phase-breaking time to the average properties of their host material suggests a strong sensitivity of phase coherence in these dots to their microscopic disorder configuration. One obvious source of disorder is potential fluctuations in the 2-D electron gas layer, which
FIGURE 29. Variation of the phase-breaking time with lead opening in three 1-μm dots. Solid circles: $n_s = 5 \times 10^{15}$ m⁻² and $\mu$ = 70 m²/Vs; open circles: $n_s = 4.4 \times 10^{15}$ m⁻² and $\mu$ = 40 m²/Vs; open squares: $n_s = 4.1 \times 10^{15}$ m⁻² and $\mu$ = 20 m²/Vs. Lines are intended to guide the eye.
arises due to the discrete distribution of donors in the AlGaAs layer (Nixon and Davies, 1990; Stopa, 1996a). The effect of these fluctuations should be to mix the otherwise discrete states of the dot, so that electrons may populate additional states that would otherwise remain inaccessible in transport (Altland and Gefen, 1995). In other words, while disorder is more normally thought of as giving rise to elastic scattering, by increasing the density of available states into which electrons may scatter, its presence may actually increase the dephasing rate in these dots. The very different behaviors apparent in Fig. 29 may then arise as follows. If the phase-breaking time is limited by scattering from disorder within the dot, varying the lead opening should have little effect on the resulting phase coherent characteristics. In contrast, in the limit of weak disorder, electrons may escape from the dot before their phase is randomized within it. Because the phase of these electrons will therefore be broken in the external reservoirs (or within the quantum point contact leads), it seems quite reasonable that reducing the width of the dot leads should enhance their phase-breaking time. Once
again, however, theoretical studies are required to clarify the influence of disorder on the phase-breaking process within these dots.
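The correlation-field analysis used throughout this section can be sketched numerically. The short Python example below is an illustration only: it assumes that Eq. (3) reads $B_c = 8\pi^2 m^* B/(h k_F^2 \tau_\phi)$, takes a GaAs effective mass of 0.067 free-electron masses and the 2-D relation $k_F = \sqrt{2\pi n_s}$ for a spin-degenerate electron gas, and fits synthetic data with a line through the origin. The function names are hypothetical, and none of the numbers are measured values from this work.

```python
import math

H = 6.62607015e-34        # Planck constant (J s)
M_E = 9.1093837015e-31    # free-electron mass (kg)
M_EFF = 0.067 * M_E       # assumed GaAs effective mass (not stated in the text)


def fermi_wavevector(n_s):
    """k_F (1/m) of a spin-degenerate 2-D electron gas of sheet density n_s (1/m^2)."""
    return math.sqrt(2.0 * math.pi * n_s)


def correlation_field(b, tau_phi, n_s, m_eff=M_EFF):
    """Correlation field (T) at applied field b (T), assuming the form of Eq. (3)."""
    k_f = fermi_wavevector(n_s)
    return 8.0 * math.pi ** 2 * m_eff * b / (H * k_f ** 2 * tau_phi)


def slope_through_origin(fields, corr_fields):
    """Least-squares slope of corr_fields versus fields for a line through the origin."""
    num = sum(b * bc for b, bc in zip(fields, corr_fields))
    den = sum(b * b for b in fields)
    return num / den


def tau_phi_from_slope(slope, n_s, m_eff=M_EFF):
    """Invert the assumed Eq. (3): phase-breaking time (s) from the slope dB_c/dB."""
    k_f = fermi_wavevector(n_s)
    return 8.0 * math.pi ** 2 * m_eff / (H * k_f ** 2 * slope)


if __name__ == "__main__":
    n_s = 4.4e15                     # sheet density (1/m^2), of the order quoted for Fig. 28
    fields = [0.5, 1.0, 1.5, 2.0]    # synthetic applied fields (T)
    # Synthetic correlation fields generated with a hypothetical tau_phi of 100 ps.
    data = [correlation_field(b, 100e-12, n_s) for b in fields]
    slope = slope_through_origin(fields, data)
    print(f"recovered tau_phi = {tau_phi_from_slope(slope, n_s) * 1e12:.1f} ps")
```

Because the effective interference area is itself only an estimate, a value obtained this way should be read as correct to within a numerical factor of order unity; the sketch is more useful for tracking relative changes in the phase-breaking time than for absolute values.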
C. Zero-Field Magneto-Resistance Peak: A Probe of Quantum Chaos?
Weak localization is a well-known correction to the conductance of disordered systems that results from a process known as coherent backscattering (Bergmann, 1984). The origin of this quantum mechanical effect is the finite probability that exists for diffusing electrons to return to their initial point, after randomly scattering within a disordered medium. At sufficiently low temperatures, and in the absence of a magnetic field, constructive interference between these backscattered electrons and their time-reversed counterparts produces an enhancement of the sample resistance. The interference is suppressed by the application of a weak magnetic field, which breaks time-reversal symmetry. This leads to a magneto-resistance peak at zero field. The open dots we study here may also exhibit a zero-field peak in their magneto-resistance (Figs. 19 and 20), which has been argued to result from the ballistic analog of weak localization in which electrons backscatter within the dot through a series of collisions with its confining walls. According to one semiclassical theory, in particular, the lineshape of this peak is expected to depend very sensitively on the nature of the electron dynamics in the dot, with Lorentzian and linear magnetic field dependencies predicted for chaotic and regular scattering, respectively (Baranger, 1993). In order to resolve a zero-field peak in experimental studies of open dots, it is usually necessary to suppress the influence of the surrounding fluctuations that may dominate the magneto-conductance at low temperatures (Fig. 20). Techniques that may be used to achieve this suppression include measuring the response of large arrays of lithographically identical dots (Chang et al., 1994), or averaging magneto-conductance traces obtained in single dots at different gate voltages (Keller et al., 1994; Chan et al., 1995; Huibers et al., 1998).
Alternatively, by raising the measurement temperature to roughly a degree Kelvin, it has been found possible to quench the fluctuations while leaving the central peak resolved (Fig. 20) (Bird et al., 199%). Motivated by these observations, we have suggested an alternative interpretation of this peak, according to which the peak is thought to provide a signature of energy averaging of specific spectral features in these dots (Akis et al., 1998b). As we discuss in greater detail in what follows, simply by varying the range over which this average is taken, we are able to obtain both Lorentzian and linear peak lineshapes in the same dot geometry! While this observation is quite consistent with the results of experiment (Bird et al., 1995b), it clearly contradicts the suggestions of the previous paragraph.
1. Zero-Field Resistance Peaks: A Spectral Interpretation
To emphasize the connection of the zero-field peak to the spectral properties of open dots, in Fig. 30 we show the computed level spectrum of an isolated dot, and the conductance contour that is obtained at absolute zero when two propagating modes are present in its point contact leads. A clear correlation to the underlying level spectrum is once again apparent in this contour, from which it is also clear that sweeping magnetic field at fixed energy will not always yield a zero-field magneto-resistance peak. One way in which such a peak may be observed, however, is by averaging the results of magneto-resistance calculations, performed at a series of different Fermi energies. In order to illustrate the effects of such a uniform energy average, consider the magneto-resistance curve shown in Fig. 30. This represents an average of more than 60 distinct magneto-resistance traces, which are used to construct the contour plot shown in the same figure. After this uniform average is performed, a central peak is found to remain at zero magnetic field, reminiscent of the behavior observed in experiment. In Fig. 31, we show how the lineshape of this peak varies as we increase the energy range over which the average magneto-resistance is computed. While the geometry of the dot itself is held constant here, a transition between Lorentzian and linear lineshapes may nonetheless be resolved (see also Fig. 32). This transition is not restricted to the perfectly square dots we consider here, but is also observed in calculations performed for self-consistently computed profiles (Akis et al., 1998b). Based on these findings, we therefore conclude that the lineshape of the zero-field peak does not provide a reliable indicator of chaos in open dots!
Instead, the observation of this peak is thought to reflect the fact that the transport properties of these dots are dominated by the details of their energy spectra, even at temperatures where specific spectral features may no longer be resolved. The observation of a peak at zero magnetic field, in particular, is thought to reflect the highly degenerate nature of the open dot energy spectrum in this region (Akis et al., 1998b).
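The uniform energy average described above amounts to a simple operation on a set of magneto-resistance traces, each computed at a different Fermi energy. The following Python sketch shows that operation on toy data; it is not the simulation code used to produce Figs. 30 and 31, the function name `energy_average` is hypothetical, and the numbers are arbitrary placeholders chosen only to show that the averaged trace depends on the chosen energy window.

```python
def energy_average(traces, energies, e_min, e_max):
    """Uniformly average the traces whose Fermi energy lies in [e_min, e_max].

    traces   -- list of equal-length lists, one resistance trace per Fermi energy
    energies -- Fermi energy of each trace (same units as e_min and e_max)
    """
    selected = [t for e, t in zip(energies, traces) if e_min <= e <= e_max]
    if not selected:
        raise ValueError("no traces fall inside the requested energy window")
    n_points = len(selected[0])
    # Average point by point over the selected traces.
    return [sum(t[i] for t in selected) / len(selected) for i in range(n_points)]


if __name__ == "__main__":
    # Toy data: three traces sampled at three field values (arbitrary units).
    energies = [10.0, 10.5, 11.0]
    traces = [[1.0, 3.0, 1.0], [2.0, 2.0, 2.0], [0.0, 4.0, 0.0]]
    print(energy_average(traces, energies, 10.0, 10.5))  # narrow energy window
    print(energy_average(traces, energies, 10.0, 11.0))  # wider energy window
```

Widening or shifting the window changes which spectral features contribute to the average, which is the mechanism invoked here for the change in peak lineshape.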
2. Connection to Previous Experiment
Zero-field peaks, whose lineshape was found to transition between Lorentzian and linear forms as gate voltage was varied, have been reported in studies of single dots with very different geometries (Fig. 32) (Bird et al., 1995b; Taylor, 1997). While this transition has been argued to reflect a crossover from chaotic to regular scattering in the dots (Bird et al., 1995b, 1996b), it now seems more likely that varying gate voltage instead changes the specific dot states that contribute to the associated energy average. This notion is further confirmed by the results of Fig. 33, in which we obtain
FIGURE 30. Upper left: Computed energy levels of an isolated 0.3-μm dot and their evolution with magnetic field and Fermi energy. Upper right: Corresponding conductance contour plot, obtained for the same range of magnetic field and Fermi energy, in an open 0.3-μm dot. Lighter regions correspond to higher conductance. Lower figure: Corresponding magneto-resistance lineshape, obtained by averaging over all curves in the upper right plot. (Lower-panel horizontal axis: magnetic field.)
different peak lineshapes by averaging over different regions of the same dot spectrum. A transition between linear and Lorentzian lineshapes has also been observed in studies of nominally regular dots, on raising the measurement temperature to above a degree Kelvin (Chang et al., 1994; Bird et al., 1995c). While this transition was argued to reflect a thermally induced disruption of regular electron scattering (Chang et al., 1994), we instead suggest it arises as the effective window over which averaging is performed increases at higher temperatures. Indeed, an important question, which we have thus far neglected to consider, concerns our choice of energy window in the simulations. In experiment, the two main sources of energy averaging are thought to be thermal smearing and lifetime broadening ($\hbar/\tau_\phi$). Making suitable estimates for these quantities, we obtain a total broadening of order
FIGURE 31. Left: Computed conductance contour plot obtained for a 0.3-μm, open quantum dot. Right: Corresponding magneto-resistance lineshapes, obtained by averaging over different sized energy windows (indicated on the energy axis).
a few tenths of a millielectron volt at a degree Kelvin (see Fig. 22), a value that is quite consistent with that required to obtain smooth peaks in the numerical simulations.
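The order of magnitude of this broadening can be checked with a few lines of Python. The sketch below is a rough estimate under stated assumptions, not the authors' calculation: thermal smearing is taken as a multiple of $k_B T$ (the `prefactor` argument is an assumption; 3.5 is roughly the full width at half maximum of the derivative of the Fermi function), lifetime broadening is taken as $\hbar/\tau_\phi$, the two are simply added, and the 10 ps phase-breaking time is a hypothetical input rather than a value read from Fig. 25.

```python
K_B = 8.617333262e-5      # Boltzmann constant (eV/K)
HBAR = 6.582119569e-16    # reduced Planck constant (eV s)


def thermal_smearing_mev(temperature_k, prefactor=1.0):
    """Thermal smearing in meV, taken as prefactor * k_B * T."""
    return prefactor * K_B * temperature_k * 1e3


def lifetime_broadening_mev(tau_phi_s):
    """Lifetime broadening hbar / tau_phi in meV."""
    return HBAR / tau_phi_s * 1e3


def total_broadening_mev(temperature_k, tau_phi_s, prefactor=1.0):
    """Sum of the two contributions (simple addition is an assumption)."""
    return (thermal_smearing_mev(temperature_k, prefactor)
            + lifetime_broadening_mev(tau_phi_s))


if __name__ == "__main__":
    t = 1.0          # temperature (K)
    tau = 10e-12     # hypothetical phase-breaking time (s)
    for pre in (1.0, 3.5):
        print(f"prefactor {pre}: {total_broadening_mev(t, tau, pre):.3f} meV")
```

With these illustrative inputs the total lies between roughly 0.15 and 0.37 meV, depending on the prefactor chosen for the thermal term.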
3. Absence of Weak Localization in Open Quantum Dots
In previous studies of open dots, a distinction has been made between the zero-field magneto-resistance peak and the reproducible fluctuations that persist over a wider range of field. In particular, while the fluctuations have been explained in terms of an interference effect involving electron partial waves that connect the source and drain (Jalabert, 1990), the zero-field peak has been viewed as an additive contribution to the conductance, which results from interference between backscattered orbits (Baranger et al., 1993). In this review, on the other hand, we have argued that both of these magneto-conductance features are characteristic of the same density of states, which in turn is determined solely by contributions from backscattered orbits
FIGURE 32. Comparison of the zero-field peak lineshape, obtained numerically (left) and in experiment (right). In theory, the different lineshapes are obtained by averaging over different regions of the quantum dot spectrum, while in experiment they are obtained on changing gate voltage (Bird et al., 1995b). (Horizontal axes: magnetic field in mT.)
FIGURE 33. Different lineshapes are obtained for the zero-field peak by averaging over different regions of the quantum dot spectrum.
(Gutzwiller, 1971, 1990; Berry and Tabor, 1976, 1977; Jalabert et al., 1990; Nakamura, 1993; Casati and Chirikov, 1995; Brack and Bhaduri, 1997). According to our interpretation, the zero-field peak cannot therefore be considered as an additive contribution to the conductance, because the orbits that determine its details are the only orbits involved in transport. Consequently, the only possible interpretation of this peak is that of a probe of the underlying level spectrum, and we emphasize that there is no weak localization in the sense of an additive contribution to the conductance (Akis et al., 1998b).
V. HIGH-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS
The transport properties of quantum dots are dramatically modified at high magnetic fields, where the formation of well-defined Landau levels results in current flow at the Fermi surface being carried by one-dimensional (1-D) edge states (Fig. 34) (Büttiker, 1992). These narrow channels are confined very closely to the walls of the dot and propagate while following equipotential paths, whose guiding center energies may be written as
FIGURE 34. Schematic diagram illustrating the formation of edge states at high magnetic fields. The upper figure shows the location of the edge states relative to the sample boundaries, while the lower figure shows the resulting potential profile (thick curve) and Landau level structure. The dotted line shows the position of the Fermi level.
FIGURE 35. The tunable profile of the quantum dot may be used to selectively trap edge states within it. Scattering between the transmitted and confined edge states is assumed to occur at the positions marked a-d. The R is the area enclosed by the confined edge state while $A_1$ and $A_2$ are the interedge state areas defined by the scattering events at a-b and c-d, respectively.
where $n$ is the Landau level index. An important consequence of Eq. (6) is therefore that quantum dots may be used to selectively trap edge states, by reflecting them from the saddle point barrier that is formed in their point contact leads (Fig. 35) (Glazman and Jonson, 1989; van Wees et al., 1989). In situations where the interaction between the edge states is weak, the resistance of the dot then exhibits a series of perfectly quantized plateaus as gate voltage or magnetic field is varied (von Klitzing et al., 1980; Büttiker, 1992). When scattering between the trapped and transmitted edge states is possible, however, the latter provides an additional route for current flow through the dot and dramatic departures from Hall quantization may occur. In particular, a magnetically induced modulation of tunneling via the trapped edge states is found to give rise to Aharonov-Bohm oscillations and giant backscattering resonances, the properties of which we discuss in what follows.
A. Aharonov-Bohm Magneto-Resistance Oscillations
An important consequence of confining edge states in quantum dots is that their energy becomes quantized into a discrete spectrum, successive states of
which are swept past the Fermi surface each time their enclosed magnetic flux increases by one quantum ($\phi_0 = h/e$) (Sivan and Imry, 1988; Sivan et al., 1989). In situations where the trapped edge states provide a tunneling route for current flow through the dot, its magneto-resistance is then found to exhibit a series of highly periodic oscillations, which may persist over wide ranges of magnetic field (Figs. 5 and 36) (van Wees et al., 1989; Taylor et al., 1992; Sachrajda et al., 1993; Simpson et al., 1993; Bird et al., 1994b). An example of this periodicity is shown in Fig. 36, in which the magnetic field position of successive oscillations is plotted. Equating the average period of oscillation to a corresponding magnetic flux, we typically obtain enclosed edge state areas that agree very closely with self-consistent calculations of the effective size of the dot (Bird et al., 1994b, 1997d). Furthermore, small variations in period observed over larger ranges of magnetic field (Fig. 36) appear consistent with the expected movement of the edge guiding centers, relative to the walls of the dot (van Wees et al., 1989; Marcus et al., 1994; Bird et al., 1994b). The Aharonov-Bohm oscillations are rapidly quenched with increasing temperature and are typically no longer resolved by a degree Kelvin (Fig. 37). While we expect that both thermal averaging and phase breaking should be efficient in reducing the amplitude of the oscillations, the strong sensitivity to temperature that is apparent in Fig. 37 seems inconsistent with the effects of phase breaking (see Fig. 25). We have, therefore, previously suggested that the quenching of these oscillations is related to thermal smearing of the discrete energy levels of the trapped edge states that mediate the tunneling process (Bird et al., 1994b).
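Equating the oscillation period to one flux quantum through the enclosed area is a one-line calculation, sketched below in Python. The 5 mT period is a hypothetical input chosen for illustration, not a value read from Fig. 36, and the helper names are likewise assumptions; the side of an equivalent square is included only as a convenient length scale for comparison with a dot size.

```python
import math

H = 6.62607015e-34       # Planck constant (J s)
E = 1.602176634e-19      # elementary charge (C)
FLUX_QUANTUM = H / E     # phi_0 = h/e (Wb)


def enclosed_area(period_t):
    """Area (m^2) through which the flux changes by h/e over one field period (T)."""
    return FLUX_QUANTUM / period_t


def ab_period(area_m2):
    """Field period (T) of h/e oscillations for an enclosed area (m^2)."""
    return FLUX_QUANTUM / area_m2


def equivalent_square_side(area_m2):
    """Side length (m) of a square with the same area, as a rough size scale."""
    return math.sqrt(area_m2)


if __name__ == "__main__":
    period = 5e-3    # hypothetical oscillation period (T)
    area = enclosed_area(period)
    side = equivalent_square_side(area)
    print(f"area = {area * 1e12:.3f} um^2, equivalent side = {side * 1e6:.3f} um")
```

If an area obtained this way exceeds the lithographic area of the dot, the assumption of a single set of h/e oscillations is suspect.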
1. Aharonov-Bohm Oscillations: Precise Departures from h/e Periodicity
A number of recent studies have shown that the Aharonov-Bohm oscillations in small dots may exhibit the phenomenon of frequency-doubling, which appears to result from the generation of two separate sets of h/e oscillations by opposite spin-branches of the same Landau level (Fig. 38) (Sachrajda et al., 1993; Simpson et al., 1993). The oscillations remain locked in antiphase over wide ranges of magnetic field and it has been speculated that the Coulomb interaction between the edge states plays an important role in maintaining this phase rigidity. From studies of different sized dots, we have found the frequency-doubled oscillations to be a generic feature of micron-sized devices (Bird et al., 1997d). In this regard, it seems reasonable that these oscillations are indeed associated with some novel charging effect, as has been speculated in the literature (Taylor et al., 1992; Sachrajda et al., 1993; Simpson et al., 1993).
FIGURE 36. The Aharonov-Bohm oscillations observed in the edge state regime exhibit striking periodicity over a wide range of magnetic field. Results presented here were obtained in a 1-μm split-gate dot at a temperature of 0.01 K. (Axes recovered from the plot: magnetic field in tesla; oscillation number.)
FIGURE 37. The Aharonov-Bohm oscillations observed in the edge state regime are rapidly washed out with increasing temperature. Results presented here were obtained in a 1-μm split-gate dot. (Horizontal axis: magnetic field in tesla.)
Period-doubling of the Aharonov-Bohm oscillations has also been reported and while, at present, we do not understand the origin of this effect, its precise nature once again suggests an interpretation in terms of the single particle spectrum of the trapped edge states (Fig. 39) (Bird et al., 1996c, 1997d). In particular, it is well understood that an increasing magnetic field causes the depopulation of successive Landau levels. As doubling of the oscillation period may be thought of as arising from a suppression of every other tunneling event, one idea is that, as a depopulation event is approached, internal redistribution of charge might somehow compete with the tunneling process responsible for the oscillations (Bird et al., 1996c). The difficulty with such a notion, however, is that it would require the competing tunneling processes to operate strictly sequentially. As this seems rather unlikely, further studies are required to determine the origin of the period-doubling.
FIGURE 38. The phenomenon of frequency-doubling of Aharonov-Bohm oscillations, in this case measured in a 0.4-μm split-gate dot at 0.01 K. The oscillations are determined to be frequency-doubled because the assumption that they are single period oscillations results in an effective edge state area that is bigger than the lithographic size of the dot! (Horizontal axis: magnetic field in tesla.)
2. Magneto-Coulomb Oscillations: The Transition to Tunneling Transport
While we have thus far focused on the behavior exhibited by open dots whose leads are initially biased to support one or more modes, a strong magnetic field may be used to induce a transition to tunneling in these structures. This transition is indicated by the magneto-resistance of the dot rising above the last quantum Hall plateau, corresponding to the point where the guiding center of the lowest Landau level drops below the saddle point minimum in the leads of the dot (Fig. 40). In addition to a monotonically increasing background, the magneto-resistance in this tunneling regime is characterized by the observation of a series of periodic oscillations (Bird et al., 1994b, 1994c). The period of these oscillations shows little sensitivity to changes in gate voltage and is found to be more than two orders of magnitude larger than that expected for the Aharonov-Bohm effect. In order
FIGURE 39. Period-doubling of the Aharonov-Bohm oscillations measured in a 1.0-μm dot at 0.01 K. Upper figure: Broad range of magnetic field illustrating the transition between h/e and 2h/e oscillations. Middle figure: Expanded view of the h/e Aharonov-Bohm oscillations. Lower figure: Expanded view of the 2h/e oscillations. (Horizontal axes: magnetic field in tesla.)
FIGURE 40. Observation of magneto-Coulomb oscillations in a 2-μm split-gate dot at 0.01 K. The left-hand figure shows the magneto-resistance of the dot at two different gate voltages. At a critical magnetic field, the quantum point contacts of the dot depopulate and the magneto-resistance rises rapidly, indicating the transition to the tunneling regime. At even higher magnetic fields periodic oscillations are observed in the magneto-resistance. In this figure, the curve that pinches off more rapidly corresponds to a more negative gate voltage. The right figure shows an expanded view of the high-field oscillations, which was obtained by subtracting a monotonic background from the magneto-resistance. (Horizontal axes: magnetic field in tesla.)
to account for this observation, we note that in the tunneling regime the energy of the isolated dot may vary with magnetic field, while the electrochemical potential in the reservoirs remains pinned due to the presence of the attached battery. With all electrons occupying the lowest Landau level, the energy of the single particle states in the dot should vary with magnetic field according to
$$E(B) = \frac{1}{2}\hbar\omega_c = \frac{\hbar e B}{2m^*}. \tag{7}$$
In this regime, increasing magnetic field should cause electrons to tunnel successively out of the dot and, in order to determine the rate at which this depopulation occurs, we note that each tunneling event should be accompanied by a charging energy (Grabert and Devoret, 1991)
$$E_C = \frac{e^2}{C}, \tag{8}$$
where $C$ is the effective capacitance of the dot. In the presence of this charging energy, the magnetic field period between successive depopulation events ($\Delta B$) should therefore be given as
$$E_{N-1}(B + \Delta B) - E_N(B) = E_C, \tag{9}$$
where $E_N(B)$ is the energy of the $N$th single particle state in the dot. In sufficiently large dots, the spacing between successive states is very small and we may therefore write
$$\Delta B = \frac{2m^* e}{\hbar C}. \tag{10}$$
For the quantum dot shown in Fig. 40, we compute a capacitance $C = 0.73 \times 10^{-15}$ F and, substituting this value into Eq. (10), a corresponding depopulation period of 0.25 T, in satisfactory agreement with the oscillations observed in experiment (Figs. 40 and 41). We therefore conclude that the oscillations observed in the tunneling regime result from a Coulomb blockade effect, and so refer to them as magneto-Coulomb oscillations (Beenakker et al., 1991; Bird et al., 1994b, 199%; van der Vaart et al., 1994b).
3. Giant Backscattering Resonances
The remarkable quantization of the Hall resistance, observed in 2-D electron gas systems (von Klitzing et al., 1980), is understood to result from the formation of well-defined edge states, which become pinned within a few magnetic lengths of the sample boundaries at high magnetic fields (Büttiker,
MAGNETO-TRANSPORT AS A PROBE OF ELECTRON DYNAMICS
[Figure 41 plot; axis label: oscillation index.]
FIGURE 41. The periodic nature of the oscillations shown in Fig. 40 is confirmed by a plot of their index as a function of magnetic field.
1988). Edge states located at opposite edges of the sample propagate in opposite directions and this spatial separation of current-carrying states is responsible for a strong suppression of electron backscattering. In particular, at sufficiently high magnetic fields, the edge states propagate as independent, 1-D channels and the quantization of the Hall resistance is simply determined by the number of occupied Landau levels. The situation is very different in quantum dots, however, in which the mesoscopic geometry greatly enhances the interaction between the opposite edge states (Glazman and Jonson, 1989; Kirczenow, 1994). It is therefore expected that adiabatic edge state transport may break down in these structures, as the dot geometry is tuned by means of changes to the applied gate voltage (Stopa et al., 1996; Bird et al., 1997e). In Fig. 42, we show the magneto-resistance of a split-gate dot at a number of different gate voltages. With the gates grounded at the drain potential, no dot is formed and the quantum Hall effect is clearly observed (Fig. 42, expanded section). With a negative bias applied to the gates, however, the
FIGURE 42. The evolution of the magneto-resistance of a 1-μm dot at 0.01 K, as the voltage applied to its split-gates is varied. The upper curves are offset by 10 kΩ and 20 kΩ, respectively, while the lowest curve was obtained with the gates grounded and exhibits well-quantized plateaus (see expanded section).
initial effect is to shift the Hall plateaus to lower magnetic fields, indicative of selective edge state confinement in the dot. In particular, when interedge state scattering can be neglected, the conductance of the dot should be given as (Büttiker, 1988)
where N is the total number of spin-degenerate edge states, R is the number of these trapped in the dot and T is the number transmitted through it. As the negative gate bias is further increased, however, it is expected that interedge state scattering should become more likely and that significant departures from Eq. (11) should thus be observed. This is indeed found to be the case in experiment, where a series of highly resonant peaks emerges in the magneto-resistance as the gates are more strongly biased (Figs. 42 and 43). The peaks imply a resonant backscattering of edge states at certain magnetic fields (R(B)+ l), which careful studies have shown to be correlated to the depopulation of Landau levels within the dot (Bird et al., 1997e). This depopulation is accompanied by a swelling of the remaining edge states in the dot, and we have argued that the backscattering arises in a process in which initially transmitted edge states tunnel into their oppositely propagating counterparts, via trapped edge states in the dot (Fig. 35) (Stopa et al., 1996). While the swelling effect mentioned here is found to be most dramatic for the innermost edge states, the large separation of these from the transmitted edge states ensures that they are ineffective as a backscattering path. Instead, the resonant reflection is thought to be dominated by the outermost trapped edge state, whose width grows steadily with magnetic field but whose separation from the transmitted edge states simultaneously increases. Consequently, the magneto-resistance is found to be strongly peaked in the narrow range of magnetic field close to the depopulation event (Bird et al., 1997e). The amplitude of the resonances is suppressed with increasing temperature, until little evidence for their existence may be resolved at liquid helium temperatures (Fig. 44).
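The edge-state counting described above, in which N edge states reach the dot and only T of them are transmitted, can be illustrated with a short sketch. It assumes the standard four-terminal Büttiker result for a constriction that transmits T of N spin-degenerate edge states, $R_{4t} = (h/2e^2)(1/T - 1/N)$. This may differ in form from the expression the authors intend as Eq. (11), and the function name and example values below are hypothetical.

```python
# Spin-degenerate resistance quantum h / (2 e^2), in ohms.
H_OVER_2E2 = 12906.40372


def dot_resistance(n_total, n_transmitted):
    """Four-terminal resistance (ohms) when T of N edge states are transmitted.

    Assumed form: R = (h / 2e^2) * (1/T - 1/N), valid only while scattering
    between edge states can be neglected.
    """
    if not 0 < n_transmitted <= n_total:
        raise ValueError("require 0 < T <= N")
    return H_OVER_2E2 * (1.0 / n_transmitted - 1.0 / n_total)


# Hypothetical example: three occupied edge states, progressively confined.
N = 3
for t in range(N, 0, -1):
    trapped = N - t
    resistance_kohm = dot_resistance(N, t) / 1e3
    print(f"N={N}, T={t}, R={trapped}: {resistance_kohm:.2f} kOhm")
```

In this picture the resistance vanishes when every edge state is transmitted and rises in discrete steps as states are trapped, which is the baseline against which the resonant peaks are identified as departures.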
Such sensitivity is suggestive of a phase coherent effect and we have therefore modeled edge state transport using a quantum mechanical approach, in which the backscattering probability is computed by considering the phase interference between a number of different edge state areas (Fig. 45) (Kirczenow, 1994). These areas may be computed realistically, by considering the evolution of the dot profile with magnetic field, and by assuming that interedge state scattering arises predominantly in the quantum point contact leads (Stopa et al., 1996). Curvature of the dot profile should be strongest in these regions, allowing for enhanced edge state
[Figure 43 plots; horizontal axes: magnetic field (tesla).]
FIGURE 43. Growth of giant-backscattering resonances with increasing gate bias, measured at 0.01 K in a 0.4-μm (left) and a 1.0-μm dot.
[Figure 44 plot; horizontal axis: magnetic field (tesla).]
FIGURE 44. The giant resonances are also suppressed with increasing temperature. The results shown here were obtained in a 1-μm split-gate dot.
coupling (Glazman and Jonson, 1989), a notion that seems quite consistent with experiment in which the resonances grow as the point contacts are narrowed (Fig. 43). The edge states are also wider in these regions, as can be seen from Fig. 45, in which we plot the convolution of the density of states with the derivative of the Fermi function. In that the strength of the scattering should depend on the local density of states in the destination channel, this figure gives further evidence that the coupling between Landau levels should be strongest in the point contact regions. In Fig. 46, we compare the results of numerical simulations of the edge state transport with an experimental magneto-resistance trace (Bird et al., 1997e). The lower curve is obtained by assuming a fixed dot profile as a function of magnetic field, and it is clear that this is unable to account for the resistance variation seen in experiment. The middle curve, on the other hand, is obtained by computing the self-consistent evolution of the dot profile with magnetic field, and corresponds very closely to the behavior found in experiment. In particular, both curves are seen to be peaked in the
FIGURE 45. Convolution of the density of states with the derivative of the Fermi function in a 1-μm gated dot at 2.9 T (only the two lowest Landau levels are shown). In this figure, only the upper left corner of the dot is shown, in the region near the input quantum point contact (see Fig. 2 for the dot geometry) (Bird et al., 1997e). (See also Plate 9.)
magnetic field range over which the depopulation occurs, and exhibit fine structure with similar field scales. This latter observation is thought to provide support for the origin of these resonances as an interference effect involving different edge state areas (Fig. 35) (Stopa et al., 1996; Bird et al., 1997e).

C. Time-Dependent Magneto-Transport

1. Level Transitions of Artificial Atoms
The analogy of quantum dots to artificial atoms is particularly clear at high magnetic fields, where edge states trapped within the dot may be thought of as analogous to atomic levels (Fig. 47). For transitions to occur between these levels, electrons must tunnel across the incompressible gaps that
[Figure 46 plot; horizontal axis: magnetic field (tesla).]
FIGURE 46. Main figure: Comparison of the magneto-resistance measured in a 1-μm dot at 0.01 K (top), with the results of calculations, which incorporate a magnetic field evolving (middle) and independent (bottom) quantum dot profile (Bird et al., 1997e). Inset: Comparison of the Aharonov-Bohm oscillations, computed numerically and measured in experiment at 0.01 K.
separate the edge states, and at sufficiently high magnetic fields the likelihood of this tunneling should be very small. One phenomenon that may be observed in closed quantum dots at high fields is therefore metastable switching of the conductance between a number of discrete values (Fig. 48) (van der Vaart et al., 1994a; Bird et al., 1994b). The switching is thought to result as single electrons tunnel between different edge states of the dot, while the relatively long time between such events, of order several minutes, is thought to reflect the large edge state separation. Indeed, in another study it was shown that the time between switching events increases at higher
FIGURE 47. The edge state structure in a quantum dot at high magnetic fields suggests an analogy to the level structure of atoms. In this case, the red regions correspond to compressible electron gas and the calculation is performed for the gate geometry shown. (See also Plate 10.)
fields, where the edge state separation is thought to be similarly increased (van der Vaart et al., 1997).
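The metastable switching described above can be pictured with a toy simulation. The sketch below is not the authors' model: it simply assumes that tunneling events between edge states are independent, so that the waiting times between switches are exponentially distributed, and it hops between a set of discrete signal levels. The level values, the mean waiting time, and the function name are all hypothetical.

```python
import random


def simulate_switching(levels, mean_dwell, duration, seed=0):
    """Simulate metastable switching between discrete signal levels.

    levels     -- distinct signal values (arbitrary units), at least two
    mean_dwell -- mean time between switching events (minutes)
    duration   -- total simulated time (minutes)
    Returns a list of (time, value) pairs, one per level change.
    """
    if len(levels) < 2:
        raise ValueError("need at least two levels to switch between")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    state = 0
    t = 0.0
    trace = [(t, levels[state])]
    while True:
        # Independent tunneling events: exponentially distributed waits.
        t += rng.expovariate(1.0 / mean_dwell)
        if t >= duration:
            break
        # Jump to one of the other levels, chosen uniformly at random.
        state = rng.choice([i for i in range(len(levels)) if i != state])
        trace.append((t, levels[state]))
    return trace


# Hypothetical numbers: two levels, a few minutes between events, one hour.
trace = simulate_switching(levels=[400.0, 450.0], mean_dwell=5.0, duration=60.0)
for time_min, value in trace:
    print(f"{time_min:6.2f} min  ->  {value}")
```

Lengthening `mean_dwell` mimics the reported trend of longer intervals between switching events at higher fields, where the edge state separation is thought to increase.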
2. Zero Current Voltage Fluctuations

An interesting phenomenon that is observed in poorer quality devices, whose resistance may exhibit time-dependent drift under fixed gate voltage conditions, is zero-current voltage noise (Ishibashi et al., 1993; Bird et al., 1995e). The origin of this noise, and of the time-dependent drift of the resistance, is thought to be a slow movement of ionization in the AlGaAs donor layer (Stopa, 1996), which in turn should yield a time-dependent perturbation to the confining profile of the dot. An example of the zero-current voltage noise is shown in Fig. 49, in which, with the dot gates grounded and no current flowing, varying the magnetic field has no effect on the measured noise level. With the quantum dot formed, however, a
[Figure 48 plots; horizontal axes: time (minutes) and time (hours).]
FIGURE 48. In the edge state regime, switching noise observed in the magneto-resistance of closed quantum dots is thought to arise from electron tunneling between discrete edge states within the dot. In this regard, the switching noise may be considered as arising from level transitions in an artificial atom! Results shown here were obtained in a 2-μm split-gate dot at a temperature of 0.01 K.
[Figure 49 plot; horizontal axis: magnetic field (tesla).]
FIGURE 49. Zero-current voltage noise, measured in a 1-μm split-gate dot at 0.01 K. In this case, the current leads were disconnected at the top of the cryostat and the voltage across the sample was measured with a lock-in amplifier. The upper curve therefore represents the noise level in the experimental setup.
dramatic enhancement of the noise level is observed for certain ranges of magnetic field, which subsequent measurements reveal to be related to the confinement of edge states in the dot. The voltage noise persists with unaltered characteristics when the magnetic field is held constant, and its amplitude is also found to be unaffected by the application of a measurement current (Fig. 50). The mesoscopic origin of the noise is suggested by temperature-dependent studies, which reveal that it may be quenched on raising the temperature to around 1 K (Fig. 51). As voltage noise is only observed over magnetic field ranges where one or more edge states are trapped in the dot, this suggests that these trapped modes must somehow be able to modify the electrochemical potential of the transmitted edge states. To appreciate how this modification might arise, we first consider the situation in which all occupied edge states are transmitted through the dot. With the contacts of the Hall bar floating, all edge states will fill to the same potential and the measured voltage across the dot will, therefore, be zero (Büttiker, 1988). If we now allow the size of the dot to vary by some small amount, the edge states will move so as to remain
[Figure 50 plot; axis label: CURRENT (PA).]
FIGURE 50. With a measurement current applied, the voltage noise level was found to be constant, while the corresponding resistance fluctuation scales as ΔR = ΔV/I. This indicates conclusively that the origin of the noise is a voltage fluctuation. Measurements shown are from a 1-μm split-gate dot at 0.01 K.
pinned to the walls but, as this process does not change their chemical potential, the voltage measured across the dot will remain fixed at zero. Now consider what happens if we allow the dot size to vary while one or more edge states are trapped in the dot. The transmitted edge states will once more follow the movement of the dot, while the confined ones will remain pinned at its center. This latter property is a consequence of flux quantization, which requires the trapped edge states to enclose a fixed magnetic flux when their electron occupation does not change (Stopa et al., 1994). Consequently, any time-dependent perturbations to the dot geometry should give rise to a variable capacitive coupling between the transmitted and the confined edge states, and it is this effect which is believed to be responsible for the zero-current voltage noise (Fig. 52) (Bird et al., 1995e).
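The argument that the noise is a voltage fluctuation rather than a resistance fluctuation rests on how the measured noise scales with measurement current (Fig. 50). The following sketch illustrates that test under simple assumptions: an intrinsic voltage fluctuation is independent of current, whereas a resistance fluctuation ΔR would produce a noise voltage ΔV = IΔR that grows linearly with current. The current values, noise amplitudes, and function name are hypothetical and are not taken from the measurements.

```python
import math


def loglog_slope(currents, noise_voltages):
    """Least-squares slope of log10(noise voltage) versus log10(current)."""
    xs = [math.log10(i) for i in currents]
    ys = [math.log10(v) for v in noise_voltages]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical measurement currents (amperes).
currents = [1e-11, 1e-10, 1e-9, 1e-8]

# Case 1: an intrinsic voltage fluctuation, independent of current.
voltage_noise = [1e-7 for _ in currents]

# Case 2: a resistance fluctuation of 50 ohms, so that dV = I * dR.
resistance_noise = [50.0 * i for i in currents]

print(loglog_slope(currents, voltage_noise))     # close to 0
print(loglog_slope(currents, resistance_noise))  # close to 1
```

A slope near zero corresponds to the behavior reported in the text, where the apparent resistance fluctuation ΔR = ΔV/I falls inversely with current.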
VI. CONCLUDING DISCUSSION

We have discussed here the use of magneto-transport studies to probe electron dynamics in open quantum dots, which are quasi-zero-dimensional
[Figure 51 plot; horizontal axis: magnetic field (tesla).]
FIGURE 51. The mesoscopic origin of the voltage fluctuations is suggested by the fact that they may be quenched with increasing temperature. Measurements shown here are from a 1-μm split-gate dot.
devices in which electrical current flow is confined on length scales comparable to the size of the electron itself. The transmission properties of these structures are strongly regulated by means of their quantum mechanical lead openings, which inject electrons into the dot in a highly collimated beam. This beam couples favorably to only a small set of states within the dot and, at temperatures where electron phase coherence is maintained over long distances, interference of these states becomes the dominant process in determining the resulting electrical behavior. A powerful tool for probing the interference in experiment is provided by the application of a weak magnetic field, which shifts the phase of the electron wavefunction and sweeps successive dot states past the Fermi surface. The resulting fluctuations in the local density of states are thought to be reflected directly in the magneto-conductance of the dot, which exhibits a series of regular oscillations at low temperatures. These in turn are consistent with the notion that electron transport through these structures is dominated by the selective
FIGURE 52. Proposed model for the origin of the voltage noise, in a dot whose precise confining profile varies with time. A confined edge state at the center of the dot is assumed to contain a fixed charge Q, which couples to the transmitted edge states via time-dependent capacitances C,,(t).
excitation of a small number of discrete dot states. Furthermore, the simple periodicity of these oscillations implies that the proper semiclassical description of electron transport in these structures is one in which a small number of discrete orbits are predominantly excited during transport. The orbits themselves appear highly stable, a characteristic suggested by the results of experiment in which the frequency content of the fluctuations is studied as a function of gate voltage. Previous treatments of electron transport in open quantum dots have started from an assumption of ergodicity, according to which electrons are considered to scatter chaotically from the confining walls of the dot. The quantum mechanical nature of the lead openings is neglected in these treatments, and the discrete quantization within the dot is also assumed to be obscured by lifetime broadening effects. Clearly, such approaches are quite inconsistent with the results presented here, which reveal a highly nonergodic nature to electron transport in these dots. This conclusion is supported by the results of numerical studies, which reveal the wavefunction within the dots to be scarred by the remnants of a small number of semiclassical orbits. The details of this scarring are not independent of magnetic field but instead recur periodically when this parameter is varied,
with a frequency that corresponds very closely to that of the conductance oscillations seen in experiment. The implication is therefore that chaotic scattering is suppressed in these open dots, in which the transport behavior is instead dominated by a small number of regular orbits. These in turn are thought to be stabilized by the role of the quantum point contact leads, and by the discrete quantization within the cavity itself. Support for these notions is provided by the results of calculations performed for isolated dots, in which the collimation effect is absent and in which current flow occurs by tunneling. In such weakly coupled dots, the wavefunction typically exhibits the more uniform sampling of phase space that is expected for chaotic dynamics (Akis et al., 1996a; Stopa, 1998). The conductance fluctuations are exponentially suppressed with increasing temperature and simulations suggest this behavior is correlated to a simultaneous disruption of scarring, which occurs as the phase coherent lifetime of the electrons shortens at higher temperatures. A useful parameter for characterizing this disruption is provided by the phase-breaking time, which may be thought of as the average time scale over which the wave-like nature of the electrons is preserved. In temperature-dependent studies of this parameter, its value is found to saturate at low temperatures and we have suggested that this behavior results from a crossover to zero-dimensional phase breaking, which sets in once the discrete levels of the dot become thermally resolved. In other experiments, the influence of environmental coupling on the phase-breaking behavior has been demonstrated, although the manner in which this coupling modifies phase coherence remains poorly understood. Another unresolved issue here is the origin of large variations in phase coherence, seen from one nominally similar device to another.
In addition to regular oscillations, the weak field magneto-conductance of open quantum dots may also exhibit a zero-field peak, which has previously been argued to result from the ballistic analog of weak localization. We have presented a very different interpretation of this feature here, according to which it is thought to provide a signature of energy averaging in these dots. An interesting conclusion is that there is no weak localization in these quantum dots, at least not in the sense normally implied in disordered systems. In these latter systems, weak localization essentially provides an additive contribution to the conductance, which arises from a set of backscattered orbits whose importance is rapidly quenched in the presence of a magnetic field. In contrast, we have argued that the transport properties of open quantum dots are related directly to the details of their density of states, which in turn is determined solely by contributions from backscattered orbits. When the magnetic field is increased sufficiently, the formation of well-defined Landau levels results in current flow at the Fermi surface being
carried by a finite number of edge states. These may be selectively confined in open quantum dots, in which the mesoscopic geometry also enhances the interaction between the different edge states. A striking observation in this regime is a resonant breakdown of the quantum Hall effect, which is correlated to the depopulation of Landau levels in the dot. According to the results of numerical simulations, in which the self-consistent evolution of the quantum dot profile with magnetic field is properly accounted for, this breakdown results from a sudden increase in backscattering via trapped edge states, whose widths swell significantly as a Landau level depopulates and charge is redistributed within the dot. In this regard, the resonances may be considered as resulting from van Hove-like singularities in the coupling between different Landau levels. In conclusion, we once again emphasize the powerful role that magneto-transport studies may play in the characterization of mesoscopic devices. Recently, much interest has been focused on the potential application of quantum dots in novel technologies such as quantum computing and ultrahigh frequency signal processing. Applications such as these offer the possibility of a genuine paradigm shift in microelectronics, which in turn is expected to derive from many of the fundamental phenomena revealed by the studies presented here.
ACKNOWLEDGMENTS

In the course of the work presented here, the authors have benefited from invaluable interactions with a number of individuals, including: Y. Aoyagi; J. R. Barker; K. F. Berggren; K. M. Connolly; J. Cooper; L. Eaves; C. Ford; T. M. Fromhold; H. L. Grubin; H. Hofmann; N. Holmberg; K. Ishibashi; S. Komiyama; M. Keller; H. Linke; P. Main; C. M. Marcus; A. P. Micolich; K. Nakamura; R. Newbury; Y. Ochiai; Y. Okubo; D. M. Olatona; P. Omling; D. P. Pivin, Jr.; T. Sugano; R. P. Taylor; D. Vasileska; and R. Wirtz. The work presented here was supported in part by The Institute for Physical and Chemical Research, Japan (RIKEN); The Office of Naval Research (ONR); and The Defense Advanced Research Projects Agency (DARPA).
REFERENCES

Akis, R., Ferry, D. K., and Bird, J. P. (1996a). Phys. Rev., B 54: 17705.
Akis, R., Bird, J. P., and Ferry, D. K. (1996b). J. Phys. Condens. Matter, 8: L667.
Akis, R., Ferry, D. K., and Bird, J. P. (1997a). Phys. Rev. Lett., 79: 123.
Akis, R., Ferry, D. K., and Bird, J. P. (1997b). Jpn. J. Appl. Phys., 36: 3981.
Akis, R., Vasileska, D., Ferry, D. K., and Bird, J. P. (1998). Submitted for publication.
Altland, A. and Gefen, Y. (1995). Phys. Rev., B 51: 10671.
Altshuler, B. L., Aronov, A. G., and Khmelnitsky, D. E. (1982). J. Phys., C 15: 7367.
Ashoori, R. C., Stormer, H. L., Weiner, J. S., Pfeiffer, L. N., Pearton, S. J., Baldwin, K. W., and West, K. W. (1992). Phys. Rev. Lett., 68: 3088.
Baranger, H. U., Jalabert, R. A., and Stone, A. D. (1993). Phys. Rev. Lett., 70: 3876.
Beenakker, C. W. J. and van Houten, H. (1988). Phys. Rev., B 37: 6544.
Beenakker, C. W. J. (1990). Phys. Rev. Lett., 64: 216.
Beenakker, C. W. J., van Houten, H., and Staring, A. A. M. (1991). Phys. Rev., B 44: 1657.
Berggren, K. F., Ji, Z. L., and Lundberg, T. (1996). Phys. Rev., B 54: 11612.
Bergmann, G. (1983). Phys. Rev., B 28: 2914.
Bergmann, G. (1984). Phys. Rep., 107: 1.
Berry, M. V. and Tabor, M. (1976). Proc. Roy. Soc. (London), A 349: 101.
Berry, M. V. and Tabor, M. (1977). J. Phys., A 10: 371.
Berry, M. V. (1984). In The Wave-Particle Dualism, S. Diner et al., eds., Dordrecht: Reidel.
Berry, M. J., Katine, J. A., Marcus, C. M., Westervelt, R. M., and Gossard, A. C. (1994a). Surf. Sci., 305: 495.
Berry, M. J., Baskey, J. H., Westervelt, R. M., and Gossard, A. C. (1994b). Phys. Rev., B 50: 8857.
Bird, J. P., Grassie, A. D. C., Lakrimi, M., Hutchings, K. M., Harris, J. J., and Foxon, C. T. (1990). J. Phys. Condens. Matter, 2: 7847.
Bird, J. P., Grassie, A. D. C., Lakrimi, M., Hutchings, K. M., Meeson, P., Harris, J. J., and Foxon, C. T. (1991). J. Phys. Condens. Matter, 3: 2897.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1994a). Phys. Rev., B 50: 18678.
Bird, J. P., Ishibashi, K., Stopa, M., Aoyagi, Y., and Sugano, T. (1994b). Phys. Rev., B 50: 14983.
Bird, J. P., Ishibashi, K., Stopa, M., Taylor, R. P., Aoyagi, Y., and Sugano, T. (1994c). Phys. Rev., B 49: 11488.
Bird, J. P., Ishibashi, K., Ferry, D. K., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1995a). Phys. Rev., B 51: R18037.
Bird, J. P., Olatona, D. M., Newbury, R., Taylor, R. P., Ishibashi, K., Stopa, M., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1995b). Phys. Rev., B 52: R14336.
Bird, J. P., Ishibashi, K., Ferry, D. K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1995c). Phys. Rev., B 52: 8295.
Bird, J. P., Ishibashi, K., Ochiai, Y., Lakrimi, M., Grassie, A. D. C., Hutchings, K. M., Aoyagi, Y., and Sugano, T. (1995d). Phys. Rev., B 52: 1793.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1995e). J. Phys. Soc. Jpn., 10: 3618.
Bird, J. P., Ferry, D. K., Akis, R., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1996a). Europhys. Lett., 35: 529.
Bird, J. P., Ferry, D. K., Edwards, G., Olatona, D. M., Newbury, R., Taylor, R. P., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1996b). Physica, B 227: 148.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1996c). Phys. Rev., B 53: 3642.
Bird, J. P., Akis, R., Ferry, D. K., Pivin, Jr., D. M., Connolly, K. M., Taylor, R. P., Newbury, R., Olatona, D. M., Ochiai, Y., Okubo, Y., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997a). Chaos, Solitons and Fractals, 8: 1299.
Bird, J. P., Akis, R., Ferry, D. K., Aoyagi, Y., and Sugano, T. (1997b). J. Phys. Condens. Matter, 9: 5935.
Bird, J. P., Linke, H., Cooper, J., Micolich, A. P., Ferry, D. K., Akis, R., Ochiai, Y., Taylor, R. P., Newbury, R., Omling, P., Aoyagi, Y., and Sugano, T. (1997c). Phys. Stat. Sol. (b), 204: 314.
Bird, J. P., Stopa, M., Taylor, R. P., Newbury, R., Aoyagi, Y., and Stopa, M. (1997d). Superlatt. Microstruct., 22: 57.
Bird, J. P., Stopa, M., Connolly, K., Pivin, Jr., D. M., Ferry, D. K., Aoyagi, Y., and Sugano, T. (1997e). Phys. Rev., B 56: 7477.
Bird, J. P., Micolich, A. P., Linke, H., Ferry, D. K., Akis, R., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1998a). J. Phys. Condens. Matter, 10: L55.
Bird, J. P., Akis, R., Ferry, D. K., Cooper, J., Ishibashi, K., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1998b). Semicond. Sci. Tech., 13: A4.
Bøggild, P., Kristensen, A., Bruus, H., Reimann, S. M., and Lindelof, P. E. (1998). Phys. Rev., B 57: 15408.
Brack, M. and Bhaduri, R. K. (1997). Semiclassical Physics. Reading, MA: Addison-Wesley.
Brown, C. V., Geim, A. K., Foster, T. J., Langerak, C. J. G. M., and Main, P. C. (1993). Phys. Rev., B 47: 10935.
Büttiker, M., Imry, Y., Landauer, R., and Pinhas, S. (1985). Phys. Rev., B 31: 6207.
Büttiker, M. (1988). Phys. Rev., B 33: 3020.
Büttiker, M. (1992). In Semiconductors and Semimetals, Volume 35, M. Reed, ed., pp. 191-277, New York: Academic Press.
Casati, G. and Chirikov, B. (eds.). (1995). Quantum Chaos. Cambridge: Cambridge University Press.
Chan, I. H., Clarke, R. M., Marcus, C. M., Campman, K., and Gossard, A. C. (1995). Phys. Rev. Lett., 74: 3876.
Chang, A. M., Timp, G., Chang, T. Y., Cunningham, J. E., Chelluri, B., Mankiewich, P. M., Behringer, R. E., and Howard, R. E. (1988). Surf. Sci., 196: 46.
Chang, A. M. (1990). Solid State Comm., 74: 871.
Chang, A. M., Baranger, H. U., Pfeiffer, L. N., and West, K. W. (1994). Phys. Rev. Lett., 73: 2111.
Chklovskii, D. B., Shklovskii, B. I., and Glazman, L. I. (1992). Phys. Rev., B 46: 4026.
Clarke, R. M., Chan, I. H., Marcus, C. M., Duruoz, C. I., Harris, Jr., J. S., Campman, K., and Gossard, A. C. (1995). Phys. Rev., B 52: 2656.
Ferry, D. K., Edwards, G., Ochiai, Y., Yamamoto, K., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1995). Jpn. J. Appl. Phys., 34: 4338.
Ferry, D. K., and Goodnick, S. M. (1997). Transport in Nanostructures. Cambridge: Cambridge University Press.
Ferry, D. K., Bird, J. P., Akis, R., Pivin, D. P. Jr., Connolly, K. M., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1997). Jpn. J. Appl. Phys., 36: 3944.
Ferry, D. K., Akis, R., and Bird, J. P. (1998). Superlatt. Microstruct., 23: 611.
Fromhold, T. M., Wilkinson, P. B., Sheard, F. W., Eaves, L., Miao, J., and Edwards, G. (1995). Phys. Rev. Lett., 75: 1142.
Fukuyama, H. and Abraham, E. (1983). Phys. Rev., B 27: 5976.
Geim, A. K., Main, P. C., Beton, P. H., Eaves, L., Beaumont, S. P., and Wilkinson, C. D. W. (1992). Phys. Rev. Lett., 69: 1248.
Glazman, L. I. and Jonson, M. (1989). J. Phys. Condens. Matter, 1: 5547.
Grabert, A. and Devoret, M. H. (eds.). (1991). Single Charge Tunneling. Volume 294, NATO Advanced Study Institute, Series B: Physics, New York: Plenum.
Gutzwiller, M. C. (1971). J. Math. Phys., 12: 343.
Gutzwiller, M. C. (1990). Chaos in Classical and Quantum Mechanics. Berlin: Springer-Verlag.
Holmberg, N., Akis, R., Pivin, Jr., D. P., Bird, J. P., and Ferry, D. K. (1998). Semicond. Sci. Tech., 13: A21.
Huibers, A. G., Switkes, M., Marcus, C. M., Campman, K., and Gossard, A. C. (1998). Phys. Rev. Lett., 81: 1917.
Ishibashi, K., Bird, J. P., Stopa, M., Sugano, T., and Aoyagi, Y. (1993). Jpn. J. Appl. Phys., 32: 6246.
Jalabert, R. A., Baranger, H. U., and Stone, A. D. (1990). Phys. Rev. Lett., 65: 2442.
Keller, M. W., Millo, O., Mittal, A., and Prober, D. E. (1994). Surf. Sci., 305: 501.
Keller, M. W., Mittal, A., Sleight, J. W., Wheeler, R. G., Prober, D. E., Sacks, R. N., and Shtrikmann, H. (1996). Phys. Rev., B 53: R1693.
Kirczenow, G. (1994). Phys. Rev., B 50: 1649.
Kurdak, C., Chang, A. M., Chin, A., and Chang, T. Y. (1992). Phys. Rev., B 46: 6846.
Landauer, R. (1957). IBM J. Res. Develop., 1: 223.
Lee, P. A., Stone, A. D., and Fukuyama, H. (1987). Phys. Rev., B 35: 1039.
Linke, H., Bird, J. P., Cooper, J., Omling, P., Aoyagi, Y., and Sugano, T. (1997a). Phys. Rev., B 56: 14397.
Linke, H., Bird, J. P., Cooper, J., Omling, P., Aoyagi, Y., and Sugano, T. (1997b). Phys. Stat. Sol., 204: 318.
Marcus, C. M., Rimberg, A. J., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1992). Phys. Rev. Lett., 69: 506.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1993a). Chaos, 3: 643.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1993b). Phys. Rev., B 48: 2460.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1994). Surf. Sci., 305: 480.
McEuen, P. L., Foxman, E. B., Meirav, U., Kastner, M. A., Meir, Y., and Wingreen, N. S. (1991). Phys. Rev. Lett., 66: 1926.
Milliken, F. P., Washburn, S., Umbach, C. P., Laibowitz, R. B., and Webb, R. A. (1987). Phys. Rev., B 36: 4465.
Nakamura, K. (1993). Quantum Chaos: A New Paradigm of Non-Linear Dynamics. Cambridge: Cambridge University Press.
Nixon, J. A. and Davies, J. H. (1990). Phys. Rev., B 41: 7929.
Okubo, Y., Ochiai, Y., Vasileska, D., Akis, R., Ferry, D. K., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997a). Phys. Lett., A 236: 120.
Okubo, Y., Bird, J. P., Ochiai, Y., Ferry, D. K., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997b). Phys. Rev., B 54: 1368.
Person, M., Petterson, J., von Sydow, B., Lindelof, P. E., Kristensen, A., and Berggren, K. F. (1995). Phys. Rev., B 52: 8921.
Richter, K., Ullmo, D., and Jalabert, R. A. (1996). Phys. Rev., B 54: R5219.
Sachrajda, A. S., Taylor, R. P., Dharma-Wardana, C., Zawadzki, P., Adams, J. A., and Coleridge, P. T. (1993). Phys. Rev., B 47: 6811.
Simpson, P. J., Mace, D. R., Ford, C. J. B., Zailer, I., Pepper, M., Ritchie, D. A., Frost, J. E. F., Grimshaw, M. P., and Jones, G. A. C. (1993). Appl. Phys. Lett., 63: 3191.
Sivan, U. and Imry, Y. (1988). Phys. Rev. Lett., 61: 1001.
Sivan, U., Imry, Y., and Hartzstein, C. (1989). Phys. Rev., B 39: 1242.
Stopa, M. (1996). Phys. Rev., B 53: 9595.
Stopa, M. (1998). Semicond. Sci. Technol., 13: A55.
Stopa, M., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1994). Superlatt. Microstruct., 15: 99.
Stopa, M., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1996). Phys. Rev. Lett., 76: 2145.
Takane, Y. (1998). J. Phys. Soc. Jpn., 67: 3003.
Tarucha, S., Austing, D. G., Honda, T., van der Hage, R. J., and Kouwenhoven, L. P. (1996). Phys. Rev. Lett., 71: 3613.
Taylor, R. P., Sachrajda, A. S., Zawadzki, P., Coleridge, P. T., and Adams, J. A. (1992). Phys. Rev. Lett., 69: 1989.
Taylor, R. P., Newbury, R., Sachrajda, A. S., Feng, Y., Coleridge, P. T., Dettmann, C., Zhu, N., Guo, H., Delage, A., Kelly, P. J., and Wasilewski, Z. (1997). Phys. Rev. Lett., 78: 1952.
Thornton, T. J., Pepper, M., Ahmed, H., Andrews, D., and Davies, G. J. (1986). Phys. Rev. Lett., 56: 1198.
van der Vaart, N. C., de Ruyter van Steveninck, M. P., Kouwenhoven, L. P., Johnson, A. T., Nazarov, Y. V., Harmans, C. J. P. M., and Foxon, C. T. (1994a). Phys. Rev. Lett., 73: 320.
van der Vaart, N. C., de Ruyter van Steveninck, M. P., Harmans, C. J. P. M., and Foxon, C. T. (1994b). Physica B, 194-6: 1251.
van der Vaart, N. C., Godijn, S. F., Nazarov, Y. V., Harmans, C. J. P. M., Mooij, J. E., Molenkamp, L. W., and Foxon, C. T. (1995). Phys. Rev. Lett., 74: 4702.
van der Vaart, N. C., Kouwenhoven, L. P., de Ruyter van Steveninck, M. P., Nazarov, Y. V., Harmans, C. J. P. M., and Foxon, C. T. (1997). Phys. Rev., B 55: 9746.
van Wees, B. J., van Houten, H., Beenakker, C. W. J., Williamson, J. G., Kouwenhoven, L. P., van der Marel, D., and Foxon, C. T. (1989). Phys. Rev. Lett., 60: 848.
van Wees, B. J., Kouwenhoven, L. P., Harmans, C. J. P. M., Williamson, J. G., Timmering, C. E., Broekaart, M. E. I., Foxon, C. T., and Harris, J. J. (1989). Phys. Rev. Lett., 62: 2523.
Vasileska, D., Wybourne, M. N., Goodnick, S. M., and Gunther, A. D. (1998). Semicond. Sci. Tech., 13: A37.
von Klitzing, K., Dorda, G., and Pepper, M. (1980). Phys. Rev. Lett., 45: 494.
Wang, Y., Wang, J., and Guo, H. (1993). Phys. Rev., B 47: 4348.
Waugh, F. R., Berry, M. J., Mar, D. J., Westervelt, R. M., Campman, K. L., and Gossard, A. C. (1995). Phys. Rev. Lett., 75: 705.
Wharam, D. A., Thornton, T. J., Newbury, R., Pepper, M., Ahmed, H., Frost, J. E. F., Hasko, D. G., and Peacock, D. C. (1988a). J. Phys. C, 21: L209.
Wharam, D. A., Pepper, M., Ahmed, H., Frost, J. E. F., Hasko, D. G., Peacock, D. C., Ritchie, D. A., and Jones, G. A. C. (1988b). J. Phys. C, 21: L887.
Yacoby, A., Sivan, U., Umbach, C. P., and Hong, J. M. (1991). Phys. Rev. Lett., 66: 1938.
Yacoby, A., Heiblum, M., Mahalu, D., and Shtrikman, H. (1995). Phys. Rev. Lett., 74: 4047.
Zozoulenko, I. V., Schuster, R., Berggren, K. F., and Ensslin, K. (1997). Phys. Rev., B 55: R10209.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 107
External Optical Feedback Effects in Distributed Feedback Semiconductor Lasers

MOHAMMAD F. ALAM¹ and MOHAMMAD A. KARIM²

¹Electro-Optics Program, University of Dayton, Dayton, Ohio 45469-0245, U.S.
²Department of Electrical Engineering, University of Tennessee, Knoxville, Tennessee 37996-2100, U.S.
I. Introduction
II. Distributed Feedback Laser Fundamentals
    A. Physical Structures for Distributed Feedback Lasers
    B. Distributed Feedback Laser Electromagnetics
    C. Oscillation Condition for a Distributed Feedback Laser
    D. General Characteristics of Distributed Feedback Lasers
III. Experimentally Observed Effects
    A. Intensity Fluctuation and Spectral Characteristics
    B. Linewidth Reduction, Broadening, and Chaos
    C. Noise Generation
    D. Regimes of External Feedback
    E. Other Phenomena
IV. Theories on Optical Feedback
    A. Compound Cavity Model
    B. Coherence Collapse
    C. Bistability Under Optical Feedback
    D. Mode Competition Noise
V. External Optical Feedback Sensitivity
    A. Sensitivity to Threshold Gain and Spectrum
    B. Feedback Sensitivity Based on Mode Competition Theory
VI. Conclusion
References
I. INTRODUCTION

Semiconductor lasers are used as coherent light sources in a number of applications, including optical communication systems, optical data storage systems (compact discs (CDs), digital versatile discs (DVDs), etc.), optical measurement systems, and printing. They are inexpensive, lightweight, and highly efficient, and they can be mass-produced using integrated circuit fabrication technology. They use a forward-biased p-n junction (or diode) to achieve optical gain (Chuang, 1995). The simplest type of semiconductor laser (also called a laser diode) is the Fabry-Perot (FP) type semiconductor laser. A Fabry-Perot laser diode uses the reflectivity of the facets (semiconductor-air interfaces) to provide the optical feedback needed to sustain laser oscillation. Distributed feedback (DFB) semiconductor lasers, on the other hand, use an internal Bragg grating within the laser cavity to provide optical feedback. The grating works as a wavelength-selective device to achieve highly stable, narrow-linewidth laser operation. Distributed feedback lasers are particularly useful in wavelength-division multiplexed (WDM) optical communication systems, where different DFB lasers transmit optical signals at very closely spaced wavelengths. Because of their present and future technological significance, research activity on DFB lasers has increased manyfold in recent years.

A major problem with semiconductor lasers, of both the FP and DFB types, is that they are highly sensitive to laser light that re-enters the laser cavity after being reflected by an external reflector. External optical feedback of the laser light usually makes the operation of a laser diode unstable and generates excessive noise in optical communication systems (Petermann, 1988; Lenstra, 1991a, b; Twu et al., 1992; Park et al., 1998). A variety of optical elements, including lenses and fiber end faces, can be the source of unwanted optical feedback. Rayleigh backscatter from the fiber can be another source. Packaged laser diodes may also receive optical feedback from the external cavity formed between the laser diode chip and a transparent window of the package. Furthermore, in many cases an external cavity integrated with a laser diode is unavoidable in an optoelectronic device or circuit. For these reasons, costly and bulky optical isolators are required in most applications to protect semiconductor lasers from optical feedback-induced noise. However, even with an isolator, very weak optical feedback can still reach a diode laser.
We will discuss in an exploratory fashion the basic mechanisms of noise generation, methods of analyzing external optical feedback performance, and the various factors that contribute to the external optical feedback performance of semiconductor lasers, with special emphasis on DFB lasers. In Section II, we discuss the basic electromagnetic equations for DFB semiconductor lasers. Sections III and IV discuss experimental effects and theoretical investigations, respectively, related to external optical feedback. External optical feedback sensitivity is described in Section V. Section VI summarizes this chapter.

II. DISTRIBUTED FEEDBACK LASER FUNDAMENTALS
In DFB semiconductor lasers, distributed feedback is achieved by periodic longitudinal modulation of either the index of the waveguiding layer, or the net optical gain in the active layer, or both. A DFB laser with only index modulation is called an index-coupled (IC) DFB laser, while a laser with gain modulation is termed a gain-coupled DFB laser. When both index and gain coupling are present, a DFB laser is said to be complex-coupled. In the literature, however, complex-coupled DFB lasers are sometimes termed simply "gain-coupled lasers" (to differentiate them from pure index-coupled lasers) although they contain both index and gain coupling. Consequently, DFB lasers that have only gain coupling and no index coupling are referred to as "pure gain-coupled" lasers.
A. Physical Structures for Distributed Feedback Lasers

It is possible to incorporate a DFB structure into a double-heterostructure (DH) semiconductor laser. In the first DFB semiconductor lasers fabricated, the optical feedback needed for lasing was provided by a corrugated surface between the active layer and an outer p-AlGaAs layer. Fabricating the grating on the active layer created interface recombination centers that substantially increased the threshold current density at high temperatures (Casey and Panish, 1978). Accordingly, it was impossible to operate such lasers around 300 K even at low current densities. This problem was overcome by the separate (optical and carrier) confinement heterostructure (SCH) developed by Aiki et al. (1975). In this structure, the carriers are confined to the active layer, while the optical field is confined to a larger region that includes two additional layers of p-AlGaAs. The grating is made on the p-AlGaAs layer to obtain optical feedback. Because the active layer is separated from the corrugated interface, the threshold current density has been found to be low enough to operate the laser diode at higher temperatures.

Recent advances in fabrication technology have produced pure gain-coupled lasers (Luo et al., 1990, 1991), mixed-coupled lasers (Luo et al., 1991; Li et al., 1992), and loss-coupled lasers (Tsang et al., 1992a; Bourchert et al., 1993).

B. Distributed Feedback Laser Electromagnetics

An analysis of the electromagnetic fields inside a DFB laser begins with Maxwell's equations. First, the wave equation for a simple semiconductor laser cavity is developed, and then this equation is solved taking into account the index and/or gain grating (corrugation) present inside the cavity.

Figure 1 shows the schematic of the model for electromagnetic analysis of a DFB laser with coherent external optical feedback. The amplitude reflectivities of the left and right facets are $r_1$ and $r_2$, and the corresponding power reflectivities are $R_1$ and $R_2$, respectively. It is assumed that the output beam from the right facet of the DFB laser is reflected by an external reflector and that the reflected light re-enters the laser cavity. Here $\Gamma$ is the ratio of the feedback power to the output power at the facet, and $\eta$ is the coupling ratio of the optical feedback into the active region of the laser cavity. Thus, $\eta\Gamma$ is the effective feedback ratio of the laser.

FIGURE 1. Schematic of a DFB semiconductor laser with a phase shift and external feedback. [Label in figure: SURFACE OF ANOTHER DEVICE.]

The model for a DFB laser shown in Fig. 1 consists of two sections with lengths $l_1$ and $l_2$, respectively. The regions $-l_1 < z < 0$ and $0 < z < l_2$ are denoted region 1 and region 2, respectively, for convenience. For either region, the index of refraction $n(z)$ is assumed to vary along the (axial) z-direction as

[Eq. (1) missing in source]
where $\bar{n}$ is the average refractive index over the z-direction and $\Delta n$ is the amplitude of its variation. The corrugation is assumed to form a Bragg grating with a spatial period $\Lambda$. Thus, the Bragg wave number is $\beta_B = \pi/\Lambda$. The initial phase of the corrugation at the plane $z = 0$ is denoted $\phi$, and the specific values of $\phi$ are $\phi_1$ and $\phi_2$ in regions 1 and 2, respectively. A DFB laser with a quarter-wave ($\lambda/4$) phase shift has $\phi_1 = \phi_2 = 0.5\pi$. Because of the gain variation in the gain grating, the susceptibility $\chi(z)$ varies according to

[Eq. (2) missing in source]

where $\bar{\chi}$ is the average susceptibility over the z-direction and $\Delta\chi$ is the amplitude of its variation. $\psi$ denotes the phase difference between the index and the gain grating. The quantities $n^2(z)$ and $\chi(z)$ used in Eqs. (1) and (2) are interrelated by the well-known relationships

$$\mathbf{D} = \epsilon_0 n^2 \mathbf{E} + \mathbf{P} \tag{3}$$

and

$$\mathbf{P} = \epsilon_0 \chi \mathbf{E} \tag{4}$$

where $\mathbf{D}$ is the electric flux density vector, $\mathbf{E}$ is the electric field vector, $\mathbf{P}$ is the electric polarization vector, and $\epsilon_0$ is the permittivity of free space. Using Eqs. (1)-(4) in Maxwell's equations, we obtain the wave equation for a DFB semiconductor laser:

$$\nabla^2 \mathbf{E} - \mu\sigma \frac{\partial \mathbf{E}}{\partial t} - \mu\epsilon_0 n^2 \frac{\partial^2 \mathbf{E}}{\partial t^2} - \mu\epsilon_0 \chi \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \tag{5}$$
where $\mu$ is the magnetic permeability and $\sigma$ is the conductivity of the material. Let $N$ denote a mode corresponding to either an internal or an external cavity mode. To solve Eq. (5), we may assume a solution of the form

$$\mathbf{E} = \sum_N \left[ A_N(z) F_N(x, y) e^{j\omega_N t - j\beta_B z} + B_N(z) F_N(x, y) e^{j\omega_N t + j\beta_B z} + \text{c.c.} \right] \tag{6}$$

where $A_N$ and $B_N$ are the amplitudes of the forward and backward components of mode $N$, respectively, and $F_N(x, y)$ is the normalized transverse field distribution of mode $N$. The propagation constant of mode $N$, denoted by $\beta_N$, satisfies the wave equation for a DFB laser in an unperturbed medium without any loss, corrugation, or external feedback, given by

$$\left( \nabla^2 + \mu\epsilon_0 \bar{n}^2 \omega_N^2 \right) F_N(x, y) e^{-j\beta_N z} = 0. \tag{7}$$

The angular lasing frequency of mode $N$ is given by

$$\omega_N = 2\pi f_N = \frac{\beta_N}{\bar{n}\sqrt{\mu\epsilon_0}}. \tag{8}$$

Substituting $\mathbf{E}$ from Eq. (6) into Eq. (5), and replacing $n^2$ and $\chi$ in Eq. (5) using Eqs. (1) and (2), respectively, we arrive at an equation involving space dependence in the x, y, and z directions as well as time dependence. In that equation, the field amplitudes vary only slowly in the z-direction, so the second-order derivatives with respect to z may be neglected. We also take a time average over a period $\Delta T = 2\pi/\omega_N$ and integrate with respect to x and y from $-\infty$ to $+\infty$ to obtain a transverse spatial average. In addition, we choose $F_N(x, y)$ such that $\iint_{-\infty}^{\infty} |F_N(x, y)|^2 \, dx \, dy = 1$. With the rotating wave approximation (RWA), some other terms also disappear. After these approximations, we arrive at a pair of coupled wave equations (Kogelnik and Shank, 1972):

[Eqs. (9) and (10) missing in source]

where $\delta\beta_N$ is the deviation of the wave number of mode $N$ from the Bragg wave number, and $g$ and $\alpha$ are the gain and loss coefficients for the traveling wave, given by

[equations missing in source]

and $\kappa_i$ and $\kappa_g$ are the coupling coefficients for index and gain coupling, respectively. The quantities $\delta\beta_N$, $\kappa_i$, and $\kappa_g$ are given by

[equations missing in source]

Thus, the angular lasing frequency can be written as

[equation missing in source]
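As a numerical illustration of Eq. (8) and of the Bragg wave number $\beta_B = \pi/\Lambda$, the following Python sketch evaluates the angular frequency of a mode from its wave-number deviation. It is a minimal sketch under stated assumptions, not the authors' procedure: the medium is taken to be non-magnetic ($\mu = \mu_0$, so that $1/\sqrt{\mu\epsilon_0}$ equals the vacuum speed of light), $\beta_N$ is written as $\beta_B + \delta\beta_N$ (which follows from the definition of $\delta\beta_N$), and the grating period and average index are hypothetical values chosen only for illustration.

```python
import math

C0 = 299_792_458.0  # vacuum speed of light in m/s, i.e. 1/sqrt(mu0 * eps0)


def bragg_wave_number(period_m):
    """Bragg wave number beta_B = pi / Lambda for a grating of period Lambda."""
    return math.pi / period_m


def mode_angular_frequency(beta_b, delta_beta, n_avg):
    """Eq. (8) with beta_N = beta_B + delta_beta_N, assuming mu = mu0."""
    return (beta_b + delta_beta) * C0 / n_avg


# Hypothetical values, for illustration only.
period = 240e-9  # grating period Lambda in metres
n_avg = 3.2      # average refractive index

beta_b = bragg_wave_number(period)
omega_b = mode_angular_frequency(beta_b, 0.0, n_avg)

# Free-space wavelength of a mode sitting exactly at the Bragg condition.
lambda_b = 2 * math.pi * C0 / omega_b
print(f"Bragg wavelength: {lambda_b * 1e9:.1f} nm")
```

With these assumed values the free-space wavelength at the Bragg condition reduces to $2\bar{n}\Lambda$, about 1536 nm; a positive $\delta\beta_N$ raises the mode frequency above the Bragg frequency.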
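By differentiating the coupled wave equations (Eqs. (9) and (10)) with respect to z, we obtain

$$\frac{d^2 A_N(z)}{dz^2} = \left[ \left( \frac{g - \alpha}{2} - j\delta\beta_N \right)^2 + \left( \kappa_i + j\kappa_g e^{j\psi} \right)\left( \kappa_i + j\kappa_g e^{-j\psi} \right) \right] A_N(z) \tag{16}$$

$$\frac{d^2 B_N(z)}{dz^2} = \left[ \left( \frac{g - \alpha}{2} - j\delta\beta_N \right)^2 + \left( \kappa_i + j\kappa_g e^{j\psi} \right)\left( \kappa_i + j\kappa_g e^{-j\psi} \right) \right] B_N(z) \tag{17}$$

Equations (16) and (17) can be written as

$$\frac{d^2 A_N(z)}{dz^2} = \gamma_N^2 A_N(z) \tag{18}$$

$$\frac{d^2 B_N(z)}{dz^2} = \gamma_N^2 B_N(z) \tag{19}$$

provided

$$\gamma_N^2 = \left( \frac{g - \alpha}{2} - j\delta\beta_N \right)^2 + \left( \kappa_i + j\kappa_g e^{j\psi} \right)\left( \kappa_i + j\kappa_g e^{-j\psi} \right). \tag{20}$$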
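The complex propagation constant $\gamma_N$ can be evaluated directly from the bracketed factor of Eqs. (16) and (17), namely $\gamma_N^2 = [(g - \alpha)/2 - j\delta\beta_N]^2 + (\kappa_i + j\kappa_g e^{j\psi})(\kappa_i + j\kappa_g e^{-j\psi})$. The Python sketch below does this under stated assumptions: the function name and the numerical values (taken to be in units of 1/cm) are hypothetical, and the principal branch of the complex square root is chosen arbitrarily, since only $\gamma_N^2$ is fixed by the equations.

```python
import cmath


def gamma_n(net_gain, delta_beta, kappa_i, kappa_g, psi=0.0):
    """Return gamma_N, the principal square root of the bracket in Eqs. (16)-(17).

    net_gain   : g - alpha
    delta_beta : deviation of the mode wave number from the Bragg wave number
    kappa_i    : index coupling coefficient
    kappa_g    : gain coupling coefficient
    psi        : phase difference between the index and gain gratings
    """
    k_plus = kappa_i + 1j * kappa_g * cmath.exp(1j * psi)
    k_minus = kappa_i + 1j * kappa_g * cmath.exp(-1j * psi)
    half_gain = net_gain / 2 - 1j * delta_beta
    return cmath.sqrt(half_gain**2 + k_plus * k_minus)


# Hypothetical pure index-coupled grating with no net gain.
g_center = gamma_n(net_gain=0.0, delta_beta=0.0, kappa_i=50.0, kappa_g=0.0)
g_detuned = gamma_n(net_gain=0.0, delta_beta=80.0, kappa_i=50.0, kappa_g=0.0)
print(g_center, g_detuned)
```

In this lossless, pure index-coupled example, $\gamma_N$ is real at the Bragg condition (the field amplitudes grow or decay exponentially along z) and purely imaginary when $|\delta\beta_N|$ exceeds $\kappa_i$ (the amplitudes oscillate along z).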
The phase angle $\psi$ between the gain and index corrugations is usually 0 (in-phase) or $\pi$ (antiphase). Assuming that $\psi$ is either zero or $\pi$, the contributions of both gain and index coupling can be expressed by a single complex coupling factor $\kappa$, where $\kappa = \kappa_i + j\kappa_g$. The solutions of Eqs. (18)-(19) can be written as

$$A_N(z) = a_1 e^{\gamma_N z} + a_2 e^{-\gamma_N z} \tag{21}$$

$$B_N(z) = b_1 e^{\gamma_N z} + b_2 e^{-\gamma_N z} \tag{22}$$

where $a_{1,2}$ and $b_{1,2}$ are constants to be evaluated from the boundary conditions for each section. The boundary conditions for the two sections of the laser are used to find the solutions $A_{N,1}(z)$ and $B_{N,1}(z)$ in section 1, and $A_{N,2}(z)$ and $B_{N,2}(z)$ in section 2. Because of the external feedback, the modified reflectivity of the right facet becomes (Favre, 1987)

$$r_2' = r_2 + (1 - R_2)\sqrt{\Gamma}\, e^{-j\theta_{ext}} \tag{23}$$

where $\theta_{ext} = 4\pi d_{ext}/\lambda_{ext}$ is the phase delay of the feedback light in the external cavity (with $d_{ext}$ the distance to the external reflector) and $\lambda_{ext} = 2\pi n_r/(\beta_B + \delta\beta_N)$, where $n_r$ is the refractive index. Multiple reflection of the feedback light between the right facet of the laser and the external reflector is neglected, assuming that $\Gamma$ is sufficiently small.
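The effect of the feedback phase on the modified facet reflectivity can be illustrated with a short Python sketch. It assumes the form $r_2' = r_2 + (1 - R_2)\sqrt{\Gamma}\,e^{-j\theta_{ext}}$ with $R_2 = |r_2|^2$ and $\theta_{ext} = 4\pi d_{ext}/\lambda_{ext}$; the function name, the parameter name `feedback_ratio` (standing for the ratio under the square root), and all numerical values are hypothetical and serve only as an example.

```python
import cmath
import math


def effective_reflectivity(r2, feedback_ratio, d_ext, lambda_ext):
    """Modified right-facet amplitude reflectivity under weak external feedback.

    r2             : amplitude reflectivity of the right facet
    feedback_ratio : feedback power ratio under the square root (assumed small)
    d_ext          : distance from the facet to the external reflector
    lambda_ext     : wavelength in the external cavity (same unit as d_ext)
    """
    power_reflectivity = abs(r2) ** 2              # R_2 = |r_2|^2
    theta_ext = 4 * math.pi * d_ext / lambda_ext   # round-trip phase delay
    feedback_term = (1 - power_reflectivity) * math.sqrt(feedback_ratio)
    return r2 + feedback_term * cmath.exp(-1j * theta_ext)


# Hypothetical example: -40 dB feedback, reflector moved by a quarter wavelength.
wavelength = 1.55e-6
in_phase = effective_reflectivity(0.5, 1e-4, 500.0 * wavelength, wavelength)
anti_phase = effective_reflectivity(0.5, 1e-4, 500.25 * wavelength, wavelength)
print(abs(in_phase), abs(anti_phase))
```

Moving the reflector by a quarter of the external wavelength changes $\theta_{ext}$ by $\pi$ and switches the feedback term from adding to $r_2$ to subtracting from it, which shows why the laser response depends on sub-wavelength changes in the reflector position.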
C. Oscillation Condition for a Distributed Feedback Laser

For a DFB laser with a phase shift at the plane $z = 0$, the boundary conditions are given by

$$r_2' A_{N,2}(l_2)\, e^{-j\beta_B l_2} = B_{N,2}(l_2)\, e^{j\beta_B l_2} \tag{24}$$

$$r_1 A_{N,1}(-l_1)\, e^{j\beta_B l_1} = B_{N,1}(-l_1)\, e^{-j\beta_B l_1} \tag{25}$$

$$A_{N,1}(0) = A_{N,2}(0) \tag{26}$$

$$B_{N,1}(0) = B_{N,2}(0). \tag{27}$$

Using the conditions given by Eqs. (24)-(27), we obtain the following condition for oscillation of a phase-shifted complex-coupled DFB semiconductor laser:

[Eq. (28) missing in source]
Equation (28) involves complex quantities. Thus, both its real and imaginary parts must be simultaneously satisfied by each of the allowed lasing modes. Equation (28) was developed assuming that the phase angle $\psi$ between the gain and index corrugations is either zero or $\pi$. For the more general case in which $\psi$ is neither zero nor $\pi$, the contributions of gain and index coupling can no longer be expressed by a single complex coupling factor $\kappa$, and two different complex factors $\kappa_+$ and $\kappa_-$ are needed, defined as follows:

$$\kappa_+ = \kappa_i + j\kappa_g e^{+j\psi} \tag{29}$$

and

$$\kappa_- = \kappa_i + j\kappa_g e^{-j\psi}. \tag{30}$$
For a DFB laser that has no phase shift within the structure but has a dephased grating, the boundary conditions for the coupled wave equations become

$$r_2 A(L/2) = B(L/2) \tag{31}$$

$$r_1 B(-L/2) = A(-L/2). \tag{32}$$

The oscillation condition obtained from the preceding boundary conditions becomes

[Eq. (33) missing in source]

where $\gamma$ is given by Eq. (20). In Eq. (33), $r_2$ must be replaced by $r_2'$ when external optical feedback is applied to the facet with reflectivity $r_2$.

D. General Characteristics of Distributed Feedback Lasers
Each lasing mode of a DFB laser is characterized by its threshold gain $(g - \alpha)$ and its lasing wave vector $\beta$. However, the departure of the lasing wave vector from the Bragg wave vector (denoted by $\delta\beta_N$) is the parameter considered in theory. Thus, in the plane of $(g - \alpha)$ versus $\delta\beta_N$, each mode solution represents a point where both the real and imaginary parts of the oscillation condition (Eq. (28) or Eq. (33)) are satisfied. The mode solutions of a DFB laser reveal a number of modes with different threshold gains and oscillation frequencies around the Bragg frequency. The mode with the lowest threshold gain is the dominant mode (i.e., the main mode). In a correctly designed DFB laser, the threshold gain difference between the dominant mode and the other modes is very large (at least 30 dB). In such lasers, only the dominant mode oscillates and carries almost all of the output power, resulting in single-mode operation with a single narrow output spectrum.

Pure index-coupled (IC) lasers require a phase shift within the laser structure to achieve single-mode operation. Pure or partly gain-coupled (GC) lasers, on the other hand, can achieve single-mode operation without any phase-shifting structure. Theoretically, pure gain-coupled lasers show a large threshold gain difference between the main lasing mode and the side modes (Morthier et al., 1990), higher stability due to the standing wave effect (Kogelnik and Shank, 1972; David et al., 1992), and less spatial hole burning (SHB) (Kapon et al., 1982; David et al., 1991). However, it is difficult to fabricate pure gain gratings because gain variation causes carrier density variation, which in turn causes the index of refraction to vary. Thus, in most gain-coupled lasers, both index and gain coupling are present, and such lasers are characterized by a complex coupling coefficient. Lasers with complex coupling coefficients have been reported to exhibit excellent performance (Tsang et al., 1992b; Li et al., 1992). It has also been reported that partly gain-coupled lasers may show better modulation characteristics than pure GC lasers (Lowery and Novak, 1994; Zhang and Carroll, 1993; Hong et al., 1995). Even a small amount of gain coupling (5% of the index coupling) in complex-coupled DFB lasers can significantly improve the threshold gain difference between the main mode and the side modes (Morthier et al., 1990).

Facet reflectivities play a crucial role in determining the lasing characteristics of a laser. In almost all cases, antireflection (AR) coatings are needed on the facets to eliminate the effect of the uncertainty of the corrugation phase at the facet introduced during fabrication. Pure gain-coupled lasers also have high yield despite the corrugation phase variation at the facets (Nakano et al., 1992). One of the facets in a GC laser can be made highly reflecting (HR) to create an HR-AR gain-coupled laser with a high facet efficiency.
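The selection of the dominant mode described above can be sketched in Python as follows. This is a minimal illustration rather than a solver for the oscillation condition: it assumes that the mode solutions are already available as pairs of $\delta\beta_N$ and threshold gain $(g - \alpha)$, and the normalized values listed are hypothetical.

```python
def dominant_mode(modes):
    """Pick the mode with the lowest threshold gain.

    modes : list of (delta_beta, threshold_gain) pairs, one per mode solution.
    Returns the dominant mode and its threshold gain margin over the
    next-lowest mode (infinite if only one mode is given).
    """
    if not modes:
        raise ValueError("at least one mode solution is required")
    ranked = sorted(modes, key=lambda mode: mode[1])
    margin = ranked[1][1] - ranked[0][1] if len(ranked) > 1 else float("inf")
    return ranked[0], margin


# Hypothetical normalized solutions (delta_beta * L, (g - alpha) * L).
mode_solutions = [(-4.9, 2.6), (0.0, 0.7), (4.9, 2.6)]
main_mode, gain_margin = dominant_mode(mode_solutions)
print(main_mode, gain_margin)
```

A larger margin corresponds to stronger discrimination against the side modes, which is the property that a phase shift or gain coupling is intended to provide.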
III. EXPERIMENTALLY OBSERVED EFFECTS

A number of effects are observed experimentally when a semiconductor laser is subjected to external feedback. When the distance to the external reflector is less than the coherence length of the laser, the feedback is termed coherent. Distant reflectors located farther away than the coherence length of the semiconductor laser, on the other hand, produce incoherent feedback. A laser diode with a fiber as the external cavity typically has an external cavity length greater than 10 cm, whereas a GRIN-rod lens external coupled cavity laser has a cavity length of less than 1 cm. The major effects observed under feedback are line broadening and an increase in the number of oscillating modes due to external cavity modes. These may lead to mode hopping, intensity fluctuation, and the generation of excess noise in optical communication systems. The exact characteristics depend on the feedback level, the laser structure, the laser driving current, the presence or absence of a modulating signal, etc. Some of the observed characteristics include excess noise in the low and high frequency ranges (Broom et al., 1970; Salathe, 1979; Temkin et al., 1986; Fujita et al., 1984; Goldberg et al., 1980; Goldberg et al., 1982; Park et al., 1998), suppression of excess noise with a small modulating signal (Fujita et al., 1984), self-modulation of the optical intensity (Salathe, 1979; Temkin et al., 1986; Fujiwara et al., 1981; Park et al., 1989), a successive subharmonic-oscillation cascade leading to optical chaos (Mukai and Otsuka, 1985), kink-shaped light-current (L-I) characteristics (Temkin et al., 1986; Fujiwara et al., 1981), self mode locking of external cavity modes, coherence collapse of the output intensity (Lenstra et al., 1985; Miles et al., 1980; Temkin et al., 1986), and multiple-pass resonances superimposed on the noise spectrum (Seo et al., 1988).
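The distinction between coherent and incoherent feedback stated above can be written as a simple comparison. The Python sketch below is only an illustration: the function name is hypothetical, the coherence length used in the example is an assumed value, and the case in which the reflector distance equals the coherence length, which the text does not address, is arbitrarily assigned to the incoherent class.

```python
def feedback_type(reflector_distance_m, coherence_length_m):
    """Classify external feedback by comparing reflector distance to coherence length."""
    if reflector_distance_m < 0 or coherence_length_m <= 0:
        raise ValueError("distance must be non-negative and coherence length positive")
    if reflector_distance_m < coherence_length_m:
        return "coherent"
    return "incoherent"  # includes the boundary case, by assumption


# Assumed coherence length of 1 m, for illustration only.
coherence_length = 1.0
print(feedback_type(0.005, coherence_length))  # GRIN-rod cavity, under 1 cm
print(feedback_type(0.20, coherence_length))   # fiber cavity, over 10 cm
print(feedback_type(50.0, coherence_length))   # distant reflector
```

Because the coherence length itself shrinks when the laser line broadens under feedback, the same reflector can move from the coherent to the incoherent class as the feedback level increases.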
A. Intensity Fluctuation and Spectral Characteristics

The effects of external optical feedback on semiconductor lasers have been extensively studied experimentally. These effects include narrowing of the emission spectrum width (Bogatov et al., 1973; Voumard et al., 1977; Kikuchi and Okoshi, 1982), reduction or distortion of the modulated output (Kobayashi, 1976), use of a semiconductor laser as a detector (Mitsuhashi et al., 1976), low-frequency intensity fluctuation in continuous-wave (CW) operated lasers (Risch and Voumard, 1977), etc. On the other hand, from the early days of optical communication, unwanted external feedback in diode lasers has been known to degrade the modulation response and to increase the intensity noise (due to fluctuations in intensity) (Broom et al., 1970; Morikawa et al., 1976; Risch and Voumard, 1977; Ikushima and Maeda, 1978; Chinone et al., 1978; Hirota and Suematsu, 1979).

The properties of a semiconductor laser under optical feedback depend closely on its operating condition. Based on the injection current $J$ of a semiconductor laser relative to its threshold value $J_{th}$, three distinct regions may be defined (Besnard et al., 1993):

(i) $J < J_{th}$: Coherent and additive optical interference effects between the feedback light and the light reflected from the facet are necessary for the laser to work. The compound cavity delivers a stable intensity output. The coherence length is fixed by the external cavity. In this case, the behavior of the system is extraordinarily sensitive to fine adjustments of the external optics, which fix the light distribution on the laser facet.

(ii) $J > J_{th}$: The laser can lase on its own (i.e., lasing can occur without feedback). The optical feedback becomes uncorrelated with the field inside the laser. This yields an unstable, noisy intensity output.

(iii) $J \approx J_{th}$: In the vicinity of the solitary laser threshold, a mixing of, or a hopping between, the coherent and the incoherent behavior yields bursts of noise in the output intensity.
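The three operating regions listed above can be summarized as a small lookup in Python. This is a sketch under stated assumptions: the text gives no numerical width for the region near threshold, so the relative tolerance `rel_tol` is a hypothetical parameter, as are the function name and the example currents.

```python
def injection_regime(j, j_th, rel_tol=0.02):
    """Assign an injection current to one of the three regions relative to threshold.

    j       : injection current
    j_th    : solitary laser threshold current (same unit as j)
    rel_tol : assumed relative half-width of the near-threshold region
    """
    if j_th <= 0:
        raise ValueError("threshold current must be positive")
    if abs(j - j_th) <= rel_tol * j_th:
        return "near threshold"
    return "below threshold" if j < j_th else "above threshold"


# Qualitative output behavior reported for each region.
EXPECTED_OUTPUT = {
    "below threshold": "stable intensity, sensitive to alignment of the external optics",
    "near threshold": "bursts of noise from hopping between coherent and incoherent behavior",
    "above threshold": "unstable, noisy intensity",
}

for current_ma in (40.0, 50.5, 70.0):  # hypothetical currents, threshold at 50 mA
    regime = injection_regime(current_ma, 50.0)
    print(current_ma, regime, "->", EXPECTED_OUTPUT[regime])
```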
There are two components in the increased fluctuation frequency spectrum when the laser current is above threshold. One is the high-frequency component peaked at the frequency $1/\tau$, where $\tau$ is the round-trip time of the light in the external cavity formed by the external reflector and the laser diode facet facing it (Broom et al., 1970; Risch and Voumard, 1977; Ikushima and Maeda, 1978). The other is termed low-frequency fluctuations (LFF), which peak at a frequency reported to be one to two orders of magnitude smaller than $1/\tau$. Lang and Kobayashi (1980) reported extensive experimental observations for CW AlGaAs double heterostructure diode lasers. Their findings demonstrated that external feedback can make the injection laser multistable and cause hysteresis. Typical light-output-versus-current characteristics with and without external optical feedback are shown in Fig. 2. The two curves with external optical feedback suggest the presence of hysteresis effects. The hysteresis has been explained as being caused by the variation of the crystal refractive index as the temperature of the active region changes with current. The jumps in the light-current characteristics are thought to be due to mode switching.
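The frequency scales mentioned above can be estimated with a short Python sketch. It assumes that the round-trip time is $\tau = 2 n_{ext} d/c$ for a reflector at distance $d$ in a medium of index $n_{ext}$ (taken as 1 for air), which is a standard relation rather than one stated in the text; the function names and the 10 cm example are hypothetical.

```python
C0 = 299_792_458.0  # vacuum speed of light in m/s


def round_trip_time(cavity_length_m, n_ext=1.0):
    """Round-trip time in an external cavity of given length and refractive index."""
    return 2.0 * n_ext * cavity_length_m / C0


def fluctuation_frequencies(cavity_length_m, n_ext=1.0):
    """Return the high-frequency peak 1/tau and the reported LFF range.

    The LFF range spans one to two orders of magnitude below 1/tau.
    """
    f_high = 1.0 / round_trip_time(cavity_length_m, n_ext)
    return f_high, (f_high / 100.0, f_high / 10.0)


f_high, (lff_low, lff_high) = fluctuation_frequencies(0.10)  # 10 cm air cavity
print(f"1/tau = {f_high / 1e9:.2f} GHz")
print(f"LFF range = {lff_low / 1e6:.0f} to {lff_high / 1e6:.0f} MHz")
```

For a 10 cm air cavity this places the high-frequency peak near 1.5 GHz and the low-frequency fluctuations in the range of roughly 15 to 150 MHz.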
FIGURE 2. Light-output-versus-current (L-I) characteristics for a CW AlGaAs double heterostructure laser diode with and without external optical feedback. (After Lang and Kobayashi, 1980.) [Horizontal axis: current.]
Mode jumping depends primarily on whether the external cavity length is close to an integral multiple of the wavelength (large output amplitude undulation) or not (small output amplitude undulation). The transient response of the laser diode to a current pulse at different biasing levels was also investigated by Lang and Kobayashi (1980). The results showed that the relaxation oscillation due to the input current pulse was suppressed at dc bias current levels corresponding to the peaks of the undulations of the L-I curve, but was enhanced at bias levels corresponding to the undulation valleys.

Besnard et al. (1993) observed that slight adjustments to the experimental conditions may result in dramatic changes in the L-I curves. Figure 3 shows several typical cases of different L-I characteristics observed under different experimental conditions. Curves (c) and (d) correspond to the situation in which the alignment was optimally set for maximum feedback. Both curves follow an identical path for low injection currents, with coherent feedback effects. However, they may separate into two different branches above the solitary laser (without feedback) threshold. In curve (d), the output intensity remains stable along the entire curve, while in curve (c), there is a noisy region and a reduction in the final output power. This reduced power is attributed to the noise generated as the mean time between successive intensity breakdowns decreases at high bias currents (Henry and Kazarinov, 1986). Both curves (a) and (b) were obtained under poor optical alignment. Curve (a) resembles the curves reported in earlier investigations (e.g., Temkin et al., 1986). Depending on the experimental conditions, a kink may or may not be present in the curve, as shown by the two different branches of curve (a).

FIGURE 3. L-I characteristics of GaAs-AlGaAs channel substrate planar (CSP) double heterostructure laser diodes under different experimental conditions. Curves (a) and (b) are obtained under poor optical alignment. Curves (c) and (d) are obtained when the optical alignment is optimally set for maximum feedback. In (b) and (c), a region of instability is observed when the laser diode current exceeds the solitary laser threshold. (After Besnard et al., 1993.) [Horizontal axis: current.]

Fujiwara et al. (1981) carried out experiments to determine the mechanism of the enhancement of low-frequency fluctuations (LFF) and the relationship between the peak LFF frequency and the basic parameters of a laser diode with an external reflector. They reported L-I characteristics similar to those in Fig. 2 (Lang and Kobayashi, 1980). The LFF peak power was maximum at the undulation valleys and minimum at the undulation peaks. Several experimenters (unpublished References 10-12 in Fujiwara et al., 1981) also found that above a certain current level, the undulation amplitude and the lasing differential efficiency are suddenly reduced. Fujiwara et al. (1981) experimentally established the simple relationship $f_0 = f_R/(\kappa\tau)$, where $f_0$ is the LFF peak frequency, $f_R$ is the intensity fluctuation peak frequency in the absence of external optical feedback, $\kappa$ is the coupling parameter between the laser diode cavity and the external cavity, and $\tau$ is the round-trip time in the external cavity. This simple relationship was found to be in qualitative agreement with predictions based on the compound cavity model (see Section IV-A).
The spectral properties of a semiconductor laser under feedback from a reflector at a distance longer than the coherence length of the laser output were investigated by Cohen et al. (1990). They measured the spectrum and the visibility (absolute value of the field-autocorrelation function) of a laser for different values of feedback. Their analytical model required that the damping rate of the relaxation oscillations change with the amount of feedback. Hamel et al. (1992) measured the visibility of a semiconductor laser for a wider range of feedback levels and compared their results with numerical solutions of the Lang and Kobayashi (1980) equations. Hamel et al. (1992) were able to solve the problem of feedback-dependent damping (Cohen et al., 1990), but they required an unusually large value of the linewidth enhancement factor to fit theory to experiment.

Sigg (1993) studied the CW output power versus current characteristics of semiconductor lasers. In addition to L-I characteristics, he also reported the effect of the external reflector reflectivity on the threshold current of metal-clad-ridge-waveguide (MCRW) type and buried heterostructure (BH) type laser diodes. Figure 4 shows experimentally fitted curves for the normalized threshold current reduction $\Delta I_{th}/I_{th}$ ($I_{th}$ is the solitary laser threshold current and $\Delta I_{th}$ is the change in threshold current due to external optical feedback) of a laser diode as a function of the square root of the reflectivity of the external reflector ($\sqrt{R_{ext}}$). Achtenhagen et al. (1996) reported experimental results on external optical feedback in complex-coupled DFB lasers.

FIGURE 4. Normalized threshold current change ($\Delta I_{th}/I_{th}$) as a function of the square root of the reflectivity of the external mirror ($\sqrt{R_{ext}}$) in InGaAsP-InP laser diodes. (After Sigg, 1993.)

Giles et al. (1994) reported the spectral behavior of a high-power 980-nm InGaAs strained quantum-well laser diode used for pumping erbium-doped fiber amplifiers (EDFAs). They reported that a 2% reflection from an external mirror caused the output spectrum to shift from 970 nm to 1000 nm. Because pump lasers used for EDFAs must meet stringent wavelength requirements, they proposed the use of external narrowband grating reflectors to control the laser emission wavelength.
B. Linewidth Reduction, Broadening, and Chaos

Linewidth reduction has been reported by many experimenters (Patzak et al., 1983; Kikuchi and Okoshi, 1982; Agrawal, 1984) with small amounts of feedback and proper phase matching. However, high levels of optical feedback result in relaxation oscillation and multiple external cavity mode
operation, which in turn results in linewidth enhancement (Miles et al., 1980; Goldberg et al., 1980, 1982; Acket et al., 1984; Osmundsen et al., 1983). Lenstra et al. (1985) experimentally studied the effects of higher levels of external optical feedback in detail. They observed that at high levels of feedback, the output becomes multimodal with a complex line shape that is almost insensitive to the external cavity length. As the amount of feedback increases further, spectral details in the emission line shape tend to disappear until finally a single dramatically broadened line results, with a width of the order of 25 GHz. The coherence length of the laser light was found to collapse from 10 m without feedback to about 10 mm with relatively high feedback. This phenomenon is termed coherence collapse and is observed in both Fabry-Perot and DFB laser diodes. In most cases, coherence collapse is harmful to a semiconductor laser diode. However, it may also be useful in suppressing coherent backscatter and speckle effects, for example in optical disk data storage systems. Miles et al. (1980) and Goldberg et al. (1980, 1982) also reported linewidth broadening of the order of 20 GHz compared to the solitary laser linewidth of about 60 MHz. Li et al. (1993) carried out a detailed study of coherence collapse and found a number of new phenomena that can collectively be described as coherence collapse. These include subharmonic bifurcation (Mukai and Otsuka, 1985), self-pulsation (Park et al., 1990), intermittent behavior (Sacher et al., 1989), and staircase fluctuations involving random power drops followed by step-wise recoveries (Temkin et al., 1986; Henry and Kazarinov, 1986). Li et al.
(1993) demonstrated that coherence collapse can be reached via a period-doubling route (when the relaxation oscillation and external cavity modes or their harmonics were locked together), suggesting the onset of deterministic chaos, although a quasi-periodic route to chaos was observed by other experimenters (e.g., Lenstra et al., 1985). Coherence collapse has also been reported in distributed Bragg reflector (DBR) lasers by Woodward et al. (1990). For applications of laser diodes requiring narrow linewidth operation, coherence collapse places severe demands on the optical isolation of the laser diode. Spurious back reflections on the order of -30 dB of the emitted laser beam from various interfaces (such as optical fibers) are sufficient to drive the laser into coherence collapse. Cho and Umeda (1984) reported that extremely high levels of optical feedback (5-10%) cause the optical field of a diode laser to be in a state of chaos. Dente et al. (1988) and Mørk et al. (1992) further investigated the transition to chaos in semiconductor lasers. Mørk et al. (1992) demonstrated that the coherence-collapsed state is a chaotic attractor and that, with increasing feedback level, the laser undergoes a quasi-periodic route to chaos that may be interrupted by frequency locking. They obtained experimental phase portraits of the output of a laser diode to study the dynamic behavior of the relaxation oscillation. The phase portraits were found to support the concept of a quasi-periodic route to chaos. A typical time-averaged optical spectrum under optical feedback is shown in Fig. 5. In the figure, A represents the central peak of the external cavity mode. There are two relaxation oscillation sidebands B and C (and their symmetrical counterparts) corresponding to two strong peaks in the intensity noise spectrum. By tuning an interferometer to different optical frequencies simultaneously, Mørk et al. (1990) found that there was a high degree of anticorrelation between the power levels of the peaks around modes B and C. This suggests that the laser jumps randomly between modes B and C. At low feedback levels, the average time between jumps is very small; it increases with increasing feedback, attains a maximum, and finally drops rapidly. They argued that the final decrease in jumping time indicates that the system became chaotic. Recently, Lam et al. (1996) studied the chaotic stability of DFB and ridge-waveguide external cavity semiconductor lasers under modulation and concluded that dynamic determinism is present in the broadband chaotic state in these laser diodes. Another interesting phenomenon is the stable operation of semiconductor lasers with very strong external feedback. For example, if an antireflection (AR) coating is used to increase the relative feedback strength further, stable operation has been demonstrated in AlGaAs (Fleming and Mooradian, 1981) and InGaAsP (Wyatt and Devlin, 1983) diode lasers. Similar stability at high feedback has also been reported for InGaAsP laser diodes by Temkin et al. (1986).

FIGURE 5. The time-averaged optical spectrum of a 1.3-μm DFB semiconductor laser under optical feedback (horizontal axis: optical frequency, from -6 to 6 GHz). The central peak at zero frequency (A) represents the solitary laser oscillation frequency (internal cavity mode). The peaks B and C and their symmetrical counterparts are observed under external optical feedback (external cavity modes). The laser randomly jumps between the two external cavity modes B and C. (After Mørk et al., 1992.)
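The coherence lengths and linewidths quoted in this subsection can be related by a back-of-the-envelope conversion. The sketch below assumes a Lorentzian line and the convention $L_c = c/(\pi\,\Delta\nu)$; the text does not state which convention the cited experiments used, so the results are order-of-magnitude estimates only, and the function names are illustrative.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def coherence_length(linewidth_hz: float) -> float:
    """Coherence length (m), assuming a Lorentzian line and L_c = c / (pi * linewidth)."""
    return C_LIGHT / (math.pi * linewidth_hz)


def linewidth_from_coherence_length(length_m: float) -> float:
    """Inverse relation: linewidth (Hz) for a given coherence length (m)."""
    return C_LIGHT / (math.pi * length_m)


if __name__ == "__main__":
    # Values quoted in the text: ~25 GHz collapsed linewidth, ~10 m coherence
    # length without feedback.
    print(f"25 GHz line -> L_c = {coherence_length(25e9) * 1e3:.2f} mm")
    print(f"L_c = 10 m  -> linewidth = {linewidth_from_coherence_length(10.0) / 1e6:.2f} MHz")
```

Under this assumed convention, a 25-GHz line corresponds to a coherence length of a few millimeters, the same order of magnitude as the roughly 10 mm reported for the collapsed state.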
C. Noise Generation

The noise characteristics of semiconductor lasers under optical feedback are extremely important for applications such as optical communications and optical measurement. The generation of noise is the direct effect of a number of interrelated phenomena, such as intensity fluctuation, linewidth broadening, and coherence collapse, as discussed earlier. External feedback results in excess noise generation at frequencies corresponding to integral multiples of the inverse of the external cavity roundtrip time (Broom et al., 1970; Salathe, 1979; Lang and Kobayashi, 1980). For an external cavity length of the order of 1 m, such noise peaks can be observed at gigahertz frequencies and are referred to as high-frequency noise. There is also a low-frequency noise in the < 100 MHz region, the frequency of which is inversely proportional to the length of the external cavity (Hirota and Suematsu, 1979; Fujiwara et al., 1981; Morikawa et al., 1976). Temkin et al. (1986) carried out a number of experiments investigating the role of feedback intensity, bias current, and the external cavity length on several types of index-guided InGaAsP laser diodes. Figure 6 shows the experimental setup for reflection noise measurement by Temkin et al. (1986). Reflection feedback was provided by a flat front-surface mirror mounted at distances between 10 and 150 cm from the laser diode.
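The spacing of the high-frequency noise peaks follows directly from the external cavity roundtrip time. The sketch below is a minimal illustration, assuming a free-space external cavity (refractive index 1, an assumption not stated in the text) so that the roundtrip time is 2L/c; the function names and the frequency window are illustrative.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def roundtrip_time(distance_m: float, n_ext: float = 1.0) -> float:
    """External cavity roundtrip time (s) for a reflector at distance_m.

    n_ext is the refractive index of the external path (assumed 1.0 for air).
    """
    return 2.0 * n_ext * distance_m / C_LIGHT


def noise_peak_frequencies(distance_m, f_min_hz, f_max_hz, n_ext=1.0):
    """Integer multiples of the roundtrip frequency that fall in [f_min_hz, f_max_hz]."""
    f_rt = 1.0 / roundtrip_time(distance_m, n_ext)
    m_lo = max(1, math.ceil(f_min_hz / f_rt))
    m_hi = math.floor(f_max_hz / f_rt)
    return [m * f_rt for m in range(m_lo, m_hi + 1)]


if __name__ == "__main__":
    # Reflector distances spanning the 10-150 cm range quoted in the text.
    for d in (0.10, 1.0, 1.5):
        f_rt = 1.0 / roundtrip_time(d)
        print(f"L = {d:4.2f} m -> roundtrip frequency = {f_rt / 1e6:8.1f} MHz")
```

For a 1-m cavity the roundtrip frequency is about 150 MHz, so the harmonics form a dense comb that extends into the gigahertz range.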
FIGURE 6. Experimental setup for reflection noise measurement of InGaAsP index-guided laser diodes; the labeled components include a spectrometer and an attenuator. (After Temkin et al., 1986.)
Figure 7 shows the L-I characteristics with and without external optical feedback. High feedback results in a decrease in threshold and a kink-shaped L-I curve with an inflection point. In contrast to close-reflector experiments (Fujiwara et al., 1981), periodic undulations in the power output corresponding to longitudinal mode jumps were not observed for distant reflectors even at the highest feedback intensity (Ito and Kimura, 1980). The noise spectrum of the laser studied by Temkin et al. (1986) consisted of a large number of sharp and intense peaks, equally spaced at the external roundtrip frequency. Noise intensity and spectral details were found to depend only on the bias and external cavity conditions and not on the laser structure. The overall spectrum found by Temkin et al. (1986) was very broad, extending from 0.2 to 6 GHz at low bias, with the envelope peak around 3 GHz. Figure 8 shows the optical spectra of an index-guided InGaAsP laser diode near threshold with and without external optical feedback.

FIGURE 7. Light-current characteristics of a 1.55-μm ridge waveguide laser with and without external optical feedback. Expanded view of near-threshold region also shown. (After Temkin et al., 1986.)

FIGURE 8. Longitudinal mode spectra of an index-guided InGaAsP laser diode near threshold: (a) without external optical feedback and (b) with moderate optical feedback. Note that under external feedback, a number of side modes have power comparable to the main mode and each mode has increased linewidth. (After Temkin et al., 1986.)

A change in the bias current causes a change of both high-frequency and low-frequency noise peaks, as shown in Fig. 9. Temkin et al. (1986) observed low-frequency noise in their studies in the 2-60 MHz frequency range. The noise intensity builds up with increasing bias current, and a maximum is observed near the inflection point of the L-I curve in Fig. 7. Above the inflection point, noise intensity rapidly decreases. They also observed that the frequency $f_L$ corresponding to the low-frequency noise varies with the external cavity roundtrip time $\tau$ as $f_L = a/\tau$, where $a \approx 0.08$, while the frequency $f_H$ corresponding to high-frequency noise varies as multiples of $1/\tau$. The same phenomenon was also reported by Morikawa et al. (1976). The noise spectra observed by Temkin et al. (1986) show a flattening effect when external feedback is increased. The individual modes are also downshifted in wavelength while their linewidths are greatly increased. External cavity modes were observed by Temkin et al. (1986) at low feedback levels very close to threshold. At higher feedback and at bias currents above the inflection point in the L-I characteristics in Fig. 7, the external cavity modes broadened very rapidly and could not be resolved. This broadening is similar to the coherence collapse phenomena observed by Lenstra et al. (1985).

FIGURE 9. Bias current dependence of the first harmonic of the high- and low-frequency noise components (horizontal axis: diode current, mA). Both components follow an identically shaped curve, except that the scales are different for these two components. The left scale is for low-frequency noise while the right scale is for high-frequency noise. (After Temkin et al., 1986.)

Schunk and Petermann (1989a) reported measured feedback-induced intensity noise for 1.3-μm DFB laser diodes. They compared their measurements with their theoretical predictions (Schunk and Petermann, 1988) and found that the relative intensity noise (RIN) rises abruptly above a minimum value as the external feedback power ratio is increased from very low values, of the order of -50 dB, toward -20 dB. Figure 10 shows a typical RIN-versus-feedback ratio plot. They also observed that the RIN depended on the index coupling coefficient $\kappa$ of an index-coupled DFB laser, and a change of the coupling strength from $\kappa L = 1.5$ to $\kappa L = 3$ yields an improvement by more than one order of magnitude. RIN measurements have also been reported by Kawai et al. (1995) and Park et al. (1998). Park et al. reported the effects of external optical feedback on the power penalty of commercial DFB laser modules. They suggested that optical isolators for DFB laser modules used in 2.5-Gb/s systems require an isolation ratio of better than 54.5 dB for negligible power penalty induced by external optical feedback.

FIGURE 10. Relative intensity noise as a function of feedback level (feedback ratio from -50 to -20 dB) for a DFB semiconductor laser for two different output powers. Note the abrupt rise in RIN above a certain threshold feedback level. (After Schunk and Petermann, 1989.)
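Feedback levels in this chapter are quoted sometimes in decibels and sometimes as percentages. The helper below is a minimal sketch for moving between the two, assuming every quoted level is a power ratio (10 log10 convention); the function names are illustrative.

```python
import math


def db_to_power_ratio(level_db: float) -> float:
    """Convert a level in dB to a linear power ratio (10*log10 convention)."""
    return 10.0 ** (level_db / 10.0)


def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio to dB; the ratio must be positive."""
    if ratio <= 0.0:
        raise ValueError("power ratio must be positive")
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    # Feedback ratios quoted in the text for the RIN measurements.
    for level in (-50.0, -20.0):
        print(f"{level:6.1f} dB -> fraction of power fed back = {db_to_power_ratio(level):.1e}")
    # Percent reflections quoted earlier in the text (2%, 5%, 10%).
    for fraction in (0.02, 0.05, 0.10):
        print(f"{fraction * 100:4.0f} %  -> {power_ratio_to_db(fraction):6.1f} dB")
```

On this scale the 2% reflection mentioned earlier corresponds to about -17 dB, far above the -50 to -20 dB range over which the RIN is reported to rise.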
D. Regimes of External Feedback

Although many authors have studied external feedback effects within a narrow range of external feedback ratios, some authors have classified the effects of external optical feedback into a number of regimes according to the distinct characteristics observed at different external feedback levels. Tkach and Chraplyvy (1986) measured the effects of feedback on the spectra of 1.5-μm DFB semiconductor lasers for feedback power ratios ranging from -80 dB (very weak feedback) up to -8 dB (very strong feedback). They proposed five regimes of operation depending on the observed effects.

Regime I: In this regime of extremely weak feedback, narrowing or broadening of the emission line is observed, depending on the phase of the feedback. Linewidth change of the order of 30% is observable in this regime.

Regime II: The emission line starts to show splitting that arises from rapid mode hopping. The magnitude of the splitting depends on the strength of the feedback and on the distance to the reflector.

Regime III: The mode hopping is suppressed and the laser operates on a single narrow line. This regime is very narrow, from -45 to -39 dB, and independent of the distance to the reflector.

Regime IV: At around -40 dB, satellite modes, separated from the main mode by the relaxation oscillation frequency, start to appear. These grow as the feedback increases, and the laser line eventually broadens to as much as 50 GHz. This regime corresponds to the coherence-collapsed region. The effects in this regime are independent of both the distance to the reflector and the feedback phase.

Regime V: Extended cavity operation with a narrow linewidth is observed at the highest levels of feedback (usually greater than -10 dB). Typically it is necessary to antireflection (AR) coat the laser facet to reach this regime. In this regime the laser operates as a long cavity laser with a short active region. The laser is relatively insensitive to additional external optical perturbations.

Tkach and Chraplyvy (1986) also measured the feedback level where each transition occurs as a function of the distance to the external reflector; the regions are shown in Fig. 11.

E. Other Phenomena

When the bias current of a semiconductor laser is suddenly changed from a value below threshold to a value above threshold, there is a delay time before the laser switches on. The turn-on time varies statistically, and the mean turn-on time (MTOT) is an important parameter of a laser diode. The standard deviation of the turn-on time is called the turn-on jitter (TOJ). Low TOJ is required for high bit-rate optical communication systems. Experimental results by Simonsen (1993) reveal that relaxation oscillation sidebands due to external optical feedback can be suppressed and linewidth can be strongly reduced for CW operation of a laser diode.
For large external cavity roundtrip times, TOJ increases with feedback under both weak (Langley and Shore, 1992; Langley and Shore, 1993) and moderate (Wu and Chang, 1992) optical feedback. Recent results show that the MTOT and TOJ oscillate periodically with external-cavity roundtrip time of the order of picoseconds under repetitive gain switching (Hernandez-Garcia et al., 1994).

FIGURE 11. Various regions of feedback (I, II, III, IV, and V) when distance to reflector and feedback ratio are varied (horizontal axis: distance to reflector, 10 to 320 cm). (After Tkach and Chraplyvy, 1986.)

Recently, Besnard et al. (1993) carried out an in-depth investigation of reflection-induced behavior of a semiconductor laser with a distant reflector. They reported a number of new effects, including multiple pass resonances at (c/3nL) and (c/4nL) in the noise spectrum, where n is an integer, L is the laser length, and c is the velocity of light. These resonances are observed when the optical system is slightly misaligned. They also observed noise bursts in the output signal that eventually culminated in coherence collapse. Besnard et al. (1993) also observed subharmonic generation when a modulating signal was applied to a laser diode. They further observed switching between the first- and second-order triple-pass resonances of the external cavity. Besnard et al. showed that most of the new effects have their origin in two distinct physical mechanisms: (i) dynamical effects: a locking of the statistically distributed intensity drops by deterministic effects; and (ii) spatial effects: breaking of symmetry of the optical beam that
propagates inside the passive external cavity, brought about by asymmetries of the geometrical configuration of the experimental setup. Mink and Verbeek (1986) observed asymmetry in the output power and in the power spectrum of the light emitted by the two facets of a laser when one of the facets is subjected to external optical feedback. Tomasi et al. (1994) observed an asymmetric pulse shape in the low-frequency fluctuations (LFF) of a semiconductor laser subjected to external optical feedback. Nakano et al. (1991) reported the fabrication of gain-coupled DFB lasers and measured the intensity noise in such lasers. They reported that gain-coupled DFB lasers were less sensitive to external optical feedback in terms of RIN. Kurosaki et al. (1994) reported improvement in external optical feedback sensitivity in quarter-wave-shifted (QWS), index-coupled DFB lasers by moving the phase shift away from the center of the DFB laser towards one of the facets. Recently, Chuang et al. (1996) reported a complex-coupled DFB laser with current blocking grating that showed high resistance to external optical feedback effects (Wang et al., 1997). The measured relative intensity noise of the lasers was as low as -160 dB/Hz even in the presence of an external feedback level of -15 dB. Without using an optical isolator, transmission over 235 km of fiber was demonstrated with a power penalty of only 1.55 dB at a bit error rate (BER) of with this DFB laser. A partially corrugated-waveguide laser diode (PC-LD) was also recently demonstrated to be resistant to high levels of external optical feedback (Huang et al., 1996). Benoist (1996) recently investigated the possibility of optical isolation of a semiconductor laser by means of frequency-shifted feedback produced through acousto-optic interaction and reported better performance of a laser diode with frequency-shifted feedback compared to conventional feedback.
IV. THEORIES ON OPTICAL FEEDBACK
Theoretical models for optical feedback are usually based on the Lang and Kobayashi (1980) rate equations, which have proven to contain all the dominant effects observed experimentally. For weak feedback, regimes I-II, a small-signal analysis has been demonstrated by many authors to give a correct description of linewidth, spectral behavior, and modulation properties (Petermann, 1988). For moderate or strong feedback, regimes II-V, the nonlinearities must be taken into account.
A. Compound Cavity Model
Lang and Kobayashi (1980) analyzed the effect of external optical feedback when the distance to the reflector is smaller than the coherence length, using a compound-cavity model. They demonstrated that external feedback can make an injection laser multistable and cause hysteresis phenomena, which follow the same mechanism as in a nonlinear Fabry-Perot resonator (Szoke et al., 1969). Lang and Kobayashi used the following form of rate equation for the electric field for the compound cavity laser configuration:

$$\frac{d}{dt}\left[E(t)e^{j\Omega t}\right] = \left\{ j\omega_N(n) + \tfrac{1}{2}\left[G(n) - \Gamma_0\right] \right\} E(t)e^{j\Omega t} + \kappa E(t-\tau)e^{j\Omega(t-\tau)}. \qquad (34)$$

Here, $E$ is the electric field, $\Omega$ is the laser oscillation frequency, $\tau$ is the external cavity transit time, and $\omega_N(n)$ is the diode cavity longitudinal mode resonance frequency, which is defined with an integer $N$ as $\omega_N = N\pi c/\eta l_d$, where $\eta$ is the active region refractive index, $c$ is the velocity of light, and $l_d$ is the diode cavity length. $G(n)$ is the gain of the laser medium, and $\Gamma_0$ is the loss of the diode cavity. The last term in Eq. (34) represents the external feedback. The coefficient $\kappa$ is related to cavity parameters as:

$$\kappa = \frac{c\,a}{2\eta l_d} \qquad (35)$$

where parameter $a$, defined with the facet and external mirror reflectivities $R_2$ and $R_3$, respectively, as

$$a = (1 - R_2)\left(R_3/R_2\right)^{1/2}, \qquad (36)$$

is a measure of the coupling strength between the two cavities. Multiple reflections in the external cavity are neglected here. The rate equation for carrier density $n$ is given by

$$\frac{d}{dt}n = -\gamma n - G(n)|E|^2 + P \qquad (37)$$

where $P$ denotes the number injection rate per unit volume of excited carriers, which is related to current density $J$, electronic charge $e$, and diode active layer thickness $d$ as $P = J/ed$, and $\gamma$ is the inverse spontaneous lifetime of the excited carriers. Under steady-state conditions, real and imaginary parts of Eq. (34) are set equal to zero. For small variations of the carrier density $\Delta n$, laser oscillation frequency $\Delta\Omega$, and an external parameter $\Delta x$ expanded around their reference values $(n_r, \Omega_r, x_r)$, solitary diode cavity resonance is expressed as:
where $\eta_r = \eta(n_r, \Omega_r, x_r)$ is the reference value of $\eta$, and $\eta_x = \partial\eta/\partial x$ and likewise for $\eta_n$ and $\eta_\Omega$. Under steady-state conditions, Lang and Kobayashi (1980) obtained the following equation from the preceding equations:

$$\eta_x\,\Delta x = \frac{\eta_{eff}}{\Omega_r\tau}\left\{\beta\left[R\cos(\Delta\Omega\,\tau) - \sin(\Delta\Omega\,\tau) - R\right] - \Delta\Omega\,\tau\right\} \qquad (39)$$

where $\tau$ is the external cavity transit time, $\beta = aL_e/L_d$, $L_e$ = external cavity length, and $L_d = \eta_{eff}\,l_d$ = effective diode cavity length. The factor $R$ depends on cavity parameters and critically affects the lateral transverse mode stability in stripe geometry lasers against spatial hole burning (Lang, 1979; Thompson et al., 1978). The calculated oscillation frequency is a multivalued function of the external parameter $x$, and that is why an external cavity laser is multistable. Lang and Kobayashi found that multistability arises when

$$(1 + R^2)^{1/2}\,aL_e/L_d > 1. \qquad (40)$$

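The link between the steady-state relation and the multistability condition can be checked numerically. The sketch below assumes that Eq. (39) has the normalized form $g(u) = \beta[R\cos u - \sin u - R] - u$ with $u = \Delta\Omega\,\tau$ and $\beta = aL_e/L_d$; this form is an assumption about the equation, and the function names and parameter values are illustrative. Because the slope of $g$ is $\beta[-R\sin u - \cos u] - 1$, the curve stops being monotonic once $\beta(1+R^2)^{1/2} > 1$, which is when several frequencies can satisfy the same external parameter value.

```python
import numpy as np


def steady_state_curve(u, beta, R):
    """Assumed normalized right-hand side of Eq. (39): u = (delta Omega) * tau."""
    return beta * (R * np.cos(u) - np.sin(u) - R) - u


def is_multistable(beta, R):
    """Condition of Eq. (40), written with beta = a * L_e / L_d."""
    return beta * np.sqrt(1.0 + R**2) > 1.0


def count_solutions(y, beta, R, u_max=40.0, n=400_001):
    """Count roots of g(u) = y on [-u_max, u_max] from sign changes on a fine grid."""
    u = np.linspace(-u_max, u_max, n)
    v = steady_state_curve(u, beta, R) - y
    crossings = np.count_nonzero(v[:-1] * v[1:] < 0.0)
    exact_zeros = np.count_nonzero(v == 0.0)
    return int(crossings + exact_zeros)


if __name__ == "__main__":
    # Illustrative values only: R = 3 with weak and strong coupling.
    for beta in (0.1, 2.0):
        print(f"beta = {beta}: multistable = {bool(is_multistable(beta, 3.0))}, "
              f"solutions for y = -1: {count_solutions(-1.0, beta, 3.0)}")
```

The grid-based root count is only a diagnostic; it does not say which of the coexisting solutions are dynamically stable.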
As R is large in semiconductor lasers, multistability is possible when the other parameters are suitably chosen. Under dynamic conditions, the stability of a stationary solution has been examined by Lang and Kobayashi by studying the time development of infinitesimal fluctuations in the field and the carrier density around it. They found that the stationary solutions of Eq. (39) for which F < 0 represent dynamically unstable (DU) states provided,

$$\kappa_c = \kappa\cos(\Delta\Omega\,\tau) \qquad (42)$$

$$\kappa_s = \kappa\sin(\Delta\Omega\,\tau). \qquad (43)$$

Lang and Kobayashi also found regions of gain-spectrumwise unstable (GU) ranges of frequencies by considering frequency dependence of gain. The laser frequency was found to be a multivalued function of the external parameter x and that can explain the multistability of the compound cavity. They also studied the dynamic response of the laser output to a small amplitude current modulation. The amplitude response curve as a function
of frequency was found to be dependent on the parameter $\Delta\Omega\,\tau$. The peak in the response spectrum depends on $\Delta\Omega\,\tau$ and indicates that the relaxation oscillation in the laser output due to external feedback is suppressed when the returned light interferes favorably ($\Delta\Omega\,\tau \approx 0$) with the field in the diode, while it is enhanced and prolonged when $\Delta\Omega\,\tau$ has negative values of an appreciable magnitude. Olesen et al. (1986) and Tromborg et al. (1984) examined the influence of nonlinear dynamics on the linewidth, spectral behavior, and stability properties for a semiconductor laser with an external cavity. Olesen et al. separated the electric field equation given by Lang and Kobayashi (Eq. 34) into two separate equations for amplitude and phase of the electric field. They used the noise-driven rate equations instead of linearized small-signal equations for their simulations. Linewidth increase was found to be connected to an abrupt transition from a coherent to an incoherent state, or in other terms, a transition from a fixed-point attractor to a strange attractor (chaotic state). They determined the stability limits of a laser diode under optical feedback by considering phase condition and gain condition limits and also identified regions of dynamic instability. They showed that the ratio of linewidths with and without optical feedback in the stable region of operation is given by

$$\frac{\Delta\nu}{\Delta\nu_0} = \left[1 + X\sqrt{1+\alpha^2}\,\cos(\omega\tau + \phi)\right]^{-2} \qquad (44)$$

where $\Delta\nu$ is the spectral width with optical feedback, and $\Delta\nu_0$ is the corresponding width without feedback. $X$ is the feedback parameter given by $X = \kappa\tau/\tau_{in}$, where $\tau_{in}$ and $\tau$ are the roundtrip times in the laser cavity and the external cavity, respectively, and $\kappa^2$ is the power reflected from the external cavity relative to the power reflected from the laser mirror. Here $\alpha$ is the linewidth enhancement factor, $\omega$ satisfies the phase condition

$$\omega_0\tau = \omega\tau + X\sqrt{1+\alpha^2}\,\sin(\omega\tau + \phi) \qquad (45)$$

and

$$\phi = \arctan\alpha \qquad (46)$$

where $\omega_0$ is the solitary laser (without feedback) angular oscillation frequency. They introduced the concept of a "coherent feedback level" and showed that a strong increase in linewidth is possible under certain conditions. The three rate equations for field amplitude, phase, and carrier density (Olesen et al., 1986; Mørk et al., 1992) can be solved when the initial conditions for field amplitude, phase, and carrier density are specified. However, due to the roundtrip delay time $\tau$, the amplitude and phase of the
field for $-\tau < t < 0$ also need to be specified. This makes the system infinite-dimensional because time evolution actually depends on the specified values for the amplitude and phase of the field in a continuous time interval. A solution to the three equations describes a trajectory in the three-dimensional (3-D) space $(E_0, \phi, N)$, where the symbols represent the field amplitude, the field phase, and the carrier density, respectively. The trajectory obtained after the transients have died away constitutes the attractor. There are in general several coexisting attractors for a given feedback level. Some typical attractors are the fixed-point solution, the limit cycle (a cyclic closed trajectory), the torus, and the strange (chaotic) attractor. Kikuchi and Lee (1987) analyzed the spectral stability of weakly coupled external cavity semiconductor lasers. They used the same type of equations as were used by Lang and Kobayashi (Eqs. (34) and (37)) and numerically solved the two equations to find the field spectrum by simulation. They concluded that the relation between the external cavity mode spacing and the relaxation frequency of the carrier density $f_R$ determines the spectral stability. For good spectral stability, it is important to satisfy the requirement that the external mode spacing is much larger than the relaxation resonance frequency of the solitary laser. Results from computer simulations suggest that the occurrence of mode hopping (Schunk and Petermann, 1988; Mørk and Tromborg, 1990), low-frequency intensity fluctuations (Mørk et al., 1988), and the onset of coherence collapse (Olesen et al., 1986; Schunk and Petermann, 1988; Schunk and Petermann, 1989b) can be correctly predicted by the Lang-Kobayashi equations. For feedback levels below about -45 dB one may assume that the light intensity is constant, which leaves the phase of the electric field as an independent variable of the system.
This approximation leads to a potential model (Mørk and Tromborg, 1990; Mørk et al., 1990b; Lenstra, 1991), which explains why external cavity lasers prefer to oscillate at the mode with minimum linewidth instead of the mode with minimum threshold gain. The model also predicts the experimentally observed rates of mode hopping to good accuracy. However, the potential model does not apply to the regime of coherence collapse. Mirasso and Hernandez-Garcia (1994) studied the effects of current modulation on timing jitter of semiconductor lasers with the short external cavities that are suitable for packaged laser diodes. Using the usual Lang-Kobayashi equations, they analyzed statistical properties of the turn-on time of a laser diode. Besnard et al. (1993) added multiple feedback contributions (Hjelme and Mickelson, 1987; Favre and Le Guen, 1985) to the basic Lang-Kobayashi equations and qualitatively explained many of the new phenomena observed by them (see Section IV-B that follows).
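The phase condition and linewidth ratio of Olesen et al. can be evaluated with a few lines of code. The sketch below assumes the forms $\omega_0\tau = \omega\tau + C\sin(\omega\tau + \phi)$ and $\Delta\nu/\Delta\nu_0 = [1 + C\cos(\omega\tau + \phi)]^{-2}$ with $C = X\sqrt{1+\alpha^2}$ and $\phi = \arctan\alpha$, and it is restricted to $C < 1$, where the phase condition has a single solution. The function names and numerical values are illustrative and are not taken from the cited papers.

```python
import math


def feedback_coefficient(X: float, alpha: float) -> float:
    """C = X * sqrt(1 + alpha^2), the effective feedback strength."""
    return X * math.sqrt(1.0 + alpha**2)


def solve_phase_condition(w0_tau: float, X: float, alpha: float) -> float:
    """Solve w0*tau = w*tau + C*sin(w*tau + phi) for w*tau by bisection (C < 1 only)."""
    C = feedback_coefficient(X, alpha)
    if C >= 1.0:
        raise ValueError("C >= 1: several external cavity modes may coexist")
    phi = math.atan(alpha)

    def f(x):
        return x + C * math.sin(x + phi) - w0_tau

    # For C < 1, f is increasing and the root lies within 1 rad of w0_tau.
    lo, hi = w0_tau - 1.0, w0_tau + 1.0
    for _ in range(60):  # fixed iteration count avoids tolerance issues
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def linewidth_ratio(w_tau: float, X: float, alpha: float) -> float:
    """Linewidth with feedback divided by the solitary linewidth."""
    C = feedback_coefficient(X, alpha)
    return 1.0 / (1.0 + C * math.cos(w_tau + math.atan(alpha))) ** 2


if __name__ == "__main__":
    X, alpha = 0.1, 3.0  # illustrative values
    for w0_tau in (0.0, 1.0, 2.0, 3.0):
        w_tau = solve_phase_condition(w0_tau, X, alpha)
        print(f"w0*tau = {w0_tau:.1f} -> w*tau = {w_tau:.4f}, "
              f"linewidth ratio = {linewidth_ratio(w_tau, X, alpha):.3f}")
```

The ratio falls below 1 (narrowing) or rises above 1 (broadening) depending on the feedback phase. The squared factor is the slope $d\omega/d\omega_0$ obtained by differentiating the phase condition, which is why the same coefficient $C$ appears in both expressions.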
B. Coherence Collapse
Lenstra et al. (1985) developed a set of equations for a laser diode under optical feedback to model the coherence-collapsed state by assuming that the fluctuating phase difference $\phi(t) - \phi(t - \tau)$ is not very small, where $\tau$ is the external cavity roundtrip time. Their equations are:
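
$$\frac{d}{dt}\phi(t) = \tfrac{1}{2}\alpha\xi\,\Delta N(t) + F_\phi(t) + \gamma S_\phi(t) \qquad (47)$$

$$\frac{d}{dt}\Delta I(t) = \xi I\,\Delta N(t) + F_I(t) + 2I\gamma S_I(t) \qquad (48)$$

$$\frac{d}{dt}\Delta N(t) = -2\lambda_R\,\Delta N(t) - \frac{\Omega_R^2}{\xi I}\,\Delta I(t) + F_N(t) \qquad (49)$$
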
$\Delta I(t)$ and $\Delta N(t)$ are the fluctuations in photon number and carrier number, respectively, $\alpha$ is the linewidth enhancement factor (Spano et al., 1984; Agrawal, 1984; Henry, 1983), and $\xi = \partial g/\partial N$ is the differential gain ($g$ is the stimulated emission rate or gain and $N$ is the mean carrier number); $I$ is the mean number of photons in the mode. In addition, $\Omega_R$ and $\lambda_R$ are the relaxation oscillation frequency and damping constant, respectively, and $F_\phi$, $F_I$, and $F_N$ are the Langevin forces, which model spontaneous transitions as a noise source acting on the phase, photon number, and carrier number, respectively. Here $c$ is the velocity of light, $R$ the power reflectivity of the laser mirror, $r$ the fraction of power reflected back onto the laser facet, $l_{d,eff}$ the effective diode cavity length, and $f$ is the fraction of the reflected field that couples back into the lasing mode due to diffraction-limited imaging. The quantities $S_I$ and $S_\phi$ are given by:

$$S_I(t) + jS_\phi(t) = S(t) = e^{-j\omega\tau}\left[e^{-j[\phi(t) - \phi(t-\tau)]} - G(\tau)\right] \qquad (51)$$

where $\omega$ is the mean frequency of the laser field and $G$ is the stationary-state correlation function

$$G(t) = \left\langle \exp\{-j[\phi(t') - \phi(t' - t)]\} \right\rangle \qquad (52)$$

where $\langle\;\rangle$ denotes averaging over $t'$. Using these equations, they derived the mean square fluctuations as

$$[\Delta_\phi(t)]^2 = \left\langle[\phi(t' + t) - \phi(t')]^2\right\rangle$$

where

$$A_1 = \int |G(t)|^2\,dt \qquad (54)$$

$$A_2 = \int |G(t)|^2\cos(\Omega_R t)\,dt \qquad (55)$$

$$G(t) = \exp\left(-\tfrac{1}{2}[\Delta_\phi(t)]^2\right) \qquad (56)$$

and a correlation function which self-consistently satisfies Eqs. (54) and (55) for very small lG(t)J has been derived. For such a lG(t)), the quantity -ln(G(t))/t has a slope given by n1i3
p- 8
r;/3a4/3
(1 -
9)2i3 p 3 .
(57)
The preceding expression was used for verifying experimental results on coherence collapse, and was able to explain experimentally obtained linewidth broadening. Cohen, Lenstra and coworkers (Cohen and Lenstra, 1989; Cohen et al., 1990) further studied coherence collapse by considering the light injected from the external cavity as a noise source. They obtained a statistical description (Dorizzi et al., 1987) of the collapsed state by self-consistency calculations that agree well with experiments. However, some discrepancies between theory and measurements have been detected (Cohen et al., 1990). Wang and Petermann (1991) used a similar approach to obtain an upper limit on the RIN due to optical feedback. The approach cannot, however, determine which route to chaos a laser undergoes under optical feedback. Another approach to a simplified analysis of the collapsed state is the injection locking model introduced by Henry and Kazarinov (1986). In this model, the feedback system is replaced by a laser diode exposed to injection of the stationary feedback field. Mørk et al. (1988) showed that the injection locking model can explain the characteristic pattern of intensity dropouts observed in the time evolution of the intensity for low bias currents (Temkin et al., 1986; Sacher et al., 1989). Mørk et al. (1992) analyzed the stability of semiconductor lasers under optical feedback and found that within the regime of coherence collapse, the laser dynamics display the typical characteristics of chaos. They also found that two attractors associated with the same external cavity mode coexist but have different relaxation oscillation frequencies. Spontaneous emission leads to random jumping between the two attractors, which results in two strong peaks in the intensity noise spectrum. The origin of the second attractor was identified as a second Hopf bifurcation from an unstable external cavity mode. Li and McInerney (1993) showed from theoretical
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
calculations that the coherence-collapsed state is a chaotic attractor with a fractal dimension between 2 and 3, even with the inclusion of realistic spontaneous emission.

C. Bistability under Optical Feedback
In a semiconductor laser operating close to threshold under strong feedback, the transition to chaos is preceded by random drops of the intensity, which give rise to a kink in the light-current characteristics (Henry and Kazarinov, 1986). Weak optical feedback can transform the relaxation oscillation of the solitary laser diode into self-sustained oscillations. The low-frequency fluctuations can be explained by a transient bistability (Mørk et al., 1988). The bistability is caused by the competition between the external resonator mode of maximum gain reduction and another mode of smaller linewidth. The latter mode can live on a timescale of less than one round trip in the external cavity. Above a critical feedback level, the stationary solution of the deterministic equation loses stability. One or two coexisting limit cycles with different oscillation frequencies close to the relaxation oscillation frequency Ω turn up in numerical simulations, in agreement with experimental findings (Mørk et al., 1990b; Mørk et al., 1992). The presence of limit cycles depends on Ωτ, where τ is the round-trip time in the external cavity. Starting from the Lang-Kobayashi equations, Ritter and Haug (1993) studied the bistability of limit cycles created by Hopf bifurcations from the same external cavity mode of a single-mode semiconductor laser with optical feedback. They analyzed the pulsation amplitudes, frequencies, and the range of bistability under weak optical feedback. A short external cavity with a length of less than a few millimeters has been shown to avoid the coherence collapse regime (Mørk et al., 1992; Schunk and Petermann, 1989b). A 50% increase in the bandwidth of direct current modulation was also shown to be attainable for short-external-cavity laser diodes, owing to the dependence of the effective differential gain on the detuning between the laser diode cavity and the external cavity (Lau and Yariv, 1985; Agrawal and Henry, 1988; Elenkrig et al., 1990).
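As a rough numerical illustration of the product Ωτ discussed above, the following sketch computes the external-cavity round-trip time τ = 2 n L_ext / c and the corresponding Ωτ for a given relaxation oscillation frequency. The function names, the 3 GHz relaxation oscillation frequency, the cavity lengths, and the assumption of an air-filled external cavity (n = 1) are illustrative choices, not values taken from the cited papers.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_time(l_ext_m, n_ext=1.0):
    """Round-trip time (s) in an external cavity of length l_ext_m (m).

    n_ext is the refractive index of the external cavity medium;
    the default of 1.0 (air gap) is an assumption of this sketch.
    """
    return 2.0 * n_ext * l_ext_m / C_LIGHT


def omega_tau(f_relax_hz, l_ext_m, n_ext=1.0):
    """Product of the angular relaxation oscillation frequency and tau."""
    return 2.0 * math.pi * f_relax_hz * round_trip_time(l_ext_m, n_ext)


if __name__ == "__main__":
    f_r = 3.0e9  # hypothetical relaxation oscillation frequency, Hz
    for length in (1.0e-3, 0.15):  # a short (1 mm) and a long (15 cm) cavity
        tau = round_trip_time(length)
        print(f"L_ext = {length:g} m: tau = {tau:.3e} s, "
              f"Omega*tau = {omega_tau(f_r, length):.3f}")
```

Under these assumed numbers, a millimeter-scale cavity gives Ωτ well below unity, while a cavity of tens of centimeters gives Ωτ much larger than unity, which is one way to see why short and long external cavities behave so differently.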
The effect of a short external cavity on the modulation characteristics has been studied by a number of researchers (Schunk and Petermann, 1989a,b; Lau and Yariv, 1985; Agrawal and Henry, 1988; Suris and Tager, 1985; Lau, 1988; Tromborg et al., 1984; Tager and Elenkrig, 1993).

D. Mode Competition Noise

Due to external optical feedback, multiple modes can oscillate simultaneously in a semiconductor laser. The nonlinear interactions among various
EXTERNAL OPTICAL FEEDBACK EFFECTS IN LASERS
FIGURE 12. Variation of the ratio of powers between two modes as a function of the threshold gain difference between the two modes (horizontal axis: normalized threshold level difference). There is a region where the modal power ratio is multivalued. Bistability is observed in this multivalued region. (After Yamada, 1986.)
modes give rise to mode competition, that is, competition among the modes for the total optical power. Yamada (1986) investigated the mode competition noise in diode lasers. An example for the case of two competing modes is shown in Fig. 12. It shows a typical plot of the modal power ratio (between the two competing modes) as a function of the normalized threshold gain difference between the two modes. There is a region in which multiple values of the modal power ratio are possible for the same threshold level difference between the two modes. This can give rise to bistability as well as mode competition between the two modes. Yamada and Suhara (1990) analyzed the noise properties of semiconductor lasers from the viewpoint of mode competition. The rate equations for the mean photon number S_N of mode N and the injected electron density q were obtained by taking the nonlinear interaction among lasing modes into account as (Suhara et al., 1994; Alam et al., 1997a):
D
= 2B
[
x j(w,
- W&f)
+ 7s1 I,,I - I , I,. -*-
-
where A_N is the linear gain coefficient, ξ is the confinement factor of the optical field in the active region, a and b are the coefficients giving the gain slope and the wavelength dispersion relation, respectively, λ_0 is the wavelength at the gain peak, q_g is the transparent level of the electron density, B and D are the gain saturation coefficients for the identical mode and for different modes, respectively, due to the burning effect on the energy of the laser polarization, characterized by the intraband relaxation time τ_in for the electron wave (Yamada, 1983), R_cv is the dipole moment, n is the refractive index, and q_s is an injection level characterizing the gain saturation coefficient. H_N(M) is another saturation coefficient, due to the beating vibration of the injected electron density, which gives an asymmetric saturation profile in photon energy (Ogasawara and Ito, 1988; Yamada, 1989); q_th is the threshold electron density, V is the volume of the active region, τ_s is the electron lifetime, and I_g and I_th are the transparent current level and the threshold current level, respectively, given by
and α is the linewidth enhancement factor (Henry, 1982). G_th(N) is the threshold gain level for mode N, given by

\[
G_{th(N)} = \frac{\int_0^L g_{th(N)} \left[ |E_N^{(+)}(z)|^2 + |E_N^{(-)}(z)|^2 \right] |F_N(x, y)|^2 \, dz}{\int_0^L \left[ |E_N^{(+)}(z)|^2 + |E_N^{(-)}(z)|^2 \right] |F_N(x, y)|^2 \, dz} \tag{65}
\]

where g_th(N) is a mode solution of the oscillation condition Eq. (28) or Eq.
(33). C is the spontaneous emission factor (Suematsu and Furuya, 1977), defined as the fraction of the spontaneous field going into a lasing field. Here F_N(t) and F_q(t) are fluctuation components due to spontaneous emission. Correlation functions among these fluctuation terms are given as follows:
Here, F_NΩ and F_qΩ are the frequency components of the fluctuation terms F_N(t) and F_q(t), respectively. S̄_N and q̄ are the dc components of the photon number and electron density, respectively. The nonlinear interactions among lasing modes are described in terms of D and H_N(M) in Eq. (58), and are called the mode competition phenomena. Mode competition enhances fluctuations due to spontaneous emission, causing excess intensity noise when more than one mode is present, because the H_N(M) terms are zero for single-mode oscillation but nonzero for multimode operation. The RIN can be calculated from
where S_NΩ is the fluctuation component of the photon number at the angular frequency Ω. Wu and Chang (1993b) analyzed the mode partition noise in semiconductor lasers with optical feedback for CW and dynamic operation. They used simulation techniques to study photon statistics and RIN spectra for the main mode and one side mode under optical feedback.
V. EXTERNAL OPTICAL FEEDBACK SENSITIVITY

A distributed feedback semiconductor laser usually operates with one or more modes with extremely narrow linewidth compared to Fabry-Perot type semiconductor lasers. Solutions of Eq. (28) or Eq. (33), for example, give the modes of oscillation of a DFB laser. Each mode has its own oscillation frequency and threshold gain. Analysis of optical feedback usually begins from the oscillation condition Eq. (28) or Eq. (33).

A. Sensitivity to Threshold Gain and Spectrum

Favre (1987) analyzed the effect of external optical feedback on threshold gain, resonant frequency, and spectral linewidth in DFB semiconductor lasers. Starting from the oscillation condition (Eq. (28) with different notation), Favre used a linear expansion around the solitary laser mode solutions for the threshold gain α and the wave vector deviation δ (δ is related to the departure of the oscillation frequency ω from the Bragg oscillation frequency ω_B by δ = n(ω − ω_B)/c, where n is the mean effective refractive index and c is the velocity of light in free space). He defined an external optical feedback sensitivity parameter C as follows:

\[
\Delta\alpha L - j\,\Delta\delta L = C\, r\, e^{-j\omega\tau} \tag{70}
\]
where Δα is the change in threshold gain, Δδ is the change in wave vector deviation, r is the power reflectivity of the external reflector, and τ is the round-trip delay in the external cavity. The C parameter is a complex quantity and depends only on the solitary DFB laser modal characteristics. This parameter is a measure of how strongly a change in the reflectivity of one of the facets affects the threshold gain and oscillation frequency of the DFB laser. Favre showed that for a DFB laser, the C parameters for the left and right facets are related by
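To make Eq. (70), ΔαL − jΔδL = C r e^(−jωτ), concrete, the sketch below evaluates its right-hand side for a given complex C and splits the result into the normalized changes in threshold gain (ΔαL) and wave vector deviation (ΔδL). The function name and all numerical values are hypothetical and chosen only for illustration; r is used exactly as it appears in Eq. (70).

```python
import cmath
import math


def feedback_shift(c_param, r_ext, phase):
    """Evaluate d_alpha*L - j*d_delta*L = C * r * exp(-j*phase), as in Eq. (70).

    c_param : complex C-parameter of the solitary laser (assumed value)
    r_ext   : reflectivity of the external reflector, as r in Eq. (70)
    phase   : feedback phase omega*tau in radians
    Returns (d_alpha_L, d_delta_L), both dimensionless.
    """
    rhs = c_param * r_ext * cmath.exp(-1j * phase)
    d_alpha_l = rhs.real       # real part is the threshold gain change
    d_delta_l = -rhs.imag      # imaginary part equals -d_delta*L
    return d_alpha_l, d_delta_l


if __name__ == "__main__":
    c_value = 0.8 + 0.3j       # hypothetical C-parameter
    r_value = 1.0e-3           # hypothetical external reflectivity
    for k in range(4):
        phi = k * math.pi / 2.0
        da, dd = feedback_shift(c_value, r_value, phi)
        print(f"omega*tau = {phi:.3f} rad: d_alpha*L = {da:+.3e}, "
              f"d_delta*L = {dd:+.3e}")
```

As the feedback phase ωτ is swept, the pair (ΔαL, ΔδL) traces a circle of radius |C| r, so |C| sets the largest threshold gain or frequency excursion that a given reflectivity can cause.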
where P_r and P_l are the transmitted powers through the right and left facets, respectively, and C_r and C_l are the C-parameters for the right and left facets, respectively. In the case of Fabry-Perot type laser diodes, the C-parameter for a facet under optical feedback was found to be
where ρ is the reflectivity of the facet without optical feedback. For a DFB laser with its left facet antireflection (AR) coated, the C-parameters for the right and left facets, C_r and C_l, respectively, were derived by Favre as:
where q_0 = α_0 − jδ_0 is the value without feedback, L is the laser length, κ is the index coupling coefficient, ρ_r is the reflectivity of the right facet, and Ψ_r (Ψ_l) is the grating phase at the right (left) facet. A similar equation for the C parameter for an antireflection (AR) coated phase-shifted DFB laser subjected to weak external feedback was derived as:
c = -(Y0/40) S
= (1 -
+
U
'
+ p ; ) - (1 - p;02)e-YoL- ( p i - OZ)eYoL - (2~20- 02)eYoL + 2(1 - 0 2 ) ] [ 4 p t / ( i+
02)(1
7'= [O'e-Yo" U
S T
= po(pi - OZ)eYOL - ( 1
- pit)2)e-y0L/po
(75)
where γ is defined by

\[
\gamma^2 = (\alpha - j\delta)^2 + \kappa^2. \tag{76}
\]
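The relation of Eq. (76), γ² = (α − jδ)² + κ², can be evaluated numerically as in the short sketch below. The function name is hypothetical, the inputs are assumed to be expressed in consistent units (for example, all normalized by the laser length L), and the principal branch of the complex square root is used, which is only a sign convention chosen for this sketch.

```python
import cmath


def gamma_from_mode(alpha, delta, kappa):
    """Return gamma satisfying gamma**2 = (alpha - 1j*delta)**2 + kappa**2.

    alpha : threshold gain of the mode
    delta : wave vector deviation from the Bragg condition
    kappa : coupling coefficient (may be complex for complex coupling)
    The principal square root is returned; the opposite sign is an
    equally valid root of the same equation.
    """
    return cmath.sqrt((alpha - 1j * delta) ** 2 + kappa ** 2)


if __name__ == "__main__":
    # Hypothetical normalized mode solution (alpha*L, delta*L) and kappa*L.
    g0 = gamma_from_mode(0.7, 1.2, 2.0)
    print("gamma_0 * L =", g0)
    print("residual    =", abs(g0 ** 2 - ((0.7 - 1.2j) ** 2 + 2.0 ** 2)))
```

Given a mode solution (α_0, δ_0) of the oscillation condition, the same call yields the corresponding γ_0.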
A mode solution of the oscillation condition (Eq. (28), for example) represents a point (α_0, δ_0) in the α-δ plane and defines the corresponding γ_0 obtained from Eq. (76). Other parameters appearing in Eq. (75) are as follows: ρ_0 = (−γ_0 + α_0 − jδ_0)/jκ, q_0 = α_0 − jδ_0, and Θ = e^(−jΩ), where Ω is the corrugation initial phase at the phase shift position. Using the foregoing analysis, Favre's 1987 study of index-coupled conventional (without any phase shift) DFB lasers with a cleaved right facet and an AR-coated left facet (CL-AR) having low coupling strengths (κL < 1) showed that the sensitivity to optical feedback for these DFB lasers through their cleaved facet is similar to that of Fabry-Perot lasers. A conventional CL-AR DFB laser with κL = 4 was found to be about five times less sensitive to optical feedback through the cleaved facet than Fabry-Perot lasers. Considering feedback through the AR-coated facet, Favre found that DFB lasers are ten times more sensitive to feedback than Fabry-Perot lasers for κL = 0.35, and that the DFB laser is less sensitive through the AR-coated facet than a Fabry-Perot laser only for coupling strength κL > 4. Favre also analyzed the phase-shifted index-coupled DFB laser and found that for large values of κL, the optimum phase shift for minimum sensitivity is 2Ω = π, which corresponds to the quarter-wave-shifted (QWS) DFB laser. A plot of the absolute value of the C parameter for a QWS index-coupled DFB laser with AR-coated facets is shown in Fig. 13. It can be noted that QWS DFB lasers are less sensitive to optical feedback than Fabry-Perot type lasers with cleaved facets for κL > 2.5. Favre also found that the ratio of the intensity envelope at the end
FIGURE 13. Plot of the absolute value of the C-parameter as a function of coupling strength κL for an AR-coated QWS DFB laser (horizontal axis: coupling strength κL). The C-parameter for a cleaved Fabry-Perot type semiconductor laser is also shown for comparison. (After Favre, 1987.)
to the intensity at the center, I(L/2)/I(0), also varies with κL almost identically as |C| varies with κL. Beylat and Jacquet (1988) reported simulation results on the optimum facet reflectivity for minimum sensitivity to external optical feedback. Favre (1987, 1991) used a linear gain theory, which provides steady-state lasing conditions at threshold, to analyze the effect of external optical feedback on DFB lasers. However, it has been shown that the axial nonuniformity of carrier density, that is, longitudinal spatial hole burning (SHB), is an important phenomenon in DFB lasers (Whiteaway et al., 1989; Ketelsen et al., 1991; Phillips et al., 1992) and should be taken into account when a DFB laser operates above threshold. Wu and Chang (1993a) used an axially nonuniform carrier density to take into account the axial variations of gain and refractive index due to SHB effects above threshold. They proposed an axially averaged C-parameter and compared the external feedback sensitivity of AR-coated QWS DFB lasers for different injection currents. They found that increasing the injection current also increases the averaged C-parameter. Favre (1991) extended his analysis to complex-coupled lasers as well. He concluded that although the modal characteristics (selectivity, threshold,
etc.) of pure gain-coupled or partially gain-coupled (complex-coupled) DFB semiconductor lasers showed immunity to external feedback, the external feedback sensitivity parameter (C-parameter) for gain-coupled DFB lasers is comparable to that of index-coupled lasers. The basic parameter that mainly determines feedback sensitivity is the absolute value of the complex coupling strength κ̃L (κ̃ is the complex coupling coefficient and L is the laser length). Recently, Hirono et al. (1992) proposed an analytical expression for the sensitivity of DFB lasers to external optical feedback. They reported that the sensitivity is proportional to the ratio between the output power from the reflector-side facet and the magnitude of the Lagrangian of the electromagnetic field in the cavity. Hui et al. (1994) analyzed the external feedback sensitivity of complex-coupled DFB lasers above threshold, taking into account SHB effects. They found that although pure index grating and partial gain grating in DFB lasers exhibit comparable sensitivity to external optical feedback at threshold, gain grating has the effect of reducing the feedback sensitivity when the lasers operate well above threshold, especially when the coupling strength κ̃L is high.
B. Feedback Sensitivity Based on Mode Competition Theory

Most of the experimentally observed phenomena, including intensity fluctuations, mode hopping, transition to a chaotic state, and coherence collapse, usually require relatively high levels of feedback, of the order of −30 dB. On the other hand, mode-competition-induced noise begins to increase at much lower feedback levels, of the order of −50 dB (Yamada, 1986; Yamada and Suhara, 1990). Based on the mode competition theory, a critical feedback ratio can be defined above which external cavity modes start to appear around the main laser modes, generating excess noise. Figure 14 shows a typical optical spectrum in the presence of external optical feedback. The semiconductor laser exhibits two groups of lasing modes under the influence of external feedback. One group consists of the internal cavity modes (p-modes), which are determined by the structural parameters of the laser cavity itself. Another group consists of external cavity modes (m-modes), which build up around each internal cavity mode with frequency separations from the p-modes characterized by the distance to the reflection point. When the effective feedback ratio q_T (see Section II-B beginning on page 76) is small enough, a correctly designed DFB laser operates in a single longitudinal mode. On the other hand, when the external optical feedback ratio is increased beyond a certain minimum value (the critical feedback ratio), external cavity modes start
FIGURE 14. Optical spectrum of internal and external cavity modes of a DFB laser. The internal cavity modes are denoted by p-modes (p − 1, p, p + 1), while the external cavity modes are denoted m and m'.
to build up with oscillation frequencies close to each of the internal cavity mode oscillation frequencies. The critical feedback ratio above which RIN increases abruptly to a high value (Schunk and Petermann, 1989b; Suhara et al., 1994) also corresponds to the feedback ratio below which only a single internal cavity mode around the Bragg wavelength can exist. Thus, a semiconductor laser with a higher critical feedback ratio can withstand higher levels of external optical feedback without generating excess noise; that is, it is less sensitive to external optical feedback. Suhara et al. (1994) and Alam et al. (1997a) analyzed the critical feedback ratio in index-coupled and complex-coupled DFB semiconductor lasers for various configurations of DFB laser structural parameters. They found that the critical feedback ratio depends significantly on the reflectivities of the facets as well as on the phase of the grating at a facet if that facet is not antireflection (AR) coated. They also found that phase-shifted index-coupled lasers show better immunity to external optical feedback when the phase shift is near π/2, although the optimum phase shift for the highest critical feedback ratio may vary slightly from π/2. The QWS laser was found to have the highest critical feedback ratio, especially at higher coupling strengths. The critical feedback ratio of a QWS laser as a function of coupling strength κL is shown in Fig. 15. The partly gain-coupled DFB laser was found to have a lower critical feedback ratio, and the conventional DFB laser was found to have the lowest critical feedback ratio. For complex-coupled DFB lasers, Alam et al. (1997b) found that the critical feedback ratio varied widely when the relative strengths of index and gain coupling are changed while the total coupling strength |κ̃L| is kept constant. Figure 16 shows the critical feedback ratio as a function of the index-to-gain coupling ratio κ_i/κ_g for |κ̃L| = 1.5.
This is in contrast to the
FIGURE 15. The critical feedback ratio in a QWS DFB semiconductor laser as a function of coupling strength κL (horizontal axis: coupling coefficient × laser length). (After Alam et al., 1997b.)
finding of Favre (1991), where it was concluded that the relative strengths of index and gain coupling have virtually no effect on external feedback sensitivity when |κ̃L| remains constant. Alam et al. (1997b) also analyzed the external feedback sensitivity of asymmetric QWS DFB lasers, which were found experimentally to be less sensitive to external feedback (Kurosaki et al., 1994) than QWS lasers with the phase shift at the center of the laser. They found that asymmetry in the reflectivity of the facets, combined with asymmetry in the position of the phase shift, can increase the mode selectivity of DFB lasers as well as reduce their sensitivity to external feedback. For higher mode selectivity as well as a higher critical feedback ratio, the position of the phase shift has to be moved axially towards the facet with the higher reflectivity of the two. In the case of a DFB laser with one facet cleaved and the other facet AR-coated (CL-AR structure), the optimum phase shift position was found to be about one-third of the total laser length away from the cleaved facet. Increasing the reflectivity of a facet, however, reduces the yield because of the statistical variation of the corrugation phase at the facet during DFB laser fabrication.
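Because this section quotes feedback levels in decibels (of the order of −30 dB and −50 dB) while the critical feedback ratio is a dimensionless ratio, a small conversion helper can make comparisons easier. The sketch below assumes that the quoted levels are power ratios, so that the level in dB equals 10 log10(ratio); the function names and the critical ratio used in the example are hypothetical and are not taken from the cited analyses.

```python
import math


def db_to_ratio(level_db):
    """Convert a feedback level in dB to a linear power ratio."""
    return 10.0 ** (level_db / 10.0)


def ratio_to_db(ratio):
    """Convert a linear power ratio to a level in dB."""
    return 10.0 * math.log10(ratio)


def exceeds_critical(feedback_db, critical_ratio):
    """Return True if the feedback level lies above the critical ratio."""
    return db_to_ratio(feedback_db) > critical_ratio


if __name__ == "__main__":
    critical = 1.0e-4  # hypothetical critical feedback ratio
    for level in (-50.0, -30.0):
        print(f"{level:.0f} dB -> ratio {db_to_ratio(level):.1e}, "
              f"above critical: {exceeds_critical(level, critical)}")
```

With the assumed critical ratio of 10^-4, feedback at −50 dB stays below the critical value while feedback at −30 dB exceeds it.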
FIGURE 16. Variation of the critical feedback ratio in a complex-coupled DFB laser as a function of the index-to-gain coupling ratio (κ_i/κ_g) when the total coupling |κ̃L| = 1.5 (horizontal axis: index-to-gain coupling ratio). (After Alam et al., 1997a, b.)
VI. CONCLUSION

As we move towards the twenty-first century, DFB semiconductor lasers will play increasingly important roles due to the ever-increasing demand for high-speed optical communications, optical data storage, and other applications that require narrow-linewidth, cost-effective, lightweight laser sources. We have introduced here the basic concepts of DFB laser electromagnetics and the characteristics that are relevant to external optical feedback. We then described a number of experimentally observed phenomena in semiconductor lasers in general, although some experimental results are particular to DFB lasers. We also discussed some of the theoretical models that explain these experimental phenomena. Finally, we discussed the external optical feedback sensitivity of semiconductor lasers, where most of the discussion is specific to DFB lasers.
REFERENCES

Achtenhagen, M., Miles, R. O., Hardy, A., and Reinhart, F. K. (1996). Effect of the external reflector position on the threshold current in complex-coupled DFB laser diodes, Electron. Lett., 32: 334.
Acket, G. A., Lenstra, D., den Boef, A. J., and Verbeek, B. H. (1984). The influence of feedback intensity on longitudinal mode properties and optical noise in index-guided semiconductor lasers, IEEE J. Quantum Electron., QE-20: 1163.
Agrawal, G. P. (1984). Line narrowing in a single-mode injection laser due to external optical feedback, IEEE J. Quantum Electron., QE-20: 468.
Agrawal, G. P. and Henry, C. H. (1988). Modulation performance of a semiconductor laser coupled to an external high-Q resonator, IEEE J. Quantum Electron., QE-24: 134.
Aiki, K., Nakamura, M., Umeda, J., Yariv, A., Katzir, A., and Yen, H. W. (1975). GaAs-GaAlAs distributed feedback diode lasers with separate optical and carrier confinement, Appl. Phys. Lett., 27: 145.
Alam, M. F., Karim, M. A., and Islam, S. (1997a). Effects of structural parameters on the external optical feedback sensitivity in DFB semiconductor lasers, IEEE J. Quantum Electron., 33: 424.
Alam, M. F., Karim, M. A., and Islam, S. (1997b). Analysis of external optical feedback characteristics of asymmetric, quarter-wave-shifted, distributed feedback semiconductor lasers, Appl. Opt., 36: 4131.
Benoist, K. W. (1996). Influence of external frequency shifted feedback on a DFB semiconductor laser, IEEE Photon. Technol. Lett., 8: 25.
Besnard, P., Meziane, B., and Stephan, G. M. (1993). Feedback phenomena in a semiconductor laser induced by distant reflectors, IEEE J. Quantum Electron., 29: 1271.
Beylat, J. L. and Jacquet, J. (1988). Analysis of DFB semiconductor lasers with external optical feedback, Electron. Lett., 24: 509.
Bogatov, A. P., Eliseev, P. G., Ivanov, L. P., Logginov, A. S., Manko, M. A., Senatorov, K. Ya. (1973). Study of the single-mode injection laser, IEEE J. Quantum Electron., QE-9: 392.
Bourchert, B., Stegmuller, B., and Gessner, R. (1993). Fabrication and characteristics of improved strained quantum-well GaInAlAs gain-coupled DFB lasers, Electron. Lett., 29: 210.
Broom, R. F., Mohn, E., Risch, C., and Salathe, R. (1970). Microwave self-modulation of a diode coupled to an external cavity, IEEE J. Quantum Electron., QE-6: 328.
Casey Jr., H. C., and Panish, M. B. (1978). Heterostructure Lasers. New York: Academic Press.
Chinone, N., Aiki, K., and Ito, R. (1978). Stabilization of semiconductor laser outputs by a mirror close to a laser facet, Appl. Phys. Lett., 33: 990.
Cho, Y. and Umeda, M. (1984). Chaos in laser oscillations with delayed feedback: numerical analysis and observation using semiconductor laser, J. Opt. Soc. Am., B 1: 497.
Chuang, S. L. (1995). Physics of Optoelectronic Devices. Chapter 10, New York: Wiley.
Chuang, Z. M., Wang, C. Y., Lin, W., Liao, H. H., Su, J. Y., and Tu, Y. K. (1996). Very-low-threshold, highly efficient, and low-chirp 1.55-μm complex-coupled DFB lasers with a current-blocking grating, IEEE Photon. Technol. Lett., 8: 1438.
Cohen, J. S. and Lenstra, D. (1989). Spectral properties of the coherence collapsed state of a semiconductor laser with delayed optical feedback, IEEE J. Quantum Electron., 25: 1143.
Cohen, J. S., Wittgrefe, F., Hoogerland, M. D., and Woerdman, J. P. (1990). Optical spectra of a semiconductor laser with incoherent optical feedback, IEEE J. Quantum Electron., 26: 982.
David, K., Morthier, C., Vankwikelberge, P., and Baets, R. (1991). Gain-coupled DFB lasers versus index-coupled and phase-shifted DFB lasers: a comparison based on spatial hole burning corrected yield, IEEE J. Quantum Electron., QE-27: 1714.
David, K., Buus, J., and Baets, R. G. (1992). Basic analysis of AR-coated, partly gain-coupled DFB lasers: the standing wave effect, IEEE J. Quantum Electron., QE-28: 427.
Dente, G. C., Durkin, P. S., Wilson, K. A., and Moeller, C. E. (1988). Chaos in the coherence collapse of semiconductor lasers, IEEE J. Quantum Electron., 24: 2441.
Dorizzi, B., Grammaticos, B., Le Berre, M., Pomeau, Y., Ressayre, E., and Tallet, A. (1987). Statistics and dimension of chaos in differential delay systems, Phys. Rev., A 35: 328.
Elenkrig, B. B., Nesterenko, A. G., and Tager, A. A. (1990). Modulation bandwidth limits for semiconductor lasers with compound selective cavities, Int. J. Optoelectron., 5: 523.
Favre, F. and Le Guen, D. (1985). Spectral properties of a semiconductor laser coupled to a single-mode fiber resonator, IEEE J. Quantum Electron., QE-21: 19.
Favre, F. (1987). Theoretical analysis of external optical feedback on DFB semiconductor lasers, IEEE J. Quantum Electron., QE-23: 81.
Favre, F. (1991). Sensitivity to external optical feedback for gain-coupled DFB semiconductor lasers, Electron. Lett., 27: 433.
Fleming, M. W. and Mooradian, A. (1981). Spectral characteristics of external-cavity controlled semiconductor lasers, IEEE J. Quantum Electron., QE-17: 44.
Fujita, T., Ishizuka, S., Fujito, K., Serizawa, H., and Sato, H. (1984). Intensity noise suppression and modulation characteristics of a laser diode coupled to an external cavity, IEEE J. Quantum Electron., QE-20: 492.
Fujiwara, M., Kubota, K., and Lang, R. (1981). Low-frequency intensity fluctuation in laser diodes with optical feedback, Appl. Phys. Lett., 38: 217.
Giles, C. R., Erdogan, T., and Mizrahi, V. (1994). Reflection-induced changes in the optical spectra of 980-nm QW lasers, IEEE Photon. Technol. Lett., 6: 903.
Goldberg, L., Taylor, H. F., Dandridge, A., Weller, J. F., and Miles, R. O. (1980). Spectral characteristics of semiconductor lasers with optical feedback, IEEE Trans. Microwave Theory Tech., MTT-30: 401.
Goldberg, L., Taylor, H. F., Dandridge, A., Weller, J. F., and Miles, R. O. (1982). Spectral characteristics of semiconductor lasers with optical feedback, IEEE J. Quantum Electron., QE-18: 555.
Hamel, W. A., van Exter, M. P., and Woerdman, J. P. (1992). Coherence properties of a semiconductor laser with feedback from a distant reflector: experiment and theory, IEEE J. Quantum Electron., 28: 1459.
Henry, C. H. (1982). Theory of the linewidth of semiconductor lasers, IEEE J. Quantum Electron., QE-18: 259.
Henry, C. H. (1983). Theory of phase noise and power spectrum of a single-mode injection laser, IEEE J. Quantum Electron., QE-19: 1391.
Henry, C. H., and Kazarinov, R. F. (1986). Instability of semiconductor lasers due to optical feedback from distant reflectors, IEEE J. Quantum Electron., QE-22: 294.
Hernandez-Garcia, E., Mirasso, C. R., Shore, K. A., and San Miguel, M. (1994). Turn on jitter of external cavity semiconductor lasers, IEEE J. Quantum Electron., 30: 241.
Hirono, T., Kurosaki, T., and Fukuda, M. (1992). A novel analytical expression of sensitivity to external optical feedback for DFB semiconductor lasers, IEEE J. Quantum Electron., 28: 2674.
Hirota, O. and Suematsu, Y. (1979). Noise properties of injection lasers due to reflected waves, IEEE J. Quantum Electron., QE-15: 142.
Hjelme, A. R., and Mickelson, A. R. (1987). On the theory of external cavity operated single-mode semiconductor lasers, IEEE J. Quantum Electron., QE-23: 1000.
Hong, J., Makino, T., Lu, H., and Li, G. P. (1995). Effect of in-phase and antiphase gain coupling on high-speed properties of MQW DFB lasers, IEEE Photon. Technol. Lett., 7: 956.
Huang, Y., Yamada, H., Okuda, T., Torikai, T., and Uji, T. (1996). External optical feedback resistant characteristics in partially corrugated-waveguide laser diodes, Electron. Lett., 32: 1008.
Hui, R., Kavehrad, M., and Makino, T. (1994). External feedback sensitivity of partly gain-coupled DFB semiconductor lasers, IEEE Photon. Technol. Lett., 6: 897.
Ikushima, I. and Maeda, M. (1978). Self-coupled phenomena of semiconductor lasers caused by an optical fiber, IEEE J. Quantum Electron., QE-14: 331.
Ito, M. and Kimura, T. (1980). Oscillation properties of AlGaAs DH lasers with an external grating, IEEE J. Quantum Electron., QE-16: 69.
Kapon, E., Hardy, A., and Katzir, A. (1982). The effect of complex coupling coefficients on distributed feedback lasers, IEEE J. Quantum Electron., QE-18: 66.
Kawai, T., Rahwanto, A., Kitajima, K., Mori, M., Goto, T., and Miyauchi, A. (1995). Relative intensity noise of DFB LDs with near and far end reflections, IEICE Trans. Electron., E78: 1779.
Ketelsen, L. J. P., Hoshino, I., and Ackerman, D. A. (1991). The role of axially non-uniform carrier density in altering the TE-TE gain margin in InGaAsP-InP DFB lasers, IEEE J. Quantum Electron., 27: 957.
Kikuchi, K. and Okoshi, T. (1982). Simple formula giving spectrum-narrowing ratio of semiconductor laser output obtained by optical feedback, Electron. Lett., 18: 10.
Kikuchi, K. and Lee, T. (1987). Spectral stability analysis of weakly coupled external-cavity semiconductor lasers, J. Lightwave Technol., LT-5: 1269.
Kobayashi, K. (1976). Improvements in direct pulse code modulation of semiconductor lasers by optical feedback, Trans. I.E.C.E. Japan, E59: 8.
Kogelnik, H., and Shank, C. V. (1972). Coupled wave theory of distributed feedback lasers, J. Appl. Phys., 43: 2327.
Kurosaki, T., Hirono, T., and Fukuda, M. (1994). Suppression of external cavity modes in DFB lasers with a high endurance against optical feedback, IEEE Photon. Technol. Lett., 6: 900.
Lam, B., Kellner, A. L., Yu, P. K., Sushchik, M. M., and Abarbanel, H. D. (1996). Chaotic instabilities in modulated external-cavity semiconductor lasers, Proc. SPIE, 2610: 13.
Lang, R. (1979). Lateral transverse mode instability and its stabilization in stripe geometry injection lasers, IEEE J. Quantum Electron., QE-15: 718.
Lang, R. and Kobayashi, K. (1980). External optical feedback effects on semiconductor laser properties, IEEE J. Quantum Electron., QE-16: 347.
Langley, L. N. and Shore, K. A. (1992). The effect of external optical feedback on the turn-on delay statistics of laser diodes under pseudo-random modulation, IEEE Photon. Technol. Lett., 4: 1207.
Langley, L. N. and Shore, K. A. (1993). The effect of external optical feedback on timing jitter in modulated laser diodes, J. Lightwave Technol., 11: 434.
Lau, K. Y. and Yariv, A. (1985). Detuned loading in coupled cavity semiconductor lasers: effect on quantum noise and dynamics, IEEE J. Quantum Electron., QE-21: 121.
Lau, K. Y. (1988). Efficient narrow-band direct modulation of semiconductor injection lasers at millimeter wave frequencies of 100 GHz and beyond, Appl. Phys. Lett., 52: 2214.
Lenstra, D., Verbeek, B. H., and den Boef, A. J. (1985). Coherence collapse in single-mode semiconductor lasers due to optical feedback, IEEE J. Quantum Electron., QE-21: 674.
Lenstra, D. (1991a). Feedback noise in single mode semiconductor lasers, Proc. SPIE, 1376: 245.
Lenstra, D. (1991b). Statistical theory of the multi-stable external-feedback laser, Opt. Commun., 81: 209.
Li, G. P., Makino, T., Moore, R., and Puetz, N. (1992). 1.55 μm index/gain coupled DFB lasers with strained layer multi-quantum-well active grating, Electron. Lett., 28: 1726.
Li, H., Ye, J., and McInerney, J. G. (1993). Detailed analysis of coherence collapse in semiconductor lasers, IEEE J. Quantum Electron., 29: 2421.
118
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
Lowery, A. J., and Novak, D. (1994). Performance comparison of gain-coupled and indexcoupled DFB semiconductor lasers, I E E E J . Quantum Electron., 30: 2051. Luo, Y., Nakano, Y.. Tada, K., Inoue, T., Hosomatsu, H., and Iwaoka, H. (1990). Purely gain-coupled distributed feedback semiconductor lasers, Appl. Phys. Lett., 56: 1620. Luo, Y., Nakano, Y., Tada, K., Inoue, T., Hosomatsu, H., and Iwaoka, H. (1991). Fabrication and characteristics of gain-coupled distributed feedback semiconductor lasers with a corrugated active layer, IEEE J . Quantum Electron., 27: 1724. Miles, R. O., Dandridge, A., Tveten, A. B., Taylor, H. F., and Giallorenzi, T. G. (1980). Feedback-induced line broadening in CW channel-substrate planar laser diodes, Appl. Phys Lett., 37: 990, Mink, J. and Verbeek, B. H. (1986). Asymmetric noise and output power in semiconductor lasers with optical feedback near threshold, Appl. Phys. Lett., 48: 745. Mirasso, C. R. and Hernindez-Garcia, E. (1994). Effects of current modulation on timing jitter of single-mode semiconductor lasers in short external cavities, IEEE J . Quantum Electron., 30: 2281. Mitsuhashi, Y., Morikawa, T., Sakurai, K., Seko. A,, and Shimada, J. (1976). Self-coupled optical pickup, Opt. Commun., 1 7 95. Morikawa, T., Mitsubishi, Y., Shimada, J., and Kojima, Y. (1976). Return-beam induced oscillations in self-coupled semiconductor lasers, Electron. Lett., 12: 435. Msrk, J., Tromborg, B., and Christiansen, P. L. (1988). Bistability and low-frequency fluctuations in semiconductor lasers with optical feedback: a theoretical analysis, IEEE J . Quuntum Electron., 24: 123. Mark, J., Mark, J., and Tromborg, B. (1990a). Route to chaos and competition between relaxation oscillations for a semiconductor laser with optical feedback. Pliys. Rev. Lett., 65: 1999. Msrk, J. and Tromborg, B. (1990). The mechanism of mode selection for an external cavity laser, IEEE Plioton. Technol. Lett., 2: 21. Msrk, J., Semkow, M., and Tromborg, B. (1990b). 
Measurement and theory of mode hopping in external cavity lasers, Eleciron. Leit., 26: 609. Mark, J., Tromborg, B., and Mark, J. (1992). Chaos in semiconductor lasers with optical feedback: theory and experiment, IEEE J . Quantum Electron., 28: 93. Morthier, G., Vankwikelberge, P., David, K., and Baets, R. (1990). Improved performance of AR-coated DFB lasers by the introduction of gain coupling, IEEE Photon. Techno/. Lett., 2: 170. Mukai, T. and Otsuka, K. (1985). New route to optical chaos: Successive subharmonicoscillation cascade in a semiconductor laser coupled to an external cavity, Phys. Rev. Lett., 55: 1711. Nakano, Y., Deguchi, Y., Ikeda, K., Luo, Y. and Tada, K. (1991). Reduction of excess intensity noise by external reflection in a gain-coupled distributed feedback semiconductor laser, l E E E J . Qucintum Electron., 2 1 1732. Nakano, Y., Uchida, Y., and Tada, K. (1992). Highly efficient single longitudinal-mode oscillation capability of gain-coupled distributed feedback semiconductor lasers -advantage of asymmetric facet coating, IEEE Photon. Technol. Lett., 4 308. Ogasawara, N. and Ito, R. (1988). Longitudinal mode competition and asymmetric gain saturation in semiconductor injection lasers 11. Theory, Jupcan J . Appl. Phys., 27: 615. Olesen, H., Osmundsen, J. H., and Tromborg, B. (1986). Nonlinear dynamics and spectral behavior for an external cavity laser, IEEE J . Quantum Electron., QE-22: 762. Osmundsen, J. H., Tromborg, B., and Olesen, H. (1983). Experimental investigation of stability properties for a semiconductor laser with optical feedback, Electonics Lett., 19: 1068. Park, J. D., Seo, D. S., Mclnerney, J. G., Dente, G. C., and Osinski, M., (1989). Low frequency
EXTERNAL OPTICAL FEEDBACK EFFECTS I N LASERS
119
self-pulsations in asymmetric external-cavity semiconductor lasers due to multiple feedback effects, Opt. Lett., 14 1054. Park, J. D., Seo, D. S., and McInerney, J. G. (1990). Self-pulsations in strongly-coupled asymmetric external-cavity semiconductor lasers, IEEE J . Quantum Electron., 26: 1353. Park, K. H., Lee, J. K., Han, J. H., Cho, H. S., Jang, D. H., Park, C. S., Pyun, K. E., and Jeong, J. (1998). Effects of external optical feedback on the power penalty of DFB-LD modules for 2.5 Gb/s optical transmission systems, Optical arid Quuntuni Electron., 30: 23. Patzak, E., Olesen, H., Sagimura, A,, Saito, S., and Mukai, T. (1983). Spectral linewidth reduction in semiconductor lasers by an external cavity with weak optical feedback, Electron. . h / t . . 19: 938. Petermann, K. (1988). Laser Diode Modttltrrion und Noise. Dordrecht: Kluwer Academic. Phillips, M. R., Darcie, T. E., and Flynn, E. J. (1992). Experimental measure of dynamic spatial-hole burning in DFB lasers. IEEE Photon, Tec/7nOl. Lett., 4: 1201. Risch, Ch. and Vouniard, C. (1977). Self-pulsation in the output intensity and spectrum of GaAs-AIGaAs C W diode lasers coupled to a frequency selective external optical cavity, J . Appl. Phys., 4 8 2083. Ritter, A. and Haug, H. (1993). Theory of bistable limit cycle behavior of laser diodes induced by weak optical feedback, IEEE J. Qiiantum Electron., 29: 1064. Sacher, J., Elsaesscr, W., and Goebel, E. 0. (1989). Interniittcncy i n the coherence collapse of a semiconductor laser with external feedback. Phys. Rei,. Lrrr., 63: 2224. Salathe, R. P. (1979). Diode lasers coupled to external resonators, Appl. Phys., 20: 1. Schunk. N. and Petermann, K. (1988). Numerical analysis of the feedback regimes for a single-mode semiconductor laser with external feedback, IEEE J . Quuntum Electron., QE-24: 1242. Schunk, N. and Petermann, K. (1989a). Measured feedback-induced intensity noise for 1.3 pm DFB laser diodes, Electron. Lett., 25: 63. Schunk, N. and Petermann. K. 
(1989b). Stability analysis for laser diodes with short external cavities, IEEE Photon. Teclinol. Lett., I: 49. Seo, D. S., Park, J. D., Mclnerney, J. G., and Osinski, M . ( 1 9 8 8 ) . Effects of feedback asymmetry in external-cavity semiconductor laser systems, Eleciron. Lett., 2 4 726. Sigg, J. (1993). ENects of optical feedback on the light-current characteristics of semiconductor lasers, IEEE J . Quuntum Electron., 29: 1262. Simonsen, H. (1993). Frequency noise reduction of visible InGaAsP laser diodes by different optical feedback niethods, IEEE J . Quantum Electron.. 29: 877. Spano, P., Piazzolla, S., and Tamburrini, M. (1984). Theory of noise in semiconductor lasers in the presence of optical feedback, IEEE J . Quuntum Electron., QE-20: 350. Suematsu, Y. and Furuya, K. (1977). Theoretical spontaneous emission factor of injection lasers, Trans. I.E.C.E. Japan, E60: 467. Suhara, M., Islam. S., and Yamada, M. (1994). Criicrion of external feedback sensitivity in index-coupled and gain-coupled DFB semiconductor lasers to be free from excess intensity noise, IEEE J. Quantum Electron., 30: 3. Suris, R. A. and Tager, A. A. (1985). Influence of the carrier-density dependence of the refractive index on the emission spectrum of an injection laser, Sou. Phys. Semicond.. 19: 266. Szoke, A., Daneu, V., Goldhar, J., and Kurnit, N. A. (1969). Bistable optical element and its applications. Appl. Phys. Lett., 15: 376. Tager, A. A. and Elenkrig, B. 8.(1993). Stability regimes and high-frequency modulation of laser diodes with short external cavity, IEEE J. Quuntum Electron.. 29: 2886. Temkin, H., Olsson, N. A,, Abeles, J. H., Logan, R. A,, and Panish, M. B. (1986). Reflection noise in index-guided InGaAsP lasers, I E E E J . Quaniurn Elrmolt., QE-22 286. Thompson, G. H. B., Lovelace. D. F., and Turlcy, S. E. H . (1978). Kinks in the lightjcuirent
120
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
characteristics and near field shifts in (GaA1)As heterostructure stripe lasers and their explanation by the effect of self-focusing on a built-in optical waveguide, IEEE J . Solid State and Electron Devices, 2 12. Tkach, R. W. and Chraplyvy, A. R. (1986). Regimes of feedback effects in 1.5-pm distributed feedback lasers, J. Lightwave Technol., LT-4: 1655. Tomasi, F., Cerboneschi, E., and Arimondo, E. (1994). Asymmetric pulse shape in the LFF instabilities of a semiconductor laser with optical feedback, IEEE J . Quantum Electron., 30: 2217. Tromborg, B., Osmundsen, J. H., and Olesen, H. (1984). Stability analysis for a semiconductor laser in an external cavity, I E E E J . Quantum Electron., QE-20: 1023. Tsang, W. T., Choa, F. S., Wu, M. C., Chen, Y. K., Logan, R. A,, Chu, S. N. G . , and Sergent, A. M. (1992a). Semiconductor distributed feedback lasers with quantum well or superlattice gratings for index or gain-coupled optical feedback, Appl. Phys. Lett., 60: 2580. Tsang, W. T., Choa, F. S., Wu, M. C., Chen, Y. K., Logan, R. A,, Sergent, A. M., and Burrus, C. A. (1992b). Long-wavelength InGaAsP/InP distributed feedback lasers incorporating gain-coupled mechanism, IEEE Photon. Technol. Lett., 4: 212. Twu, Y., Parayanthal, P., Dean, B. A., and Hartman, R. L. (1992). Studies of reflection effects on device characteristics and system performances of 1.5 pm semiconductor DFB lasers, .I. Lighrwave Technol., 10: 1267. Voumard, C., Salathe, R., and Weber, H. (1977). Resonance amplifier model describing diode lasers coupled to short external resonators, Appl. Phys., 12: 369. Wang, C. Y., Chuang, Z. M., Liao, H. H., Lin, W., Tu, Y. K., and Lee, C. T. (1997). Resistance to external optical feedback of low-chirp strained-quantum-well complex-coupled distributed-feedback laser, Japanese J . Appl. Phys., Part-I 36: 2685. Wang, J. and Petermann, K. (1991). Noise analysis of semiconductor lasers within the coherence collapse regime, IEEE J . Quantum Electron., 2 7 3. 
Whiteaway, J. E. A., Thompson, G. H. B., Collar, A. J., and Armistead, C. J. (1989). The design and assessment of 4 4 phase-shifted DFB laser structures, I E E E J . Quantum Electron.. 25: 1261. Woodward, S. L., Koch, T. L., and Koren, U. (1990). The onset of coherence collapse in DBR lasers, IEEE Photon. Technol. Lett., 2: 391. Wu, H. and Chang, H. (1992). Turn-on jitter in semiconductor lasers with moderate reflecting feedback, IEEE Photon. Technol. Lett., 4 339. Wu, H. and Chang, H. (1993a). Analysis of external optical feedback on distributed-feedback semiconductor lasers above threshold, IEEE Photon. Technol. Lett., 5: 1168. Wu, H. and Chang, H. (1993b). Mode partition in semiconductor lasers with optical feedback, IEEE J . Quantum Electron., 2 9 2154. Wyatt, R. and Devlin, W. J. (1983). 10 kHz linewidth 1.5-pm InGaAsP external cavity laser with 55 nm tuning range, Electron. Lett., 19 110. Yamada, M. (1983). Transverse and longitudinal mode control in semiconductor injection lasers, IEEE J . Quantum Electron., QE-19: 1365. Yamada, M. (1986). Theory of mode competition noise in semiconductor injection lasers, IEEE J . Quantum Electron., QE-22: 1052. Yamada, M. (1989). Theoretical analysis of nonlinear optical phenomena taking into account the beating vibration of the electron density in semiconductor laser, J . Appl. Phys., 6 6 81. Yamada, M. and Suhara, M. (1990). Analysis of excess noise induced by optical feedback in semiconductor lasers based on mode competition theory, Trans. I.E.I.C.E. Japun, E73: 77. Zhang, L. M. and Carroll, J. E. (1993). Enhanced AM and FM response of complex coupled DFB lasers, IEEE Photon. Technol. Lett., 5: 506.
Atomic Scale Strain and Composition Evaluation from High-Resolution Transmission Electron Microscopy Images

A. ROSENAUER and D. GERTHSEN
Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
I. Introduction
II. Strain-State Analysis
   A. The Measurement of Displacements and Lattice Spacings on an Atomic Scale
   B. Determination of the Sample Thickness
   C. Consideration of the Elastic Relaxation
III. Composition Evaluation by Lattice Fringe Analysis
   A. The Basic Idea behind Composition Evaluation by Lattice Fringe Analysis Method
   B. The Fringe Images
   C. Theoretical Considerations
   D. Determination of Sample Thickness and Phase x
   E. The Evaluation Procedure
   F. Correction of Imaging Conditions Varying Across the Image
   G. Errors of the Composition Detection Due to Sample Thickness Uncertainties
IV. Applications
   A. Strain-State Analysis of Zn_xCd_{1-x}Se/ZnSe Heterostructures
   B. In_xGa_{1-x}As/GaAs Stranski-Krastanow Islands
   C. Strain State Analysis of an Array of Misfit Dislocations
   D. Composition Evaluation by Lattice Fringe Analysis Method Evaluation of a CdSe/ZnSe(001) Heterostructure
V. Summary and Discussion of the Atomic Scale Analysis Methods
Appendix A: List of Variables
I. INTRODUCTION

Although transmission electron microscopes are now so highly developed that high-resolution images of almost all materials can be obtained, the extraction of quantitative information requires considerable additional effort. Digital cameras, in particular charge-coupled device (CCD) cameras with a pixel resolution of at least 1024 × 1024, are an important prerequisite for processing high-resolution transmission electron microscopy (HRTEM) images without distortions and with a linear contrast transfer between electron intensity and gray levels. As far as software
is concerned, well-developed high-resolution image simulation packages are available (Stadelmann, 1987; McTempas; the NCEM Simulation System) that help in understanding how the imaging conditions influence image appearance. General-purpose image processing program libraries such as SEMPER (Saxton et al., 1979) contain a large variety of image processing functions. However, software specialized for HRTEM image evaluation is generally not commercially available. The present article focuses on the description of the program package DALI (digital analysis of lattice images), which was developed to quantify HRTEM image information. These programs are applied to semiconductor heteroepitaxial layers, where the strain state and the composition on an atomic scale are of particular interest. However, application to other materials can well be envisaged.

Highly perfected crystal growth techniques, for example, molecular beam epitaxy (MBE) and the different variants of chemical vapor deposition (CVD), allow the growth of epitaxial layers with monolayer control. However, a basic understanding of the growth processes is still lacking, particularly for the three-dimensional (3-D) growth modes where isolated islands are nucleated on a continuous wetting layer covering the substrate (Stranski-Krastanow growth mode (Stranski and Krastanow, 1939)) or directly on the substrate (Volmer-Weber growth mode (Volmer-Weber, 1974)). In this context, segregation and interdiffusion effects must be investigated with atomic-scale spatial resolution. The 3-D growth modes are used to obtain self-organized nanostructures ("quantum dots") whose optical and electronic properties are intensively studied by many groups; a correlation with structural and compositional properties is also required.
One possible approach to determining strain and composition on an atomic scale is based on the measurement of local lattice parameters, that is, of the distance between adjacent atomic columns. This simply requires the detection of the positions of the intensity maxima in a high-resolution image, which can be considered a fingerprint of the local lattice parameter. It is not necessary to know the actual position of the atomic columns with respect to the intensity maxima, provided that the TEM specimen thickness does not change significantly and that composition-insensitive imaging conditions are chosen so as to avoid chemical shifts of the contrast pattern. The local composition can be extracted directly if the relationship between composition and lattice parameter is known. For many compound semiconductors, for example, In_xGa_{1-x}As and Cd_xZn_{1-x}Se, Vegard's law (Eq. (1)) can be applied, in which the lattice parameter and the composition are linearly related:

$$a_{A_xB_{1-x}C} = a_{BC} + x\,(a_{AC} - a_{BC}). \tag{1}$$
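As a minimal sketch of how Eq. (1) is used in both directions, the following lines compute the lattice parameter of a relaxed ternary alloy and invert the relation to recover the composition from a measured lattice parameter. The function names and the lattice parameters of GaAs and InAs (commonly quoted room-temperature values) are assumptions of this sketch and are not taken from the chapter; the lattice distortion in a strained layer is not considered here.

```python
# Illustrative lattice parameters in nm (commonly quoted room-temperature
# values; assumptions of this sketch, not values given in the chapter).
A_GAAS = 0.56533
A_INAS = 0.60583


def vegard_lattice_parameter(x, a_ac, a_bc):
    """Lattice parameter of relaxed A_x B_(1-x) C according to Eq. (1)."""
    return a_bc + x * (a_ac - a_bc)


def vegard_composition(a, a_ac, a_bc):
    """Invert Eq. (1): composition x from a (relaxed) lattice parameter a."""
    return (a - a_bc) / (a_ac - a_bc)


# Example: nominal In content of 60 % in In_x Ga_(1-x) As.
a_60 = vegard_lattice_parameter(0.6, A_INAS, A_GAAS)
x_back = vegard_composition(a_60, A_INAS, A_GAAS)
print(f"a = {a_60:.5f} nm, recovered x = {x_back:.2f}")
```

The inversion only applies to the unstrained lattice parameter, which is why the distortion of the unit cells has to be accounted for before a measured spacing is converted into a composition.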
If a lattice mismatch $f = (a_f - a_s)/a_s$ exists between the lattice parameters of the substrate, $a_s$, and the epilayer, $a_f$, the distortion of the unit cells in the epilayer must be taken into account. The tetragonal distortion can be calculated easily for coherently strained two-dimensional (2-D) layers below the critical thickness for plastic relaxation by misfit dislocations (see Hull and Bean, 1992). Local lattice parameters were measured previously by Bierwolf et al. (1993) and Jouneau et al. (1994) to investigate the strain distribution in thin epitaxial layers. Robertson et al. (1995) used Fourier-filtered HRTEM images to measure the distance and lattice fringe deviations.

The situation becomes more difficult for 3-D growth modes. Deviations from the tetragonal distortion occur close to the surface because of the elastic relaxation of the strained lattice. The finite element method (FEM) was first applied by Christiansen et al. (1994) to compute the strain distribution in nanoscale SiGe islands on Si(001) substrates. A complete relaxation of the misfit strain close to the top surface was obtained. This is the major driving force for island growth. An accurate knowledge of the strain distribution is therefore a necessary prerequisite for any composition evaluation that is based on the measurement of local lattice parameters in epitaxial islands.

Another question to be addressed is the elastic relaxation of strained structures due to the small HRTEM specimen thickness (typically 20 nm at most), which can significantly modify the tetragonal distortion, depending on the local specimen thickness and the dimensions of the strained structures. The specimen thickness must be measured accurately in the region of interest of the evaluated HRTEM image. A further important step in quantification is the calculation of the elastic relaxation as a function of TEM specimen thickness and layer morphology.
Analytical solutions to this problem exist for simple layer structures, as was shown by Treacy and Gibson (1986). For more complicated morphologies, FEM simulations can be applied (Tillmann et al., 1996).

Because the local lattice parameters are modified both by the local specimen thickness and by the elastic strain relaxation in the islands, a different and less elaborate approach to composition determination would be desirable. A solution for the InGaAs system, presented here, can be extended to other ternary compound semiconductors such as CdZnSe and AlGaAs. The approach is based on the evaluation of Fourier amplitudes of lattice fringe images (CELFA: composition evaluation by lattice fringe analysis), which is shown to depend sensitively on the indium concentration (Rosenauer et al., 1998). The enhancement of chemical contrast under off-axis imaging conditions was previously applied by Jia et al. (1993).

This chapter is organized in the following way: Chapter II presents details of strain-state analysis. The measurement of displacements and lattice
spacings on an atomic scale (Section II-A) includes noise reduction (Section II-A-1), detection of lattice sites and subdivision into image unit cells (Section II-A-2), calculation of lattice base vectors (Section II-A-3), and the calculation of local (Section II-A-4) and averaged (Section II-A-5) displacements and lattice spacings. The thickness measurement is outlined in Section II-B; the cell transformation is found in Section II-B-1 and the determination of relative and absolute thickness values in Sections II-B-2 and II-B-3, respectively. The elastic relaxation of the thin TEM specimen is considered in Section II-C, which is subdivided into the analytical solution for the thin- and thick-sample limits (Section II-C-1) and the finite element simulations (Section II-C-2) used to quantify composition.

Chapter III outlines the composition evaluation by lattice fringe analysis (CELFA) procedure. First, the basic idea behind CELFA is explained in Section III-A, followed by a discussion of the fringe images and the theoretical background of their formation in Sections III-B and III-C, respectively. The analysis of the contrast patterns of the images of a defocus series leads to the determination of the sample thickness (Section III-D), knowledge of which is necessary for the evaluation procedure explained in Section III-E. The approximate correction of the effect of imaging conditions or sample thickness varying across the image is shown in Section III-F. Finally, Section III-G provides an estimate of the errors in composition detection that are due to uncertainties in the sample thickness measurement.

Chapter IV is concerned with the presentation of selected evaluation examples. Section IV-A begins with the strain-state analysis of a variety of Cd_xZn_{1-x}Se/ZnSe heterostructures with different layer thicknesses and Cd contents, which helps explain the determination of local concentrations and of the overall amount of CdSe that was deposited.
This section also shows that the results of strain-state analyses are in good agreement with in situ reflection high-energy electron diffraction (RHEED) measurements (see Section IV-A-1). Section IV-A-2 deals with the determination of diffusion coefficients for the diffusion of Cd in ZnSe in the temperature range 330-400°C. The investigation of In_xGa_{1-x}As/GaAs Stranski-Krastanow islands is described in Section IV-B, which deals with free-standing islands (Section IV-B-1) and islands capped with 10-nm GaAs (Section IV-B-2). It is shown that all analysis methods, which include strain-state analysis, thickness measurement, finite element calculations, and conventional (002) dark-field imaging, combine with the CELFA method to yield a consistent and convincing picture of the morphology and compositional inhomogeneities of the specimen. Section IV-C demonstrates that strain-state analysis is also applicable to interfaces containing an array of misfit dislocations. Finally, Section IV-D provides a further example of the application of the CELFA method, which consists of an evaluation of a CdSe/ZnSe(001)
heterostructure. This section also contains a discussion of the effect of crystal tilt around an axis parallel to the interface plane on the evaluated concentration profile, which turns out to be negligible under certain conditions.

II. STRAIN-STATE ANALYSIS

A. The Measurement of Displacements and Lattice Spacings on an Atomic Scale
We describe here the measurement of local and averaged lattice parameters and displacements. Our method is similar to those suggested by Bierwolf et al. (1993), Brandt et al. (1992), Paciornik et al. (1993), Seitz et al. (1995), and Jouneau et al. (1994). It contains the following analysis steps:

1. Noise reduction;
2. Detection of lattice sites and gridding;
3. Calculation of lattice base vectors; and
4. Analysis of displacements and lattice spacings.
We will apply the individual analysis steps to the HRTEM micrograph depicted in Fig. 1, which shows an In_xGa_{1-x}As island on a GaAs(001) substrate. The cross-sectional image was taken along the [110] projection. The nominal In content was 60%, the nominal In_xGa_{1-x}As layer thickness was 1.5 nm, and the growth temperature during MBE was 500°C. This micrograph will also serve as an application example for the measurement of the local sample thickness and for finite element modeling. We will obtain a model of the local In content inside the island. This evaluation example was chosen specifically because the local In content in the "bulk" of the island cannot easily be determined with atomic-scale spatial resolution using other methods. The reason is that the lattice parameters (parallel as well as perpendicular to the interface plane) change inside the island because of the elastic relaxation of the island at its free surfaces. This circumstance excludes methods in which a lattice-parameter fluctuation may affect the In-content measurement (this is also the case for the lattice fringe analysis that will be outlined in Section III).

FIGURE 1. (110) HRTEM image of an In_xGa_{1-x}As/GaAs(001) Stranski-Krastanow island containing the grid that connects the local brightness maxima of the dumbbells. The marked area of interest (AOI, blue frame) is used for the determination of the In concentration inside the island. The reference area (green frame) is used for the calculation of the basis vectors of the reference lattice. (See also Plate 11.)

1. Noise Reduction
Images digitized with either an off-line or an on-line CCD camera attached directly to the microscope contain some noise. One source of noise is the thin amorphous layers on the top and bottom surfaces of the sample, which are formed during the ion-milling step of TEM sample preparation (see Schuhrke et al., 1992). Other possible sources are the grain of the photographic negative film emulsion and the electronic noise of the device used for digitizing. For noise reduction, we use a Wiener-filtering technique (Press et al., 1992) in which the noise level is estimated locally in the Fourier-transformed image $C$, which consists of the undisturbed signal $S$ and the noise part $N$:
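$$C = S + N. \tag{2}$$

Noise reduction is carried out by applying a filter $\Phi$ to the Fourier-transformed image $C$:

$$\tilde{C} = C \times \Phi. \tag{3}$$

$\tilde{C}$ is the Fourier transform of the noise-reduced image if

$$\Phi(g_x, g_y) = \begin{cases} \dfrac{|C(g_x, g_y)|^2 - |N(g_x, g_y)|^2}{|C(g_x, g_y)|^2} & \text{if } |C(g_x, g_y)|^2 > |N(g_x, g_y)|^2, \\ 0 & \text{else,} \end{cases} \tag{4}$$

where $g_x$ and $g_y$ are the spatial frequencies. It is appropriate to note that the filter $\Phi$ calculated according to Eq. (4) is often called the “optimum” or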
“conventional” Wiener filter. Other choices (e.g., the parametric Wiener filter) are compared by Marks (1996). In Fourier-transformed HRTEM images of defect-free lattice structures, the information $|S|^2$ is predominantly localized around those spatial frequencies that correspond to lattice spacings in real space, whereas the noise part $|N|^2$ has a low, slowly varying intensity. Therefore, the noise part $|N|^2$ is estimated using the following procedure. The power spectrum $|C|^2$ is divided into equally sized areas $A_n$ (Fig. 2). The extension of each area must exceed that of the Bragg spots contained in the Fourier-transformed image. Furthermore, each area is subdivided into blocks $B_m$. For each block, the intensity $I_{B_m}$ is calculated as the maximum of all pixel intensities in $B_m$. For each area $A_n$, the values $I_{B_m}$ of the blocks in $A_n$ are averaged using weighting factors $w_{B_m}$, which yields the area value $I_{A_n}$.
The weighting factors $w_{B_m}$ ensure that the intense Bragg peaks contribute much less ($w_{B_m} \approx 1$) to the estimate of the noise part than blocks with low block values ($w_{B_m} \approx \max_{B_m}(I_{B_m})$) and that a smooth variation of the noise part inside the area is taken into account. As a result, the values $I_{A_n}$ form a map of the noise part $|N|^2$. The noise part for each pixel is calculated by bilinear interpolation with respect to $I_{A_n}$. The example in Fig. 3 demonstrates the efficiency of the described procedure (Rosenauer et al., 1996). Figure 3a shows a part of the power spectrum $|C|^2$ and, in the insert, a small part of the lattice image. The power spectrum after noise reduction, $|\tilde{C}|^2$, depicted in Fig. 3b, also contains a small part of the lattice image that results from the inverse Fourier transform of $\tilde{C}$.

FIGURE 2. Schematic drawing showing the decomposition of the Fourier-filtered image into areas and blocks. The smallest detectable unit is given by a pixel (picture element).
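The noise estimate and the filter of Eq. (4) can be illustrated with a short sketch that works directly on a small power spectrum. It is not the DALI implementation. The weights used in the average are an assumption, $w_B = \max(I_B)/I_B$, chosen only because they give the brightest block a weight of 1 and faint blocks a large weight. The function names and the toy spectrum are hypothetical, only a single area is treated, and the bilinear interpolation between areas is omitted.

```python
def block_maxima(power, block):
    """Block values I_B: maximum pixel intensity in each block x block tile."""
    rows, cols = len(power), len(power[0])
    return [
        [
            max(
                power[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            for c0 in range(0, cols, block)
        ]
        for r0 in range(0, rows, block)
    ]


def area_noise(block_values, eps=1e-12):
    """Weighted average of the block values of one area.

    Assumed weights: w_B = max(I_B) / I_B, so a Bragg peak gets w = 1 and
    faint blocks get large weights (hypothetical choice for this sketch).
    """
    peak = max(block_values)
    if peak <= 0.0:
        return 0.0
    weights = [peak / max(v, eps) for v in block_values]
    return sum(w * v for w, v in zip(weights, block_values)) / sum(weights)


def wiener_factor(power_c, power_n):
    """Filter value of Eq. (4): (|C|^2 - |N|^2) / |C|^2, or 0 if noise dominates."""
    if power_c > power_n and power_c > 0.0:
        return (power_c - power_n) / power_c
    return 0.0


# Toy power spectrum of one area: a Bragg spot on a weak noise floor.
power = [
    [1.0, 0.8, 1.1, 0.9],
    [0.9, 1.2, 1.0, 0.7],
    [1.1, 0.9, 90.0, 100.0],
    [0.8, 1.0, 95.0, 85.0],
]
blocks = block_maxima(power, block=2)
noise = area_noise([v for row in blocks for v in row])
filt = [[wiener_factor(p, noise) for p in row] for row in power]
print(f"estimated noise level: {noise:.3f}")
```

With these assumed weights the average reduces to a harmonic mean, so the single bright block barely raises the noise estimate, the pixels of the noise floor are suppressed, and the Bragg spot passes almost unchanged.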
2. Detection of Lattice Sites and Gridding
The contrast of HRTEM images is the result of dynamical diffraction in the crystalline specimen, which depends on the sample thickness, the microscope parameters (defocusing distance, electron energy spread, beam convergence angle), and nonlinear image formation. As a consequence, the exact determination of atom positions usually requires the comparison of experimental with simulated images. We use the positions of the intensity maxima to obtain a lattice that represents the dimensions of the projected unit cells. The positions of the intensity maxima may correspond to the locations of the columns of atoms, to the tunnel sites, or sometimes to neither of the two. However, our approach does not rely on knowledge of the positions of the atomic columns with respect to the intensity maxima. It is based only on the assumption of a constant spatial relationship between the intensity maxima positions and the columns of atoms. This requirement is often fulfilled in small investigated areas with an insignificant change in specimen thickness. The formation of the 2-D grid is performed in five steps:

1. Finding the positions $M^{(1)}$ of pixels corresponding to local brightness maxima (Fig. 4a).
2. Fitting parabolas to the intensity profiles along two lines $L_x$ and $L_y$ across $M^{(1)}$ that run along the x- and y-directions, yielding a more accurate position $M^{(2)}$ (Fig. 4b).
3. Fitting parabolas to the intensity profiles along four lines $L_1$ to $L_4$ (Fig. 4c) across $M^{(2)}$. Averaging the positions of the parabola maxima results in the final position $M^{(3)}$.
4. Formation of grid lines by connecting positions along each of two selected directions (Fig. 4d, e).
5. Continuously numbering the grid lines, separately for each of the two sets of lines (Fig. 4d, e), yielding the 2-D grid where a pair of indices is assigned to each position.
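Steps 1 and 2 of the list above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the maxima are thinned greedily (brightest first) with a minimum distance, the parabola is a least-squares fit over five pixels, and the function names, the window size, and the synthetic test spot are all hypothetical. Step 3 (fits along four lines) and the gridding are not included.

```python
import math


def local_maxima(img, min_dist, margin=2):
    """Step 1: pixels brighter than their 8 neighbours, thinned by min_dist.

    When two maxima are closer than min_dist, the fainter one is dropped.
    """
    rows, cols = len(img), len(img[0])
    found = []
    for r in range(margin, rows - margin):
        for c in range(margin, cols - margin):
            neigh = [img[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            if img[r][c] > max(neigh):
                found.append((img[r][c], r, c))
    found.sort(reverse=True)  # brightest first
    kept = []
    for _, r, c in found:
        if all((r - kr) ** 2 + (c - kc) ** 2 >= min_dist ** 2
               for kr, kc in kept):
            kept.append((r, c))
    return kept


def parabola_vertex(profile):
    """Least-squares parabola through an odd-length profile.

    Returns the vertex offset in pixels relative to the central sample.
    """
    k = len(profile) // 2
    xs = range(-k, k + 1)
    n = len(profile)
    s2 = sum(x * x for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(profile)
    sxy = sum(x * y for x, y in zip(xs, profile))
    sx2y = sum(x * x * y for x, y in zip(xs, profile))
    b = sxy / s2
    a = (n * sx2y - s2 * sy) / (n * s4 - s2 * s2)
    return -b / (2.0 * a) if a < 0 else 0.0  # only a maximum is accepted


def refine(img, r, c, k=2):
    """Step 2: sub-pixel position from parabolas along the x- and y-directions."""
    row_profile = [img[r][c + d] for d in range(-k, k + 1)]
    col_profile = [img[r + d][c] for d in range(-k, k + 1)]
    return r + parabola_vertex(col_profile), c + parabola_vertex(row_profile)


# Synthetic bright spot centred at row 5.3, column 6.7 (invented test data).
img = [[math.exp(-((r - 5.3) ** 2 + (c - 6.7) ** 2) / 8.0) for c in range(14)]
       for r in range(12)]
peaks = local_maxima(img, min_dist=3)
refined = [refine(img, r, c) for r, c in peaks]
print(peaks, refined)
```

Because the fit uses several intensity values on both sides of the brightest pixel, the refined position is less sensitive to noise in any single pixel than the pixel position found in the first step.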
FIGURE 3. Three-dimensional plots of the intensities of the Fourier-transformed images of (a) the original image, and (b) the image after Wiener noise reduction. The inserts show the corresponding lattice images.

FIGURE 4. Schematic drawing showing the procedure of the lattice site determination. In the first step (a), the brightness maxima positions $M^{(1)}$, marked by crosses, have to be found. Intensity profiles along the straight lines $L_x$ and $L_y$ are fitted by parabolas. One of these is shown in (b) as an example. Their vertices $(M_x^{(2)}, M_y^{(2)})$ give the position with improved accuracy. The same procedure is applied to intensity profiles along $L_{1,2,3,4}$ (c) for further accuracy enhancement, yielding the final position $M^{(3)}$. The detected positions form a 2-D grid whose grid lines are numbered continuously with respect to their points of intersection (marked with rectangles for the set of horizontal grid lines and with squares for the vertical ones) with two lines x and y (d). The dot marks the chosen point of intersection of the lines x and y; (e) shows the numbering for grid lines where each of them connects positions that belong to the same {111} plane; (f) shows the resulting indexing for the gridding in the case (d).
In the first step, care has to be taken that local brightness maxima between the “main” maxima positions are excluded. An enhanced intensity between the “main” bright spots of the HRTEM image is often observed under imaging conditions close to the “half spacing” contrast, where the tunnel positions as well as the positions of the rows of atoms show similar brightness. To avoid these “artificial” positions, a minimum distance between adjacent positions is defined. If a new maximum position is found during the search procedure, its distance to all positions found previously is checked. If a distance is smaller than the predefined minimum distance, the position with the lower intensity is deleted.

The detection of pixels with a local intensity maximum performed in the first step is inaccurate because of residual noise and the finite pixel extension. Because an intensity profile across a spot is nearly sinusoidal, it can be fitted by a parabolic curve in a region close to the estimated intensity maximum. The maximum of the parabolic curve that is fitted in the second step yields a more accurate estimate of the peak position, because the position determination does not rely on the maximum intensity alone (which can be affected severely by noise). Instead, many intensity values around the true maximum position are used together with an approximate functional relationship. A further gain in accuracy is reached in the third step, where parabolas are fitted along four lines, as shown in Fig. 4c. The maxima positions of the parabolas are averaged, leading to the final position $M^{(3)}$. For each position, the standard deviation $\sigma = (\sigma_x, \sigma_y)$ of the maxima positions of the four parabolas is stored. A typical value for $|\sigma|$ is 0.2 pixels.

The generation of grid lines is performed in the fourth step. The procedure starts with the selection of two directions $d_1$ and $d_2$ along which the grid lines are intended to run.
For each of the detected positions P, its next neighbor $N^1$ ($N^2$) lying in the positive direction $d_1$ ($d_2$) is searched (Fig. 5). The next neighbor position in direction $-d_1$ ($-d_2$) is $A^1$ ($A^2$). In this way, strings of neighboring positions are formed that represent two sets of grid lines, one set (1) with lines along $d_1$ and the other (2) with those along $d_2$. In the fifth step, a point of the image is chosen that will be used as the intersection point (marked with a dot in Fig. 4d,e) of two axes x and y, with the x-axis in the horizontal and the y-axis in the vertical direction of the image. The grid lines of set 1 (2) intersect the x-axis at an angle of $\alpha_x^1$ ($\alpha_x^2$) and the y-axis at $\alpha_y^1$ ($\alpha_y^2$). If $|\alpha_x^1 - \pi/2| < |\alpha_y^1 - \pi/2|$, then the x-axis is used for the indexing of the grid lines of set 1, else the y-axis. The indexing is performed in such a way that the indices correspond to the order of the intersection points of the grid lines with the appropriate axis x or y. An analogous procedure is performed with the grid lines of set 2. The result is shown for two different choices of $d_1$ and $d_2$: in Fig. 4d for the [110] and the [001] directions and in Fig. 4e for the two (111) directions. In this way, we obtain
A. ROSENAUER AND D. GERTHSEN
FIGURE 5. Schematic drawings that show the positions $N^1$, $N^2$, and P as well as the positions $A^1$ and $A^2$. The distance arrows point from a position to its neighbor position. Therefore, only the $N^1$ and $N^2$ positions are considered to be neighbors of P, whereas P is a neighbor of both $A^1$ and $A^2$. The parameters $\gamma_{1,2,3,4}$ regulate the contribution of the individual distance arrows to the calculation of local distances: (a) corresponds to grid lines shown in Fig. 4e whereas (b) corresponds to Fig. 4d.
a 2-D grid where each lattice point P is characterized by two indices i and j. Therefore, a lattice point P will be denoted by $P_{i,j}$. An example shown in Fig. 4f was obtained with the indexing shown in Fig. 4d. Note that there may be some positions such as i,j = 1,2 that do not belong to an existing lattice position $P_{i,j}$. The gridding of the evaluation example is also shown in Fig. 1, where the directions $d_1 = [110]$ and $d_2 = [001]$ were chosen.

3. Calculation of Lattice Base Vectors
In this section we describe the generation of lattice base vectors. They will be used in the next section for the generation of a reference lattice that facilitates the calculation of local displacement vectors. The lattice base vectors should be deduced in a reference lattice region without deviations from the perfect crystal structure, which is located far away from any lattice defects. The region indicated by the green rectangle in Fig. 1 is chosen as a reference area. The lattice positions inside the reference area are used to calculate two lattice base vectors $a_1$ and $a_2$, which correspond to the directions $d_1$ and $d_2$.
For each grid line, all those positions that lie within the reference region are used to fit a straight line; this results in two sets of straight lines. The directions $\hat{a}_1$ and $\hat{a}_2$ of the lattice base vectors are calculated by averaging the gradients of the fitted straight lines. The positions of each grid line found inside the reference area are projected onto the directions $\hat{a}_1$ and $\hat{a}_2$. The distances between neighboring projected points are averaged for each of the sets 1 and 2, which yields the lengths of the base vectors $a_1$ and $a_2$, respectively. It is appropriate to note that the lattice base vectors obtained in the described way should not be understood as lattice translation vectors. According to Fig. 4d, the lattice base vector parallel to the x-direction points from a lattice position to a (virtual) position (Fig. 4f). In this case, the grid consists of two sublattices that are marked by dark and white grid lines in Fig. 4f.

4. Calculation of Local Displacements and Lattice Spacings

The lattice base vectors $a_1$ and $a_2$ obtained in the previous section are now used to calculate reference lattice positions. The positions
$$R_{i,j} = i\,a_1 + j\,a_2 - a_0 \qquad (6)$$
that form a reference lattice can be directly compared with the corresponding positions $P_{i,j}$. The vector $a_0$ in Eq. (6) results from the condition that the sum of deviations $R_{i,j} - P_{i,j}$ calculated inside the reference region (Ref. R.) vanishes:
$$\sum_{\mathrm{Ref.R.}} \left(R_{i,j} - P_{i,j}\right) = 0. \qquad (7)$$
The standard deviation of $R_{i,j} - P_{i,j}$ computed inside the reference region can be used to estimate the accuracy of the position determination. We denote this standard deviation, evaluated as a sum over the reference region (Ref. R.), by $\delta$ (Eq. (8)), where $N_{\mathrm{Ref.R.}}$ is the number of lattice positions inside the reference region. Typical values are of the order of 0.005 nm. In the case of our evaluation example, $\delta = 0.004$ nm is obtained. Assuming a crystal lattice parameter of 0.6 nm, we can estimate an accuracy of the local lattice-parameter determination of 0.5% to 2%. The next step consists in the definition of local displacement vectors
$$u_{i,j} = P_{i,j} - R_{i,j}. \qquad (9)$$
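The construction of the reference lattice and of the displacement vectors (Eqs. (6), (7), and (9)) can be illustrated by a short sketch. The array layout, the function names, and the use of a root-mean-square deviation as the scatter measure are assumptions made for this example; the exact form of the accuracy estimate used in DALI is not reproduced here.

```python
import numpy as np

def reference_offset(idx, pos, a1, a2, in_ref):
    """Vector a0 for which the deviations R - P sum to zero inside the
    reference region, with R = i*a1 + j*a2 - a0 (Eqs. (6) and (7)).

    idx    : (n, 2) integer grid indices (i, j)
    pos    : (n, 2) measured positions P
    in_ref : (n,) boolean mask of the reference region
    """
    ideal = idx[:, :1] * a1 + idx[:, 1:] * a2
    return (ideal[in_ref] - pos[in_ref]).mean(axis=0)

def displacements(idx, pos, a1, a2, a0):
    """Local displacement vectors u = P - R (Eq. (9))."""
    ref = idx[:, :1] * a1 + idx[:, 1:] * a2 - a0
    return pos - ref

def position_scatter(u, in_ref):
    """RMS length of u inside the reference region (assumed measure)."""
    return float(np.sqrt((u[in_ref] ** 2).sum(axis=1).mean()))

# Synthetic 4 x 4 grid: rows i < 2 form the undistorted reference region,
# the remaining rows are shifted by 0.5 pixel along x.
i, j = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
idx = np.column_stack([i.ravel(), j.ravel()])
a1, a2 = np.array([10.0, 0.0]), np.array([0.0, 8.0])
pos = idx[:, :1] * a1 + idx[:, 1:] * a2 + np.array([3.0, 4.0])
in_ref = idx[:, 0] < 2
pos[~in_ref] += np.array([0.5, 0.0])

a0 = reference_offset(idx, pos, a1, a2, in_ref)
u = displacements(idx, pos, a1, a2, a0)
```

In this synthetic case the displacements vanish inside the reference region and equal the imposed shift outside of it.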
For most purposes the displacement vectors $u_{i,j}$ have to be projected onto a certain lattice direction parallel to the selected direction Q given by
$$Q = k\,a_1 + l\,a_2, \qquad (10)$$
where the values k and l are small integer numbers that determine the direction onto which the distance vectors will be projected. Using Eq. (10), we obtain the projected displacements
$u_{i,j}$ (Eq. (11)), which is consistent with the definition used in the DALI program package. Here $N_{SL}$ is the number of sublattices of the 2-D grid; it is 2 in the case of Fig. 4f. The choice of the grid line directions shown in Fig. 4e would lead to $N_{SL} = 1$. Local lattice distances $A_{i,j}$ between next neighbor positions are defined by Eq. (12), in which the four distance vectors are summed with the weights $\gamma_m$ ($m = 1, \ldots, 4$), and where $N^{1,2}_{i,j}$ and $A^{1,2}_{i,j}$ have previously been defined in Section II-A-2. For clarification, $N^{1,2}_{i,j}$ and $A^{1,2}_{i,j}$ as well as the position $P_{i,j}$ are shown in Fig. 5a, b for the most common choices of $d_1$ and $d_2$. The values $\gamma_{1,2,3,4} \in \{-1, 0, 1\}$ regulate the contribution of the four distance vectors to $A_{i,j}$. Figure 6 depicts the local displacement vectors according to Eq. (9), which were evaluated from the right-hand side of the island shown in Fig. 1 with the DALI program package. Due to the larger lattice parameter in the island, the lengths of the displacement vectors grow from the interface to the top. In the vicinity of the island border the displacement vectors exhibit a component parallel to the interface plane, which is due to the elastic relaxation of the strained island. The local projected displacements according to Eq. (11) are displayed in Fig. 7 as color-coded maps. In Fig. 7a, the displacement vectors are projected onto the growth direction using k = -2, l = 0 in Eq. (10). It might be surprising that k = -2 is chosen instead of k = 1. First, a negative value for k is selected because the indices of the horizontal grid lines in Fig. 1 increase from the top to the bottom. Hence, the lattice base vector $a_1$ points towards the bottom. In order to achieve positive values for the projected displacement vectors, Q has to be chosen pointing in growth direction. Second, k = -2 instead of k = -1 is
[Figure 6 labels: magnified 2x; Experiment; FEM]
FIGURE 6. Part of the displacement vector fields evaluated from Fig. 1 (drawn in red) and obtained by finite element calculation (blue) as outlined in Section II-C-2. (See also Plate 12.)
chosen because the selection of Q occurs in the DALI program by two mouse clicks on two adjacent lattice positions, which have to be $P_{i,j}$ and $P_{i-2,j}$. As already discussed at the end of the previous section, the position $P_{i-1,j}$ does not exist. To take the indexing into account, the normalization factor $(|Q|/N_{SL})^{-1}$ was introduced in Eqs. (10) and (11); it halves the length of Q in the case of $N_{SL} = 2$. Figure 7b shows the projection of the displacement vectors onto the direction parallel to the interface plane, which was achieved by $Q = 2a_2$ in Eq. (10). Red regions indicate displacement vectors pointing to the right whereas blue regions correspond to a component to the left.

5. Calculation of Averaged Displacements and Lattice Spacings
In some cases it is appropriate to average the local displacements and lattice spacings. In the DALI program package, the scalar values $u_{i,j}$ and $A_{i,j}$ can be averaged either along the whole length or along only a part of the grid lines. The region in which the averaging is performed can be chosen by the selection of an “area of interest” (AOI). As an example, Fig. 8 shows averaged displacements as a function of the monolayer number along the growth direction that were obtained from the local displacements of Fig. 7a. The displacements were averaged along the horizontal grid lines inside an AOI, which is marked by a blue rectangle in Fig. 1. The vertical dashed line in Fig. 8 indicates the position of the surface beside the island. It is conspicuous in Fig. 8 that the displacements to the
FIGURE 7. Color-coded maps of the components of the displacement vector field (a) in growth direction and (b) in interface direction (a positive value indicates a displacement vector pointing to the right) deduced from Fig. 1. (See also Plate 13.)
right of the surface marker are not equal to zero, as might be expected, but show a weak slope. This effect can be explained by a strain inside the GaAs buffer that is caused by the biaxially strained island; this will be verified using finite element simulations in Section II-C-2. This observation clearly shows that a quantitative interpretation of the displacement vector field requires a known correlation between the strain field and the chemical and geometric structure of the investigated sample area. One way to find such a correlation is the application of finite element calculations based on a sample geometry derived from the HRTEM image.
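The averaging of projected displacements along grid lines inside an area of interest, as used for Fig. 8, can be sketched in a few lines. The function name and the flat array layout (one entry per lattice position) are assumptions of this example and do not describe the DALI data structures.

```python
import numpy as np

def average_per_grid_line(line_index, values, in_aoi):
    """Average a scalar quantity for every grid line inside an AOI.

    line_index : (n,) index of the horizontal grid line of each position
    values     : (n,) scalar values, e.g. projected displacements
    in_aoi     : (n,) boolean mask selecting the area of interest
    Returns the grid-line indices present in the AOI and their means.
    """
    lines = np.unique(line_index[in_aoi])
    means = np.array([values[in_aoi & (line_index == k)].mean()
                      for k in lines])
    return lines, means

line_index = np.array([0, 0, 1, 1, 1, 2])
values = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 10.0])
in_aoi = np.array([True, True, True, True, False, False])
lines, means = average_per_grid_line(line_index, values, in_aoi)
```

Plotting `means` against `lines` gives a profile of the kind shown in Fig. 8, with one value per monolayer.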
FIGURE 8. Components of the displacement vector field averaged for each monolayer in growth direction along the horizontal direction in the area of interest (AOI) of Fig. 1, marked with solid dots. The open squares are the result of the finite element method (FEM) simulations obtained for the FE-model with the best fit (see Section II-C-2). [Plot: horizontal axis, (002) plane number.]
Whereas the projected shape of the island is directly visible in the HRTEM image, the evaluation of the local sample thickness constitutes a more complicated problem, for which we will give a possible solution in the next section.

B. Determination of the Sample Thickness

This section deals with the determination of the local sample thickness, which is the basis for the finite-element modeling that will be described in the next section. Our approach is based on the quantitative analysis of information from transmission electron micrographs (QUANTITEM) procedure suggested by Schwander et al. (1993), Ourmazd et al. (1989), Ourmazd et al. (1990, 1993), and Kisielowski et al. (1995). The QUANTITEM procedure has recently been discussed with regard to composition evaluation in ternary mixed crystals like In$_x$Ga$_{1-x}$As (Maurice et al., 1997). QUANTITEM detects the projected crystal potential that is proportional
to the sample thickness. In Kisielowski et al. (1995) it is stated that the QUANTITEM analysis is valid to the extent that dynamical scattering in the investigated material can be described in terms of two Bloch waves. However, it is also shown in Kisielowski et al. (1995) that the procedure can be used for III-V semiconductors like GaAs or AlAs, where three Bloch waves are excited with substantial intensity. Our implementation of the QUANTITEM procedure as a part of the DALI program package uses the decomposition of the HRTEM micrograph into image unit cells that are given by the 2-D grid from Section II-A. The QUANTITEM method is based on the interpretation of each image unit cell as an image vector. Its dimension is given by the number of pixels included in the cell and should be equal for all cells of the image. The image unit cells from Section II-A may differ in their sizes and angles. Therefore, the first step in the thickness determination procedure (described in the following section) is a transformation of the image cells that provides square cells of identical size $2^n \times 2^n$ pixels (typically n = 5).
1. Cell Transformation
In this section we show an algorithm that transforms an irregularly shaped cell Z into the regularly shaped cell Z' (Fig. 9b). To keep all of the information contained in cell Z, the size of cell Z' significantly exceeds the size of cell Z. For a description of the transformation procedure we have to define the four vectors a, b, c, and d (a', b', c', and d') that point from the center of area M (M') of the cell Z (Z') to the four points at its corners, where M is given by $M = \int_Z (x, y)\,dx\,dy \,/ \int_Z dx\,dy$. The following procedure is applied to obtain the intensity of a pixel $p'_{n,m}$ inside the regularly shaped cell Z': For each pixel $p'_{n,m}$, its midpoint is described by a linear combination
$$p'_{n,m} = \epsilon\,v'_1 + \beta\,v'_2 \qquad (13)$$
of the two adjacent vectors $v'_1$ and $v'_2$ (e.g., $v'_1 = d'$ and $v'_2 = c'$ for the pixel $p'_{n,m}$ shown in Fig. 9b). The corresponding position p in the cell Z is given by
$$p = \epsilon\,v_1 + \beta\,v_2, \qquad (14)$$
which generally does not coincide with the midpoint of any pixel in Z. The vectors $v_1$ and $v_2$ are the two vectors in the cell Z that correspond to $v'_1$ and $v'_2$ in Z' (e.g., $v_1 = d$ and $v_2 = c$ in Fig. 9b). We find four pixels $p_{n,m}$, $p_{n,m+1}$, $p_{n+1,m}$, and $p_{n+1,m+1}$ in Z whose midpoints define a square containing p (Fig. 9c). The values of these four pixels are used to calculate the intensity of the pixel $p'_{n,m}$.
FIGURE 9. Schematic drawing that explains the transformation of the original image unit cells as determined by DALI into square lattice cells Z': (a) depicts a small part of the HRTEM image shown in Fig. 1. The detected lattice sites at the centers of the bright dots are connected by white lines. The black rectangle indicates the original cell Z; (b) illustrates the transformation of the cell Z into the square cell Z'. The point p inside the cell Z corresponds to the midpoint of the pixel $p'_{n,m}$ inside the cell Z'; (c) shows the system of coordinates used for the description of the transformation procedure. The crosses mark the midpoints of the pixels. The brightness of each pixel is described by its intensity value $I_{n,m}$.
For that purpose, four vectors $r_{n,m}$, $r_{n,m+1}$, $r_{n+1,m}$, and $r_{n+1,m+1}$ that point from p to the midpoints of the four pixels $p_{n,m}$, $p_{n,m+1}$, $p_{n+1,m}$, and $p_{n+1,m+1}$ inside the cell Z (Fig. 9c) are defined. Using a coordinate system where the distance between adjacent pixels is 1 (Fig. 9c), the intensity $I'_{n,m}$ of pixel $p'_{n,m}$ is calculated by
$$I'_{n,m} = \frac{1}{S} \sum_{i=n,n+1}\ \sum_{j=m,m+1} \left(1 - |r_{i,j}|^2\right) I_{i,j}. \qquad (15)$$
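The cell transformation can be sketched as follows. The sector-wise mapping follows Eqs. (13) and (14); for the final intensity lookup, however, the sketch uses standard bilinear interpolation between the four surrounding pixels as a stand-in for the weighting of Eq. (15). All function names are hypothetical, and coordinates are chosen such that integer values of (x, y) coincide with pixel midpoints of the source image.

```python
import numpy as np

def polygon_centroid(c):
    """Center of area of a simple polygon with ordered vertices c (k, 2)."""
    x, y = c[:, 0], c[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    area = cross.sum() / 2.0
    return np.array([((x + xn) * cross).sum(),
                     ((y + yn) * cross).sum()]) / (6.0 * area)

def bilinear(img, x, y):
    """Bilinear interpolation of img[row, col] at the point (x=col, y=row)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0]
            + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0]
            + fx * fy * img[y0 + 1, x0 + 1])

def transform_cell(img, corners, n_out):
    """Resample the quadrilateral cell Z onto an n_out x n_out cell Z'.

    corners : four (x, y) corner points of Z, ordered around the cell so
              that they correspond to the corners (0,0), (n,0), (n,n),
              (0,n) of Z'.
    """
    corners = np.asarray(corners, dtype=float)
    m = polygon_centroid(corners)
    v = corners - m                       # a, b, c, d
    square = np.array([[0, 0], [n_out, 0], [n_out, n_out], [0, n_out]],
                      dtype=float)
    m_sq = square.mean(axis=0)
    v_sq = square - m_sq                  # a', b', c', d'
    out = np.empty((n_out, n_out))
    for r in range(n_out):
        for c in range(n_out):
            p_sq = np.array([c + 0.5, r + 0.5]) - m_sq
            # Find the sector spanned by two adjacent corner vectors in
            # which the pixel midpoint has non-negative coefficients.
            for k in range(4):
                k2 = (k + 1) % 4
                basis = np.column_stack([v_sq[k], v_sq[k2]])
                eps, beta = np.linalg.solve(basis, p_sq)
                if eps >= -1e-12 and beta >= -1e-12:
                    break
            p = m + eps * v[k] + beta * v[k2]
            out[r, c] = bilinear(img, p[0], p[1])
    return out

# Linear intensity ramp sampled inside a parallelogram-shaped cell.
rows, cols = np.mgrid[0:20, 0:20]
img = cols + 2.0 * rows
cell = transform_cell(img, [(3, 4), (9, 5), (11, 10), (5, 9)], n_out=4)
```

For a parallelogram the sector-wise mapping reduces to a single affine map, and bilinear interpolation reproduces a linear ramp exactly, which makes this case convenient for checking an implementation.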
2. Determination of Relative Thickness Values

The intensity values $I'_{n,m}$ of the pixels $p'_{n,m}$ inside a cell Z' are used to define the vector
$$R = (I'_{1,1}, \ldots, I'_{1,N}, I'_{2,1}, \ldots, I'_{2,N}, \ldots, I'_{N,N}), \qquad (16)$$
with $N = 2^n$ and n being an integer number. Three template images $R^T_{1,2,3}$ are calculated by averaging the image vectors of the cells contained in three small regions. As shown in Kisielowski et al. (1995), the result of the QUANTITEM evaluation is nearly independent of the selection of the three (different) regions. The further steps of QUANTITEM are based on the assumption that each image vector R can be expressed as a linear combination of the template image vectors
$$R \approx p_1 R^T_1 + p_2 R^T_2 + p_3 R^T_3, \qquad (17)$$
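which defines a 3-D subspace in the $N^2$-dimensional image vector space. Furthermore, the tips of all image vectors R lie on a plane E in the 3-D image vector space (Fig. 10a). The vectors $\hat{B}_{1,2}$ given by
$$\hat{B}_1 := \frac{R^T_2 - R^T_1}{|R^T_2 - R^T_1|} \quad \text{and} \quad \hat{B}_2 := \frac{V - \hat{B}_1 (V \cdot \hat{B}_1)}{|V - \hat{B}_1 (V \cdot \hat{B}_1)|} \quad \text{with} \quad V = \frac{R^T_3 - R^T_1}{|R^T_3 - R^T_1|} \qquad (18)$$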
are an orthonormal basis of E (Fig. 10b). Due to the noise of the HRTEM image, the tips of the evaluated image vectors $R_{eval}$ may deviate slightly from the plane E. Therefore, we use $R_{eval}$ to define an in-plane vector $T^{\parallel}_{eval}$ and a vector $T^{\perp}_{eval}$ perpendicular to E (Fig. 10b).
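The construction of the basis of Eq. (18) and the decomposition of an image vector into in-plane and out-of-plane parts can be sketched as follows. The sketch assumes that the vector T is measured from the tip of the first template vector, that is, $T = R_{eval} - R^T_1$; this choice of origin, like the function names, is an assumption of the example.

```python
import numpy as np

def plane_basis(rt1, rt2, rt3):
    """Orthonormal basis (B1, B2) of the plane through the tips of the
    three template image vectors, built by Gram-Schmidt as in Eq. (18)."""
    b1 = (rt2 - rt1) / np.linalg.norm(rt2 - rt1)
    v = (rt3 - rt1) / np.linalg.norm(rt3 - rt1)
    w = v - b1 * np.dot(v, b1)
    b2 = w / np.linalg.norm(w)
    return b1, b2

def decompose(r_eval, rt1, b1, b2):
    """Split T = r_eval - rt1 (assumed origin) into the in-plane part
    x*B1 + y*B2 and the part perpendicular to the plane."""
    t = r_eval - rt1
    x, y = np.dot(t, b1), np.dot(t, b2)
    t_par = x * b1 + y * b2
    t_perp = t - t_par
    return x, y, t_par, t_perp

# Random 16-pixel "image vectors" serve as stand-ins for real templates.
rng = np.random.default_rng(0)
rt1, rt2, rt3 = rng.random((3, 16))
b1, b2 = plane_basis(rt1, rt2, rt3)

# A vector whose tip lies exactly in the plane has no perpendicular part.
r_in_plane = rt1 + 0.3 * (rt2 - rt1) + 0.2 * (rt3 - rt1)
x, y, t_par, t_perp = decompose(r_in_plane, rt1, b1, b2)
```

For noisy experimental cells the length of `t_perp` relative to `t_par` gives a simple indication of how far a cell lies from the plane.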
FIGURE 10. Visualization of the QUANTITEM procedure: (a) shows the plane E, which is defined by three template image vectors $R^T_{1,2,3}$. It contains the cloud of experimental data points represented by vectors (one of them is marked by $R_{eval}$) whose tips form an ellipse; (b) shows the decomposition of the vector T into the components $T^{\parallel}_{eval}$ parallel and $T^{\perp}_{eval}$ perpendicular to the plane E; $\hat{B}_1$ and $\hat{B}_2$ form an orthonormal basis of E; (c) illustrates the meaning of some variables used in Eqs. (20) to (22).
The value $\Delta_{eval}$, which is in the range $\Delta_{eval} \approx 0.01$ to $0.1$, is used to estimate the reliability of each value $T^{\parallel}_{eval}$. The data $T^{\parallel}_{eval,i} = x_i \hat{B}_1 + y_i \hat{B}_2$ (i = 1, 2, ... numbers the unit cells) obtained for each cell $Z_i$ are used to fit an ellipse given by $x(\varphi)\hat{B}_1 + y(\varphi)\hat{B}_2$, with
$$x(\varphi) = a\cos\varphi + x_0, \qquad y(\varphi) = b\sin(\varphi - \varphi_0) + y_0. \qquad (20)$$
The values a, b, $x_0$, $y_0$, and $\varphi_0$, which are explained in Fig. 10c, are obtained from a fit procedure. They are used to calculate the angle $\varphi_i$ corresponding to each cell $Z_i$ by
$$\varphi_i = \arctan\frac{a\sin\phi_i + b\sin\varphi_0\cos\phi_i}{b\cos\phi_i\cos\varphi_0}, \quad \text{where} \quad \phi_i = \arctan\frac{y_i - y_0}{x_i - x_0}. \qquad (21)$$
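Equation (21) follows from inserting Eq. (20) into $\tan\phi_i = (y_i - y_0)/(x_i - x_0)$ and solving for $\tan\varphi_i$. The sketch below evaluates it numerically and checks it by a round trip through Eq. (20). It uses the two-argument arctangent instead of the plain arctangent so that the angle is recovered over the full circle; this substitution, the function names, and the requirement a, b > 0 and $\cos\varphi_0 > 0$ are assumptions of the example.

```python
import numpy as np

def ellipse_point(phi, a, b, x0, y0, phi0):
    """Point on the ellipse of Eq. (20) for the parameter angle phi."""
    return a * np.cos(phi) + x0, b * np.sin(phi - phi0) + y0

def ellipse_phase(x, y, a, b, x0, y0, phi0):
    """Parameter angle of Eq. (21) for a data point (x, y).

    np.arctan2 is used for both angles to resolve the quadrant
    (assumes a > 0, b > 0 and cos(phi0) > 0).
    """
    polar = np.arctan2(y - y0, x - x0)            # polar angle about the center
    num = a * np.sin(polar) + b * np.sin(phi0) * np.cos(polar)
    den = b * np.cos(polar) * np.cos(phi0)
    return np.arctan2(num, den)

# Round trip: generate points from known angles and recover the angles.
params = dict(a=0.12, b=0.05, x0=0.08, y0=0.01, phi0=0.4)
phi_true = np.linspace(-3.0, 3.0, 25)
xs, ys = ellipse_point(phi_true, **params)
phi_rec = ellipse_phase(xs, ys, **params)
```

Because thickness increases monotonically along the ellipse, the recovered angles usually still need to be unwrapped before they are converted to thickness values.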
To obtain the sample thickness $d_i$ at the position of each image unit cell, we use the approximation given in Kisielowski et al. (1995) (Eq. (22)),
where $\xi$ is the extinction distance of the undiffracted beam along the (110)-zone axis (e.g., $\xi = 14.7$ nm for GaAs at 200 kV), and $\theta$ is an angle that defines the origin of the thickness scale. The angle $\theta$ is unknown, and its determination requires some additional information. If the thickness of one unit cell is exactly known, $\theta$ is obtained by Eq. (22). The procedure to obtain $\theta$ is outlined in the next section. Another approach that does not require three template image vectors uses the correspondence analysis (CA) described in Aebersold et al. (1996), which is also implemented in the DALI program package. This procedure is analogous to the interpretation of the cloud of data points given by Eq. (16) as a distribution of masses that approximates the shape of a nearly 2-D ellipsoid. The CA is used to find the axes of least inertia, which are the "main axes" of the ellipsoid, by the calculation of the eigenvectors and eigenvalues of the matrix of inertia. The two eigenvectors that correspond to the two largest eigenvalues define a plane that is analogous to the plane E shown in Fig. 10. The projection of the image vectors onto the plane yields an ellipse that can be evaluated analogously to Eqs. (20) to (22). However, we found that the application of the CA does not improve significantly upon the use of the three template image vectors $R^T_{1,2,3}$. Furthermore, calculation of the two largest eigenvalues and their corresponding eigenvectors takes about 30 min in the case of a 1024 x 1024 matrix of inertia. We turn now to the evaluation of the experimental HRTEM image shown in Fig. 1. However, in this section we use a region of the photographic
FIGURE 11. The open circles indicate the tips of the vectors $T^{\parallel}_{eval}$ on the plane ($\hat{B}_1$, $\hat{B}_2$). The solid line shows an ellipse fitted to the experimental data. [Plot: both axes in arbitrary units; the point corresponding to a sample thickness of 0 nm is labeled.]
negative that contains a larger part of the substrate (Fig. 13). Figure 11 shows the tips of the vectors $T^{\parallel}_{eval}$ on the ($\hat{B}_1$, $\hat{B}_2$) plane. The solid line is the ellipse, which was fitted to the experimental data according to Eqs. (20) and (21). The thicknesses that correspond to the data points increase in clockwise direction. The next section outlines two methods to calculate $\theta$, which can be used to determine absolute thickness values.

3. Determination of Absolute Thickness Values
A simple method to obtain $\theta$ that works with sufficient accuracy in most cases uses an image vector $R_0 = (I'_{1,1}, \ldots, I'_{N,N})$ with $I'_{1,1} = \ldots = I'_{N,N} = c$. Here $R_0$ represents an image cell that contains only one gray level c, corresponding to the image that is expected for a vanishing sample thickness. This cell is added as another data point to the cloud of points formed by Eq. (16), leading to an additional point $(\tilde{x}_0, \tilde{y}_0)$ in E, as indicated with a cross in Fig. 11. A straight line that connects the origin of the ellipse with the point $(\tilde{x}_0, \tilde{y}_0)$ intersects the ellipse at a point $(x_0, y_0)$, which can be regarded as the point of the ellipse that corresponds to a
sample thickness of 0 nm. However, in systems where three Bloch waves are excited with substantial intensity, this may constitute an insufficient approximation. A second procedure to determine the offset angle $\theta$ of Eq. (22) is based on the method of Stenkamp and Jager (1996) and Stenkamp and Strunk (1996), who suggested considering the amplitudes of appropriate Fourier coefficients $J_i$. The intensity distribution $I(r)$ in the high-resolution image is derived from a Fourier sum (Eq. (23))
in which the complex Fourier coefficients $J(g)$ depend on the beam amplitudes and phases and on the microscope parameters, as described in Stenkamp and Jager (1996) and Stenkamp and Strunk (1996). A fast Fourier transform algorithm is used to obtain the Fourier coefficients $J(g)$ and to calculate the amplitude $|J(g)|$ (Eq. (24)).
The offset angle $\theta$ may now be derived using a known functional relationship for $J(g)$ as a function of the specimen thickness. In particular, the relation of Eq. (25)
can be used because the amplitudes of all Bloch waves with $g \neq (000)$ are zero for a vanishing sample thickness. It is appropriate to note that double spacing images, which are dominated by the (220)-lattice fringes, and images showing complex contrast features that originate from low intensities of the $J_{111}$ coefficients cannot be used. However, this restriction does not affect the applicability of the described procedure in practice if a defocus series is taken instead of a single HRTEM image. Both for the conditions mentioned here and for sufficiently small thicknesses the following approximation can be made:
$$\frac{|J_{111}(d)|}{|J_{000}(d)|} \propto d, \qquad (26)$$
which works for GaAs in (110) projection up to a sample thickness of about 6 nm, as verified by EMS simulations. Figure 12 displays $|J_{111}(d)|/|J_{000}(d)|$ evaluated from the part of the experimental HRTEM image previously used for the computation of Fig. 11. In this case, the offset angle $\theta$ was calculated as an average of the results obtained for the two methods described in the foregoing. The first method leads to an origin of the thickness scale that is indicated by a cross in Fig. 12.
FIGURE 12. The ratio $|J_{111}|/|J_{000}|$ for each image unit cell is plotted versus the measured thickness. The dashed line shows the extrapolation of the line that is fitted to the data below 6 nm. The cross on the thickness axis marks the origin of the thickness scale that is obtained using an image unit cell of uniform intensity. [Plot: horizontal axis, measured thickness / [nm].]
The second method yields an origin given by the intersection of the straight line fitted to the data points below 6 nm (marked with a dashed line) with the thickness axis in Fig. 12. Together with an error of thickness determination, which is obtained from the spread of the data points in horizontal direction as indicated in Fig. 12, we can estimate a maximum error of ±1.5 nm in the present example. Figure 13 shows the resulting thickness map, which reveals a wedge-shaped crystal whose thickness increases from the upper part to the bottom of the image.
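The second method amounts to a straight-line fit followed by an extrapolation to the thickness axis. A minimal sketch with synthetic data is given below; the function name, the cutoff argument `d_max`, and the synthetic numbers are assumptions chosen only to illustrate the fit.

```python
import numpy as np

def thickness_origin(d_meas, ratio, d_max=6.0):
    """Fit a straight line to the amplitude ratio for thicknesses below
    `d_max` and return its intersection with the thickness axis."""
    mask = d_meas < d_max
    slope, intercept = np.polyfit(d_meas[mask], ratio[mask], 1)
    return -intercept / slope

# Synthetic data: the ratio grows linearly from a true origin at 1.2 nm
# and is taken as constant above 6 nm, where Eq. (26) no longer holds.
d_meas = np.linspace(2.0, 14.0, 25)
ratio = np.where(d_meas < 6.0, 0.02 * (d_meas - 1.2), 0.02 * 4.8)

origin = thickness_origin(d_meas, ratio)
d_abs = d_meas - origin  # thickness values on the corrected scale
```

With experimental data the spread of the points around the fitted line gives a direct impression of the uncertainty of the origin, in the spirit of the error estimate quoted above.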
C. Consideration of the Elastic Relaxation

The determination of the local sample thickness is the basis for the development of a 3-D model for the application of the finite-element (FE) method (which represents the main subject of the current section). The FE method is introduced to take into account the elastic relaxation of the tetragonally distorted epitaxial layer, which is due to the small sample thickness in electron beam direction of typically less than 20 nm. In the next
FIGURE 13. Color-coded map of the evaluated thicknesses. (See also Plate 14.)
section we calculate the tetragonal distortion for two cases: the thin- and the thick-sample limits.

1. The Analytical Solution of the Thin- and the Thick-Sample Limits

Figure 14a, b depicts the situation that applies for cross-sectional samples of quantum well type heterostructures. In both cases the strained layer (gray) is able to expand at the free sample surfaces. In the case of the thin sample (Fig. 14a), the strained layer is elastically relaxed to a maximum extent. This
FIGURE 14. Sketch showing the reduction of the tetragonal distortion of the strained layer (marked in gray) in a specimen that is thin in electron beam direction (a) in comparison with a biaxially strained thick sample (b). [Sketch label: electron beam direction.]
leads to a diminished tetragonal distortion in comparison to Fig. 14b, which corresponds to the case of a very thick sample that is equivalent to the bulk structure. The displacements evaluated in Section II-A depend on the local composition according to Vegard’s law, which for a ternary material A$_x$B$_{1-x}$C is given by
$$a_{A_xB_{1-x}C} = x\,a_{AC} + (1 - x)\,a_{BC}, \qquad (27)$$
where $a_{AC}$ and $a_{BC}$ are the lattice parameters of the binary components. The main application of strain-state analysis is the composition evaluation in pseudomorphically grown structures. In this case, the unit cells of the epitaxial layers are tetragonally distorted. The lattice parameter $a_{\parallel}$ of a layer unit cell parallel to the interface plane and perpendicular to the electron
beam direction is defined by the lattice parameter $a_s$ of the substrate (Fig. 14). The lattice parameter parallel to the interface plane and parallel to the electron beam direction as well as the parameter $a_{\perp}$ perpendicular to the interface plane vary locally. For comparison of the FE calculation with the experimental displacement field we use the approximation that an atomic distance measured from the HRTEM contrast pattern corresponds to a lattice parameter $a_{\perp}$ that is averaged along the electron beam direction. In the following, the lattice parameters $a_{\perp}$ are calculated under the assumption of a cubic crystal structure for the electron beam directions (110) and (100). If the reference area is chosen inside the substrate (lower crystal part in Fig. 14), the lattice parameters that correspond to measured lattice spacings in growth direction (Eq. (12)) are, in the two limiting cases of a thin or a thick sample, given by Eq. (28),
where a is the (local) bulk lattice parameter and $C_{i,j}$ are the elastic constants of the strained layer. Table I gives an overview of the elastic constants and the lattice parameters of the semiconductors used here. Equation (28) can be used to calculate the error of the composition determination if the sample thickness is not known. An analytical solution analogous to Eq. (28) for any sample thickness is given in Treacy and Gibson (1986).

TABLE I
ELASTIC CONSTANTS AND LATTICE PARAMETERS OF THE SEMICONDUCTORS THAT ARE OF INTEREST

Material   Lattice parameter/[nm] at 20°C   c11/[GPa]   c12/[GPa]   c44/[GPa]
GaAs       0.5653                           118.1       53.2        59.4
InAs       0.6058                           83.3        45.3        39.6
ZnSe       0.5669                           81.0        48.8        44.1
ZnTe       0.6104                           71.2        40.7        31.2
CdSe       0.6081                           66.7        46.3        22.3

Assuming one of the extreme sample thicknesses, we can also give an approximation for the measured displacements. From Eqs. (9) and (11) we obtain
Eq. (29), where $u_n$ is the (averaged) displacement of the grid line n parallel to the interface plane (Fig. 15). If the strained layer is a ternary material, we define the “integral” AC-content $C_{AC}$ of a ternary material A$_x$B$_{1-x}$C in the layer by
$$C_{AC} = \sum_{i=1}^{n} x(i) \quad \text{in units of [ML AC]}, \qquad (30)$$
where the distance between adjacent planes i and i + 1 of grid lines parallel to the interface plane is designated to be one monolayer (ML). From Eqs. (27) and (29) we deduce, assuming $a_s = a^{\perp}_{BC}$, Eq. (31),
where $u_{max}$ is the maximum displacement that is measured on top of the
FIGURE 15. Schematic drawing that explains the increasing displacements $u_n$ in a region of a ternary material A$_x$B$_{1-x}$C with a larger lattice parameter than the binary compound BC if the reference region is chosen inside the material BC. [Legend: lattice position; reference lattice position.]
strained layer and $a^{\perp}_{BC}$ is the lattice parameter in the “substrate” perpendicular to the interface plane. Note that $a^{\perp}_{BC}$ is used instead of the bulk lattice parameter $a_{BC}$ in order to take into account cases where the “substrate” BC is a buffer layer that may itself be tetragonally distorted to a certain extent.

2. Finite Element Calculations

In the previous section it was shown that the lattice mismatch of a strained layer causes a tetragonal distortion that is reduced by the small sample thickness. A further elastic relaxation can take place for an island that is able to expand at its free surfaces. For the composition evaluation of islands, the application of FE calculations is recommended. It is also advisable for the investigation of 2-D buried or free-standing layers if the composition estimation based on Eq. (28) is not sufficiently accurate. The FE calculation starts with the generation of a 3-D geometric model of the sample, which we perform with the MSC-PATRAN program (see references). For that purpose, the projected shape of the sample that is visible in the HRTEM micrograph as well as the local sample thickness that is determined according to Section II-B are exploited. The 3-D model is composed of “solids” of uniform composition. An island or a 2-D layer is therefore built from a stack of slices where the slice thicknesses in growth direction usually are of the order of two to four monolayers. Figure 16 shows the decomposition of the FE model into solids for the evaluation example. In order to simulate an assumed concentration profile, elastic constants, a virtual thermal expansion coefficient $\alpha_{Thermal}$, and a heating temperature $\Delta T$ are assigned to each solid with the appropriate material parameters. The expansion coefficients are introduced to simulate a local lattice mismatch that will occur during the FE calculation by a heating of $\Delta T$.
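The assignment of material parameters to a solid of given composition can be illustrated with Vegard's law (Eq. (27)) and the lattice parameters of Table I. The sketch below assumes that the virtual thermal expansion has to reproduce the relative lattice mismatch between the solid and the substrate, that is, $\alpha_{Thermal}\,\Delta T = (a - a_s)/a_s$. This relation is our reading of the procedure and is labeled as an assumption; the function names and the example profile are hypothetical.

```python
# Lattice parameters in nm at 20 degrees C (Table I).
A_GAAS = 0.5653
A_INAS = 0.6058

def vegard(x, a_ac, a_bc):
    """Bulk lattice parameter of A_x B_(1-x) C according to Eq. (27)."""
    return x * a_ac + (1.0 - x) * a_bc

def composition_from_lattice(a, a_ac, a_bc):
    """Inverse of Vegard's law: composition x for a lattice parameter a."""
    return (a - a_bc) / (a_ac - a_bc)

def virtual_expansion_coefficient(a, a_s, delta_t=1.0):
    """Expansion coefficient that turns a heating by delta_t into the
    lattice mismatch (a - a_s) / a_s (assumed relation, see lead-in)."""
    return (a - a_s) / (a_s * delta_t)

def integral_content(profile):
    """Integral content of Eq. (30): sum of x(i) over the monolayers."""
    return sum(profile)

# In_x Ga_(1-x) As solid with x = 0.5 on a GaAs substrate.
a_layer = vegard(0.5, A_INAS, A_GAAS)
alpha = virtual_expansion_coefficient(a_layer, A_GAAS)

# Hypothetical concentration profile over five monolayers.
profile = [0.1, 0.3, 0.5, 0.3, 0.1]
c_inas = integral_content(profile)  # in units of ML InAs
```

One such coefficient would be computed per solid of the FE model, so that a stack of slices with different compositions carries the assumed concentration profile.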
In practice, $\Delta T = 0$ is chosen for the solids of the substrate and $\Delta T = 1$ for the solids of the strained material. Therefore, the thermal expansion coefficient of a solid has to fulfill Eq. (32), where $a_s$ is the lattice parameter of the substrate and a the bulk lattice parameter corresponding to the material of the solid. One also has to define the orientation of the coordinate system ($\hat{x}_{ec}$, $\hat{y}_{ec}$, $\hat{z}_{ec}$) associated with the elastic constants, which may deviate from the orientation of the coordinate axes ($\hat{x}_{geom}$, $\hat{y}_{geom}$, $\hat{z}_{geom}$) used for the generation of the geometric model. For the generation of the geometry the $\hat{y}_{geom}$-axis is usually chosen parallel to the growth direction and the $\hat{z}_{geom}$-axis parallel to the electron beam direction. If the latter is a crystallographic (110) direction, the coordinate system associated with the elastic constants can be formulated as ($\hat{x}_{ec}$, $\hat{y}_{ec}$, $\hat{z}_{ec}$) = ($\hat{x}_{geom} + \hat{z}_{geom}$, $\hat{y}_{geom}$, $-\hat{x}_{geom} + \hat{z}_{geom}$). Furthermore, boundary conditions are defined for the displacements u of the bottom plane of the FE model ($u_{x_{geom}} = u_{y_{geom}} = u_{z_{geom}} = 0$) as well as for the side planes ($u_{x_{geom}} = 0$).

FIGURE 16. Geometry of the FE model and its decomposition into solids of uniform composition.

The third step is the decomposition of the solids into finite elements, where care has to be taken that the element density is high in regions where large displacement changes are expected (e.g., inside the island or the 2-D strained layer). Figure 17 shows the decomposition of the FE model into finite elements for the evaluation example. In our case, the structural data are written to a file (the input file for the ABAQUS solver (see references)). Figure 17 depicts a color-coded map of the resulting displacements in growth direction, given in nanometers. We apply the following steps to the result of the FEM to allow a direct comparison with the experimental displacement values determined by the strain-state analysis of the HRTEM image: In the first step, atomic positions are calculated in the 3-D FE model of the specimen, depending on crystal structure and orientation. Next, atomic displacements are determined by
A. ROSENAUER AND D. GERTHSEN
FIGURE 17. FE model with color-coded values of the components of the displacement vectors in growth direction. The color-coded scale is given in nanometers. The light-blue grid indicates the finite elements. (See also Plate 15.)
interpolation of the surrounding nodal displacements. Finally, the atomic positions and displacements are averaged along the atomic rows in electron beam direction. As a result, a 2-D field of projected atomic positions and displacements is obtained that can be evaluated with the DALI program. The result of the FEM simulation is then compared with the experimental displacements. In an iterative process, the solid compositions are changed until sufficient agreement with the experiment is achieved. The displacement vectors that result from the FE-calculation (blue) can be compared in Fig. 6 with the experimental displacement vectors (red) evaluated from a part of the example HRTEM image. The high degree of coincidence shows the validity of the FEM approach to the nanoscopic problem. Figure 18 displays the In-concentration profile in growth direction used for the FEM. Note that the assumed In-concentration does not vary along the planes parallel to the interface. Figure 19 shows color-coded maps of (a) the displacements in growth direction and (b) the displacements in [110] direction. The scaling is chosen identically to that of Fig. 7, which
[Figure 18: plot of the In-concentration (vertical axis, ticks from 0 to 80) versus the (002)-plane number (horizontal axis).]
FIGURE 18. Resulting In-concentration plotted versus the (002)-plane number. The dashed line marks the position of the surface next to the island. The finite In-content to the right-hand side of the dashed line corresponds to the wetting layer with a thickness of one ML.
FIGURE 19. Components of the displacement vector field in (a) growth direction, and (b) horizontal direction evaluated from the FE calculation as described in Section II-C-2. (See also Plate 16.)
reveals the good agreement between Fig. 19 and Fig. 7. Figure 8 contains the FE-displacements averaged in a region corresponding to the AOI in Fig. 1.
III. COMPOSITION EVALUATION BY LATTICE FRINGE ANALYSIS
The alternative composition evaluation procedure presented in this section does not exploit the information of lattice parameter fluctuations. Therefore, it may be regarded as an analysis method that is complementary to the strain-state analysis described in Chapter II. First, we will explain the basic idea that leads to the composition evaluation by lattice fringe analysis (CELFA) method.
A. The Basic Idea Behind Composition Evaluation by Lattice Fringe Analysis
To outline the basic idea of CELFA, let us use the abstraction of the composition evaluation problem sketched in Fig. 20. The investigated crystal is considered to be a system that is defined by N parameters $P_1 \ldots P_N$, examples of which are the crystal structure, the orientation, or the composition. The electron beam constitutes an incoming test signal $S_{input}$. The response of the system to the test signal is the intensity distribution of the real image or the diffraction pattern, which is called the outgoing signal $S_{output}$. The response signal is proportional to the test signal and contains a function of the parameters $P_1 \ldots P_N$ as well as an additional noise signal $S_{noise}$. The noise signal disturbs the measurement and leads to errors in the interpretation of the response signal, which is exploited for computation of some of the parameters $P_1 \ldots P_N$.
[Figure 20: block diagram. Incoming test signal $S_{input}$, system defined by the parameters $P_1 \ldots P_N$, outgoing signal $S_{output} = f(P_1 \ldots P_N)\, S_{input} + S_{noise}$.]
FIGURE 20. Sketch showing the abstraction of the composition determination problem where the investigated crystal is regarded as a system defined by N parameters. The incident electron beam corresponds to the incoming test signal that interacts with the system. The response signal of the system contains a function of the parameters and an additive noise signal.
In our case, the parameter “composition” is of special interest. The information concerning the composition is contained in many of the Bloch waves. In the sphalerite structure, the structure factor $SF_{hkl}$ of a reflection (hkl) is given by
$$SF_{hkl} = 4\left[f_A + f_B \exp\{i 2\pi (h + k + l)/4\}\right], \qquad (33)$$
where the $f_{A,B}$ are the atomic scattering factors of the two-atomic basis, for example, A = Ga, In and B = As. In the case of the (002) reflection we obtain $SF_{002} = 4(f_A - f_B)$. Therefore, the (002) reflection depends strongly on the chemical composition and is called a “chemically sensitive” reflection. It is appropriate to note that this is not always the case. For an electron beam direction close to, for example, a ⟨110⟩-zone axis, an “artificial” excitation of the (002) beam can occur due to multiple scattering (e.g., $(1\bar{1}1) + (\bar{1}11) = (002)$).
Figure 21a shows as an example an InGaAs layer buried by 10 nm of GaAs, which was taken with a strongly excited (002) beam (close to the [100]-zone axis) centered on the optical axis. Figure 22 depicts the dependence between (002)-beam intensity and In-concentration for sample thicknesses up to 150 nm, calculated with the EMS program package using the Bloch-wave method. In Fig. 22, each intensity curve is normalized in such a way that it is 1 in GaAs. Figure 22 can be used to compute a color-coded map of the In-content from Fig. 21a, which is then presented in Fig. 21b, where the sample thickness was assumed to be smaller than 100 nm. In this way one can gain an overview impression of the In-concentration in the wetting layer, which covers the GaAs substrate, and in the islands. However, local composition evaluation is inaccurate because of the large amount of noise that is visible in Fig. 21a, b. The amount of noise can be reduced by filtering techniques that are applicable to nonperiodic images. An example given in Baba and Kanaya (1989) is an approach using the autocorrelation (AC) function where uncorrelated information close to the center of the AC spectrum is removed (Fig. 21c). However, Fig. 21c reveals that the accuracy of the local composition detection is not sufficient even in the case of the noise-reduced image. Moreover, blurring of the interfaces is observed.
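As a small numerical illustration of Eq. (33), the sketch below evaluates the structure factor for the (002) and (004) reflections. The scattering factors `f_A` and `f_B` are arbitrary placeholder numbers chosen for illustration, not tabulated values, and the function name is hypothetical.

```python
import cmath

def structure_factor(h, k, l, f_A, f_B):
    """Eq. (33): SF_hkl = 4 [f_A + f_B exp(i 2 pi (h + k + l) / 4)]."""
    phase = cmath.exp(1j * 2 * cmath.pi * (h + k + l) / 4)
    return 4 * (f_A + f_B * phase)

# Placeholder scattering factors in arbitrary units (assumed, not tabulated).
f_A, f_B = 3.0, 2.5

sf_002 = structure_factor(0, 0, 2, f_A, f_B)  # phase factor exp(i*pi) = -1
sf_004 = structure_factor(0, 0, 4, f_A, f_B)  # phase factor exp(i*2*pi) = +1

print(abs(sf_002), abs(sf_004))
```

The (002) value reduces to the difference 4(f_A - f_B), which is why it reacts strongly to a change of the atoms on one sublattice, whereas the (004) value is the sum 4(f_A + f_B).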
For further development of the CELFA procedure, let us again consider Fig. 20. The situation shown in Fig. 20 reflects not only experimental methods like electron microscopy, but applies to a wide variety of measurement techniques. A simple example is the recording of the current versus voltage curve of some resistor or semiconductor samples. In terms of Fig. 20, the incoming test signal is the applied voltage, the system is the resistor or the semiconductor, and the response signal is the measured current. The system is defined by its resistance, which is calculated by $S_{input}/S_{output}$. If the resistor is used for the temperature measurement in a
FIGURE 21. (a) {100} TEM dark field image of an In_xGa_{1-x}As/GaAs(001) Stranski-Krastanow layer capped with 10 nm GaAs, obtained with the strongly excited (002) reflection centered on the optical axis; (b) color-coded map of the local In-content calculated according to Fig. 22; (c) TEM image; and (d) color-coded map after noise reduction. (See also Plate 17.)
helium cryostat, the produced heat has to be kept as low as possible. In this case, the measured currents are low and the resistance measurement is disturbed significantly by noise. The signal-to-noise ratio (SNR) is improved by the use of the lock-in technique. This means that the test signal $S_{input}$ is modulated with a defined frequency $f_{mod}$. The output signal that is
[Figure 22: plot of the indium concentration versus the normalized image intensity (horizontal axis, ticks from 0 to 8), with one curve per sample thickness.]
FIGURE 22. Indium concentration plotted versus the normalized image intensity calculated for an image that is obtained with the (002) beam centered on the optical axis. The curves corresponding to different sample thicknesses as given in the legend are calculated with the Bloch-wave method of the EMS program package. Each computed curve is normalized with respect to the intensity in the GaAs (x = 0) at the appropriate thickness.
proportional to $S_{input}$ shows the same modulation. A Fourier filter is used to measure the amplitude of the Fourier coefficient corresponding to $f_{mod}$. In this way, the SNR is improved significantly because the Fourier spectrum of the noise signal generally contains only negligible contributions at the frequency $f_{mod}$.
The question that arises at this point is: How can we make use of the lock-in technique for composition evaluation with chemically sensitive reflections? Of course, modulation of the test signal “beam current” is the wrong way because the noise signal that stems mainly from amorphous surface layers is also proportional to the test signal. In this case one has to consider a spatial modulator rather than a modulator in time. If we remember that HRTEM images are composed of a limited number of spatial frequencies, we see that the crystal lattice itself can take over the task of a spatial modulator. From the preceding discussion, the closest approach to the lock-in technique would be an image that shows only one spatial frequency, leading to a fringe pattern where the local amplitude of the fringes is proportional to the local amplitude of the (002) reflection. In this case, the noise-filtering technique described in Section II-A-1, which works with periodic images, could be used. The measurement of the local amplitude of the (002) reflection in image unit cell diffractograms would yield undisturbed information.
How can we approach this idealized concept in practice? First, we have to consider that the (002) beam that carries relevant information should not be modified by lens aberrations. Therefore, the (002) beam has to be centered on the optic axis. In this way delocalization effects (Thust et al., 1996) are also avoided because they depend on the spatial derivative of the aberration function, which vanishes on the optic axis.
Second, adsorbed objects at the surface of the specimen can lead to a local modification of the signal, which would induce an error in composition detection. Therefore, we suggest the use of two spatial frequencies, where one of them does not carry significant chemical information, as is approximately the case for the (004) beam. The amplitude of the second reflection depends on local absorption but not on composition. Therefore, we suggest the measurement of the ratio of the amplitudes of the (002) and the (004) beam. Third, one has to take into account the nonlinear image formation if the Fourier amplitudes of image cell diffractograms are measured. In summary, we suggest the following imaging condition in the case of the sphalerite structure: A three-beam condition close to the [100]-zone axis is required where only the (000), (002), and (004) beams are strongly excited. The chemically sensitive (002) reflection has to be centered on the optic axis. Figure 23a,b shows the imaging condition that will be used for the
[Plate 1: conductance contour plot; horizontal axis: magnetic field (T).]
PLATE 1. Conductance measurements of open quantum dots provide a spectroscopic probe of their discrete level spectrum. The energy level spectrum of an isolated (0.3 μm) dot is shown on page 6; here, we show the corresponding conductance contour plot, obtained with four modes now propagating in the dot leads. Lighter regions correspond to higher conductance.
PLATE 2. Quantum mechanical wavefunction simulation showing electrons emerging in a highly collimated beam from a quantum point contact. The gate geometry is taken to be the same as the asymmetric pattern shown in Fig. 2(a). In the left-hand figure, only one occupied mode is present in the quantum point contact, while in the right-hand one, seven modes are supported.
PLATE 3. The well-defined periodicity observed in the weak field magneto-conductance fluctuations is found to be correlated to the recurrence of well-defined wavefunction scars within the dot. In this figure we show the behavior observed in a 0.4-μm split-gate dot, which reveals fluctuations with a fundamental frequency of 9 T^-1. This frequency content does not change significantly as the dot lead openings are varied, and corresponds closely to the field scale over which a diamond scar recurs. Lighter regions in these probability density plots correspond to regions of enhanced probability density. The experiment was performed at a temperature of 0.01 K.
PLATE 4. Diamond scars formed in quantum dots with different numbers of modes present in the quantum point contact leads. The dot size here is 0.3 μm, which corresponds to the effective size of the experimental dot studied in Fig. 8.
PLATE 5. Self-consistently computed wavefunction plots, obtained from simulations of the split-gate dot geometry shown in Fig. 14. The plots were obtained at three different gate voltages and the darker regions correspond to enhanced probability density. A typical dot profile is shown in the upper left figure.
[Plate 6: conductance contour plot; horizontal axis: magnetic field (tesla), ticks at -0.25, 0, and 0.25.]
PLATE 6. Experimentally determined conductance contour plot, obtained for a 0.4-μm split-gate quantum dot. The color scale ranges from red to blue, indicating low to high conductance, respectively.
PLATE 7. Numerically determined conductance contour plot, for a 0.3-μm quantum dot. The well-defined lines indicated by the arrows correspond to lines of constant scarring.
PLATE 8. Left: The root mean square (rms) amplitude of the conductance fluctuations decreases exponentially with increasing temperature in experiment. Right: The experimental temperature variation is thought to reflect a similar exponential quenching of the wavefunction scarring, which is induced as the electron dephasing rate increases. In this figure the computed wavefunction in a 0.3-μm dot is shown for a number of different phase breaking times (τ_φ).
PLATE 9. Convolution of the density of states with the derivative of the Fermi function in a 1-μm gated dot at 2.9 T (only the two lowest Landau levels are shown). In this figure, only the upper left corner of the dot is shown, in the region near the input quantum point contact (see Fig. 2 for the dot geometry) (Bird et al., 1997e).
PLATE 10. The edge state structure in a quantum dot at high magnetic fields suggests an analogy to the level structure of atoms. In this case, the red regions correspond to compressible electron gas and the calculation is performed for the gate geometry shown.
PLATE 11. {110} HRTEM image of an In_xGa_{1-x}As/GaAs(001) Stranski-Krastanow island containing the grid that connects the local brightness maxima of the dumbbells. The marked area of interest (AOI, blue frame) is used for the determination of the In-concentration inside the island. The reference area (green frame) is used for the calculation of the basis vectors of the reference lattice.
PLATE 12. Part of the displacement vector fields evaluated from Fig. 1 (drawn in red) and obtained by finite element calculation (blue) as outlined in Section II.C.2.
PLATE 13. Color-coded maps of the components of the displacement vector field (a) in growth direction and (b) in interface direction (a positive value indicates a displacement vector pointing to the right) deduced from Fig. 1.
PLATE 14. Color-coded map of the evaluated thicknesses.
PLATE 15. FE model with color-coded values of the components of the displacement vectors in growth direction. The color-coded scale is given in nanometers. The light-blue grid indicates the finite elements.
PLATE 16. Components of the displacement vector field in (a) growth direction, and (b) horizontal direction evaluated from the FE calculation as described in Section II.C.2.
PLATE 17. (a) {100} TEM dark field image of an In_xGa_{1-x}As/GaAs(001) Stranski-Krastanow layer capped with 10 nm GaAs, obtained with the strongly excited (002) reflection centered on the optical axis; (b) color-coded map of the local In-content calculated according to Fig. 22; (c) TEM image; and (d) color-coded map after noise reduction.
PLATE 18. Color-coded maps of the local In-content x of a nominally 2 nm thick In_xGa_{1-x}As layer capped with 10 nm GaAs. The cap layer is to the right of the In_xGa_{1-x}As. The maps are obtained from (a) $|J_{002}|/|J_{004}|$ and (b) $|J_{002}|$ of local unit cells using a mean value of (a) $|T_{002}|/|T_{004}|$ and (b) $|T_{002}|$ calculated from the GaAs buffer on the left-hand side of the In_xGa_{1-x}As.
PLATE 19. Color-coded maps of the local In-content x obtained from $|J_{002}|/|J_{004}|$ of local unit cells using (a) a mean value of $|T_{002}|/|T_{004}|$ calculated from an area of 10 × 10 cells in the upper left corner of the image and (b) a local map of $|T_{002}|/|T_{004}|$.
PLATE 20. (a) Local values of $T_{002}/T_{004}$ computed in two regions to the left and the right of the In_xGa_{1-x}As; (b) local map of $T_{002}/T_{004}$ obtained after averaging and extrapolation of the values shown in (a).
PLATE 21. Color-coded maps of the relative error Δx/x per Δt = 1 nm uncertainty of the measured sample thickness for (a,d) In_xGa_{1-x}As, (b,e) Cd_xZn_{1-x}Se, and (c,f) Al_xGa_{1-x}As using (a,b,c) $|J_{002}|/|J_{004}|$ and (d,e,f) $|J_{002}|$ for the composition determination. The graphs were computed according to Eq. (65) using the definition of S(x, t, 0) given in Eq. (59) for (a,b,c) and in Eq. (66) for (d,e,f).
PLATE 22. Color-coded maps of (a) local lattice parameters in growth direction; (b) local displacement vector field components in growth direction; and (c) thicknesses evaluated with the QUANTITEM procedure in the region indicated with a black rectangle in (a). The arrow in (c) marks the region corresponding to the In_xGa_{1-x}As where the thickness map yields invalid results.
FIGURE 23. (a) Electron diffraction pattern that illustrates the imaging condition. The white cross in the upper left corner marks the [100]-zone axis; (b) schematically shows the (000), (002), and (004) beams that are used for the formation of the HRTEM image. The chemically sensitive (002) beam is centered on the optic axis; (c) illustrates the possibilities of tilting the crystal around an axis perpendicular to the interface plane (light gray) and parallel to it (dark gray). The former case results in a fringe pattern parallel to the interface and the latter leads to a pattern with fringes running perpendicular to it. Note that in the latter case the tilt angle should be smaller than 4° in order to prevent a significant broadening of the evaluated concentration profile.
description of the CELFA method. Figure 23c visualizes the crystal orientation for a specimen containing a thin buried layer with a [001] growth direction. There are two possibilities to gain the required three-beam condition, one using the (000), (020), and (040) beams, the other using the (000), (002), and (004) beams. The difference between the imaging conditions using either the (00j) or the (0j0) beams (j = 2, 4) is that the fringes run either parallel or perpendicular to the (001) interface. In the first case, the fringe spacings depend on the local composition in strained heterostructures. In the second case, all fringes have the same spacing, the substrate lattice
parameter, in pseudomorphically grown structures. Therefore, strain effects that may influence the HRTEM image patterns are avoided if the (0j0) beams are used. As shown in Fig. 23c, the evaluation with the (0j0) beams requires a tilt of the sample around an axis parallel to the interface plane, which results in a broadening of the projected interlayer. Therefore, the tilt angle has to be small and should be in the range of 2°-4°. In the following treatment, we will not distinguish between the (00j) and the (0j0) beams.
The “standard procedure” for determination of the local composition is based on consideration of the ratio $|J_{002}|/|J_{004}|$ of the amplitudes of the (002) and the (004) reflections of Fourier-transformed image unit cells. The suggested procedure requires a defocus series of n images (typically n = 10), which enables evaluation of the local sample thickness. All free parameters of the image formation are derived directly from the defocus series with a simple procedure. Therefore, the CELFA method does not require any knowledge concerning imaging conditions (except sample orientation). Furthermore, we will show how the effect of locally changing imaging conditions across the HRTEM image can be taken into account.
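The lock-in argument used in this section can be made concrete with a short numerical sketch. It is only an illustration under assumed values (record length, modulation frequency, amplitude, and noise level are all placeholders): a weak modulated response is buried in white noise, and the Fourier coefficient at the modulation frequency recovers its amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 16384   # length of the record (assumed)
f_mod = 256         # modulation frequency in cycles per record (assumed)
amplitude = 0.2     # true response amplitude, far below the noise level
noise_rms = 1.0     # rms value of the additive white noise

t = np.arange(n_samples) / n_samples
response = amplitude * np.cos(2 * np.pi * f_mod * t)
measured = response + rng.normal(0.0, noise_rms, n_samples)

# "Lock-in" step: read off the single Fourier coefficient at f_mod.
# Only the small share of the noise that falls into this bin disturbs it.
spectrum = np.fft.rfft(measured)
recovered = 2.0 * np.abs(spectrum[f_mod]) / n_samples

print(f"true amplitude {amplitude:.3f}, lock-in estimate {recovered:.3f}")
```

In CELFA the role of the modulation in time is taken over by the spatial frequency of the lattice fringes, so the corresponding coefficient is the (002) amplitude in the diffractogram of an image unit cell.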
B. The Fringe Images
In Section III we change the evaluation example and use a buried In_xGa_{1-x}As Stranski-Krastanow layer that was grown on a GaAs (001) substrate and capped with 10 nm GaAs. Figure 24a depicts a fringe image, which is the first image of a defocus series of 10 images. The defocus stepsize was adjusted to 9 nm. The image contains a large amount of noise. Figure 24b represents the same image after noise reduction, performed with the Wiener filtering method described in Section II-A-1. Additionally, the back transformation of the noise-reduced Fourier transform was performed with only circular areas around the (002) and (004) reflections. The radii of the circular apertures were chosen in such a manner that the circles overlap. Inside the GaAs buffer and cap layer, Fig. 24b reveals a contrast pattern consisting of alternating bright, dark, less bright, and again dark fringes. The distance between the bright fringes is about 0.28 nm. With increasing In-content (from the left to the middle part of Fig. 24b), the intensity of the bright fringes decreases whereas the brightness of the darker fringes increases. As we will see later, their intensity becomes equal at an In-content of about 22%. From Fig. 24b we realize that the imaging condition used seems to be well suited for the compositional analysis that we intend. The next section provides the theoretical description of the observed contrast pattern, which is necessary for understanding the described analysis procedure.
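The back transformation through circular apertures around the (002) and (004) reflections can be sketched as follows. This is a minimal illustration on a synthetic fringe image, not the authors' implementation: it omits the Wiener step of Section II-A-1, and the image size, fringe frequencies, aperture radius, and the helper name `aperture_filter` are all assumptions.

```python
import numpy as np

def aperture_filter(image, centers, radius):
    """Back-transform only circular areas of the Fourier transform.

    centers: (ky, kx) frequency indices of the reflections to keep;
    the mirrored points (-ky, -kx) are kept too so the result stays real.
    """
    ny, nx = image.shape
    ky = np.fft.fftfreq(ny) * ny          # integer frequency indices along y
    kx = np.fft.fftfreq(nx) * nx          # integer frequency indices along x
    KY, KX = np.meshgrid(ky, kx, indexing="ij")
    mask = np.zeros(image.shape, dtype=bool)
    for cy, cx in centers:
        for sign in (1, -1):
            mask |= (KY - sign * cy) ** 2 + (KX - sign * cx) ** 2 <= radius ** 2
    return np.fft.ifft2(np.fft.fft2(image) * mask).real

# Synthetic fringe image: "(002)" fringes at index 8, "(004)" fringes at index 16.
rng = np.random.default_rng(1)
ny = nx = 128
x = np.arange(nx)
row = np.cos(2 * np.pi * 8 * x / nx) + 0.6 * np.cos(2 * np.pi * 16 * x / nx)
clean = np.tile(row, (ny, 1))
noisy = clean + rng.normal(0.0, 1.0, clean.shape)

# Radius 5 with centers 8 apart: the two apertures overlap, as in the text.
filtered = aperture_filter(noisy, centers=[(0, 8), (0, 16)], radius=5)

def rms(a):
    return float(np.sqrt(np.mean(a ** 2)))

err_noisy = rms(noisy - clean)
err_filtered = rms(filtered - clean)
print(err_noisy, err_filtered)
```

Only the noise that falls inside the apertures survives the back transformation, which is why the remaining error is much smaller than in the unfiltered image.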
FIGURE 24. HRTEM fringe image obtained with the imaging condition shown in Fig. 23; (a) the micrograph before; and (b) after Wiener noise filtering. The white rectangle in (a) marks the area that is used for the calculation of the sample thickness.
C. Theoretical Considerations
We start with a consideration of the (000), (002), and (004) beams that contribute to the image formation. The amplitudes of all other Zero Order Laue zone (ZOLZ) beams are comparably low and will be neglected in the following. Figure 25 shows the amplitudes and phases of the three beams in In_xGa_{1-x}As in dependence of the sample thickness and the In-concentration
[Figure 25: six panels (a)-(f); amplitudes (a, c, e) and phases (b, d, f) of the (000), (002), and (004) beams plotted versus the sample thickness (0 to 50 nm).]
FIGURE 25. Results of Bloch-wave computations performed with the EMS program package for In_xGa_{1-x}As for different indium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
x calculated with EMS. Figure 25 reveals that the amplitudes and phases of the (000) and (004) beams show a rather weak dependence on x. This is also the case for the phase $p_{002}$ of the (002) beam, whereas its amplitude $a_{002}$ varies strongly with x. Note that $a_{002}$ changes its sign in dependence of x. This is due to a phase shift of π, which for clarity was attributed to the amplitude $a_{002}$ instead of to $p_{002}$. For x = 25%, the curve $a_{002}(t)$ (t denotes the sample thickness) is close to zero throughout the entire sample thickness range. In this case, the image is formed by the interference of only the (000) and (004) beams, which leads to an image pattern consisting of fringes with the same maximum intensity at a distance of 0.14 nm. This is the case for all sample thicknesses t and defocus values Δf. Figures 26 and 27 show the amplitudes and phases for Cd_xZn_{1-x}Se and Al_xGa_{1-x}As, respectively. In Cd_xZn_{1-x}Se, the amplitude of the (002) beam vanishes at a Cd-content of approximately x = 40%. In contrast, Al_xGa_{1-x}As represents a material where the amplitude of the (002) beam remains positive in the whole range of x.
Let us now consider the nonlinear image formation in some detail for the given conditions. According to Ishizuka (1980), the complex amplitude J'(g) of the reflection g of the image power spectrum is in the case of the untilted electron beam given by
$$J'(g) = \sum_h T(g + h, h; \Delta f)\, F(g + h)\, F^*(h), \qquad (34)$$
where F(g) is the Fourier transform of the object transmission function. The $T(g + h, h; \Delta f)$ is the transmission cross coefficient defined by Ishizuka (1980), which has the properties
$$T(g, h; \Delta f) = T^*(h, g; \Delta f) \qquad (35)$$
and
$$T(g, h; \Delta f) = T(-g, -h; \Delta f). \qquad (36)$$
As described in Section III-A, the incident electron beam is tilted in such a way that the (002) beam is parallel to the optical axis. In this case, Eq. (34) is modified to
$$J(g) = \sum_h T(g + h - g_{002}, h - g_{002}; \Delta f)\, F(g + h)\, F^*(h). \qquad (37)$$
As mentioned in the preceding, we only consider the beams (000), (002), and (004), which leads us to
$$F(g_{00l}) = 0 \quad \text{for } l \neq 0, 2, 4. \qquad (38)$$
Therefore, we obtain the complex Fourier coefficients of the three relevant
[Figure 26: six panels (a)-(f); amplitudes (a, c, e) and phases (b, d, f) of the (000), (002), and (004) beams plotted versus the sample thickness in nm.]
FIGURE 26. Results of Bloch-wave computations performed with the EMS program package for Cd_xZn_{1-x}Se for different cadmium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
[Figure 27: six panels (a)-(f); amplitudes (a, c, e) and phases (b, d, f) of the (000), (002), and (004) beams plotted versus the sample thickness (0 to 50 nm).]
FIGURE 27. Results of Bloch-wave computations performed with the EMS program package for Al_xGa_{1-x}As for different aluminium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
reflections in the image power spectrum as follows:
$$T(0, -g_{002}; \Delta f) \overset{(35)}{=} T^*(-g_{002}, 0; \Delta f) \overset{(36)}{=} T^*(g_{002}, 0; \Delta f). \qquad (42)$$
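The equations that the text cites as Eqs. (39)-(41) are not legible in this copy. As a hedged reconstruction (not the authors' typeset form), inserting Eq. (38) into Eq. (37) leaves only those terms for which both h and g + h belong to the set {0, g_002, g_004}. The lines for g_002 and g_004 below agree with what Eqs. (46) and (47) require of Eqs. (40) and (41); the line for g = 0 follows from the same substitution, and its original numbering is an assumption.

```latex
\begin{align*}
J(0) &= T(-g_{002}, -g_{002}; \Delta f)\,|F(0)|^2
        + T(0, 0; \Delta f)\,|F(g_{002})|^2
        + T(g_{002}, g_{002}; \Delta f)\,|F(g_{004})|^2, \\
J(g_{002}) &= T(0, -g_{002}; \Delta f)\,F(g_{002})\,F^*(0)
        + T(g_{002}, 0; \Delta f)\,F(g_{004})\,F^*(g_{002}), \\
J(g_{004}) &= T(g_{002}, -g_{002}; \Delta f)\,F(g_{004})\,F^*(0).
\end{align*}
```

The two coefficients appearing in the expression for J(g_002) are complex conjugates of each other by Eq. (42), which is what allows a single real factor and a single phase to describe both.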
(42)
The quantity T(goo2,-gOo2; A f ) in Eq. (41) is real because of T k o o 2 , -x',oz:
* 4f) (35) = T ( - ~ 0 0 2 > g 0 0 240 ~ (2)T*(g,,,,
-goo,;
41'). (43)
Therefore, we may use the abbreviations TOo2 exp( - i ( x ,
+ xSN := T(O, - g o O 2 ;4fL
To,,, exp(i(x.r+ xs)) : = T(g002,0; 4f) and
To04 =
T(g002, - g 0 0 2 ;
4f) (44)
where Too2 and TOo4are real numbers and xr and xs are phase shifts introduced by the objective lens defocus Aj' and the spherical aberration, given by 71
2 xf = 5 24f1.goo2;
71
xs = - CsL3&02.
2
(45)
In Eq. (45), i, is the wavelength of the incident electron beam and C, the spherical aberration constant. Inserting the abbreviations in Eq. (44) into Eqs. (40) and (41) yields J(x'oo2) =
~oo,CexP(- i(X.r.
+ X s ) ) F ko o z F * ( O )
+ exp(i(X.r + x s ) ) F k 0 0 4 ) ~ * ( ~ 0 0 2 ) 1
(46) (47)
J(g004) = T004F(g004)F*(0).
Furthermore, we introduce F(go0,)= u O o feipooJ, . 1 = 0,2,4.
(48)
by From Eqs. (46) and (47) we obtain the amplitudes of J(g,,,) and J(gOo4) IJ(~O"2N
= ~ " 0 2 J ( ~ 0 0 2 ~ o o o +(Q )2 004Q"02)2
/J(g004)1= Too4~004~0oo~
+ 2Q002~000Q004~002 cos(cp,,)
(49)
(50)
where $\varphi_n$ is given by
$$\varphi_n = -2(\chi_f + \chi_s) + (p_{002} - p_{000}) - (p_{004} - p_{002}) = -2\chi_n + 2 p_{002} - p_{000} - p_{004}. \qquad (51)$$
Here $\chi_n = \chi_f + \chi_s$ corresponds to the image n of the defocus series. Equations (45), (49), and (51) show that a linear change of the defocus results in an oscillation of $|J(g_{002})|$. The defocus change corresponding to a full oscillation of $|J(g_{002})|$ is given by
$$\Delta(\Delta f) = \frac{1}{\lambda g_{002}^2}. \qquad (52)$$
The ratio $|J_{002}|/|J_{004}|$ is calculated from Eqs. (49) and (50), leading to
$$\frac{|J_{002}|}{|J_{004}|} = \frac{T_{002}}{T_{004}}\, a_{002} \sqrt{\frac{1}{a_{004}^2} + \frac{1}{a_{000}^2} + \frac{2}{a_{004} a_{000}} \cos(-2\chi_n + 2 p_{002} - p_{000} - p_{004})}. \qquad (53)$$
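The step from Eqs. (46)-(48) to Eqs. (49), (50), and (53) can be checked numerically. The sketch below compares the ratio obtained from the complex coefficients with the closed form of Eq. (53). All amplitudes, phases, and the factors T_002, T_004, and χ_n are arbitrary placeholder values, and the function names are hypothetical. The closed form uses |a_002| because the text allows a_002 to carry a sign.

```python
import cmath
import math

def fourier_coefficients(a, p, T002, T004, chi):
    """Complex J(g002) and J(g004) from Eqs. (46)-(48).

    a, p: dicts with beam amplitudes a_00l and phases p_00l for l = 0, 2, 4.
    chi:  phase chi_n = chi_f + chi_s of the image under consideration.
    """
    F = {l: a[l] * cmath.exp(1j * p[l]) for l in (0, 2, 4)}
    J002 = T002 * (cmath.exp(-1j * chi) * F[2] * F[0].conjugate()
                   + cmath.exp(1j * chi) * F[4] * F[2].conjugate())
    J004 = T004 * F[4] * F[0].conjugate()
    return J002, J004

def amplitude_ratio(a, p, T002, T004, chi):
    """Closed form of |J002| / |J004| according to Eq. (53)."""
    phi = -2 * chi + 2 * p[2] - p[0] - p[4]          # Eq. (51)
    root = math.sqrt(1 / a[4] ** 2 + 1 / a[0] ** 2
                     + 2 * math.cos(phi) / (a[4] * a[0]))
    return (T002 / T004) * abs(a[2]) * root

# Placeholder beam data (assumed; real values come from Bloch-wave runs).
a = {0: 0.8, 2: 0.1, 4: 0.3}
p = {0: 0.2, 2: -0.4, 4: 1.1}
T002, T004, chi = 0.9, 0.7, 0.35

J002, J004 = fourier_coefficients(a, p, T002, T004, chi)
ratio_direct = abs(J002) / abs(J004)
ratio_closed = amplitude_ratio(a, p, T002, T004, chi)
print(ratio_direct, ratio_closed)
```

Both routes give the same number, which confirms that the cosine argument in Eq. (53) is the angle of Eq. (51).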
D. Determination of Sample Thickness and Phase $\chi_n$
Equation (49) indicates an oscillation of $|J(g_{002})|$ in dependence of $\varphi_n$, which is a linear function of the defocus according to Eqs. (45) and (51). Figure 28 gives an example of a defocus series consisting of 10 exposures (9 of them are shown), where the defocus stepsize of the microscope was adjusted to 9 nm. Each image shown is a small part of an image with a size of 1024 × 1024 pixels. An area of known composition in the GaAs buffer layer is used for the thickness determination and is marked by the white rectangle in Fig. 24a. Figure 29 shows the amplitudes $|J(g_{002})|$ plotted versus the image number (running from 1 to 10, 1 corresponding to the largest underfocus). Figure 29 was obtained in the following way: From each of the ten images, a region of the same size and position (indicated in Fig. 24a) was Fourier-transformed. The 10 pixels with the largest amplitudes enclosed in a circular area around the (002) reflection were summed for each image. The data points were fitted by a curve according to Eq. (49) given by
$$|J(g_{002})| = (1 + A n) \sqrt{B + C \cos(D(n - E))}, \quad n = 1 \ldots 10, \qquad (54)$$
where A, B, C, D, E are the fit parameters and n is the image number. The factor (1 + An) in Eq. (54) takes into account the (weak) defocus
FIGURE 28. Small parts of the HRTEM images of a defocus series of 10 images (nine of them are shown), each showing the In_xGa_{1-x}As interlayer as well as the GaAs cap and buffer layers. A defocus stepsize of the microscope of 9 nm was chosen. The images 2, 5, and 8 show a fringe pattern that is dominated by the (004) fringes corresponding to minima of the amplitude of the (002) reflection in Fig. 29.
[Figure 29: plot of the (002) amplitude (vertical axis, ticks from 5.0 × 10^3 to 2.0 × 10^4) versus the number of the image in the defocus series (horizontal axis, 1 to 10).]
FIGURE 29. Amplitude of the (002) reflection in the diffractogram plotted versus the image number shown in Fig. 28. The values $J_{max}$ and $J_{min}$ are used to derive the specimen thickness.
dependence of $T_{002}$ in Eq. (49), which contains the source-size-dependent envelope function $E_s(g + h, h; \Delta f)$ (Ishizuka, 1980). The values obtained for the fit parameters are
$$A = 0.012, \quad B = 1.88 \times 10^8, \quad C = 1.7 \times 10^8, \quad D = 2.06 \pm 0.04, \quad E = 0.6. \qquad (55)$$
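As a quick plausibility check of the fit, the curve of Eq. (54) can be evaluated with the parameters of Eq. (55) for the ten image numbers. The sketch below is illustrative only (the function name is hypothetical); it lists the images at which the fitted (002) amplitude has a local minimum, which can be compared with the images named in the caption of Fig. 28.

```python
import numpy as np

# Fit parameters quoted in Eq. (55).
A, B, C, D, E = 0.012, 1.88e8, 1.7e8, 2.06, 0.6

def fitted_amplitude(n):
    """Eq. (54): |J(g002)| = (1 + A n) sqrt(B + C cos(D (n - E)))."""
    n = np.asarray(n, dtype=float)
    return (1 + A * n) * np.sqrt(B + C * np.cos(D * (n - E)))

n = np.arange(1, 11)               # image numbers 1 ... 10
J = fitted_amplitude(n)

# Interior images whose amplitude is lower than that of both neighbours.
minima = [int(n[i]) for i in range(1, len(n) - 1)
          if J[i] < J[i - 1] and J[i] < J[i + 1]]
print(minima)
```

The minima fall on images 2, 5, and 8, the same images that the caption of Fig. 28 describes as dominated by the (004) fringes.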
According to Eqs. (52) and (54), the defocus stepsize between adjacent images of the defocus series is given by
$$\Delta_{Stepsize}(\Delta f) = D/(2\pi \lambda g_{002}^2) = (10.3 \pm 0.2)\ \mathrm{nm}.$$
The angles $\varphi_n$ of Eq. (49) result from Eq. (54) to $\varphi_n = D(n - E)$. Finally, the phases $\chi_n$ of Eq. (51) are calculated by
$$\chi_n = p_{002} - \tfrac{1}{2}(\varphi_n + p_{000} + p_{004}) = p_{002} - \tfrac{1}{2}(D(n - E) + p_{000} + p_{004}), \qquad (56)$$
where the $p_{00j}$ are computed by the Bloch-wave method for x = 0 for the appropriate thickness whose determination is described in the following. From Eq. (49) we recognize that the maxima $J_{max}$ and minima $J_{min}$ of $|J(g_{002})|$ correspond to $\cos(\varphi_n) = \pm 1$, leading to
$$J_{max/min} = T_{002} \sqrt{(a_{002} a_{000})^2 \pm 2 a_{002} a_{000} a_{004} a_{002} + (a_{004} a_{002})^2} = T_{002}(a_{002} a_{000} \pm a_{004} a_{002}). \qquad (57)$$
[Figure 30: plot versus the sample thickness (horizontal axis, 0 to 50 nm).]
FIGURE 30. Amplitude ratio of the (000) and (004) beams plotted versus the sample thickness. The graph is used to determine the specimen thickness.
As already mentioned in this section, $T_{002}$ contains the source-size-dependent envelope function $E_s(\mathbf{g}+\mathbf{h}, \mathbf{h}; \Delta f)$ (Ishizuka, 1980), which depends weakly on the defocus $\Delta f$. Therefore, the values of $J_{max}$ and $J_{min}$ change slightly between two adjacent oscillation periods of $|J(g_{002})|$. For the following we use $J_{max}$ and $J_{min}$ as shown in Fig. 29 and deduce from Eq. (57):

$$\frac{a_{000}}{a_{004}} = \frac{J_{max} + J_{min}}{J_{max} - J_{min}}. \tag{58}$$

In our case, the value $(J_{max} + J_{min})/(J_{max} - J_{min}) = 1.6$ is calculated from Fig. 29. The sample thickness can be obtained directly from the thickness dependence of $a_{000}/a_{004}$ for GaAs shown in Fig. 30, which was calculated with the Bloch-wave method. Figure 30 also contains $a_{000}/a_{004}$ for ZnSe. The calculations were performed with an absorption coefficient of 0.04. However, to a good approximation the curves of $a_{000}/a_{004}$ versus thickness do not depend on the absorption. In the present case, we find a sample thickness of 16 nm.
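The thickness determination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation $a_{000}/a_{004} = (J_{max} + J_{min})/(J_{max} - J_{min})$ follows from Eq. (57) when $a_{000} > a_{004}$, the function names are hypothetical, and the tabulated curve is a made-up stand-in for the Bloch-wave result of Fig. 30, not data from this chapter.

```python
import numpy as np


def amplitude_ratio(j_max, j_min):
    """a000/a004 from the extrema of |J(g002)| over the defocus series."""
    if j_max <= j_min:
        raise ValueError("j_max must be larger than j_min")
    return (j_max + j_min) / (j_max - j_min)


def thickness_from_ratio(ratio, thicknesses_nm, ratio_curve):
    """Return the tabulated thickness whose a000/a004 value is closest to `ratio`."""
    thicknesses_nm = np.asarray(thicknesses_nm, dtype=float)
    ratio_curve = np.asarray(ratio_curve, dtype=float)
    return float(thicknesses_nm[np.argmin(np.abs(ratio_curve - ratio))])


# Illustrative extrema only (not read from Fig. 29); they give a ratio of 1.6.
ratio = amplitude_ratio(j_max=2.6e4, j_min=0.6e4)

# Made-up stand-in for the calculated a000/a004 curve (assumption, not Fig. 30 data).
t_grid_nm = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
curve = [2.5, 2.2, 1.9, 1.6, 1.4, 1.2]
thickness_nm = thickness_from_ratio(ratio, t_grid_nm, curve)
```

A nearest-entry search is used instead of interpolation because a real $a_{000}/a_{004}$ curve need not be monotonic in thickness; in that case the search range would have to be restricted to a plausible thickness interval.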
E. The Evaluation Procedure

In this section the individual steps of the evaluation process are listed. Figure 31 gives an overview of the analysis steps that are described in the following.

FIGURE 31. Schematic drawing showing the individual analysis steps of the CELFA procedure. The rectangles indicate procedures and the ellipses are the data that are used for a particular procedure. In the first step (1), N images of a defocus series (squares with rounded corners) are used to evaluate the sample thickness t (Eq. (58) and Figs. 29 and 30) and the phase angles $\chi_i$ (Eqs. (49) and (51)). One image i with a large value of $|J_{002}|$ is noise reduced (2) and oriented so that the fringes run in a horizontal direction (3). Using a Fourier-filtered image formed by the (004) reflection only, a 2-D grid is calculated that subdivides the image into image unit cells (4). The next step (5) involves the computation of reflection amplitudes and phases and the measurement of the real factors $T_{002}$ and $T_{004}$, either as mean values inside a reference area or locally inside regions with x = 0. In the latter case, a local map of either $T_{002}/T_{004}$ or $T_{002}$ is then computed and used for concentration determination according to Eq. (53) or Eq. (49), respectively. In steps (6) and (7), experimental values of either $|J_{002}|/|J_{004}|$ or $|J_{002}|$ obtained from each image unit cell are compared with a list of values of $|J_{002}|/|J_{004}|$ or of $|J_{002}|$, which are computed with Eqs. (49) and (50). The entry with the best fit yields the evaluated concentration x.

1. Determination of sample thickness and $\chi_n$. Sample thickness and angle $\chi_n$ are measured from the unprocessed N images of a defocus series as described in Section III-D.

2. Noise reduction. The image n of the defocus series to be evaluated is Fourier transformed, and the filter of Eq. (4) is computed (Section II-A-1) and subsequently applied to the Fourier transform. Only circular areas around the ±(002) and the ±(004) reflections as well as the central pixel are used for the inverse Fourier transformation. The circles have to be chosen large enough that relevant information is not lost (e.g., satellite reflections close to the (002) reflection in the case of a compositional superlattice must be included).

3. Correction of the image orientation. The fringe patterns recorded with an on-line CCD camera are arbitrarily oriented. However, the CELFA procedure of the DALI program package requires the fringes to run along the horizontal direction. Therefore, the image has to be rotated with the "cell transformation" procedure outlined in Section II-B-1, where the whole image is used as the "cell". The rotation angle is computed from a line that is drawn by the user parallel to a fringe.

4. Subdivision of the image into image unit cells. In the case of the fringe pattern, the 2-D gridding is connected with two difficulties. First, information is not available for the positions of the grid lines perpendicular to the fringes. Their distance, therefore, has to be chosen. In most cases square image cells seem meaningful. The horizontal grid lines parallel to the fringe pattern are found by searching for brightness maxima along the chosen vertical grid lines perpendicular to the fringes. For measurement of the Fourier amplitudes, their minimum distance is given by the spacing of two bright fringes, $d_{002} = a_0/2$, with $a_0$ being the lattice parameter. Often, the distance $2d_{002}$ is chosen in order to improve the accuracy of the composition evaluation.
In both cases, the horizontal grid lines lie on either the bright or the less bright fringes in the regions with x = 0. However, the condition of the search procedure, in which the grid lines have to be positioned on either the bright or the less bright fringes, is not sufficient if their brightnesses interchange for concentrations $x > x_0$. The value $x_0$ corresponds to the concentration where the amplitude $a_{002}$ of the (002) reflection vanishes, for example, $x_0 = 0.22$ for In$_x$Ga$_{1-x}$As. This second difficulty of the gridding can be surmounted by the following procedure. The image is Fourier transformed and the inverse transformation is performed with the ±(004) reflections only. Intensity maxima positions are searched along the chosen vertical grid lines, leading to a distance $d_{002}/2$ of the horizontal grid lines. Then, horizontal grid lines are deleted in such a way that either each second or each fourth line is kept. The reduced grid is indexed as described in Section II-A-2 and lattice base vectors are calculated according to Section II-A-3. Next, the positions of the grid lines with respect to the original image have to be checked if one aims for positioning the grid lines either (a) on the bright or (b) on the less bright fringes in the region with x = 0. Figure 32 explains why these two options are not equivalent in particular cases. Let us assume that the bright fringe in the center is induced by one row containing Al atoms with the concentration $x_1$, which is embedded in GaAs with the interfaces parallel to the vertical grid lines. Let us furthermore assume that the distance of horizontal grid lines is $d_{002}$. In case (a), there will be two rows of cells that reveal an Al-concentration of $x_1/2$, whereas in case (b) we find only one row that shows a concentration $x_1$. If the grid lines are not positioned as intended, they can be shifted by one-fourth or one-half of the vertical lattice base vector. The resulting grid that decomposes the image into unit cells will be used for the following analysis steps.

FIGURE 32. Schematic drawings explaining the difference if either (a) the bright fringes or (b) the dark fringes lie at the image unit cell corners. Note that the effect may also occur conversely.

5. Determination of $T_{002}$ and $T_{004}$. A table is calculated with the Bloch-wave method that lists the values $a_{000}$, $a_{002}$, $a_{004}$ and $p_{000}$, $p_{002}$, $p_{004}$ as a function of the In-concentration x for the relevant sample thickness. A stepsize of 1% is chosen for the In-concentration. A "reference region" is selected in an area with x = 0 of the image n, and $|J(g_{002})|$ and $|J(g_{004})|$ are measured and averaged over all unit cells contained in the reference region. Then, $T_{002}$ and $T_{004}$ are calculated from Eqs. (49) and (50) using the sample thickness (Eq. (58)) and the angle $\chi_n$ (see Eq. (56)) obtained by the procedure in the previous section as well as $a_{000}$, $a_{002}$, $a_{004}$ and $p_{000}$, $p_{002}$, $p_{004}$ calculated for x = 0 with EMS using the Bloch-wave method. The procedure that is indicated by the dotted box in Fig. 31 is only applied if the sample thickness and defocus change significantly in the area of interest (see Section III-F).

6. A second table, computed from the first one generated in step 5, lists the values of the right-hand side of Eq. (53).

7. The ratio $|J_{002}|/|J_{004}|$ for each image unit cell of the experimental image is compared with the calculated values of the second table formed in step 6.
The table entry with the best agreement yields the local In-content x. Figure 33a is a color-coded map of the local In-content obtained for image 1 of the defocus series. The yellow regions correspond to the GaAs and the green to red regions show the In-content in the In$_x$Ga$_{1-x}$As layer.

FIGURE 33. Color-coded maps of the local In-content x of a nominally 2 nm thick In$_x$Ga$_{1-x}$As layer capped with 10 nm GaAs. The cap layer is to the right of the In$_x$Ga$_{1-x}$As. The maps are obtained from (a) $|J_{002}|/|J_{004}|$ and (b) $|J_{002}|$ of local unit cells using a mean value of (a) $T_{002}/T_{004}$ and (b) $T_{002}$ calculated from the GaAs buffer on the left-hand side of the In$_x$Ga$_{1-x}$As. (See also Plate 18.)

Steps 1 to 7 describe the "standard" evaluation procedure CELFA. However, in some cases it may be favorable to apply variants. A first variant of the standard procedure that is contained in the DALI program package does not use the ratio $|J_{002}|/|J_{004}|$ for composition determination but only exploits $|J_{002}|$. This variant is meaningful if only small variations of $|J_{002}|$ have to be detected with high accuracy. It is advantageous that errors of the measurement of $|J_{004}|$ do not contribute to the evaluation of x. The disadvantage is that this variant does not take account of disturbances on the sample surfaces (amorphization by the ion-milling procedure, oxides, contaminants). However, the presence of such imperfections can be checked beforehand with the standard procedure. In this variant, steps 6 and 7 of the standard procedure have to be altered. In step 6, the second table has to list the values of the right-hand side of Eq. (49) instead of Eq. (53). In step 7, $|J_{002}|$ is compared with the second table formed in step 6. Figure 33b gives the result for the described variant. Differences between Fig. 33a and b can only be recognized for the highest In-concentrations, where the errors of the measurement of $|J_{004}|$ lead to the largest deviations $\Delta x$. A second variant that is described in the next section takes account of imaging conditions that may vary across the image.
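The comparison in steps 6 and 7 amounts to a nearest-entry search in a precomputed table. The following Python sketch illustrates that idea only; it is not the DALI implementation. The function name is hypothetical, and the table values are an arbitrary monotonic placeholder rather than values computed from Eq. (53) or Eq. (49).

```python
import numpy as np


def best_matching_concentration(measured, table_x, table_values):
    """For each image unit cell, return the concentration of the closest table entry."""
    measured = np.asarray(measured, dtype=float)
    table_x = np.asarray(table_x, dtype=float)
    table_values = np.asarray(table_values, dtype=float)
    # Distance of every cell value to every table entry: shape (..., n_entries).
    distance = np.abs(measured[..., np.newaxis] - table_values)
    return table_x[np.argmin(distance, axis=-1)]


# Concentration grid with the 1% stepsize mentioned in step 5.
x_grid = np.arange(0, 101) / 100.0

# Placeholder table (assumption): a real table would hold the right-hand side
# of Eq. (53), or of Eq. (49) for the |J002|-only variant.
table = 2.0 * (1.0 - x_grid)

# Hypothetical measured values for a 2 x 2 block of image unit cells.
cells = np.array([[2.0, 1.9], [1.0, 1.404]])
x_map = best_matching_concentration(cells, x_grid, table)
```

If the tabulated quantity is not monotonic in x, several entries can match equally well, and `np.argmin` then returns the first of them; additional information would be needed to resolve such cases.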
F. Correction of Imaging Conditions Varying Across the Image

Figure 34a shows a map of the In-content calculated from the largest possible section of image 1. The reference area was chosen in the upper left corner. Obviously, the evaluated In-content does not vanish in the whole GaAs region but increases from the top to the bottom. This artifact is due to the slow variation of the imaging conditions such as the defocus and the sample thickness, which affects the local validity of $T_{002}/T_{004}$ calculated in the reference region. In this section we describe the correction of this effect, leading to a higher accuracy of the evaluated In-concentrations.

FIGURE 34. Color-coded maps of the local In-content x obtained from $|J_{002}|/|J_{004}|$ of local unit cells using (a) a mean value of $T_{002}/T_{004}$ calculated from an area of 10 x 10 cells in the upper left corner of the image, and (b) a local map of $T_{002}/T_{004}$. (See also Plate 19.)

The procedure is based on the calculation of a map that yields the appropriate values of $T_{002}/T_{004}$ for each image unit cell. $T_{002}/T_{004}$ is easily obtained for each unit cell inside the regions with x = 0 by the application of Eqs. (49) and (50). Figure 35a shows a color-coded map of $T_{002}/T_{004}$ calculated in the GaAs. We clearly recognize the variation of $T_{002}/T_{004}$ from the top to the bottom of the image. The values for $T_{002}/T_{004}$ in the regions with x > 0 are extrapolated from those with x = 0. Optionally, the value $T_{002}/T_{004}$ of each cell inside the region with x = 0 can be averaged by the computation of a mean value with neighbor cells. Figure 35b shows the completed map for $T_{002}/T_{004}$.

FIGURE 35. (a) Local values of $T_{002}/T_{004}$ computed in two regions to the left and the right of the In$_x$Ga$_{1-x}$As; (b) local map of $T_{002}/T_{004}$ obtained after averaging and extrapolation of the values shown in (a). (See also Plate 20.)

The evaluation process of Section III-E has to be altered in such a way that the second table generated in step 6 is calculated for each image unit cell separately using the appropriate value for $T_{002}/T_{004}$ contained in the map (Fig. 35b). The result is shown in Fig. 34b. Obviously, the artifacts visible in Fig. 34a are removed in Fig. 34b. In cases where only $|J_{002}|$ instead of $|J_{002}|/|J_{004}|$ is exploited for the composition evaluation as described in the previous section, a map of local values of $T_{002}$ is generated instead of $T_{002}/T_{004}$.
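One possible way to build such a completed map is sketched below in Python. The text does not specify how the averaging and extrapolation are carried out, so the choices here are assumptions: cells with x > 0 are marked as NaN, known cells are optionally replaced by the mean of their known 3 x 3 neighbors, and unknown cells are filled row by row by linear interpolation between known cells on either side of the layer. Function names and numbers are hypothetical.

```python
import numpy as np


def smooth_known_cells(t_map):
    """Replace each known cell by the mean of the known cells in its 3x3 neighborhood."""
    t_map = np.asarray(t_map, dtype=float)
    rows, cols = t_map.shape
    out = np.full_like(t_map, np.nan)
    for r in range(rows):
        for c in range(cols):
            if np.isnan(t_map[r, c]):
                continue  # cells inside the layer stay unknown for now
            block = t_map[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            out[r, c] = np.nanmean(block)
    return out


def fill_unknown_cells(t_map):
    """Fill NaN cells row by row by linear interpolation between known cells."""
    t_map = np.asarray(t_map, dtype=float)
    filled = t_map.copy()
    cols = np.arange(t_map.shape[1])
    for r in range(t_map.shape[0]):
        known = ~np.isnan(t_map[r])
        if known.any():
            filled[r] = np.interp(cols, cols[known], t_map[r, known])
    return filled


# Hypothetical T002/T004 values: columns 2 and 3 lie inside the layer (x > 0).
t_map = np.array([
    [1.00, 1.00, np.nan, np.nan, 1.30],
    [1.10, 1.10, np.nan, np.nan, 1.40],
])
completed = fill_unknown_cells(t_map)
```

Note that `np.interp` holds the nearest known value constant outside the range of known cells, so a layer touching the image border would be filled with a constant rather than a true extrapolation.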
G. Errors of the Composition Detection Due to Sample Thickness Uncertainties

In the following, we discuss the accuracy of the evaluation of the In-concentration depending on the error $\Delta t$ of the determination of the sample thickness. First, we introduce the abbreviation

$$S(x, t, \varphi) = \sqrt{\frac{1}{a_{004}^{2}} + \frac{1}{a_{000}^{2}} + \frac{2\cos(\varphi)}{a_{004}a_{000}}}. \tag{59}$$

Next, we use the following approximation for $a_{002}(x, t)$:

$$a_{002}(x, t) = a_{002}(0, t) \cdot c(x) \quad \text{with } c(0) = 1 \text{ and } c(x_0) = 0, \tag{60}$$

where $x_0$ is the concentration for which the amplitude $a_{002}$ vanishes. For Al$_x$Ga$_{1-x}$As with $a_{002} > 0$ for the whole range of concentrations, $x_0$ is obtained by extrapolation. The values are $x_0^{InGaAs} = 0.22$, $x_0^{AlGaAs} = -0.28$, and $x_0^{CdZnSe} = 0.41$. The function $a_{002}(x, t)/a_{002}(0, t)$ is shown in Fig. 36a, b, c for In$_x$Ga$_{1-x}$As, Cd$_x$Zn$_{1-x}$Se, and Al$_x$Ga$_{1-x}$As, respectively. Figure 36 clearly indicates the validity of the approximation in Eq. (60) because the curve is nearly independent of sample thickness t. In the case of Al$_x$Ga$_{1-x}$As, Eq. (60) can be applied either for Al-contents below 50% or for sample thicknesses below 20 nm. Furthermore, we recognize that c(x) can be regarded as a linear function $c(x) = c \cdot (x - x_0)$ in good approximation. Now we use Eq. (53) to deduce:
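[Eq. (61) is not legible in the source; the surviving fragments show that it defines the factor G and marks one factor as equal to 1 for x = 0.] (61)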
The value of G is obtained for x = 0 in the fifth step of the evaluation procedure of Section III-E by the determination of $(T_{002}/T_{004})_{measured}$ from $(|J_{002}|/|J_{004}|)_{measured}$ inside the reference region. The obtained value of G is valid for all In-concentrations x because all factors that contribute to G are independent of x. An error $\Delta t$ of the thickness measurement, therefore, affects only $S(x, t, \varphi)$.

Furthermore, it is necessary to understand that it is not the absolute value of $S(x, t, \varphi)$ that may cause an error of the x-measurement; only deviations of $S(x, t, \varphi)/S(0, t, \varphi)$ for different thicknesses are relevant. For the explanation let us assume that the measured thickness t deviates from the real thickness $t_{real}$ by $\Delta t$. In this case, the value $(T_{002}/T_{004})_{measured}$ that is determined from Eq. (61) (using the measured thickness t) in the GaAs will also deviate from the real value. Let us assume that we now use Eq. (61) for the measurement of x in a region with x > 0. From Eq. (61) it becomes obvious that the determined value of x will be correct if $S(x, t, \varphi)/S(0, t, \varphi) = S(x, t_{real}, \varphi)/S(0, t_{real}, \varphi)$. Therefore, only deviations of $S(x, t, \varphi)/S(0, t, \varphi)$ from $S(x, t_{real}, \varphi)/S(0, t_{real}, \varphi)$ may cause an error of the determination of x.

FIGURE 36. Amplitude $a_{002}(x, t)$ of the (002) beam normalized with respect to $a_{002}(0, t)$, plotted versus the sample thickness t and the In-concentration x for: (a) In$_x$Ga$_{1-x}$As; (b) Cd$_x$Zn$_{1-x}$Se; and (c) Al$_x$Ga$_{1-x}$As.

For convenience, we will use the abbreviations $(|J_{002}|/|J_{004}|)_{measured} =: M$ and $S(x, t, \varphi)/S(0, t, \varphi) =: S_n(x, t, \varphi)$ in the following. From Eq. (61) we obtain
From Eq. (62) we obtain the error
In order to estimate the maximum error with respect to $\varphi$, we use the phase $\varphi_{max}$ that maximizes $dS_n(x, t, \varphi)/dt$, yielding
Finally, we obtain by inserting $S_n(x, t, \varphi) = S(x, t, \varphi)/S(0, t, \varphi)$, multiplying with $(x - x_0)/x$ and taking the absolute value
The color-coded map of Fig. 37a shows the result of Eq. (65) for In$_x$Ga$_{1-x}$As. If we assume a measured thickness of t = 15 nm with an error of $\Delta t = \pm 5$ nm, we obtain an error of $\Delta x/x = 0.007\ \mathrm{nm}^{-1} \times 5\ \mathrm{nm} \approx 0.04$ at an In-content of 60%, which would lead to x = (60 ± 2)%. The plot shows large errors at thicknesses close to 45 nm, which can be attributed to the complex behavior of the $a_{000}(x, t)$ as well as the $a_{004}(x, t)$ curves in this region, as can be seen in Fig. 25. The error vanishes at an In-concentration of 22% because $a_{002}(x = 22\%) \approx 0$ for all thicknesses. Similar results are obtained for Cd$_x$Zn$_{1-x}$Se and Al$_x$Ga$_{1-x}$As as shown in Figs. 37b and c. If only $|J_{002}|$ is used for the evaluation instead of $|J_{002}|/|J_{004}|$, Eq. (65) holds if the following definition for $S(x, t, \varphi)$ is used instead of Eq. (59):

$$S(x, t, \varphi) = \sqrt{a_{004}^{2} + a_{000}^{2} + 2a_{004}a_{000}\cos(\varphi)}. \tag{66}$$

In Figs. 37d, e, and f the results are plotted for Eq. (65) using Eq. (66) for In$_x$Ga$_{1-x}$As, Cd$_x$Zn$_{1-x}$Se, and Al$_x$Ga$_{1-x}$As, respectively.
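The worked numbers above can be retraced with a short Python sketch. It covers only the arithmetic of the quoted example (a relative error of 0.007 per nm of thickness uncertainty, as read from Fig. 37a) and the abbreviation of Eq. (66) as read here; it does not implement Eq. (65), and the function names and the amplitudes passed to `s_amplitude` are hypothetical.

```python
import math


def s_amplitude(a000, a004, phi):
    """S(x, t, phi) of Eq. (66), used when only |J002| is evaluated."""
    return math.sqrt(a004**2 + a000**2 + 2.0 * a004 * a000 * math.cos(phi))


def concentration_error(x_percent, rel_error_per_nm, delta_t_nm):
    """Return (relative error, absolute error in percentage points) of x."""
    relative = rel_error_per_nm * delta_t_nm
    return relative, relative * x_percent


# Example from the text: 0.007 nm^-1, thickness uncertainty 5 nm, In-content 60%.
relative, absolute = concentration_error(60.0, 0.007, 5.0)
# relative is 0.035 (quoted as about 0.04); absolute is 2.1, i.e. x = (60 +/- 2)%.

# Limiting cases of Eq. (66) for made-up amplitudes: phi = 0 gives a000 + a004,
# phi = pi gives |a000 - a004|, matching the extrema used in Eq. (57).
s_in_phase = s_amplitude(3.0, 2.0, 0.0)
s_out_of_phase = s_amplitude(3.0, 2.0, math.pi)
```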
FIGURE 37. Color-coded maps of the relative error $\Delta x/x$ per $\Delta t = 1$ nm uncertainty of the measured sample thickness for (a, d) In$_x$Ga$_{1-x}$As, (b, e) Cd$_x$Zn$_{1-x}$Se, and (c, f) Al$_x$Ga$_{1-x}$As using (a, b, c) $|J_{002}|/|J_{004}|$ and (d, e, f) $|J_{002}|$ for the composition determination. The graphs were computed according to Eq. (65) using the definition of $S(x, t, \varphi)$ given in Eq. (59) for (a, b, c) and in Eq. (66) for (d, e, f). (See also Plate 21.)
IV. APPLICATIONS

We describe here some applications of the evaluation techniques outlined in the previous sections. The examples are given in chronological order. The strain-state analysis was first carried out with ZnSe/Cd$_x$Zn$_{1-x}$Se/ZnSe/GaAs (001) heterostructures (Section IV-A). A first aim (Section IV-A-1) was verification of the validity of our implementation of the strain-state analysis in the DALI program package. For that purpose, our results were compared with reflection high-energy electron diffraction (RHEED) investigations of as-grown MBE samples. The next step was the measurement of the diffusion coefficient for the diffusion of Cd in ZnSe in the temperature range 340-400°C. Then we investigated free-standing (Section IV-B-1) and buried (Section IV-B-2) In$_x$Ga$_{1-x}$As Stranski-Krastanow islands. In free-standing islands, an inhomogeneous distribution of the In-concentration was measured where the mean In-content depends on the growth temperature. One of our main interests concerned the transformation of the morphology of the islands, which is caused by overgrowth with GaAs. The results that were obtained by strain-state analysis and the CELFA method provided valuable structural data for interpretation of optical spectra. The high accuracy of the CELFA procedure facilitated the quantitative investigation of the thickness and composition of the wetting layer (Section IV-B-2). A further application of the strain-state analysis that is given in Section IV-C deals with the measurement of displacements at a ZnSe/ZnTe interface that contains an array of misfit dislocations. Moreover, first results of recent CELFA evaluations concerning Cd-content fluctuations of CdSe layers in ZnSe are presented in Section IV-D.
A. Strain-State Analysis of Cd$_x$Zn$_{1-x}$Se/ZnSe Heterostructures

The Cd$_x$Zn$_{1-x}$Se/ZnSe quantum wells (QWs) were grown and investigated in the Institute for Experimental and Applied Physics, University of Regensburg (Reisinger et al., 1996). The growth was performed on (001)-oriented GaAs substrates in a conventional MBE system with elemental sources. The substrates were degreased and etched in standard solutions and then transferred into the MBE chamber, where they were annealed for 5 min at 350°C before purging in H-plasma from an RF-plasma discharge source. The deoxidation process ran for 10 min at 300 W RF-power. The substrate temperature was lowered after 5 min to the growth temperature, which was kept constant at $T_G = 300$°C. For the growth of the QWs, Cd was supplied additionally, whereas the Zn and Se fluxes remained unchanged. Growth and composition of the heterostructures were controlled by RHEED. The RHEED system consists of a 35 keV electron gun and a fluorescence screen from which the RHEED pattern was scanned with a high-sensitivity TV camera connected to a personal computer. In order to investigate the RHEED oscillations, the time-dependent integral intensity of a rectangular area around the specular spot was measured. The incident angle of the electron beam was 1.2°, the accelerating voltage 10 kV and the azimuth either [110] or [1$\bar{1}$0]. For the TEM investigations, {110} cross-sectional samples were conventionally prepared. In the final stage of thinning, two Ar$^+$ ion guns were used under an incidence angle of 14°. During ion milling with a GATAN Dual-Ion mill the sample was kept in an LN$_2$-cooled specimen holder. The HRTEM micrographs were obtained with a Philips CM30 microscope equipped with a twin lens. The acceleration voltage was 300 kV and the Scherzer resolution 0.23 nm.

1. Investigations of As-Grown Samples by Strain-State Analysis and RHEED
Five samples were grown with different Cd$_x$Zn$_{1-x}$Se interlayer thicknesses and Cd-concentrations x. Figure 38a shows two example micrographs of the samples MBE180 and MBE336, which represent the samples with the thinnest and the thickest Cd$_x$Zn$_{1-x}$Se insertions. The interlayers can be identified as vertically oriented dark bands. Figure 38b represents the displacement components in growth direction of three samples, which were averaged in planes parallel to the interface plane. The regions with increasing displacement values allow the determination of the QW thicknesses that are given in Table II. According to Eq. (31), the maximum displacements $u_{max}$ yield the integral Cd-content. The factor $\alpha_R$ of Eq. (31) was calculated according to an analytical solution for the elastic relaxation of the tetragonal distortion in interlayers with homogeneous composition, which was found by Treacy and Gibson (1986). The sample thicknesses were assumed to lie in the range of 10-30 nm. Averaged Cd-concentrations x are determined by dividing the integral Cd-contents by the thicknesses in Table II. The results are shown in Table III. The averaged lattice plane distances normalized with the ZnSe lattice parameter are given in Fig. 38c.

FIGURE 38. (a) (110) HRTEM micrographs of the Cd$_x$Zn$_{1-x}$Se/ZnSe quantum well structures MBE180 and MBE336; (b) averaged displacements for each monolayer in growth direction; and (c) averaged lattice parameters obtained for the samples MBE180, MBE337, and MBE336. Evaluation results and corresponding RHEED results can be found in Tables II and III.

TABLE II. Cd$_x$Zn$_{1-x}$Se interlayer thicknesses $d_{Cd}$ determined by the strain-state analysis and RHEED for various ZnSe/Cd$_x$Zn$_{1-x}$Se/ZnSe/GaAs (001) samples grown by MBE

$d_{Cd}$ / [ML]          MBE 180      MBE 216       MBE 335       MBE 336       MBE 337
Strain-state analysis    6.4 ± 0.5    23 ± 1        13.7 ± 0.7    36.9 ± 1.6    11.3 ± 1.0
RHEED                    6.9 ± 0.2    23.4 ± 0.6    14.1 ± 0.4    36.2 ± 0.7    12.0 ± 0.3

TABLE III. Comparison of the Cd concentrations $x_{Cd}$ obtained by strain-state analysis with RHEED measurements for the samples that are listed in Table II

$x_{Cd}$ / [%]           MBE 180      MBE 216       MBE 335       MBE 336       MBE 337
Strain-state analysis    46 ± 4       61 ± 5        38 ± 3        29 ± 5        31 ± 3
RHEED                    45 ± 3       59 ± 4        42 ± 4        29 ± 4        31 ± 4

FIGURE 39. Specular-spot intensity plotted versus the growth time (horizontal axis: growth time in s) for (upper curve) ZnSe and (lower curve) a Cd$_x$Zn$_{1-x}$Se quantum well. The exponential decay as well as the decreasing oscillation period with increasing growth time of the lower curve indicate a broadening of the transition region between ZnSe and Cd$_x$Zn$_{1-x}$Se.

Tables II and III also contain the QW thicknesses and Cd-concentrations x, which were determined by RHEED using the following procedure (Reisinger et al., 1996; Rosenauer et al., 1995): The intensity oscillations (Fig. 39) of the specular spot (SS) are recorded with a high-sensitivity TV camera and a personal computer equipped with a frame grabber. Each period in the RHEED oscillation curve corresponds to the growth of one monolayer (ML). With the preceding experimental parameters, the curve starts with a maximum. Each of the following maxima corresponds to a further complete ML. After the growth of a Cd$_x$Zn$_{1-x}$Se QW it is observed that the barrier grows faster, that is, the time interval between two maxima is shortened compared to the deposition of the pure ZnSe buffer layer. This behavior is connected with the appearance of alloy mixing. In the present case Cd$_x$Zn$_{1-x}$Se is formed where x decreases with increasing number of monolayers. The local Cd-concentration is determined by using the change of growth rate. The exact time period $T_{ZnSe}$ of the ZnSe growth is obtained from the RHEED oscillation curve of the ZnSe buffer. This curve (upper curve in Fig. 39) is fitted by
where $I_{cos}(t)$ describes a damped cosine oscillation with the frequency 1/T. The damping is due mainly to a successive increase of the surface roughness. In Eq. (67), $I_{exp}(t)$ is an exponential function that takes into account the change of intensity when different sources are offered, due to different atomic scattering factors of the elements. From the work of Gaines and Ponzoni (1994) we use the relation

for the determination of the Cd-content x, where $1/T_{CdZnSe}$ indicates the growth rate of the ternary and $1/T_{ZnSe}$ that of the binary compound. Note that Eq. (68) is applied to each single monolayer j. Using the detection of RHEED oscillations during growth, the computer control of the epitaxial process enables the growth of QWs with monolayer accuracy. Comparing the results of the strain-state analysis with the RHEED measurements in Tables II and III reveals good agreement.

FIGURE 40. (a) Averaged displacements and (b) averaged lattice parameters plotted versus the number j of the (002) plane in growth direction; $u_{max}$ and $\bar{a} - 1$ as well as the thickness of the transition regions are used to determine the amount of Cd contained in the transition regions.

Figure 40 shows the mean displacements and lattice parameters obtained for the sample MBE216. In Fig. 40b a subdivision of the region with mean lattice parameters (normalized with the ZnSe lattice parameter) larger than 1 into both transition regions and the QW region is shown. In the following we will estimate the amount of Cd contained in the transition regions, which can be regarded as a measure of the interfacial roughness. The number of MLs Cd$_x$Zn$_{1-x}$Se with a composition corresponding to the mean lattice parameter $\bar{a}$ of the QW region (Fig. 40b) is obtained by

For the sample MBE216 we obtain $N_{Cd_xZn_{1-x}Se} = 1.62/0.074$ ML = 21.9 ML. As shown in Fig. 40b, the QW region consists of 19 MLs. Therefore, 21.9 - 19.0 = 2.9 MLs Cd$_x$Zn$_{1-x}$Se are contained in the transition regions. Each interface represents a transition region containing about 1.5 ML Cd$_x$Zn$_{1-x}$Se. With the Cd-concentration of 61% given in Table III we obtain an amount of Cd inside each of the two transition regions corresponding to 0.9 ML CdSe. Table IV gives an overview of the Cd-contents of the transition regions of the five investigated samples, ranging from 0.2 to 0.9 ML. However, these values should be regarded as estimates because the subdivision into QW region and transition region is somewhat arbitrary. In conclusion we can give a mean value of 0.4 ML CdSe per transition region.
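The estimate for sample MBE216 can be retraced with the following Python sketch. It assumes, as the quoted quotient 1.62/0.074 suggests, that the equivalent number of monolayers is the maximum displacement divided by the mean normalized lattice-parameter excess of the QW region; the function and variable names are hypothetical. The per-sample values are those listed in Table IV.

```python
def transition_region_content(u_max_ml, lattice_excess, qw_monolayers,
                              cd_fraction, n_interfaces=2):
    """Estimate the CdSe content per transition region in monolayers (ML)."""
    n_total = u_max_ml / lattice_excess        # equivalent MLs of the ternary
    n_transition = n_total - qw_monolayers     # MLs outside the QW region
    per_interface = n_transition / n_interfaces
    cdse_per_interface = per_interface * cd_fraction
    return n_total, n_transition, per_interface, cdse_per_interface


# Numbers quoted in the text for MBE216: 1.62 / 0.074, 19 MLs in the QW, x = 61%.
n_total, n_transition, per_interface, cdse = transition_region_content(
    u_max_ml=1.62, lattice_excess=0.074, qw_monolayers=19, cd_fraction=0.61)

# CdSe content per transition region from Table IV, in ML.
table_iv = {"MBE 180": 0.23, "MBE 216": 0.9, "MBE 335": 0.4,
            "MBE 336": 0.45, "MBE 337": 0.2}
mean_content = sum(table_iv.values()) / len(table_iv)
```

Without intermediate rounding the per-interface value is about 1.45 ML rather than the rounded 1.5 ML used in the text; both lead to roughly 0.9 ML CdSe, and the mean over the five samples rounds to the 0.4 ML quoted above.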
TABLE IV. CdSe content in each transition region between the Cd$_x$Zn$_{1-x}$Se and the ZnSe matrix

                  MBE 180    MBE 216    MBE 335    MBE 336    MBE 337
Content [ML]      0.23       0.9        0.4        0.45       0.2
2. The Determination of Cd Diffusion in CdSe/ZnSe Single Quantum Wells

A ZnSe/CdSe/ZnSe/GaAs (001) sample was grown by MBE as described in the preceding. The deposited amount of Cd was equivalent to 2 ML CdSe. The ZnSe/CdSe/ZnSe thickness was about 60 nm, which is well below the critical thickness for the generation of misfit dislocations as seen in Rosenauer et al. (1996). The sample was cleaved into four pieces. The pieces were annealed at different temperatures (337°C, 367°C, 382°C, and 394°C) for 1 h in an N$_2$-atmosphere and then rapidly cooled to room temperature (RT). After annealing, the samples were prepared for TEM. The Cd-concentration profile shown in Fig. 41, detected in situ with RHEED as described in Section IV-A-1, represents the profile of the as-grown sample. Figure 41 clearly shows that a 6-ML thick mixed crystal is obtained instead of a 2-ML thick CdSe interlayer.

FIGURE 41. Cd-concentration profile in growth direction (horizontal axis: number of monolayer j) evaluated from RHEED data for the nominally 2 ML thick CdSe quantum well embedded in a ZnSe matrix (Rosenauer et al., 1995).

FIGURE 42. (110) HRTEM lattice images of the interlayer regions (ZnSe / Cd$_x$Zn$_{1-x}$Se / ZnSe) of the samples annealed at (a) 337°C, (b) 367°C, and (c) 394°C (Rosenauer et al., 1995).

Figure 42a, b, and c show HRTEM images of the specimens that were annealed at 337°C, 367°C, and 394°C, respectively. Figure 43 gives the displacement curves that were measured with the DALI program package. The displacement curves reveal a significant broadening of the QW region that increases with annealing temperature. In Fig. 43, the 337°C plot also shows the displacement curve that is directly evaluated from the RHEED data of Fig. 41. Good agreement with the strain-state analysis suggests a negligible diffusion of Cd at an annealing temperature of 337°C.
FIGURE 43. Averaged displacements plotted versus the (002)-plane number j for the different annealing temperatures. The upper curve represents the displacement curve expected for the concentration profile shown in Fig. 41 that corresponds to a diffusion coefficient close to zero. The given values for the diffusion coefficients $D^{Eval}$ are directly determined from the experimental data by a fit of Eqs. (77) and (78) (Rosenauer et al., 1995).
PLATE 23. Schematic drawing of the origin of the red regions in Fig. 50. The region with increased In-concentration induces a bending of the horizontal lattice planes corresponding to an enlarged displacement component in growth direction.

PLATE 24. Color-coded maps of the local In-concentration of (a) the wetting layer and (b) an island.

PLATE 25. (a) Finite element model for the specimen region evaluated in Fig. 50. The thickness in the In$_x$Ga$_{1-x}$As region is obtained by an extrapolation of the thicknesses of neighboring GaAs regions. The light-blue grid shows the finite elements. The shape of the model also contains the influence of the displacement field (bowing of the surface of the In$_x$Ga$_{1-x}$As) amplified by a factor of 20. The colors correspond to the displacement vector field component in growth direction. The legend is given in (b), which shows displacement vector field components evaluated from a projected 3-D atomic model that was deduced from the results of the FE-calculation shown in (a).

PLATE 26. {110} HRTEM image and gridding (red) of the interface between the ZnSe and ZnTe of a heterostructure grown on GaAs(001) with MOVPE. The Burgers circuits around the misfit dislocations are marked in green, indicating three Lomer (LO) and one 60° dislocation.
... 25) indicates an overall Cd content that is equivalent to 0.97 ML CdSe.
ATOMIC SCALE STRAIN AND COMPOSITION EVALUATION
FIGURE 65. Amplitude of the (002) reflection in the diffractogram plotted versus the image number of the images of a defocus series. The values J_max and J_min are used to derive the specimen thickness. The corresponding crystal region was chosen inside the ZnSe buffer layer, yielding an upper limit of the thickness.
growth direction. Fitting Eq. (85) to the experimental profile yields A = 0.89 ML CdSe and w = 5.19 ML. To investigate the influence of the crystal tilt as it relates to the shape of the real concentration profile, we first have to find a function X_r that describes the real profile. Furthermore, we will approximate the projected profile X_r^tilted of the tilted specimen, which will then be compared with the experimental profile. Using this procedure, we will determine which “real” profiles X_r lead to matching profiles in the case of a tilted crystal. For this purpose, the functional relationship of Eq. (86), previously used in Eq. (72), seems to be well suited. Parameter P describes the sharpness of the transition regions of the profile, which approaches a rectangular (Heaviside-type) function of width b for P → 0. In this case, the function constitutes the “ideally sharp” profile with a homogeneous Cd-content of 89% and a width of b MLs, which is chosen as b = 1 ML to be consistent with the experiment. Parameter A is equal to the area below the curve (as
A. ROSENAUER AND D. GERTHSEN
in Eq. (85)). Therefore, parameter P can be varied without changing the area below the curve, which is identical with the entire deposited amount of CdSe. Note that the entire amount of CdSe is known from Fig. 63 because it cannot be affected by crystal tilt. Figure 67a shows the profiles obtained from Eq. (86) for several values of the parameter P. The next step is the approximation of the crystal tilt sketched in Fig. 66. The crystal of thickness T in electron beam direction is decomposed into slices of thickness dt, which are shifted in such a way that the projected profile approximates (and for dt → 0 is identical with) the projected profile of the tilted sample. This transformation of the profile is described by

$$X_r^{\mathrm{tilted}}(z, P) = \frac{1}{T} \int_0^T X_r\bigl(z - (t - T/2)\tan\varphi,\; P\bigr)\, dt. \qquad (87)$$
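The slice decomposition behind Eq. (87) can be sketched numerically. The following is a minimal illustration, not the authors' implementation. Because Eq. (86) is not reproduced here, the profile function `real_profile` uses an assumed error-function form whose area is A and whose plateau width is b; its parametrization of P need not coincide with that of Eq. (86). The conversion factor `ml_nm` (0.283 nm per (002) plane, roughly half the ZnSe lattice parameter) is likewise an assumption, as are all function names.

```python
import math

def real_profile(z, P, A=0.89, b=1.0):
    """Assumed erf-type profile (z, P, b in ML): area A, plateau width b, edge sharpness P."""
    return A / (2.0 * b) * (math.erf((z + b / 2) / P) - math.erf((z - b / 2) / P))

def tilted_profile(z, P, T_nm=15.0, phi_deg=4.0, ml_nm=0.283, n_slices=200):
    """Midpoint-rule approximation of Eq. (87): average over slices of thickness dt."""
    tan_phi = math.tan(math.radians(phi_deg))
    dt = T_nm / n_slices
    total = 0.0
    for k in range(n_slices):
        t = (k + 0.5) * dt                            # depth of the slice centre in nm
        shift_ml = (t - T_nm / 2) * tan_phi / ml_nm   # lateral shift of this slice in ML
        total += real_profile(z - shift_ml, P)
    return total / n_slices

def area(profile, P, z_min=-20.0, z_max=20.0, dz=0.05):
    """Area below a profile, integrated with the midpoint rule."""
    n = int(round((z_max - z_min) / dz))
    return sum(profile(z_min + (i + 0.5) * dz, P) for i in range(n)) * dz

def peak_ratio(P):
    """Peak height of the tilted profile relative to the untilted one."""
    return tilted_profile(0.0, P) / real_profile(0.0, P)
```

Under these assumptions, `area` returns the same value for `real_profile` and `tilted_profile`, reflecting that the tilt redistributes but does not change the deposited amount, while `peak_ratio` is much smaller for a nearly rectangular profile than for a broad one.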
Figure 67b depicts the result of Eq. (87), which was calculated with φ = 4° for a specimen thickness T = 15 nm and for the same set of parameters P that were used for the computation of Fig. 67a. Figure 67b also contains the
FIGURE 66. Schematic drawing that shows the approximation that is made for the calculation of the concentration profile of the tilted specimen.
[Plots: Cd concentration versus number j of the (002) plane, for several values of the parameter P and for the experiment.]
FIGURE 67. (a) Cd-concentration profiles computed according to Eq. (86) for different width parameters P under the condition of a constant area equivalent to 0.89 ML CdSe (in accordance with the dashed profile shown in Fig. 63) below the profile curves; (b) Cd-concentration profiles of the tilted specimen with a thickness of 15 nm computed according to Eq. (87) with the same values for P as in (a); (c) difference between the curves in (b) and (a) for P ≥ 0.5.
experimental concentration profile that is well fitted by the curve with P = 1.82. Obviously, the profile with the sharpest transition regions, which corresponds to P = 0.001, is most affected by the tilt. Figure 67c shows the difference X_r − X_r^tilted for the other curves with P ≥ 0.5. One clearly sees that the difference decreases with increasing broadness of the transition region. For P = 1.82, which gives the best fit to the experimental profile, the difference becomes negligible. We therefore have to conclude that the profiles shown in Fig. 63 are not affected by crystal tilt. First, profiles with sharper transition regions do not fit the experimental data because their height is increased due to the fact that all curves have to enclose areas of the same size, a consequence of the known amount of CdSe deposited in all. Second, the effect of the crystal tilt can be neglected for profiles corresponding to a parameter P > 1 at conditions φ < 4° and T ≤ 15 nm.

V. SUMMARY AND DISCUSSION OF THE ATOMIC-SCALE ANALYSIS METHODS
This work was concerned with two methods for evaluating the composition of ternary sphalerite-structure crystals, which rely on different sources of information in the HRTEM micrograph. The first procedure reviewed was strain-state analysis. It exploits the different lattice parameters of layer and substrate/buffer in strained-layer heteroepitaxy. This procedure has to be regarded as an indirect method of composition evaluation because the strain state of the strained layer under investigation is influenced by several factors of which a researcher must be aware. First, the tetragonal distortion of a pseudomorphically grown structure depends on specimen thickness because of the very small thicknesses (< 20 nm) that are necessary in HRTEM. In infinitely thin specimens, the biaxial strain of the bulk sample is reduced to a uniaxial strain state. The amount of strain relief itself, however, depends not only on the sample thickness but also on the concentration profile of the strained layer, because different Fourier coefficients of the profile show a different relaxation behavior of the tetragonal distortion. To take this effect into consideration one has to know the local sample thickness, which is obtained by applying the QUANTITEM procedure to the investigated micrograph. The next step is the generation of a hypothetical model of the compositional morphology of the specimen, which then is the basis for a finite element calculation. The displacements resulting from the FE calculation are then used to compute a 3-D atomic model. To simulate the imaging process, the displacements are averaged along atomic rows in the electron beam direction. The resulting 2-D grid is again evaluated with the DALI procedure analogously to the experimental image. Comparison of the simulated
and experimental displacements gives an indication as to how the guessed input model of the FE calculation has to be modified. In this way, the model is improved iteratively until the experimental and simulated displacement vector fields show sufficient agreement. As an alternative test, HRTEM image simulations can be performed on the basis of an atomic model deduced from the FE simulations as shown in Section IV-B-2 (Fig. 55). The sources of error for this analysis procedure are as follows. First, detection of the local brightness maxima in the HRTEM image has an accuracy of about 0.2 pixels. Second, in TEM micrographs that are obtained for a ⟨110⟩ direction of the incident electron beam, the position of the brightness maximum of a dumbbell that cannot be resolved at a resolution of 0.15 nm depends on both sample thickness and defocus. Therefore, a ⟨100⟩ projection, in which each spot in the HRTEM image stems from one column of atoms, seems to be preferable to the ⟨110⟩ projection. However, the spacing between the projected rows is small (0.28 nm for GaAs) and hence the contrast pattern depends strongly on variations in defocus, sample thickness, and orientation, so that generally only small areas of the image can be evaluated. Most of the examples of strain-state analysis presented in this review are therefore based on the ⟨110⟩ orientation. The application of the ⟨100⟩ projection seems promising if the TEM resolution increases and the sample preparation techniques are improved with respect to the wedge shape of the specimen as well as the smoothness of its surfaces, which would yield smaller variations in thickness in the electron beam direction. A third source of error, whose effect becomes obvious mainly at interfaces with sharp chemical transitions, is the delocalization that is given by Thust et al. (1996) and Lichte (1991)

$$R = \left| C_s \lambda^3 g^3 + \Delta f\, \lambda g \right|, \qquad (88)$$

where C_s is the spherical aberration constant, λ the electron wavelength, and Δf the defocus. Equation (88) represents a spatial delocalization imposed by the microscope on a certain spatial frequency g. The delocalization is minimized for a particular g at a defocus of Δf = −C_s λ² g².
The second method that has been outlined in this review is composition evaluation by the lattice fringe analysis (CELFA) procedure. This method uses a (000), (040), and (020) 3-beam imaging condition with the chemically sensitive (020) reflection centered on the optical axis. This condition is simple and hence nonlinear imaging can be solved analytically. Furthermore, all free parameters of the imaging can readily be extracted from a defocus series and the electron microscope parameters do not need to be
known. Analysis of the series leads to values for the sample thickness t in electron beam direction and the defocus-dependent phases φ_n of the oscillation of |J_002| with advancing image number n. It was shown in Section III-G and Fig. 37 that errors of thickness determination only weakly influence the evaluated concentrations. Moreover, a variation of either imaging conditions or specimen thickness throughout the image can be corrected. Two modes are possible for the evaluation. First, the amplitude ratio |J_002|/|J_004| of image cell diffractograms can be used. This method has the advantage that adsorbates at the surfaces of the specimen that lead to a modification of the electron wave function do not influence the measurement. Second, it is possible to exploit only the amplitude |J_002|, which offers the advantage of better accuracy. This is due to the fact that only errors of the determination of |J_002| are relevant for the measurement, whereas the errors of both |J_002| and |J_004| contribute in the first case. Furthermore, the error introduced by an uncertainty of the thickness determination influences the concentration determination to a smaller degree, as shown in Fig. 37. In pseudomorphically grown strained heterostructures, the lattice fringes perpendicular to the interfaces must be used for evaluation because the contrast pattern may be influenced by a variation of fringe distance. For that purpose, the crystal has to be tilted around an axis parallel to the interface plane. To keep the induced blurring of the interface composition profile small, the tilt angle has to be kept below 4°. In this case, the considerations in Section IV-D revealed that the tilt noticeably influences the measured concentration profile only for extremely sharp interfacial transitions.
This finding is supported by the fact that the measurement of the entire deposited amount of interlayer material, which is equivalent to the area underneath the concentration profile, does not depend on the crystal tilt. The CELFA procedure seems to be an efficient and accurate image evaluation tool both because the electron microscopist does not need to know any of the imaging parameters and because conventional comparisons between experimental and simulated images are not necessary. Moreover, the accuracy of the evaluation is high and the method is applicable to a whole variety of compound semiconductor systems, of which Cd_xZn_{1-x}Se, In_xGa_{1-x}As, and Al_xGa_{1-x}As are only a few examples.
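To make the delocalization argument of this section concrete, the following minimal sketch evaluates the delocalization and the defocus that minimizes it for one spatial frequency. It assumes the commonly quoted form R = |C_s λ³ g³ + Δf λ g|, which is consistent with the optimum defocus Δf = −C_s λ² g² stated above. The function names and the numerical values (C_s = 1.2 mm, λ = 2.51 pm, a 0.283 nm fringe spacing) are illustrative assumptions, not parameters taken from this review.

```python
def delocalization(g, defocus, cs, wavelength):
    """Delocalization R for spatial frequency g (all lengths in nm, g in 1/nm)."""
    return abs(cs * wavelength**3 * g**3 + defocus * wavelength * g)

def optimum_defocus(g, cs, wavelength):
    """Defocus at which the delocalization of spatial frequency g vanishes."""
    return -cs * wavelength**2 * g**2

# Hypothetical microscope and reflection, chosen only for illustration.
CS_NM = 1.2e6             # spherical aberration constant, 1.2 mm expressed in nm
WAVELENGTH_NM = 0.00251   # electron wavelength at roughly 200 kV
G_PER_NM = 1.0 / 0.283    # spatial frequency of a 0.283 nm fringe spacing

df_opt = optimum_defocus(G_PER_NM, CS_NM, WAVELENGTH_NM)
r_at_zero_defocus = delocalization(G_PER_NM, 0.0, CS_NM, WAVELENGTH_NM)
r_at_optimum = delocalization(G_PER_NM, df_opt, CS_NM, WAVELENGTH_NM)
```

With these assumed values the delocalization at zero defocus is of the order of the fringe spacing itself, whereas it vanishes at `df_opt`; since the optimum depends on g, it can only be reached for one spatial frequency at a time.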
ACKNOWLEDGMENTS
This work is supported by the Volkswagen Stiftung under contract number 1/71 014. The authors would like to thank U. Fischer, S. Kaiser, T. Reisinger,
and T. Remmele for contributing valuable results within the scope of their diploma theses and dissertations. We also would like to express our thanks to A. Förster (Institute for Thin Film and Ion Beam Technology of the Research Center, Jülich) for MBE growth of the InGaAs samples, N. N. Ledentsov and his group at the Ioffe Institute (St. Petersburg) for providing CdZnSe specimens, W. Gebhardt, H. Stand, and B. Hahn for the growth of the MOVPE samples, and J. Zweck for many valuable discussions and remarks.
APPENDIX A: LIST OF VARIABLES
In the following, a list of variables is given that does not include abbreviations that are very limited in scope.
a: lattice parameter of the bulk material
a_∥: lattice parameter parallel to the interface plane and perpendicular to the electron beam
a_y: lattice parameter parallel to the interface plane and parallel to the electron beam
a_⊥: lattice parameter perpendicular to the interface plane
a_⊥^{BC}: lattice parameter perpendicular to the interface plane in the “substrate” that is a binary material BC
a, b, c, d: vectors pointing from the center of the image unit cell Z to its corners
a', b', c', d': vectors pointing from the center of the image unit cell Z' to its corners
a_{00j}: real amplitude of the (00j) beam
a_0: origin of the grid of lattice positions
a_{1,2}: lattice base vectors
a_f: lattice parameter of the film
a_s: lattice parameter of the substrate
a_{A_xB_{1-x}C}: lattice parameter of a ternary material A_xB_{1-x}C
a_{AC}: lattice parameter of a binary material AC
A: area below the concentration profile in Section IV-C
A, B, C, D, E: fit parameters in Eq. (54)
A^{1,2}: antineighbor position
A,: “area” defined for the decomposition of the Fourier-transformed image
α_thermal: virtual “thermal” expansion coefficient used for the FE calculation
angle of the grid lines of set (1) or (2) with the x- or the y-axis
b: Burgers vector
b_{60°}: Burgers vector of a 60° misfit dislocation
b_Lomer: Burgers vector of a Lomer misfit dislocation
b_{1,2}: orthonormal basis of the plane E of the QUANTITEM image vectors
B,: “block” defined for the decomposition of the Fourier-transformed image
C: Fourier-transformed image
C_{ij}: components of the elastic tensor
C,,: integral AC-content of a ternary material A_xB_{1-x}C in units of [ML]
C_s: spherical aberration constant
χ_f: wave aberration due to the defocus
χ_s: spherical wave aberration
χ_n: χ_f + χ_s for the image n of a defocus series
d_j: sample thickness corresponding to the image unit cell Z_i
d_{1,2}: direction of grid lines
D(T): diffusion coefficient in dependence of the temperature T in Section IV-A-2
δ: accuracy of the position detection, calculated from deviations inside the reference lattice
δ(Δf): defocus change corresponding to a full oscillation of |J(g_002)|
Δ_{i,j}: local lattice distance corresponding to the position P_{i,j}
Δ_Stepsize(Δf): stepsize of the defocus change between adjacent images of a defocus series
Δ: mean lattice parameter in growth direction in Section IV-A-1 (see p. 000)
E: plane of QUANTITEM image vectors
f: misfit
f_A: atomic scattering factor of the atom A
f_mod: modulation frequency of the input signal in the lock-in technique
F(g): complex amplitude of the beam that corresponds to the reciprocal lattice vector g
Δf: objective lens defocus
φ: Wiener filter and tilt angle in Section IV-C
φ_i, Θ_i: angles describing a position on the QUANTITEM ellipse corresponding to the image unit cell Z_i
φ_n: phase of |J(g_002)| in dependence of the defocus for the image n of a defocus series
g: vector of the reciprocal lattice
g,.,.,,,: factors with values −1, 0, 1 defining the contribution of the a::: distance vectors (Eq. (12))
I: intensity
J(g_{h,k,l}) or J_{hkl}: Fourier coefficient of the diffractogram corresponding to the (h, k, l) reflection
J_max, J_min: maximum and minimum of the oscillation of |J(g_002)| in dependence of the defocus
L_i: line i
λ: electron wavelength
M, M': midpoints of the cells Z and Z' in Section II-B-1
M^{1,2,3}: lattice positions used for the position detection procedure
N^2: dimension of the image vector space of the QUANTITEM procedure in Section II-B
N: noise part of the Fourier-transformed image
N^{1,2}: neighbor position
N_{Ref.R}: number of lattice positions inside the reference lattice
N,,: number of sublattices of the grid of positions
N_{Cd_xZn_{1-x}Se}: number of MLs contained in the Cd_xZn_{1-x}Se interlayer
ν: Poisson's ratio
p_{00j}: phase of the (00j) beam
P: parameter describing the abruptness of the transition region of the Cd-concentration profile in Section IV-C, defined in Eq. (86)
P_1 ... P_n: some parameters
P_{i,j}: lattice position with indices i, j
Q: direction on which displacements and distance vectors are projected
Θ: offset angle of the QUANTITEM procedure corresponding to a sample thickness → 0
R: delocalization in Section V
R: image vector of the QUANTITEM procedure
R^{1,2,3}: template image vector of the QUANTITEM procedure
R_{i,j}: reference lattice position corresponding to the position P_{i,j}
R_c: image cell of the QUANTITEM procedure that contains only one gray level c
S_input: test signal
S_output: response signal
SF_{hkl}: structure factor of the beam with Miller indices h, k, l
σ: standard deviation of the intensity maxima determination
t: duration of the annealing in Section IV-A-2
T: oscillation period of the RHEED signal in Section IV-A-1 and annealing temperature in Section IV-A-2 as well as the crystal thickness in electron beam direction in Section IV-C
T: image vector corresponding to R^eval inside the plane E
component of the evaluated image vector T parallel to the plane E
T_⊥^eval: component of the evaluated image vector T perpendicular to the plane E
T(g + h, h; Δf): transmission cross coefficient of the nonlinear image formation
ΔT^eval: value used to estimate the reliability of each vector TI,?,,
ΔT: virtual heating temperature used for the FE calculation
u_{i,j}: displacement vector corresponding to the position P_{i,j}
u_max: maximum displacement measured on top of a strained layer
w: width of the Gaussian curve in Eq. (85)
w_{B,,,}: weighting factor of the block B,,,
x: composition or spatial coordinate
x,: positions of atomic rows in Section IV-C
(x_el, y_el, z_el): coordinate system associated with the elastic constants
(x_geom, y_geom, z_geom): coordinate system associated with the geometry of the FE model
X(z, t, T): Cd-concentration profile in dependence of the distance z in growth direction, the heating duration t and the temperature T in Section IV-A-2
X_single(z, t, T): Cd-concentration profile of an ideal interlayer of 1 ML CdSe in Section IV-A-2
X_r(z): real concentration profile in Section IV-C
X_r^tilted(z): real concentration profile of the tilted specimen in Section IV-C
ξ: extinction distance of the undiffracted beam
y: coordinate
z: coordinate in growth direction
Z: irregularly shaped image unit cell
Z': quadratic image unit cell
REFERENCES
ABAQUS 5.5, Hibbitt, Karlsson & Sorenson Inc.
Aebersold, J. F., Stadelmann, P. A., and Rouviere, J.-L. (1996). Ultramicroscopy, 62: 171-189.
Baba, N. and Kanaya, K. (1989). Research Reports of Kogakuin University, 66: 97-101.
Bauer, S., Rosenauer, A., Link, P., Kuhn, W., Zweck, J., and Gebhardt, W. (1993). Ultramicroscopy, 51: 221-227.
Bauer, S., Rosenauer, A., Skorsetz, J., Kuhn, W., Wagner, H. P., Zweck, J., and Gebhardt, W. (1992). J. Cryst. Growth, 117: 297-302.
Bierwolf, R., Hohenstein, M., Phillipp, F., Brandt, O., Crook, G. E., and Ploog, K. (1993). Ultramicroscopy, 49: 273-285.
Brandt, O., Ploog, K., Bierwolf, R., and Hohenstein, M. (1992). Phys. Rev. Lett., 68: 1339-1342.
Christiansen, S., Albrecht, M., Strunk, H. P., and Maier, H. J. (1994). Appl. Phys. Lett., 64: 3617.
Fortini, A. and Brault, M. (1990). Revue Phys. Appl., 25: 1037-1047.
Gaines, J. M. and Ponzoni, C. A. (1994). Surface Sci., 310: 307.
Gerard, J.-M. and Marzin, J.-Y. (1992). Phys. Rev. B 45: 6313.
Hirth, J. P. and Lothe, J. (1968). Theory of Dislocations. New York: McGraw-Hill.
Hull, R. and Bean, J. C. (1992). Critical Reviews in Solid State and Materials Sciences, 17(6): 507.
Ishizuka, K. (1980). Ultramicroscopy, 5: 55-65.
Jia, C. L., Thust, A., Jacob, G., and Urban, K. (1993). Ultramicroscopy, 49: 330-43.
Jouneau, P. H., Tardot, A., Feuillet, G., Mariette, H., and Cibert, J. (1994). J. Appl. Phys., 75: 7310.
Kisielowski, C., Schwander, P., Baumann, F. H., Seibt, M., Kim, Y. O., and Ourmazd, A. (1995). Ultramicroscopy, 58: 131-155.
Lichte, H. (1991). Ultramicroscopy, 38: 13.
Maree, P. M. J., Barbour, J. C., van der Veen, J. F., Kavanagh, K. L., Bulle-Lieuwma, C. W. T., and Viegers, M. P. A. (1987). J. Appl. Phys., 62(11): 4413.
Marks, L. D. (1996). Ultramicroscopy, 62: 43-52.
Martin, W. E. (1973). J. Appl. Phys., 44: 5639.
Maurice, J.-L., Schwander, P., Baumann, F. H., and Ourmazd, A. (1997). Ultramicroscopy, 68: 149-161.
Mazzer, M., Carnera, A., Drigo, A. V., and Ferrari, C. (1990). J. Appl. Phys., 68(2): 531-539.
McTempas by Total Resolution, Berkeley, CA.
Moison, J. M., Guille, C., Houzay, F., Barth, F., and Van Rompay, M. (1989). Phys. Rev. B 40: 6149.
NCEM Simulation System. The National Center for Electron Microscopy, Lawrence Berkeley Laboratory, Berkeley, CA.
O'Donnell, K. P. and Woggon, U. (1997). Appl. Phys. Lett., 70: 2765.
Ourmazd, A., Baumann, F. H., Bode, M., and Kim, Y. (1990). Ultramicroscopy, 34: 237-255.
Ourmazd, A., Schwander, P., Kisielowski, C., Seibt, M., Baumann, F. H., and Kim, Y. O. (1993). Inst. Phys. Conf. Ser., 134: Section 1, 1-10.
Ourmazd, A., Taylor, D. W., Cunningham, J., and Tu, C. W. (1989). Phys. Rev. Lett., 62: 933.
Paciornik, S., Kilaas, R., and Dahmen, U. (1993). Ultramicroscopy, 50: 255-262.
PATRAN 6.0, MacNeal-Schwendler Corporation.
Press, W. H., Vetterling, W. T., Teukolsky, S. A., and Flannery, B. P. (1992). Numerical Recipes in C, pp. 547-549. Cambridge: Cambridge University Press.
Reisinger, T., Lankes, S., Kastner, M. J., Rosenauer, A., Franzen, F., Meier, M., and Gebhardt, W. (1996). J. Cryst. Growth, 159: 510-513.
Robertson, M. D., Curie, J. E., Corbett, J. M., and Webb, J. B. (1995). Ultramicroscopy, 58: 175.
Rosenauer, A., Fischer, U., Gerthsen, D., and Förster, A. (1998). Ultramicroscopy, 72: 121-133.
Rosenauer, A., Kaiser, S., Reisinger, T., Zweck, J., Gebhardt, W., and Gerthsen, D. (1996). Optik, 102: 63-69.
Rosenauer, A., Fischer, U., Gerthsen, D., and Förster, A. (1997). Appl. Phys. Lett., 71: 3868-3870.
Rosenauer, A., Reisinger, T., Franzen, F., Schütz, G., Hahn, B., Wolf, K., Zweck, J., and Gebhardt, W. (1996). J. Appl. Phys., 79(8): 4124-4131.
Rosenauer, A., Reisinger, T., Steinkirchner, E., Zweck, J., and Gebhardt, W. (1995). J. Cryst. Growth, 152: 42-50.
Saxton, W. O., Pitt, T. J., and Horner, M. M. (1979). Digital image processing: The Semper system, Ultramicroscopy, 4: 343; Semper 6 by Synoptics Ltd., Cambridge, UK.
Schuhrke, T., Mindl, M., Zweck, J., and Hoffmann, H. (1992). Ultramicroscopy, 45: 411-415.
Schwander, P., Kisielowski, C., Seibt, M., Baumann, F. H., Kim, Y., and Ourmazd, A. (1993). Phys. Rev. Lett., 71: 4150.
Seitz, H., Seibt, M., Baumann, F. H., Ahlborn, K., and Schroter, W. (1995). Phys. Stat. Sol. (a), 150: 625.
Stadelmann, P. A. (1987). A software package for electron diffraction analysis and HREM image simulation in materials science, Ultramicroscopy, 21: 131-145.
Stenkamp, D. and Jager, W. (1993). Inst. Phys. Conf. Ser., 134: Section 1, 15-20.
Stenkamp, D. and Strunk, H. P. (1996). Appl. Phys. A 62: 369-372.
Stranski, I. N. and Krastanow, L. (1939). Akad. Wiss. Wien, Math.-Naturwiss. Kl. IIb 146: 797.
Thust, A., Coene, W. M. J., Op de Beeck, M., and Van Dyck, D. (1996). Ultramicroscopy, 64: 211-230.
Tillmann, K., Thust, A., Lentzen, M., Swiatek, P., Förster, A., Urban, K., Gerthsen, D., Remmele, T., and Rosenauer, A. (1996). Phil. Mag. Lett., 74: 309.
Treacy, M. M. J. and Gibson, J. M. (1986). J. Vac. Sci. Technol., B 4(6): 1458-1466.
Volmer, M. and Weber, A. (1974). Z. Phys. Chem., 119: 118.
Woggon, U., Langbein, W., Hvam, J. M., Rosenauer, A., Remmele, T., and Gerthsen, D. (1997). Appl. Phys. Lett., 71: 377.
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOL 107
Hexagonal Sampling in Image Processing
R. C. STAUNTON
Department of Engineering, University of Warwick, Coventry CV4 7AL, UK
I. Introduction
   A. Sampling
   B. Processor Architectures
   C. Binary Image Processing
   D. Monochrome Image Processing
II. Image Sampling on a Hexagonal Grid
   A. The Hexagonal Packing of Sensory Elements in the Eye
   B. Hexagonal-Shaped Sensor Elements
   C. Two-Dimensional Sampling Theory
   D. Noise and Quantization Error
   E. Practical Aspects of Digital Image Acquisition
   F. Measurement of Two-Dimensional Modulation Transfer Function and Bandlimit Shape
III. Processor Architecture
   A. Single Instruction, Single Data Computers (SISD)
   B. Parallel Processors
   C. Two- and Multidimensional Processor Arrays
   D. Pyramid Processors
   E. Pipelined Processors
   F. Hexagonal Image-Processing Pipelines
IV. Binary Image Processing
   A. Connectivity
   B. Measurement of Distance
   C. Distance Functions
   D. Morphological Operators
   E. Line Thinning and the Skeleton of an Object
   F. Comparison Between Hexagonal and Rectangular Skeletonization Programs
V. Monochrome Image Processing
   A. The Hexagonal Fourier Transform
   B. Geometric Transformations
   C. Point Source Location
   D. Image-Processing Filters
   E. Edge Detectors
   F. Hexagonal Edge-Detection Operators
   G. The Visual Appearance of Hexagonal Edges and Features
VI. Conclusions
References
Volume 107, ISBN 0-12-014749-1. ADVANCES IN IMAGING AND ELECTRON PHYSICS. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION
This chapter argues the case for the hexagonal sampling of images. Historically, square sampling has always predominated, even though at each stage in the development of digital image processing over the last thirty years good hexagonal alternatives have been advanced. Advantages have been shown for the hexagonal scheme in sampling efficiency, processing algorithms, and parallel processors; these will be discussed in the following sections.
A. Sampling
Classically, the brightness at a point in a continuous two-dimensional (2-D) field is b = f(x, y), where x and y are the horizontal and vertical distances of the point from the origin. This field can be considered to be sampled by a grid of delta functions to produce a spatially discrete set of brightness values, and these brightness values themselves can be discretized to form what is usually considered to be a digital image (Gonzalez and Woods, 1992). Figure 1 shows two of many possible regular grids of delta functions that can be used for spatial sampling. The vertical spacing of each has been chosen to be identical. If the sampled brightness is included as an orthogonal vector at each point, then a digital image is formed. This image may be viewed on a TV monitor if the 2-D space is tiled by picture elements (pixels), where each pixel is associated with one sampled value and is filled with the sampled brightness value. Referring to Fig. 1a, if t_1 = t_2, then the image may be completely tiled by square pixels; in Fig. 1b, if t_1 = (2/√3) t_2, then the image may be completely tiled by regular hexagons or rectangles. The hexagonal tiling shown in Fig. 2 has resulted in the term “hexagonal sampled image.”
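The geometry of the hexagonal grid can be checked with a short sketch. It assumes that t_2 denotes the vertical row spacing, that t_1 = (2/√3) t_2 is the spacing of samples along a row, and that odd rows are shifted horizontally by t_1/2; the function names are hypothetical and serve only this illustration.

```python
import math

def hexagonal_grid(rows, cols, t2=1.0):
    """Sample positions of a hexagonal grid with row spacing t2 (assumed layout)."""
    t1 = 2.0 / math.sqrt(3.0) * t2        # sample spacing along a row
    points = []
    for r in range(rows):
        offset = 0.5 * t1 if r % 2 else 0.0   # odd rows are shifted by half a sample
        for c in range(cols):
            points.append((c * t1 + offset, r * t2))
    return points, t1

def nearest_distances(points, index, k):
    """Distances from points[index] to its k nearest neighbors, in ascending order."""
    x0, y0 = points[index]
    distances = sorted(
        math.hypot(x - x0, y - y0)
        for i, (x, y) in enumerate(points)
        if i != index
    )
    return distances[:k]

points, t1 = hexagonal_grid(5, 5)
centre = 2 * 5 + 2                        # sample in row 2, column 2
six_nearest = nearest_distances(points, centre, 6)
```

Under these assumptions all six entries of `six_nearest` equal t_1, which is the property that allows the plane to be tiled by regular hexagons centered on the samples. On the square grid, by contrast, the four edge neighbors and the four corner neighbors lie at different distances.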
FIGURE 1. Sampling grids: (a) square; (b) hexagonal.
FIGURE 2. Image tilings: (a) square grid with square tiles; (b) hexagonal grid with hexagonal tiles; (c) hexagonal grid with rectangular tiles.
Various aspects of hexagonal sampling are investigated in Section II. These include the evolution of hexagonal arrays of biological sensors, the effect of the choice of sampling grid and tiling on the appearance of the image, and the light-gathering properties of the sensor. To show an advantage of one sampling scheme over another in terms of the information
content, the bandlimiting of the image by the sensor system must be investigated. An example of the measurement of the bandlimiting characteristics of some TV camera-frame grabber systems is given at the end of the section.
B. Processor Architectures
Much image processing is accomplished using single-instruction single-datum (SISD) computers (Flynn, 1966). A single-processor PC is an example of such a computer. Here, images are stored in semiconductor memory, in disk files, or, during computation, in an array. For square sampled images the data map directly into a 2-D array, each element of which can be accessed by a pair of row and column pointers that directly relate to the original position of the pixel. For hexagonally sampled images the data can again be readily stored, but the mapping of the data onto a square array within a program requires some care, as described in Section III. With multiprocessor systems, the processing task is divided and distributed among the processor elements (PEs). This can simply be accomplished by organizing the PEs in a pipeline and assigning each a different task such as smoothing, edge detection, etc. Other architectures that readily allow such task divisions include hypercubes and shared memory machines. Another way to divide the task is to assign each PE to a local area of the image and then to allow it to sequentially apply separate tasks to that area. With this arrangement, the sampling grid shape can affect the way the PEs are interconnected for communication. Image-processing tasks can be categorized as low, middle, or high level (Luck, 1987). Low-level processes have input data that are associated with the original sampling grid, and their output is also associated with it. The Sobel edge detector (Gonzalez and Woods, 1992) is an example of such a process. Middle-level processes again take data that are associated with the grid, but the output is often symbolic and not locked to the grid. A Hough transform (Illingworth and Kittler, 1988) that determines the angle of a straight line is an example of such a process. High-level processes have both input and output data that are not locked to the sampling grid.
For example, in optical character recognition a process may take as an input a set of features including stroke end points and junctions, and output the ASCII code of the character. The effect of the sampling grid on the structure of multiprocessor systems is discussed in Section III. A comparison between the processing of rectangularly and hexagonally sampled images by a pipeline processor is also presented there.
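The care needed when a hexagonally sampled image is held in an ordinary 2-D array can be illustrated with a short sketch. It assumes one common storage convention, in which array row r holds image row r and odd rows are displaced half a pixel to the right; this convention and the function name are assumptions for illustration and are not necessarily the scheme of Section III.

```python
def hex_neighbours(row, col, n_rows, n_cols):
    """Array indices of the hexagonal neighbors of (row, col), clipped at the border.

    Assumed convention: odd rows are shifted half a pixel to the right, so the
    column offsets toward the rows above and below depend on the row parity.
    """
    if row % 2:   # odd row: diagonal neighbors lie in columns col and col + 1
        offsets = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    else:         # even row: diagonal neighbors lie in columns col - 1 and col
        offsets = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]

interior = hex_neighbours(2, 2, 5, 5)   # six neighbors of an interior pixel
corner = hex_neighbours(0, 0, 5, 5)     # fewer neighbors at the image corner
```

The point of the sketch is that a single fixed offset table, as used for square images, is not sufficient: the table must be selected by row parity, and a mistake here silently breaks the symmetry of the neighborhood relation.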
C. Binary Image Processing
With a binary image one brightness level is often used to distinguish foreground objects and the other is used to distinguish the background. However, in realistic images containing noise, some pixels are invariably classified incorrectly. With binary images many processing algorithms are concerned with how pixels are connected, and hence a tile or pixel model of the image is used. Connectivity is easily defined for the hexagonal scheme (Rosenfeld, 1970) and holds for either tiling shown in Fig. 2. With the hexagonal scheme, both the set of pixels in the object and the set in the background can be defined as 6-way connected. If the cluster of hexagons in Fig. 2(b) is considered as an object, then connectivity between a central pixel and each of the six surrounding neighbors is identical apart from the orientation of the border between the pixels. For the square scheme, pixels can be considered to be part of an object if they are either 4-way connected, that is, along a vertical or horizontal border, or 8-way connected, where corner-to-corner connectivity is allowed. Background pixels can also be either 4- or 8-way connected, but if the foreground is 8-way connected, the background must be 4-way connected or vice versa; otherwise foreground and background features may cross over one another. Many hexagonal processing algorithms for binary image processing have been researched and published. Section IV discusses some of the advantages and disadvantages of hexagonal operators and their square counterparts. A comparison between hexagonal and rectangular skeletonization programs is presented.
D. Monochrome Image Processing
With gray-scale images, pixel, sampling point, and other models of the image structure have been used in the development of processes. Hexagonal counterparts of well-known square-grid processes have been designed, and accuracy and computation speed comparisons have been made between the two schemes.
Many of the hexagonal algorithm designs have exploited the equidistance between neighboring sampling points rather than any notional pixel shape. Figure 2b shows a regular hexagonal grid with a circle imposed on the six nearest-neighbor sampling points that surround a central point. Algorithms often utilize masks of coefficients that are convolved with local areas of the image. Coefficient weights are often a function of distance from the center, and thus the symmetry of the hexagonal scheme can result in simplified processing. Some square-scheme masks used with such convolution operators are separable, but this does not apply to hexagonal-scheme operators. This can partly remove the advantage of a hexagonal operator in some cases. Small-area masks can be efficiently convolved with the image in the spatial plane, but greater efficiency can be achieved with large masks by initially transforming the mask and data to the Fourier plane. Efficient hexagonal-scheme transforms (Rivard, 1977) have been developed that compare favorably with the square-system fast Fourier transform (FFT). In Section V some simple hexagonal processing algorithms for gray-scale image processing are presented, and their advantages and disadvantages compared with their square counterparts are discussed. The design of a simple hexagonal-grid edge detector is discussed and its operation compared with that of the square-grid Sobel operator. Finally, comments on the visual appearances of hexagonally and rectangularly sampled images are made.
II. IMAGE SAMPLING ON A HEXAGONAL GRID
A. The Hexagonal Packing of Sensory Elements in the Eye
Biological and ophthalmic observations on the human eye indicate that a hexagonal packing of retinal sensory elements has evolved in nature. This was a motivation for the study of hexagonal sampling schemes for computer vision covered in this chapter. Behind the eye, ganglion cells and neurons connect to the retinal sensory elements and to each other to provide processing of the image focused on the retina. Further image processing occurs in the visual cortex and the brain. Models of biological image processing have led to the development of computer architectures such as artificial neural networks and pyramid processors for computer vision. However, in this section, the discussion is limited to the sensor element structure. Helmholtz includes an anatomical description of the eye in his Treatise on Physiological Optics (Helmholtz, 1911, 1962). The higher orders of life have eyes capable of distinguishing both light and darkness and also form. Such eyes can have one of two forms. The first, common among insects, is a composite eye, in which sensory elements separated by opaque septa cover the surface of the eye. The elements at the surface of the eye are usually of a hexagonal and sometimes of a square shape. The second form of eye, as with the eyes of many vertebrates, has a lens that focuses light onto a retina. A section of a retina is shown in Fig. 3. The retina comprises rod and cone sensory elements. In the human eye there are approximately 100 million elements of the smaller rod type, and 5 million cones that are distributed among the rods in varying densities depending on the particular
FIGURE 3. The human retina: R, rods; C, cones; G, ganglion cells. (Labeled: nerve fibers.)
part of the retina (Wandell, 1995). In the so-called “yellow spot” only cones are found, whereas towards the periphery of the retina there are only rods. The rods primarily initiate low-level light vision and the cones initiate high-level light vision. Behind the surface layer of rods and cones are layers of fine fibers connecting these elements to a layer of ganglion cells. These cells perform many processes, one of which is to pass information to the optic nerve. Thus the retina is a complicated array of different types of sensory element and has a number of layers associated with detection and interconnection; possibly some image processing is also performed (Watson and Ahumada, 1989). From the anatomic drawings in the Helmholtz treatise, it can be observed that the roughly circular sensory elements tend to pack together efficiently, which leads to a closely packed hexagonal lattice. Ophthalmic experiments reported in Helmholtz’s second volume support this. In one experiment, Helmholtz set up a grating of light and dark lines of equal thickness viewed at various distances and under differing lighting conditions to measure the spatial resolution of the eye. His results indicated that two bright lines could only be distinguished if an unstimulated retinal element existed between the elements on which the images of the lines fell. This is in accordance with Nyquist’s sampling theorem (Nyquist, 1928). He also noted that for grid spacings close to the resolution limit of the eye, the lines appeared wave-like or modulated with repeated thick and thin sections, as shown in Fig. 4. From this effect he inferred that the cone sensors, the only type of sensor in the high-resolution part of the retina, were packed in a hexagonal pattern.
FIGURE 4. The wave-like appearance of parallel lines when viewed close to the eye's resolution limit, and the hexagonal sensor pattern that produces this effect. (Source: Helmholtz's Treatise; Helmholtz, 1911.)
Images of sections through cone sensors in the yellow spot of a human retina (Curcio et al., 1987) show a roughly hexagonal shape for each cone, and sections through regions containing only rods show a hexagonal shape for each rod (Wandell, 1995). However, where cones exist in mixed regions their shape becomes more circular (Wandell, 1995).
B. Hexagon-Shaped Sensor Elements
For any vision system, be it electronic, photochemical (Mitchell, 1993), biological or other, there are certain design parameters that can be optimized to increase its usefulness for a particular purpose or in a particular environment. Scenes with low light level can be best imaged using sensors with active areas that completely tile the image plane and that have long integration times or low shutter speeds. These techniques, together with the use of large-area sensors, will increase the brightness signal-to-noise ratio (SNR). However, a smaller number of larger sensors will result in a lower spatial accuracy, and a longer integration time will result in motion artifacts or missed events. The 2-D shape of the sensor elements and the geometry of the sampling grid will have an effect on the efficient acquisition of the image. The sensor element shape and any analogue signal processing by, for example, the lens, will bandlimit the signal in 2-D before digitization. In the general case, the
oversampling of a spatial frequency bandlimited signal will not provide any increase in information, just more data.
C. Two-Dimensional Sampling Theory
Before a computer can process an image, the image must be sampled, and then the quantity sampled digitized. Real-world scenes can be considered as continuous 2-D brightness fields. These brightness fields can be transformed to the Fourier plane and their spatial-frequency components analyzed. The magnitude of these transformed images can be considered as 2-D signals and plotted against vertical spatial frequency and horizontal spatial frequency (Gonzalez and Woods, 1992), and their spectra analyzed. The phase information can be analyzed in a similar way. The image of the scene is focused onto a detector and then sampled. The sampling process can be considered as the multiplication of the continuous image by a 2-D grid of delta functions, which corresponds to a convolution of the image spectrum with a grid of delta functions in the Fourier plane. For 1-D signals, the sampling theorem (Nyquist, 1928) states that if a signal is to be perfectly recovered from its sampled version, then there must be no frequency component in the pre-sampled signal that is greater than one-half the sampling frequency. A more recent theorem (Petersen and Middleton, 1962) allows consideration of multidimensional signals, and for a 2-D image can be stated as follows: A brightness function whose Fourier transform is zero outside a finite area of the Fourier plane can be everywhere reconstructed from its sampled values, provided that this finite area and its periodic extensions in the Fourier plane are nonoverlapping. Any real-life continuous scene will contain spatial-frequency components throughout the spectrum, and the direct sampling of such a brightness field would result in frequency aliasing, where frequencies above half the sampling frequency will be folded about the half-sampling frequency and superimposed on the lower-frequency components. This aliasing results in corruption of the discrete image and makes it impossible to perfectly reconstruct the continuous image.
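The folding of frequencies about the half-sampling frequency can be illustrated in 1-D with a minimal sketch. The function name and the numerical values are illustrative assumptions and do not come from the chapter; the sketch only shows that a sinusoid above the folding frequency produces the same samples as one below it.

```python
import math


def aliased_frequency(f, fs):
    """Apparent frequency of a sinusoid of frequency f when sampled at rate fs.

    Frequencies are folded about fs/2 into the range 0 .. fs/2.
    """
    f_mod = f % fs
    return min(f_mod, fs - f_mod)


def sample_cosine(f, fs, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = k/fs for k = 0 .. n_samples - 1."""
    return [math.cos(2.0 * math.pi * f * k / fs) for k in range(n_samples)]


# A 70-unit component sampled at 100 units is indistinguishable from a
# 30-unit component: both give the same sample values.
high = sample_cosine(70.0, 100.0, 10)
low = sample_cosine(aliased_frequency(70.0, 100.0), 100.0, 10)
```

Once the samples coincide in this way, no later processing can separate the two components, which is why the bandlimiting must take place before sampling.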
It is important to bandlimit the 2-D signal before sampling so that in the sampled signal the magnitude of its spectrum tends towards zero before components from periodic extensions of the spectrum interfere with the signal and cause aliasing. Figure 5 shows an example of the spectrum of a discrete 2-D signal. The central hill at the origin is identical to the spectrum of the continuous signal, and the other hills are some of the closer periodic extensions of this. Here, the hills do not overlap so there will not be any aliasing. However, the gaps between the hills are indicative of inefficient sampling in that the vertical and horizontal sampling frequencies could be reduced by a factor of approximately two before aliasing would occur. The
FIGURE 5. Example of the spectrum of a 2-D discrete signal.
spectrum shown in Fig. 5 results from an image sampled on a square grid. The periodically extended hills are located on a square grid in the Fourier plane, with each centered on integer multiples of the horizontal and vertical sampling frequencies. Other sampling grids will lead to other extension patterns in the Fourier plane. Each conical hill has a circular cross section, and a signal with such a spectrum is known as a circularly bandlimited signal (Mersereau, 1979). If the cross section is taken at the base of the hill, then all the signal information will be contained within this 2-D bandlimit region. The efficient packing of these all-inclusive band regions has been studied (Petersen and Middleton, 1962). Figure 6 shows the 2-D spectrum periodicity for an octagonal bandlimited signal on a skewed grid. The regions are quite separate, and the sampling efficiency could be increased by reducing the sampling frequency in the U and V directions. However, such octagonal regions will not pack together to completely tile the plane as do some other shapes located at certain positions. Image sampling is usually achieved with a periodic sampling grid, and that grid is often square, but sometimes rectangular and occasionally hexagonal. The skewed sampling grid has been shown to be the general periodic grid (Petersen and Middleton, 1962), of which the foregoing are only special cases. We can now determine which is the most efficient grid. The minimum number of sampling points required to completely cover the image so that no information is lost must be found. This number will be a function of the grid geometry and the bandlimit of the image signal. If the bandlimit shape can be found and it completely tiles the Fourier plane, then we will have 100% efficiency. In theory there are many shapes that will
FIGURE 6. An octagonal band region, shown hashed, and some of its periodic extensions on a skewed grid.
completely tile a plane including a square, a rectangle, a hexagon, an octagon with a small square extension in one corner, and a triangle that is alternately inverted. In practice, the bandlimit shape will be determined by the shape and characteristics of the sensor and any optical preprocessing by, for example, the lens. Theorists often choose a circular bandlimit shape to work with because then the spatial frequency is limited equally in each direction throughout the image plane. This means that a feature presented to the imaging system and detected at one angle would be equally well detected if presented at any other angle. Early work (Petersen and Middleton, 1962) has shown that circularly bandlimited images can be most efficiently sampled on a regular hexagonal grid as the bandlimit regions pack optimally in the Fourier plane. Such a packing is shown in Fig. 7a. Petersen and Middleton (1962) quote an efficiency of 90.8% for the regular hexagonal grid compared with a maximum efficiency of 78.5% for the square grid.
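The quoted efficiencies can be checked approximately with a short geometric sketch. It assumes that efficiency is taken as the fraction of the tiling cell occupied by the largest inscribed circle, so that the square cell has side equal to the circle diameter and the regular hexagonal cell has an apothem equal to the circle radius. The function names are illustrative only.

```python
import math


def square_cell_efficiency(radius=1.0):
    """Fraction of a square cell covered by its inscribed circle."""
    circle_area = math.pi * radius ** 2
    square_area = (2.0 * radius) ** 2
    return circle_area / square_area


def hexagonal_cell_efficiency(radius=1.0):
    """Fraction of a regular hexagonal cell covered by its inscribed circle."""
    circle_area = math.pi * radius ** 2
    # A regular hexagon with apothem a has area 2*sqrt(3)*a**2.
    hexagon_area = 2.0 * math.sqrt(3.0) * radius ** 2
    return circle_area / hexagon_area
```

Under these assumptions the two functions return values close to the 78.5% and 90.8% figures quoted above for the square and regular hexagonal grids.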
FIGURE 7. Tilings of the Fourier plane: (a) circular; (b) square; (c) regular hexagonal.
Mersereau (1979) has calculated that 13.4% fewer samples are required when a circular bandlimited signal is sampled on a hexagonal grid than when sampled on a rectangular grid. He continued to investigate bandlimit shapes in the Fourier plane. If a 2-D continuous image is given by $f_c(x, y)$, where $x$ is the horizontal and $y$ is the vertical distance from the origin, then a discrete rectangularly sampled image can be described by

$$f_d(n_1, n_2) = f_c(n_1 t_1, n_2 t_2), \qquad (1)$$

where $t_1$ and $t_2$ are the horizontal and vertical sampling intervals as shown in Fig. 1a, and $n_1$ and $n_2$ are integer indexes to the image array. If $F_c(\Omega_x, \Omega_y)$ is the continuous image $f_c(x, y)$ transformed to the Fourier plane, then the image is bandlimited within a shape $S$ if

$$F_c(\Omega_x, \Omega_y) = 0, \quad (\Omega_x, \Omega_y) \notin S. \qquad (2)$$

For the continuous image to be completely recoverable from a rectangularly sampled image, it must be bandlimited within the rectangular region defined by

$$\omega_1 < \pi/t_1, \quad \omega_2 < \pi/t_2, \qquad (3)$$

where $\omega_1$ is the horizontal and $\omega_2$ is the vertical bandwidth in radians m$^{-1}$. If square sampling has been employed, then $\omega_1 = \omega_2$ and the band region will be square, as shown in the crosshatched region in Fig. 7(b). For the discrete image the Fourier plane will be tiled with periodic extensions of this base region, with each square centered on coordinates that are $2n$ multiples of $\omega_1$, where $n$ is an integer. With a square band region, it is interesting to note that the image will have a frequency response at $\pm 45°$ that is $\sqrt{2}$ times that for the horizontal direction. A hexagonal sampling theorem has been developed (Mersereau, 1979). A hexagonally sampled image can be described by

$$f_d(n_1, n_2) = f_c\big((n_1 - n_2/2)\,t_3,\; n_2 t_2\big), \qquad (4)$$

where $t_2$ and $t_3$ are defined in Fig. 1b, $n_1$ is an integer index along the horizontal axis, and $n_2$ is an integer index along an oblique axis at 120° to the horizontal. The vertical spacing of this grid ($t_2$) has been chosen to be the same as the vertical spacing of the previous rectangular grid. If the hexagonal grid is regular, then $t_3 = (2/\sqrt{3})\,t_2$. For the continuous image to be completely recoverable from a regular hexagonally sampled image, it must be bandlimited within the hexagonal region defined by

$$\omega_2 < \pi/t_2, \quad \omega_3 < 4\pi/(3 t_3), \qquad (5)$$

where, as shown in Fig. 7c, $\omega_3$ is the horizontal and $\omega_2$ is the vertical bandwidth in radians m$^{-1}$. Substituting for $t_3$ in Eq. (5), $\omega_3 < 2\pi/(\sqrt{3}\,t_2)$, and the maximum values of $\omega_2$ and $\omega_3$ are related by

$$\omega_3 = \frac{2}{\sqrt{3}}\,\omega_2. \qquad (6)$$
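The hexagonal indexing of Eq. (4) can be sketched in a few lines. The sketch assumes a regular grid with the horizontal spacing taken as t3 = (2/sqrt(3)) t2 and the oblique index n2 at 120° to the horizontal; the function names are illustrative and not the author's. It checks that the six nearest neighbors of a sample are equidistant and estimates the saving in sampling points relative to a square grid of the same vertical spacing.

```python
import math

# Index offsets of the six nearest neighbors under the (n1, n2) indexing.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]


def hex_sample_position(n1, n2, t2):
    """Position (x, y) of sample (n1, n2) on a regular hexagonal grid."""
    t3 = 2.0 / math.sqrt(3.0) * t2  # horizontal spacing of the regular grid
    return ((n1 - n2 / 2.0) * t3, n2 * t2)


def neighbor_distances(t2):
    """Distances from the sample at the origin to its six nearest neighbors."""
    positions = (hex_sample_position(a, b, t2) for a, b in NEIGHBOR_OFFSETS)
    return [math.hypot(x, y) for x, y in positions]


def sample_saving(t2=1.0):
    """Fractional reduction in sample count relative to a square grid of spacing t2."""
    t3 = 2.0 / math.sqrt(3.0) * t2
    area_per_square_sample = t2 * t2
    area_per_hex_sample = t3 * t2
    return 1.0 - area_per_square_sample / area_per_hex_sample
```

With these assumptions every neighbor lies at distance t3 from the central sample, and the saving evaluates to about 13.4%, in agreement with the figure attributed to Mersereau (1979).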
The horizontal extent of the band region is larger than the vertical extent. The hashed regions shown in Figs. 7b and c represent the largest band regions for images that can be sampled on square and hexagonal grids. In practice, the image may be bandlimited to any arbitrary shape, but if this fits within the appropriate hashed region, then the bandlimiting will be sufficient to enable the image to be perfectly reconstructed. Considering a circular bandlimited image, then as shown in Figs. 8a and b, the band region
FIGURE 8. Utilization of available bandwidth by a circular bandlimited image sampled on various grids: (a) square; (b) regular hexagonal; (c) rectangular.
can be made to fit exactly within the square or hexagonal region by adjusting the common grid parameter $t_2$. The circle more completely covers the hexagonal region than the square, and there is less wasted bandwidth. The circularly bandlimited image of radius $\omega_1$ can be sampled by either grid, the maximum spatial frequency will be equal in any direction within the sampled image, and frequency aliasing will not occur. The vertical spacing of each grid is identical, $t_2$, but the horizontal spacing on the hexagonal grid is larger, resulting in a 13.4% saving in sampling points and an advantage for the hexagonal grid. Fig. 8c shows a rectangular band region containing the circular region of radius $\omega_1$. The rectangular case has been fully analyzed elsewhere (Mersereau, 1979), but graphical observation indicates poor utilization of the available bandwidth. On the other hand, if the image was square bandlimited, the square grid would have an advantage. The bandlimiting of the image must be investigated before an advantage for one grid can be identified.
D. Noise and Quantization Error
Noise from a number of sources can corrupt the image. Before sensing, low- and high-frequency lighting can modulate the image. Atmospheric distortion, rain, and vibration of the sensor can also add noise. Electronic noise can be additive or multiplicative, and introduced at the sensor or by the electronics. Quantization error will be introduced in both the spatial and brightness digitizations of the image. The average quantization error can be estimated (Kamgar-Parsi, 1989) and its effect on various image-processing operations evaluated. Quantization error can be estimated for hexagonal grids of sensors (Kamgar-Parsi, 1992). The average error and the distribution of a function on an arbitrary number of independently quantized variables can be estimated and used to compare the relative noise sensitivity of hexagonal and square sampling grids.
It has been shown (Kamgar-Parsi, 1992) that, depending on the image-processing operation, the effects of hexagonal quantization error can be between 10% below and 5% above those of square sampling grid quantization error. Finally, it is concluded here that there is little difference between the effects of the quantization error for the two systems.
E. Practical Aspects of Digital Image Acquisition Digital image acquisition systems generally provide several serially organized functions including: (a) continuous (analogue) image forming; (b)
antialias filtering; (c) spatial discretization; (d) analogue-to-digital conversion (ADC); and (e) signal processing. In addition, reconstruction, the forming of a continuous image at the output of a digital system, is also considered by some researchers (Burton et al., 1991) when estimating the quality of a system. Functions (a) and (b) can readily be considered together and are sometimes referred to as the image-gathering section (Burton et al., 1991). Functions (c) and (d) cover spatial and brightness discretization and are often referred to jointly as digitization. Function (e) refers here to processes such as amplification or impedance matching within the electronics of the system. The transfer functions of the image-gathering, sampling and reconstruction sections can be analyzed separately and then cascaded to determine the total effect on the reconstructed image as a part of the design. Sometimes it is possible to make a total system measurement (Staunton, 1998). By making this separation between the analogue and digital sections of the system we can consider that the image-gathering components bandlimit the analogue image before digitization (Staunton, 1996b). The shape of the bandlimit region can be determined, and, as discussed in Section II(C), must be known before the most efficient sampling grid can be chosen. Antialias filtering is important if the image is to be perfectly reconstructed. Such filters of various orders can readily be designed for time-varying voltage signals, and Fig. 9 shows the modulus of the gain against frequency for such a 1-D filter together with the sampling and folding frequencies. The slope in the cutoff region is determined by the order, and the design will typically require aliased components to be reduced to less than the resolution ($1/2^n$) of the ADC above the folding frequency, where $n$ is the number of bits in the ADC output.
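The relation between filter order and ADC resolution can be sketched for one particular filter family. The chapter does not name a filter type; the sketch below assumes a Butterworth lowpass response purely for illustration, and the function names and example values are hypothetical. It searches for the lowest order whose gain at the folding frequency is no greater than the ADC resolution 1/2^n.

```python
import math


def butterworth_gain(f, f_cutoff, order):
    """Gain magnitude of an ideal Butterworth lowpass filter (assumed filter type)."""
    return 1.0 / math.sqrt(1.0 + (f / f_cutoff) ** (2 * order))


def minimum_order(f_cutoff, f_fold, adc_bits, max_order=200):
    """Lowest order whose gain at the folding frequency is <= 1/2**adc_bits."""
    if f_fold <= f_cutoff:
        raise ValueError("the folding frequency must exceed the cutoff frequency")
    resolution = 2.0 ** (-adc_bits)
    for order in range(1, max_order + 1):
        if butterworth_gain(f_fold, f_cutoff, order) <= resolution:
            return order
    raise ValueError("no order up to max_order meets the requirement")
```

For example, with the folding frequency at twice the cutoff frequency and an 8-bit ADC, `minimum_order(1.0, 2.0, 8)` indicates that a relatively high-order filter is required, which illustrates why the slope in the cutoff region matters.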
FIGURE 9. One-dimensional antialiasing filter. (Axes: |Gain| against frequency; the ADC resolution level $1/2^n$ is marked.)
For an imaging system the nature of the antialias filter will be determined by the physics of the imaging being undertaken. It will be 2-D, and the magnitude of its gain can be plotted as a series of 2-D contours in the Fourier plane. Ideally, to avoid aliasing, the gain contour at which the filter output falls below the resolution of the ADC should coincide with the baseband spatial-frequency limit imposed by the sampling grid. The baseband region is shown crosshatched in Figs. 7b and 7c for square and hexagonal grids. In practice, a circular bandlimit region that lies within the ideal band region will be used to ensure equal resolution in each direction. Often the antialiasing filter cutoff frequency is determined only by the focusing and limitations of the lens and the receptive area of the sensor. An example of an optical design is given in Section II(F). The sensor array is discrete. It samples the image, but the finite receptive area of each sensor also smooths it. The sampling process can be considered as the multiplication of the continuous image by a grid of Dirac delta functions. This can be expressed mathematically by Eqs. (1) and (4), or in a vector form (Ulichney, 1987; Burton et al., 1991) by

$$s(\mathbf{x}) = \sum_{\mathbf{n}} \delta(\mathbf{x} - \mathbf{V}\mathbf{n}), \qquad (7)$$

where $\mathbf{n}$ is a 2-D integer column vector, $\delta(\mathbf{x})$ is a delta function, and $\mathbf{V}$ is a 2-D sampling matrix defined by $\mathbf{v}_1$ and $\mathbf{v}_2$, which are linearly independent column vectors, where

$$\mathbf{V} = [\mathbf{v}_1 \;\; \mathbf{v}_2]. \qquad (8)$$
The angle between $\mathbf{v}_1$ and $\mathbf{v}_2$ sets the geometry of the sampling grid, that is, 90° for rectangular and 120° for regular hexagonal, and their moduli set the distance between samples. Figure 10 illustrates the geometry of the regular hexagonal grid. Images can be formed from scenes reflecting or emitting electromagnetic radiation from any or several parts of the spectrum. No use is made of the frequency information in monochromatic images, but for color images and other multidimensional images, brightness planes are stored for each of several frequency bands. Visible, infrared (IR), x-ray, radio and ultraviolet images are commonly captured and processed. Other image sources employ ultrasound, seismic waves, surface-point contact measurements and atomic-particle emissions. In each case a large sensing area can increase the sensitivity of the detector and improve the signal-to-noise ratio (SNR), but may reduce the maximum spatial frequency that can be captured. Focusing devices (lenses) can improve the situation. In many imaging cases the sensor
FIGURE 10. Regular hexagonal grid with sampling vectors.
transforms an energy signal into a voltage signal that can be further processed. The sensor designer may begin with the idea of completely tiling the image plane with sensors, as in the retina of the human eye, and then leading the electrical connections away from the rear. Hexagonal packing may be advantageous, as has been found with radar systems (Sharp, 1961), point contact measurement (Whitehouse and Phillips, 1985), and medical gamma cameras. Figure 11 shows a hexagonal-faced photomultiplier tube from such a camera. A completely tiled sensor array can be analyzed using a pixel model. Each sensor element can be considered to provide the brightness information for one pixel. This implies that the sensor is a perfect integrator over its entire surface, that there is no signal leakage between sensor elements, and that there is no radiation scattering within the array that can result in more than one element responding to a single photon. In practice, these three conditions are seldom true. With integrated sensor technologies such as charge-coupled devices (CCD) and CMOS, it is not easy to make large numbers of electrical connections to the rear of the array, and circuits are often laid out alongside the sensor elements to effect data transfer. The active areas of the sensor can be kept large compared to the communications and power circuits, but the pixel model is effectively further compromised. A wafer-scale image-processing system has been proposed with connections made to the rear of the wafers, but this did not extend to the hundreds of thousands of connections required for pixel-to-pixel transfers (Nudd et al., 1985).
FIGURE 11. Hexagonal-faced photomultiplier tube.
1. CCD TV Camera
CCD image arrays can be 1-D or 2-D, with 2-D image capture being achieved with a 1-D array by scanning the object past it. Images are often large, with 2-D arrays of 512 × 512 or 682 × 512 (4:3 aspect ratio) the most readily available. These sensors discretize the image spatially, but the brightness value remains an analogue value. There are various array architectures (Batchelor et al., 1985), but the interline transfer (ILT) device is the most popular. The image is focused onto the sensor area, and during the acquisition phase the elements store an electric charge that is proportional to the intensity of the light falling on them multiplied by the exposure time. In the readout phase the electric charge is transferred to storage registers that run parallel to the columns of sensor elements. This arrangement is illustrated in Fig. 12. Once the charge is transferred, the sensor elements can begin to receive the next image and the storage registers can begin to communicate the current image to the camera electronics. The registers are analogue devices
FIGURE 12. Interline transfer CCD sensor. (Labeled: sensing element; output shift register; video output.)
and rely on multiphase clocks to shift the charges and synchronize the process. The column storage registers shift the data a row at a time into an output register, which in turn shifts the data to produce the raster scan output stream to the camera electronics. The electronics should include a reconstruction filter to correctly reproduce the image (Oakley and Cunningham, 1990), as well as amplification and impedance matching circuits. The camera output is therefore a time-varying continuous voltage signal. Considering a frame of this signal as a 2-D image, then it is horizontally continuous and vertically discrete. There are many errors associated with CCD sensors (Schroder, 1980). In particular, there will be light reflection and charge leakage within the array, and frequency bandlimiting caused by the electronics. The area of the light-sensitive elements can be maximized with respect to the shifting elements, but complete tilings of the surface are not possible. The tile and grid shape can be chosen by the designer. Square and rectangular shapes predominate for both, but a small (8 x 8) hexagonal tiled, hexagonal grid sensor array has been fabricated (Hanzal et al., 1985). Large RAM devices are often fabricated with cells on a hexagonal grid to save space. The technology to fabricate hexagonal grids exists. The sensor array discretizes the image. At this stage the bandlimiting of the image can be analyzed so that the best shape can be chosen for the sensor element, and to ensure that there is no signal aliasing. The modulation transfer function (MTF) of the individual components, that is, the atmosphere, the lens and the CCD elements, can be estimated theoretically using simplistic models, and a composite figure is obtained. The MTF is
analogous to the modulus of the frequency response of a system for processing time-varying signals. If the distance between the object and the lens is not great, the MTF of the atmosphere can be neglected (Tzannes and Mooney, 1995). The ideal sensor element integrates the light intensity over its active area and can thus be considered a lowpass spatial filter. If the element is rectangular, then its horizontal 1-D MTF can be found by Fourier transforming the square-profile 1-D window of width $x_m$
The spatial cutoff frequency is given by
The model is simplistic and provides only a 1-D MTF. Techniques exist for measuring the MTF of individual sensor elements within an array (Sensiper et al., 1993). The lens is the final component to be analyzed. Its primary purpose is to focus the image, but in addition it acts as a lowpass spatial filter and can reduce aliased components. Both diffraction and aberration limiting occur within the lens (Ray, 1988). Diffraction limiting results in a high spatial-frequency cutoff
where $\lambda$ is the wavelength of the electromagnetic (EM) radiation and $N$ is the f-number of a circular aperture. A smaller aperture thus results in a lower cutoff frequency. If the aperture is circular, then the resulting 2-D bandlimiting will be circular. A 1-D profile through a circular 2-D MTF can be calculated (Gaskill, 1978)
m$^{-1}$. (12)
There are various aberrations that limit the frequency response of a lens. For monochromatic light these are spherical aberration, coma, astigmatism, curvature of field and distortion (Ray, 1988). The lens designer uses multiple elements to correct these aberrations, but the lenses that are often used in cost-effective TV systems still exhibit such defects. Aberration limiting
results in a cutoff frequency that is proportional to the f-number of the aperture, with a wide aperture resulting in a low cutoff frequency. The cutoff frequency can be calculated for thin lenses (Black and Linfoot, 1957), but calculations are complicated by the choice of definition for “in focus” and by the compounding of thin elements. A computer-aided design (CAD) system or practical measurements should be employed. Figure 13 shows MTFs for an ideal sensor element plotted using Eq. (9), a diffraction-limited lens (f/8, visible wavelength) plotted using Eq. (12), and the product of these two, which can be considered as the system MTF. The frequency axis has been normalized to the Nyquist frequency of the array. The frequency-limiting components of this system are not providing sufficient filtering to remove aliased components, and the response is still greater than 0.4 at the Nyquist frequency. These simple theoretical techniques are limited in that they do not include aberration-limiting or 2-D information. Practical methods of measuring the 2-D MTF exist, and the results can be compared with the theoretical calculations.
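A calculation of this kind can be sketched as follows. The sketch assumes standard textbook forms rather than the chapter's own expressions: a sinc-shaped MTF for a uniform element of width x_m, a diffraction cutoff of 1/(λN), and the incoherent diffraction-limited MTF profile of a circular aperture. The element width (taken equal to the sample pitch), the wavelength, and all function names are illustrative assumptions.

```python
import math


def sensor_mtf(f, width):
    """MTF of an ideal element of the given width (magnitude of a sinc function)."""
    x = math.pi * f * width
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)


def diffraction_cutoff(wavelength, f_number):
    """Spatial cutoff frequency of a diffraction-limited circular aperture."""
    return 1.0 / (wavelength * f_number)


def diffraction_mtf(f, wavelength, f_number):
    """1-D profile through the circular MTF of a diffraction-limited lens."""
    s = f / diffraction_cutoff(wavelength, f_number)
    if s >= 1.0:
        return 0.0
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))


def system_mtf(f, width, wavelength, f_number):
    """Composite MTF: product of the sensor and lens contributions."""
    return sensor_mtf(f, width) * diffraction_mtf(f, wavelength, f_number)


# Hypothetical values: 10 micrometre elements with no gaps, green light, f/8.
PITCH = 10e-6          # metres; element width assumed equal to the pitch
WAVELENGTH = 550e-9    # metres
F_NUMBER = 8.0
nyquist = 1.0 / (2.0 * PITCH)
response_at_nyquist = system_mtf(nyquist, PITCH, WAVELENGTH, F_NUMBER)
```

With these assumed values the composite response at the Nyquist frequency remains above 0.4, consistent with the observation that the sensor and lens alone do not provide sufficient antialias filtering.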
FIGURE 13. MTFs for: (a) ideal sensor element; (b) diffraction-limited lens (f/8, visible wavelength); (c) system MTF (product of a and b). Horizontal axis: normalized frequency (f/f_nyq).
E. Measurement of 2-D Modulation Transfer Function and Bandlimit Shape

The MTF of continuous optical processing systems can be measured using traditional techniques (Ray, 1988), but these fail with digital acquisition systems due to signal aliasing. Various techniques have been researched to overcome the problems introduced by discrete sensor arrays. The simplest method is the knife-edge technique, which is suitable for use here to measure the bandlimit region and the filtering of aliased components. The method involves a shifting technique to produce a high-resolution profile across an image edge, and this renders the technique insensitive to geometric distortions. Geometric distortion information is important for applications such as image restoration, and for these, alternative techniques should be used (Zandhuis et al., 1997; Boudin et al., 1998). The measurement can be extended to include the frame-grabber digitizer as indicated in Fig. 14. The measurement now encompasses two discrete stages, the sensor array and the frame-grabber ADC. The array digitization is 2-D, but the ADC operates on a partly discrete raster-scanned image and only digitizes in the horizontal direction. The bandlimit region measurements can be analyzed to show the contributions from each system component. The knife edge is provided by a long straight-edge object, the image of which is dark on one side and bright on the other. It is focused onto the sensor array and a TV frame is grabbed. The MTF is calculated from the stored image, which is a smoothed version of the input step. The technique works by aligning an edge slightly off vertical or horizontal. In this way, the straight edge cuts each element along the line of the edge so that it records a slightly different brightness value than its neighbor. Assuming the edge to be straight, edge profiles along the edge can be aligned and a single
FIGURE 14. TV camera-digitizer system.
high-resolution profile known as the edge spread function (ESF) is assembled from them. Because the ESF is high resolution, it contains nonaliased information beyond the sampling frequencies of the array and the ADC. Early implementations of the technique (Reichenbach et al., 1991) were limited to MTF measurements in the vertical and horizontal directions and required several parallel edges. The use of spatial domain calculations (Tzannes and Mooney, 1995) enabled a single edge to be used, and the consideration of plane waves and interpolation has enabled 2-D measurements to be made (Staunton, 1996b; 1997a; 1998). It is important to set up the acquisition system to be as linear as possible for the technique to be effective. The automatic gain control of the camera must be disabled, and the gamma correction removed. The ESF can be differentiated and transformed to the Fourier plane to give the transfer function of the system, the modulus of which is the 1-D MTF for the particular orientation of the edge profile; MTFs can be obtained for several edge-profile orientations and combined to form a 2-D MTF (Staunton, 1997a; 1998).

A comparison of measured 2-D MTFs has been made between six acquisition systems (Staunton, 1998). The systems were made from combinations of three cameras and two frame grabbers. The component specifications as obtained from the manufacturers' data sheets are as follows:

Camera A: 2/3 in. CCD array. Square element shape. Sample spacing: 10 µm horizontal and vertical. Resolution: 756 x 581 elements. Lens: Fixed focal length, 16 mm.
Camera B: 1/2 in. CCD array. Resolution: 752 x 582 elements. Lens: Fixed focal length, 16 mm.
Camera C: 1/3 in. CCD array. Resolution: 750 elements horizontal, vertical not stated. Lens: Fixed focal length, 16 mm.
Frame Grabber X: Frame store: 512 x 512. Aspect ratio: 1:1.
Frame Grabber Y: Frame store: 512 x 512. Aspect ratio: 4:3.

Figure 15a shows measured and simulated 1-D MTFs for one acquisition system.
The measured MTF cuts off at a lower frequency than the simulated one. This is to be expected as the simulation of the MTF of the lens did not include aberration limiting, and only ideal CCD array characteristics were used. The measured response at the Nyquist frequency is still 0.2 and significant aliasing will occur. Figure 15b shows a typical high-resolution ESF from which the MTF would have been calculated. Figure 16 shows 1-D MTFs obtained for edge profiles oriented in 15° steps from 0° to 90° to the horizontal. The cutoff frequency increases with the angle of the profile, reaching a maximum for a vertical profile. The reduced cutoff frequency in the horizontal direction could be caused by
FIGURE 15. Camera A, frame grabber X: (a) a typical MTF; (b) a typical ESF. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
FIGURE 16. The 1-D MTFs obtained from edge normals at angles of 0° to 90° to the horizontal. Camera A, frame grabber X. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
filtering in the camera electronics, or by an antialiasing filter in the frame grabber. These circuits only operate on the raster-scanned signal. Figure 17 shows a quadrant of a 2-D MTF where the results for edge profiles at angles other than those given in Fig. 16 have been found by interpolation. Figure 18 shows slices through the 2-D MTFs for each camera-frame-grabber system. The slices are located at the -3 dB modulation level and have been normalized to the vertical Nyquist frequency of the CCD array of Camera A. The vertical cutoff frequency for each combination is between 0.37 and 0.69, whereas the horizontal cutoffs are between 0.27 and 0.48. The vertical cutoff is limited mainly by the lens and the CCD element area, whereas horizontally, the camera and frame-grabber electronics also provide limiting. The different horizontal and vertical charge-shifting registers in the
FIGURE 17. Quadrant of a 2-D MTF interpolated from the data in Fig. 16.
CCD array (Fig. 12) may also lead to differences in the horizontal and vertical responses. The system combinations (Camera A, grabber X; Camera A, grabber Y; Camera C, grabber X; Camera C, grabber Y) each show an increase in cutoff frequency with increasing edge-profile angle. The bandlimiting is not circular. The horizontal cutoff frequency of Camera A is probably being limited by a reconstruction filter in the output electronics of the camera, as the cutoffs are very similar for connections to grabber X and grabber Y. The horizontal cutoffs for Camera B and Camera C are nearly identical when connected to the same frame grabber. The differences here are dependent on the frame grabber, and could be caused by filtering in the input circuitry of the grabber. The traces for the systems including Camera B are nearly circular and thus there is an advantage in using a hexagonal sampling grid. The grid pattern can be realized by the digitization circuits of the grabber by adding a half-sampling period at the beginning of each line in alternate TV fields. If the square CCD element shape were shaping the 2-D MTF, then a deviation would be expected in the trace at 45°. No such deviations were observed, indicating that other limitations were dominant. This has shown that square CCD elements do not necessarily require a square sampling grid for optimum performance.
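The half-sampling-period offset mentioned above can be illustrated with a short sketch. It assumes unit horizontal sample spacing and a line spacing of sqrt(3)/2 of that spacing, which is what makes all six neighbors equidistant; in a real TV system the line spacing is fixed and the sampling clock would have to be chosen to match it. The function names are hypothetical.

```python
import math

def hex_sample_positions(n_lines, n_samples, dx=1.0):
    """Return (x, y) sample centres for a hexagonal grid built from raster lines.

    Odd-numbered lines are delayed by half a sampling period (dx / 2).
    The line spacing dx * sqrt(3) / 2 is an assumption that makes the six
    nearest neighbours equidistant.
    """
    dy = dx * math.sqrt(3.0) / 2.0
    return [[((col + 0.5 * (line % 2)) * dx, line * dy)
             for col in range(n_samples)]
            for line in range(n_lines)]

def hex_to_square_density_ratio():
    """Samples needed on a hexagonal grid relative to a square grid
    for the same circularly bandlimited image."""
    return math.sqrt(3.0) / 2.0

grid = hex_sample_positions(3, 4)
saving = 1.0 - hex_to_square_density_ratio()   # fraction of samples saved
```

The ratio sqrt(3)/2 is about 0.866, so a circularly bandlimited image needs roughly 13.4% fewer samples on the hexagonal grid than on a square one.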
FIGURE 18. Polar plots (normalized frequency against angle in degrees) of the -3 dB modulation points of the 2-D MTFs obtained from edge normals at angles of 0° to 90° to the horizontal: (a) Camera A, grabber Y; (b) Camera A, grabber X; (c) Camera B, grabber Y; (d) Camera B, grabber X; (e) Camera C, grabber Y; (f) Camera C, grabber X. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
The systems containing Camera A or Camera C ideally require more antialias filtering in the vertical direction. This would also even up the horizontal and vertical responses. Such filtering is difficult to achieve physically without defocusing the lens or allowing vertical charge leakage between CCD elements. The images produced by the systems containing Camera B are nearly circularly bandlimited and can be sampled most efficiently on a hexagonal grid.
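The final step of the knife-edge measurement described in this section (differentiate the ESF, transform it, and take the modulus) can be sketched as follows. This is a simplified illustration, not the published procedure: it assumes the high-resolution ESF has already been assembled on a uniform grid, uses a plain first difference and a direct DFT, and tests the idea on a synthetic Gaussian-blurred step.

```python
import cmath
import math

def mtf_from_esf(esf):
    """Estimate a 1-D MTF from a uniformly sampled edge spread function."""
    # Differentiate the ESF to obtain the line spread function (LSF).
    lsf = [b - a for a, b in zip(esf, esf[1:])]
    n = len(lsf)
    # Discrete Fourier transform of the LSF (direct O(n^2) form, for clarity).
    spectrum = [sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(lsf))
                for k in range(n // 2 + 1)]
    # The MTF is the modulus of the transfer function, normalized to 1 at f = 0.
    dc = abs(spectrum[0])
    return [abs(s) / dc for s in spectrum]

# Synthetic ESF (an assumption for the demonstration): an ideal step blurred
# by a Gaussian with a standard deviation of 2 high-resolution samples.
sigma = 2.0
esf = [0.5 * (1.0 + math.erf((i - 32) / (sigma * math.sqrt(2.0))))
       for i in range(65)]
mtf = mtf_from_esf(esf)
```

Repeating the calculation for ESFs measured at several edge orientations would give the set of 1-D profiles from which a 2-D MTF is interpolated.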
III. PROCESSOR ARCHITECTURE

The objective of digitizing the image is usually so that it can be processed using a digital computer. This section considers the storage of image data and the spatial relationship between the data. In particular, the square and hexagonal sampling schemes are compared and the advantages and disadvantages of processing them with computers of various architectures are discussed. A detailed comparison of some specific image-processing algorithms is given in Sections IV and V. Parallel computer architecture is a large research area. The parallel processing of images is a smaller area, and the parallel processing of hexagonally sampled images is even smaller. Most machines can nevertheless process hexagonal images, with varying degrees of efficiency. Surveys of parallel computer architectures include that of Fountain (1987), which provides an in-depth study of systems up to 1986. A special 1988 issue of the Proceedings of the IEEE on computer vision, edited by Li and Kender (1988), provides survey papers on architecture (Cantoni and Levialdi, 1988; Maresca et al., 1988). There has been a special section in the IEEE Transactions on Pattern Analysis and Machine Intelligence on computer architecture (Dyer, 1989). A more recent general survey that includes a new taxonomy of processors has been published (Ekmecic et al., 1996).

A. Single Instruction, Single Data Computers (SISD)
This is the conventional computer (Flynn, 1966). The program and data are stored in memory, and the memory is addressed in the correct order so that each particular datum is accessed and operated on as required. As reviewed in Section II(C), a circularly bandlimited image that has been hexagonally sampled will contain 13.4% less data for an equivalent information content than a square sampled image, but the addressing of the data will be less straightforward. Programs running on such computers need to have the image data or their subsets stored in indexable arrays because the value of an output pixel is often a function of several pixels in the input image. A 2-D square sampled image can be mapped one-to-one into an integer-indexed array where an increment of the index represents a step of one sampling distance in the image. The indexing of a hexagonal image stored in an array is less straightforward. A hexagonal pattern could be set up in a square array by filling only every other cell and shifting this pattern by one cell on alternate rows. Then the array would be twice as large, require double increments of the row address pointers, and the warp introduced would mean that either
the horizontal or vertical increments would no longer be equivalent to the sample spacing. If the hexagonal image stored in the array is addressed with 60° or 120° axes, then indexing is possible (Mersereau, 1979). An example of such indexing is shown in Fig. 19. For practical use within the computer program the pixel addresses can be mapped to complex numbers (Bell et al., 1989). Alternatively, for small local area calculations, the hexagonal data can be mapped directly into a square array and different convolution masks used depending on whether the central pixel of the area is on an odd or even scan line (Staunton and Storey, 1990). Figure 20 shows a 7-neighbor hexagonal local area where six neighbors are equidistant from a central element, and the position shifting of the neighbors that occurs as the central element is located on either an odd or even scan line within a square 3 x 3 array. With such a scheme two sets of convolution masks are needed for each image-processing operation, although each is applied to only one-half of the image. A simple 3-integer coordinate scheme has been researched (Her, 1995) and is known as the symmetrical hexagonal coordinate frame. Three integers point to each pixel. The frame overcomes difficulties experienced with other coordinate systems when designing some types of processing operator. Figure 21 illustrates the coordinate indexing, with the center of the image shifted to the origin at x = 0, y = 0, z = 0. With this scheme y points to the image scan line number, and x to the individual pixels along the scan line. The image is planar, and if the origin is located centrally,

$$x + y + z = 0. \qquad (13)$$

FIGURE 19. Hexagonal pixels and indexing scheme.
FIGURE 20. A 7-neighbor hexagonal local area: (a) the neighbors' positions within the image; (b) odd-row array positions; (c) even-row array positions.
The image plane cuts through a 3-D Cartesian space, and each indexed image point coincides with an integer-indexed point in the 3-D space, as illustrated in Fig. 22. This can be useful when processing operators, and especially geometric transformation matrices, are being designed. However, loading the image into a 3-D array would lead to poor memory utilization. Thus for good memory utilization, the image should be loaded into a 2-D array that is indexed by the oblique (60°) system. Now, allowing for the shift of the origin to the center of the image, x and y for the three-integer coordinate scheme are identical to the two coordinates in the oblique scheme as displayed in Fig. 19. In conclusion, a hexagonally sampled image can be efficiently stored within the memory of a SISD computer. When stored in program arrays the image can be efficiently indexed and data pointers easily calculated. In the
FIGURE 21. Three-integer index scheme.
FIGURE 22. Three-integer index scheme sampling points embedded within a 3-D Cartesian coordinate grid.
general case the 3-integer index scheme is the most efficient, but the oblique-axis scheme, and the method involving the direct mapping of data into a square array with convolution masks that shift between odd and even scan lines, may also be used.
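The indexing schemes discussed in this section can be sketched in a few lines. The axis orientations, the neighbor ordering, and the convention that odd scan lines are shifted half a pixel to the right are all assumptions made for this illustration; they are not taken from Figs. 19 to 21, and the function names are hypothetical.

```python
# Unit steps to the six nearest neighbours in the three-integer frame.
# Every step keeps x + y + z = 0, so points stay in the image plane.
THREE_INT_STEPS = [(1, -1, 0), (1, 0, -1), (0, 1, -1),
                   (-1, 1, 0), (-1, 0, 1), (0, -1, 1)]

def oblique_to_three(x, y):
    """Extend a 2-D oblique-axis index to the three-integer frame."""
    return (x, y, -x - y)

def three_to_oblique(x, y, z):
    """Drop the redundant coordinate; valid only on the plane x + y + z = 0."""
    if x + y + z != 0:
        raise ValueError("point is not on the image plane")
    return (x, y)

def hex_distance(a, b):
    """Number of nearest-neighbour steps between two three-integer points."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))

# Direct mapping into a square array: neighbour offsets (d_row, d_col)
# differ for even and odd scan lines (odd lines assumed shifted right).
EVEN_ROW_STEPS = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
ODD_ROW_STEPS = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]

def neighbours_in_square_array(row, col):
    """Array positions of the six hexagonal neighbours of (row, col)."""
    steps = ODD_ROW_STEPS if row % 2 else EVEN_ROW_STEPS
    return [(row + dr, col + dc) for dr, dc in steps]
```

The two offset tables play the same role as the two sets of convolution masks: which one applies depends only on the parity of the scan line holding the central pixel.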
B. Parallel Processors

This section contains a survey of some of the computer architectures used for image processing that either have been designed to process hexagonally sampled images, or have hexagonal interconnections between processors but have been designed to process high-level information that is not locked to a sampling grid. Two-dimensional arrays of fine- and coarse-grain processors are discussed, as are pipeline, vector, pyramid, hypercube, and shared-memory devices. A more general review of parallel architectures for image processing has been published (Downton and Crookes, 1998). If the total image-processing task is considered, parallel processing can be applied in various ways: (a) the image can be divided into local areas, possibly overlapping, and a processor assigned to each area; (b) processes can be pipelined so that one processor completes the first task on the whole image and passes the resultant image on to the next processor for further processing, while waiting for the next image; (c) a pyramid of planes of 2-D arrays of processors can be constructed in which partly processed images are passed up to the next level for further processing, with a reduction in the number of processors and interconnections at each level; and (d) a particular task may be readily performed on a general-purpose array, hypercube, vector, or shared-memory processor.
Computational or execution bandwidth can be defined as the number of instructions processed per second, and the storage bandwidth as the retrieval rate of data from memory (Flynn, 1966). Latency can be defined as the total time associated with a process from excitation to response for a particular datum. In practical terms, the latency is the number of processor clock cycles that elapse between the input of a datum and the output of the processed result.
C. Two- and Multi-dimensional Processor Arrays

An array of processor elements (PEs) is a group of elements that operates in parallel to process a set of data. Consider initially a simple image-intensity transform where each member of a data set $b_{i,j}$ is multiplied by a scaling constant $K$. Each transformed pixel is $a_{i,j} = K b_{i,j}$. An array of PEs of size $i \times j$ could transform the entire array in one clock operation period. However, in many image-processing problems, $a_{i,j}$ would be a function of pixels within a local or global area. To facilitate these operations, interconnection is provided between PEs. The topology of the interconnections determines the dimensionality of the PE array. Figure 23 shows two examples of 2-D interconnection topology. The 8-way interconnection of PEs has also been realized. Three-dimensional interconnection involves the vertical stacking of such 2-D planes, or the formation of a torus (Li and Maresca, 1989). Multidimensional interconnection topologies are also realizable. The PEs in an array can be single instruction, multiple data (SIMD), or multiple instruction, multiple data (MIMD); MIMD implies a high level of processor autonomy,
FIGURE 23. Two-dimensional processor array interconnection topology: (a) rectangular; (b) hexagonal.
but some autonomy is possible for PEs within the SIMD definition (Maresca et al., 1988).

1. Fine-Grain Arrays
Fine-grain arrays are more likely to be used for low-level image processing where there is an advantage in associating the array structure with the original data structure and applying connections between elements in local areas. Hexagonal interconnections where each processor connects to its six nearest neighbors have been realized. Some of these machines are listed in the following. Fine-grain array machines are especially useful for morphological image processing. Simple SIMD PEs are often used because arrays are large, many PEs can be integrated onto each VLSI device, and the single instruction processing reduces communication overhead. Figure 24 shows a fine-grain array together with some of the other units required to realize a system. The control unit broadcasts image-processing instructions to the PEs in the array, and receives back busy and finished signals from each. Many array processors are used with a raster-scanned input device such as a TV camera. The TV picture is captured by a frame grabber and then reformatted so that efficient array loading can be achieved. Loading such images requires the hardware overhead of the reformatting logic and a loading time overhead. Loading is often achieved by transferring a complete column of data from the reformatting logic to the array and rippling this and subsequent columns across the array until all columns are filled. In the Clip4 array (Fountain, 1987) the data and control paths are separated so that image loading can be performed concurrently with image processing. This requires an additional hardware overhead.

FIGURE 24. A fine-grain array system.

Some examples of fine-grain arrays that incorporate hexagonal interconnections are as follows.

Clip4. The Clip4 system (Fountain, 1987) was a fine-grain SIMD processor that embodied the features of Fig. 24. It was developed from Clip2 and Clip3, which also allowed hexagonal interconnections between PEs. The Clip4 chip, which is used to assemble the array, was designed in 1974, and was limited by the fabrication technology available. Limitations included the number of transistors per device (5000), the packaging (40 pin), and the clock speed (5 MHz). The resulting device contained eight PEs and has been used to build arrays of from 32 x 32 to 128 x 128 elements. The arrays can be connected in square or hexagonal 2-D meshes. The processor data width was 1 bit, and as an example of the processing speed, an 8-bit addition could be performed in 80 µs. The processor could perform Boolean operations; a 32-bit RAM was provided in each PE, and input gating was used on the near-neighbor input connections to facilitate efficient morphological operator implementation. Individual PEs could be switched off by certain processes, and a global propagation function allowed data to be passed through the array 50 times more efficiently than if propagation was limited to near-neighbor-only communication. Clip4 arrays have been used for many applications (Duff, 1985). A process involving the measurement of the rate of growth of biological cell cultures was possible for a large number of samples, as computation could be performed in less time than it took physically to change the sample (Fountain, 1987).

Illiac III. The Illinois pattern recognition computer, Illiac III (McCormick, 1963), allowed 4-, 6-, and 8-way interconnections between PEs. It was used for analyzing bubble chamber traces.

The PSC Circuit. This was a programmable systolic processor that had three 8-bit input channels and three 8-bit output channels (Fisher et al., 1983).
Each PE contained an arithmetic and logic unit (ALU), multiplier, microcode store and sequencer, and RAM. It could be connected two, four, and six way into arrays.

Silicon Retina with Correlation-Based, Velocity-Tuned Pixels. This is a hexagonal architecture implemented on a CMOS chip (Delbruck, 1993). Visual motion computation is implemented using an analog space-time algorithm.

Analog Neural Network. This has been used for image processing (Kobayashi et al., 1995). Various interconnection topologies were researched, including hexagonal.
Kydon (Bourbarkis and Mertoguno, 1996). This is a multilayer image-understanding system. The processors in the lower-level arrays are connected in a hexagonal mesh.

2. Coarse-Grain Arrays

With coarse-grain arrays, one PE will be associated with many data, or large local areas of pixels. In some systems memory may be shared among PEs. The PEs are likely to be sophisticated microcomputers, and considerable processing and communication autonomy will be devolved to them. The array is likely to be a MIMD processor. Communication overhead limits the number of processors that can be inserted in an array to obtain faster processing. For some processes, such as low-level image processes, it may be advantageous to divide the image space and assign one PE to each local area. For higher-level processes, the computer programmer may perceive an advantage in redistributing the processing in a different way across the array. This is easier with a shared-memory system. Each of the relatively sophisticated communication channels supported by the PEs requires significant chip area for its implementation. This resulted in early devices being limited to 4-way interconnection, as, for example, in the transputer (Inmos Ltd., 1989). With the development of hypercubes and similar topologies, connectivity has increased again. Six-way interconnectivity has been reported for a system referred to as HARTS (Dolter et al., 1991). The optimum interconnectivity of these arrays is not primarily a function of the sampling grid of an original image. Arrays of PEs are used to speed up processing, but as the number of PEs added increases, the communications bandwidth limits the increase in speed. The interconnection topology can be chosen to optimize processing speed.
D. Pyramid Processors
A typical pyramid architecture is shown in Fig. 25. At the base of the pyramid, level 1 is the input image. This is connected upwards to level 2 so that four level-1 pixels connect to one level-2 pixel. This is known as a quadrature pyramid. Binary, hexagonal, 16-way, and other connection systems have also been realized. In its simplest form the structure may be a pyramid of memory elements so that reduced resolution images are stored at each level, as with the pipelined parallel machine (Burt et al., 1986). Pyramids of PEs are also realizable with architectural variations in the types of PE at each level and in the autonomy of control. Processor elements communicate between neighbors within their level, and also pass data upwards to their associated PEs at the higher level. Some processes require that data pass
FIGURE 25. A typical pyramid architecture.
both up and down the pyramid (Watson and Ahumada, 1989). In addition, PEs can work autonomously within the pyramid, or control can be passed down layer to layer from the apex. In one type of pyramid, the PEs are of the same type in each level and the arrays can be coarse grained (Handler, 1984), or more usually fine grained (Tanimoto et al., 1987). In another type, different PEs are incorporated at the different levels (Nudd et al., 1989): level 1 is populated by a 256 x 256 array of SIMD PEs, level 2 by a 16 x 16 MIMD array, level 3 by an 8 x 8 transputer array, and level 4 by a host Sun workstation. Pyramid processors are efficient architectures for image-understanding systems. The input is any general image at level 1 and the output would be a description of the scene in the form of, for example, a list of objects at the highest level.
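The four-to-one connection between adjacent pyramid levels can be illustrated with a memory-only pyramid of reduced-resolution images. The sketch assumes that each upper-level pixel is simply the mean of its four children and that the base image is square with a power-of-two side; real pyramid machines may apply quite different operators at each level.

```python
def reduce_level(image):
    """Average each 2 x 2 block: four pixels at one level feed one pixel above."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
              image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]

def build_pyramid(image):
    """Return the list of levels, from the input image up to a single pixel."""
    levels = [image]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels

# A 4 x 4 test image gives levels of 4 x 4, 2 x 2, and 1 x 1 pixels.
base = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(base)
```

Each level holds one-quarter of the pixels of the level below, so the whole pyramid needs only about one-third more storage than the base image.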
1. Hexagonal Pyramids
Hartman and Tanimoto (1984) investigated a hexagonal pyramid data structure for image processing. Level 1 was tiled with hexagons, but each hexagon was subdivided into six equilateral triangular pixels. Four triangles were then combined to give a single equilateral triangle at the next level, as shown in Fig. 26, where level-1 triangles [a, b, c, d] combine to give a level-2 triangle [A]. PEs could also be arranged in such a scheme, but the basic triangular pixel scheme is difficult to sample directly with a raster-scanned device as the image line spacings would be uneven. Resampling hexagonally sampled data to the triangular scheme could be achieved relatively easily. A hexagonal pyramid structure that models the processing structure of the human visual cortex has been researched (Watson and Ahumada, 1989). Anatomically, behind the hexagonally packed retinal sensors is a layer of retinal ganglion cells, which, in the center of the retina, connect one-to-one with the sensors (Perry and Coney, 1985). The ganglion cells can also be considered to be connected on a hexagonal grid. The $2 \times 10^{6}$ ganglion cells connect to the visual cortex, which contains approximately $10^{9}$ neurons. Physiological experiments have shown that between the retina and the visual brain the image undergoes a sequence of transformations, and sets of cells in the cortex can be identified with these various transforms. The research considers a transform performed by the ganglion cells and a subsequent one performed within the cortex. The ganglion cells transfer spatial and brightness information. Their transfer function is broadband and
FIGURE 26. Hartman and Tanimoto's pyramid structure, showing level 1 and level 2.
they provide local adaptive gain control. The transform within the cortex is different. The cells are narrowband and employ a so-called hybrid space-frequency code to convey the position, spatial variation and orientation of a region. The process in this group of cells has been modeled by a hexagonal orthogonal-oriented quadrature pyramid. The image transform performed in the cortex can be considered as image coding and the aim of the research was to model the transform with a pyramid constructed from elements that were themselves modeled on known physiological components. The pyramid had a hexagonal lattice input layer, the transform was invertible, and the overall process was found to be efficient. The input image is passed on by the retinal ganglion cells to the lowest level of the pyramid. This level can be considered to be tiled with hexagonally shaped pixels. The transformation to the next highest level in the pyramid involves taking a group of seven of these pixels (the shaded area in Fig. 27) and producing one output pixel that contains a vector of values from a set of seven kernels, one of which produces the average brightness value of the local area, and the other six of which are bandpass and localized in space, spatial frequency, orientation, and phase. Each low-level pixel only
FIGURE 27. A 7-element local area that produces one value in a reduced resolution image.
FIGURE 28. The hexagonal pyramid structure. (Generated using the program listed in the Appendix of Watson and Ahumada's paper (1989).)
contributes to one next-level pixel, so the next level contains only one-seventh the number of pixels, and so on, until the apex of the pyramid is reached. The resulting hexagonal pyramid structure is shown in Fig. 28. In this figure, the input image lattice is represented by the vertices and centers of the smallest hexagons and the highest level, which is also the lowest resolution image, is represented by the largest, thickest line hexagon. At the highest level there may only be one pixel, but the vector associated with it encodes all the image information and can be decoded back down the pyramid to reconstruct the original image. The model produces results that agree reasonably closely with physiological measurements, but some modifications, such as using larger kernels, are needed to produce a better match.
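The grouping of hexagonal pixels into non-overlapping sets of seven can be sketched with oblique (axial) integer coordinates. The construction below is one common way to tile a hexagonal lattice with 7-pixel groups and is an assumption made for illustration; it is not claimed to reproduce the exact layout or orientation of Watson and Ahumada's pyramid, and the seven kernels themselves are omitted.

```python
# Oblique-axis offsets of a pixel and its six nearest neighbours.
SEVEN = [(0, 0), (1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def is_centre(q, r):
    """True if (q, r) lies on the sublattice of group centres.

    The sublattice generated by (2, 1) and (-1, 3) contains one point
    for every seven pixels of the full lattice.
    """
    return (q + 5 * r) % 7 == 0

def group_centre(q, r):
    """Return the centre of the unique 7-pixel group containing (q, r)."""
    for dq, dr in SEVEN:
        if is_centre(q - dq, r - dr):
            return (q - dq, r - dr)
    raise AssertionError("every pixel belongs to exactly one group")

def pixels_per_level(n_base):
    """Pixel counts from the base to the apex, dividing by seven each time."""
    counts = [n_base]
    while counts[-1] >= 7:
        counts.append(counts[-1] // 7)
    return counts

# Collect the members of each group over a small patch of the lattice.
groups = {}
for q in range(-5, 6):
    for r in range(-5, 6):
        groups.setdefault(group_centre(q, r), []).append((q, r))
```

For a base level of 7^3 = 343 pixels, `pixels_per_level` gives four levels of 343, 49, 7, and 1 pixels, which is the one-seventh reduction per level described above.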
E. Pipelined Processors

With pipelined processors there is a single stream of data from the memory or input device, and the stream passes serially through several PEs, each of which performs a different operation on the data before the data are finally sent to their destination. This is shown diagrammatically in Fig. 29. At a particular time PE3 is operating on data (0), PE2 is operating on data (1), and PE1 is operating on data (2). Each PE is operating on a different data set and computing a different process. The pipeline can be termed a parallel processor. Programming flexibility can be compromised by such an arrangement. Events are separated in time, as with the SISD processor, but the sequence of instructions performed in a single pipe does not allow branching or easy rescheduling of instruction order. Efficient computation is achieved by applications where the same set of instructions must be applied to large sets of data. With images captured under controlled lighting conditions, local image-processing operations can be sufficient. Local image-processing operations do not require a knowledge of the complete frame of an image, but only of a group of adjacent pixels. If these operations are performed on a pipeline processor there is no need to store a complete image. Such a processor may operate directly on the serial data stream from the digitized output of a raster-scanned device. The PEs must operate in real time, but as the operations are very simple only a few lines of the image must be stored in line-length digital-shift registers within each PE. Figure 30 shows a pipeline processing element that stores two lines of the video image. In this example, three bytes of each of the two line-storage registers, together with three bytes from the previous line, form a 3 x 3 local image area on which processing is performed.
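The line-store arrangement of such a PE can be mimicked in software with a single delay line. This is a minimal sketch under stated assumptions: the function name and the use of one deque to model the chained shift registers are illustrative choices, and the 3 x 3 maximum used in the example is just one possible operator.

```python
from collections import deque

def stream_3x3(pixels, line_length, operator):
    """Apply `operator` to every complete 3 x 3 window of a raster-scanned stream.

    Only two full lines plus three pixels are held at any time, in the
    manner of a PE with two line-length shift registers.
    """
    window_span = 2 * line_length + 3
    delay = deque(maxlen=window_span)       # models the chained shift registers
    for index, value in enumerate(pixels):
        delay.append(value)
        col = index % line_length
        # A window is valid once three lines are present and it does not
        # wrap around the end of a scan line.
        if len(delay) == window_span and col >= 2:
            buf = list(delay)
            window = [buf[0:3],
                      buf[line_length:line_length + 3],
                      buf[2 * line_length:2 * line_length + 3]]
            yield operator(window)

# Example: 3 x 3 maximum over a 4 x 4 raster-scanned test image.
image = [r * 4 + c for r in range(4) for c in range(4)]
maxima = list(stream_3x3(image, 4, lambda w: max(max(row) for row in w)))
```

Because each output depends only on pixels that have already arrived, several such stages can be chained, each adding roughly one line of delay to the overall latency.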
Real-time processing operations are performed on this array of elements and a new value for the center pixel is calculated and output to form the output video stream. Storing more lines of an image enables larger local areas to be used. For example, a 5 x 5 pixel
FIGURE 29. A pipeline processor.
R. C. STAUNTON
FIGURE 30. A pipeline processor element.
area could be used by storing four video lines. Processes of increased complexity can be achieved by cascading a number of PEs in a string. The pipelined processor operates in real time, although an increasing latency is introduced by the successive line delays. Many pipelined image-processing machines containing PE architectures based on that of Fig. 30 have been reported in the literature. Few contain only this simple PE design; most also have general-purpose ALU and lookup table elements. If the “Warp” system (Annaratone et al., 1986; Crisman and Webb, 1991) is considered as a pipeline, then each PE can also contain local memory to aid multipass algorithm calculation. However, this feature causes Warp to be classified as a 1-D MIMD array. Some pipelined systems are listed in the following, together with notes on those designed to process hexagonally sampled images.
1. Pipelined Systems
Early pipelined image processors have been reviewed (Preston et al., 1979). The basic PE architecture of Fig. 30 is evident in these systems, but of those early pipelined image processors, only the Cytocomputer was capable of
real-time processing. Some more recent systems are reviewed in what follows. Recirculating pipeline systems are characterized by having only short pipelines of PEs, and by individual PEs in the line being of a different hardware construction. Frame stores are used to enable data to be recirculated through the pipeline as shown in Fig. 31, and many video and system buses are employed for data communication and system control. These systems are capable of performing a wide variety of complicated image processes, some of which can be classified as being in the mid-level vision range. For example, the convex hull process has been realized (Bowman, 1988). First, data are scanned horizontally out of one frame store, processed, and stored in the second frame store. The data are then scanned vertically out of the second frame store, processed, and stored back in the original frame store.
Cytocomputer (Lougheed and McCubbrey, 1980). The PE design conforms closely to Fig. 30, but with a programmable operator function. The hardware of the PEs in the pipeline is identical. The operator processor is limited to morphological and logical operations on a 3 x 3 local area. Scan line lengths up to 2048 pixels can be accommodated, and PEs are constructed on individual circuit boards from large-scale integration (LSI) and very large-scale integration (VLSI) components.
Cyto-HSS (Lougheed, 1985). This is a recirculating system based on a pipeline of Cytocomputer PEs.
[Figure residue: block diagram of a recirculating pipeline with two frame stores and PEs labeled sum, convolve, and lookup table.]
PIPE (Luck, 1986; 1987). Each PE contains three lookup table operators, image-combining units, a 3 x 3 arithmetic or Boolean local operator, and output crossbar switching logic. Images with resolutions from 256 x 256 to 1024 x 1024 can be processed. The crossbar switching enables data to process normally along the pipeline, to be switched in reverse direction along the pipe, or for the PEs to operate independently. Morphological and filtering operations are possible.
University of California machine (Ruetz and Broderson, 1986). This system provides a custom chip set for the designer. Each PE function is realized by a different VLSI chip. Advantages include real-time operation and potential cost reduction through the use of VLSI, but the nonprogrammability of PEs has led to a dynamic inflexibility and a requirement to design a different chip for each PE function.
University of Strathclyde (McCafferty et al., 1987). The system uses LSI components, operates in real time, and employs sophisticated image-processing algorithms for edge detection.
University of Belfast (McIlroy et al., 1984). This system contains a real-time PE incorporating LSI logic devices that perform the Roberts edge-detection algorithm.
TITAN (Lenoir et al., 1989). In this design the PE has been implemented on a gate array. It is capable of several binary and gray-level morphological operations. The local operator size is 4 x 3 pixels.
Elor Optronics Ltd. (Goldstein and Nagler, 1987). This is a pipeline processor system for detecting surface defects in metal parts. Each PE is a single-board SIMD computer.
Kiwivision (Bowman and Batchelor, 1987). There are three PEs in a recirculating pipeline. Each PE performs a different set of operations. The first PE is a 16-bit ALU, the second a general-purpose local filter operating on a 3 x 3 local area, and the third a lookup table processor. In Kiwivision II (Valkenburg and Bowman, 1988), a pipeline of Datacube PEs feeds an Inmos transputer array.
Datacube (Datacube Inc., 1989). A series of single-board PEs has been produced that can be configured as a recirculating pipeline.
PREP (Wehner, 1989). Here, several parallel recirculating pipelines are used to speed processing by operating on distinct areas of the image.
IDSP (Minami et al., 1991). This is a four-pipeline system implemented on a single VLSI chip. Additions and subtractions are allowed between data in each pipe. Applications: Video codec.
Cheng Kung University (Sheu et al., 1992). This system uses a pipeline architecture to perform gray-scale morphological operations. It is suitable for VLSI implementation.
Pipeline Processor Farm (Downton et al., 1994) System. This contains several pipes. Applications include general image processing and coding.
Chung Cheng Institute (Lin and Hseih, 1994) Modular System. This contains three pipelines. It works in real time on 512 x 512 images. It is suitable for VLSI implementation. Applications: Template matching.
New Jersey Institute of Technology (Shih et al., 1995). This pipeline architecture has been implemented as a systolic system. Applications: Recursive morphological operations.
Jaguar (Kovac and Ranganathan, 1995). This is a fully pipelined single-chip VLSI device used for color JPEG compression of images of up to 1024 x 1024 pixels.
Texas Instruments Pipelines (Olson, 1996). These have been discussed, with particular emphasis on how to program them.
Yonsei University (Lee et al., 1997) Real-time System. This is used for HDTV applications and can perform edge detection. Another paper (Lee et al., 1998) includes a discussion on de-interlacing and color processing.
F. Hexagonal Image-Processing Pipelines
GLOPR (Golay, 1969) was a pipeline for processing hexagonally sampled images. It operated on a seven-element local area that was passed to it from a host computer. It contained delay lines and could process images up to 128 x 128 in size at 3 µs per pixel operation (Preston et al., 1979). It was used extensively for processing medical images, and was produced commercially as the Perkin-Elmer Diff3. It could also perform many image-processing tasks including the basic morphological operations (Preston, 1971). A University of Warwick pipeline system that can process hexagonal or square sampled images has been designed (Storey and Staunton, 1989, 1990; Staunton and Storey, 1990). The specification required operation at the video rate, construction from reconfigurable hardware PEs, and a VLSI implementation. A lack of resources has allowed only a simulation of the pipeline to be completed. The PE was designed at a functional level that could be configured to provide one of a number of image-processing operations. The initial device
operated on a 3 x 3 local area. The processed images could be viewed directly on a TV monitor or transferred to a computer for high-level image processing. The PE has been designed to operate on sampled images up to 512 x 512 in size. Figure 32 shows a simple pipeline comprising an analogue-to-digital converter, two PEs, a digital-to-analogue converter and a control unit; PE (1) is performing edge detection and PE (2) binary line thinning. A novel feature of the PE design was that an image-processing operation such as convolution, edge detection, median filtering, gray-level morphological, or a binary operation could be completely performed with a single PE in one pass of the image data. Figure 33 shows the PE input and output signals. The clock is at the pixel rate. There are two 8-bit image data input channels to each PE. In the figure, one channel is connected to the output of the previous PE in the pipeline, and the other to a second source, which could possibly be the output from a second pipeline. The two channels are combined arithmetically inside the PE. Within the PE the image datum is clocked at the pixel rate through the various processing stages. A pair of unprocessed data is clocked in, and a processed datum clocked out from the PE with every clock pulse. The bandwidth of the PE is equal to the video rate, but the pipelining of the processes within the PE introduces a latency equal to an integer number of pixel clock periods. The PE hardware is based on that shown in Fig. 30. A block diagram of the basic PE is shown in Fig. 34. The video image enters the PE as a raster-scanned 1-D stream and the 3 x 3 local image area is assembled by employing two TV line length, 8-bit wide, digital-shift registers. The image adder allows two separate images to be combined at the PE input.
FIGURE 32. A simple pipeline processor consisting of two PEs.
[Figure residue: PE input and output signals, showing two 8-bit image data inputs (from the previous PE and a second image input channel), an 8-bit image data output to the next PE, TV sync, and control data lines.]
To enable hexagonally sampled images to be processed, the horizontal sampling spacing was increased by a factor of $2/\sqrt{3}$, and the first sampling point on alternate lines delayed by half-a-point spacing. By definition only even-numbered scan lines are delayed and the first image line is numbered one. The number of points per scan line is reduced by the $2/\sqrt{3}$ factor, giving typical image sizes of 721 x 625 or 443 x 512 pixels in comparison with the equal resolution square-sampled image sizes of 833 x 625 and 512 x 512. With hexagonal sampling the data rate is reduced by the $2/\sqrt{3}$ factor from 13.0 MBytes s^-1 to 11.3 MBytes s^-1. For use in recirculating pipelines the only system modifications are to the initial image frame-grabbing module. Data can be processed by the pipeline at the designed 13.0 MBytes s^-1 rate, and thus stored hexagonally sampled images can be processed in 13.4% less time than equivalent square images. Changes to the PE architecture were minimal so as not to affect hexagonal processing, and extra taps were added to the line delays to reflect the reduced number of pixels per line. The operator processor was also modified. For square-sampled data, some operations performed by the processor require the convolution of the nine image pixels comprising the local area with one or more 3 x 3 arrays of constants stored within the processor. The equivalent for hexagonally sampled data requires convolution with a seven-element array. With the foregoing system modifications for hexagonal data, the position of the central pixel with respect to the six neighbors changes within the grid of nine input pixels on alternate scan lines. This is illustrated in Fig. 20.
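The figures quoted above follow from dividing by the $2/\sqrt{3}$ spacing factor. The short Python check below is a sketch that reproduces the arithmetic; the function and constant names are introduced here for illustration only.

```python
import math

# Points per line and data rate scale by sqrt(3)/2, the reciprocal of
# the 2/sqrt(3) increase in horizontal sampling spacing.
FACTOR = math.sqrt(3) / 2


def hex_points_per_line(square_points):
    """Number of hexagonal sampling points replacing `square_points`."""
    return round(square_points * FACTOR)


def hex_data_rate(square_rate):
    """Data rate after the horizontal spacing has been increased."""
    return square_rate * FACTOR


saving_percent = (1 - FACTOR) * 100

print(hex_points_per_line(833), hex_points_per_line(512))  # 721 443
print(round(hex_data_rate(13.0), 1))                       # 11.3
print(round(saving_percent, 1))                            # 13.4
```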
FIGURE 34. A block diagram of the pipeline processor element.
It was necessary to store an extra set of convolution coefficient arrays within the PE's operator processor and to toggle between sets on alternate lines. This required the line-synchronization signals to be detected and the extra control signal to be processed by the control unit. The convolution coefficient magnitude range was identical to the square range, as was the scaling capability provided. In practice the amount of scaling was less, as fewer coefficients are employed. For the processing of the hexagonal edge detector, changes were needed to the square-system edge-detection hardware module to reflect the different magnitude equations. For the other operators implemented within the PE the modifications were minimal.
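The toggling between coefficient sets can be illustrated with a small Python sketch. It assumes the convention stated earlier (lines numbered from one, even-numbered lines delayed by half a spacing) and models each coefficient set as a 3 x 3 array in which the two unused corners are zero. The names `hex_mask` and `hex_convolve` are hypothetical, and the geometry is derived here from that convention rather than taken from Fig. 20.

```python
def hex_mask(line_number):
    """Return a 3 x 3 mask (1 = used) for the seven-element local area.

    Columns are c-1, c, c+1; rows are the lines above, at, and below the
    centre pixel. `line_number` is the 1-based line of the centre pixel.
    """
    if line_number % 2 == 0:
        # Centre line is shifted right, so its neighbours on the
        # unshifted adjacent lines sit at columns c and c+1.
        return [[0, 1, 1], [1, 1, 1], [0, 1, 1]]
    # Centre line is unshifted; adjacent (shifted) lines contribute
    # columns c-1 and c.
    return [[1, 1, 0], [1, 1, 1], [1, 1, 0]]


def hex_convolve(window, coeffs, line_number):
    """Convolve a 3 x 3 window using only the seven hexagonal elements."""
    mask = hex_mask(line_number)
    return sum(
        window[r][c] * coeffs[r][c] * mask[r][c]
        for r in range(3)
        for c in range(3)
    )
```

The two masks are mirror images of each other, which is why a second stored coefficient set and a line-parity control signal suffice.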
In conclusion, with a pipeline processor, the processing time for a real-time video image will be unaltered for a particular operation regardless of whether square or hexagonal digitization is employed. One image is processed in one frame time, although a latency delay period is introduced by the string of PEs in the pipeline. Even so, there are still advantages for hexagonal digitization in pipelined systems despite the requirement of some extra control information. The line delays can be reduced in length by 13.4% and the PE master clock can be reduced by the same factor. The shorter line delays reduce process latency and the size of the circuit. In a recirculating pipeline system, hexagonally sampled images can be processed in 13.4% less time than square-sampled images. As the local image area contains only seven elements for a hexagonally sampled image, many of the processing modules would be simpler than for a square-system PE. For example, only seven multipliers would be required as opposed to nine in each multiplier array module.
IV. BINARY IMAGE PROCESSING
With hexagonal binary image-processing operator design, the simple six-way connectivity definition is exploited, and usually an equivalent hexagonal operator will be smaller and more easily computed than its square grid counterpart. Many hexagonal processing algorithms for binary image processing have been researched. As discussed in Section II(C) there will be fewer samples (pixels) covering a given area, but if the hexagonal operators are simpler, or the processes are recursive, greater savings may be possible. The basic binary image-processing operations are described in textbooks (Davies, 1990). Some processes for the hexagonal grid are reviewed in what follows.
A. Connectivity
In determining if a group of pixels is connected together to form an object, a definition of connectivity must first be stated. On a hexagonal grid, all neighboring sampling points, with associated pixels touching a central pixel, are equidistant from the central sampling point. If the pixel shape is hexagonal, then all the nearest neighbors touch the central pixel along equal length sides. This scheme is known as 6-connectedness. Hexagonal grids with rectangular pixels, as shown in Fig. 2(c) can also be defined as 6-connected.
On a rectangular grid, there are four nearest-neighboring pixels, but four additional pixels touch the central pixel at each corner. There are two definitions of connectivity: 4-connectedness, where only edge-adjacent pixels are neighbors; and 8-connectedness, where corner-adjacent pixels are also considered as neighbors.
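The two definitions can give different answers for the same pattern. The Python sketch below counts connected regions in a small diamond-shaped test image under each definition; the function name, the flood-fill approach, and the test image are illustrative choices made here, not taken from the text.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]              # edge-adjacent
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]       # plus corner-adjacent


def count_components(image, value, neighbours):
    """Count connected regions of pixels equal to `value` by flood fill."""
    rows, cols = len(image), len(image[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == value
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# A one-pixel-wide diamond: a closed curve only if corners connect.
ring = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

print(count_components(ring, 1, N4), count_components(ring, 0, N4))  # 4 2
print(count_components(ring, 1, N8), count_components(ring, 0, N8))  # 1 1
```

With 4-connectedness the object falls into four pieces yet the background is still split in two; with 8-connectedness the object is one closed curve yet the background is a single region.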
A problem arises because the connectivity of the background pixels can also be considered. Now, if the 4-connected definition is used on both foreground and background, some pixels will not appear in either set. A simple closed curve should be able to separate the background and object into distinct connected regions, but this is not the case. Again, if the 8-connected definition is used, some pixels will appear in both sets. One solution is to use 4-connectedness for the object, and 8-connectedness for the background. Another is to define a 6-connectedness that involves only two corners. The hexagonal system's unambiguous definition is more convenient. Connectivity is an important consideration in many image processes, especially where groups of pixels are being considered for membership of a particular feature, or the edges of a feature are being traced out and coded. The use of connectivity in shrinking and edge-following algorithms has been explored (Rosenfeld, 1970). Consideration has been given to the more general topological properties of digitized spaces, and in particular to connectivity and the order of connectivity (Mylopoulis and Pavlidis, 1971).
B. Measurement of Distance
Useful measurements include the distance between points, the dimensions of a part, the area of an object, and its perimeter, etc. (Rosenfeld and Kak, 1982). Connectivity evaluation, counting and edge following also are important operations.
C. Distance Functions
Distance functions are used in shape analysis. The distance of each pixel in an object from the boundary of the object is measured and overlaid on the binary image of the object, and this information is then analyzed (Rosenfeld and Pfaltz, 1968). Metrics using a 4-, 6-, or 8-way connectivity have been compared. The 8-way distance involves a $\sqrt{2}$ step for diagonals. The hexagonal 6-way function was found to give a better approximation to Euclidean distance than the other functions (Luczak and Rosenfeld, 1976).
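A rough numerical illustration of this comparison is sketched below in Python. It assumes simple unit-step metrics (each move to a neighbor counts as one step) and, for the hexagonal grid, axial coordinates (q, r) with unit spacing; neither assumption is taken from the cited papers. The `spread` measure, the ratio between the largest and smallest values of step count divided by Euclidean distance, is likewise an illustrative choice.

```python
import math


def d4(dx, dy):
    """4-way (city-block) step count on a square grid."""
    return abs(dx) + abs(dy)


def d8(dx, dy):
    """8-way (chessboard) step count on a square grid."""
    return max(abs(dx), abs(dy))


def d6(q, r):
    """6-way step count on a hexagonal grid in axial coordinates."""
    return (abs(q) + abs(r) + abs(q + r)) // 2


def square_euclid(dx, dy):
    return math.hypot(dx, dy)


def hex_euclid(q, r):
    # Axial (q, r) maps to Cartesian (q + r/2, r * sqrt(3)/2).
    return math.hypot(q + r / 2, r * math.sqrt(3) / 2)


def spread(metric, euclid, radius=8):
    """Largest over smallest ratio metric/Euclidean within a window."""
    ratios = [
        metric(a, b) / euclid(a, b)
        for a in range(-radius, radius + 1)
        for b in range(-radius, radius + 1)
        if (a, b) != (0, 0)
    ]
    return max(ratios) / min(ratios)
```

Under these assumptions `spread(d4, square_euclid)` and `spread(d8, square_euclid)` both come out near 1.41, whereas `spread(d6, hex_euclid)` is about 1.15, in line with the reported ranking.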
D. Morphological Operators
Mathematical morphology is an approach to computer vision based on searching for the shape of an object or the texture of a surface. Morphological operators are applied repeatedly to the image to remove irrelevant information and to enhance the essential shape of the objects within the scene. These methods are based on set theory. Operator design and application have been considered by various researchers (Matheron, 1975; Serra, 1982, 1986, 1988; Haralick et al., 1987). Hexagonal sampling grids and morphological image processing have been strongly linked since they were first introduced. Hexagonal parallel pattern transformations involving morphological operations have also been reported (Golay, 1969; Preston, 1971). The main reason researchers chose hexagonal sampling was to avoid the ambiguous connectivity definitions between pixels on a square array. One of the most active researchers in this area, Serra (1982, 1986, 1988), makes extensive use of the hexagonal grid, preferring it to the square because of the connectivity definition, its large possible rotation group on the grid, and the simple processing algorithms that result.
E. Line Thinning and the Skeleton of an Object
The skeleton or medial axis of a shape can be used as a basis for object recognition. In particular, it is often used in optical character recognition systems. There are several steps involved in the process:
* Thresholding: The gray-level image is converted to a binary image in such a way as to maintain the shape.
* Thinning: The shape, which may have a width of several pixels, is analyzed, or eroded, to find a one-pixel thick line that fits centrally within it.
* Line tracking: The thinned lines are chain coded.
* Line segmentation: The chain-coded information is converted to vector form. This point is the limit of the skeleton-forming process. Subsequent processes analyze the vectors to identify junctions and then the object.
Variations on this procedure exist, and a large number of algorithms have been developed for the processes at each step. Surveys of these algorithms have also been reported (Smith, 1987; Lam et al., 1992). There are two main classes of thinning algorithm, namely, iterative and methodical. In iterative methods a local area on the edge of the object is examined, and the central
pixel of the area is removed if certain rules designed to preserve the connectivity of the final skeleton are obeyed. The process is repeated on the image in a way that removes pixels equally from both sides of the object until no further pixels are changed. The resulting skeleton is connected, its pixels are a subset of the original object’s pixels, and it can be sufficient for many recognition tasks. Jang and Chin (1990) have used mathematical morphology to formally define thinning, and produced a set of operators that are proved to produce single-pixel thick connected skeletons. However, this resulting skeleton may lie only approximately in the correct place. The methodical algorithms aim to ensure a correctly positioned skeleton, but the iterative methods can produce sufficiently accurate skeletons for many applications; as they compute efficiently and can be easily realized using local operators, they have been applied to many problems. Deutsch (1972) reports similar thinning algorithms developed for use with rectangular, hexagonal, and triangular arrays, and has compared their operation. The triangular algorithm produced a skeleton with the least number of points, but was sensitive to noise and image irregularities. The hexagonal algorithm was the most computationally efficient, produced a skeleton with fewer points than the rectangular algorithm, and was easily chain coded. Deutsch concluded that of the three algorithms, the hexagonal was optimal. Other hexagonal skeletonization algorithms have been reported (Meyer, 1988; Staunton, 1996a).
F. Comparison Between Hexagonal and Rectangular Skeletonization Programs
A comparison (Staunton, 1996a) has been made between an algorithm designed for the rectangular grid (Jang and Chin, 1990) and a similar one designed for the hexagonal grid. There are many rectangular grid algorithms for the iterative removal of border pixels. Jang and Chin’s (1990) was used for this comparison as it was designed using a mathematical framework based on morphological set transforms. Using these it can be proved that the final skeleton will conform to most of the following properties:
1. It will contain a number of single-pixel width lines.
2. Each skeletal element will be connected to at least one other. The skeleton will contain no gaps.
3. Skeletal legs will be preserved.
4. It will be accurately positioned.
5. Noise-induced pixels will be ignored, that is, limbs will not be formed towards single-pixel edge protrusions.
FIGURE 35. Rectangular scheme thinning templates D = {D^1, D^2, D^3, D^4} and E = {E^1, E^2, E^3, E^4}. (Reprinted from Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Pattern Recognition, 29(7): 1131-1146; Copyright (1996), with permission from Elsevier Science.)
It has been possible (Staunton, 1996a) to use similar mathematics to design a hexagonal algorithm that was close in operation to Jang and Chin's (1990). In each case the analysis led to the design of a set of thinning templates as shown in Fig. 35 for the rectangular algorithm and Fig. 36 for the hexagonal algorithm. Further analysis proved that the templates can be applied in parallel pairs to the image for the hexagonal case, and parallel triplets for the rectangular case. If the skeleton is to be positioned correctly the pairs must be applied in a particular order, as shown in Fig. 37 for both the rectangular and hexagonal algorithms. The templates can alternatively be applied sequentially, and for the hexagonal case this produces a better preservation of skeletal legs and a slightly more accurate positioning of the skeleton. However, the parallel application of the templates resulted in a converged skeleton in approximately half the time required for the sequential application of the templates.
FIGURE 36. Hexagonal scheme thinning templates F = {F^1, F^2, F^3, F^4, F^5, F^6}. (Reprinted from Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Pattern Recognition, 29(7): 1131-1146; Copyright (1996), with permission from Elsevier Science.)
1. Skeletal Quality
Figure 38 shows some examples of the skeletons of four geometric shapes and a sample of text digitized on a rectangular grid. Figure 39 shows the same examples digitized on a hexagonal grid. These images can be compared and evaluated with respect to the five good skeletal qualities listed earlier in this section. Properties 1 and 2 hold for each algorithm. Property 3 concerns the preservation of skeletal legs. These are not preserved to 90° corners by the rectangular algorithm, and there is some shrinkage to the corners introduced by the hexagonal algorithm. This shrinkage was less with the sequential version of the hexagonal algorithm (Staunton, 1996a). Property 4 concerns the accurate positioning of the skeleton. Both algorithms have made good attempts at positioning the major axes of each shape, but few minor axes have been preserved by the rectangular algorithm. Considering the skeleton of Fig. 38a, the single component could have been produced by any one of many rectangular shapes, but the information remaining can only indicate the position of the shape and the orientation of its major axis. The hexagonal skeletons contain more limbs and more information. The extra limbs within the rectangular shapes digitized on the
FIGURE 37. Template application order for thinning algorithms: (a) rectangular; (b) hexagonal. [Flowchart residue. (a) Pass 1: D^1, D^2, E^1; Pass 2: D^2, D^3, E^2; Pass 3: D^3, D^4, E^3; Pass 4: D^4, D^1, E^4; repeated until converged. (b) Pass 3: F^6, F^1; Pass 4: F^3, F^4; Pass 5: F^5, F^6; Pass 6: F^2, F^3; repeated until converged.]
hexagonal grid give an indication of their original size. Other rectangular algorithms have been researched that retain skeletal branches (Guo and Hall, 1989). The triangular shapes of Figs. 38b and 39b have both been processed to produce good skeletons. The rectangular algorithm has shifted the center of gravity, or branch junction, towards the base of the triangle. The hexagonal algorithm has produced an unusual step pattern in the lower left limb, and some limb shortening. Both algorithms have produced good skeletons of the text. The long thin strokes and the acute corners have not resulted in missing legs using the rectangular algorithm. The text images are from real scanned text, whereas the geometric shapes were computer generated. Property 5 states that
FIGURE 38. Skeletons produced by the new rectangular algorithm.
noise-induced pixels should be ignored. The text images contain single pixel protrusions in the edges of each letter that can be defined as noise. The hexagonal algorithm was insensitive to these noise pixels, some of which can be observed in Fig. 39e on the cross stroke of the “A” and the bottom of the
FIGURE 39. Skeletons produced by the hexagonal algorithm.
“B.” The rectangular algorithm was sensitive to the noise and limbs were formed to these pixels. A 2-pass morphological filter was designed to remove the noise that resulted in the acceptable skeletons seen in Fig. 38e, but introduced a processing overhead.
2. Program Efficiency
Both the rectangular and hexagonal algorithms can be computed on parallel machines, or alternatively, for a SISD machine the templates to be applied in parallel can be logically combined and then computed. The hexagonal templates compute more quickly as they have only seven elements as opposed to nine for the rectangular ones. The removal of a complete layer of pixels from the outside of an object can be referred to as an iteration of the algorithm. For each application of the rectangular 4-pass scheme illustrated in Fig. 37, the “corner” templates are applied twice, and the “edge” templates once, whereas each application of the 6-pass hexagonal scheme applies each template twice. Table I compares the number of applications of the algorithm required to produce a converged skeleton from the images presented in Figs. 38 and 39. In each case the hexagonal algorithm is at least as efficient as the rectangular algorithm. For real rectangularly sampled images, edge-noise removal requires the equivalent of an additional two passes of the algorithm. Counting passes, the average hexagonal computation requires only 80% of those for the rectangular computation. If the reduced time to compute the smaller templates (Figs. 35 and 36) is considered, the hexagonal computation will require only 63% of the time of the rectangular computation to calculate the average skeleton. The test images were obtained by subsampling a double-resolution rectangularly sampled image in such a way that each shape contained the same number of pixels whether sampled on the rectangular or hexagonal grid. If a regular hexagonal grid had been used, then 13.4% fewer pixels would have been required for each shape, and the time to calculate a skeleton would have been reduced to 55% of that to calculate it on the rectangular grid.
TABLE I
A COMPARISON OF THE NUMBER OF PASSES REQUIRED TO FORM SKELETONS BY THE RECTANGULAR AND HEXAGONAL ALGORITHMS

Image                   Rectangular algorithm passes   Hexagonal algorithm passes
Vertical rectangle      20                             20
Horizontal rectangle    23                             16
Triangle                27                             24
Corner                  12                             12
Text                    19                             15
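The chain of percentages quoted in the text can be approximated from Table I. The Python sketch below is one plausible reading, not the author's stated method: it assumes the two extra noise-removal passes are added to the mean rectangular count, that time per pass scales with template size (seven versus nine elements), and that the regular-grid saving is the factor $\sqrt{3}/2$. The results land within a few percentage points of the quoted 80%, 63%, and 55%.

```python
import math

# Pass counts transcribed from Table I, in row order.
rect_passes = [20, 23, 27, 12, 19]
hex_passes = [20, 16, 24, 12, 15]

# Assumption: edge-noise removal costs two extra rectangular passes.
NOISE_PASSES = 2

rect_mean = sum(rect_passes) / len(rect_passes) + NOISE_PASSES
hex_mean = sum(hex_passes) / len(hex_passes)

# Ratio of pass counts (hexagonal relative to rectangular).
pass_ratio = hex_mean / rect_mean

# Assumption: time per pass is proportional to template element count.
time_ratio = pass_ratio * 7 / 9

# A regular hexagonal grid needs sqrt(3)/2 as many pixels (13.4% fewer).
regular_grid_ratio = time_ratio * math.sqrt(3) / 2

print(f"{pass_ratio:.2f} {time_ratio:.2f} {regular_grid_ratio:.2f}")
```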
In conclusion, both schemes produced good quality skeletons, although there were differences in skeletal attributes. The hexagonal scheme could compute the average skeleton in 55% of the time required to compute it with the rectangular scheme.
V. MONOCHROME IMAGE PROCESSING
This section contains a review of gray-scale operators that have been designed to work on hexagonally sampled images. As they are grid-dependent, they can be defined as low-level processes (Section I). Some operations can be computed more efficiently in the Fourier domain and thus hexagonal transforms have been developed; others are applied in the spatial domain. Where possible, comparisons have been made between these and similar operators designed for the rectangular grid.
A. The Hexagonal Fourier Transform
A hexagonal Fourier transform and hexagonal fast Fourier transform (HFFT) have been developed (Mersereau, 1979; Mersereau and Speake, 1981; Dudgeon and Mersereau, 1984; Guessoum and Mersereau, 1986). It was found that the HFFT required 25% less storage of complex variables than the rectangular fast Fourier transform (RFFT), and that it computed more efficiently. The algorithm is based on the Rivard procedure (Rivard, 1977), rather than on the method of decomposing the 2-D kernel into 1-D FFTs. Decomposition to 1-D FFTs is not possible in the hexagonal case. This alternative procedure is a direct extension of the 1-D FFT algorithm to the 2-D case, which can increase the computational efficiency of the RFFT by 25%. Mersereau has shown that his HFFT increased computational efficiency by an additional 25% in comparison to the Rivard RFFT.
B. Geometric Transformations
Geometric transformations have been researched using a three-integer coordinate frame (Her, 1995) as described in Section III.A. The coordinate frame is easy to use and the symmetry of the grid has enabled the design of simple, efficient operators. Operators have been designed for: rounding, which finds the nearest integer grid point to a point calculated with real coordinates; translations and reflections; scalings and shearings; and rotations.
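To give a flavor of such operators, the Python sketch below implements rounding, translation, and 60° rotation in a three-coordinate frame. It assumes the common convention that the three coordinates sum to zero; the function names and the particular rotation direction are illustrative and are not taken from Her (1995).

```python
def hex_round(x, y, z):
    """Round real coordinates (x + y + z == 0) to the nearest grid point."""
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the coordinate with the largest rounding error so that
    # the three integers again sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, ry, rz


def translate(point, offset):
    """Translate a grid point by an integer offset in the same frame."""
    return tuple(p + o for p, o in zip(point, offset))


def rotate60(point):
    """Rotate a grid point by 60 degrees about the origin."""
    x, y, z = point
    return -z, -x, -y


p = (2, -1, -1)
for _ in range(6):          # six 60-degree rotations return to the start
    p = rotate60(p)
print(p)                    # (2, -1, -1)
```

Rotation by a multiple of 60° is a permutation with sign changes, so it maps grid points to grid points without any rounding.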
C. Point Source Location
This task, also known as star tracking, involves the tracking of a moving point light source across the array. The image of the source is a blurred spot. The centroid of the spot is calculated to within subpixel accuracy to give the position of the source. Accuracy is improved if the sensor array has a high fill factor, that is, the sensor elements tile the image window as completely as possible. For a 100% fill factor, a hexagonal array of hexagonally shaped sensors has been shown (Cox, 1987) to out-perform a square array of square-shaped sensors. Detection error and sensitivity to noise are reduced, and computational load and data storage are reduced by 24%. For lower fill factors the advantages of a hexagonal array are less pronounced (Cox, 1989).
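A minimal sketch of the centroid calculation on a hexagonal array is given below in Python. The storage layout (zero-based rows, with odd-indexed rows shifted right by half a spacing) and the function names are assumptions made for illustration; the estimators studied by Cox are not reproduced here.

```python
import math


def hex_position(col, row, spacing=1.0):
    """Cartesian position of a sample stored at (col, row).

    Assumes zero-based rows with odd-indexed rows shifted right by half
    a spacing, and rows separated by spacing * sqrt(3) / 2.
    """
    x = (col + 0.5 * (row % 2)) * spacing
    y = row * spacing * math.sqrt(3) / 2
    return x, y


def centroid(samples, spacing=1.0):
    """Intensity-weighted centroid of (col, row, intensity) samples."""
    samples = list(samples)
    total = sum(v for _, _, v in samples)
    cx = sum(hex_position(c, r, spacing)[0] * v for c, r, v in samples) / total
    cy = sum(hex_position(c, r, spacing)[1] * v for c, r, v in samples) / total
    return cx, cy


# A symmetric spot: bright centre at (1, 1) and its six neighbours.
spot = [(1, 1, 4.0), (0, 1, 1.0), (2, 1, 1.0),
        (1, 0, 1.0), (2, 0, 1.0), (1, 2, 1.0), (2, 2, 1.0)]
cx, cy = centroid(spot)
```

For this symmetric spot the centroid coincides with the position of the central sample; an asymmetric spot would give a subpixel offset.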
D. Image-Processing Filters

1. Linear Filters

A series of general-purpose hexagonal FIR and IIR filters has been developed (Mersereau, 1979; Mersereau and Speake, 1983) and compared to rectangular filters with similar frequency responses. The hexagonal filters were found to be superior in terms of computational efficiency, and as they could be designed with 12-fold symmetry, they had a more circular frequency response. These filters concern 2-D signal processing in general, as opposed to only image processing. Savings of up to 58% in memory and similar gains in computational efficiency were reported for hexagonal filters compared to their rectangular counterparts.

Considering filters for image processing in more detail, the regular hexagonal structure leads to easy spatial plane local operator design. The local area can be defined to include the central pixel and any number of concentric “shells” of pixels at increasing distances from the center. All the members of a particular shell can be assigned equal weighting factors in many local operator designs. For example, consider a 4-shell Gaussian filter operating on a hexagonal grid, where four weighting factors are initially calculated as shown in Fig. 40, and the final algorithm will be of the form of Eq. (14):

$$P_h = k\,i_{1,1} + l\sum_{p=1}^{6} i_{2,p} + m\sum_{q=1}^{6} i_{3,q} + n\sum_{r=1}^{6} i_{4,r}, \qquad (14)$$

where k, l, m, and n are filter weights associated with the four shells, and i denotes image points. Four multiplications and 19 additions are required for the computation of each output pixel.
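The shell structure of the hexagonal operator can be made concrete with a short Python sketch. It is an illustration under assumptions that are not from the original text: pixels are addressed by two axial indices (q, r), so that the squared distance from the center is q² + qr + r² in units of the pixel pitch, the image is held in a dictionary, and the names `hex_shells`, `gaussian_weights`, and `filter_pixel` are hypothetical.

```python
import math


def hex_shells():
    """Group axial offsets (dq, dr) into the center point and three outer shells."""
    by_d2 = {0: [], 1: [], 3: [], 4: []}  # squared distances of the four shells
    for dq in range(-2, 3):
        for dr in range(-2, 3):
            d2 = dq * dq + dq * dr + dr * dr
            if d2 in by_d2:
                by_d2[d2].append((dq, dr))
    return [by_d2[0], by_d2[1], by_d2[3], by_d2[4]]


SHELLS = hex_shells()  # shell sizes: 1, 6, 6, 6 (19 points in total)


def gaussian_weights(sigma):
    """One weight per shell (k, l, m, n), normalized so the mask sums to 1."""
    raw = [math.exp(-d2 / (2.0 * sigma * sigma)) for d2 in (0, 1, 3, 4)]
    norm = sum(w * len(shell) for w, shell in zip(raw, SHELLS))
    return [w / norm for w in raw]


def filter_pixel(image, q, r, weights):
    """Sum each shell first, then apply one multiplication per shell."""
    return sum(
        w * sum(image[(q + dq, r + dr)] for dq, dr in shell)
        for w, shell in zip(weights, SHELLS)
    )


flat = {(q, r): 10.0 for q in range(-3, 4) for r in range(-3, 4)}
out = filter_pixel(flat, 0, 0, gaussian_weights(1.0))
```

Summing within a shell before weighting is what keeps the multiplication count at four, one per shell, regardless of how many points each shell holds.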
FIGURE 40. Four-shell hexagonal and 6-shell square local operators.
In comparison, a similar filter on a square grid (5 x 5) requires six different weighting factors and a correspondingly more complicated algorithm of the form of Eq. (15):

$$P_s = a\,i_{1,1} + b\sum_{p=1}^{4} i_{2,p} + c\sum_{p=1}^{4} i_{3,p} + d\sum_{p=1}^{4} i_{4,p} + e\sum_{p=1}^{8} i_{5,p} + f\sum_{p=1}^{4} i_{6,p}, \qquad (15)$$

where a, b, c, d, e, and f are filter weights associated with the six shells (the shell weighted by e contains eight points, and each of the other outer shells contains four). Six multiplications and 25 additions are required for the computation of each output pixel. Both filters are convolved with a similar image area, but, in general, 13.4% fewer points will be required for the hexagonal filter than for the square. However, in this case, the square-system operator kernel is separable, giving an alternative computation algorithm of the form of Eq. (16):
Now, 6 multiplications and 8 additions are required for the computation of each output pixel. The hexagonal operator requires only 4 multiplications, compared with 6 for both rectangular algorithms, but the number of additions is larger than that for the separable kernel rectangular method. Computational efficiency will be determined by the architecture of the computer arithmetic and logic unit, and depend upon whether the filter coefficients are integer or real numbers.
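The operation counts for the separable case can be checked with a small Python sketch. This is an illustration of separable filtering in general and does not reproduce the chapter's own equation: it assumes a symmetric 1-D weight vector `g` of length 5 (a hypothetical name) whose outer product forms the 5 x 5 kernel.

```python
def full_5x5(patch, g):
    """Direct 2-D evaluation with the kernel g[i] * g[j] over a 5 x 5 patch."""
    return sum(g[i] * g[j] * patch[i][j] for i in range(5) for j in range(5))


def pass_1d(v, g):
    """One symmetric 1-D pass: 3 multiplications and 4 additions."""
    return g[2] * v[2] + g[1] * (v[1] + v[3]) + g[0] * (v[0] + v[4])


def separable_5x5(patch, g):
    """Row pass followed by column pass.

    When a whole image is filtered, each row-pass result is shared by the
    neighboring output pixels, so the cost per output pixel is one row pass
    plus one column pass: 6 multiplications and 8 additions.
    """
    row_results = [pass_1d(row, g) for row in patch]
    return pass_1d(row_results, g)


g = [1, 4, 6, 4, 1]  # assumed binomial approximation to a Gaussian
patch = [[5 * i + j for j in range(5)] for i in range(5)]
direct = full_5x5(patch, g)
separable = separable_5x5(patch, g)
```

Both routes give the same result for the same weights; only the amount of arithmetic per output pixel differs.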
2. Nonlinear Filters

This class of filters includes designs such as the median filter and gray-scale morphologic filters (Sternberg, 1986; Haralick et al., 1987). Hexagonal-grid median filters should be more computationally efficient than their square-grid counterparts because, for the same area of support, 13.4% fewer values exist. This will significantly simplify the sorting procedure.
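The 13.4% figure can be recovered with a brief worked step. The symbols here are introduced for illustration only: W is the radius of an assumed circular band limit, and N_s and N_h are the minimum numbers of samples per unit area on the square and hexagonal grids.

$$N_s = 4W^2, \qquad N_h = 2\sqrt{3}\,W^2, \qquad \frac{N_h}{N_s} = \frac{\sqrt{3}}{2} \approx 0.866, \qquad 1 - \frac{\sqrt{3}}{2} \approx 0.134.$$

The same ratio applies to the number of values falling within a fixed area of support.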
E. Edge Detectors

Edges correspond to intensity discontinuities in the image. These discontinuities may correspond to the edges of an object, but unfortunately sometimes they do not. For example, the edge of a shadow is likely to be detected. Many algorithms have been researched, but here some of the simplest are compared. Differential operators model local edges by fitting the best plane over a convenient size of neighborhood. In square arrays two orthogonal operators are applied to a pixel, and from the responses of these, the magnitude m of the gradient of the plane and the edge angle, α, can be calculated:

$$m = (t_h^2 + t_v^2)^{1/2}, \qquad (17)$$

$$\alpha = \arctan(t_v / t_h), \qquad (18)$$

where $t_v$ and $t_h$ are the responses of operators designed to respond maximally to vertical and horizontal edges. Fig. 41 shows Sobel operators designed to be convolved with a 3 x 3 area of the image. For edge detection, the response magnitude is compared with a threshold to determine if a significant edge exists. The Sobel operator has a computational processing time advantage over some other operators, as only integer arithmetic is required and the local area in which it operates is relatively small. It has been shown by some researchers to be the optimum 3 x 3 operator (Davies, 1984; Staunton, 1997b).
FIGURE 41. Sobel differential operators with 3 x 3 area.
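Eqs. (17) and (18) can be sketched in Python for a single 3 x 3 neighborhood. The sketch rests on assumptions that are not in the original text: the sign convention of the two masks, the use of `math.atan2` in place of a plain arctangent so that a zero horizontal response does not cause a division by zero, and the hypothetical name `sobel_response`.

```python
import math

# Standard Sobel masks; the sign convention is an assumption.
SOBEL_H = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]   # responds to horizontal edges
SOBEL_V = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # responds to vertical edges


def sobel_response(patch):
    """Return (m, alpha) for a 3 x 3 patch, following Eqs. (17) and (18)."""
    t_h = sum(SOBEL_H[i][j] * patch[i][j] for i in range(3) for j in range(3))
    t_v = sum(SOBEL_V[i][j] * patch[i][j] for i in range(3) for j in range(3))
    m = math.hypot(t_h, t_v)        # (t_h^2 + t_v^2)^(1/2)
    alpha = math.atan2(t_v, t_h)    # arctan(t_v / t_h), safe when t_h == 0
    return m, alpha


# A vertical step edge: bright on the left, dark on the right.
step = [[10, 10, 0], [10, 10, 0], [10, 10, 0]]
m, alpha = sobel_response(step)
threshold = 20                      # hypothetical, set manually in practice
is_edge = m > threshold
```

Up to the final magnitude and angle calculation, only integer arithmetic is involved, which is the processing-time advantage noted for this operator.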
F. Hexagonal Edge-Detection Operators

Hexagonal operators have been researched (Staunton, 1989). The regular hexagonal data structure leads to easy local operator design. The central element of the local area is surrounded by shells of elements. Figure 42 shows a set of edge-detection operators exploiting only the inner shell of neighbors, and these are of a comparable order to the 3 x 3 operators in Fig. 41. These hexagonal operators will respond maximally to edges at 60° angular intervals from the horizontal. The weighting functions of the shell elements are chosen as 1 or -1 to reflect the regular structure of the grid of sampling points. Davies’ design principle (Davies, 1984) indicates “1” to be nearly optimal. Again, only integer arithmetic is required for computation.

If these masks are used as differential operators, the slope magnitude m becomes relatively complicated compared with Eq. (17). The equation of m is derived as follows. The output of each of the three hexagonal operators, as shown in Fig. 42, can be represented as a vector. An edge can be modeled by a plane, and the three vectors, $\mathbf{t}_1$, $\mathbf{t}_2$, $\mathbf{t}_3$, lie within this plane. Assuming orthogonal x and y axes, $\mathbf{t}_3$ is aligned with the y axis, $\mathbf{t}_1$ is at 60° to $\mathbf{t}_3$, and $\mathbf{t}_2$ at 60° to $\mathbf{t}_1$. The resultant vector $\mathbf{m}$ can be found:

$$\mathbf{m} = \mathbf{t}_1 + \mathbf{t}_2 + \mathbf{t}_3. \qquad (19)$$

Examination of Fig. 42 indicates the simple relationship $t_3 = t_1 - t_2$, giving

$$\mathbf{m} = \frac{\sqrt{3}}{2}(t_1 + t_2)\,\mathbf{i} + \frac{3}{2}(t_1 - t_2)\,\mathbf{j}.$$

The slope magnitude, m, is

$$m = [3(t_1^2 + t_2^2 - t_1 t_2)]^{1/2}.$$

FIGURE 42. Hexagonal differential edge-detection operators.
The angle that $\mathbf{m}$ makes with the x axis is known as the edge angle α:

$$\alpha = \arctan\left(\frac{\sqrt{3}\,(t_1 - t_2)}{t_1 + t_2}\right).$$
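The hexagonal slope magnitude and edge angle can be checked numerically with a short Python sketch. It rests on assumptions stated here rather than taken from the original: the component form m = (√3/2)(t1 + t2) i + (3/2)(t1 - t2) j, operator directions at +30°, -30°, and 90° to the x axis, responses modeled as unit-scale projections of the plane gradient, and the hypothetical names `hex_edge` and `responses_for_gradient`.

```python
import math


def hex_edge(t1, t2):
    """Slope magnitude and edge angle from two hexagonal operator responses."""
    m_x = math.sqrt(3) / 2 * (t1 + t2)
    m_y = 1.5 * (t1 - t2)
    m = math.sqrt(3 * (t1 * t1 + t2 * t2 - t1 * t2))
    alpha = math.atan2(m_y, m_x)  # arctan(sqrt(3)(t1 - t2)/(t1 + t2)), safe at t1 == -t2
    return m, alpha


def responses_for_gradient(gx, gy):
    """Model each response as the projection of a plane gradient (gx, gy)
    onto an assumed operator direction at +30, -30 and 90 degrees."""
    directions = [math.radians(30), math.radians(-30), math.radians(90)]
    return [gx * math.cos(a) + gy * math.sin(a) for a in directions]


t1, t2, t3 = responses_for_gradient(1.0, 0.0)
m, alpha = hex_edge(t1, t2)
m2, alpha2 = hex_edge(2.0, 1.0)
```

Under these assumptions the third response is redundant (t3 = t1 - t2), and the resultant is the plane gradient scaled by 3/2, so the recovered angle is the gradient direction.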
A comparison between the computational efficiency and accuracy of local edge-detection operators in the two systems has been made (Staunton, 1989). The hexagonal system detector was found to compute more efficiently than the square-system Sobel detector, as the mask weights are fewer in number and are all unity. On a SISD computer the hexagonal program is computed in 55% of the time required by the Sobel program. The accuracy of the two detectors was found to be equivalent, with the hexagonal being more accurate with one type of sensor model, and the square more accurate with a second type.

G. The Visual Appearance of Edges and Features
The visual appearance of monochrome images is illustrated here using hexagonally and rectangularly sampled images of a sand core used for metal casting. The core contained three small surface scratch defects that can be seen in Figs. 43 and 44. The illumination employed divided the image of each defect into a bright and a dark (shadow) segment. There is one large circular defect, a long thin defect, and a small defect with dimensions comparable with the pixel size. The core was 14 cm high. The large circular defect has a diameter of 6 mm, the long thin scratch has dimensions 30 mm by 2 mm, and the small circular defect has a diameter of 1 mm.

FIGURE 43. Rectangular sampled sand core image, 64 x 64 resolution.

FIGURE 44. Hexagonal sampled sand core image, 64 x 64 resolution.

On comparing the square and hexagonal images in Figs. 43 and 44, the defects can generally be seen more clearly in the hexagonal image. The offsetting of pixels on alternate lines enables the eye to trace their outlines more readily at this resolution. The large circular defect appears more circular, and the light and dark segments are more easily discerned. The long thin defect is more easily discernible as a connected component. The object edges, which in these examples are near vertical, are easier to localize in the hexagonal image. Long repeating brightness step sequences are observed in the rectangular image, whereas a small castellated effect is observed in the hexagonal image.

In an attempt to segment the defects from the remainder of the image, the two images were then edge detected using the optimum Sobel and corresponding hexagonal operators introduced in Section V(E). The threshold level was set manually so that the resulting edge images contained, where possible, connected edges around the defects, and so that the number of false detections was minimized. Fig. 45 shows the resulting square edge-detected image, and Fig. 46 the resulting hexagonal image. The large circular defect appears to be square in overall shape in the rectangular image, and there is a small disconnection in the outline. In the hexagonal image, it appears more circular, and the structure, such as the central dividing line between the light and dark segments, is more easily discerned. There is also a small gap in the outline. The long thin defect has a break in its outline in the square image, whereas the outline is complete in the hexagonal. The equal width of this defect along its length is more discernible in the hexagonal image. The presence of the small defect is indicated by a small group of edge pixels in each image. There are also more detected false edge points in the square image. These are seen as unconnected black pixels in various parts of the image.

FIGURE 45. Rectangular sampled sand core image edge detected, 64 x 64 resolution.

FIGURE 46. Hexagonal sampled sand core image edge detected, 64 x 64 resolution.

Figs. 47 and 48 show the square and hexagonal edge-detected images after thinning. The same points as in the foregoing concerning the defects are still evident. The near-vertical object edges appear as gradually increasing steps in the rectangular image, whereas in the hexagonal image a castellated effect is visible.

FIGURE 47. Rectangular sampled sand core image, thinned, 64 x 64 resolution.
FIGURE 48. Hexagonal sampled sand core image, thinned, 64 x 64 resolution.
1. Human Interpretation

Interpretation of the images depends on the individual observer and on the resolution of the image being viewed. At the low 64 x 64 resolution of the aforementioned images, features are easily discerned in the hexagonal images, and their true shapes, whether circular or rectangular, can be more easily estimated. At the higher resolutions of 256 x 256 and 512 x 512, the aliasing effects at the object edges are less troubling to the eye and may be indiscernible at even higher resolutions. With the offsetting of pixels on alternate lines in the monochrome hexagonal images, the human eye may be able to estimate the boundaries between features more accurately, as the pixel boundaries do not align to form long vertical features as in the square system. However, with the binary line images the pixel offsetting may appear troublesome to the human eye. This has been reported by other researchers (Preston et al., 1979). Machine interpretation will not depend on the visual appearance of the image, but on the efficiency of the higher-level processes. High-level processing will be easier if a detected edge contains fewer gaps.
VI. CONCLUSIONS

This chapter has reviewed research on the sampling of images on a hexagonal grid, the processing of hexagonally sampled images by single and multiprocessor computers, and the computation of image-processing operations on both binary and monochrome hexagonally sampled images. In the following, conclusions are drawn on each of these areas, and an attempt is made to answer the questions of when research should be conducted using hexagonally sampled images, and when it may be commercially advantageous to implement a hexagonally sampled image-processing system.

The hexagonal packing of sensors together with a hexagonal sensor shape is found in eyes. Evolution has favored the hexagon. Some man-made sensors have hexagonal shapes, and others circular or rectangular shapes. Each of these shapes has been shown to pack together efficiently on a hexagonal grid. A high fill factor, or complete tiling of the area, can lead to a high signal-to-noise ratio; however, for integrated sensor arrays, fill factors below 100% are necessary, as communication circuits are required on the surface of the chip to transfer the image signals to the processor.

Two-dimensional sampling theory was reviewed, with consideration being given to the aliasing of high-frequency components and the necessity to band limit analog signals before digitization to prevent this. If signals are circularly bandlimited, then their high-frequency information content is limited equally for any direction within the image. This is advantageous, as a feature detected when presented at one orientation to the sensor array can, in theory, be equally well detected when presented at any other orientation. If signals have been circularly bandlimited, then the hexagonal grid is more efficient than the square, as 13.4% fewer sampling points will be required to give equal high-frequency information.
This reduction in the number of sampling points for the hexagonally sampled images leads to reductions in image-storage requirements and faster subsequent processing. A circular bandlimit has many advantages, and it was shown to be achievable for two CCD TV camera-frame grabber systems. The first discrete stage in such a system is the CCD sensor array, and if a circular bandlimit is to be achieved, then the lens and the active area of the sensor that integrates the brightness signal focused on it will be the main frequency-limiting components. The modulation transfer function (MTF) of the lens can be modeled most simply by the diffraction limit. More sophisticated models include aberration limits, but these are best evaluated using a specialist CAD system at the time the lens is designed. Simple 1-D sensor models regard the sensor as a spatial window. Transforming this window to the Fourier domain results in a spectrum that is a sinc function of spatial frequency.
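The two simple models mentioned here can be sketched in Python. The sketch is a first-order illustration under assumptions that are not from the original: an aberration-free lens with a circular aperture, a sensor element modeled as a uniform 1-D window, a system MTF taken as the product of the two, and hypothetical names and numerical values throughout.

```python
import math


def lens_mtf(nu, nu_c):
    """Diffraction-limited MTF of a circular aperture with cutoff nu_c."""
    if nu >= nu_c:
        return 0.0
    s = nu / nu_c
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))


def sensor_mtf(nu, width):
    """Magnitude of the sinc spectrum of a uniform window of the given width."""
    x = math.pi * width * nu
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def system_mtf(nu, nu_c, width):
    """Assumed combination: the product of the lens and sensor MTFs."""
    return lens_mtf(nu, nu_c) * sensor_mtf(nu, width)


# Hypothetical values: 550 nm light, f/4 lens, 10 micrometer sensor window.
wavelength_mm = 550e-6
f_number = 4.0
nu_c = 1.0 / (wavelength_mm * f_number)   # cutoff in cycles per mm
width_mm = 0.01                           # first sensor zero at 100 cycles/mm
mtf_at_50 = system_mtf(50.0, nu_c, width_mm)
```

With these values the sensor window, not the lens, sets the first zero of the combined response, which illustrates why the active area of the sensor is one of the main frequency-limiting components.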
The theoretical MTF of the system can be found by combining the lens and sensor MTFs, but this was found to overestimate the cutoff frequency. A knife-edge method for measuring the MTF of these discrete systems was outlined and applied to six TV camera-frame grabber systems. A circular bandlimit was found for two systems, and an elliptical bandlimit with a high vertical cutoff frequency was found for the others. Methods to reduce the vertical frequency response to make the bandlimiting more circular were discussed.

Once a hexagonal image has been acquired, it can be processed using a conventional SISD or a multiprocessor computer. Hexagonal images can be processed by most computer architectures capable of processing square images. With some architectures the structure of the processor interconnections is fixed by the image-sampling grid. When processing hexagonal images, each processor will be connected to six neighbors, and when processing square images, each processor will be connected to four or eight neighbors. For some machines of this type it is possible to set up the connections to enable both types of sampled image to be processed. With architectures based on the sampling pattern, the processing task is divided between processors using spatial criteria. Other divisions are possible and can make it easier to use general-purpose multiprocessors such as shared-memory, hypercube, or pipeline systems. How the task is best divided between processors depends on the application. Communications need to be established between the processors, and within a 2-D plane, general-purpose systems employing six-way (hexagonal) communications have been realized. Within a computer program a square image will map directly into a 2-D array, and 2-integer indexing is possible. For hexagonal images several indexing methods have been proposed, but the 3-integer scheme appears to be the most efficient for general use.
Hexagonal pyramid systems are interesting to research, first because they can be used to model processing within the human visual cortex, and second because the structure enables the efficient processing of low-, medium-, and high-level operations on arrays of different processors at each level in the pyramid. An example of a pipeline processor that was capable of processing both square and hexagonal images was given. Small changes to each processor element were required to enable this dual role. For hexagonal-only processing, less data storage (13.4%), a lower clock rate for real-time operation (13.4%), and fewer multipliers and adders, but twice as many convolution masks, were required compared to square image processing. For recirculating pipelines faster processing was possible with hexagonal images.

The processing of binary hexagonal images was reviewed. It is preferred by some researchers to square processing due to its simple definition of connectivity, its large possible rotation group, and the 13.4% reduction in
the number of sampling points. Advantages have been found for hexagonal images with distance measurement, distance functions, morphologic operators, and skeletonizing programs. Two similar skeletonizing algorithms, one for hexagonal and one for square images were compared. Both were designed according to the same criterion, and had been proved to produce good-quality skeletons. On a single-processor computer, the hexagonal program was found to calculate the skeleton in 55% of the time required by the square program. The processing of monochrome hexagonal images was reviewed. Hexagonal FFT algorithms have been researched (Mersereau and Speake, 1981; Guessoum and Mersereau, 1986), and in a comparison with a similar square program, a hexagonal program was shown to require 25% less storage of complex variables and to exhibit a 25% increase in computational efficiency. Geometric transforms have been researched (Her, 1995) and shown to compute efficiently when a 3-integer image indexing scheme was used. Hexagonal and square convolution filters have been compared (Mersereau, 1979; Mersereau and Speake, 1983). Due to the symmetry of the filter weight masks savings of up to 58% in memory and computations were demonstrated. The convolution mask weights tend to be arranged in equal value shells around a central value. Fewer shells are required to cover a particular area if hexagonal sampling is used. The details of the design of a hexagonal edge detector and its comparison with the square-system Sobel detector have been presented. Again the symmetry of the hexagonal convolution masks that leads to unit weight coefficients resulted in a detector with a similar accuracy to the Sobel detector, but that could be computed in 55% of the time required by the Sobel detector. In Section V, a pair of resampled hexagonal and square-grid images was used for a visual comparison of edges and features. 
The hexagonal sampling enabled defects to be seen more clearly and their size better estimated. It is possible that the eye was able to estimate boundaries more accurately when the pixels were offset in the hexagonal image. After edge detection and line thinning, the hexagonal edges were better connected, but the “zipper” effect caused by the offsetting of pixels in a binary brightness thin vertical line was not as pleasing to the observer as the single-pixel thick lines in the square-edge map. To answer the question of when hexagonally sampled images should be used, the following conclusions can clarify the choice:

* The quality of circularly bandlimited images is similar between hexagonally and square-sampled images. This has been shown theoretically in terms of information content, and in practice by observation.
* Hexagonally sampled images can be processed by most types of computer.
* A circularly bandlimited hexagonal image requires 13.4% less storage than a square image.
* For image processes of a similar quality, a hexagonal process may compute in only 55% of the time required by the square process.
Hexagonal sampling and processing will always be important when modeling processes in human vision. For general research the position is less clear, as vast libraries of software and a large choice of hardware are available to support the square scheme. This support is important if new ideas are to be tested and published quickly. Processing speed is important for real-time applications. At present, if a computer is not fast enough, the researcher can rely on a faster one shortly becoming available. For this group of researchers, switching to hexagonal processing may enable them to stay one jump ahead of the computer technology. The author has found that researching hexagonal processes at the same time as square processes can often lead to a deeper understanding of the problem. Commercially, the higher processing speed and reduced storage requirement of hexagonally sampled images may be attractive. The printing of images and text on a hexagonal grid has already been done (Ulichney, 1987). Other self-contained products such as document scanners could well be produced at a lower cost if hexagonal processing were employed.
REFERENCES

Annaratone, M., Arnold, E., Gross, T., Kung, H. T., Lam, M. S., Menzilcioglu, O., Sarocky, K., and Webb, J. A. (1986). Warp architecture and implementation. Proc. IEEE 13th Int. Symposium on Computer Architecture, 346-356.
Batchelor, B. G., Hill, D. A., and Hodgson, D. C. (1985). Automated Visual Inspection. Bedford, UK: IFS Publications Ltd.
Bell, S. B. M., Holroyd, F. C., and Mason, D. C. (1989). A digital geometry for hexagonal pixels. Image and Vision Computing, 7(3): 194-204.
Black, G. and Linfoot, E. H. (1957). Spherical aberration and the information content of optical images. Proc. Roy. Soc. A, 239: 522-540.
Boudin, J. P., Wang, D., Lecoq, J. P., and Xuan, N. P. (1998). Model for the charged coupled video camera and its application to image reconstruction. Optical Engineering, 37(4): 1268-1274.
Bourbakis, N. G. and Mertoguno, J. S. (1996). Kydon: An autonomous multi-layer image-understanding system: Lower layers. Engineering Applications of Artificial Intelligence, 9(1): 43-52.
Bowman, C. C. and Batchelor, B. G. (1987). Kiwivision a high speed architecture for machine vision. Proc. SPIE, 849: 42-51.
Bowman, C. C. (1988). Getting the most from your pipelined processor. Proc. SPIE, 1004: 202-210.
Burt, P. J., Anderson, C. H., Sinniger, J. O., and van der Wal, G. (1986). A pipelined pyramid machine. In Pyramidal Systems for Computer Vision (V. Cantoni and S. Levialdi, eds.). Berlin: Springer-Verlag, pp. 133-152.
Burton, J., Miller, K., and Park, S. (1991). Fidelity metrics for hexagonally sampled digital imaging systems. J. Imaging Technology, 17(6): 279-283.
Cantoni, V. and Levialdi, S. (1988). Multiprocessor computing for images. Proc. IEEE, 76(8): 959-969.
Cox, J. A. (1987). Point source location using hexagonal detector arrays. Optical Engineering, 26(1): 69-74.
Cox, J. A. (1989). Advantages of hexagonal detectors and variable focus for point source sensors. Optical Engineering, 28(11): 1145-1150.
Crisman, J. D. and Webb, J. A. (1991). The warp machine on navlab. IEEE Trans. PAMI, 13(5): 451-465.
Curcio, C. A., Sloan, K. R., Packer, O., Hendrickson, A. E., and Kalina, R. E. (1987). Distribution of cones in human and monkey retina: Individual variability and radial asymmetry. Science, 236: 579-582.
Datacube Inc. (1989). Maxvideo System. Peabody, MA.
Davies, E. R. (1984). Circularity a new principle underlying the design of accurate edge orientation operators. Image and Vision Computing, 2(3): 134-142.
Davies, E. R. (1990). Machine Vision: Theory, Algorithms, Practicalities. London: Academic Press.
Delbruck, T. (1993). Silicon retina with correlation-based, velocity-tuned pixels. IEEE Trans. Neural Networks, 4(3): 529-541.
Deutsch, E. S. (1972). Thinning algorithms on rectangular hexagonal and triangular arrays. Communications ACM, 15(9): 827-837.
Dolter, J. W., Ramanathan, P., and Shin, K. G. (1991). Performance analysis of virtual cut-through switching in HARTS: A hexagonal mesh multicomputer. IEEE Trans. Computing, 40(6): 669-679.
Downton, A. C., Tregidgo, R. W. S., and Cuhadar, A. (1994). Top-down structured parallelization of embedded image-processing applications. IEE Proc. Vision Image and Signal Processing, 141(6): 431-437.
Downton, A. and Crookes, D. (1998). Parallel architectures for image processing. IEE Electronics Communication Engineering J., 10(3): 139-151.
Dudgeon, D. E. and Mersereau, R. M. (1984). Multidimensional Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall Inc.
Duff, M. J. B. (1985). Real Applications on Clip4. In Integrated Technology for Parallel Image Processing (S. Levialdi, ed.). London: Academic Press, pp. 153-165.
Dyer, C. R. (1989). Introduction to the special section on computer architectures and parallel algorithms. IEEE Trans. PAMI, 11(3): 225-226.
Ekmecic, I., Tartalja, I., and Milutinovic, V. (1996). A survey of heterogeneous computing: Concepts and systems. Proc. IEEE, 84(8): 1127-1143.
Fisher, A. L., Kung, H. T., Monier, L. M., Walker, H., and Dohi, Y. (1983). Design of the psc: a programmable systolic chip. Proc. 3rd Caltech Conf. on VLSI, pp. 287-302.
Flynn, M. J. (1966). Very high speed computing systems. Proc. IEEE, 54(12): 1901-1909.
Fountain, T. J. (1987). Processor Arrays Architecture and Applications. London: Academic Press.
Gaskill, J. D. (1978). Linear Systems, Fourier Transforms, and Optics. New York: Wiley.
Golay, M. J. E. (1969). Hexagonal parallel pattern transformations. IEEE Trans. Computers, 18(8): 733-740.
Goldstein, M. D. and Nagler, M. (1987). Real time inspection of a large set of surface defects in metal parts. Proc. SPIE, 849: 184-190.
Gonzalez, R. C. and Woods, R. E. (1992). Digital Image Processing. Reading, MA: Addison-Wesley.
Guessoum, A. and Mersereau, R. M. (1986). Fast algorithms for the multidimensional discrete Fourier transform. IEEE ASSP, 34(4): 937-943.
Guo, Z. and Hall, R. W. (1989). Parallel thinning algorithms: Parallel speed and connectivity preservation. Communications ACM, 32(1): 124-131.
Handler, W. (1984). Multiprozessoren fur breite anwendungsgebiete erlangen, general purpose array. GI NTG Fachtagung Architektur und Betrieb von Rechensystemen Informatik Fachberichte. Berlin: Springer-Verlag, pp. 195-208.
Hanzal, B. R., Joseph, J. D., Cox, J. A., and Schwanebeck, J. C. (1985). PtSi hexagonal detector focal plane arrays. Proc. SPIE, 570: 163-171.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Trans. PAMI, 9(4): 532-550.
Hartman, P. and Tanimoto, S. (1984). A hexagonal pyramid data structure for image processing. IEEE Trans. SMC, 14(2): 247-256.
Helmholtz, H. L. F. (1911). Handbuch der Physiologischen Optik. Volume 2. Hamburg, Germany: Verlag von Leopold Voss.
Helmholtz, H. L. F. (1962). Treatise on Physiological Optics. Volume 2 (Translated by J. P. C. Southall). New York: Dover Publications.
Her, I. (1995). Geometric transformations on the hexagonal grid. IEEE Trans. Image Processing, 4(9): 1213-1222.
Illingworth, J. and Kittler, J. (1988). A survey of the Hough transform. Computer Vision Graphics and Image Processing, 44: 87-116.
Inmos Ltd. (1989). The Transputer Databook, Second edition. Bristol, UK.
Jang, B. K. and Chin, R. T. (1990). Analysis of thinning algorithms using mathematical morphology. IEEE Trans. PAMI, 12(6): 541-551.
Kamgar-Parsi, B. and Kamgar-Parsi, B. (1989). Evaluation of quantization error in computer vision. IEEE Trans. PAMI, 11(9): 929-940.
Kamgar-Parsi, B. and Kamgar-Parsi, B. (1992). Quantization error in hexagonal sensory configurations. IEEE Trans. PAMI, 14(6): 665-671.
Kobayashi, H., Matsumoto, T., and Sanekata, J. (1995). Two dimensional spatio-temporal dynamics of analog image processing neural networks. IEEE Trans. Neural Networks, 6(5): 1148-1164.
Kovac, M. and Ranganathan, N. (1995). Jaguar: a fully pipelined VLSI architecture for JPEG image compression standard. Proc. IEEE, 83(2): 247-258.
Lam, L., Lee, S. W., and Suen, C. Y. (1992). Thinning methodologies, a comprehensive survey. IEEE Trans. PAMI, 14(9): 869-885.
Lee, J. W., Yang, M. H., Kang, S. H., and Choe, Y. (1997). An efficient pipelined parallel architecture for blocking effect removal in HDTV. IEEE Trans. Consumer Electronics, 43(2): 149-156.
Lee, J. W., Park, J. W., Yang, M. H., Kang, S. H., and Choe, Y. (1998). Efficient algorithm and architecture for post-processor in HDTV. IEEE Trans. Consumer Electronics, 44(1): 16-26.
Lenoir, F., Bouzar, S., and Gauthier, M. (1989). Parallel architecture for mathematical morphology. Proc. SPIE, 1199: 471-482.
Li, H. and Kender, J. R. (1988). Special issue on computer vision scanning the issue. Proc. IEEE, 76(8): 859-862.
Li, H. and Maresca, M. (1989). Polymorphic torus architecture for computer vision. IEEE Trans. PAMI, 12(3): 233-243.
Lin, T. P. and Hsieh, C. H. (1994). A modular and flexible architecture for real-time image template matching. IEEE Trans. Circuits and Systems: I-Fundamental Theory and Applications, 41(6): 457-461.
Lougheed, R. M. and McCubbrey, D. L. (1980). The cytocomputer a practical pipelined image processor. 7th Int. Symposium in Comput. Architecture, pp. 271-277.
Lougheed, R. M. (1985). A high speed recirculating neighborhood processing architecture. Proc. SPIE, 534: 22-33.
Luck, R. L. (1986). Using PIPE for inspection applications. Proc. SPIE, 730: 12-19.
Luck, R. L. (1987). Implementing an image understanding system architecture using pipe. Proc. SPIE, 489: 35-41.
Luczak, E. and Rosenfeld, A. (1976). Distance on a hexagonal grid. IEEE Trans. Comput., 25: 532-533.
Maresca, M., Lavin, M. A., and Hungwen, L. (1988). Parallel architectures for vision. Proc. IEEE, 76(8): 970-981.
Matheron, G. (1975). Random Sets and Integral Geometry. New York: Wiley.
McCafferty, J. D., Fryer, R. J., Codutti, S., and Monai, G. (1987). Edge detection algorithm and its video rate implementation. Image and Vision Computing, 5(2): 155-160.
McCormick, B. (1963). The Illinois pattern recognition computer - Illiac 3. IEEE Trans. Electronic Computers, 12(6): 791-813.
McIlroy, C. D., Linggard, R., and Monteith, W. (1984). Hardware for real time image processing. IEE Proc. Part E, 131(6): 223-229.
Mersereau, R. M. (1979). The processing of hexagonally sampled two dimensional signals. Proc. IEEE, 67(6): 930-949.
Mersereau, R. M. and Speake, T. C. (1981). A unified treatment of Cooley-Tukey algorithms for the evaluation of multidimensional DFT. IEEE Trans. ASSP, 29(5): 1011-1018.
Mersereau, R. M. and Speake, T. C. (1983). The processing of periodically sampled multidimensional signals. IEEE ASSP, 31(1): 188-194.
Meyer, F. (1988). Skeletons in digital spaces. In Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances (J. Serra, ed.). London: Academic Press.
Minami, T., Kasai, R., Yamauchi, H., Tashiro, Y., Takahashi, Y., and Data, S. (1991). A 300-mops video signal processor with a parallel architecture. IEEE J. Solid-State Circuits, 26(12): 1868-1875.
Mitchell, J. W. (1993). The silver halide photographic emulsion grain. J. Imaging Science and Technology, 37(4): 331-343.
Mylopoulos, J. P. and Pavlidis, T. (1971). On the topological properties of quantized spaces: II connectivity and order of connectivity. J. Assoc. Comput. Machinery, 18(2): 247-254.
Nudd, G. R., Grinberg, J., Etchells, R. D., and Little, M. (1985). The application of three dimensional microelectronics to image analysis. In Integrated Technology for Parallel Image Processing (S. Levialdi, ed.). London: Academic Press, pp. 256-282.
Nudd, G. R., Atherton, T. J., Howarth, R. M., Clippingdale, S. C., Francis, N. D., Kerbyson, D. J., Packwood, R. A., Vaudin, G. J., and Walton, D. W. (1989). WPM: A multiple-simd architecture for image processing. IEE 3rd Int. Conf. on Image Proc., Warwick, UK, Publication No. 307, 161-165.
Nyquist, H. (1928). Certain topics in telegraph transmission theory. Trans. AIEE, 47: 617-644.
Oakley, J. P. and Cunningham, M. J. (1990). A function space model for digital image sampling and its application to image reconstruction. Computer Vision Graphics and Image Processing, 49: 171-197.
Olson, T. J., Taylor, J. R., and Lockwood, R. J. (1996). Programming a pipelined image processor. Computer Vision and Image Understanding, 64(3): 351-367.
306
R. C. STAUNTON
Petersen, D. P. and Middleton, D. (1962). Sampling and reconstruction of wave number limited functions in n dimensional euclidean spaces, Information and Control, 5: 279-323. Preston, K. (1971). Feature extraction by Golay hexagonal pattern transforms, IEEE Trans. Computers, 20(9): 1007- 1014. Preston, K., DUE, M. J. B., Levialdi, S., Norgren, P. E., and Toriwaki, J. (1979). Basics of cellular logic with some applications in medical image processing, Proc. IEEE, 67(5): 826-856. Ray, S. F. (1988). Applied Photographic Optics. London: Focal Press. Reichenbach, S. E., Park, S. K., and Narayanswamy, R. (1991). Characterizing digital image acquisition devices, Optical Engineering, 30(2): 170-177. Rivard, G. E. (1977). Direct fast Fourier transform of bivariate functions, IEEE Trans. ASSP, 2 5 250-252. Rosenfeld, A. and Pfaltz, 1. L. (1968). Distance functions on digital pictures, Putt. Rec., 1: 33-61. Rosenfeld, A. (1970). Connectivity in digital pictures, J . Assoc. Comput. Machinery, 17(1): 146- 160. Rosenfeld, A. and Kak, A. C. (1982). Digital Picture Processing. Volume 1, New York: Academic Press. Ruetz, P. A. and Broderson, R. W. (1986). A custom chip set for real time image processing, Con$ ICASSP Tokyo, 801-804. Schroder, D. K. (1980). Extrinsic silicon focal plane arrays, In Charge Coupled Devices (D. F. Barbe, ed.). New York: Springer-Verlag. Sensiper, M., Boreman, G. D., Ducharme, A. D., and Snyder, D. R. (1993). Modulation transfer function testing of detector arrays using narrow-band laser speckle, Optical Engineering, 32(2): 395-400. Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press. Serra, J. (1986). Introduction to mathematical morphology, Computer Vision Graphics and Image Processing, 3 5 283-305. Serra, J. (1988). Image Analysis and Mathematical Morphology. Volume 2, Theoretical Advances. London: Academic Press. Sharp, E. D. (1961). 
A triangular arrangement of planar-array elements that reduces the number needed, IRE Pans. Antennas Propagat., 3 445-476. Sheu, M. H., Wang, J. F., Chen, A. N., Suen, A. N., and Jeang, Y. L., Lee, J. Y. (1992). A data-resuse architecture for gray-scale morphologic operations, IEEE Trans. Circuits and Systems, II-Anulog and Digital Signal Processing, 39( 10): 753-756. Shih, F. Y. and King, C. P., Pu, C. C. (1995). Pipeline architectures for recursive morphological operations, IEEE Trans. Image Processing, 4(1): 11-18. Smith, R. W. (1987). Computer processing of line images a survey, Putt. Rec., 20(1): 7-15. Staunton, R. C. (1989). The design of hexagonal sampling structures for image digitization and their use with local operators, Image and Vision Computing, 7(3): 162-166. Staunton, R. C. and Storey, N. (1990). A comparison between square and hexagonal sampling methods for pipeline image processing, Proc. SPIE, 1194: 142-151. Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Putt. Rec., 29(7): I131 - 1146. Staunton, R. C. (1996b). Edge detector error estimation incorporating CCD camera limitations, IEEE Norsig94 Signal Processing Conference, Espoo, Finland, pp. 243-246. Staunton, R. C. (1997a). Measuring the high frequency performance of digital image acquisition systems, IEE Electronics Letters, 33(17): 1448-1450. Staunton, R. C. (1997b). Measuring image edge detector accuracy using realistically simulated edges, IEE Electronics Letters, 33(24): 2031-2032.
HEXAGONAL SAMPLING IN IMAGE PROCESSING
307
Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, I E E Proc. Vision, Image and Signal Processing, 145(3):229-235. Sternberg, S. R. (1986). Greyscale morphology, Computer Vision, Graphics and Imuye Processing, 35: 333-355. Storey, N. and Staunton, R. C. (1989). A pipeline processor employing hexagonal sampling for surface inspection, 3rd Int. ConJ on Image Processing and Its Applicutions, IEE Conference Publication No. 307, 156-160. Storey, N. and Staunton, R. C. (1990). An adaptive pipeline processor for real-time image processing, Proc. SPIE, 1197: 238-246. Tanimoto, S. L., Ligocki, T. J., and Ling, R. (1987). A prototype pyramid machine for hierarchical celIuIar logic. In Purallel Computer Vision (L. Uhr, ed.). Boston: Academic Press, pp. 43-83. Tzannes, A. P. and Mooney, J. M. (1995). Measurement of the modulation transfer function of infrared cameras, Optical Engineering, 34(6): 1808-181 7. Ulichney, R. (1987). Digital Ha!ftoning, Cambridge, MA: MIT Press. Valkenburg, R. J. and Bowman, C. C. (1988). Kiwivision I1 a hybrid pipelined multitransputer architecture for machine vision, Proc. SPIE, 1004 91 -96. Wandell, B. A. (1995). Foundations of Vision, Sunderland, MA: Sinauer Associates Inc. Watson, A. B. and Ahumada, A. J. (1989). A hexagonal orthogonal oriented pyramid as a model of image representation in visual cortex, I E E E Trans. BME, 36(1): 97-106. Wehner, B. (1989). Parallel recirculating pipeline for signal and image processing, Proc. S P I E , 1058: 27-33. Whitehouse, D. J. and Phillips, M. J. (1985). Sampling in a two-dimensional plane, J . Physics A, Math. Gen., 18: 2465-2477. Zandhuis, J. A,, Pycock, D., Quigley, S. F., and Webb, P. W. (1997). Sub-pixel non-parametric PSF estimation for image enhancement, I E E Proc. Vis. Image Signal Process., 144(5): 285-292.
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 107
The Group Representation Network: A General Approach to Invariant Pattern Classification
JEFFREY WOOD
ISIS Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K.
I. Pattern Classification and the Invariance Problem
II. Group Representation Theory
    A. Irreducible Representations
    B. Direct Sum and Tensor Product of Representations
    C. Homomorphisms and Intertwining Spaces
    D. Characters
    E. Frobenius Reciprocity
    F. Special Classes of Representations
III. Linear and Nonlinear Concomitants
    A. Linear Concomitants
    B. Transmutation
    C. Fixed Weight Group Representation Networks
    D. Redundancy of Noninduced Representations
IV. Adaptivity in Group Representation Networks
    A. Parameterized Homomorphisms
    B. Parameterized Homomorphisms for Induced Representations
    C. Number of Parameters and Parameter Reduction
    D. Algorithm for Group Representation Network Construction
    E. Symmetry Networks
V. Practical Considerations and Simulations
    A. Learning Algorithms
    B. The Learning Process
    C. Generalization
    D. Discriminability
    E. Simulations
VI. The Computational Power of the Group Representation Network Model
    A. Polynomial Invariants
    B. Construction of Basic Invariants
VII. The Group Representation Network and Other Invariant Classification Methods
    A. Integral Transform Invariants
    B. Fast Translation-Invariant Transforms
    C. Invariant First Order Networks
    D. Higher-Order Neural Networks
    E. Moment Invariants
VIII. Summary and Open Questions
Proof of Theorem III.1
References
Volume 107 ISBN 0-12-014749-1
Crown copyright 1995. Published with permission of DERA on behalf of the Controller of HMSO. All rights of reproduction in any form reserved. 1076-5670/99 $30.00
I. PATTERN CLASSIFICATION AND THE INVARIANCE PROBLEM
Any computational scheme for the modeling of an unknown function requires prior knowledge (even if that knowledge amounts only to an estimate of the complexity of the function to be modeled). Furthermore, the more prior knowledge is used by the modeling algorithm, the greater its performance is expected to be. This is a crucial consideration in classification problems such as pattern recognition. In this context, prior knowledge can take many forms: for example, in an image classification problem, that the objects of interest are dark; or in a signal processing problem, that the important part of the signals has frequency in a given bandwidth. Often, the incorporation of prior knowledge involves construction of the pattern classifier in such a way that certain properties of the input data are transparent to it. We will be concerned here with the problem in which the prior knowledge to be included is that there are certain transformations of the input domain under which the classification remains unchanged. Rephrased, the pattern classifier should be constructed in such a manner that it ignores the application of such input transformations; that is, it should be invariant under those transformations. More specifically, we are going to suppose that the transformations concerned are linear, and that they form a group. Although slightly restrictive, the first assumption covers many problems of interest; the second assumption is quite natural, and amounts only to supposing that each invariance transformation is invertible. The problem now becomes one of invariance under a given group (or more precisely, under a given group representation). Examples of group invariance problems include the following:
1. Recognition of a signal, independently of linear translations along the time axis.
2. Character recognition, the classification being unchanged by translation in two dimensions.
3. Image identification, independently of two-dimensional (2-D) translation, rotation, and scaling.
4. Classification of an object specified by three-dimensional (3-D) coordinates, invariant under the 3-D rigid motion group (i.e., the group of linear transformations that can be performed on a physical object).
5. Checking for membership of a binary string in a given cyclic code, where cyclic permutation will not change the membership property of a string.
6. The bipolar parity problem, in which the classification of an n-bit string is invariant under multiplication by -1 of an even number of bits.
7. Recognition of a graph, independent of permutations of the vertices (i.e., the graph isomorphism problem).

The author proposes a highly general model for the group invariance problem. This model is called the group representation network (GRN). In principle, a GRN can be constructed for any linear transformation invariance problem, though to date the supporting theory has only been developed for the case of a finite (or compact) invariance group. This universality makes the GRN particularly useful for problems for which there exist no established methods for producing invariant pattern classifiers: that is, those for which the invariance group is "unusual." Keeping this basic intention in mind, our approach to the invariance problem will be discussed and the contents of this chapter will be described.

There are two principal ways of solving the invariant pattern classification problem. The first is to extract a set of features from the inputs that are invariant under the given group, and then to process these features using some standard pattern classifier. Examples of this method include Fourier analysis or integral transform-based methods (Caelli and Liu, 1988; Sheng and Arsenault, 1986; Wechsler, 1987), and the use of moment invariants (Hu, 1962; Li, 1992). The second method is to build an adaptive invariant, that is, a function that is parameterized (and can thus be adapted to learn a desired mapping) and that remains invariant under the prescribed transformations for all values of these parameters. The second method includes a number of neural network-type approaches, such as higher-order networks (Giles and Maxwell, 1987; Spirkovska and Reid, 1992). The first method is conceptually easier but involves certain difficulties.
When performing an initial feature extraction, there is the problem of choosing the features. These features have to be invariant under the transformation group, and they must contain enough information for the pattern classification process. In addition, it is important that the extracted features do not have additional invariances that will render the classification problem impossible. Often these problems are solved by choosing the features to be a complete set of invariants, that is, a set of invariants under the transformation group which allow an arbitrarily accurate distinction between any two inputs not in the same orbit. However, the construction of a complete set of invariants is, in general, a difficult task. Moreover, the principle of Occam’s Razor dictates that it is important not to use too many features, as this may result in an unnecessarily complex classification system. We will instead advocate the second method of constructing adaptive invariants. Our approach will be to identify a class of basic building blocks
or modular units. These units will each be parameterized, and, furthermore, each will have a property (in some sense) of transmitting the group action from its inputs to its outputs. This will be explained in detail in Section III. By connecting these modular units together in a very flexible fashion, we will be able to build invariant systems of arbitrary functional complexity. These are group representation networks (GRNs). We can match the complexity of the GRN to the complexity of the invariance problem. Furthermore, although our construction method requires knowledge of the group's representations, it does not require knowledge of the group's invariants. As information on the representations of most groups is more readily available or more easily constructed than information on the group's invariants, the general problem of invariant pattern classification becomes more tractable with this approach. Another advantage of our approach is that it generalizes readily to problems where the action of the group on the system output (the classification domain) is linear but not trivial. This is a problem of group concomitance rather than one of invariance. An example might be a signal processing problem in which a translation of the input is required to result in the same translation applied to the output. Section II provides background material on group representation theory. Section III introduces the GRN, and defines and analyzes the simple "modular units" comprising it. This section also shows that a GRN can be naturally viewed as an artificial feedforward neural network. Section IV introduces adaptivity into the GRN and provides formulas for its parameterized structures (the weight matrices), and an algorithm for general GRN construction.
Section V discusses the practical issues of learning, discriminability, and generalization in a GRN, and describes several simulations in which GRNs are shown to have better learning and/or generalization performances than comparable neural networks without in-built invariance. Section VI presents the conjecture that any group invariant can be approximated to an arbitrary desired degree of accuracy by some GRN. In support of this, it is shown that any polynomial invariant under a real finite-dimensional representation of a finite group can be computed exactly by a GRN. Section VII adds further weight to the conjecture by demonstrating that many standard invariants used in pattern recognition (e.g., integral transform invariants) can be viewed as GRNs. Section VIII provides a summation. This material is taken largely from the author’s thesis (Wood, 1995). The concept of a group representation network (GRN) is based on the symmetry networks of Shawe-Taylor (1989; 1993), which form an important subclass of GRNs. The structure of a simplified GRN model was discussed in Wood
and Shawe-Taylor, 1996a. The results relating other invariant pattern classification techniques to GRNs have been presented in Wood and Shawe-Taylor, 1996b. The general form of the GRN model and the discussion of computational power in Section VI have not previously appeared in the literature, other than in the form of a thesis (Wood, 1995).
II. GROUP REPRESENTATION THEORY
This section contains a discussion of some elements of group representation theory, which is central to the approach of this paper. For further information, consult any book on representation theory, for example Cohn (1989), Fulton and Harris (1991), or Ledermann (1977). Another very appropriate summary of representation theory and its application to image processing problems is given by Lenz (1990). While the discussion here is limited to finite groups, it also applies (through the use of tools such as Haar integrals (Lenz, 1990)) to finite-dimensional representations of compact groups, including, for example, the natural representations of the classical Lie groups $U(n)$, $O(n)$, $SU(n)$, $SO(n)$.

A. Irreducible Representations
We assume the reader is familiar with the notions of a (not necessarily commutative) group, a subgroup, and a group action. We will also need the notion of conjugacy: a group element $g \in G$ is said to be conjugate to any element of the form $s^{-1}gs$, $s \in G$. A representation of a group is essentially a linear action of that group on some vector space. More formally, a representation of the group $G$ over the field $F$ is a mapping $A$ from $G$ to the set $GL(V)$ of all invertible linear operators on some vector space $V$ over $F$, which satisfies:

$$\forall g_1, g_2 \in G \quad A(g_1 g_2) = A(g_1)A(g_2). \tag{1}$$

Thus a representation of a group defines a linear action of the group on some vector space. Note that to specify a representation, it is sufficient to specify it for a set of generators of the group.
Example II.1

1. The natural actions of the classical Lie groups, for example, the action of $SO(2)$ on the space $V = \mathbb{R}^2$ by rotation about the origin.

2. The natural "permutation representations" of the cyclic groups $C_n$, acting by permutation of components on the space $\mathbb{R}^n$. For example, the group $C_3 = \{e, g_1, g_1^2\}$, defined by $g_1^3 = e$, acts on the space $V = \mathbb{R}^3$ according to the representation:

$$A(e) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad A(g_1) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad A(g_1^2) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

We will often use the "cycle notation" for this and other groups: $g_1 = (1\ 2\ 3)$, $g_1^2 = (1\ 3\ 2)$. This notation, which refers to a permutation group's abstract structure, means that $g_1$ sends 1 to 2, 2 to 3, and 3 back to 1, whereas $g_1^2$ does the opposite.

3. $C_4$ has a natural permutation representation similar to that just discussed for $C_3$. Another representation of $C_4$ is the one-dimensional (1-D) complex representation

$$A(e) = (1), \quad A((1\ 2\ 3\ 4)) = (i), \quad A((1\ 3)(2\ 4)) = (-1), \quad A((1\ 4\ 3\ 2)) = (-i),$$

acting on the space $V = \mathbb{C}$.

4. Consider the symmetric group $S_n$ of all permutations of $n$ objects. One representation of $S_n$ is given by permutation of the components of $\mathbb{R}^n$, as in Example 2 for the cyclic group. However, the group $S_n$ has another action on the set $X$ of all unordered pairs $(i, j)$ of distinct elements of $\{1, \ldots, n\}$: a group element $g$ takes $(i, j)$ to $(g(i), g(j))$. We can interpret this action as the permutation of the edges of an $n$-vertex graph induced by a permutation of the vertices. The given action induces an action of $S_n$ on the set of all functions $f$ from $X$ to $\mathbb{R}$, according to $(gf)(i, j) = f(g^{-1}(i), g^{-1}(j))$. This is a linear action of $S_n$, that is, a representation of the group. In the case $n = 4$, there are six distinct unordered pairs of elements, and so the vector space in question is the space $V = \mathbb{R}^6$. We can identify the natural basis vectors with the unordered pairs $(1,2)$, $(1,3)$, $(1,4)$, $(2,3)$, $(2,4)$ and $(3,4)$, respectively. With this correspondence, and using the standard cycle notation again, the representation is generated by

$$A((1\ 2\ 3\ 4)) = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \quad A((1\ 2)) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

5. The group $\mathbb{R}$ (with group operation addition) acts on the set $V$ of all functions from $\mathbb{R}$ to $\mathbb{C}$ by translation, that is, for any real $\tau$, an operator $A(\tau)$ on this set of functions is defined for each $u : \mathbb{R} \to \mathbb{C}$ by

$$\forall t \in \mathbb{R} \quad (A(\tau)u)(t) = u(t - \tau). \tag{2}$$

This defines a representation $A$ of the group $\mathbb{R}$. Another representation $B$ is defined on the same set of functions by

$$\forall s \in \mathbb{R} \quad (B(\tau)u)(s) = e^{2\pi i s \tau}u(s). \tag{3}$$
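The edge representation of item 4 can be generated mechanically from the action on unordered pairs instead of being written out by hand. The following Python sketch is only an illustration under stated assumptions: the helper names (`perm_from_cycles`, `edge_matrix`, `compose`, `matmul`) are hypothetical, and it assumes the convention that the basis vector indexed by $(i, j)$ is sent to the basis vector indexed by $(g(i), g(j))$. It also checks the defining property of Eq. (1) for one pair of group elements.

```python
from itertools import combinations


def perm_from_cycles(cycles, n):
    """Return the map i -> g(i) on {1, ..., n} for g given in cycle notation."""
    g = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            g[a] = b
    return g


def compose(g, h):
    """Group product g*h, acting as 'apply h first, then g'."""
    return {i: g[h[i]] for i in h}


def edge_matrix(g, n):
    """Matrix of g on unordered pairs: basis vector (i, j) goes to (g(i), g(j))."""
    pairs = list(combinations(range(1, n + 1), 2))
    index = {p: k for k, p in enumerate(pairs)}
    m = [[0] * len(pairs) for _ in pairs]
    for (i, j), col in index.items():
        row = index[tuple(sorted((g[i], g[j])))]
        m[row][col] = 1
    return m


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


g = perm_from_cycles([(1, 2, 3, 4)], 4)
h = perm_from_cycles([(1, 2)], 4)
A_g = edge_matrix(g, 4)
A_h = edge_matrix(h, 4)

# Eq. (1): the matrix of a product is the product of the matrices.
property_holds = edge_matrix(compose(g, h), 4) == matmul(A_g, A_h)
```

Because a representation is fixed by its values on generators, the two matrices `A_g` and `A_h` are enough to obtain the matrix of any element of $S_4$ by multiplication.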
The representations of $C_n$ and $S_n$ that we have discussed are examples of permutation representations, which are discussed fully in Section II.F. Representation theory is often couched in terms of modules (the module corresponding to the representation $A$ in the foregoing is the vector space $V$ on which $A$ acts together with the given group action), but because we will deal exclusively with modules in which a basis has been chosen, it seems more natural to work with the matrices of the representation themselves. For any representation $A$, we denote by $\Omega(A) := V$ the corresponding module. A finite-dimensional representation is one for which the space $V$ is finite-dimensional. All the examples in the preceding list, except for the last, are finite-dimensional. We will deal here almost exclusively with finite-dimensional representations. We will also assume that the field $F$ is fixed and has characteristic zero (e.g., $F = \mathbb{R}, \mathbb{C}$). A representation $A$ of a finite group $G$ is called irreducible if $\Omega(A)$ has no proper nontrivial subspace which is mapped into itself by each $g \in G$ under the action $A$. Otherwise, $A$ is called reducible.
Example II.2

1. For any group, the trivial representation $A(g) = 1$ for all $g \in G$ is irreducible.

2. Many groups also have alternating representations, which are 1-D, irreducible, and defined by the property that $A(g) = \pm 1$ for all $g \in G$. One-dimensional representations are of course always irreducible.

3. For the group $C_n$, the irreducible representations are all 1-D, and are of the form $R_k(g_1) = (e^{2\pi i (k-1)/n})$ for $k = 1, \ldots, n$. For example, the irreducible representations of $C_4$ are generated by $R_1(g_1) = 1$, $R_2(g_1) = i$, $R_3(g_1) = -1$ and $R_4(g_1) = -i$.

4. The natural representation of the 2-D rotation group $SO(2)$ is irreducible. It is clear that $\mathbb{R}^2$ does not have an invariant subspace under this representation.

5. Consider the representation of the symmetric group $S_3$, which acts on $\mathbb{R}^2$ according to: the permutation $(1\ 2\ 3)$ is a rotation of 120° about the origin, and the permutation $(1\ 2)$ is a reflection in the y-axis. This is irreducible, as again there is no stabilized subspace.

6. The natural representation of $C_3$ discussed in Example II.1.2 is reducible, because the span of the vector $(1, 1, 1)^T$ is mapped into itself by cyclic permutation.

7. The representation of the additive group $\mathbb{R}$ discussed in Example II.1.5 is reducible because the subspace of affine functions $v(t) = at + b$ is mapped into itself by the group action.
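Two of the claims in this example lend themselves to a quick numerical check. The Python sketch below is illustrative only and rests on assumptions: the function names are hypothetical, and the indexing $R_k(g_1) = e^{2\pi i (k-1)/n}$, $k = 1, \ldots, n$, is assumed for the 1-D representations of $C_n$. It confirms that each such value is an $n$-th root of unity (so that $g_1^n = e$ is respected) and that the vector $(1, 1, 1)^T$ is fixed by the natural action of the generator of $C_3$.

```python
import cmath


def irreducible_generators(n):
    """Values R_k(g_1) = exp(2*pi*i*(k-1)/n) for k = 1, ..., n (indexing assumed)."""
    return [cmath.exp(2j * cmath.pi * (k - 1) / n) for k in range(1, n + 1)]


def apply_shift(v):
    """Natural action of the generator g_1: component i moves to position i+1, cyclically."""
    return [v[-1]] + v[:-1]


gens = irreducible_generators(4)

# A 1-D representation of C_n must send the generator to an n-th root of unity.
roots_of_unity = all(abs(z ** 4 - 1) < 1e-12 for z in gens)

# Round away floating-point noise to display the four values for C_4.
rounded = [complex(round(z.real), round(z.imag)) for z in gens]

# The span of (1, 1, 1)^T is mapped into itself, so the natural
# representation of C_3 has a proper nontrivial invariant subspace.
ones_fixed = apply_shift([1, 1, 1]) == [1, 1, 1]
```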
B. Direct Sum and Tensor Product of Representations

Consider two finite-dimensional representations $A$ and $B$ of a group $G$ on vector spaces $V$ and $W$, respectively. Then the direct sum of $A$ and $B$, denoted $A \oplus B$, is a representation of $G$ on $V \oplus W$ defined by:

$$(A \oplus B)(g) := \begin{pmatrix} A(g) & 0 \\ 0 & B(g) \end{pmatrix}. \tag{4}$$

The tensor product of $A$ and $B$, denoted by $A \otimes B$, is a representation of $G$ on $V \otimes W$ defined by

$$(A \otimes B)(g) := A(g) \otimes B(g), \tag{5}$$

where $\otimes$ on the right-hand side denotes the tensor product (Kronecker product) of linear transformations (matrices). We similarly define the tensor powers $\otimes^k A$ of any representation $A$. Note that matrices $P$, $Q$, $R$, and $S$ of appropriate dimensions satisfy

$$(P \otimes Q)(R \otimes S) = PR \otimes QS. \tag{6}$$
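Equation (6) is what guarantees that the tensor product of two representations is again a representation, and the analogous block identity does the same for the direct sum. A minimal Python sketch follows, with hypothetical helper names (`kron`, `direct_sum`, `matmul`) and arbitrarily chosen integer matrices, so that the comparisons are exact.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product: rows are indexed by (i, k) and columns by (j, l)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def direct_sum(a, b):
    """Block-diagonal matrix with blocks a and b."""
    top = [row + [0] * len(b[0]) for row in a]
    bottom = [[0] * len(a[0]) + row for row in b]
    return top + bottom


P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]
R = [[2, 0], [1, 1]]
S = [[1, 1], [0, 2]]

# Mixed-product property, Eq. (6).
mixed_product_ok = matmul(kron(P, Q), kron(R, S)) == kron(matmul(P, R), matmul(Q, S))

# The corresponding property for block-diagonal (direct sum) matrices.
direct_sum_ok = (matmul(direct_sum(P, Q), direct_sum(R, S))
                 == direct_sum(matmul(P, R), matmul(Q, S)))
```

Taking $P = A(g_1)$, $R = A(g_2)$, $Q = B(g_1)$ and $S = B(g_2)$ in these identities shows that $A \otimes B$ and $A \oplus B$ both satisfy Eq. (1).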
C. Homomorphisms and Intertwining Spaces

Two representations $A$ and $B$ of a group $G$ are said to be equivalent if there exists an invertible linear transformation $T$ satisfying

$$\forall g \in G \quad TA(g) = B(g)T. \tag{7}$$
This means that $A$ and $B$ can be thought of as the same action on the same space, but defined with respect to different bases. A representation $A$ over a complex field is unitary if the representative matrices $A(g)$ are unitary; when the underlying field is real, the representation is said to be orthogonal and we have $A(g)^T = A(g)^{-1} = A(g^{-1})$ for all $g \in G$. The following result is standard (e.g., Cohn, 1989; Lenz, 1990):

Lemma II.1 Any finite-dimensional complex representation of a finite group is equivalent to a unitary representation. Any finite-dimensional real representation of a finite group is equivalent to an orthogonal representation.
Let $G$ be a group and let $A$ and $B$ denote two representations of $G$. A concomitant from $A$ to $B$ is a function $\phi : \Omega(A) \to \Omega(B)$ with the property:

$$\forall g \in G \quad \phi \circ A(g) = B(g) \circ \phi. \tag{8}$$

If the concomitant is a linear map (of vector spaces), then it is called a homomorphism or intertwining operator from $A$ to $B$. A group invariant can be seen to be a concomitant from a given representation of $G$ to the trivial representation. Two representations are equivalent if and only if they are isomorphic (i.e., there is an invertible homomorphism from one to the other).
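The distinction between a concomitant and an invariant can be made concrete with the cyclic shift acting on $\mathbb{R}^4$. The Python sketch below is an assumed illustration (the function names and the test vector are hypothetical, and checking a single vector is a spot check, not a proof): squaring each component is a nonlinear concomitant from the natural permutation representation to itself, the sum of squares is a concomitant to the trivial representation (an invariant), and a position-dependent weighting fails Eq. (8).

```python
def shift(v):
    """Action of the generator of C_n: component i moves to position i+1, cyclically."""
    return [v[-1]] + v[:-1]


def square_each(v):
    """Componentwise nonlinearity; commutes with any permutation of components."""
    return [x * x for x in v]


def energy(v):
    """Sum of squares; unchanged by any permutation of components."""
    return sum(x * x for x in v)


def ramp_weighted(v):
    """Weights component k by its position k; this breaks the symmetry."""
    return [k * x for k, x in enumerate(v)]


x = [3, -1, 4, 1]

# Eq. (8) with B = A: phi(A(g) x) == A(g) phi(x).
is_concomitant = square_each(shift(x)) == shift(square_each(x))

# Eq. (8) with B trivial: phi(A(g) x) == phi(x).
is_invariant = energy(shift(x)) == energy(x)

# A map that does not commute with the group action.
ramp_is_not_concomitant = ramp_weighted(shift(x)) != shift(ramp_weighted(x))
```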
Example II.3

1. A famous example of an intertwining operator is the discrete Fourier transform (DFT). One of the defining properties of the DFT is that cyclic translations in the original signal domain become componentwise phase shifts in the spectral domain, for example, in the four-dimensional (4-D) case:

$$h_{\mathrm{DFT}} \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -i \end{pmatrix} h_{\mathrm{DFT}}, \tag{9}$$

where $h_{\mathrm{DFT}}$ is the matrix representing the 4-D DFT operator. It follows that similar identities hold for other cyclic translation operators, and Eq. (9) essentially states that the DFT is a homomorphism from the natural permutation representation of the cyclic group $C_4$ (see Example II.1.2) to the direct sum of all irreducible representations (see Example II.2.3). The DFT is also invertible, that is, an isomorphism, and these two representations of $C_4$ are thus equivalent. Analogous results hold for a DFT of any order, that is, for any cyclic group. This observation is fundamental to group-theoretic harmonic analysis (Clausen and Baum, 1993; Hewitt and Ross, 1979). We will return to the DFT in Section VII. The continuous Fourier transform has a similar property with respect to certain representations of the additive group $\mathbb{R}$, as indicated in Example II.1.5.

2. Consider three representations of the symmetric group $S_3$. The first is the natural 3-D permutation representation $A$. Second, we have the trivial representation $B$, and third the irreducible 2-D representation $C$ given in Example II.2.5, where $(1\ 2\ 3)$ is a 120° rotation and $(1\ 2)$ a reflection. These representations are generated by

$$A((1\ 2\ 3)) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad A((1\ 2)) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

$$B((1\ 2\ 3)) = B((1\ 2)) = 1,$$

$$C((1\ 2\ 3)) = \frac{1}{2}\begin{pmatrix} -1 & -\sqrt{3} \\ \sqrt{3} & -1 \end{pmatrix}, \quad C((1\ 2)) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The invertible linear operator $W$ (its matrix entries are not legible in the source) is therefore an isomorphism from $A$ to $B \oplus C$. Homomorphisms from $A$ to $B$ or from $A$ to $C$ can be extracted as submatrices of $W$.

3. As another example of a noninvertible homomorphism, consider the representation $A$ of $S_4$ given in Example II.1.4 as the action induced by the permutation of edges on a 4-vertex graph:
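$$A((1\ 2\ 3\ 4)) = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \quad A((1\ 2)) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

As a second representation of $S_4$, consider the natural permutation representation given by

$$B((1\ 2\ 3\ 4)) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad B((1\ 2)) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

A homomorphism from $A$ to $B$ is given by the following matrix, as can easily be checked (it is sufficient to show that $WA((1\ 2\ 3\ 4)) = B((1\ 2\ 3\ 4))W$ and $WA((1\ 2)) = B((1\ 2))W$):

$$W = \begin{pmatrix} p_1 & p_1 & p_1 & p_2 & p_2 & p_2 \\ p_1 & p_2 & p_2 & p_1 & p_1 & p_2 \\ p_2 & p_1 & p_2 & p_1 & p_2 & p_1 \\ p_2 & p_2 & p_1 & p_2 & p_1 & p_1 \end{pmatrix}, \tag{10}$$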
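Both intertwining identities in this example can be checked on generators with a few lines of Python. The sketch below is a hedged illustration: the helper names are hypothetical, the values chosen for $p_1$ and $p_2$ are arbitrary, the matrix of Eq. (10) is read as "entry $p_1$ where the vertex lies on the edge, $p_2$ elsewhere", and the DFT kernel is assumed to be $\omega^{jk}$ with $\omega = i$. With the opposite sign convention for the DFT, the diagonal phases would be replaced by their complex conjugates.

```python
from itertools import combinations

# Unordered pairs in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
PAIRS = list(combinations(range(1, 5), 2))


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def vertex_matrix(g):
    """Natural permutation matrix: basis vector i is sent to basis vector g(i)."""
    m = [[0] * 4 for _ in range(4)]
    for i in range(1, 5):
        m[g[i] - 1][i - 1] = 1
    return m


def edge_matrix(g):
    """Edge permutation matrix: basis vector (i, j) is sent to (g(i), g(j))."""
    m = [[0] * 6 for _ in range(6)]
    for col, (i, j) in enumerate(PAIRS):
        m[PAIRS.index(tuple(sorted((g[i], g[j]))))][col] = 1
    return m


def incidence_w(p1, p2):
    """Weighted incidence matrix: p1 if the vertex lies on the edge, else p2."""
    return [[p1 if v in pair else p2 for pair in PAIRS] for v in range(1, 5)]


four_cycle = {1: 2, 2: 3, 3: 4, 4: 1}     # (1 2 3 4)
transposition = {1: 2, 2: 1, 3: 3, 4: 4}  # (1 2)

W = incidence_w(2, -5)  # arbitrary illustrative values for p1 and p2
w_intertwines = all(
    matmul(W, edge_matrix(g)) == matmul(vertex_matrix(g), W)
    for g in (four_cycle, transposition)
)

# 4-D DFT with kernel omega**(j*k), omega = i (sign convention assumed here).
POWERS_OF_I = [1, 1j, -1, -1j]
H = [[POWERS_OF_I[(j * k) % 4] for k in range(4)] for j in range(4)]
SHIFT = vertex_matrix(four_cycle)
PHASES = [[POWERS_OF_I[j] if j == k else 0 for k in range(4)] for j in range(4)]
dft_intertwines = matmul(H, SHIFT) == matmul(PHASES, H)
```

Checking the generators suffices because both sides of the intertwining identity are multiplicative in the group element.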
where $p_1$ and $p_2$ can be any real values. In fact, $W$ can be viewed as a general incidence matrix for the fully connected 4-vertex graph.

The set of all homomorphisms from $A$ to $B$ can easily be seen to be a subspace of the vector space of all linear transforms from $\Omega(A)$ to $\Omega(B)$. This set is called the intertwining space of the pair $(A, B)$, and is denoted by $\mathrm{Hom}_G(A, B)$, or by $\mathrm{Hom}(A, B)$ if the group is understood from the context. The dimension of the intertwining space of $(A, B)$ is called the intertwining number of $(A, B)$. For a finite group, the intertwining number of $(A, B)$ is the same as that of $(B, A)$ and so we can simply refer to the intertwining number of $A$ and $B$. Schur's Lemma is another classical result of representation theory (e.g., Cohn, 1989; Ledermann, 1977).

Lemma II.2 (Schur's Lemma) Let $A$ and $B$ be irreducible finite-dimensional representations of a finite group $G$ over a field of characteristic zero. Let $W$ be a homomorphism from $A$ to $B$. Then:
1. $W$ is either invertible or 0.
2. If $A$ and $B$ are equal then $W = kI$ for some scalar $k$.
Schur’s Lemma states that the intertwining number of two inequivalent irreducible representations is 0, whereas the intertwining number of two equivalent irreducible representations is 1.
D. Characters

Let $A$ denote a finite-dimensional representation of a group $G$ over the field $F$. The character of $A$ is a function $\chi[A] : G \to F$, defined by:

$$\forall g \in G \quad \chi[A](g) = \mathrm{Trace}(A(g)). \tag{11}$$

The character is so-called because in many senses it characterizes the representation; for example, it is easy to show that equivalent representations have the same character. The character is also constant on conjugacy classes, for example, for $S_4$ it must hold that $\chi[A]((1\ 2\ 3)) = \chi[A]((1\ 2\ 4)) = \chi[A]((2\ 4\ 3)) = \ldots$, etc.

Example II.4

1. For the natural permutation representation $A$ of $C_n$ described in Example II.1.2, $\chi[A](g) = n$ if $g = e$, and $\chi[A](g) = 0$ otherwise. More generally, for any permutation representation $A$ of a group $G$, we see that $\chi[A](g)$ equals the number of basis elements of $\Omega(A)$ which are fixed by $g$.
2. Consider the permutation representation $B$ of $S_4$, induced from the permutation of the edges of a 4-vertex graph (Example II.1.4). The character of this representation is more complex; we have

$$\chi[B](e) = 6,$$

$$\chi[B]((1\ 2\ 3\ 4)) = \chi[B]((1\ 2\ 4\ 3)) = \chi[B]((1\ 3\ 2\ 4)) = \chi[B]((1\ 3\ 4\ 2)) = \chi[B]((1\ 4\ 2\ 3)) = \chi[B]((1\ 4\ 3\ 2)) = 0,$$

$$\chi[B]((1\ 2\ 3)) = \chi[B]((1\ 3\ 2)) = \chi[B]((1\ 2\ 4)) = \chi[B]((1\ 4\ 2)) = \chi[B]((1\ 3\ 4)) = \chi[B]((1\ 4\ 3)) = \chi[B]((2\ 3\ 4)) = \chi[B]((2\ 4\ 3)) = 0,$$

$$\chi[B]((1\ 2)(3\ 4)) = \chi[B]((1\ 3)(2\ 4)) = \chi[B]((1\ 4)(2\ 3)) = 2,$$

$$\chi[B]((1\ 2)) = \chi[B]((1\ 3)) = \chi[B]((1\ 4)) = \chi[B]((2\ 3)) = \chi[B]((2\ 4)) = \chi[B]((3\ 4)) = 2.$$

3. The irreducible representation $A$ of $S_3$ given in Example II.2.5 has the following character:

$$\chi[A](e) = 2,$$

$$\chi[A]((1\ 2\ 3)) = \chi[A]((1\ 3\ 2)) = -1,$$

$$\chi[A]((1\ 2)) = \chi[A]((1\ 3)) = \chi[A]((2\ 3)) = 0.$$
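Since the character of a permutation representation counts fixed basis elements, the table for the edge representation of $S_4$ can be reproduced by brute force over all 24 permutations. The Python sketch below is a minimal illustration with hypothetical function names; permutations are stored as tuples `g` with `g[i - 1]` the image of `i`, and values are grouped by cycle type, which labels the conjugacy classes of $S_4$.

```python
from itertools import combinations, permutations


def edge_character(g):
    """Number of unordered pairs {i, j} that g maps to themselves."""
    n = len(g)
    return sum(1 for i, j in combinations(range(1, n + 1), 2)
               if {g[i - 1], g[j - 1]} == {i, j})


def cycle_type(g):
    """Cycle lengths of g in decreasing order, e.g. (2, 1, 1) for a transposition."""
    seen, lengths = set(), []
    for start in range(1, len(g) + 1):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = g[i - 1]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


# Collect the character values that occur within each conjugacy class.
character_table = {}
for g in permutations(range(1, 5)):
    character_table.setdefault(cycle_type(g), set()).add(edge_character(g))
```

Each class collects a single value, which is the statement that the character is constant on conjugacy classes.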
For any finite group $G$, the functions from $G$ to a subfield $F$ of $\mathbb{C}$ form a Hilbert space with inner product

$$\forall f_1, f_2 : G \to F \quad \langle f_1, f_2 \rangle_G := \frac{1}{|G|} \sum_{g \in G} f_1(g) f_2^{\dagger}(g), \tag{12}$$

where $\dagger$ denotes complex conjugate. The following standard result uses this Hilbert space to characterize intertwining numbers.

Lemma II.3 Let $A$ and $B$ be two finite-dimensional representations of a finite group $G$ over a subfield of $\mathbb{C}$; then the intertwining number of $A$ and $B$ is equal to the inner product of $\chi[A]$ and $\chi[B]$.

The intertwining number of two representations can generally be readily calculated as the inner product of characters. We will find this invaluable in the design of invariant pattern classification systems.
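For permutation representations the inner product of Eq. (12) reduces to counting fixed points, so intertwining numbers can be computed without writing down a single matrix. The Python sketch below is an assumed illustration with hypothetical names. It treats the natural and edge permutation representations of $S_4$, and it obtains the character of the 2-D irreducible representation of $S_3$ as the natural permutation character minus 1, which relies on the natural representation of $S_3$ being equivalent to the direct sum of the trivial and the 2-D irreducible representation. All characters involved are real, so the complex conjugate in Eq. (12) has no effect.

```python
from fractions import Fraction
from itertools import combinations, permutations


def vertex_character(g):
    """Natural permutation character: number of points fixed by g."""
    return sum(1 for i, gi in enumerate(g, start=1) if gi == i)


def edge_character(g):
    """Edge permutation character: number of unordered pairs fixed by g."""
    n = len(g)
    return sum(1 for i, j in combinations(range(1, n + 1), 2)
               if {g[i - 1], g[j - 1]} == {i, j})


def inner_product(chi_1, chi_2, group):
    """Eq. (12) for real-valued characters, computed exactly."""
    return Fraction(sum(chi_1(g) * chi_2(g) for g in group), len(group))


S4 = list(permutations(range(1, 5)))
S3 = list(permutations(range(1, 4)))


def trivial_character(g):
    return 1


def two_dim_character(g):
    """Character of the 2-D irreducible representation of S_3 (see lead-in)."""
    return vertex_character(g) - 1


# Intertwining number of the edge and natural representations of S_4.
n_edge_vertex = inner_product(edge_character, vertex_character, S4)

# Intertwining numbers involving the 2-D irreducible representation of S_3.
n_irreducible_trivial = inner_product(two_dim_character, trivial_character, S3)
n_irreducible_self = inner_product(two_dim_character, two_dim_character, S3)
```

The values 0 and 1 obtained for the irreducible representation are the two cases distinguished by Schur's Lemma.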
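Example II.5 1. Consider the two permutation representations A and B of $S_4$ discussed in Example II.3.3. The intertwining number of A and B is found as follows (using the fact that characters are constant on conjugacy classes, taken here in the order e, transpositions, 3-cycles, 4-cycles, double transpositions):

$$(\chi[A], \chi[B])_G = \frac{1}{24}(1 \cdot 6 \cdot 4 + 6 \cdot 2 \cdot 2 + 8 \cdot 0 \cdot 1 + 6 \cdot 0 \cdot 0 + 3 \cdot 2 \cdot 0) = 2,$$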
which means that the space of homomorphisms from A to B (or vice versa) has dimension 2. In fact, as we shall see later, all homomorphisms from A to B have the form of Eq. (10) in Example II.3.3.

2. Let A denote the irreducible representation of $S_3$ considered in Example II.2.5 and Example II.4.3, and let B denote the trivial representation, which is also irreducible. Now the inner product of the characters of these representations is

$$(\chi[A], \chi[B])_G = \frac{1}{6} \sum_{g \in G} \chi[A](g)\, \chi[B](g) = \frac{1}{6}\left(1 \cdot 2 \cdot 1 + 2 \cdot (-1) \cdot 1 + 3 \cdot 0 \cdot 1\right) = 0,$$
meaning that the only homomorphism from A to B is 0. This confirms Schur's Lemma. In general, the inner product of distinct characters of irreducible representations must be 0. The following standard results are central to representation theory:
Lemma II.4 Let G be a finite group.

1. G has a finite number of irreducible representations (to within equivalence), equal to the number of conjugacy classes of the elements of G.
2. Any character is constant on conjugacy classes.
3. For two irreducible representations A and B of G,

$$(\chi[A], \chi[B])_G = \begin{cases} 1 & \text{if } A \text{ and } B \text{ are equivalent} \\ 0 & \text{otherwise.} \end{cases} \qquad (13)$$

Furthermore, $(\chi[A], \chi[A])_G = 1$ if and only if A is irreducible.
4. Any finite-dimensional representation of G is equivalent to a direct sum of irreducible representations:

$$A \cong \bigoplus_{i=1}^{K} n_i R_i, \qquad (14)$$

where $n_1, \dots, n_K \in \mathbb{N}$ and $R_1, \dots, R_K$ are the inequivalent irreducible representations of G. Such a decomposition is unique (to within permutation of the indices i).
5. The multiplicities $n_i$ in the decomposition Eq. (14) are the respective intertwining numbers of A and $R_i$, that is,

$$n_i = n_i(A) := (\chi[A], \chi[R_i])_G. \qquad (15)$$

6. The intertwining number of two finite-dimensional representations A and B is given by

$$\dim \mathrm{Hom}_G(A, B) = \sum_{i=1}^{K} n_i(A)\, n_i(B). \qquad (16)$$
Proof The last result is less standard, so we derive it from the others. By Lemma II.3, the intertwining number is given by the inner product of the characters of A and B. From the decomposition of Eq. (14), the character of A can be written

$$\chi[A] = \sum_{i=1}^{K} n_i(A)\, \chi[R_i],$$

and similarly for $\chi[B]$. Now we use the linearity of the inner product, and apply law 3 to derive the required result. □

The results of Lemma II.4 are very helpful in constructing the irreducible representations of a given finite group, and their characters. The characters of a group are often displayed in a character table, which lists the characters of all the irreducible representations in terms of the value each takes on each
conjugacy class. For example, for $S_4$ we have:

S_4    e    (1 2)   (1 2 3)   (1 2 3 4)   (1 2)(3 4)
R_1    1      1        1          1            1
R_2    1     -1        1         -1            1
R_3    3      1        0         -1           -1
R_4    3     -1        0          1           -1
R_5    2      0       -1          0            2
Note that the first column of the table gives the dimension of the corresponding representation.
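Laws 3, 5 and 6 of Lemma II.4 can be verified directly from the $S_4$ table. The sketch below assumes the class sizes 1, 6, 8, 6, 3 for the five columns and uses the fixed-point characters of the 4-D (vertex) and 6-D (edge) permutation representations of $S_4$; the variable names are illustrative.

```python
from fractions import Fraction

# Conjugacy classes of S_4 in the column order of the table:
# e, (1 2), (1 2 3), (1 2 3 4), (1 2)(3 4), with their sizes.
CLASS_SIZES = [1, 6, 8, 6, 3]
ORDER = sum(CLASS_SIZES)  # |S_4| = 24

# Rows R_1..R_5 of the character table.
IRREDUCIBLES = {
    "R1": [1, 1, 1, 1, 1],
    "R2": [1, -1, 1, -1, 1],
    "R3": [3, 1, 0, -1, -1],
    "R4": [3, -1, 0, 1, -1],
    "R5": [2, 0, -1, 0, 2],
}


def inner(chi1, chi2):
    """Inner product of Eq. (12) for real class functions, summed class by class."""
    total = sum(n * a * b for n, a, b in zip(CLASS_SIZES, chi1, chi2))
    return Fraction(total, ORDER)


def multiplicities(chi):
    """Multiplicities n_i of Eq. (15): inner products with each irreducible character."""
    return {name: inner(chi, irr) for name, irr in IRREDUCIBLES.items()}


# Permutation characters: number of fixed basis elements on each class.
CHI_VERTICES = [4, 2, 1, 0, 0]
CHI_EDGES = [6, 2, 0, 0, 2]

n_vertices = multiplicities(CHI_VERTICES)
n_edges = multiplicities(CHI_EDGES)

# Eq. (16): intertwining number as a sum of products of multiplicities.
hom_dim = sum(n_vertices[name] * n_edges[name] for name in IRREDUCIBLES)
print(n_vertices, n_edges, hom_dim)
```

Under these assumptions the vertex representation decomposes as $R_1 \oplus R_3$ and the edge representation as $R_1 \oplus R_3 \oplus R_5$, so Eq. (16) and the direct inner product of characters give the same intertwining number.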
E. Frobenius Reciprocity

Frobenius reciprocity is a useful tool for relating homomorphisms between representations of a given group to those between representations of its subgroups. Let G be a group, and let H denote an arbitrary subgroup. Let A be a representation of G, and denote by $\mathrm{res}^G_H A$ the restriction of A (as a function) to the subgroup H; this is clearly a representation of H and is called the restriction of A to H. Now let G be a group with subgroup H of finite index m. The right cosets $H_i$ of H are the sets $H_i = Hg \subseteq G$, $g \in G$. When G is finite, the cosets are of equal size $|H|$ and they partition the group; thus there are m cosets. The cosets are defined by a set of right coset representatives, that is, a set $h_i \in H_i$ for $i \in 1, \dots, m$. Each right coset representative defines its right coset by $H_i = Hh_i$. Now let A denote a representation of H; then the induced representation of G from A, denoted by $\mathrm{ind}^G_H A$, is defined by the formula:

$$\forall g \in G, \qquad (\mathrm{ind}^G_H A)(g) := \begin{pmatrix} T_{11} & \cdots & T_{1m} \\ \vdots & & \vdots \\ T_{m1} & \cdots & T_{mm} \end{pmatrix}, \qquad (17)$$

where the submatrices $T_{ij}$ are given by:

$$T_{ij} := \begin{cases} A(h_i g h_j^{-1}) & \text{if } h_i g h_j^{-1} \in H \\ 0 & \text{otherwise.} \end{cases}$$

We also say that $\mathrm{ind}^G_H A$ is induced from the representation A. The representation induced depends upon the choice of right coset representatives, but all such representations are equivalent.
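The construction of Eq. (17) can be sketched for a small case. The example below is an illustration under stated assumptions, not a construction taken from the text: G is $S_3$ acting on the points 0, 1, 2, H is the two-element subgroup generated by the transposition of points 0 and 1, and the representation of H being induced is its 1-D alternating representation. Products are taken as "apply the left factor first", which is one consistent convention for right cosets.

```python
from itertools import permutations


def mul(p, q):
    """Product pq acting on the right: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)


G = list(permutations(range(3)))      # S_3 on the points 0, 1, 2
E = (0, 1, 2)
H = [E, (1, 0, 2)]                    # subgroup generated by one transposition
ALT = {E: 1, (1, 0, 2): -1}           # alternating representation of H (1-D)


def right_coset_reps(group, subgroup):
    """One representative h_i per right coset H h_i; the first is e."""
    reps, covered = [], set()
    for g in group:
        if g not in covered:
            reps.append(g)
            covered.update(mul(h, g) for h in subgroup)
    return reps


REPS = right_coset_reps(G, H)


def induced(g):
    """Eq. (17) for a 1-D representation: entry (i, j) is ALT(h_i g h_j^-1), or 0."""
    m = len(REPS)
    rows = []
    for i in range(m):
        row = []
        for j in range(m):
            x = mul(mul(REPS[i], g), inv(REPS[j]))
            row.append(ALT.get(x, 0))   # 0 when h_i g h_j^-1 lies outside H
        rows.append(row)
    return rows


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# The induced matrices should multiply like the group elements.
is_homomorphism = all(
    induced(mul(g1, g2)) == matmul(induced(g1), induced(g2))
    for g1 in G for g2 in G
)
print(len(REPS), is_homomorphism)
```

In this sketch every induced matrix has exactly one entry equal to 1 or -1 in each row, because the representation being induced takes only those values.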
The following theorem (see, e.g., Mackey, 1976), describing a relationship between restricted and induced representations, is crucial here.

Theorem II.1 (Frobenius Reciprocity Theorem) Let G be a finite group with subgroup H, let A be a finite-dimensional representation of G and B be a finite-dimensional representation of H. Then:

1. The intertwining space $\mathrm{Hom}_G(A, \mathrm{ind}^G_H B)$ is isomorphic to the space $\mathrm{Hom}_H(\mathrm{res}^G_H A, B)$.
2. $(\chi[A], \chi[\mathrm{ind}^G_H B])_G = (\chi[\mathrm{res}^G_H A], \chi[B])_H$.

Examples of an induced representation and Frobenius reciprocity are given in the next section, following a discussion of permutation representations.

F. Special Classes of Representations
Any subgroup H of a given G defines a permutation action of G on the (right) cosets of H in G. This is given for any right coset $H_i$ by $g(H_i) := H_i g$. A finite-dimensional representation A of a group G is called a permutation representation if every matrix A(g) has exactly one '1' entry in each row and in each column, and all other entries are '0'. The term "permutation representation" comes from the observation that the representation acts by permuting the vectors of the natural basis. Conversely, consider a finite set X on which G acts by permutation. This action induces a linear action of G on the space of all functions from X to any field F, which (with an appropriate choice of basis) is a permutation representation.

This situation is particularly interesting when the action of G on X is transitive, that is, for any pair of elements $x_1, x_2$ of X there exists a group transformation mapping $x_1$ to $x_2$. In this case, we can find a corresponding action of G on the cosets of some subgroup. To do this, pick an element x of X, and construct the subgroup H of G which fixes x ($hx = x$ for all $h \in H$). The given permutation action is then identical to the permutation action of G on the right cosets of H. The permutation representation P can be constructed by ordering the right cosets of H, say $H_1 = H, H_2, \dots, H_n$. Now we construct the $n \times n$ permutation matrices P(g) element-wise according to

$$P(g)_{i,j} = \begin{cases} 1 & \text{if } H_i g = H_j \\ 0 & \text{otherwise.} \end{cases} \qquad (19)$$
It is easy to see that this is a permutation representation and is furthermore induced from the trivial representation of the subgroup H. Such a permutation representation, which acts transitively on the natural basis vectors, is said to be transitive. Given a transitive permutation representation P, we can recover a corresponding subgroup and cosets by:

$$H_j = \{ g \in G \mid P(g)_{1,j} = 1 \}; \qquad H = H_1. \qquad (20)$$
An important example of a permutation representation is the so-called (right) regular representation of G, which is induced from the trivial representation of the trivial subgroup $H = \{e\}$, and thus describes the (right) action of G on its own elements. The dimension of the regular representation is equal to the order of the group, and the regular representation is equivalent to a direct sum of irreducible representations, in which each irreducible representation occurs a number of times equal to its dimension.

Example II.5 The natural permutation representation of the cyclic groups $C_n$, discussed in Example II.1.2, is induced from the trivial representation of the trivial subgroup $\{e\}$. This is the right regular representation of $C_n$, and it is equivalent to the direct sum of all n (1-D) irreducible representations, as discussed in Example II.3.1; an isomorphism is the DFT.

Here is an example of Frobenius reciprocity. Take $G = S_4$, the symmetric group of degree 4, and H to be the subgroup of all elements which fix the object 1. Set $H_1 = H$, $H_2 = H(1\ 2)$, $H_3 = H(1\ 3)$, $H_4 = H(1\ 4)$. Take C to be the trivial representation of H; then $\mathrm{ind}^G_H C$ is a 4-D representation given by

$$\mathrm{ind}^G_H C(g)_{i,j} = \begin{cases} 1 & \text{if } H_i g = H_j \\ 0 & \text{otherwise,} \end{cases}$$
which is precisely the natural permutation representation B given in Example II.3.3. Now take A to be the six-dimensional (6-D) permutation representation of G, again in Example II.3.3, and consider the restriction of A to H, generated by

$$A((2\ 3)) = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

together with the $6 \times 6$ permutation matrix $A((2\ 3\ 4))$.
Now a homomorphism from $\mathrm{res}^G_H A$ to the trivial representation C of H is given by

$$W' = (p_1 \quad p_1 \quad p_1 \quad p_2 \quad p_2 \quad p_2), \qquad (21)$$
where $p_1$ and $p_2$ are any real values. By consideration of the inner product of characters (Lemma II.3), it is not hard to see that any intertwining operator between these representations must have the form of Eq. (21). Now Frobenius reciprocity states that there is a one-to-one linear mapping between homomorphisms $W'$ as in Eq. (21), and homomorphisms W from A to $B = \mathrm{ind}^G_H C$, as constructed in Eq. (10). Such an isomorphism is given in this case by
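$$(p_1 \quad p_1 \quad p_1 \quad p_2 \quad p_2 \quad p_2) \mapsto \begin{pmatrix} p_1 & p_1 & p_1 & p_2 & p_2 & p_2 \\ p_1 & p_2 & p_2 & p_1 & p_1 & p_2 \\ p_2 & p_1 & p_2 & p_1 & p_2 & p_1 \\ p_2 & p_2 & p_1 & p_2 & p_1 & p_1 \end{pmatrix}.$$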
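The Frobenius reciprocity example can be checked numerically. The sketch below makes two assumptions that are not confirmed by the text: the edges are ordered {12}, {13}, {14}, {23}, {24}, {34}, and A(g), B(g) are taken to be the permutation matrices of the left actions of g on edges and vertices, so that $WA(g) = B(g)W$ reduces to the entrywise condition W[g(v)][g(e)] = W[v][e].

```python
from fractions import Fraction
from itertools import combinations, permutations

POINTS = range(4)                        # objects 1..4 of the text, 0-indexed
G = list(permutations(POINTS))           # S_4; p[i] is the image of point i
H = [p for p in G if p[0] == 0]          # elements fixing object 1
EDGES = [frozenset(e) for e in combinations(POINTS, 2)]


def fixed_edges(p):
    """Character of the 6-D edge representation at p."""
    return sum(1 for e in EDGES if frozenset(p[i] for i in e) == e)


def fixed_points(p):
    """Character of the 4-D natural representation at p."""
    return sum(1 for i in POINTS if p[i] == i)


# Theorem II.1(2): inner product over G against the induced (natural) character
# equals the inner product over H of the restricted character with the trivial one.
lhs = Fraction(sum(fixed_edges(p) * fixed_points(p) for p in G), len(G))
rhs = Fraction(sum(fixed_edges(p) for p in H), len(H))


def build_w(p1, p2):
    """4 x 6 matrix with p1 where the vertex lies on the edge and p2 elsewhere."""
    return [[p1 if v in e else p2 for e in EDGES] for v in POINTS]


def intertwines(w):
    """Check W[g(v)][g(e)] == W[v][e] for every g, vertex v and edge e."""
    index = {e: k for k, e in enumerate(EDGES)}
    for p in G:
        for v in POINTS:
            for k, e in enumerate(EDGES):
                image = index[frozenset(p[i] for i in e)]
                if w[p[v]][image] != w[v][k]:
                    return False
    return True


W = build_w(2, 5)
print(lhs, rhs, intertwines(W))
```

The first row of `build_w` reproduces the pattern of Eq. (21), and both inner products agree with the dimension 2 of the space parameterized by $p_1$ and $p_2$.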
The concept of a permutation representation is standard; however we will have recourse to consider certain other classes of group representation, which to our knowledge have not been considered previously in the literature.
Definition II.1 Let A be a finite-dimensional representation of a group G over the real field. A is called a perm-diagonal representation if every row and column of each A(g) contains exactly one nonzero entry. A is called an inversion representation if every row and column of each A(g) contains exactly one nonzero entry, which is ±1. A is called a positive representation if every entry of each A(g) is nonnegative. A is called a unit-row representation if the entries in each row of each A(g) sum to 1.

For any perm-diagonal representation A, we can write $A(g) = D(g)P(g)$, where D(g) is nonsingular and diagonal, and P(g) is a permutation matrix (hence the name, "perm-diagonal"). We denote this by A = DP, and we remark that P is also a representation of G, called the underlying permutation representation of A (Wood, 1995). Furthermore, any positive representation is also perm-diagonal; see Lemma A.1 in the Appendix that begins on p. 395. A perm-diagonal representation is said to be transitive if its underlying permutation representation is transitive.

We will find that inversion representations are particularly interesting. The concept of an inversion representation is motivated by the following idea: consider a group that acts on a set of coins in two ways, by permuting them and by inverting (flipping) them. This induces an inversion representation of G on the functions from this set of coins to a field F, in the same way that a permutation representation is induced from a permutation action of a group. An important subclass of inversion representations consists of those induced from an alternating representation of some subgroup.

Example II.6 1. A perm-diagonal representation of the group $S_3$:
lo
0
2\
\o
0
lo 2 3))
=
0
-1
__
-3,
0
-I/ 0
-2\
2. An inversion representation of the group $C_3$:
A((l
3
2)) =
[” “I 0 0 -1
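The factorization A(g) = D(g)P(g) of a perm-diagonal matrix is mechanical. The following minimal sketch assumes matrices are stored as nested lists; the function names and the sample matrix are hypothetical and are not the matrices of Example II.6.

```python
def factor_perm_diagonal(matrix):
    """Split a perm-diagonal matrix into (D, P) with matrix == D P.

    D is diagonal and nonsingular, P is a permutation matrix.
    Raises ValueError if the matrix is not perm-diagonal.
    """
    n = len(matrix)
    d = [[0] * n for _ in range(n)]
    p = [[0] * n for _ in range(n)]
    for i, row in enumerate(matrix):
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if len(nonzero) != 1:
            raise ValueError("row %d does not have exactly one nonzero entry" % i)
        j = nonzero[0]
        d[i][i] = row[j]   # the single nonzero entry of row i goes on the diagonal
        p[i][j] = 1        # and its position is recorded in the permutation matrix
    if any(sum(p[i][j] for i in range(n)) != 1 for j in range(n)):
        raise ValueError("some column does not have exactly one nonzero entry")
    return d, p


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_inversion(matrix):
    """Perm-diagonal with every nonzero entry equal to +1 or -1."""
    d, _ = factor_perm_diagonal(matrix)
    return all(d[i][i] in (1, -1) for i in range(len(d)))


SAMPLE = [[0, 0, 2], [-3, 0, 0], [0, 5, 0]]   # hypothetical perm-diagonal matrix
D, P = factor_perm_diagonal(SAMPLE)
print(D, P)
```

Applying the same function to every A(g) yields the underlying permutation representation P described in the definition.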
III. LINEAR AND NONLINEAR CONCOMITANTS
The most important property of the GRN model we are going to introduce is that it is modular, in the sense that any GRN can be decomposed into some fundamental building blocks. This is more interesting viewed in terms of synthesis: these building blocks can be combined in a very flexible manner to produce GRNs. Our principal requirement of an invariant pattern classifier is that it is invariant under the action of some representation A of a group G. The function computed by a given classifier is therefore a concomitant from A to the trivial representation of G. In fact, the theory that follows can be equally easily applied to the construction of systems with general concomitance properties (Wood, 1995).

The obvious way to decompose a concomitant of group representations is into other (in some sense, simpler) concomitants. This can be done in two ways. Given three representations A, B, C of a group G, a concomitant $\phi_1$ from A to B and a concomitant $\phi_2$ from B to C, clearly $\phi_2 \circ \phi_1$ is a concomitant from A to C. Alternatively, if $\phi_1$ is a concomitant from A to C and $\phi_2$ is a concomitant from B to C, then we construct a concomitant $\phi_3$ from $A \oplus B$ to C as follows:

$$\forall v_1 \in \Omega(A),\ v_2 \in \Omega(B), \qquad \phi_3(v_1, v_2) = \phi_1(v_1) + \phi_2(v_2).$$
These compositions correspond to the graphical structures shown in Fig. 1 . By connecting such concomitants in a conceptually hierarchical structure, we can construct highly complex structures in the form of a directed acyclic graph where each node corresponds to a group representation and each edge to a concomitant between the corresponding pair of representations. This solves the problem of how to combine “basic concomitants” together in a complex fashion, but we still have the question as to what these basic concomitants should be. An obvious starting point is the class of linear concomitants or intertwining operators. By themselves, these are certainly
FIGURE 1. Connecting concomitants in a directed acyclic graph structure.
not enough, as the addition and functional composition of linear operators yields only more linear operators! However, by combining linear concomitants with a special highly nonlinear class of concomitants (to be introduced in Section III.B) we will be able to construct complex concomitant functions. Our next task is to analyze the structure of linear concomitants.

A. Linear Concomitants
As discussed in Section II, a linear concomitant between two representations A and B of a group G is precisely a homomorphism or intertwining operator from A to B. We consider exclusively the case where G is finite and A and B are finite-dimensional (the extension to the case where G is compact requires a change of summations to Haar integrals (Wood, 1995)). In the case where A and B are irreducible representations, such concomitants are described fully by Schur's Lemma II.2. In the general case, there is a formula parameterizing all such homomorphisms.

Lemma III.1 Let A and B be two representations of a finite group G defined over a field F. Then any homomorphism W from A to B is of the form

$$W = \sum_{s \in G} B(s)\, X_W\, A(s^{-1}) \qquad (22)$$

for some linear transform $X_W$ over F, and conversely.

Proof Let W be a homomorphism from A to B. From the definition of a homomorphism of group representations, we find that $X_W = \frac{1}{|G|} W$ satisfies the equation of the lemma, so W is of the required form. The converse is to prove that W defined by Eq. (22) satisfies $WA(g) = B(g)W$ for all $g \in G$. This is a well-known result (e.g., Ledermann, 1977), and can be easily proved by a change of variable in the summation. □
By making substitutions into Eq. (22), we can now construct arbitrary
homomorphisms between any finite-dimensional representations of a finite group.
Example III.1 Take $G = S_2$, the two-element group $\{e, g\}$, and let A = B = the natural 2-D permutation representation. Let $X_W$ denote an arbitrary 2 × 2 real matrix, with entries a, b, c, d. Then the resulting homomorphism W is given by

$$W = \begin{pmatrix} a + d & b + c \\ b + c & a + d \end{pmatrix}.$$

Observe that interchanging the rows of W is equivalent to interchanging the columns. This means that W is a homomorphism from A to B, as required.
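The parameterization of Eq. (22) can be sketched for this two-element group. The code assumes the entries a, b, c, d fill the 2 × 2 matrix row by row and uses arbitrary illustrative values; `symmetrize` is a hypothetical helper name.

```python
from fractions import Fraction

I2 = [[1, 0], [0, 1]]
P = [[0, 1], [1, 0]]

# G = S_2 = {e, g}; A = B = natural 2-D permutation representation.
REP = {"e": I2, "g": P}
INVERSE = {"e": "e", "g": "g"}


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def symmetrize(x):
    """W = sum over s in G of B(s) X A(s^-1), Eq. (22), with A = B = REP."""
    total = [[0, 0], [0, 0]]
    for s in REP:
        total = add(total, matmul(matmul(REP[s], x), REP[INVERSE[s]]))
    return total


a, b, c, d = 2, 3, 5, 7
W = symmetrize([[a, b], [c, d]])

# Converse direction of the lemma: X = W / |G| reproduces W itself.
X_FROM_W = [[Fraction(entry, len(REP)) for entry in row] for row in W]
print(W, symmetrize(X_FROM_W) == W)
```

For these values W has equal diagonal entries a + d and equal off-diagonal entries b + c, so swapping its rows has the same effect as swapping its columns.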
B. Transmutation

Unfortunately, the composition of linear concomitants (homomorphisms) of group representations can only result in more linear concomitants. In order to construct more sophisticated concomitants, we need also to consider a class of nonlinear concomitants. The class we are going to consider is the class of functions on finite-dimensional vector spaces that act in a component-wise fashion (with respect to some fixed basis).
Definition III.1 Let V be a finite-dimensional vector space over the field F, and let f be a function from F to F. We define a function $\bar{f} : V \mapsto V$ to be the component-wise action of f with respect to the natural basis $e_1, \dots, e_n$; formally

$$\forall a_1, \dots, a_n \in F, \qquad \bar{f}(a_1 e_1 + \dots + a_n e_n) := f(a_1) e_1 + \dots + f(a_n) e_n. \qquad (23)$$

Now let G be a group, and let A and B be two representations of G on the space V. The function f is said to transmute the representation A into the representation B if $\bar{f}$ is a concomitant from A to B, that is,

$$\forall v \in V,\ g \in G, \qquad \bar{f}(A(g)v) = B(g)\bar{f}(v). \qquad (24)$$

In the case A = B, the function f is said to preserve the representation A. The function f is referred to as a transmutation function (or preservation function).

The transmutation condition is this: applying the transformation A(g) and then the function f component-wise gives the same result as applying the function f component-wise and then the transformation B(g) (for all g). It is possible to extend Definition III.1 to the case where V is not finite-dimensional, but is instead a space of functions from some set X to F.
In this case, the function $\bar{f}$ is interpreted as composition of f with a given vector v, that is, $\bar{f}(v) = f \circ v$, and we also require that V should be closed under such a composition. The rest of Definition III.1 is unchanged. We will not have much need for this extension of the transmutation concept.

Example III.2 1. Consider again the following inversion representation A of $C_3$:
0
-1
0
[pl $ 0
A((l
3 2))
=
Let B denote the underlying permutation representation of A :
,
B((l
2
3))
=
Then the function $f(x) = \cos x$, over the field $\mathbb{R}$, transmutes A into B (as does any even function). 2. Consider the additive group $\mathbb{Z}$ with two representations A, B generated by: A(1)
=
(i i),
B(l)
=
('
0 O). s
The function f(x) = sgn(x)/x, over the field $\mathbb{R}$, transmutes A into B. 3. Here is a well-known example of complex transmutation. Take $G = C_4$, and let A be the direct sum of irreducible representations, that is,
1 0 0 0
1 0
0
0 0 0 1
0 0
0
I1
0
1; jl I1
A((l
4 3 2))
=
0
0
o \
0
o\
Let B denote the direct sum of four copies of the trivial representation, that is, $B(g) = I_4$ for all $g \in G$. Then the complex modulus function $f(x) = |x|$ transmutes A into B.

4. Now take the additive group $G = \mathbb{R}$, acting on the space V of all functions from $\mathbb{R}$ to $\mathbb{C}$. Let A be the representation of G defined for any $\tau \in G$, $v \in V$ by

$$\forall s \in \mathbb{R}, \qquad (A(\tau)v)(s) = e^{2\pi i s \tau} v(s).$$

Let B be the representation of G which is the identity operation for all group elements. Then again the complex modulus function $f(s) = |s|$ transmutes A into B:

$$(\bar{f}(A(\tau)v))(s) = (f \circ A(\tau)v)(s) = |(A(\tau)v)(s)| = |e^{2\pi i s \tau} v(s)| = |v(s)| = (B(\tau)(f \circ v))(s) = (B(\tau)(\bar{f}(v)))(s),$$
as required. The transmutation functions are a general class of nonlinear concomitants. We will see that this class of concomitants is sufficiently rich that we can obtain highly complex structures by combining them with representation homomorphisms. However, the transmutation functions are also restrictive enough that we can provide an almost exhaustive characterization of them in the case F = R, which is the most interesting case for practical purposes. To do this in a straightforward way, it is convenient to eliminate from consideration two classes of functions:
Definition III.2 A function $f : \mathbb{R} \mapsto \mathbb{R}$ is called t-exceptional (transmutation-exceptional) if one of the following cases holds.

1. f is discontinuous everywhere on either the positive real or the negative real line.
2. There exist $\tau_1, \tau_2 \in \mathbb{R}$, $\tau_1 \notin \{-1, 0, 1\}$, $\tau_2 \neq 0$, such that $f(\tau_1 x) = \tau_2 f(x)$ for all $x \in \mathbb{R}$, but f is not of the form

$$\forall x \in \mathbb{R}, \qquad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ b_3 & x = 0 \end{cases}$$

for any $a, b_1, b_2, b_3 \in \mathbb{R}$.

If f is not t-exceptional, it is called t-unexceptional.

In the following theorem, which characterizes transmutation, we are going to assume that the transmutation function f is t-unexceptional. We therefore require that f is continuous at at least one positive point and at least one negative point; not a very stringent requirement! We also wish to eliminate functions satisfying condition (2) in the foregoing, and this may seem rather contrived. In fact it is possible to characterize t-exceptional (case 2) functions, but not succinctly. Comparatively simple t-exceptional (case 2) functions include:

$\tau_1 = 10$, $\tau_2 = 1$: $f(x) = \sin(2\pi \log_{10} |x|)$
$\tau_1 = 2$, $\tau_2 = 2$: $f(x)$ = the largest integral power of 2 not greater than $|x|$
$\tau_1 = 2$, $\tau_2 = 3$: $f(x) = 3^{\sin(2\pi \log_2 x) + (\log_2 x)}$
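The scaling law $f(\tau_1 x) = \tau_2 f(x)$ for the three example functions can be checked numerically. This is a small sketch under stated assumptions: the sample points are arbitrary nonzero values, the third function is evaluated for positive x only, and floating-point agreement is tested up to a tolerance rather than exactly.

```python
import math


def f1(x):
    """sin(2*pi*log10|x|): expected to satisfy f(10 x) = 1 * f(x)."""
    return math.sin(2 * math.pi * math.log10(abs(x)))


def f2(x):
    """Largest integral power of 2 not greater than |x|: f(2 x) = 2 f(x)."""
    # math.frexp gives |x| = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    _, exponent = math.frexp(abs(x))
    return math.ldexp(1.0, exponent - 1)


def f3(x):
    """3**(sin(2*pi*log2 x) + log2 x) for x > 0: f(2 x) = 3 f(x)."""
    t = math.log2(x)
    return 3.0 ** (math.sin(2 * math.pi * t) + t)


def scaling_defect(f, tau1, tau2, xs):
    """Largest scaled violation of f(tau1 * x) = tau2 * f(x) over the samples."""
    return max(
        abs(f(tau1 * x) - tau2 * f(x)) / max(1.0, abs(tau2 * f(x)))
        for x in xs
    )


SAMPLES = [0.3, 0.75, 1.0, 1.9, 4.2, 17.0]

defects = {
    "f1": scaling_defect(f1, 10, 1, SAMPLES),
    "f2": scaling_defect(f2, 2, 2, SAMPLES),
    "f3": scaling_defect(f3, 2, 3, SAMPLES),
}
print(defects)
```

None of the three is a pure power law, which is why the definition sets them aside as exceptional.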
Generally, t-exceptional (case 2) functions have a property that might be described as geometric periodicity, as is the case for the t-unexceptional function $f(x) = bx^a$. For a further discussion of these functions, consult Wood (1995). The omission of the t-exceptional (case 2) functions is not significant here, as to include them in our characterization (Theorem III.1) would require only a modification of the transmutation conditions for general positive and perm-diagonal representations, whereas we will see in Section III.D that in fact these cases of transmutation are redundant in group representation networks.

We will require some further notation. Let $p_a$ denote the function which, applied (where applicable) to a matrix of arbitrary dimensions, raises each entry to the power of a. Similarly, let $\tilde{p}_a$ denote the function that raises each entry to the power of a and multiplies by the sign of the original entry. Our first main result is as follows.
Theorem III.1 Let A and B be finite-dimensional representations of a group G over the field $\mathbb{R}$, and let f be a t-unexceptional function from $\mathbb{R}$ to $\mathbb{R}$. Then f transmutes A into B precisely when one of the following holds:

1. A is a permutation representation and B = A.
2. A is an inversion representation, B = A and f is odd.
3. A is an inversion representation, B is the underlying permutation representation of A and f is even.
4. A is a positive representation and some reals $a, b_1, b_2$ exist such that $B(g) = p_a(A(g))$ for all $g \in G$, and f has the form:

$$\forall x \in \mathbb{R}, \qquad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ 0 & x = 0. \end{cases}$$

5. A is a perm-diagonal representation and some reals a, b exist such that one of the following holds:
(a) $B(g) = p_a(A(g))$ for all $g \in G$ and f has the form $f(x) = bx^a$ for all x.
(b) $B(g) = \tilde{p}_a(A(g))$ for all $g \in G$ and f has the form $f(x) = \mathrm{sgn}(x)\, bx^a$ for all x.
6. A is a unit-row representation, B = A and f is affine.
7. B = A and f is linear.
8. B is a unit-row representation and f is constant.
9. f is the zero function.

Proof See the Appendix. □
In the previous paper (Wood and Shawe-Taylor, 1996a), only the case of preservation (A = B) was considered. As we will see, this omits an important and useful case of transmutation. In the preceding list, cases 6-9 are evidently not of any real interest, as the functions f are too simple to be useful in constructing advanced concomitant structures. However, cases 1-5 offer a large class of nonlinear concomitants. Finally, note that while our general discussion proceeds on the assumption that the invariance group is finite, Theorem III.1 in fact holds for an arbitrary group G.
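Cases 1 to 3 of Theorem III.1 can be tested on sample vectors. The sketch below assumes a hypothetical inversion representation of $C_3$ generated by a signed 3-cycle matrix (chosen for illustration, not taken from Example III.2) and uses integer-valued polynomial functions so that all comparisons are exact.

```python
def matvec(m, v):
    """Matrix-vector product of nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def componentwise(f, v):
    """The component-wise action of f, as in Eq. (23)."""
    return [f(x) for x in v]


def transmutes(f, rep_a, rep_b, vectors):
    """Eq. (24) on sample vectors: f(A(g) v) == B(g) f(v) for every listed g."""
    return all(
        componentwise(f, matvec(a, v)) == matvec(b, componentwise(f, v))
        for a, b in zip(rep_a, rep_b)
        for v in vectors
    )


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
GEN_A = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]        # hypothetical inversion generator
GEN_B = [[abs(x) for x in row] for row in GEN_A]   # underlying permutation matrix


def cyclic_rep(gen):
    """[I, gen, gen^2]: images of e, g, g^2 for the cyclic group C_3."""
    return [IDENTITY, gen, matmul(gen, gen)]


A = cyclic_rep(GEN_A)
B = cyclic_rep(GEN_B)
VECTORS = [[1, 2, 3], [-2, 0, 5], [4, -1, -7]]


def even(x):
    return x * x + 1


def odd(x):
    return x ** 3


def generic(x):
    return 2 * x * x + x - 3       # neither even nor odd


print(transmutes(generic, B, B, VECTORS),   # case 1
      transmutes(odd, A, A, VECTORS),       # case 2
      transmutes(even, A, B, VECTORS))      # case 3
```

A passing check on finitely many vectors is only evidence, not a proof; the failing combinations (for example an even function with B = A) show that the conditions of the theorem are not vacuous.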
C. Fixed-Weight Group Representation Networks

By combining representation homomorphisms and transmutation functions together in a structured manner, we find that it is possible to construct highly complex functions with a group concomitance property. This can be done using the two composition operations shown in Fig. 1. Furthermore, by composing these linear and nonlinear concomitants together in such a way, we naturally obtain a feedforward neural network structure. A feedforward neural network is essentially a directed acyclic graph, with input nodes (those nodes with no parents) and output nodes (those nodes with no children). The network acts as an information processor, with each edge having a corresponding multiplicative weight, and a non-input node's output being determined by summing the node's weighted inputs and then applying a nonlinear function, the activation function. For more information on artificial neural networks consult, for example, Fausett (1994) or Hertz et al. (1991). The initial definition of a GRN, which does not include a parameterization to allow adaptation, is as follows.
Definition III.3 A fixed-weight Group Representation Network (GRN) over the group G is a feedforward neural network N having the following properties:

1. The nodes of N are partitioned into layers, the input nodes forming a single layer and each output node being a layer by itself. There are no connections between nodes in the same layer, and no feedback loops in the interlayer connections. Each layer i has an associated pair of representations $(A_i, B_i)$ of G, each of dimension equal to the number of nodes of that layer. The representation $A_i$ is called the input representation of the layer, and $B_i$ is called the output representation. For each output layer l, $B_l = 1$, the trivial representation of G. The representations of the network are all over some fixed field F.
2. The connections of the network have weights that are fixed in F. The weight matrix yielding the inputs to layer j from the outputs of layer i is a homomorphism from $B_i$ to $A_j$.
3. All the nodes in a given non-input layer i have the same activation function, $f_i : F \mapsto F$, which transmutes $A_i$ into $B_i$.
The representation $B_0$ corresponding to the input layer is called the input representation of the network. When $F = \mathbb{R}$, the GRN is said to be real. Note that the input representation of a fixed-weight GRN is the output representation of the input layer, rather than the input representation $A_0$ (which is completely redundant in Definition III.3). The functionality of a GRN is that of its underlying neural network. Normally, we will restrict our attention to real GRNs. Occasionally we will have recourse to consider the complex variety. We will also wish at one point to extend conceptually the notion of the GRN to a network where some of the layers contain an infinite number of nodes, but this is not the usual situation. The group representation network is designed to meet an invariance condition:
Theorem III.2 Let N be a fixed-weight GRN over the group G, and let A be the input representation of N. Then the output $f_N$ of N is invariant under the action of A on the input layer, that is, for any network input vector v we have

$$\forall g \in G, \qquad f_N(A(g)v) = f_N(v),$$

that is, $f_N$ is a concomitant from A to the trivial representation 1 of G.

Proof The essence of the proof is as follows. Consider the propagation of an arbitrary input vector v through the network N, and then consider the result of inputting the transformed vector A(g)v for some $g \in G$. The output of the input layer is transformed by $A(g) = B_0(g)$. Now for each layer i of N with connections leading only from the input layer, the homomorphism property of the weight matrix between the two layers induces the action $A_i(g)$ on the input vector to that layer. In other words, if $v^{(i)}$ was the vector of inputs to that layer with v as the network input, then $A_i(g)v^{(i)}$ will be the corresponding input to that layer with A(g)v as the network input. Next, as the activation function for that layer transmutes $A_i$ into $B_i$, there is a transformation $B_i(g)$ induced on the vector of outputs from layer i. Proceeding through the network in a feedforward fashion, we eventually find that the action of A(g) on the network input induces an action $B_l(g)$ on the output of each network output layer l, where $B_l$ is the output representation of that layer. But each $B_l(g) = 1$, the trivial representation, so the network outputs are unchanged. This proof can easily be formalized by induction through the network structure (Wood, 1995). □

Thus any GRN, constructed by connecting representation homomorphisms and transmutation functions in a feedforward structure, is invariant under the action of the group on the inputs. To construct fixed-weight GRNs under arbitrary finite-dimensional representations of finite (or compact) groups, it is only necessary to have enough information about the representations of the groups to be able to compute homomorphisms between them. Direct information on the invariants of the group is not needed. We can also consider GRNs in which the output representations of the network are nontrivial (Wood, 1995). In such a case, the GRN is computing a concomitant of group representations, rather than an invariant.
This concept will be useful in some later proofs, but it is not part of our definition of a GRN.
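Theorem III.2 can be illustrated on a two-input network that is symmetric under sign reversal of both inputs, the symmetry of the bipolar XOR problem. Everything specific in this sketch is an assumption made for illustration: the weights, the choice of the natural 2-D permutation representation for the hidden layer, the threshold 1.5, and the affine output activation.

```python
def heaviside(x):
    """Heaviside step function: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def matvec(m, v):
    """Matrix-vector product of nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical weights, chosen only so that the homomorphism conditions hold.
W_HIDDEN = [[1, -1], [-1, 1]]    # input layer -> hidden layer
W_OUTPUT = [[1, 1]]              # hidden layer -> output node

MINUS_I = [[-1, 0], [0, -1]]     # input representation at the non-identity element
SWAP = [[0, 1], [1, 0]]          # assumed hidden-layer representation at that element
TRIVIAL = [[1]]                  # output representation

# Weight matrices must intertwine the representations of adjacent layers.
hidden_ok = matmul(W_HIDDEN, MINUS_I) == matmul(SWAP, W_HIDDEN)
output_ok = matmul(W_OUTPUT, SWAP) == matmul(TRIVIAL, W_OUTPUT)


def network(v):
    """Forward pass: linear map, shifted Heaviside, linear map, affine output."""
    hidden_in = matvec(W_HIDDEN, v)
    hidden_out = [heaviside(x - 1.5) for x in hidden_in]   # preserves SWAP
    total = matvec(W_OUTPUT, hidden_out)[0]
    return 2 * total - 1          # maps {0, 1} to {-1, 1}


TABLE = {v: network(list(v)) for v in [(-1, -1), (-1, 1), (1, -1), (1, 1)]}
print(hidden_ok, output_ok, TABLE)
```

Because the hidden activation acts component-wise it commutes with the swap, and the output weights are constant across the hidden nodes, so negating both inputs cannot change the output.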
Example III.3 1. Examples of GRNs will appear throughout the remainder of the paper; however, we give two examples of fixed-weight GRNs here.
First, consider the bipolar XOR problem, defined as follows:

v_1    v_2    v_1 XOR v_2
-1     -1         -1
-1      1          1
 1     -1          1
 1      1         -1
Given its size, this problem is notoriously hard to learn (e.g., to train a neural network on). However, the problem is almost entirely defined by a symmetry: if the two inputs $v_1, v_2$ are both multiplied by -1, the output is unaffected. If we can construct a GRN invariant under this operation, and which furthermore discriminates between the input pair (-1, -1) and the input pair (-1, 1), then we have effectively solved the bipolar XOR problem. The invariance group in question is $S_2 = C_2 = \{e, (1\ 2)\}$, and the input representation of the desired GRN is given by

$$e \mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad (1\ 2) \mapsto \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$

This representation is the direct sum of two copies of the group's alternating representation. We choose a single hidden layer ("layer 1") with two nodes, for which the input and output representations are given by
The single output node has connections only from the hidden layer, and its representations $A_2$, $B_2$ are both chosen to be trivial. The activation function of the hidden layer nodes is $f_1(x) = \theta(x - 1.5)$, where $\theta$ is the Heaviside function

$$\forall x \in \mathbb{R}, \qquad \theta(x) = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0. \end{cases}$$

... $> 0$. For brevity we take $v_k > 0$; the proof for $v_k < 0$ is analogous and the proof for $v_k = 0$ is trivial. Thus we obtain:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = f(A(g)_{i,k} v_k) = b_1 (A(g)_{i,k} v_k)^a = A(g)^a_{i,k} f(v_k) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$
5. Now let A be an arbitrary perm-diagonal representation. We have two cases to consider:

(a) $B(g) = p_a(A(g))$ for all $g \in G$ and some $a \in \mathbb{R}$, and f is defined by $f(x) = bx^a$ for all x, where b is another constant real. Again let k be such that $A(g)^a_{i,k} = B(g)_{i,k} \neq 0$. Hence:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$

(b) $B(g) = \tilde{p}_a(A(g))$ for all $g \in G$ and some $a \in \mathbb{R}$, and f is defined by $f(x) = \mathrm{sgn}(x)\, bx^a$ for all x, where b is another constant real. Defining k as usual, this time we have:
APPENDIX: PROOF OF THEOREM III.1

$$\cdots = \mathrm{sgn}(A(g)_{i,k})\, A(g)^a_{i,k} f(v_k) = B(g)_{i,k} f(v_k) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$
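6. Let A = B be a unit-row representation and f be affine; say $f(x) = mx + c$ for two reals m, c. Now we have:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = m \sum_{j=1}^{n} A(g)_{i,j} v_j + c = \sum_{j=1}^{n} A(g)_{i,j}\, m v_j + \sum_{j=1}^{n} A(g)_{i,j}\, c \quad \text{(using the unit-row property)}$$
$$= \sum_{j=1}^{n} A(g)_{i,j} f(v_j) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$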
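7. Let A = B be arbitrary and f be linear; say $f(x) = mx$ for some real m. Trivially we see that:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = m \sum_{j=1}^{n} A(g)_{i,j} v_j = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$

8. Let A be arbitrary and B be a unit-row representation. Let f be a constant function, say $f(x) = c$. Then:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = c = \sum_{j=1}^{n} B(g)_{i,j}\, c \quad \text{(using the unit-row property)} \quad = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$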
9. Finally, let f be the zero function. Now both sides of Eq. (A.1) are equal to zero.

In each case, therefore, Eq. (A.1) holds for an arbitrary choice of i, g and v. By Lemma A.2, f transmutes A into B. This completes the first part of the proof. For the second part of the proof, we need to show that no other cases of transmutation are possible (that is, for a t-unexceptional function). This requires a series of preliminary results.
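The component-wise condition used in cases 6 to 8, namely that f applied to $\sum_j A(g)_{i,j} v_j$ equals $\sum_j B(g)_{i,j} f(v_j)$, can be spot-checked with exact integer arithmetic. The representations below are hypothetical two-element examples chosen for illustration: a unit-row matrix of order 2, the swap matrix, and the sign representation.

```python
def matvec(m, v):
    """Matrix-vector product of nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


IDENTITY = [[1, 0], [0, 1]]

# Images of the two elements of C_2 under three hypothetical representations.
UNIT_ROW = [IDENTITY, [[2, -1], [3, -2]]]     # rows sum to 1; the square is I
SWAP = [IDENTITY, [[0, 1], [1, 0]]]           # natural permutation representation
NEGATE = [IDENTITY, [[-1, 0], [0, -1]]]       # rows do not sum to 1


def satisfies_condition(f, rep_a, rep_b, vectors):
    """Check f(sum_j A_ij v_j) == sum_j B_ij f(v_j) for all rows, elements, samples."""
    return all(
        [f(x) for x in matvec(a, v)] == matvec(b, [f(x) for x in v])
        for a, b in zip(rep_a, rep_b)
        for v in vectors
    )


def affine(x):
    return 3 * x + 5


def linear(x):
    return 4 * x


def constant(x):
    return 7


def square(x):
    return x * x


VECTORS = [[1, 0], [2, -3], [-4, 5]]

results = {
    "case 6": satisfies_condition(affine, UNIT_ROW, UNIT_ROW, VECTORS),
    "case 7": satisfies_condition(linear, NEGATE, NEGATE, VECTORS),
    "case 8": satisfies_condition(constant, SWAP, UNIT_ROW, VECTORS),
}
print(results)
```

Replacing the unit-row representation by one whose rows do not sum to 1 breaks the affine case, which is where the unit-row property enters the derivation.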
Lemma A.3 Let f be a function that transmutes the finite-dimensional representation A into the representation B. Then either B is a unit-row representation or else f passes through the origin.

Proof Assume f transmutes A into B; hence Eq. (A.1) holds. Substituting v = 0 we obtain:

$$\forall i \in 1 \dots n,\ g \in G, \qquad f(0) = f(0) \sum_{j=1}^{n} B(g)_{i,j}.$$

Hence either $f(0) = 0$ or $\sum_{j=1}^{n} B(g)_{i,j} = 1$ for all i and g. This is the required result. □

The rest of the proof proceeds by deduction of the properties of transmutation functions. The following definition will be useful: If a and b are real numbers satisfying the relation

$$\forall x \in \mathbb{R}, \qquad f(ax) = b f(x) + (1 - b) f(0), \qquad (A.2)$$

then we say that (a, b) is a transmutation pair for f. For any given function f we denote by TP(f) the set of all transmutation pairs for f.

Lemma A.4 Let f be a nonconstant function. Then the following laws hold:

1. $(0, 0), (1, 1) \in TP(f)$
2. $(a, 0) \in TP(f) \Rightarrow a = 0$
3. $(0, b) \in TP(f) \Rightarrow b = 0$
4. $(a, b), (a, c) \in TP(f) \Rightarrow b = c$
5. $(1, b) \in TP(f) \Rightarrow b = 1$
6. $TP(f) \setminus \{(0, 0)\}$ is a multiplicative subgroup of $\mathbb{R} \times \mathbb{R}$
7. $(-1, b) \in TP(f) \Rightarrow b = \pm 1$
Proof

1. This is immediate from the definition of $TP(f)$.
2. Suppose $(a, 0) \in TP(f)$, that is, $f(ax) = f(0)$ for any $x \in \mathbb{R}$. As $f$ is nonconstant, we must have $a = 0$.
3. Suppose $(0, b) \in TP(f)$, that is, $f(0) = b f(x) + (1 - b) f(0)$, or $b f(x) = b f(0)$, for all $x \in \mathbb{R}$. As $f$ is nonconstant, $b = 0$.
4. Suppose both $(a, b)$ and $(a, c)$ are in $TP(f)$. Then we have

$$b\big(f(x) - f(0)\big) = f(ax) - f(0) = c\big(f(x) - f(0)\big).$$

As $f$ is nonconstant, we must have $b = c$.

5. Follows directly from laws 1 and 4.
6. Law 1 gives the identity law for groups, and the associative law follows directly from that for real numbers.
Now let $(a_1, b_1)$ and $(a_2, b_2)$ denote transmutation pairs. For any real $x$ we now have:

$$\begin{aligned}
f(a_1 a_2 x) &= b_1 f(a_2 x) + (1 - b_1) f(0) \\
&= b_1 b_2 f(x) + b_1 (1 - b_2) f(0) + (1 - b_1) f(0) \\
&= b_1 b_2 f(x) + (1 - b_1 b_2) f(0),
\end{aligned}$$

that is, $(a_1 a_2, b_1 b_2)$ is a transmutation pair. This proves the law of closure. Finally, let $(a, b)$ denote a transmutation pair other than $(0, 0)$. By laws 2 and 3, neither $a$ nor $b$ is 0. We now have

$$\begin{aligned}
f\Big(\frac{x}{a}\Big) &= f(y) \quad \text{(defining } y = x/a\text{)} \\
&= \frac{1}{b}\big[f(ay) - (1 - b) f(0)\big] \\
&= \frac{1}{b} f(x) + \Big(1 - \frac{1}{b}\Big) f(0),
\end{aligned}$$

so $(1/a, 1/b)$ is a transmutation pair. This proves the law of inverses.

7. Suppose $(-1, b) \in TP(f)$. Then by law 6, $(1, b^2) \in TP(f)$, and so by law 5 $b^2 = 1$; hence $b = \pm 1$. □
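The laws of Lemma A.4 can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the original proof: it tests Eq. (A.2) on a finite set of sample points only, and the example function `cube` (that is, $f(x) = x^3$), the helper name `is_transmutation_pair`, and the sample values are all hypothetical choices. For this $f$, every pair $(a, a^3)$ satisfies Eq. (A.2).

```python
def is_transmutation_pair(f, a, b, samples, tol=1e-9):
    """Test f(a*x) == b*f(x) + (1 - b)*f(0) on a finite list of sample points."""
    f0 = f(0.0)
    return all(abs(f(a * x) - (b * f(x) + (1.0 - b) * f0)) <= tol for x in samples)

def cube(x):
    """Hypothetical example function f(x) = x^3 (nonconstant, f(0) = 0)."""
    return x ** 3

samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

p1 = (2.0, 8.0)        # (a, a^3) with a = 2
p2 = (-0.5, -0.125)    # (a, a^3) with a = -1/2

product = (p1[0] * p2[0], p1[1] * p2[1])   # closure under multiplication: (-1, -1)
inverse = (1.0 / p1[0], 1.0 / p1[1])       # law of inverses: (1/2, 1/8)
total = (p1[0] + 1.0, p1[1] + 1.0)         # componentwise sum with (1, 1): (3, 9)

law1 = is_transmutation_pair(cube, 0.0, 0.0, samples) and \
       is_transmutation_pair(cube, 1.0, 1.0, samples)
closure_ok = is_transmutation_pair(cube, product[0], product[1], samples)
inverse_ok = is_transmutation_pair(cube, inverse[0], inverse[1], samples)
law4_violation = is_transmutation_pair(cube, 2.0, 4.0, samples)  # second b for a = 2
sum_ok = is_transmutation_pair(cube, total[0], total[1], samples)

print(law1, closure_ok, inverse_ok, law4_violation, sum_ok)
# True True True False False
```

The product $(-1, -1)$ agrees with law 7, since $x^3$ is odd. The final `False` shows that, for this particular $f$, the componentwise sum of two transmutation pairs need not be a transmutation pair.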
Lemma A.5 Let $f$ be a function that transmutes the finite-dimensional representation $A$ into the representation $B$. Let $a$ denote a sum of any number of distinct entries in any row of a representative matrix of $A$, and let $b$ denote the sum of corresponding entries in the corresponding matrix of $B$. Then $(a, b)$ is a transmutation pair for $f$.

Proof Let $e_i$ denote the $i$th column of the identity matrix. Suppose $a$ is the sum of some entries in row $r$ of matrix $A(g)$ of $A$. Define the subset $S$ of $\{1, \ldots, n\}$ by $a = \sum_{j \in S} A(g)_{r,j}$; thus $b = \sum_{j \in S} B(g)_{r,j}$. Finally, take $v = \sum_{j \in S} x e_j$ for arbitrary $x$. Now from Eq. (A.1) with $i = r$ we have:

$$\begin{aligned}
& f\Big(\sum_{j \in S} A(g)_{r,j}\, x\Big) = \sum_{j \in S} B(g)_{r,j} f(x) + \sum_{j \notin S} B(g)_{r,j} f(0) \\
\Rightarrow\ & f(ax) = b f(x) + \Big(\Big(\sum_{j=1}^{n} B(g)_{r,j}\Big) - b\Big) f(0).
\end{aligned}$$

Now by Lemma A.3, either $f(0) = 0$ or $\sum_{j=1}^{n} B(g)_{r,j} = 1$. Hence we can write

$$\forall x \in \mathbb{R} \qquad f(ax) = b f(x) + (1 - b) f(0)$$

as required. □
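Lemma A.5 can be illustrated with a small enumeration. The Python sketch below is a hedged example rather than the author's construction: it assumes the affine function $f(x) = 3x + 2$ and a single hypothetical unit-row matrix `A` with $B = A$ (the setting of case 6 in the first part of the proof), forms every subset sum of entries within each row, and tests Eq. (A.2) for the resulting pairs on a finite set of sample points. The names `row_subset_pairs` and `is_transmutation_pair` are illustrative.

```python
from itertools import combinations

def is_transmutation_pair(f, a, b, samples, tol=1e-9):
    """Test f(a*x) == b*f(x) + (1 - b)*f(0) on a finite list of sample points."""
    f0 = f(0.0)
    return all(abs(f(a * x) - (b * f(x) + (1.0 - b) * f0)) <= tol for x in samples)

def row_subset_pairs(A, B):
    """Yield (a, b): sums over each nonempty column subset S within each row."""
    n = len(A[0])
    for r in range(len(A)):
        for size in range(1, n + 1):
            for S in combinations(range(n), size):
                yield (sum(A[r][j] for j in S), sum(B[r][j] for j in S))

def affine(x):
    """Hypothetical affine example f(x) = 3x + 2."""
    return 3.0 * x + 2.0

A = [[0.5, 0.5], [2.0, -1.0]]   # assumed unit-row matrix; B is taken equal to A
samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

pairs = sorted(set(row_subset_pairs(A, A)))
all_pairs_ok = all(is_transmutation_pair(affine, a, b, samples) for a, b in pairs)

print(pairs)         # [(-1.0, -1.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]
print(all_pairs_ok)  # True
```

Every subset sum yields a pair of the form $(a, a)$ here because $B = A$, and an affine $f$ satisfies $f(ax) = a f(x) + (1 - a) f(0)$ for every real $a$.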
Lemma A.6 Suppose that $f$ is a nonconstant function that transmutes some non-perm-diagonal finite-dimensional representation $A$ into some representation $B$. Then $TP(f)$ is an additive submonoid of $\mathbb{R} \times \mathbb{R}$.

Proof Let $A(g)$ denote a non-perm-diagonal matrix of $A$, which must exist. As $A(g)$ is not perm-diagonal, there exist two nonzero elements $A(g)_{i,s}$ and $A(g)_{i,t}$ in the same row of $A(g)$. For simplicity of notation throughout this proof, we henceforth write $a_{is}$ for $A(g)_{i,s}$, $a_{it}$ for $A(g)_{i,t}$, and similarly for $B$. By Lemma A.5, $(a_{is}, b_{is})$ and $(a_{it}, b_{it})$ are transmutation pairs for $f$. Also, let $(p, q)$ denote an arbitrary nonzero transmutation pair, for example, $(1, 1)$. Now, into Eq. (A.1) we substitute $v = p x e_s + a_{is} x e_t$, for arbitrary $x$ ($x$ should be regarded as an indeterminate). This gives us:

$$\begin{aligned}
& f(a_{is} p x + a_{it} a_{is} x) = b_{is} f(px) + b_{it} f(a_{is} x) + \sum_{j \neq s,t} b_{ij} f(0) \\
& \qquad = b_{is} f(px) + b_{it} f(a_{is} x) + \big(1 - (b_{is} + b_{it})\big) f(0) \quad \text{(applying Lemma A.3)} \\
\Rightarrow\ & f\big(a_{is}(px + a_{it} x)\big) = b_{is} q f(x) + b_{is}(1 - q) f(0) + b_{it} b_{is} f(x) + b_{it}(1 - b_{is}) f(0) + \big(1 - (b_{is} + b_{it})\big) f(0) \\
\Rightarrow\ & b_{is} f\big((p + a_{it}) x\big) = b_{is}(q + b_{it}) f(x) + \big(1 - (b_{is} q + b_{it} b_{is})\big) f(0) - (1 - b_{is}) f(0) \\
\Rightarrow\ & b_{is} f\big((p + a_{it}) x\big) = b_{is}(q + b_{it}) f(x) + \big(b_{is} - (b_{is} q + b_{is} b_{it})\big) f(0).
\end{aligned}$$

As $a_{is} \neq 0$, law 2 of Lemma A.4 tells us that $b_{is} \neq 0$. Dividing by this gives us:

$$f\big((p + a_{it}) x\big) = (q + b_{it}) f(x) + \big(1 - (q + b_{it})\big) f(0).$$
Hence $(p + a_{it}, q + b_{it})$ is a transmutation pair. We have deduced that the transmutation pair set is closed under addition of $(a_{it}, b_{it})$. This is a nontrivial result because $(a_{it}, b_{it})$ is not zero. Now let $(p_1, q_1)$ and $(p_2, q_2)$ be arbitrary transmutation pairs. We mean to show that their sum is also a transmutation pair. We can assume $(p_1, q_1)$ and $(p_2, q_2)$ are nonzero, because otherwise the result is trivial. By law 6 of Lemma A.4, $TP(f) \setminus \{(0, 0)\}$ is a multiplicative group, and so

$$\Big(\frac{p_1 a_{it}}{p_2}, \frac{q_1 b_{it}}{q_2}\Big) \in TP(f).$$

Hence from the result already given,

$$\Big(\frac{p_1 a_{it}}{p_2} + a_{it},\ \frac{q_1 b_{it}}{q_2} + b_{it}\Big) \in TP(f).$$

Again $TP(f) \setminus \{(0, 0)\}$ is a multiplicative group, so

$$\Big(\frac{p_2}{a_{it}}, \frac{q_2}{b_{it}}\Big)$$

is in $TP(f)$, and furthermore

$$\Big(\frac{p_2}{a_{it}}\Big(\frac{p_1 a_{it}}{p_2} + a_{it}\Big),\ \frac{q_2}{b_{it}}\Big(\frac{q_1 b_{it}}{q_2} + b_{it}\Big)\Big) \in TP(f),$$

that is, $(p_1 + p_2, q_1 + q_2) \in TP(f)$. In other words, the sum of any two elements in $TP(f)$ is in $TP(f)$, so the transmutation pairs are closed under addition. The associative law for addition holds trivially, and by law 1 of Lemma A.4, $(0, 0) \in TP(f)$, so $TP(f)$ is an additive monoid. □

Note the rather important condition on the lemma stated here, namely that $A$ is not perm-diagonal.

Lemma A.7 If $D$ is dense in $S \subset \mathbb{R}$, $f : \mathbb{R} \to \mathbb{R}$ is continuous at $\beta$, and

$$\forall d \in D,\ x \in \mathbb{R} \qquad f(dx) = d f(x) + (1 - d) f(0),$$

then we can extend the property at $\beta$ to obtain

$$\forall s \in S \qquad f(s\beta) = s f(\beta) + (1 - s) f(0).$$
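Proof For $s = 0$, the required result holds trivially. We therefore assume $s \neq 0$. Since $D$ is dense in $S$, for any $s \in S$ and given any $\varepsilon > 0$ we can write $s = d(s, \varepsilon) + \delta$, where $|\delta| < \varepsilon$ and $d(s, \varepsilon) \in D$, $d(s, \varepsilon) \neq 0$. Thus $s = \lim_{\varepsilon \to 0} d(s, \varepsilon)$. Writing $d$ for $d(s, \varepsilon)$, we now have:

$$\begin{aligned}
f(s\beta) &= \lim_{\varepsilon \to 0} f\big((d + \delta)\beta\big) \\
&= \lim_{\varepsilon \to 0} f\Big(d\Big(1 + \frac{\delta}{d}\Big)\beta\Big) \\
&= \lim_{\varepsilon \to 0} \Big[d\, f\Big(\Big(1 + \frac{\delta}{d}\Big)\beta\Big) + (1 - d) f(0)\Big] \\
&= \Big(\lim_{\varepsilon \to 0} d\Big)\Big(\lim_{\varepsilon \to 0} f\Big(\Big(1 + \frac{\delta}{d}\Big)\beta\Big)\Big) + \Big(1 - \lim_{\varepsilon \to 0} d\Big) f(0) \quad \text{(since all such limits exist)} \\
&= s f(\beta) + (1 - s) f(0) \quad \text{(by continuity of } f \text{ at } \beta\text{)}.
\end{aligned}$$

□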
Corollary A.1 Let $D$ denote a dense subset of $\mathbb{R}^+$ and $f : \mathbb{R} \to \mathbb{R}$ a function such that

$$\forall d \in D \qquad (d, d) \in TP(f).$$

Assume also that $f$ is continuous at at least two points $\beta_1 > 0$ and $\beta_2 < 0$. Then for any positive real $s$, $(s, s) \in TP(f)$.
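Proof By Lemma A.7, we have that, for any positive real $s$,

$$\begin{aligned}
f(s\beta_1) &= s f(\beta_1) + (1 - s) f(0) \\
f(s\beta_2) &= s f(\beta_2) + (1 - s) f(0).
\end{aligned}$$

Now let $x$ and $s$ be any positive real numbers. Since $x/\beta_1$ and $sx/\beta_1$ are both positive, we have:

$$\begin{aligned}
f(sx) &= f\Big(\frac{sx}{\beta_1}\beta_1\Big) \\
&= \frac{sx}{\beta_1} f(\beta_1) + \Big(1 - \frac{sx}{\beta_1}\Big) f(0) \\
&= s\Big[\frac{x}{\beta_1} f(\beta_1) + \Big(1 - \frac{x}{\beta_1}\Big) f(0)\Big] + (1 - s) f(0) \\
&= s f(x) + (1 - s) f(0).
\end{aligned}$$

For $x < 0$, we have the same argument using $\beta_2$ instead of $\beta_1$. Finally, for $x = 0$, the same result holds trivially. Thus for any positive real $s$, we have

$$\forall x \in \mathbb{R} \qquad f(sx) = s f(x) + (1 - s) f(0).$$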
In other words, $(s, s) \in TP(f)$ for all positive reals $s$.

We are now ready to attack the proof of the main theorem. Firstly, assume $f$ is constant. By Lemma A.3, either $B$ is a unit-row representation or $f$ is zero, as required. Henceforth, $f$ is assumed to be nonconstant. We have two main cases to consider, depending upon whether or not $A$ is a perm-diagonal representation. We will deal with these rather different cases separately.

1. Suppose $A$ is a perm-diagonal representation. By Lemma A.5,

$$\big(A(g)_{i,j}, B(g)_{i,j}\big) \in TP(f)$$

for any $g$, $i$, $j$. By law 3 of Lemma A.4, $B$ must also be a perm-diagonal representation. We now have a number of subcases to consider.
(a) Suppose $A$ is a permutation representation. The matrix entries of $A$ contain only 0s and 1s. By laws 3 and 5 of Lemma A.4, the corresponding entries of $B$ must also be 0 and 1, that is, $B = A$.

(b) Suppose $A$ is an inversion representation. We can further suppose that $A$ is not a permutation representation, having already dealt with this case. By laws 4 and 7 of Lemma A.4, $k = \pm 1$ must (consistently) appear in the matrices for $B$ wherever $-1$ appears in the matrices for $A$, and again 0 and 1 must appear in $B$ wherever they appear in $A$. Hence either $B = A$ or $B$ is the underlying permutation representation of $A$, depending upon whether $k = -1$ or $+1$, respectively. $A$ cannot be a unit-row representation, because unit-row inversion representations are permutation representations. Consequently, by Lemma A.3, $f$ goes through the origin. When $k = -1$ (and $B = A$), $(-1, -1)$ is a transmutation pair, so for all $x$, $f(-x) = -f(x)$, that is, $f$ is odd. When $k = 1$ (and $B$ is the underlying permutation representation of $A$), we have that $(-1, 1)$ is a transmutation pair, that is, $f(-x) = f(x)$, which means that $f$ is even. This completes the proof for the case when $A$ is an inversion representation.

(c) Suppose $A$ is an arbitrary perm-diagonal representation. We can assume that $A$ is not an inversion representation, and hence $A$ is not a unit-row representation, so $f$ goes through the origin. By Lemma A.5 each corresponding pair of nonzero elements $A(g)_{i,j}$, $B(g)_{i,j}$ obeys the law:

$$\forall x \in \mathbb{R} \qquad f\big(A(g)_{i,j}\, x\big) = B(g)_{i,j} f(x).$$
Choose a pair $\tau_1 = A(g)_{i,j}$, $\tau_2 = B(g)_{i,j}$ such that $\tau_1 \notin \{-1, 0, 1\}$. This must be possible, or else $A$ would be an inversion representation. By law 2 of Lemma A.4, $\tau_2 \neq 0$. Here $\tau_1$ and $\tau_2$ satisfy the law:

$$\forall x \in \mathbb{R} \qquad f(\tau_1 x) = \tau_2 f(x).$$

As $f$ is t-unexceptional, it must be of the following form for some $a, b_1, b_2 \in \mathbb{R}$:

$$\forall x \in \mathbb{R} \qquad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ 0 & x = 0, \end{cases} \tag{A.3}$$

because we also know $f(0) = 0$. Now let $i, j \in 1 \ldots n$ and $g \in G$ be such that $A(g)_{i,j} \geq 0$. Then

$$B(g)_{i,j} f(x) = f\big(A(g)_{i,j}\, x\big) = A(g)_{i,j}^{a} f(x).$$
As $f$ is nonconstant, $B(g)_{i,j} = A(g)_{i,j}^{a}$. When $A$ is a positive representation (and hence by Lemma A.1, perm-diagonal), we can apply the preceding for any matrix entry $A(g)_{i,j}$, and, hence, $B(g) = p_a(A(g))$ for any $g \in G$. This concludes the proof for positive representations. When $A$ is not positive, there exists a negative entry $\tau_1$ in some matrix of $A$. Substituting into Eq. (A.3) gives us:

$$f(\tau_1 x) = \begin{cases} b_2 \tau_1^a x^a & x > 0 \\ b_1 \tau_1^a x^a & x < 0 \\ 0 & x = 0. \end{cases}$$
On the other hand, denoting by 7 2 the corresponding entry in the corresponding matrix of B, we have: t2hl.xa x > 0 f(tlx) = 7,f(.x) =
Z2b2X0
10
x