ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 48
CONTRIBUTORS TO THIS VOLUME
J. Arsac C. Baud Ch. Galtier Ronald E. Rosensweig H. Rougeot G. Ruggiu P. R. Thornton Tran Van Khai J. P. Vasseur T. A. Welton
Advances in
Electronics and Electron Physics

EDITED BY L. MARTON
Smithsonian Institution, Washington, D.C.
Associate Editor CLAIRE MARTON

EDITORIAL BOARD: T. E. Allibone, E. R. Piore, H. B. G. Casimir, M. Ponte, W. G. Dow, A. Rose, A. O. C. Nier, L. P. Smith, F. K. Willenbrock
VOLUME 48
1979
ACADEMIC PRESS New York San Francisco London A Subsidiary of Harcourt Brace Jovanovich, Publishers
COPYRIGHT © 1979, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. 24/28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504    ISBN 0-12-014648-7    PRINTED IN THE UNITED STATES OF AMERICA
CONTENTS

CONTRIBUTORS TO VOLUME 48  vii
FOREWORD  ix

Negative Electron Affinity Photoemitters
H. ROUGEOT AND C. BAUD
I. Introduction  1
II. Photocathodes Using Negative Electron Affinity  2
III. Reflection Photocathodes  9
IV. Photoemission by Transmission  17
V. Angular Energy Distribution  21
VI. NEA Photocathode Technology  22
VII. Photoemission Stability and Dark Current  31
VIII. Conclusion  32
References  33

A Computational Critique of an Algorithm for Image Enhancement in Bright Field Electron Microscopy
T. A. WELTON
I. Introduction  37
II. Image Theory for Bright Field Electron Microscopy  39
III. Effect of Partial Coherence  47
IV. Statistical Error  50
V. Object Reconstruction  55
VI. Programs for Numerical Tests of the Reconstruction Algorithm  67
VII. Presentation and Discussion of Data  74
Appendix A. Estimates of Quadratic Effects  87
Appendix B. The Wiener Spectrum of the Object Set  94
Appendix C. Programs for Determining W(R)  97
References  100

Fluid Dynamics and Science of Magnetic Liquids
RONALD E. ROSENSWEIG
I. Structure and Properties of Magnetic Fluids  103
II. Fluid Dynamics of Magnetic Fluids  122
III. Magnetic Fluids in Devices  157
IV. Processes Based on Magnetic Fluids  186
References  195

The Edelweiss System
J. ARSAC, CH. GALTIER, G. RUGGIU, TRAN VAN KHAI, AND J. P. VASSEUR
I. General-Purpose Operating Systems  202
II. New Trends in Computer Architecture  203
III. New Trends in Programming  205
IV. Principles of EXEL  207
V. Large-Scale Systems: The EDELWEISS Architecture  223
VI. The Single-User Family  256
Appendix A  263
Appendix B  264
Appendix C  267
References  269

Electron Physics in Device Microfabrication. I. General Background and Scanning System
P. R. THORNTON
I. General Introduction  272
II. Photon and Electron Beam Lithography for Device Microfabrication  275
III. Interactions between an Electron Beam and a Resist-Coated Substrate  280
IV. Electron Beam Methods for Device Microfabrication  288
V. The Development of a Fast Scanning System-General  300
VI. Design of a Fast Scanning System-Use of Thermal Cathodes  326
VII. The Development of a Fast Scanning System-Use of Field Emitter Cathodes  336
VIII. The Development of a Fast Scanning System-The Role of Computer-Aided Design  341
IX. The Deflection Problem  352
X. High-Current Effects  367
References  376

AUTHOR INDEX  381
SUBJECT INDEX  391
CONTRIBUTORS TO VOLUME 48

Numbers in parentheses indicate the pages on which the authors' contributions begin.
J. ARSAC, Institut de Programmation, Universite Paris VI, Tour 45-55, 11, Quai Saint-Bernard, 75005 Paris, France (201)
C. BAUD, Laboratoire de Recherches, Thomson-CSF Division Tubes Electroniques, 38120 St. Egreve, France (1)
CH. GALTIER, Thomson-CSF, Laboratoire Central de Recherches, Domaine de Corbeville, B.P. 10, 91401 Orsay, France (201)
RONALD E. ROSENSWEIG, Corporate Research Laboratories, Exxon Research and Engineering Company, Linden, New Jersey 07036 (103)
H. ROUGEOT, Laboratoire de Recherches, Thomson-CSF Division Tubes Electroniques, 38120 St. Egreve, France (1)
G. RUGGIU, Thomson-Brandt, 173, Boulevard Haussmann, B.P. 700-08, 75360 Paris Cedex 08, France (201)
P. R. THORNTON, GCA Corporation, Burlington, Massachusetts 01803 (271)
TRAN VAN KHAI, Thomson-Brandt, 173, Boulevard Haussmann, B.P. 700-08, 75360 Paris Cedex 08, France (201)
J. P. VASSEUR, Thomson-Brandt, 173, Boulevard Haussmann, B.P. 700-08, 75360 Paris Cedex 08, France (201)
T. A. WELTON, Physics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, and Department of Physics, University of Tennessee, Knoxville, Tennessee 37916 (37)
FOREWORD

Although many aspects of photoelectricity have been treated as recently as in Volumes 40A and B (1976) of the Advances, these related mainly to devices. A more thorough treatment of the physics of photoemitters has not appeared since the eleventh volume (1959), and this omission is filled partially by the review of H. Rougeot and Ch. Baud, entitled "Negative Electron Affinity Photoemitters." The authors show that studies utilizing advances in solid state physics have improved the understanding and performance of such photoemitters.

The next review is by T. A. Welton: "A Computational Critique of an Algorithm for Image Enhancement in Bright Field Electron Microscopy." A closely related subject is treated in the recent monograph by W. O. Saxton (Supplement 10 to Advances in Electronics and Electron Physics), but here Welton's approach is different since he explores in some detail a small subset of computational procedures, with special attention to their practical difficulties, in an effort to obtain better image reconstruction.

The title of the third review, "Fluid Dynamics and Science of Magnetic Liquids," by R. E. Rosensweig, may give the impression that its subject is far removed from the usual contents of the Advances. However, there are at least two reasons for its inclusion: first is the similarity of the fluid dynamics of magnetic fluids to magnetohydrodynamics, and second is the intriguing possibility of coupling ferrofluidic devices to electronic devices. The expected interest of electronics engineers in this new class of devices is ample justification for this review's appearance.

Since the last review of large-scale computer organization in the eighteenth volume (1963), little has been published on computer architecture except for a review of minicomputers (Vol. 44, 1977). It is timely therefore to examine the architecture of one particular modern computer, and the review titled "The Edelweiss System," by J. Arsac, Ch. Galtier, G. Ruggiu, Tran Van Khai, and J. P. Vasseur, does so. The authors discuss different trends in computer architecture, in programming, and in the use of such systems.

The first part of a two-part review on "Electron Physics in Device Microfabrication," by P. R. Thornton, completes this volume. The reduction in size of present-day integrated circuitry requires ever increasing sophistication in the methods needed for its production. With components approaching almost molecular dimensions, the most advanced techniques in electron optics are needed to meet these extreme requirements. The author discusses such techniques and their limitations.
Following our custom we list again the titles of future reviews, with the names of their authors. This time the listings are given in three categories: first, regular critical reviews; second, as usual, supplementary volumes; and third, a special listing of Volume 50 of this serial publication. This fiftieth volume, marking a kind of anniversary, will be devoted entirely to historical presentations of different subjects in electronics and electron physics.

Critical Reviews:

The Gunn-Hilsum Effect - M. P. Shaw and H. Grubin
A Review of Application of Superconductivity - W. B. Fowler
Sonar - F. N. Spiess
Electron Attachment and Detachment - R. S. Berry
Electron-Beam-Controlled Lasers - Charles Cason
Amorphous Semiconductors - H. Scher and G. Pfister
Electron Beams in Microfabrication. II - P. R. Thornton
Design Automation of Digital Systems. I and II - W. G. Magnuson and Robert J. Smith
Spin Effects in Electron-Atom Collision Processes - H. Kleinpoppen
Electronic Clocks and Watches - A. Gnadinger
Review of Hydromagnetic Shocks and Waves - A. Jaumotte & Hirsch
Beam Waveguides and Guided Propagation - L. Ronchi
Recent Developments in Electron Beam Deflection Systems - E. G. Ritz, Jr.
Seeing with Sound - A. F. Brown
Large Molecules in Space - M. and G. Winnewisser
Recent Advances and Basic Studies of Photoemitters - H. Timan
Application of the Glauber and Eikonal Approximations to Atomic Collisions - F. T. Chan, W. Williamson, G. Foster, and M. Lieber
Josephson Effect Electronics - M. Nisenoff
Signal Processing with CCDs and SAWs - W. W. Brodersen and R. M. White
Flicker Noise - A. van der Ziel
Present Stage of High Voltage Electron Microscopy - B. Jouffrey
Noise Fluctuations in Semiconductor Laser and LED Light Sources - H. Melchior
X-Ray Laser Research - Ch. Cason and M. Scully
Ellipsometric Studies of Surfaces - A. V. Rzhanov
Medical Diagnosis by Nuclear Magnetism - G. J. Béné
Energy Losses in Electron Microscopy - B. Jouffrey
The Impact of Integrated Electronics in Medicine - J. D. Meindl
Design Theory in Quadrupole Mass Spectrometry - P. Dawson
Ionic Photodetachment and Photodissociation - T. M. Miller
Electron Interference Phenomena - M. C. Li
Electron Storage Rings - D. Trines
Radiation Damage in Semiconductors - N. D. Wilsey
Solid-state Imaging Devices - E. H. Snow
Particle Beam Fusion - A. J. Toepfer
Resonant Multiphoton Processes - P. P. Lambropoulos
Magnetic Reconnection Experiments - P. J. Baum
Cyclotron Resonance Devices - R. S. Symons and H. R. Jory
The Biological Effects of Microwaves - H. Frohlich
Advances in Infrared Light Sources - Ch. Timmermann
Heavy Doping Effects in Silicon - R. Van Overstraeten
Spectroscopy of Electrons from High Energy Atomic Collisions - D. Berenyi
Solid Surfaces Analysis - M. H. Higatsberger
Surface Analysis Using Charged Particle Beams - F. P. Viehböck and F. Rüdenauer
Low Energy Atomic Beam Spectroscopy - E. M. Hörl and E. Semerad
Sputtering - G. H. Wehner
Photovoltaic Effect - R. H. Bube
Electron Irradiation Effect in MOS Systems - J. N. Churchill, F. E. Holmstrom, and T. W. Collins
Light Valve Technology - J. Grinberg
High Power Lasers - V. N. Smiley
Visualization of Single Heavy Atoms with the Electron Microscope - J. S. Wall
Spin Polarized Low Energy Electron Scattering - D. T. Pierce and R. J. Celotta
Defect Centers in III-V Semiconductors - J. Schneider and V. Kaufmann
Atomic Frequency Standards - C. Audoin
Interfaces - M. L. Cohen
Reliability - H. Wilde
High Power Millimeter Radiation from Intense Relativistic Electron Beams - T. C. Marshall and S. P. Schlesinger
Solar Physics - L. E. Cram
Auger Electron Spectroscopy - P. Holloway
Fiber Optic Communication Systems - P. W. Baier and M. Pandit
Microwave Imaging of Subsurface Features - A. P. Anderson
Novel MW Techniques for Industrial Measurements - W. Schilz and B. Schiek
Diagnosis and Therapy Using Microwaves - M. Gautherie and A. Priou
Electron Scattering and Nuclear Structure - G. A. Peterson
Electrical Structure of the Middle Atmosphere - L. C. Hale
Microwave Superconducting Electronics - R. Adde
Supplementary Volumes:

Image Transmission Systems - W. K. Pratt
High-Voltage and High-Power Applications of Thyristors - G. Karady
Applied Corpuscular Optics - A. Septier
Microwave Field Effect Transistors - J. Frey

Volume 50:

History of the Revolution in Electronics - P. Grivet
Early History of Accelerators - M. S. Livingston
Power Electronics at General Electric 1900 to 1950 - J. E. Brittain
History of Thermoelectricity - B. S. Finn
Evolution of the Concept of the Elementary Charge - L. L. Marton
The Technological Development of the Short-wave Radio - E. Sivowitch
History of Photoelectricity - W. E. Spicer
From the Flat Earth to the Topology of Space-Time - H. F. Harmuth
History of Noise Research - A. van der Ziel
Ferdinand Braun: Forgotten Forefather - Ch. Süsskind
As in the past, we have enjoyed the friendly cooperation and advice of many friends and colleagues. Our heartfelt thanks go to them, since without their help it would have been almost impossible to issue a volume such as the present one.
L. MARTON
C. MARTON
Negative Electron Affinity Photoemitters

H. ROUGEOT AND C. BAUD

Laboratoire de Recherches
Thomson-CSF Division Tubes Electroniques
St. Egreve, France
I. Introduction  1
II. Photocathodes Using Negative Electron Affinity  2
III. Reflection Photocathodes  9
IV. Photoemission by Transmission  17
V. Angular Energy Distribution  21
VI. NEA Photocathode Technology  22
   A. The Material and Growing Technology
   B. Transmission Photocathodes with Heteroepitaxial Structures
   C. Investigation of Material Characteristics
   D. High-Vacuum Enclosures  29
VII. Photoemission Stability and Dark Current  31
VIII. Conclusion  32
References  33
I. INTRODUCTION

In 1887, Hertz discovered that the surface of a conductor emitted negatively charged particles when irradiated with ultraviolet light. Two years later, a similar effect was observed with visible light and alkali metals by Elster and Geitel. For many years, research was concerned with investigating the fundamental nature of this phenomenon, being aimed at explaining the interaction between electromagnetic waves and corpuscles, and at verifying the photon-to-free-electron conversion concept that was proposed by Einstein in 1905. Then, in 1929, the silver-oxygen-cesium photocathode was discovered. From 1930 onward, photoemission technology progressed rapidly. Quantum yields were improved, and the sensitivity was extended into the red end of the visible spectrum by the introduction of antimony-cesium, bismuth-gold-oxygen-cesium, and multialkaline photocathodes. The discovery of these new materials and the performance improvements were purely empirical. It was only after the rapid expansion of and major progress in solid-state
physics that photocathodes were considered as semiconductors. It was the semiconductive nature of the photoemissive layers that permitted explaining the photon-electron interaction. Emphasis was then placed on finding practical applications for photoemission and on creating models of the potential profiles across the photocathode-vacuum interface that were compatible with experimental results. Numerous investigations were carried out, but no matter what type of material was used for the layer, the energy required to create photoelectrons was always found to be greater than the ionization energy in the layers. The difference between the two values corresponds to the photocathode material's positive affinity for electrons. This affinity must be reduced to obtain improvements in photocathode yield and extension of red sensitivity.

In 1965, Scheer and Van Laar discovered the phenomenon of negative electron affinity (NEA) at the surface of semiconductors. They found that by depositing a layer of cesium on a crystal of p-type gallium arsenide, followed by a layer of cesium and oxygen, photoemission with an excellent quantum yield could be obtained up to the cutoff wavelength (absorption limit) of the material. This limit to the photoemission indicated that the excitation energy did not depend on the surface properties, but on the forces between the electrons and atoms of the semiconductor. From this it was deduced that the vacuum level at the surface of the semiconductor had been lowered below that of a free electron in the conduction band of the crystal. This drop is a measure of the negative electron affinity at the surface of the material. With such an affinity, all free electrons can escape from the crystal without any supplementary excitation. Apker et al. (1948) had already tried to obtain photoemission with vacuum-deposited semiconductor films, but they had not discovered negative electron affinity.

II. PHOTOCATHODES USING NEGATIVE ELECTRON AFFINITY
Consider a photocathode (Fig. 1) consisting of, for example, a semiconducting monocrystal of gallium arsenide (GaAs) with an activating layer of cesium and oxygen. With a photocathode designed for operation by "reflection" (emission from the irradiated face), an incident photon traverses the activating layer with a very low probability of being absorbed and penetrates the semiconductor. There, after a path length that depends on its energy and on the absorption coefficient of the material, the photon is stopped and creates a free electron-hole pair. After a certain number of collisions, one of these charges, a hot electron, can thermalize to near the lower level of one of the conduction bands that are available to it. Two such bands, known as X and Γ, exist in gallium arsenide. Each one keeps the
FIG. 1. GaAs photocathode operating by reflection.
electron for a specific time, during which it may "diffuse" to the surface of the semiconductor. To escape, an electron that is near the surface must penetrate a surface perturbation zone which controls the surface electron-emission yield. Several hypotheses have been proposed to explain the potential profile of the perturbation zone. The basis for these hypotheses is given in Fig. 2, which represents the energy band configurations in gallium arsenide, a thin layer of ionized cesium, and a layer of cesium oxide. The bending of the bands at the crystal surface is due to surface centers that are initially electrically neutral, but which have electrons that are weakly bound as compared to those of the lattice molecules. These weakly bound electrons migrate to acceptor centers (impurities that have been deliberately introduced into the lattice), so creating a negative space charge near the surface, and causing the band bending and reduction in vacuum level shown in Fig. 2. This may be the situation before deposition of the activating layer.

As a first hypothesis, suppose that this layer consists of cesium oxide, considered as a semiconductor, and whose energy bands (Fig. 2) are as proposed by Uebbing and James (1970a,b). Because the electron affinity of cesium oxide is lower than that of gallium arsenide, we can assume that a flux of electrons will pass from the first into the second material. This will stop when the gallium arsenide's valency band, which constitutes an important electron reservoir, reaches the Fermi level. The resulting potential profile is shown in Fig. 3. Sonnenberg (1969a,b) was apparently the first to propose this model, which is also found in work by Bell and Spicer (1970). Notice that in particular, the vacuum level, which depends on the cesium
FIG. 2. Band configuration in (a) GaAs, (b) ionized cesium, (c) cesium oxide.
oxide, has been lowered to a value below the bottom of gallium arsenide's conduction band, thus creating negative electron affinity conditions. In other words, an electron excited from the valency band to the conduction band of the semiconductor will be in an energy level above that of vacuum. However, this model does not completely explain many experimental observations, including those of the authors. It indicates that 5 to 6 nm of Cs2O, considered as a semiconductor, would be required to provide maximum band bending and create negative electron affinity. In reality, a whole or partial monomolecular Cs-O layer is sufficient. Emission already starts on a partial monatomic layer of cesium and, after addition of an equivalent number of oxygen atoms, this effect is increased by a factor of 20. Further improvement can be obtained by adding successive
FIG. 3. Heterojunction model.
layers of cesium and oxygen until a maximum is reached for a total thickness of a few layers. This noticeable improvement in photoemission, obtained by adding oxygen atoms to the cesium-covered surface of semiconductors, was first pointed out by Turnbull and Evans (1968) who alternated layers of cesium and oxygen to obtain the equivalent of 4 to 10 monomolecular layers of cesium. James et al. (1969) obtained optimum results with six Cs-O layers, whereas Garbe and Frank (1969) used three to four. The increased yields marked the departure from the explanation and model given by Scheer and Van Laar.

According to Uebbing and James (1970a,b), NEA could be explained by the existence of a heterojunction between two semiconductors: the substrate (GaAs in this case) and n-type cesium oxide (Cs-O) with a very low electron affinity (0.85 eV). These authors were, in fact, clarifying a hypothesis that was first suggested by Sonnenberg (1969a,b), developed by Bell and Spicer (1970), and further developed by the same authors and others in several articles (Bell et al., 1971; Milton and Baer, 1971). By using quantitative chemical analysis in solution, Sommer et al. (1970) showed that the Cs-O layer giving optimum photoemission was equivalent to four to five monatomic layers of Cs. However, because of the high density of the Cs2O component, they concluded that the surface deposit of this oxidized form was in fact a monomolecular layer. So, they contested the existence of a Cs2O-GaAs heterojunction. James and Uebbing (1970a,b), working on low-bandgap materials GaSb and GaAsSb, found new arguments. In particular, they showed that the NEA could be increased by depositing many successive layers of Cs-O. Brown et al. (1971) found conflicting evidence, showing once again that a simple monomolecular layer was sufficient to obtain optimum photoemission from GaAs, a material that also has a low band gap. They were more explicit about the role played by oxygen in the hypothesis of a surface dipole layer, suggesting that it acted as a shield between the surface cesium ions, so permitting an increase in dipole moment. As an alternative, they envisaged the possibility that oxygen atoms introduced themselves into the Cs-semiconductor interface.

Considering the structures of Fig. 2 (GaAs and Cs) and the low electronic affinity of Cs, we could imagine that the surface layer of Cs becomes ionized and forms a dipole layer with the GaAs by giving up its valency electrons to the semiconductor. This dipole layer would shift the vacuum level to the level of the bottom of the conduction band of the semiconductor, as shown in Fig. 4. This hypothesis was proposed by Scheer and Van Laar (1965). The role of the first layer of oxygen would not, in this case, be to form cesium oxide, Cs2O. Instead, because of the dangling surface bonds of the GaAs in the interstices of the cesium layer, it increases the dipole effect by
FIG. 4. Approximate band profiles.
introducing itself into the interstices. Because the oxygen molecule is small, the image force, which depends on the distance between the substrate and the absorbed oxygen layer, is large. The vacuum level is lowered by a value corresponding to this image force.

Although NEA photocathodes have since been made from many different materials, no simple explanation has been accepted by all the specialists. However, it is generally accepted that the Cs-O surface layer is particularly rich in cesium (Sonnenberg, 1971, 1972a,b; Fisher et al., 1972). The structure of this deposit was examined by Uebbing and James (1970b), Goldstein (1973), and Martinelli (1973a, 1974), who concluded that it is amorphous, whereas in bulk Cs2O, Cs3O, Cs4O3, and Cs7O are crystalline (Simon, 1973) with well-defined structures (Simon, 1971; Simon and Westerbeck, 1972); Cs2O is a semiconductor with a band gap of 2 eV (Borziak et al., 1956; Hoene, 1970; Heiman et al., 1973); Cs3O has a distinctly metallic nature (Tsai et al., 1956).

Clark (1975) suggested that the Cs-O activating layer be considered to be an amorphous phase, Cs2+xO, where, depending on the value of x, the structure varies uniformly between the semiconducting Cs2O structure and the metallic Cs2+xO (x > 0) structure, the material having a distinctly metallic nature for x > 0.1. He deduced that the layer can easily pass from the metallic to the nonmetallic state, so that two cases must be considered, a heterojunction of two semiconductors (Cs2O and a III-V material) and a Schottky dipole (Cs3O and a III-V material).
Using electron spectroscopy with ultraviolet-irradiated oxidized cesium on a silver substrate, Ebbinghaus et al. (1976) concluded that the stable compound is Cs11O3, this being similar to the ideas of Goldsmith so far as the nature of the oxidized compound is concerned. However, this assumes that the composition is unaffected by the substrate, this being compatible with a heterostructure of the form GaAs/Cs11O3.

According to Fisher et al. (1972) and Sommer et al. (1970), optimum emission with a III-V material is obtained with a monomolecular Cs3O layer, the interface barrier potential being given by the difference between the electronic affinity of the III-V material and the Cs3O, reduced by the image force effect. Fisher et al. (1972) calculated the effect of this barrier on photoemission by assuming it to be rectangular and taking the case of GaxIn1-xAs. James et al. (1969, 1971a,b) and Clark (1975) have pointed out that if the theory of a III-V/Cs3O Schottky dipolar barrier is to be examined in greater detail, the surface states of the semiconductors must be taken into account because they can partially mask the difference in electronic affinity of the two materials.

Bell (1973) assembled all the ideas on NEA, current in 1973, in a work entitled "Negative Electron Affinity Devices." Sommer (1973c) has also prepared a similar, excellent work. Whether the superficial structure is a dipole or a heterojunction, the photoemission yield is due to a photoemission probability that was described by Fisher et al. (1974). These authors reconstructed a potential profile (not concerning themselves with its origin) to explain the observed emission probabilities.

Some workers have tried to go beyond the hypothesis stage and to observe the surface structure directly. Ranke and Jacobi (1973) used Auger spectrometry and thermal desorption to study the surface-bonding forces of gallium arsenide for different orientations. Their work is very interesting as it establishes desorption conditions under ultrahigh vacuum. Among other things, it shows that the Fermi level has a fixed value of 0.5 eV on the 111B (As) face of GaAs. Although these values are of use in establishing a potential profile for the activated material, they do not permit one to decide if the cesium-covered surface is of a dipole or heterojunction nature. Using low energy electron diffraction (LEED), Papageorgopoulos and Chen (1973) studied the absorption of Cs and H2 on W(100) and showed the existence of a regular structure. The minimum value of the work function corresponds to a regular Cs structure evenly deposited on a regular W structure. When only the Cs has a regular structure and when its lattice constant does not correspond to that of W, the work function increases. Although Martinelli (1974) found, by using
LEED, that Cs on silicon also has a regular structure, he could find no regular structure with any orientation of GaAs. However, Mityagin et al. (1973) and Derrien et al. (1977) contest this fact, indicating a regular structure for cesium on GaAs(110). Derrien et al. (1975) found an amorphous structure for cesium on GaP.

In an article entitled "Absorption Kinetics of Cs on GaAs," Smith and Huchital (1972) concluded that for Cs on GaAs, two states of absorption exist. The first has a sticking coefficient of one and an ionization of the absorbed cesium atoms, which have a low work function and high bond strength. The second has a low sticking coefficient, coinciding with the establishment of a second layer. The bonds weaken and the work function increases. Paradoxically, the second state starts to appear before the first state is completed. Using flash desorption, Goldstein and Szostak (1975) reached the same conclusion. They found, experimentally, that for covering factors of over 0.5, the electron affinity increases and photoemission diminishes. They admit that once this value has been passed, the polarization of the surface dipoles diminishes because of electrostatic repulsion between the cesium atoms, and the work function goes to a minimum. At the same time, it is found that thermal desorption of cesium can be carried out at lower temperatures, perhaps because the cesium atoms are not so strongly bonded. After addition of oxygen to the Cs layer, flash desorption indicates an increase in the quantity of strongly bonded cesium atoms. Oxygen thus increases cesium bonding strengths.

Another interesting observation made by these authors is that oxygen can be desorbed in the Ga2O state. The fact that there is no desorption in a CsxOy state tends to indicate that there is no bonding in the Cs2O surface stage. Auger analysis of the surface confirms that oxygen does not start to escape until nearly all of the cesium has gone. The desorption of oxygen in the Ga2O stage does not, however, imply that oxygen fixes itself to the surface in the state Ga2O. These conclusions may appear to contradict those of Gregory et al. (1974) who noted that if oxygen is supplied to a GaAs surface, an oxygen-arsenide bond is formed, leaving the gallium free. In this way, they explain why the Fermi level at the surface of GaAs does not change when oxygen is added. The surface states due to the gallium atoms keep this characteristic unchanged during the oxidation. Desorption of oxygen in the form Ga2O could thus be due to the formation of the chemical compound by heating the GaAs above 400°C. On the other hand, Gregory et al. (1974) found, like Uebbing and James (1970b) and Bell et al. (1971), that coating p-type GaAs pinned the Fermi level at 0.5 eV above the valency band. This seems to be widely confirmed by other authors (Van Laar and Scheer, 1967; Dinan et al., 1971).
III. REFLECTION PHOTOCATHODES

The overall quantum yield of a reflection photocathode was established by Baud and Rougeot (1976), working from hypotheses due to James et al. (1969) and Kressel and Kupsky (1966). It is given by the expression
ρ(E) = (1 − R){[F_Γ P_Γ αL_Γ/(1 + αL_Γ) + F_X P_X αL_X/(1 + αL_X)] e^(−αx₀) + T(E)[1 − e^(−αx₀)]}    (1)
where E is the energy of the incident photon, R is the reflection coefficient, P_X and P_Γ are the electron escape probabilities for the X and Γ conduction bands, α is the optical absorption coefficient, L_X and L_Γ are the electron diffusion lengths in the X and Γ conduction bands, F_X and F_Γ are the fractions of total electron excitation occurring in the X and Γ conduction band minima, x₀ is the thickness of the space-charge zone, and T(E) is the escape probability for an electron of energy E.
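As a purely numerical illustration of how an expression of the form of Eq. (1) is evaluated, the short Python sketch below computes the yield for one set of inputs. Every number in it (R, the band fractions F, the escape probabilities P, the diffusion lengths, α, x₀, and T(E)) is an assumed, GaAs-like placeholder rather than a value taken from the text.

import math

def reflection_yield(R, F_G, P_G, L_G, F_X, P_X, L_X, alpha, x0, T_E):
    # Structure of Eq. (1): electrons created deeper than the space-charge
    # zone must diffuse back (Gamma and X terms, weighted by exp(-alpha*x0));
    # electrons created inside the zone escape with probability T(E).
    # alpha, L_G, L_X, and x0 must share the same length unit.
    diff_term = (F_G * P_G * alpha * L_G / (1.0 + alpha * L_G)
                 + F_X * P_X * alpha * L_X / (1.0 + alpha * L_X))
    return (1.0 - R) * (diff_term * math.exp(-alpha * x0)
                        + T_E * (1.0 - math.exp(-alpha * x0)))

# Assumed GaAs-like inputs, lengths in cm (L_Gamma = 3 um, x0 = 10 nm).
print(reflection_yield(R=0.3, F_G=0.8, P_G=0.45, L_G=3e-4,
                       F_X=0.2, P_X=0.30, L_X=0.5e-4,
                       alpha=1e4, x0=1e-6, T_E=0.6))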
(
E)
--
where A E is the kinetic energy of the electron, k is Boltzmann’s constant, and T is the absolute temperature. Electrons entering the space-charge zone are accelerated toward the surface by the internal field. They become hot electrons and interact with phonons. The electron will have a mean free path I,, and energy A E p will be lost at every electron-photon interaction. We will take I, = 43 A and AE, = 0.036 eV, values given for GaAs by Kressel and Kupsky (1966). The thickness of the space-charge zone is given by
where n, is the doping level, V, is the band bending (0.45 eV for GaAs), ccO is the dielectric constant, and q is the charge of an electron. The solution given by Williams and Simon (1967) and Bartelink et al. (1963) is used to calculate the surface energy distribution of photoelectrons that have traversed the band-bending zone. In the case: where Eo = q2F21i/3AE,
(4)
and F is the electrostatic field in the space-charge zone, E(0) is the kinetic energy of electrons at the surface, and E_L is the energy lost by an electron, the energy distribution is of the form

N(E_L) ∝ exp(−E_L/E₀)    (5)

This distribution must be normalized. Applied to the X and Γ conduction bands, it represents the probability of finding an electron from one of these bands at an energy level E at the surface of the material before escape into the vacuum,

E = E_{X,Γ} − E_L    (6)

where E_{X,Γ} is the minimum energy of the X and Γ bands in the bottom of the conduction band after bending. The probabilities of emission of an electron from the X and Γ bands are

P_Γ = Σ_i N_Γ(E_i) T(E_i)    (7)

P_X = Σ_i N_X(E_i) T(E_i)    (8)

where N_Γ(E_i) and N_X(E_i) are the probabilities of finding an electron from the Γ and X bands at an energy level E_i. The transmission probability, T(E), was worked out by Baud and Rougeot (1976) in the following manner. Applying quantum mechanical principles to regular structures permits associating a wave, defined by Schrödinger's equation, with an electron.
d²ψ/dx² + (2m/ħ²)[E − V(x)]ψ = 0    (9)

where V is the potential perturbation encountered by an electron on its trajectory, h is Planck's constant, E is the energy of an electron at the surface of a material, and m is the mass of the electron. Integrating the equation for the three zones in Fig. 4, we have

zone 1:  ψ₁ = a₁ exp(ik₁x) + b₁ exp(−ik₁x),  k₁² = (2m/ħ²)E

zone 2:  ψ₂ = a₂ exp(ik₂x) + b₂ exp(−ik₂x),  k₂² = (2m/ħ²)[E − q(V₁ + V₂)]

zone 3:  ψ₃ = a₃ exp(ik₃x) + b₃ exp(−ik₃x),  k₃² = (2m/ħ²)[E − q(V₁ − V₀)]

where ħ = h/2π and m is the electron mass. Zero energy is taken as being the bottom of the conduction band after bending. The constants a₁, b₁, a₂, b₂, a₃, and b₃ are obtained by equating the wave functions and their derivatives at the limits of the zones. The emission probability (probability of a photoelectron of energy E being emitted into the vacuum) is then the transmission coefficient

T = 4k₁k₃k₂² / [(k₁k₂ + k₂k₃)² + (k₁² + k₂²)(k₂² + k₃²) sinh²(k₂a₀)]    (11)

where a₀ is the thickness of the Cs-O dipole layer (see Table I).
TABLE I
SOME NUMERICAL VALUES OF EMISSION PROBABILITY T AS A FUNCTION OF ELECTRON ENERGY*

Electron energy (eV),                  Transmission
relative to top of the valency band    coefficient, T

1.41                                   0.44319
1.50                                   0.48452
1.60                                   0.52429
1.73                                   0.56834
1.80                                   0.58915
1.90                                   0.61595
2.00                                   0.63983
2.10                                   0.66123
2.20                                   0.68052
2.30                                   0.69800
2.40                                   0.71390
2.50                                   0.73844
2.60                                   0.74177
2.70                                   0.75405
2.80                                   0.76538

* For a₀ = 0.1 nm and barrier height, V₁ + V₂, of 2.8 eV (see Fig. 4).
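Table I can be read as a direct evaluation of Eq. (11). The Python sketch below performs the same kind of evaluation for an assumed barrier: the free-electron mass is used, as in the text, but the energy zero and the individual potentials V₀, V₁, V₂ (only the sum V₁ + V₂ = 2.8 eV and a₀ = 0.1 nm are quoted) are assumptions, so the output is illustrative and is not expected to reproduce the tabulated values.

import math

HBAR = 1.054571817e-34    # J*s
M_E = 9.1093837015e-31    # kg, free-electron mass as used in the text
Q = 1.602176634e-19       # C

def barrier_transmission(E_eV, U_eV, vac_offset_eV, a0_m):
    # Eq. (11) for a rectangular barrier of height U = q(V1 + V2) and
    # thickness a0.  E is counted from the bottom of the bent conduction
    # band; vac_offset_eV stands for q(V1 - V0) on the vacuum side.
    k1 = math.sqrt(2.0 * M_E * Q * E_eV) / HBAR
    k3 = math.sqrt(2.0 * M_E * Q * (E_eV - vac_offset_eV)) / HBAR
    kappa = math.sqrt(2.0 * M_E * Q * (U_eV - E_eV)) / HBAR  # decay inside the barrier
    num = 4.0 * k1 * k3 * kappa**2
    den = (kappa * (k1 + k3))**2 + \
          (k1**2 + kappa**2) * (kappa**2 + k3**2) * math.sinh(kappa * a0_m)**2
    return num / den

# Assumed: 2.8 eV barrier, 0.1 nm thick, vacuum level 0.5 eV below the zero.
for E in (0.2, 0.6, 1.0):
    print(E, "eV:", round(barrier_transmission(E, 2.8, -0.5, 1e-10), 3))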
Figure 5 shows the energy distribution (normalized) of surface electrons as a function of thermalization energy in the space-charge zone for several different doping levels. These curves are applicable for electrons from either the X or the Γ conduction bands.
FIG. 5. Normalized energy distribution of surface electrons as a function of doping N.
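The space-charge widths behind these doping-dependent curves follow from Eq. (3). A short check in Python, assuming a GaAs relative permittivity of about 12.9 (an assumed material constant, not quoted in the text):

import math

def space_charge_width(n_a_cm3, V_b=0.45, eps_r=12.9):
    # Eq. (3): x0 = sqrt(2*eps*eps0*Vb / (q*n_a)); V_b is the band bending
    # in volts (0.45 eV per electron for GaAs, as stated in the text).
    eps0 = 8.8541878128e-12   # F/m
    q = 1.602176634e-19       # C
    n_a = n_a_cm3 * 1e6       # cm^-3 -> m^-3
    return math.sqrt(2.0 * eps_r * eps0 * V_b / (q * n_a))   # metres

for n in (2e18, 5e18, 1e19):  # doping levels of the kind compared in Fig. 5
    print(f"n_a = {n:.0e} cm^-3  ->  x0 = {space_charge_width(n) * 1e9:.1f} nm")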
Figure 6 shows the quantum yield (reflection operation) as a function of wavelength for different doping levels. Activating layer thickness a₀ was 0.1 nm and the barrier height V₂ was 2.35 eV. Figure 7 shows the quantum yield (reflection operation) as a function of wavelength for different diffusion lengths, thickness and barrier height being the same as in Fig. 6. Figure 8 shows the quantum yield (reflection operation) as a function of wavelength for different activating layer thicknesses (a₀ = 0.1, 0.15, 0.2 nm), other parameters having the following values: n_a = 5·10¹⁸ cm⁻³, L_Γ = 3 μm, V₂ = 2.35 eV. Figure 9 shows the quantum yield (reflection operation) as a function of wavelength for different barrier heights (V₂ = 2.35, 2.05, 1.75, and 1.45 eV),
FIG. 6. Quantum yield as a function of wavelength for different doping levels (photoemission by reflection).

FIG. 7. Quantum yield as a function of wavelength for different diffusion lengths L (photoemission by reflection).

FIG. 8. Quantum yield as a function of wavelength for different Cs-O thicknesses a₀ (photoemission by reflection).

FIG. 9. Quantum yield as a function of wavelength for different barrier heights V₂ (photoemission by reflection).
other parameters having the following values: a₀ = 0.1 nm, n_a = 5·10¹⁸ cm⁻³, L_Γ = 3 μm. Figures 10 and 11 are comparisons of theoretical values and experimental results due to Baud and Rougeot (1976). The substrate of the NEA photocathode was gallium arsenide, obtained by liquid-phase epitaxy. The theoretical curves were obtained by selecting values for the four parameters as explained below.
FIG. 10. Comparison of theoretical and observed spectral sensitivities (photoemission by reflection).
First, the thickness of the space-charge zone was deduced from the doping level, taking the spontaneous polarization of the 111B face of GaAs to be the generally accepted value of 0.45 V. It should be noted that the value taken for the spontaneous polarization does not greatly affect the theoretical curves. Second, the height of the potential barrier that must be traversed by the emerging electrons was taken to be 1.35 eV in all cases, in conformity with the contribution of the electron affinity of the different materials present, GaAs 111B and Cs-Cs2O (see Section II). Third, the thickness of this barrier, of utmost importance according to the theory, was evaluated to be 0.1 nm for a clean surface before activation, but was corrected for each figure to obtain the best match between theory and experiment. Fourth, the diffusion length chosen was that whose value gives the theoretical curves the characteristic shape that is also found in most of the corresponding experimental curves.
FIG. 11. Comparison of theoretical and observed spectral sensitivities (photoemission by reflection).
Figure 10 shows the excellent agreement that is obtained over the whole spectral range. The close similarity of theoretical and experimental spectral sensitivities is also shown by the curves in Fig. 11, being more obvious at low sensitivities because of the falloff in wavelength response.

The pioneers in NEA photoemission (Scheer and Van Laar, 1965; Apker et al., 1948; James et al., 1968) were first concerned with depositing cesium on clean semiconductor substrates, obtained by vacuum cleavage. Turnbull and Evans (1968) were able to obtain 500 μA/lm. Using ion bombardment, Garbe and Frank (1969) obtained sensitivities of several hundred microamperes per lumen. Many workers were also primarily concerned about the purity of the cesium and the oxygen, and about the deposition conditions (speed, temperature, etc.), the work of Fisher (1974) being noteworthy. However, it was quickly realized that the crystalline quality of the material was the most important factor. Sensitivities of better than 2000 μA/lm are now obtainable in reflection operation; see James et al. (1971c) and Olsen et al. (1977).
IV. PHOTOEMISSION BY TRANSMISSION

Figure 12 is a schematic cross section of a photocathode for transmission operation. It shows a transparent substrate carrying the active semiconductor (GaAs, for example), which is covered by a Cs-O layer that causes negative electron affinity.
FIG. 12. Photocathode operating by transmission.
Some of the conditions that must be satisfied for operation by reflection and operation by transmission are common to the two techniques. However, transmission operation has some extra requirements. Photons traversing the transparent support will excite photoelectrons on the illuminated surface of the semiconductor. They must then diffuse all the way through this active material without recombining and must arrive at the other face with a significant probability of escape. It can be assumed that virtually all of the electrons are thermalized into the bottom of the conduction band because of the thickness of the active layer as compared with the thermalization length. Expressions giving the photoemissive yield in transmission operation were established by Antypas et al. (1970) and Allen (1971). For permanent excitation, the density Δn of these electrons at a point of distance x from the illuminated face is given by (see Fig. 12)

D d²Δn/dx² − Δn/τ + (1 − R)(N/A)α e^(−αx) = 0    (12)

in which the three terms represent, respectively, the electrons diffusing at distance x, the electrons recombining at x, and the electrons excited at x.
where D is the diffusion coefficient of the electrons, τ is the lifetime of the electrons, R is the reflection coefficient of the GaAs surface, α is the optical absorption coefficient of GaAs, N is the incident photon flux, and A is the illuminated area.

An electric field that instantly absorbs any electrons approaching it exists to a depth of several nanometers at the emission surface. Many of these electrons are emitted into the vacuum. The electron concentration at the emitting surface, x = t, is therefore

Δn(t) = 0    (13)

Those which diffuse toward the illuminated face recombine at a speed S, and their flux toward this face is given by

D (dΔn/dx)|x=0 = S Δn(0)    (14)

The flux of electrons diffusing toward the emitting surface is given by

I = −D (dΔn/dx)|x=t    (15)

If T_s is the transparence of the support and P the probability of emission, the photoemissive quantum yield is

ρ = I P T_s / (N/A)    (16)

which becomes

ρ = P T_s (1 − R) [αL/(α²L² − 1)] {[(αD + S) − e^(−αt)(S cosh(t/L) + (D/L) sinh(t/L))] / [(D/L) cosh(t/L) + S sinh(t/L)] − αL e^(−αt)}    (17)
1. Diffusion Length, L
In transmission photocathodes, the diffusion length must be as long as possible, so low doping with a very carefully chosen doping material is preferable. However, because a limited space-charge zone is required at the emission surface, a high concentration of free carriers is required. The doping must therefore be a compromise.
The diffusion length depends to a large extent on the crystalline purity of the material, this having been discussed by Abrahams et al. (1971). Figure 13 shows theoretical curves of quantum yield for an escape probability of 0.4 and total transmission in the substrate. Target thickness was taken to be 3 μm and recombination speed at the interface to be 10³ cm/sec. These curves are for diffusion lengths L of 1, 3, 5, and 7 μm.
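As a hedged numerical sketch of how such curves follow from Eq. (17), the Python fragment below evaluates the yield for the diffusion lengths of Fig. 13, using the escape probability (0.4), target thickness (3 μm), interface recombination speed (10³ cm/sec), and full substrate transmission quoted above; the absorption coefficient, electron diffusion coefficient, and reflection coefficient are assumed illustrative values.

import math

def transmission_yield(alpha, L, t, S, D, P=0.4, Ts=1.0, R=0.3):
    # Eq. (17); alpha in cm^-1, L and t in cm, S in cm/s, D in cm^2/s.
    # Note that Eq. (17) is singular where alpha*L = 1.
    aL = alpha * L
    bracket = (((alpha * D + S)
                - math.exp(-alpha * t) * (S * math.cosh(t / L)
                                          + (D / L) * math.sinh(t / L)))
               / ((D / L) * math.cosh(t / L) + S * math.sinh(t / L))
               - aL * math.exp(-alpha * t))
    return P * Ts * (1.0 - R) * (aL / (aL**2 - 1.0)) * bracket

# P = 0.4, t = 3 um, S = 1e3 cm/s and Ts = 1 follow the text; alpha, D and R
# are assumptions (alpha ~ 2e4 cm^-1 above the GaAs band edge, D = 100 cm^2/s).
t, S, D = 3e-4, 1e3, 100.0
for L_um in (1.0, 3.0, 5.0, 7.0):
    print(L_um, "um ->", round(transmission_yield(2e4, L_um * 1e-4, t, S, D), 3))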
FIG. 13. Quantum yield as a function of wavelength for different diffusion lengths L, target thickness of 3 μm (photoemission by transmission).
2. Thickness of the Active Layer, t

A high quantum yield will be obtained if the thickness of the active layer is less than the diffusion length. Figures 13, 14, and 15, for thicknesses of 3, 1, and 6 μm, respectively, show that red sensitivity drops if the photocathode becomes too thin. The target thickness must be longer than the absorption length of the photons, but shorter than the diffusion length of the electrons:

1/α < t < L
FIG. 14. Quantum yield as a function of wavelength for different diffusion lengths L, target thickness of 1 μm (photoemission by transmission).
FIG. 15. Quantum yield as a function of wavelength for different diffusion lengths L, target thickness of 6 μm (photoemission by transmission).
3. Quality of the Substrate/Active-Layer Interface: Recombination Speed, S

An interface with a high recombination speed will absorb photoelectrons to the detriment of emission at the other face. The importance of this parameter S is shown in Fig. 16. The quality of an interface can thus be appreciated by looking at the form of the quantum yield curve. It depends mainly on the crystal structure. Constant quantum yield over a large part of the spectrum indicates a low recombination speed. The quantum yield itself depends to a large extent on the emission probability and on the transparency of the support.
FIG. 16. Quantum yield as a function of wavelength for different recombination velocities S at the interface (photoemission by transmission).
V. ANGULAR ENERGY DISTRIBUTION

Csorba (1970) has calculated the resolution characteristics of a proximity focused tube incorporating an NEA photocathode. Assuming a perfectly flat photocathode and a Lambertian electron distribution with a spread of 2 eV, the modulation transfer function (MTF) is given by

MTF(ν_s) = exp[−12(E_M/V)(ν_s d)²]    (18)
where E_M is the maximum emission energy in the Lambertian distribution, ν_s is the spatial frequency, V is the screen accelerating voltage, and d is the cathode-to-screen separation.
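A quick numerical reading of Eq. (18) in Python; the accelerating voltage, cathode-to-screen gap, and emission energies used below are assumed values chosen only to illustrate the trend, not figures from the text.

import math

def proximity_mtf(nu_s, E_M_eV, V_volts, d_mm):
    # Eq. (18): MTF = exp[-12 (E_M / V)(nu_s * d)^2], with nu_s in lp/mm
    # and d in mm so that nu_s*d is dimensionless.
    return math.exp(-12.0 * (E_M_eV / V_volts) * (nu_s * d_mm) ** 2)

# Assumed: 5 kV across a 1 mm gap; compare a 0.1 eV and a 2 eV energy spread.
for E_M in (0.1, 2.0):
    print(E_M, "eV:", [round(proximity_mtf(nu, E_M, 5000.0, 1.0), 3)
                       for nu in (5, 10, 20)])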
The semiconducting detector of multialkaline photocathodes is formed by evaporation, usually onto glass, of the constituent parts. The technique for NEA photocathodes is completely different. Here, the starting point is a semiconducting material, surface treatment being used to give NEA properties. The crystal quality of the semiconductor must be extremely high so that its electrical properties are compatible with its function as a detector. Optimum doping and a large diffusion length are particularly important. Initially, vacuum-cleaved monocrystals were used (Scheer and Van Laar,
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
23
1965; Van Laar and Scheer, 1968). But, it soon became apparent that epitaxial layers were preferabledoping is easier to control, and the structure lends itself better to transmission operation. The substrates are sliced from monocrystals that can be obtained in various ways: the Czochralski method, in which an oriented seed crystal draws the monocrystal vertically out of a liquid bath; the Bridgmann technique, a horizontal drawing method using a crucible. Several techniques can be used to deposit the epitaxial layers: liquid phase, vapor phase, organometallic compounds, molecular jet. The last one is not really suitable for photoemitters, being best adapted to microwave devices. Liquid-phase epitaxy with GaAs uses a solution of molten gallium, and was developed by Nelson (1963) to obtain p n junctions in GaAs. The principle of this technique is shown in Fig. 17: points A, B, and C are the fusion
i
1 X
5
Composition
FIG. 17. Phase diagram for GaAs.
points of gallium (30 "C), arsenic (800 "C), and gallium arsenide (1240 "C), respectively. At a temperature T , a liquid of composition X is in equilibrium with the solid S. If the temperature falls to T',the liquid partially crystallizes to point L and becomes enriched in gallium with a new composition X'. This method permits deposition at much lower temperatures than that using the fusion component. Tiller (1968), Tiller and Kang (1968), and Potard (1972), drawing on the
24
H. ROUGEOT AND C. BAUD
mathematical treatment of Pohlhausen and Angew (1921), have studied the thermodynamics of this method. This work was taken up again by Crossley and Small (1972). The substrate, which acts as the seed, can be brought into contact with the bath in various ways. Nelson (1963), Kang and Green (1967), and Goodwin et al. (1970) use a tilting method, whereas a vertical dipping technique was used by Deitch (1970) and King et al. (1971). A crucible of the form shown in Fig. 18 can also be used (Hayashi et al., 1970). This is particularly suitable for preparing multiple epitaxial layers, used in transmission devices, during a single thermal cycle.
,-
Quartz tube
Heater elements
FIG. 18. Horizontal liquid-phase epitaxy system.
Among the numerous publications dealing with these techniques are those of Pinkas et al. (1972), Miller et al. (1972), and Casey et al. (1973). Antypas (1970) has studied the phase diagram of InGaAs. Panish and Legems (1971) have established the basis for the calculation of quaternarycompound phase diagrams. Vapor-phase deposition was used as early as 1966 by Tietjen and Amick to prepare a layer of GaAsP using arsine and phosphine. The technique that they used is described in Bell (1973). Binary and ternary 111-V compounds can be made in this way. More recently, epitaxial growth using organometallic compounds has been employed by Manasevit and Simpson (1972). The 111-V compound is obtained from a group-I11 organometallic compound and a group-V hydride by means of the reaction (CH,),Ga
+ ASH,
+
GaAs
+ 3CH,
The GaAs substrate is hot. This method has the advantage that it permits preparing aluminum compounds, something that is not possible with chlorides. Encouraging results have been obtained by Allenson and Bass (1976) who, using homoepitaxial structures, have obtained diffusion lengths similar to those obtained with liquid-phase epitaxy and a sensitivity of 1150 pA/lm. Andre et al. (1976) have made GaAs/Ga, -,Al,As heterostructures with electrical
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
25
qualities comparable to those of homoepitaxial GaAs obtained by the vapor-phase method. However, this technique is still in the experimental stage. Liquid-phase and vapor-phase epitaxy have their own individual advantages. Liquid-phase epitaxy gives better electrical characteristics, but the resulting surface finish is of lower quality, which could be a disadvantage for optical applications (Ettenberg and Nuese, 1975). Gutierrez et al. (1974) recommend a mixed technique for heterostructure, GaAs/GaAlAs/GaP, that is suitable for transmission operation; the GaAlAs is deposited in liquid phase, and the GaAs in vapor phase. At present, both liquid-phase and vapor-phase epitaxy are used, each technique having given good results (Olsen et al., 1977). B. Transmission Photocathodes with Heteroepitaxial Structures
Various structures could be envisaged for a transmission photoemitter. The simplest technique would be to polish a monocrystal until the desired thickness, 2 to 5 pm for 111-V materials, is obtained. Although this would be possible for silicon, it is not suitable for 111-V materials because of their fragility. Some form of support, or substrate, is needed that is monocrystalline and transparent to radiations in the spectral sensitivity range of the photocathode. Sapphire (A1,0,) and Spinel (MgA1204),which have the advantage of being transparent in the ultraviolet and visible region of the spectrum, have been used. However, the results, 70 to 200 pA/lm, are low (Syms, 1969; Liu et al., 1970; Andrew et al., 1970; Hyder, 1971). This is due to a high carrier recombination speed at the interface, caused by the difference between the substrate and GaAs lattice parameters. Materials of type 111-V are preferred for photocathodes, Gap, GaAs, or InP being chosen depending on the characteristics required. Here we will discuss the most common case, GaAs, although silicon and photocathodes optimized for neodymium-doped YAG laser ( A = 1.06 pm) detection should also be mentioned. The most common substrate material is Gap, which is transparent beyond 0.55 pm. Gutierrez and Pommering (1973) have made a GaAs/GaP heterostructure using vapor-phase epitaxy. However, the difference between the lattice constant of the two compounds (see Fig. 19) results in an elevated recombination speed. A buffer layer is employed to reduce this speed. The buffer, which must be transparent, is used to absorb the defects due to the lattice mismatching. The intermediate layer can have one of several compositions. GaAlAs is very suitable (see Fig. 19); its band gap is greater than that of GaAs and its lattice
H. ROUGEOT AND C . BAUD
26 28 2.4
-> 2 .o - 1.6 W
a
0
m
U
c
O
1.2
m 0.8 0.4
- Direct Direct
---
band band gap gap Indirect band gap 1
55
56
57
58
Lattice constant
59
60
61
62
(HI
FIG. 19. Bandgap variation as a function of lattice constant for a number of ternary Ill-V compounds.
constant is virtually the same. Allenson et al. (1972) and Frank and Garbe (1973) have made GaAs/GaAlAs/GaP heterostructures using liquid-phase epitaxy and have obtained transmission-operation sensitivities of better than 400 ,uA/lm. Gutierrez et al. (1974) used the same structure, but tried a hybrid, liquid-phase plus vapor-phase technique to obtain a GaAs surface that was blemish-free. GaAsP is also used as a intermediate layer. Although its recombination speed is less suitable than that of GaAlAs, it is transparent over a greater range of the spectrum (see Fig. 19). Gutierrez and Pommering (1973), who used vapor-phase epitaxy, were the first to give results for this structure. The work of Fisher (1974) should also be noted. By varying the composition of the ternary layer, it can be optimized on the GaP/GaAsP and GaAsPIGaAs interfaces so as to minimize the dislocation density. The formation of a potential well for photoelectrons at the interface can be avoided by using ptype doping of the intermediate layer. A systematic study of the use of In,Ga, -,P as a buffer layer material has been carried out by Enstrom and Fisher (1975). Liu et al. (1973) explored the possibility of using structures of the type GaAs/GaAlAs/GaAs-substrate. The opaque substrate must be eliminated from the useful surface of the photocathode, the GaAlAs acting as the support. However, this structure remains fragile (Baud and Rougeot, 1976).
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
27
To get over this disadvantage while keeping the advantages of such a structure, Antypas and Edgecumbe (1975) proposed sealing it onto a glass support to make it rigid. A heteroepitaxial structure of the type GaAlAsJGaAsJGaAIAsJGaAs-substrate is prepared by liquid-phase epitaxy. The last layer (GaAlAs) is sealed onto glass, and then the substrate and the first layer of the GaAlAs are eliminated by selective chemical baths. The sealed layer of GaAlAs acts as a buffer layer, absorbing damage caused by sealing. Sensitivities of better than 600 pA/lm have been measured in transmission operation with photomultipliers using this type of photocathode structure, higher values of sensitivity being classified.
C. Investigation of Material Characteristics
The preceding sections have shown the major importance of the diffusion length, doping, and electron escape probability. To be able to measure these parameters and to correlate then with photoemission is therefore of the utmost importance. If possible, the measurements should be nondestructive.
1. Difusion Length
Garbe and Frank (1969) proposed using the photoemission to measure the diffusion length and, if desired, the electron emission probability. This method only works for wavelengths which only excite the r band of the semiconductor. For example, if the material is gallium arsenide the wavelength must be longer than 700 nm. The inverse of quantum yield, measured by using photoemission by reflection, is given by the expression 1
[
+!]
-~
p ( ~ ) -(1 - R ) P L ~ ( A )
P
The curve of l/p(A) with respect to l/a(A) is linear, and if it is extrapolated so as to cut the abscissa at the origin [l/p(A) = 01, it can be used to measure the diffusion length. The electron emission probability P can be calculated from where the curve intersects the ordinate at the origin [l/a(A) = 01. Mayamlin and Pleskov (1967) have shown that the potential difference AV(A) developed between a semiconductor and an electrolyte into which it is immersed is of the same nature as the photovoltaic effect observed in a semiconductor placed in a vacuum or gaseous environment (Johnson, 1958). The inverse of the photovoltaic potential at a given wavelength is related to
28
H. ROUGEOT AND C. BAUD
the inverse of the absorption coefficient at the same wavelength by the expression (Allenson, 1973)
where N o is the incident photon flux, and k is an electrolyte/semiconductor interaction coefficient, eliminated by the measurement method. This relationship is linear, so if the curve is extrapolated, the diffusion length can be calculated from the intersection with the abscissa. A third way to measure the diffusion length uses the photomagnetoelectric effect. If the mobility of the charge carriers in the material is known and if certain approximations are made, the measurement of current or voltage can give the diffusion length (Agraz and Li, 1970). This method, unlike the previous ones, has the disadvantage of being destructive. Another destructive technique is to measure the photocurrent at the epitaxial-layer/substratejunction of a beveled sample. If the response is plotted as a function of distance from the junction in log-linear coordinates, the diffusion length can be derived from the resulting straight line (Ashley and Biard, 1967; Ashley et al., 1973).
R
= [2 exp( - x/L)](1
+ S/VD)-
(21)
where R is the relative response, S is the surface recombination speed, and V, = D/L with D being the diffusion coefficient. Laser excitation may be replaced by an electron beam (Hackett, 1972). The smaller dimensions mean that oblique polishing is no longer necessary, simple cleaving being sufficient.
2. Doping
The doping can be measured by the Hall effect, using the now classic technique of Van der Paw (1958). With epitaxial matepals, the substrate must be semi-insulating. Measurement at 300 K gives the carrier density and mobility. Measurement at 77 K gives the compensation coefficient. This technique is destructive. Another method, nondestructive this time, exploits the plasma resonance that occurs at the surface of a semiconductor which is irradiated with farinfrared radiation. This phenomenon affects the reflection at a wavelength that is a function of the doping. Various ways of applying this technique are described in Black et al. (1970), Schuman (1970), Riccus et al. (1966), and Kesamanly et al. (1966).
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
29
3. The Nature of the Sudace
The characteristics mentioned previously concern the bulk of the semiconductor and its electrical properties. The nature of the surface affects the electron escape probability, depending on its crystalline quality and purity. Auger electron spectroscopy is now widely used for the study of photocathodes. Numerous articles, such as Chang (1971), Auger (1975), and Harris (1968), describe this secondary emission process. The energy of each peak is characteristic of the emitting body, so the identification of chemical species at the surface can be performed with an accuracy of 1/100 of a monolayer. The influence of contaminants can thus be shown with this technique (Uebbing, 1970). The lattice structure at the surface can be checked by low energy electron diffraction,a method which gives an image of the surface reciprocal lattice. A description of this is given in Estrup (1971) and Fiermans (1974). This nondestructive control instrumentation should be installed inside the enclosure in which the photocathode is prepared, so that the desorption and activation processes can be checked (Goldstein, 1975; Stocker, 1975; Van Bommel and Crombeen, 1976). D. High- Vacuum Enclosures
The incorporation of NEA photocathodes in sealed tubes presents several problems, among which are maintaining a vacuum of lo-’’ torr inside the enclosure, and the desorption under high vacuum of the monocrystalline surfaces that are to be activated. 1. The Need for High Vacuum
By high vacuum, we mean residual pressures of less than lo-’ torr. The necessity of using high vacuum becomes evident on inspecting Fig. 20 which gives the number of impacts per square centimeter as a function of partial pressure of an element of molecular weight M and Fig. 8 which shows the rapidity with which photoelectron transmission decreases due to the tunnel effect as a function of the thickness of the barrier potential. From Fig. 20, it can be seen that only 1 sec is required to deposit a monomolecular layer of H 2 0 on a cold surface at a pressure of 2.lO-’ torr. This time becomes sec at 2.10-lo torr. 2. High- Vacuum Sealing Equipment
Various ways of sealing tubes under high vacuum have been recommended. As an example, the authors describe here their own equipment (see
30
H. ROUGEOT AND
C. BAUD
Partial pressure ( t o r r )
FIG.20. Variation of number of impacts per square centimeter with partial pressure.
Fig. 21). It consists of a liquid-helium cryogenic pump; a quadrupolar gas analyzer; an Lshaped, metal-jointed, bakeable valve with a high throughput; and a bell housing. Half of the tube is mounted on a b e d support. The other half is mounted on a support that can be made to slide along two guide columns by means of a mechanical press mounted outside the vacuum enclosure that transmits the motion via a metal bellows. Initially, the two halves of the tube are kept separated, the space between them permitting the introduction of the photocathode before sealing. The photocathode itself is degassed and prepared outside the tube. It is fixed to a turntable, controlled manually from outside, that can be rotated so as to position the photocathode for the different operations of desorption, coating with cesium, and insertion in the tube. Pure oxygen can be introduced into the vacuum enclosure via a leak valve. Various viewing ports permit positioning the photocathode, measuring its sensitivity, and checking the sealing operation. Photocathodes of several different structures have now been tried in tubes, including GaP/GaAsP/GaAs with a sensitivity of 270 pA/lm in transmission operation (Hughes et al., 1972,1974) and GaP/GaAlAs/GaAs with sensitivity of better than 300 p A / h (Holeman et al., 1974).
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
31
FIG.21. Equipment for ultrahigh vacuum sealing with indium joints: 1, Cryogenic pump; 2, quadrupole gas analyzer; 3, metallic-joint ultrahigh vacuum valve; 4, bell jar; 5, oxygen leak valve; 6, inspection window; 7, workpiece manipulation system; 8, sorption pumps.
VII. PHOTOEMISION STABILITY AND DARK CURRENT Factors affecting stability are listed in publications by Sommer (1973a,b) and Yee and Jackson (1971).It appears that the main cause of photocathode destruction is bombardment by residual ions; Shade et al. (1972)studied this problem and found that if the photocathode is to conserve its efficiency,the current drawn must not exceed lo-’ A/cmz. Spicer (1974) studied the origin of the instabilities in greater detail by irradiating metallic cesium and 111-V materials with ultraviolet light and measuring how their photoemission evolved with degree of oxidation. He found that cesium keeps its metallic nature after a short exposure to oxygen which, apparently, dissolves into the bulk of the material. A larger exposure to oxygen first reduces the work function to a minimum value of 0.7 eV and then tends to increase it. Working with p-type gallium arsenide, he noted that the energy bands tended to bend toward a lower level. A special feature of NEA photocathodes is their low dark current. This dark current has several origins: thermionic emission, which varies with the height of the band gap of the material. This is the main source of dark current in GaInAsP photocathodes that are intended for operation at 1.06 pm (Escher et al., 1976); charge carriers created by generation-recombination centers at the interface; the Cs-0 layer, whose effect increases with thickness.
32
H.ROUGEOT AND
C. BAUD
The dark current of GaAs photocathodes is on the order of A/cm2 (Van Laar, 1973). For materials with narrower band gaps, it may reach lo-'' A/cm2 (Martinelli, 1974). This subject is treated in publications by Bell (1970) and Spier and Bell (1972). VIII. CONCLUSION Although much of this discussion has been based on gallium arsenide, other materials can, of course, be used. They must, however, fulfill the following conditions:
(1) The material must be highly absorbent and have a diffusion length compatible with the thickness of the photoemission layer. (2) The width of the band gap must not be less than the work function of the activated surface. (3) Because the active material is either sealed or grown epitaxially onto a substrate that is transparent to the wavelengths of interest, their expansion coefficients must be well matched. So far as the second point is concerned, the work function of the oxygencesium layer, deposited on the active layer, is on the order of 0.85 eV. This sets a lower limit for the band gap of the semiconductor, below which an equivalent spectral response (into the infrared) can hardly be hoped for. This limit corresponds to a wavelength of 1.46 pm. It must also be remembered that, according to the theories developed up to now, any extension of the spectral response toward longer wavelengths will, because of the reduction in NEA (band gap of -0.85 eV), be accompanied by a drop in electron escape probability, and hence a reduction in quantum yield. Figure 22 shows the main semiconductors that satisfy conditions 1 and 2. With ternary compounds, the band gap, and hence cutoff wavelength, can be adjusted to give a desired value by simply altering the composition. The most commonly used ternary compounds are GaInAs and InAsP, deposited by epitaxy on GaAs and InP, respectively. Condition 3 limits this possibility because the lattice dimensions and the absorption limits of ternary materials both vary with composition (Vegard's law). To make the parameters independent of each other, the use of quaternary compounds has been envisaged. These materials permit varying the spectral absorption limit while keeping the lattice structure constant. One quaternary material that has been investigated in some detail is GaInAsP. The main contribution of the study of NEA at the surface of semiconductors has been to permit a better understanding of photoemission and to set its limits. Improvements now depend on the metallurgy of the active layers, the efficiency of surface-cleaning techniques, and so far as operating stability
33
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS 1.74eV ( 0 . 7 1 2 u m )
Cd Se
1.5 eV ( 0 . 8 3 p m 1
C d Te
1.4 eV ( 0 . 9 p m )
Go As
1.27eV10.975 p m )
InP
1.17eV (1.06 p m ) 1.12eV ( 1.1 p m
_-______--------- Wavelength of
N d Yag loser
Si
O.8SeV ( 1 . 4 6 p m )
Limit of NEA (electron
offinity of C s -0 loyer
)
EV
FIG.22. Band gap of commonly used semiconductors in NEA.
is concerned, the cleanliness of the enclosures incorporating such photocathodes. It may be possible to increase the value of limiting wavelength, presently 1.46 pm, beyond which NEA disappears. This would require the discovery of new activating materials with lower electronic affinity, or the creation in some way of an electron-extraction field in the substrate. Several laboratories are working on these ideas at present.
REFERENCF~S Abrahams, M. S.,Buiocchi, C. J., and Williams, B. F. (1971) Appl. Phys. Lett. 18, 220. Agraz, J. C., and Li, S. S. (1970). Phys. Rev. B 2, 1847. Allen, G. A. (1971). J . Phys. D 4, 308. Allenson, M. B. (1973). S E R L Technol. J . 23, 11.1. Allenson, M. B., and Bass, S. J. (1976). Appl. Phys. Lett. 28, 113. Allenson, M. B., King, P. G. R., Rowland, M. C., Steward, G . J., and Syms, C. H.A. (1972). J. Phys. D 5, L89. Andre, J. P., Gallais, A., and Hallais, J. (1976). " Process Gallium Arsenide and Refated Compounds" (C. Hilsum, ed.), Conf. Ser. No. 334 Edinburgh 1.
34
H. ROUGEOT AND C. BAUD
Andrew, D., Gowers, J. P., Henderson, J. A., Plummer, M. J., Stocker, B. J., and Turnbull, A. A. (1970). J. Phys. D 3, 320. Antypas, G. A. (1970). J. Electrochem. Soc. 117, 1393. Antypas, G. A., and Edgecumbe, J. (1975). Appl. Phys. Lett. 26, No. 7, 371. Antypas, G. A,, James, L.W., and Uebbing, J. J. (1970). J. Appl. Phys. 41, 2888. Apker, L., Taff, E., and Dickey, J. (1948). Phys. Rev. 74, 1462. Ashley, K. L.,and Biard, J. R. (1967). IEEE Trans. Electron Devices ED-14,429. Ashley, K. L., Carr, D. L., and Moran, R. R. (1973). Appl. Phys. Lett. 22, 23. Auger, P. (1975). Surf. Sci. 48, 1. Bartelink, D. T., Moll, J. L., and Meyer, N. I. (1963). Phys. Rev. 130, 972. Baud, C., and Rougeot, H. (1976). Rev. Thomson-CSF 8, 449. Bell, R. L. (1970). Solid State Electron. 13, 397. Bell, R. L. (1973). “Negative Electron Affinity Devices.” Oxford Univ. Press (Clarendon), London and New York. Bell, R. L.,and Spier, W. E. (1970). Proc. IEEE 58, 1788. Bell, R. L., James, L. W., Antypas, G. A., and Edgecumbe, J., and Moon, R. L.(1971). Appl. Phys. Lett. 19, 513. Black, J. F. et a/. (1970). Infrared Phys. 10, 126. Borziak, P. J., Bibik, V. F., and Dramarenko, G. S. (1956).Izv. Akad. Nauk SSSR, Ser. Fiz 20, 1039. Bradley, D. J., Allenson, M. B., and Holeman, B. R. (1977). J. Phys. D 10, No. 4, 11 1. Brown, F. Williams, and Tietjen, J. J. (1971). Proc. IEEE 59, 1489. Casey, H. C., Miller, B. I., and Pinkas, E. (1973). J. Appl. Phys. 44, 1281. Chang, C. C. (1971). Surf. Sci. 25, 53. Chen, J. M. (1971). Surf. Sci. 25, 305. Clark, M. G. (1975). J. Phys. D 8, Crossley, J., and Small, M. B. (1972). J. Cryst. Growth 15, 275. Csorba, I. P. (1970). R C A Rev. 31, 534. Deitch, R. H. (1970). J. Cryst. Growth 7 , 69. Derrien, J., Arnaud DAvitaya, F., and Glachan, A. (1975). Surf. Sci. 47, 162. Derrien, J., Arnaud DAvitaya, F., and Bienfait, M. (1977). Colloq. Int. Phys. Chim. Surf. Solides, 3rd, 1977, p. 181. Dinan, J. H.,Galbraith, L. K., and Fisher, F. E. (1971). Surf. Sci. 26, 587. Ebbinghau, G., Braun, W.. and Simon, A. (1976). Phys. Rev. Lett. 37, 1770. Enstrom, R. E., and Fisher, D. J. (1975). J. Appl. Phys. 46, 1976. Escher, G. A,, Antypas, J., and Edgecumbe, J. (1976). Appl. Phys. Lett. 29, 153. Estrup, P. (1971). Surf. Sci. 25, 1. Ettenberg, M., and Nuese, C. J. (1975). J. Appl. Phys. 46, 3500. Fiermans, L., and Vennik, J. (1974). Silicates Industriels 3, 75. Fisher, D. G. (1974).IEEE Trans. Electron Devices, August 1974, 541. Fisher, D. G., and Martinelli, R. U. (1974). “Negative Electron Affinity Materials. Image Pick Up and Display” (B. Kazan, ed.), vol. 1. Academic Press, New York. Fisher, D. G., Enstrom, R. E., Escher, I. S.,and Williams, B. F. (1972). J. Appl. Phys. 43,3815. Fisher, D. G., Enstrom, R. E., Esher, J. S.,Gossenberger, H. S., and Appert, J. A. (1974). IEEE Trans. Electron Devices 21, 641. Frank, G., and Garbe, S. (1973). Acta Electron, 16, 237. Garbe, S., and Frank, G. (1969). Solid State Commun. 7,615. Goldstein, B. (1973). Surf. Sci. 35, 227. Goldstein, B. (1975). Rapport AD/A 026-710. National Technical Information Service. Goldstein, B., and Szostak, D. J. (1975). Appl. Phys. Lett. 26, 111.
NEGATIVE ELECTRON AFFINITY PHOTOEMITTERS
35
Goodwin, A. R.,Gordon, J., and Dobson, C. D. (1970). Br. Appl. Phys. Lett. 17, 109. Gregory, P. E., Spicer, W. E., Ciraci, S., and Harrison, W. A. (1974). Appl. Phys. Lett. 25, 511. Gutierrez, W. A., Wilson, H. L., and Yee, E. M.(1974). Appl. Phys. Lett. 25, 482. Hackett, J. (1972). J . Appl. Phys. 43, 1649. Harris, L. A. (1968). J . Appl. Phys. 39, 1419. Hayashi, I., Panish, M. B., Foy, P. W., and Sumsky, S. (1970). Appl. Phys. Lett. 17, 109. Heiman, W., Hoene, E. L., and Kansky, E. (1973). Exp. Technol. Phys. 21, 193. Hoene, E. L. (1970). Trans. f M E K O Symp. Photon Detect., 4th, 1969 p. 29. Holeman, B. R.,Conder, P. C., and Skingsley, J. D. (1974). SERL Technol. J . 24, 6.1. Hughes, F. R.,Savoye, E. D., and Thoman, D. L. (1972). “Application of Negative Electron Affinity Materials to Imagine Devices.” AIME, Boston, Massachusetts. Hughes, F. R.,Savoye, E. D., and Thoman, D. 1. (1974). J. Electron. Mater. 3, 9. Hyder, S. B. (1971). J. Vac. Sci. Technol. 8,228. James, L. W., Moll, J. L., and Spicer, W. E. (1968). Inst. Phys. Conf. Ser. 7, 230. James, L. W., Antypas, G . A., Edgecumbe, J., and Bell, R. L. (1971a). Inst. Phys. Con6 Ser. 9,195. James, L. W., Antypas, G. A., Uebbing, J. J., YepT.,and Bell, R. L. (1971b). J. Appl. Phys. 42, 580.
James, L. W., Antypas, G. A., Edgecumbe, J., Moon, R.L., and Bell, R.L. (1971~).J. Appl. Phys. 42, 4976.
Johnson, E. 0. (1958). Phys. Rev. 111, 153. Kang, C. S., and Green, P. E. (1967). Appl. Phys. Lett. 11, 171. Kesamanly, F. P., Maltsev, Yu. V., Masledov, D. M., and Ukhanov, Yu. I. (1966). Phys. Stat. Solid 13, 41 19. King, S., Dawson, L. R.,Kilorenzo, J. V., and Jahnson, W. A. (1971). Inst. Phys. Conf. Ser. 9,108. Kressel, A., and Kupsky, G. (1966). Int. J . Elect. 20, 535. Liu, Y. Z., Moll, J. L., and Spicer, W. E. (1970) Appl. Phys. Lett. 17, 60. Liu, Y.Z., Hallish, C. D., Stein, N. W., Badger, D. E., and Greene, P. 0.(1973). J. Appl. Phys. 44, 5619.
Manasevit, H. M., and Simpson, W. I. (1972). J. Cryst. Growth 13/14, 306. Martinelli, R. U. (1973a). J . Appl. Phys. 44, 2566. Martinelli, R. U. (1973b) Appl. Opt. 12, 1841. Martinelli, R. U. (1974). J. Appl. Phys. 45, 1183. Mayamlin, V. A., and Pleskov, Yu. (1967). In “Electrochemistry of Semiconductors” (P. J. Holmes, 4.). Academic Press, New York. Miller, B. I., Pinkas, E., Hayashi, I., and Gapik, R.J. (1972). J. Appl. Phys. 43, 2817. Milton, A. F., and Baer, A. D. (1971). J . Appl. Phys. 42, 5095. Mityagin, A. Ya., Orlov, V. P., Panteleev, V. V., Khronopulo, K. A., and Cherevaltskii, N. Ya. (1973). Sou. Phys. Solid State (Engl. Transl.) 14, 1623. Nelson, H. (1963). RCA Rev. 24, 603. Olsen, G. H., Szostak, D. J., Zmerowski, T. J., and Ettenberg, M. (1977). J . Appl. Phys. 48, 1007. Panish, A. B., and Legems, M.I. (1971). fnt. Phys. Conf. Ser. 9, IPPS, London. Papageorgopoulos, C. A., and Chen, J. M. (1973). Surf: Sci. 39, 283. Pinkas, E., Miller, B. I., Hayashi, I., and Foy, P. W. (1972). J. Appl. Phys. 43, 2827. Pohlhausen, K., and Angew, L. (1921). Math. Mech. 1, 252. Pollard, J. H. (1972). AD 750 364. Potard, C. (1972). J. Cryst. Growth 13/14, 804. Ranke, W., and Jacobi, K. (1973). Solid State Commun. 5. Riccus, H. D., et a/. (1966). Can. J. Phys. 44, 1665. Scheer, J. J., and Van Laar, J. (1965). Solid State Commun. 3, 189. Schuman, P. A., Jr. (1970). Solid State Technol. 13, 50.
36
H. ROUGEOT AND C. BAUD
Shade, H., Nelson, H., and Kressel, H. (1972) Appl. Phys. Lett. 20, 385. Simon, A. (1971). Naturwissenschafen 58, 622. Simon, A. (1973). Z. Anorg. Allg. Chem. 395,301. Simon, A., and Westerbeck, (1972). Angew. Chem., Int. Ed. Engl. 11, 1105. Smith, K. L., and Huchital, D. A. (1972). J. Appl. Phys. 43,2624. Sommer, A. H. (1973a). Appl. Opt. 12.90. Sommer, A. H. (1973b). R C A Rev. 34,95. Sommer, A. H. (1973~).5th Inst. Phys. Conf. Ser. 17, 143. Sommer, A. H.,Whitaker, H. H., and Williams, B. F. (1970). Appl. Phys. Lett. 17, 273. Sonnenberg, H. (1969a). J. Appl. Phys. 40,3414. Sonnenberg, H. (1969b). Appl. Phys. Lett. 14, 289. Sonnenberg, H. (1971). Appl. Phys. Lett. 19, 431. Sonnenberg, H. (1972a). Appl. Phys. Lett. 21, 103. Sonnenberg, H. (1972b). Appl. Phys. Lett. 21, 278. Spicer, W. E. (1974) “Study of the Electronic Surface of 111-V Compounds” (AD A 010 802 Oct.). National Technical Information Service. Spicer, W. E., and Bell, R. L. (1972). Publ. Astron. Soc. Pac. 84, 110. Stocker, B. J. (1975). Surf: Sci. 47, 501. Syms, C. H. A. (1969). Ado. Electron. Electron Deoices ZSA, 399. Tietjen, J. J., and Amick, J. A. (1966). J . Electrochem. SOC.113, No. 7, 724. Tiller, W. A. (1968). J. Cryst. Growth 2, 69. Tiller, W. A., and Kang, C. (1968). J. Cryst. Growth 2, 345. Tsai, K. R.,Harris, P. M., and Lassetre, E. N. (1956). J. Phys. Chem. 60,345. Turnbull, A. A., and Evans, G. B. (1968). J. Phys. D 1, 155. Uebbing, J. J. (1970). J. Appl. Phys. 41, 802. Uebbing, J. J., and James, L. W. (1970a) Appl. Phys. Lett. 16, 370. Uebbing, J. J., and James, L. W. (1970b). J. Appl. Phys. 41, No.11,4505. Van Bommel, A. J., and Crombeen, J. E. (1976). Surf: Sci. 1976, 109. Van der Pauw, L. J. (1958). Philips Res. Rep. 13, No. 1, 1. Van Laar, J. (1973). Acta Electron. 16, 215. Van Laar, J., and Scheer, J. J. (1967). Surf: Sci. 8, 342. Van Laar, J., and Scheer, J. J. (1968) Philips Tech. Rev. 28, No. 12, 355. William, B. F., and Simon, R. E. (1967). Phys. Rev. Lett. 18,485. Yee, E. M., and Jackson, D. A. (1971). Solid State Electron. 15, 245.
ADVANCPS IN BLKTRONICS A N D ELECTRON PHYSICS, V O L
48
A Computational Critique of an Algorithm for Image Enhancement in Bright Field Electron Microscopy* T. A. WELTON Physics Division Oak Ridge National Laboratory Oak Ridge, Tennessee and Department of Physics University of Tennessee Knoxville, Tennessee
I. Introduction IV. V. VI. VII.
............................................................. 37 ............ 39
Statistical Error ......... ....................................... Object Reconstruction ........................................... Programs for Numerical Tests of the Reconstructi Presentation and Discussion of Data...............
50
Appendix B. The Wiener Spectrum of the Object Set.................................. Appendix C. Programs for Determining W(k) ......................... ............................................... References ...
94 97
loo
I. INTRODUCTION Since the elucidation by Scherzer (1936) of the role played by the aperture defect in limiting the resolution of the electron microscope, a number of approaches have been explored to obviate such limitation. Scherzer himself proposed several possible improvements (1947, 1949), that of greatest interest being probably the use of multipole corrector lenses (Scherzer, 1947). Subsequent work has unfortunately made it all too clear that this approach, while admirable from a theoretical viewpoint, is highly complex in realization (Seeliger, 1951; Burfoot, 1952; Archard, 1955; Deltrap, 1964). No resolution improvement in an actual microscope has, in fact, been realized, and
* Research sponsored by the Division of Physical Research, US. Department of Energy, under contract W-7405-eng-26with the Union Carbide Corporation. 37 Copyright 0 1979 by Academic Press,Inc. All rights of reproduction in any form reserved
ISBN 0-12-014648-7
38
T. A. WELTON
there is presently good reason for skepticism regarding the true value of this initially promising idea. Other suggested improvements (Thon and Willasch, 1971) were based on the idea of altering the image wave in the back focal plane by selective obstruction and/or retardation, in order to at least partially compensate the extremely unfavorable phase variations introduced by the aperture defect, defocus, and axial astigmatism. When, however, the severe fabrication problems inherent in this approach were ingeniously surmounted, the actual results were disappointing. Some skepticism as to these approaches seems accordingly also presently justified. A third class of improvements has also received considerable attention (Hahn and Baumeister, 1973; Hahn, 1973; Thon and Siegel, 1970, 1971; Stroke and Halioua, 1973; Stroke et al., 1974; Welton, 1970; Langer et al., 1971; Erickson and Klug, 1971) namely, that in which the imperfect micrograph is processed in some way to at least partially compensate for the effect of aberrations. These suggestions have always possessed some attractive features. They do not involve delicate fabrication problems, since they operate on the micrograph, an entity of macroscopic size. These methods further bear a close relationship to methods currently under study in light-optical applications, with much of this experience being directly transferable to the electron-optical problem. As with the other methods mentioned, practical results thus far obtained with these micrograph enhancement methods do not yet live up to their promise, and it is this frustrating situation which the present article is intended to address. We conclude this section by noting that two distinct methods exist for carrying out the work of enhancing a micrograph. The first utilizes coherent illumination of the micrograph (in transparency form), followed by suitable focusing, retarding, and absorbing elements. The result is a real image which provides a finished picture by exposure of a film. This method has been carefully explored by several workers (Hahn and Baumeister, 1973; Hahn, 1973; Thon and Siegel, 1970,1971; Stroke and Halioua, 1973; Stroke et al., 1974). No fully conclusive results appear to have been obtained yet, and it is of considerable importance to understand the difficulties and potential strengths of the method. Further discussion is beyond the purview of this article, and definitive answers must clearly be obtained by practitioners of the methods. The methods to be discussed in detail (Welton, 1970; Langer et al., 1971; Erickson and Klug, 1971) in the present article involve carrying out the required enhancement operations by large-scale computation on a digital representation of the micrograph. As always, these methods carry their peculiar advantages and difficulties, and these desperately require realistic evaluation, especially in view of the fact that here again substantial successes are difficult to find.
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
39
Accordingly, the plan of this article will be to explore in some detail a small (but promising) subset of the computational procedures, with a view to enumerating and evaluating the practical difficulties. Parallel discussions of the optical processing and the purely electron-optical methods are badly needed, as well as detailed comparisons with the purely computational procedures. The method to be used is based on computer synthesis of the micrograph to be processed, a procedure that allows careful control of the quality of the data used. For a number of years, such results (Welton, 1971a) were the only existing substantial evidence for the possible value of computational methods, although a few interesting results have been more recently reported (Welton, 1975), using actual electron micrographs. 11. IMAGE THEORY FOR BRIGHTFIELD ELECTRON MICROSCOPY It is a truism that the enhancement or improvement of any image can only be undertaken if a reasonably detailed theory is available for the process of image formation. We shall see, in fact, that the aberrations of the imaging process, the statistical error level (noise), and some statistical information concerning the object structure must all be reasonably well known before any progress can be made. At this point, we specialize to the case of bright field imaging, the requisite theory being then relatively simple. Although the dark field image mode is more attractive to the eye, it has been found by numerous tests that the bright field mode is not inferior in information content. Once the bright field micrograph is digitized, moreover, the background density level can be subtracted and the contrast level of the remainder expanded to the limits of the final display medium, so that the eye can be fully satisfied. A similar contrast enhancement could, of course, be performed by optical printing onto a high contrast emulsion. We will shortly exhibit the advantage of the bright field mode from the standpoint of computer processing. The basic theory needed is, of course, basically that given by Abbe for the optical microscope, as adapted by Scherzer (1949) for the case of electron imagmg. Some changes will be necessitated by the essential aberrations of electron-optical systems and by the special properties of high resolution electron microscopic samples. Most of these differences will be seen to cause additional difficulties (ranging from moderate to severe) in the electronoptical case, the single exception being the small numerical aperture of the electron microscope. The simple Abbe theory is nevertheless a most useful starting point, and we proceed to set up a treatment adequate for our purpose. We assume the object to be some arrangement of atoms, lying nearly in a plane (to within 100 A or less). The electrons are assumed to be incident all with very nearly the same direction and energy, so that the illumination is nearly coherent. The first Born approximation will be
40
T. A. WLTON
assumed to be reasonably valid for the description of the interaction between the beam electrons and the sample atoms, but we shall discuss briefly the effects of departures from this assumption. We further assume that the field is small enough that coma is small in a sense easily made quantitative. In order to describe the effect of partial coherence, we calculate the image intensity distribution expected from each of a continuous set of perfectly coherent illuminations and finally average these intensities over an assumed distribution function for the energy and direction spread of the actual illumination. These averages must in general be done numerically if high accuracy is to be achieved, but it will be seen that a certain degree of crudity in this calculation can be tolerated for the purpose at hand. A basic approximation will therefore be that of normal (gaussian) distribution for the energies and directions of the incident electrons, a procedure first exploited in the numerical synthesis of bright field micrographs (Welton, 1971a) and in the use of informational concepts to characterize microscope performance (Welton, 1969). Full derivations have been given for the case of energy spread by Welton (1971b) and by Hanszen and Trepte (1971), and for the case of angular spread by Welton (1971b) and by Frank (1973). We now present a version of bright field image theory in a form adequate for our purpose. A particular electron of the illuminating beam is described by a wave function exp[i(lc
+ C)z + it
*
x]
where
and m is the electron rest mass, c light speed, h (h/27r) Planck‘s constant, p mean electron momentum, D electron speed, and /-j = u/c. The quantities ( and 5 are deviations for an individual electron from the mean momentum of the beam electrons. It will be extremely convenient to characterize a beam electron by a longitudinal (or axial) momentum component h(ic + () and a pair of rectangular transverse components he, written for compactness as a two-dimensional vector quantity. In all that follows it will be assumed that C and 6 are distributed in normal fashion. Thus, with N a normalization factor, is the probability per unit volume in the C, 6 space of finding a given beam electron. With a carefully adjusted illumination system, qx can be made equal to q,,, but in any event a particular orientation for the xy axes has been chosen to slightly simplify Eq. (2.3). We may anticipate that the above form for the [ distribution may be quite good
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
41
generally, and that the 5 distribution will be closely of the chosen form for the interesting practical case of the field emission source (Young and Muller, 1959). We note the equivalence of the parameters 6 and q to more familiar forms, thus
where 6 p and 6E are t..e rms momentum and energy spreaL.i, respectively, and 60 and 6x are the illumination angle and the transverse coherence length, respectively.* In the Abbk theory, as used by Scherzer, the interaction of the illuminating beam with the sample is described by computing the phase retardation of the wave corresponding to a particular electron, produced by passage through the electrostatic fields of the sample atoms. We consider a sample atom with coordinate (x,, z,) and electrostatic potential V ( x - x, , z - z,). It will be convenient (and presumably adequately accurate) to take the potentials of the separate atoms as additive and spherically symmetric, thus ignoring the details of chemical binding. It is further plausible that the first Born approximation will be of useful accuracy.? Finally, it is easily shown that a simple calculation of the retardation produced by the sample atom for the undeflected incident wave yields a convenient and accurate form for the scattered wave. For the moment, we take = 6 = 0, and write a3
where the arrow indicates the change in the form of the incident wave produced by passage through the sample. The retardation function 8~ is simply defined by energy conservation. Thus
+ 6 ~ ) ] ’+ m2c4 = (E + mc’ + eV)’ ( ~ c K+)m2c4 ~ = (E + mc2)’
[~c(K
and
(2.6) (2.7)
* The approximate equivalence indicated for 6 would be precise in the nonrelativistic limit and that indicated for q x , qy reflects a confusion in the literature between several nearly equivalent definitions of the width of the angular distribution. t Exceptions will obviously arise when a number of atoms are nearly aligned in the beam direction, or when the average number of atoms per unit area, transverse to the beam, becomes too large.
42
T. A. WELTON
where E is the beam kinetic energy, e is the magnitude of the electron charge, and V is the electrostatic potential (positive for an atom with its positive central charge). A simple calculation yields
where terms in 6rc with powers of V past the second have been ignored. The term quadratic in V will in fact be ignored, as will some further such quadratic terms, and it is of some importance to understand the basis for this neglect. First, as /3 -+ 1, the term in question clearly becomes of vanishing importance. For more modest /3 values, however, another argument is needed. As an extreme assumption, take V = Ze/r
(2.9) so that we are to compare Ze2/rnc2r with unity. Clearly the quadratic term will be unimportant for r > Ze2/mc2 (2.10)
or, taking Z = 92 as the worst case, for r > 92 x 2.42 x
cm (2.11)
or r > 0.0022
A
which figure is far beyond the resolution capability of any imaginable electron microscope. Thus the very small regions around the sample nuclei in which the quadratic term of (2.8) is important simply are not imaged. Following Scherzer (1949), we return to Eq. (2.5) and make a series expansion of the exponential factor containing 6rc. We thus find that an incident wave exp(ilcz) is transformed, by passage through the sample, into the sum of the incident source wave plus an object wave exp(iicz)-+ exp(iicz) + iO(x) . exp(ilcz)
(2.12)
[ im I
(2.13)
dz 8K(X, z)
(2.14)
where
m
iO(x) = exp i
I‘
m
or
O(x) 3
dz 6lc(x, z) - 1
*-m
where the form (2.14) again involves neglect of a term quadratic in V.* Unlike the previous neglection, which involved a comparison of potential * See Appendix A for a discussion of this point.
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
43
energy with rest energy (and was therefore relativistic in origm), this one involves neglect of the mechanism by which the incident wave is systematically diminished in amplitude by interaction with the sample. Again, a “worst case ” analysis suggests that for imaging of single uranium atoms, the linearized analysis will be valid except for unresolvable disks of diameter about 0.03 A surrounding each nucleus. We will assume in the following that all samples (of necessity biological, rather than metallurgical) are sufficiently thin that attenuation of the incident beam is of no consequence. The assumptions made as to the negligibility of terms quadratic in V will be collectively referred to as the “weak object ” approximation. For the moment, we take all sample atoms to be located in the plane z = 0. The optical system following the sample will be characterized by numerous aberrations, most of which can be ignored. The primary spherical aberration and defocus must be considered, and we choose to include the axial astigmatism, since a practical system can never be completely free of anisotropy. For the present, the illumination is fully coherent, so that chromatic aberration does not yet appear. As is usual in microscopic imaging, aberrations involving location in the object plane are of little importance, so that distortion, field curvature, and nonaxial astigmatism will be ignored completely. Coma will be the field aberration which sets the practical limit to the usefulness of the image processing methods to be considered. A simple wave-optical calculation of the propagation through the system will now yield the image-plane intensity Z(x) which is produced by the source wave and object wave O(x). For convenience, we take the system to have unit magnification. As in AbbC’s original theory, it is extremely convenient to decompose the object wave into its Fourier components, each of which can be considered to propagate independently through the system. Thus, if
1
-
O(x) = dk B(k) exp(ik x)
(2.15)
we can calculate the image plane amplitude A(x)= i
[ dk b(k)O(k) exp(ik
*
x)
(2.16)
by multiplying each Fourier component by an appropriate complex function b(k), before recombining the individual components. A given component describes an electron wave propagating at a small angle to the system axis, a convenient description being one that uses an angle 8 with two components, such that 8 = k/K
(2.17)
44
T. A. WELTON
In the absence of aberrations, this component wave would focus to a diffraction-limited “point ” image in the back focal plane. (z = 2F, where F is the focal length of the assumed thin objective lens. In a slightly more complete treatment, the object plane would be at a distance F preceding the first principal plane, and the back focal plane would be at a distance F following the second principal plane.) The center of the point focus for the component k would then be at a transverse point
xB = F8
(2.18)
It will naturally be convenient to think of k as a position in the back focal plane, although the precise focusing implied by (2.18)will be disrupted by the aberrations. The effect of axial aberration is now simply described by computing the amplitude that would be present at the back focal plane, in the absence of aberrations, and multiplying it by a complex function of position, which involves the aberrations. Thus, the image amplitude will be determined by (2.16) with = exp[i+(kll (2.19)
w
and
+(k) = (
N ~ c- ,8 + -YC,I8 14)
(2.20)
The scalar constant C3is simply the usual coefficient of primary spherical aberration (aperture defect), having a magnitude of the order of the focal length for an optimized objective lens. The quantity C, (actually a 2 x 2 symmetric matrix) describes the defocus and axial astigmatism. Thus, we have for the first term in the parentheses of Eq. (2.20) (2.21) A more usual notation would be Clxx = C,
+ ( 4 2 ) cos 2a
Clxy= ( 4 2 ) sin 2a
(2.22)
ClYy= C, - ( 4 2 ) cos 2a
where C1is now the scalar mean defocus, A is the astigmatism, and a is the rotation angle from the x axis to the principal axis of the astigmatism. The absolute sign of 4 is not usually of great importance (although it does determine the sign of the contrast), but it is necessary to note that C, and C3 will have the same sign for the case of overfocus. If it were not for the possible anisotropy of the illumination as described by Eq. (2.3),it would be natural to choose axes so that a would vanish. However, to allow for illu-
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
45
mination anisotropy, it will be convenient to keep the form (2.3), which already selects an orientation and necessitates use of the more general form (2.22). In order to complete the treatment of bright field imaging of a weak object by fully coherent illumination, we must propagate the source wave through the system, add it to the image amplitude, and take the absolute square of the sum to obtain the image intensity. The source wave, from Eq. (2.12), has unit amplitude over the object plane, so that its Fourier transform is given by 9 ( k ) = (2a)-’
dx exp( - ik x)
= 6(k)
(2.23)
The resulting image plane amplitude will then be
-
mC 6(k) 6(k) exp(ik x ) = &(O) = 1
(2.24)
corresponding to the fact that a unit amplitude wave propagates through the system of unit magnification as a unit amplitude wave. Finally the image plane intensity can be written as 1
+
Z(X)
=
11
+ A(x)IZ
or I(x) = A(x) = A(x)
+ A*(x) + IA(x) 1’ + A*(x) + quadratic terms
(2.25)
We will effectively neglect the quadratic terms in what follows, consistently with our assumption of a weak object, but some discussion will be given of the probable error thus incurred. It will be convenient at this point to obtain an expression for O(k) in terms of the actual object structure. We accordingly write for the electrostatic potential of the sample (2.26)
where V , is the potential of the nth atom, at transverse position xn and centered on the plane z = 0. We now have, from (2.8) and (2.14) O(x) = (e/hcB)
OD
n
-w
dz K ( x - x, , z )
(2.27)
46
T. A. WELTON
and from the inverse of (2.15)
The second line of (2.28) is obtained by an obvious origin shift, with the assumption of spherical symmetry for each V, about its nucleus. Finally, we use the definition of the amplitude for electron scattering from an atom, in Born approximation 2me sin kr F(k) = (2.29) r2 dr V(r)h2 0 kr to obtain h O(k) = -(211)-’ C F,(k) exp( - ik * x,) (2.30) mcB n We now combine (2.16), (2.19), and (2.25) to obtain ~
I ( x ) = J’ dk Y(k) exp(ik * x) =i
(2.31)
1’
-
dk €(k)O(k) exp(ik * x)
J’ dk €*(k)O*(k) exp( -ik
- x)
+ quadratic terms
(2.32)
Equation (2.28)defines O(k) as the Fourier transform of a real function of x, so that O*(k) = O( - k) (2.33) and a reversal in sign of k allows the first two terms of (2.32)to be combined. Thus I ( x ) = i J’ dk[€(k) - a*(- k)]O(k) exp(ik x)
-
+ quadratic terms
(2.34)
which, by use of (2.19) and (2.20) becomes
1
-
I ( x ) = dk i{exp[i+(k)] - exp[ - i+(k)]}O(k) exp(ik x)
+ quadratic terms
(2.35)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
47
We neglect the quadratic terms for the time being, and obtain
Y(k) = P ( k ) .O(k) where
P ( k )= - 2 sin
4(k)
(2.36) (2.37)
which nearly completes the derivation of the conventional bright field imaging formula, with neglect of all quadratic terms. We can allow for sample thickness by taking z,, as the axial displacement of the nth atom from the mean sample position (defocus Cl). An effective axial resolution can be defined as the axial displacement 6z which causes the phase factor exp(i8z . k’/2~)to be - 1, for the largest value of k of interest. If 6 is the spatial resolution required, then
k I n/6 = k,,
(2.38)
and 8z = 2n~/ki,, =4
(2.39)
62/1
For 6 = 1 8, and I = 0.037 A (100 keV), we obtain 6z = 100 A. It appears quite feasible to handle thicker samples by use of sample tilt, but we will not attempt to so burden the present discussion. OF PARTIAL 111. EFFECT
COHERENCE
Partial coherence of the incident electron beam is easily handled in the approximation used in the previous section. The essential idea, as used by Hanszen and Trepte (1971), by Frank (1973), and by the author (Welton, 1971b), is to calculate the sum of the image intensities due to each energy and direction of illumination, the distribution of these quantities being given by Eq. (2.3), for the case of a field emission source without condenser aperture, which is the case to be considered here. The case of a single electron of the ensemble now representing the beam is handled by consideration of the more general incident wave function (2.1). For the case of approximately axial illumination, the illumination angle 68 will be smaller than rad, and the fractional momentum deviation will be less than Under these conditions, two potentially troublesome complications are avoided. First, the perspective alteration as the beam direction varies will be so small that for a reasonable size field (order of lo00 A) the resulting shifts in image position will be undetectably small. Second, the magnification variation as the beam energy varies will again produce undetectably small image plane shifts. These two observations
T. A. WELTON
48
make it possible to use a single object function for all electron energies and directions present in the illuminating beam. More precisely, Eq. (2.5) now becomes exp(ilcz + i[z
+ it
*
x) 3 exp(ilcz
+ i[z + i t - x ) exp
[
I
m
dz‘ ~ I C ( Xz’) ,
i J-m
(3.1) The subsequent argument is slightly altered by the presence of the factor exp(i6 * x) in O(x). It is easily seen that Eq. (2.16) must be altered to allow for the modified propagation through the optical system. Thus, to replace (2.16), we have A(x)= i
J dk d(k + 5, K + [ ) O ( k ) exp(ik
*
x) exp(i5 x)
(3.2)
The indicated modification of the transmission factor d leads to a rather complex form when taken literally, but some neglections are very much in order. We neglect the change in C3with electron energy and assume that aF Cl4ICZ
(3.3)
where F is the objective focal length. This condition is very nearly equivalent to Ci 4 F (3.4) which is clearly extremely well satisfied. Finally, we ignore all terms in 4 which are quadratic in 5 and [. The result for 4 is
+(k
+ 6,IC + () = (IC/~)[IC-%Ci k + (c4/2)C31 k 1‘ + x[K-’k C, 5 + I C - ~1 kCI2k ~ 51 + (k’/2~’)( IC *
*
*
*
(3.5)
The contribution to the image intensity I ( x ) is altered slightly from Eq. (2.25), since the source wave in the image plane now becomes
q r ) exp(i5
x)
(3.6)
where S(g)can be replaced by unity, in view of Eq. (3.5). Finally, Eq. (2.35) becomes, with neglect of the quadratic term (2.33),
49
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
There remains only to average over 4 and [, using Eq. (2.3). We use the lemma
/ 1-
m
m
(exp(im))
=
ds exp(ias) exp( - p2s2/2) - W
= exp(
oo
ds exp( -p2s2/2)
2) 2P2
which is simply obtained by completing the square in the exponent of the integrand of the numerator and deforming the contour in a clearly legal fashion. We thus obtain
.
= J dk Y(k) exp(ik x)
(3.9)
where
-
(3.10)
9 ( k ) = 9 ( k ) O(k) with 9(k) =
-2
exp[ - E(k)] sin #(k)
(3.11)
and
(3.12)
-
The notation (C, k)x, of course, signifies
+
Cixxkx+ Clxyky= [C, ( 4 2 ) cos 2a]kx
+ [(A/2) sin 2a]k,
(3.13)
and similarly for (C, * k), . The case of isotropic illumination, with negligible astigmatism, is of particular interest as a limiting case, although the more complete formalism is needed in practical calculations. We then obtain
'
lk12(C, +-i-c 3 K
),I
(3.14)
As will be shown, the form (3.11) will be extremely useful as a theoretical basis for the problem of object reconstruction. The factored form with an
50
T. A. WELTON
envelope function exp( - E) multiplying a simple sinusoid follows simply from exclusion of quadratic terms in the deviation quantities from the exponent of &‘(k).Although we shall use the simple form (3.11) in the numerical work to follow, it should be emphasized that the factored form is actually more general than the derivation here given. It was in fact shown by Frank (1973) that a relaxation of the requirement that +(k) be precisely the function appropriate to the fully coherent case will allow the form (3.11)to be retained. This result will be seen to allow retention of the bulk of the formalism to be displayed herein. A word concerning the quadratic term in I ( x ) will be useful. The 6, averaging which has been carried out for the linear terms leads to no particular simplification for the quadratic term. As a plausible empirical method for estimating the importance of this term, we adopt the procedure of replacing the proper average of the term A * ( x ) A ( x ) by the absolute square of the average of A ( x ) . Thus, we assume
I
I ( x ) = dk 9(k) O(k) exp(ik * x)
+ (A*(x))
*
(A($)
(3.15)
where ( A ( x ) ) = i J’ dk exp[ - E(k)] exp[i+(k)]O(k) exp(ik * x)
(3.16)
We may anticipate that the quadratic term of Eq. (3.15)will be an underestimate of the true term, but that the order of magnitude will be correct. ERROR IV. STATISTICAL In the absence of an optimum coding scheme, any recorded information, such as a micrograph, must be more or less corrupted by noise, or statistical error of some sort. In the case of an electron micrograph, the number of electrons impinging on a unit area of the emulsion will of necessity be finite, and only its expectation value is related to the object structure. At this point, it will be extremely convenient to modify our formalism so that it is oriented to practical computations. We note that we cannot deal with an image in complete detail, but must rather work with a two-dimensional table of sampled image values from a set of picture elements (“pixels”). A given image value will normally be the average of the actual image density over the pixel, and for not too heavily exposed emulsions, this average will be nearly proportional to the actual number of electrons incident per unit area. We now assume that the object function O(x) is periodic in the x plane, and in fact repeats when x or y are increased by an amount S. Such a repeating object will be expected to yield an image structure with one of its
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
51
unit cells, sensibly identical to that which would obtain if only a single unit (S x S) of the object were present. This assumption in fact constitutes a potentially important limitation on the smallness of the sample area that can be practically handled, and every case must be examined individually to determine the width of margin about the edge of the sample area which is seriously contaminated by nonexisting information from the surrounding sample. It can in fact be seen (Harada et al., 1974) that the transverse coherence length is a plausible measure of the width of the defective margin. In addition to this artificial periodicity, we assume that the domain of spatial frequency k required for imaging is finite, and we are thus immediately led to the representation of O(x) and Z(x) by a finite Fourier series. Thus, we make the following assumptions O(x) =
c O(k) exp(ik c 9 ( k ) exp(ik *
x)
(44
x)
(44
k
I(x) =
k
with the obvious inverses
O(k) = N-’
1 O(x) exp( -ik
x)
(4.3)
x)
(4.4)
X
4 ( k ) = N-’
c
Z(x) exp( -ik
*
X
We take the dimension of the unit object cell S to equal NQ,where Q is the chosen pixel dimension. The discrete x values are then given by x
= (x, Y ) = (mQ, nQ)
(4.5)
with m, n = 0, 1, . . . , N - 1. The discrete k values are correspondingly given by k ( k x , ky)= (k . 2n/S, 1 * 2n/S) (4.6) with k, 1 = 0 , 1, ..., N - 1. Mention must be made at this point of the phenomenon of “aliasing,” by which is signified the equivalence of half the range of positive spatial frequencies to negative spatial frequencies. Thus, for the spatial points defined by (4.5),
exp(ik, . x) = exp(ik . 2n/S . mQ) = exp[2nik . m / N ) ] = exp(-2niN
m/N) . exp(2nik * m/N)
= exp[ -2ni(N - k) . m / N ]
(4.7)
52
T. A. WELTON
We therefore think of values of k in the range
0 5 k I(N/2) - 1
(4.8)
as corresponding to positive k,, and values in the range N/2 Ik 5 N - 1
(4.9) as corresponding to negative k, . The precise correspondence for this latter range is obviously (4.10) k, = -(N - k ) * 2x1s
The limiting value k = N/2 yields
k,= -N.n/S = -n/Q
(4.11)
which can be equally well thought of as a positive k , value, since exp(in/Q . mQ) = exp(imn)
(4.12)
which simply alternates between + 1 and - 1 as we pass from a pixel to its neighbor. Note that, if k,,, is the maximum spatial frequency present in an image, then a natural definition of the spatial resolution available is S=Q=
n/km,x
(4.13)
We can now introduce noise in a natural way. Let N, be the number of electrons incident per unit area on the sample, so that N, Q2 is the expected number incident on a given pixel. We designate by (Z(x)) the image intensity calculated in the previous section, since it is just the expected intensity distribution, averaging over statistical error. The expected number of electrons striking a given pixel of the image plane will then be NeQ2[1+ (Z(x))I
(4.14)
The actual number detected will differ randomly from (4.14),with Poisson distribution and standard deviation 6N = (N,QZ)1’2[1+ (Z(X))]”~
(4.15)
The distribution can be taken as gaussian, to high accuracy, and we accordingly write for the observed number of electrons per pixel NeQ2[1+ (Z(x))][l
+ R(x)]
(4.16)
where R(x) is normally distributed with (R) = 0
(R2) = (N,Q2)-’[l
+ (Z(X))]-’
(4.17)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
53
If we now divide by N e Q 2 ,we obtain 1 + Z(x) = [l
+ (I(x))][l + R(x)]
(4.18)
where Z(x) is the measured image intensity distribution. At this point, the special convenience of bright field imaging becomes apparent. We assume (Z(x)) 4 1 and N , Q Z B 1, with two important consequences. First, the noise level becomes independent of position, since we now have
( R ~ =) ( N , Q ~ ) -
(4.19)
and second, the noise becomes additive, since the cross product of Z(x) and R ( x ) in (4.18)can be ignored. We finally obtain
+ R(x)
Z(x) =
(4.20)
which form will allow some interesting manipulation. It is clearly of imporand tance to obtain an estimate of the error incurred by the use of Eq. (4.20), this is done analytically in Appendix A, and computational evidence on this question will be found in Section VII. At any rate, further progress is extraordinarily difficult without the above assumption, either taken literally, or used as the starting point for a sequence of successive approximations. Allowance can easily be made for the imperfection of the image plane detector by dividing the right-hand side of Eq. (4.19)by the detective quantum efficiency (DQE) of the detector (approximately 0.7 for good quality electron emulsions), so that the variance of R now refers to the statistics of silver halide grain development. We now introduce a concept that will be central in the following work. Write (R(x)R(x’)) = C( I x - X’ I )
(4.21)
where C will be referred to as the autocorrelation coefficient for the error function R ( x ) . The indicated average can be defined in several ways, and these will be assumed to be substantially equivalent. The first definition considers a single micrograph and averages over position in it. Thus dS R(x + S)R(x’ + S)
(R(x)R(x’)) = A - ’
(4.22)
A
where A is the area chosen for averaging. The second definition considers a large collection (ensemble) of micrographs, identical save for statistical error, and the average is now the average over this ensemble. The connection between the two definitions lies in the fact that many subareas of a single micrograph can be thought of as the members of a small ensemble. The form chosen for C, as depending only on the magnitude of the displacement between the two points in question, reflects the fundamentally satisfying
T. A. WELTON
54
assumption that the statistical error is somehow independent of position and orientation in the micrograph. A convenient and plausible further assumption is that the error in one pixel is uncorrelated with that in any other pixel. This assumption can, in principle, fail if electron scattering in the detector allows a single electron to cause response (e.g., expose silver halide grains) in two adjacent pixels. In practice, the author has never seen evidence of a requirement for this degree of generality, and we accordingly assume C(
I X - x’I)
= ( N O Q 2 ) - ’S(X- x’)
(4.23)
where discrete values have been assumed for x and x’ and 6(x - x’) is the Kronecker symbol, defined by 6(x - x’) = 1,
x = x’
6(x - x’) = 0,
x # x’
(4.24)
The fundamental imaging equation (4.20) lends itself beautifully to a treatment in Fourier space. We have I(x) = C 9 ( k ) exp(ik x)
(4.25)
k
(I(x)) =
B(k) * O(k) exp(ik * x)
(4.26)
k
.
R(x) = C W(k) exp(ik x) k
(4.27)
where (4.25) is a definition [I(x) being the measured image plane intensity distribution], where (4.26)is just (3.9)and (3.10)rewritten, and where (4.27) is also a definition. These three equations can be immediately combined to yield Y(k) = B(k) * O(k)
+ W(k)
(4.28)
and the statistical properties of W ( k ) are easily deducible from those of R ( x ) . Thus
C(I x - x’ I ) = =
C W ( k )exp(ik * x)W*(k’)exp( -ik’ * x’)
(k. k‘
C (W(k)W*(k’)) exp[i(k - x - k’
k, k’
x’)]
(4.29)
We have simply inserted the definition (4.27) in Eq. (4.21) using R*(x’) instead of R(x’) for convenience (R is real, in any event). It is clear that Eq. (4.29)can be obeyed only if (W(k)W*(k’)) = JV( 1 k I ) 6(k - k‘)
(4.30)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
55
in which case, Eq. (4.29) becomes C( Ix - x’ I ) =
k
N(1 k I ) exp[ik
(x - x’)]
or
(4.31)
N ( k )= N - *
C C(x) exp( -ik
- x)
X
Finally, by use of Eq. (4.23), we obtain
N ( k )= (N,N2Q2)-l
(4.32)
The usual terminology at this point would refer to N ( k ) as the power spectrum (electrical analogy in the time domain) or Wiener spectrum of the function R(x). Finally, we note that a normal, possibly spatially correlated distribution for the values of R ( x ) will imply a normal distribution for the values of W(k), uncorrelated in the k domain. V. OBJECT RECONSTRUCTION
We are now prepared to attack the problem of finding a suitable algorithm for the extraction of object structure information from a measured noisy image. This process is frequently referred to as image enhancement, but the name “object reconstruction” will be used in the following. Such reconstruction will of necessity be incomplete, but we propose to develop simple techniques for evaluating the degree of reconstruction that should be possible, and then processing the micrograph as simply as possible to achieve something like optimum reconstruction. The measured image intensity contains sample information which has been degraded in two distinct ways. By reference to Eq. (4.28), we see that B(k), the modulation transfer function (MTF)of the microscope, will act to make the image function differ substantially from the object function. In fact, 9(k)is simply the Fourier transform of the point spread function P(x), which the optical system (in the absence of noise) convolutes with O(x) to yield I(x). In a conventional optical system P(k) will be slowly varying and B(k) will drop of one sign up to some value k = k,,,. Beyond k =,,k rapidly to zero, with or without rapid oscillation, usually with the assistance of an aperture stop in the back focal plane. The normal result of this behavior for 9 ( k ) is that each point feature of O(x) produces a somewhat diffuse feature with a radial extent of approximately n/kmaX. In the conventional electron microscope, the necessarily nonvanishing aperture defect [C, , from Eq. (2.2011, in conjunction with the defocus (C,) will cause such a rapid oscillation of B(k),and the rapid increase of the envelope function E(k), usually because of beam energy spread [as conclu-
T. A. WELTON
56
sively shown by Frank (197511, will act to cut off contributions to f ( k ) . An objective aperture is normally used to prevent confusion of the image by successive sign reversals of P(k),but we will assume the aperture to be absent. This omission has no serious effect on the image quality; it simplifies the theory and can easily be inserted in the course of computer processing. In Fig. 1, the oscillatory curves are essentially the function F2(k) exp[ - 2E(k)] sin’ +(k)
(54 with a small added constant, irrelevant to the present argument. The variation of F2(k), as indicated, is not important to the argument. The dashed
1.0
I
I
I
I
I
I
I
I
I
I
I
0.9 0.8 0.7
t Il l / m
0.6 -
0.5
I
-
Y
i
\
I
0.4
0.3 0.2 0.1
0
0
20
40
60
80 100 120 140 160 180 200 220 240 k
(I-‘)
FIG.1. Theoretical diffractogram densities for the conventional and high-coherence cases.
curve (labeled “conventional”) and the solid curve (labeled “high coherence”) are both calculated for microscopes having
W = beam energy = 100 keV F = objective focal length = 2 mm C3= C,= spherical aberration coefficient = 1 mm C, = P--aF = chromatic aberration coefficient = 1 mm 2 aP
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
57
The conventional microscope is further characterized by C, = 700 A (underfocus)
6=
($)
=6.4 x low6
rms
dx = transverse coherence length = 20 A
while the high-coherence microscope has C, = 1650 A (underfocus) 6 = 1.0 x
6x=200A At a later stage in the argument, it will be necessary to define the sample and the conditions of exposure. The basic sample is an amorphous carbon film of thickness 10 A, with N, = 500 electrons/A2. This basic sample will be assumed, without further adornment, unless otherwise specified. The conventional curve in Fig. 1 is characterized by a single dominant peak, with a following train of small further peaks (invisible on this scale, except for the first such). The choice of C, (the precise criterion will be discussed shortly) is qualitatively such as to extend the large peak to as high a value of k as possible, subject to the requirement that the peak also remains " high " as " long as possible " (with these meanings to be clarified). This last requirement tends to be thwarted by the chromatic aberration contribution to E(k), the relevant term being The high-coherence curve of Fig. 1, on the other hand, is permitted by the lesser value of 6 p / p [which yields 2E,h,(k) = (k/4.129)4]to display a train It will subsequently of peaks still of substantial importance at k = 3.0 kl. appear that the integral jokm"k dk exp( - 2E) sin2 4
(5.3)
is of prime importance in defining image quality, so that the maintenance of substantial values for the critical envelope function exp( -2E) over the widest possible range is extremely desirable. We now proceed to derive a suitable expression for the image quality, and in so doing, we shall have found the essentials of an interesting reconstruction procedure. Consider again the fundamental imaging equation (4.28). With the assumption of normal distribution for the noise function Wb),we can write an expression for the probability distribution of the function 9 ( k ) about its expectation P(k)O(k). We use an obvious bracket
T. A. WELTON
58
notation for the probability, with specified conditions first and result second, thus
Mk)Iy(k)}=
z-’ exP[-ck
I 4 k ) - P(k)@(k)I’ / 2 N ( l k l ) ]
(5.4)
where Z is a suitable normalization constant. The computation of 2 yields useful practice with this formalism. First, note that the sum over k includes each term twice, because of the reality condition f * ( k ) = 9(- k) (5.5) We consider a single k vector only, and write 9 ( k ) - P(k)O(k) = 9 ( k ) x + iy (5.6) The full normalization constant 2 is clearly a simple product of such constants, one for each k vector, thus Z=
fl z(k)
(5.7)
k
where m
z(k) =
m
dx -m
-m
d y exp[ - I x
+ iy I’/N(k)]
m
= 211
jor dr exp[ - r2/N(k)]
= nN(k)
We now verify the form (5.4) by calculating the variance of W(k). Thus
( I~(k)1’)= (x’
+ y 2 ) = (r’)
Jb
m
= 2n
r3 dr exp[ - r’/N(k)]
r dr exp[ - r’/N(k)]
Equation (5.9) agrees precisely with Eq. (4.30)if we note that the distribution (5.4) has no correlation between the variables for different k vectors.
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
59
The solution of the reconstruction problem requires a probability distribution of a type different from that of Eq. (5.4). We need {Y(k)1 O(k)}, i.e., the probability that the observed Y(k) implies a given O(k). Elementary intuition requires a close connection with (5.4), since a narrow distribution of 9 - 9 . O would seem to imply a close relationship of the implied 0(k) to the measured 9(k). At this point we use the fundamental theorem of Bayes, concerning inverse probability. Thus
V ( k ) I O(kN = 2- ‘{O(k)) {O(k) IY(kN *
(5.10)
where 2=
j
. {O(k)} . P ( k ) I Y(k)}
(5.11)
The integration is over all possible values of all the O(k), and is hopeless from the viewpoint of practical computation unless the integrand is of very simple form. The functional {O} is a somewhat shadowy quantity whose meaning we now attempt to make clear. It will be called (after Bayes) the prior probability of the specified object structure O(k). We may consider that the procedures used for preparing the samples define an average number of molecular species of various allowed types to be expected on a typical sample of the set. The locations and orientations of these species in a particular sample of the set cannot be known in advance, and it is the task of microscopy to attempt to define these parameters. We can make a simple statement about the statistics of the object set, by considering the quantity
(@(k)O*(k’))
(5.12)
Consideration of the spatial isotropy and homogeneity of the object set requires
(O(k)O*(k))
=Y
( k ) 6(k - k’)
(5.13)
where the angular brackets now imply an average over the object set. It is usually trivial to subtract off (O(k)) from O(k) itself, so that without loss of generality, we can take the distribution of O(k) to have zero mean. In some hypothetical simple cases, the absence of correlation implied by (5.13) may actually persist for higher moments of 0, but in practical cases the simplicity of an uncorrelated normal distribution, which obtained for g ( k ) is not to be expected for O(k). We nevertheless proceed by assuming the simple normal distribution to hold. Thus, we propose to write (5.14)
60
T. A. WELTON
with an obvious normalization constant required to convert the proportionality to an equality. We now take seriously the relation (5.10), with insertion of (5.4) and (5.14). The result can be beautifully simplified because of the several normal distributions which are being compounded. Thus
with a suitable normalization again required. Any function of the 9 ( k ) alone can be factored out, to be absorbed in the normalization, and the essential 0 dependence will be left as
{f(k)l 0(k)) exP[- w / 2 1
(5.16)
with
u = ( 9 - 1 + M-1P’)1012
+9
- &--9(9* * 0
*
0*)
= ( 9 - 1 +M-~P’)[plz-Jlr-lP
+ Jlr-19q-1(9* 0 +3 = ( 10 - (0)I2/A’) - (1 (0)I2/A2) x
(9-1
*
*
0*)] (5.17)
The vector k appears as the argument of every quantity in Eq. (5.17) and is therefore conveniently omitted. The quantities (0) and A are given by (0)= 9.9q.4P’ ”
+ &”)-I
A = (91 + SzJlr- 1)-
1/2
(5.18) (5.19)
The term in I (0) 12, left in U after completing the square, is a function of 9 only and can be dropped by absorbing it in the overall normalization constant. The drastic simplifications thus far introduced have clearly led to some simple results; their utility is yet to be determined. We note that (O(k)), from (5.18),is a mathematically simple estimate of the most probable object function following from a specified image, while A is an equally simple expression for the r m s error present in this estimation. Both forms are plausible. Consider Eq. (5.18) for a range of k such that the statistical error Jlr is very small. We then obtain
(0(k))
N
P - ’ ( k ) .3(k)
(5.20)
which corresponds to a naive attempt to compensate for the attenuation of the various Fourier components introduced by the imperfections of the optical system. Such an attempt must fail and is in fact always frustrated by
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
61
noise. We rewrite Eq. (5.18) as ( 0 ) = [1+
( N / Y 9 2 ) ] 9 - *13
(5.21)
where the ratio (5.22) plays the role of a dimensionlessnoise figure (noise/signal),which becomes overwhelmingly large as we approach any zero of 9(k). The rms object spread A correspondingly approaches 9”12, as it should, when the noise figure (5.22) becomes large. For values of the noise figure small compared with unity, on the other hand, the object spread A can be very much less than the limiting value Y1/2. These observations correspond nicely to our expectation that only where 9 ( k ) is sufficiently different from zero can any sharpening of our prior object distribution be expected. An extremely important concept, easily introduced at this point, is that of the informational content of a micrograph (Welton, 1969,1971b).On this we follow the work of Fellgett and Linfoot (1955), who were first to apply the now standard ideas of information theory to optical images. We consider the normalization integrals required for the prior object distribution (Z,) and that for the distribution which holds as a result of the micrograph. Thus Jtr/Y92
Z o = dO exp and
[
[
-
k
IO(k) - (O(k))
Z = dO exp k
1
(5.23)
12/2A2(k)]
(5.24)
C 1 O(k)I2/29(k)
The concept of information content of the micrograph is subject to a difficulty of the same sort encountered in giving a classical definition of the entropy, and we will here also content ourselves with computing the increase in sample information resulting from the micrograph. Thus, we write
1
11
(5.25)
and analogously for I, using {9IO } for the distribution appearing in the integral. As in the usual statistical mechanical derivation of the entropy, we obtain (5.26) I0 = -1% z o - C ( IO(k) 12)/9(k) k
I = -log Z - C ( I O(k) - (O(k)) I2/A2(k) k
(5.27)
62
T. A. WELTON
where the averages indicated on the right are easily done. The results are 10
= -log
zo - c
I = -1ogz-c
(5.28) (5.29)
where the constant C is infinite (or at least uncomfortably large) but is the same number for both distributions. Finally, we write 61 = 1 - 1 0 = log(Zo/Z)
(5.30)
for the information content of the micrograph and proceed to evaluate Zo and Z, using the result (5.7). We obtain
zo = fl [nY(k)]1/2
(5.31)
k
Z=
n [KA~(~)]”~
(5.32)
k
each of which is wildly infinite. We finally obtain for 61
(5.33) =
c l o d l + 92(k)Y(k)/-4’-(kll
3k
Note the use of the exponent 4 to allow use of an unrestricted sum over k values. Finally, we pass to an integral over k, so that 61 = (48~’) dk log[l
+ 8’(k)Y(k)/-4’-(k)]
(5.34)
and we have a natural definition for information density (61/A),A being the area of the micrograph. It is now of considerable interest to evaluate Y ( k )for a simple class of object, in order to see a little more of the meaning to be attached to Eq. (5.34). Consider the object set to be that in which each micrograph, of area A, is known to contain a single atom of known species, but with completely uncertain location. We then ask for the positional accuracy which can be achieved by study of a single micrograph. Consider Eq. (2.30) for B(k) in terms of the electron scattering amplitude F ( k ) and the atomic position. An obvious modification is required to convert (2.30)to the form appropriate for the discrete Fourier series representation and we obtain O(k) = (h/mcfl)A- ‘ F ( k ) exp( - ik x,)
(5.35)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
63
The average over the object set here means averaging over x, ,the position of the single unknown atom. Thus Y ( k )= A - ’
dxA(h/mcfi)’A-’F’(k) A
= (h/mcfi)’A-’FZ(k)
(5.36)
Equation (5.34)now becomes, using Eq. (4.32)for A’”&),
1
61 = ( 4 8 ~ ’ ) dk log[l
+ N , A - ’ ( h / r n ~ B ) ~ F ’ ( k ~ ~ ( k )(5.37) ]
This equation simplifies if we assume A
4N,(h/mcfi)’F2(0) % 600 A2
(5.38)
where N , = 500 electrons/i(’, the beam energy is 100 keV, and F is taken to be that for mercury. The logarithm in the integrand of (5.37) can now be expanded, so that
j
00
61 = ~,(4n)-’(h/mcfi)2 k d ~ ( k ) ~ ( k )
(5.39)
0
with the assumption that 9 depends only on k( = I k I ). We now consider the information increase corresponding to localization of the atom within area 6 A . The probability of fmding the atom within any cell of area 6 A is just 6A/A. As a result of analysis of the micrograph, the probability becomes unity for the cell actually occupied, and zero for all other cells. The information change is then
61=1*log1+o~logo+o*logo+ - (A/6A) *
( 6 A / A ) log(GP/A)
(5.40)
= log(A/GA)
where 0 . log 0 = lime+oE log E = 0, and the factor A / 6 A is simply the total number of cells over which the summation is to be performed. Finally, we define an effective accuracy of location 6x = (6A)’/’, given by 6x = A’/’ exp( -61/2)
(5.41)
Some typical numbers will be given for 6x shortly, but we first wish to emphasize the utility of the simple expression (5.39)as a convenient measure of microscope performance. We should also emphasize that while the quality of conventional imaging depends on the range of k in which 9 ( k )has no sign change, the criterion (5.39)has no such requirement. We should then think
64
T. A. WELTON
of 6x, as given above, as the true resolution parameter, capable of being realized by suitable processing of the micrograph, even though the image is badly blurred by sign changes in B(k). This informational approach to the definition of image quality was the original motivation for the work reported in Welton (1969). The recognition that image quality in a real sense does not directly depend on absence of aberration leads immediately to the question of how to extract in a useful fashion the full information content of a blurred micrograph. A parallel question was also immediately asked, namely, how to design a microscope with the best possible informational performance. The result of these considerations was the so-called high-coherence microscope (Worsham et al., 1972, 1973). These same considerations have clearly been explicitly or implicitly important in the work of Siege1 (1971), and of Chiu and Glaeser (1977). The essential consideration in microscope design is the reduction of the effect on P(k)produced by instabilities and incoherence. Thus, the form of the envelope exponent E(k) given in (3.12) imposes the necessity for adequately small rms spread in focal length and small illumination angle. These matters have recently been carefully discussed by Chiu and Glaeser (1977). In view of their considerations, it would seem probable that the most important gain yet to be made may lie in the reduction of the chromatic aberration coefficient by use of the composite magnetic and electrostatic objective proposed by Rose (1971). The traditional effort to eliminate the primary spherical aberration now appears less important in itself, although not without a point, as will be seen. Unfortunately, such an elimination of the aperture defect remains a formidable task. Similarly, a substantial increase in beam energy seems to be less important in its own right than as another possible method for reducing the information loss caused by chromatic aberration. We do not wish to give an extensive discussion of radiation damage, but here note that for given BZ(k), 61 is simply proportional to N , //Iz.The damage produced by the illumination is (N,/f12)f(/I), where f(/I) increases very slowly with energy up to about 500 keV, and then more rapidly. In other words, the ratio of 61 to damage decreases only moderately with energy, until the electrons become relativistic. The computation of 61 is easily extended (Welton, 1971b) to a case of much greater practical interest. An atom of interest will of necessity reside on a substrate of some sort. The substrate will not normally be resolved into its atoms, but will constitute an important contribution to the noise level of the micrograph. We assume as substrate an amorphous carbon film of thickness t A. We further assume the atoms of the carbon film to be randomly distributed (not strictly true, of course, but probably not in serious
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
65
error, for present purposes). We return to the basic imaging equation (4.35) and rewrite it as Y(k) = B(k) . O(k)
+ 9(k)
+
O,(k) 9 ( k ) (5.42) where O(k) is the object function for the atom of interest, and Os(k) is the object function for the substrate. Previous results will be unchanged on making the substitution *
+ B’(k)( I Os(k) 1)’
N ( k )= (N,N’Q’)-
(5.43)
The procedure leading from (5.35) to (5.36) immediately yields
( I o,(k)
1)’
= (N’QZ)-’(h/~~B)’Nc~’(k)Ff(~) (5.44)
where F,(k) is the electron scattering amplitude for carbon, and N,( = t/lO) is the number of substrate carbon atoms per A’. We proceed to give a small tabulation (Table I) for the two standard cases described earlier in this section [conventional (CONV) and high coherence (HC)]. We consider substrate thicknesses of 0, 5, and 10 A, and TABLE 1
Case CONV HC CONV HC CONV HC
t
(4
(A)
61
bx = 512 exp(-61/2)
0 0 5 5 10 10
32.3 63.9 5.6 12.1 3.3 7.0
5.0 x 7.0 x lo-’* 3 1.0 1.2 99.0 15.0
assume that a single mercury atom is to be located within a field of 512 x 512 A’. For mercury, we use a simple but adequate approximation for F(k), namely, (5.45)
We see that the approximate doubling of 61 achieved by passing from the conventional case to the high-coherence case has a striking effect on the potential resolution dx. Note also the very serious degradation produced by even very nominal substrate thickness. The very small values of 6x obtained for t = 0 are, of course, meaningless (aside from the unavailability of such samples) unless suitably small pixels (Q < 6x) are used. We return now to the expression (5.18) for the most probable object function. The derivation given is suggestive, albeit based on a seemingly
66
T. A. WELTON
crude assumption for the prior probability {O(k)}. In this regard, it is of considerable interest that the same formula follows by application of an argument originally given by Wiener (1949).* Wiener's argument seeks to find the convolution on the image that yields an estimate of the object function with the smallest possible mean squared deviation from the true object function. It is assumed that the object set distribution is uncorrelated with the noise distribution, and the squared deviation is assumed to be averaged over both distributions. We also note that Wiener's derivation was originally given for the case of an electrical signal, in an accurately linear system, with strictly additive gaussian noise. In addition, he proposed realizing his filter with a passive circuit, so that the desired convolution was required to be over the past history only of the corrupted signal. Modern communications usage takes full advantage of storage and computation to allow a realization fully analogous to that expressed by Eq. (5.18). Accordingly, we define W(k) = Y(k) * P(k)[Y(k)PZ(k)+ N(k)]-'
(5.46)
and make the quantity W(k) the central concept of the reconstruction algorithm to be tested. A further specialization must be made, in order to achieve practical results. We must have a simple standard assumption for Y(k), which can be an extraordinarily complicated object if its definition is taken literally. As discussed further in Appendix A, optimal use of a micrograph requires that Y(k) takes into account all prior information on the probable numbers of various molecular species present, as well as available information on bond lengths and angles. We here content ourselves with a minimum of such information, namely, the probable numbers of atoms of various species present, assuming each atom to be randomly distributed over the field of the micrograph. The result will be Y ( k ) = (h/mc/3)2(N2Q2)-1 NaF:(k)
(5.47)
0
where N o is the number of atoms of species a per square angstrom and F,(k) is the electron scattering amplitude for an atom of species a. In actual practice we will take advantage of the qualitative similarity in shape of the Fa&)curves for all 2 values, for the k range of interest, and define the sample by an equivalent number of carbon atoms. Thus ~c
=
C Na(F,Z(k)/F,Z(k))
(5.48)
a
* Rohler (1967) has given a proof directly applicable to the optical image problem.
67
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
with the indicated average probably best defined as knur
c a
j”
=
(5.49)
d ( k 2 )~ ( k ) / ~ ( k )
0
The values of the Caare not strongly sensitive to the value of k,,, but for illustrative purposes, we give some typical values appropriate for 1-Aresolution (k,, = K A- ’) (see Table 11). TABLE I1 a
C
N
0
P
Br
Hg
Th
C,
1.oooO
1.1878
1.2765
3.1100
9.9092
28.9600
34.9200
It is apparent that there is no serious danger that we have thus overestimated the available prior information. In fact, almost all the atomic arrangements considered as possible in (5.47)could be ruled out by considerations of bond lengths, repulsion radii, and similar information. It will, however, become clear from Appendix A, and from the discussion of Section VII, that any attempt to build into Y ( k )more detailed prior information will incur severe computational difficulty, as well as serious danger of artifact production. It is finally not at all obvious that the crude assumption (5.47)is sufficiently refined to give a useful result, and it will only become clear from the evidence of Section VII that real progress is possible. VI.
NUMERICAL TESTS OF THE RECONSTRUCTION ALGORITHM
PROGRAMS FOR
From the work of the preceding sections, it is a simple matter, in principle, to set up simple numerical tests of the Wiener reconstruction algorithm, as modified by the assumption (5.47)for Y ( k ) .Several such tests have been made (Welton, 1971a), using computed bright field micrographs of simple objects as the input data, with promising results. Several tests (Welton, 1975; Welton et al., 1973)have, in addition, been made using actual micrographs, without notable success in the first case, but with modest success in the later attempt. It is unfortunately fair to say that no conclusive proof (or disproof) has yet been given of the practical utility of the Wiener reconstruction algorithm, and the present work accordingly had for its principal motivation the evaluation of that algorithm. As a starting point, it was decided to use a group of computer programs developed by the author (Welton, 1974;Welton and Harris, 1975)for use in
68
T. A. WELTON
processing actual micrographs. The flexibility and generality of this system of programs recommends it as a basis for careful testing, particularly in view of the success it has yielded in the handling of several simple micrographs (Welton, 1975).The author has, however, experienced considerable difficulty in procuring micrographs taken under adequately controlled conditions, and it was thought advisable to conduct the present tests on easily controllable synthetic micrographs. We accordingly describe in useful detail at this point the computational procedures followed.* These procedures are to be thought of as job steps (in the IBM 360 sense), and they should be cataloged in the disk system of the computer so that they can be conveniently invoked in various combinations. The first such job step is given the name OBJECT, with the task of supplying the atomic coordinates for the desired sample, exclusive of substrate. One available option positions the atoms for a DNA double helix having a size of 600 nucleotide pairs (Plate la). This molecule is built around an axis that is smoothly bent into a space-fillingcurve occupying a roughly rectangular plane region approximately 240 x 280 A. The molecule is about 20 A thick and is assumed to be placed on a carbon film no more than 10 A thick, so that a single defocus value can be used for all atoms of the sample, at the assumed pixel size of 1 A [cf. Eq. (2.3911. The axial displacement of each atom is actually calculated and transmitted by the program, so that relatively thick objects can be studied, if desired. The micrograph to be produced will have 512 pixels in each direction, this number being large enough to allow interesting results and small enough to keep costs down. The reconstruction programs have been tested at size 1024 pixels, and size 2048 pixels could be handled with only minor changes. To allow for a range of pixel sizes and numbers, the atomic coordinates are actually calculated and transmitted to an accuracy of 1 part in 4096, although the subsequent job step may not use such accuracy. Note that for a picture 1024 x 1024 8, the available position accuracy would be 0.25 A, which certainly cannot be resolved. In the sample shown in Plate la, a mercury atom has been substituted for the phosphorus atom of each phosphate group, This replacement is naturally not intended to be chemically realistic, but does indicate the sort of atomic spacing and identification with which we would like our methods to deal. Another version of OBJECT produces text (Plate 4a) composed of letters, each of which is represented by a dot matrix whose dots are single atoms of thorium, mercury, bromine, phosphorus oxygen, nitrogen, or These programs have all been carefully optimized and rather fully tested, and can be made available on request to any interested investigator. Unfortunately, proper optimization goes outside the usual Fortran language and has only been done for the IBM 360, 370 system. Optimization for another system should not be difficult.
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
69
carbon, as desired. The separation of these dots is chosen as 2 A in Plate 4% to make objects that are directly recognizable on a fairly coarse scale as Roman capital letters, while on a finer scale, the individual atoms may become detectable. In both versions, as many as 25,200 individual atoms can be accommodated, so that rather complex and interesting objects can be produced, by simple modifications of the basic program. A second job step (IMAGE)carries out the formulation of the preceding sections to produce I ( x ) in the form of a file containing density information for the computed bright field micrograph. The input is the file containing the atomic locations produced by OBJECT, plus the parameters describing the microscope used and the conditions of exposure. The output file will normally have one byte (0-255) integers for the various pixels, arranged in 512 records (scan lines) of 512 bytes each. The statistical error would have to be incredibly small for greater precision to be required. Considerable simplification is required in the description of the atomic scattering amplitudes, because of the complexity of the problem. It has been found that a reasonable approximation is the following
~ , ( k= ) A,/(k2
+ a 2 )+ ~ , / ( +k ~B2)
(6.1)
where it is essential that a and fl be independent of the atomic species a. The representation (6.1) is not required to be accurate over all k, but only over the range corresponding to the desired resolution. We have followed the practive of choosing a and /3 to allow a reasonable shape difference between the two terms and then adjusting A, and B, to yield precise values of F,(k) for k = 0 and k = ~tA-'. Because of the reasonable shapes of the two terms, the resulting fit is quite good, certainly more than adequate for the exploratory purpose we have in mind. Table 111 lists the values chosen (a = 3.0781 A-',
B=
3.9738 A-').
TABLE 111
C N
0
P Br
Hg Th
26.1966 10.6589 0.0034 88.8238 58.4816 143.5691 429.9931
-4.8222 17.0567 31.3891 - 62.0841 14.7710 -29.1951 -375.9774
(4 300
(4
PLATE1. (a)Object-DNA, 300 A x 300 A. (b) Image-EMPTY-HC,10 A-NORMAL, 300 A x 300 A. (c)Image-DNA-COW, 5 A-NORMAL, x 300 A. (d) Diffractogram, EMFTY-HC-10 A-NORMAL, 271A-' x 271A-'.
A
72
T. A. WELTON
The first task of IMAGE is to calculate two object functions, by use of the given atomic coordinates and the A, and Ba values. Thus
C Aa S(X - xa) = C Ba S(X -
OA(x) =
a
O,(X)
Xa)
(6.2)
a
For simplicity, it is assumed that each atom is repositioned to the center of the pixel in which it lies, an assumption that cannot introduce serious error at the resolution level implied by the assumed pixel size. The sum over atoms is to include the sum over all substrate atoms, which are introduced by taking (with the help of a random number generator) the number of substrate carbon atoms in each pixel from a Poisson distribution, with mean determined by the assumed thickness of the film (a lo-A film is assumed to have an average of 1 carbon atom/A2). The two real functions (6.2) are taken to be the real and imaginary parts of a complex object function, which is then Fourier transformed (fast Fourier transform). From the transform of this complex object function, it is simple to extract the separate transforms O,(k) =
C A, exp(-ik
xa)
a
O&) = C Ba exp( - ik x,)
(6.3)
a
From these it is a simple matter to form the full object function (A = NZQZ= area of sample) 6(k)= (h/mc/l)A-
C Fa@)exp(-ik
*
xa)
a
= (~/mcS)A-'"A(k)/(kz
+ ".I
+ PB(k)/(k2 + S'll>
(6.4) The amplitude function d ( k ) is then formed by introducing the complex instrumental MTF b(k), as implied by Eq. (3.16).Thus
@) = exp[ - W
l exP"k)l
(6.5)
and d ( k ) = ib(k)* O(k)
(6.6) Note that the amplitude (6.6)has been averaged over the energy and angular spreads of the illuminating beam. At this point, we find the Fourier transform of d ( k ) ,yielding (A(x))= i
k
b(k)O(k) exp(ik x)
(6.7)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
73
which is just the discrete analog of Eq. (3.16). We then obtain (I(x)) [or rather the estimate given by Eq. (3.1511 The final step is to form I(x) according to Eq. (4.15),with R(x) drawn from a normal distribution, again with the help of a random number generator. More explicitly, we take I(X)
= (I(x))
+ (N,Q2)-'/2[1+ (I(x))]'/~ . r(x)
with r ( x ) a random function, without correlation, and with normal distribution for each x value. Thus (r(x)) =0
(r(x)r(x')) = S(x - x') and the distribution function for r is just P ( r ) = (274- 1/2e-r2/'
(6.10) (6.11)
Finally, the function I(x) is tested for minimum and maximum values, and a one-byte integer (0-255) calculated from
(6.12) B(x) = 255[1(x)- Iminl/(lmax - Imin) and output as the previously defined disk file. The input parameters for IMAGE include the complete list of parameters thus far introduced, plus one other artificial, but rather useful, constant. In order to make convenient checks of the importance of the quadratic term in Z(x), all scattering amplitudes can be multiplied by a factor FCV, and at the same time the electron dose N, is divided by (FCV)'. This has the effect of leaving all pictures unchanged (signal/noise unchanged) if the quadratic term is negligible. The third job step is named XFORM, and has the simple function of obtaining the Fourier transform of a disk file in the format produced by IMAGE. The resulting transform file consists of N 2 ( N = 512) eight-byte complex numbers, and is therefore too large to be conveniently saved. Instead, it is passed to subsequent job steps as a scratch file, to be deleted at the end of the complete job. The step XFORM is the starting point for three important procedures. These all have as a probable end the production of a two-dimensional display from a data file in the standard format produced by IMAGE. The job step for display is called PLOT and takes considerable advantage of the useful characteristics of a rather ancient cathode ray tube plotter (Calcomp, Model 835). A Fortran-callable subroutine generates a tape that will direct the electron beam to any desired pixel and produce the desired optical
74
T. A. WELTON
density, This process is surprisingly economical and has produced all the plates in this article. Some tests by the author indicate that use of one of the more modern film writers (Perkin-Elmer Model 1010A, for example) would produce neater results, with an unfortunate capital cost (or rental fee) attached. One available procedure passes the transform file from XFORM to a job step XPLOT, which has the function of forming a new file, which is essentially the absolute square of the transform file (put on a logarithmic scale to avoid some major uncertainties in scaling). This new file is then rescaled and output as a new file in standard one-byte format. With PLOT as the third job step, a display is produced of a function which is essentially the diffractogram of the starting data. A second procedure follows IMAGE directly by PLOT to display the Z(x) file, as a synthetic micrograph. A third procedure has as its purpose the determination from a given micrograph the best values for the instrumental and exposure constants required to compute (O(x)), the Wiener estimate for the object function. These job steps are called SPEC (power spectrum) and WIENER. These are sufficiently described in Appendix C, and suffice it to say at this point that SPEC has the transform file from XFORM as its input and WIENER has as its output the constants required to compute W(k). A fourth procedure passes from the transform file to another job step FILT (Jilter) which accepts as input the constants for W(k),as well as the transform file Y(k), from XFORM. It constructs a new file W(k)Y(k),and Fourier transforms it to produce (O(x)). This file is then put in standard one-byte form and output for use as input by PLOT. AND DISCUSSION OF DATA VII. PRESENTATION
We are now ready to describe and discuss some typical results obtained by application of the formalism developed in Sections 111, IV, and V, together with the programs of Section VI and Appendix C. We summarize the basic assumptions. The objects studied consist of one of three ordered arrays of atoms mounted on one of three substrates. The ordered arrays are designated as DNA, TEXT, or EMPTY, while the three substrates consist of random carbon films of nominal 0, 5, and 10 A thickness. Each micrograph consists of a square field 512 x 512 A, with a pixel size 1 x 1 A. The full field may not be displayed, the actual displayed area being indicated in the caption. The arrays called DNA and TEXT are described under the job step OBJECT in Section VI, and they are displayed at high resolution in Plates l a and 4a, respectively. The array called EMPTY has no atoms present, and the corresponding display for it would be simply a blank
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
75
square. The two parameter sets describing the conventional microscope (CONV) and the high-coherence microscope (HC) have already been listed in Section V, and the electron dose N, has been uniformly set to 500 electrons/A2.Finally, the cases are distinguished by the value chosen for the parameter FCV, the value 1.0 being designated as NORMAL (some of the quadratic terms being included approximately correctly),while the value 0.1 will be designated as LINEAR (the quadratic terms now being artificially reduced by a factor of ten with respect to the linear terms). The two displays named “ object ” are in reality very special images in which the highest possible resolution is provided for. The substrate thickness is taken as zero, all aberration coefficients are made to vanish, the electron illumination N, (electrons/AZ)is taken to be an extremely large number, and a phase shift of 4 2 is inserted in the source wave (as though a retarding film of suitable thickness were inserted in the center of the back focal plane) in order to produce a bright field image intensity in the absence of aberrations. The display then would contain all the available sample information if the resolution of the display system were adequate. Comparison of the displays of a 300 x 300 A square with a 100 x 100 A square suggest that thedisplay system is not yet limiting at 300 x 300 A. The individual mercury atoms are in fact quite apparent where they extend (in projection) to the outside of the double helical structure. Note also the easy visibility of the base planes, which are seen edge-on in the straight sections of the molecule. The curved sections show peculiar phenomena arising from the fact that the 33.4 A required for the helix to complete one turn about the helical axis is a substantial fraction of the 50 8, required for the shorter radius turns and 150 A required for the longer radius. At the 512 x 512 A level chosen for the TEXT displays however, it is apparent that the display resolution is limiting. As elsewhere described, the characters are composed of atoms located on a 2 x 2 A lattice, so that in horizontal or vertical lines, the atomic spacing is 2 A, while in the diagonal direction, it will be 2.82 A. Inspection with a magnifier shows (in the originals at least) no significant discrete structure for horizontal and vertical lines, while a definite discrete structure is seen for the diagonal lines. The displays labeled “image ” are actual computed micrographs with sample, exposure, and instrumental conditions as indicated in the captions. The displays whose captions refer to “original image” are the result of an attempted object reconstruction based on the indicated micrograph, and using a Wiener function W whose origin is indicated in the following parentheses as “theory,” or a plate number. If a plate number is given, then W was actually calculated (by use of SPEC and WIENER) from the indicated micrograph. We are now ready to begin an orderly presentation of the evidence. Plate
76
T. A. WELTON
l b shows the typical “phase grain” seen in bright field micrographs of amorphous films. The sample has one carbon atom per pixel on the average, 0, 1, and 2 being then the dominant occupation numbers. It is important to note that no structure resolvable with the stated instrumental conditions is present, and we see only a badly blurred image of the atomic array forming the substrate. This micrograph is probably best characterized by its diffractogram, as displayed in Plate Id, or the computed smoothed function shown as the “high-coherence” curve in Fig. 1. The connections between the features of the radial plot of Fig. 1and the circular pattern of Plate Id are quite apparent, but we must again emphasize the extremely noisy character of the diffractogram. This noise originates primarily from the random locations of the substrate atoms and secondarily from the statisticaI error imposed by the 500 electrons/A2illumination. For later comparison, we show in Plate l c the conventional image of the DNA object placed on a 5-A carbon substrate. Atomic resolution is naturally not present, but examination with a magnifier reveals features with an extent of about 3.5 8, which appear to result from the merging of two mercury atoms which occasionally lie close in projection. Less prominent features can be found with an extent of roughly 2.0 A, which is reasonably consistent with the curve labeled “ conventional” in Fig. 1. In Plate 2a, we display a high-coherence micrograph with real content. The apparent resolution is very poor, as would be anticipated from the phase reversals seen in Fig. 1. The first such reversal in fact comes at k = 0.82 A, which would correspond to a S value of lt/0.82 or 3.8 A. As noted in the caption, the micrograph computed with FCV = 0.1 (LINEAR) has an appearance indistinguishable from that of Plate 2a. Substantial differences do, however, appear in the portion of the diffractogram with k 5 1.0 A-i, and it was accordingly felt important that the effect of the approximated quadratic terms on the object reconstruction should be studied. In Plate 2b is displayed the result of processing with the W(k) which results from the use of the job steps SPEC and WIENER (Appendix C). Note the unfortunate reversal of the gray scale, which somewhat hampers comparison with other reconstructions. At this point, it will be useful to discuss briefly the calculation of W(k) by the adaptive procedure (Appendix C) and compare the results with the W(k)obtained more accurately, but less directly. Figure 2 displays the function 9(k) obtained by SPEC, from the image of Plate 2%plotted as a function of k along four radial lines. The plots are somewhat crude but illustrate clearly the difficulties involved. The four plots coincide reasonably well for k 2 0.6 A-1, and have generally the shape of the idealized curve of Fig. 1. The complex behavior for k I0.6 A-’ can be shown to be due entirely to the detailed object structure (DNA organization of the object atoms, rather than some other arrangement of the same atoms) and cannot easily be used in a statistical investigation.The irregulari-
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
77
ties that remain for k > 0.6 A-’ do not prevent a reasonably accurate estimate of the essential parameters for the construction of W(k). A better estimate can be obtained from the image of the substrate alone (Plate lb), which completely lacks the complex behavior at small k, and which yields better agreement of the four radial plots for larger k. Finally the true function W(k)can be calculated from Eq. (5.46) and the parameters listed for the high-coherence instrument. By simple division, we write in a slightly - - more convenient form
where +(k) is assumed to have the standard form defined by Eqs. (2.20), (2.21), and (2.22). The function F,(k) is the amplitude for electron scattering from carbon, and the form (5.47) for Y ( k ) has been assumed, with the additional simplification embodied in Eq. (5.48). The constant C is a composite of the equivalent carbon atom count N , and the electron count N , , and the envelope exponent E(k) is given the general form of Eq. (3.12),although the special form (3.14) suffices for our assumption of rotational invariance. For purposes of easy comparison, we write 4 and E in a simple standard form W x , k y ) = 41ki+ 4 2 k x k y + 43k,” + 44k4 (7.2) E ( k x , ky)= Elk: E2 kx ky + E3k,” (7.3)
+
+ k2(E4k: + E,k,ky + E,k,”)+ E,k6
and present the values of the A i , B i , and C in Table IV. TABLE IV Case
Constant
2a
lb
4.9166 0.0028 4.9319 -0.5157 -0.5236
4.9631 0.0015 4.9690 -0.5197 -0.1196 0.0063 -0.1397 0.06342 -0.00195 0.06721 - 0.00668 0.1388
Theory ~
0.0008
-0.4469 0.1924 O.oo00
0.1808 - 0.01823 0.5 164
4.9062 O.oo00
4.9062 -.0.5110 0.001204 0.000000
0.001204 0.0012336 0.0000000
0.0012336 O.oo005223 0.1072
W 4
(4
(dl
PLATE2. (a) Image-DNA-HC, 10 &NORMAL (LINEAR has identical appearance), 300 A x 300 A. (b) Original image from (2a). Processed with W (2a). Note photographic reversal from other processing results. 300 A x 300 A. (c) Original image from (2a). Processed with W (theory).300 A x 300 A. (d) Original image had same assumptions as (2a), except LINEAR instead of NORMAL. Processed with W (theory). 300A x 300A.
80
T. A. WELTON
4 .O
0.9 0.8
-
0.7
-
+
++
+
>
+:
L
0.6
0
+
VO +
+
00
SMOOTHED DIFFRACTOGRAM
O&+
0
a4m
2
0.5
i)
a+ 0
I + 0
0.4
a X
0
0.3
8 +
0.2
0.4 0
0
0.4
0.8
4.2
4.6
k
2 .o
2.4
2.8
(8-')
FIG.2. Diffractogram densities obtained by smoothing from the Fourier transform of a high-coherence image.
The rotational symmetry of the system is reflected in the equality of r$l and 4~~, E , and E 3 , and E4 and E , ,as well as the vanishing of E , ,and E5.These rules are reasonably well obeyed in the approximate forms (2a) and (lb), the accuracy being considerably better in the 4i than in the Ei . A false anisotropy is introduced into the system by the necessary anisotropy of the object. This anisotropy is anticipated to be markedly greater for the DNA + substrate than for the substrate alone, and this is well borne out, for the 4i at least. It should be clear that 4(k) must be accurately represented because of the requirement that the factor sin 4 in W(k) should strongly suppress contributions to the reconstruction from k values near the zeros of
+,,
4.
The accuracy of the fit to E ( k ) is clearly less crucial, since it is not
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
81
required to produce any precise cancellations. This is fortunate, since the false anisotropies introduced in the Ei deduced from (2a) and (lb) are substantial. We also note that the detailed values of the Eiin the first two columns bear no obvious relation to the exact values in the third column. This failure reflects the fact that the number of parameters allowed for the fitting process is larger than the smoothness of the data would justify. Nevertheless, the E ( k ) from Plate l b lies close to the true function over the range of k of interest. The variation in the value of C is to be regarded more seriously. This parameter is directly proportional to the noise-to-signal ratio, and the value obtained from the actual micrograph (2a) appears to be significantly higher than the theoretical value given. Since the known ambiguities of the problem suggest only a downward revision of the theoretical C value, the discrepancy between C = 0.5164 and 0.1072 must be regarded as potentially serious. The effect produced in reconstruction by use of too high a C value is to reduce unduly the use of image information near the zeros of +(It), and it is reassuring to fmd the essential features of the displays of (2c) and (2d) present in (2b) as well. A word is necessary concerning the assumptions used in obtaining C theoretically in the various cases considered. For a simple carbon substrate of thickness (A), we have
N, = t/lO (carbon atoms/A*)
(7.4)
which corresponds to a total of 262,144 carbon atoms distributed over an area of 512 x 512 A’. The DNA molecule contains approximately the equivalent of 50,000 carbon atoms [Eq. (5.48)and Table 111, and TEXT contains the equivalent of 122,000 carbon atoms. In Table V, we give the resulting theoretical C values for the various micrographs. The values given were not the ones actually used, the discrepanciesbeing due to ambiguities of definition not apparent until late in the process of data collection. Some further discussion of the quadratic effects is now in order. A comparison (not shown) of the estimated Wiener spectra for Plates lb, 2a, TABLE V Plate
3b 3b, except 5 A 2a
4c 5b
C(theory)
C(used)
0.6683 0.1847 0.1072 0.1323 0.08713
0.3010 0.2127 0.1160 0.1128 0.07826
82
T. A. WELTON
and the corresponding micrographs with FCV = 0.1 (the LINEAR cases) show that the quadratic effects are important in the low-k region where the gross irregularities of Fig. 2 are apparent. The irregularities are due to the real object structure, as previously stated, but their magnitude changes with FCV as would be expected from the quadratic terms. The comparison of the NORMAL and LINEAR versions (lb), on the other hand, shows no irregularity, but does have a smoothly varying quadratic contribution. It should be noted that the quadratic terms will not show zeros in the Wiener spectrum at the zeros of +(k), so that their presence can falsify the estimate of the noise level (Appendix C) and hence alter the C-value. This effect will presumably raise the estimated C-values above the theoretical values and plausibly account for the discrepancies noted in the last line of Table IV. The very much improved C value obtained by computation from the micrograph of the substrate alone appears to result from the very much smaller quadratic effects associated with the absence of heavy atoms. It is therefore strongly suggested that a single actual micrograph be divided into two adjacent squares, one containing sample and the other consisting of substrate only. To reasonable accuracy, these will have the same instrumental, exposure, and substrate parameters. The adaptive procedure of Appendix C can then be profitably applied to the square of substrate only. The scattering power of the sample square should be estimated from higher k values (k > 1 A-', say), and in this way reliable C values should be experimentally obtainable. These questions clearly deserve more detailed investigation than has been permitted by the time and cost limitations of this study. We are at least prepared to state empirically that the uncertainties in the C value discussed above do not fatally affect the results of reconstruction. This is clear by detailed inspection of Plates 2b, c, d, and 3a. Comparison of Plates 2c and d in addition demonstrates that the quadratic terms do not seriouslyinterfere with reconstruction, at least with the thin and relatively weak samples considered here, We now note Plate 3b, which is the only case of a reconstruction attempted with a C value substantially smaller than the correct theoretical value. This discrepancy, of more than a factor of two, will cause k values too close to the zeros of 4 to be used in the reconstruction. The resulting amplification of noise close to these contours produces a characteristic artifact consisting of fringes having spatial frequencies for which 4 vanishes. It may be of interest in this connection to display W(k) in some detail, together with some properties of its Fourier transform, the Wiener kernel W ( x ) .Figure 3 shows a plot of W(k),calculated with the HC parameter set from Eq. (5.46). The absolute magnitude is plotted for convenience, a sign reversal being understood at each dip to zero. The values assumed by
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
83
c$ at each of these zeros is indicated. The absolute magnitude shows minima , a rise as k approaches near points where C#I is an odd multiple of ~ / 2 with a multiple of K. This rise is terminated by a catastrophic fall to zero (and reversal in sign) as the available signal strength falls below the noise level. Consider the Wiener kernel
5
W ( r )= ( 2 ~ ) - dk ~ W(k)e-ik ’
(7.5)
where W ( k ) is given by (5.46), which can be rewritten as W(k)= l/4[exp( - E ) sin C#I
+ i ( N / 4 Y ) - ‘I2] + c - c
(74 We see that W ( k ) has a series of simple poles, each near a value k , of k which makes c$ equal a multiple of K. These poles are displaced off the where real axis of k by an amount proportional to (N/9’n)-1’2, 9, N Y ( k , ) . More precisely, W(k) can be represented as a sum of simple poles W ( k ) = C Rn(k2- (:)n
+c
*
C.
(7.7)
where (, is approximately given by Cn
= kn
+ iy,
and
The residues R, are of no importance to our argument but the values of the y n , the imaginary displacements of the poles, are in fact central. If the Fourier integral (7.5) is performed on W ( k ) in the form (7.7), a sum of terms will be obtained of the form (7.10)
with (7.11)
The function Hi*?is the usual Hankel function with 1 or 2 being taken (according to the sign of yn) to force an exponential decay of W, for large
(4
(4
PLATE 3. (a) Original image identical with (2a), except LINEAR. Processed with W (lb). 300 di x 300 di. (b) Image-DNA-HC-O,&LINEAR, 300 di x 300 di. (c) Original image from (3b). Processed with W (theory). 300 A x 300 di. (d) Original image identical with (3b), except 5
A substrate. Processed with W (theory).300 di x 300 di.
86 4
3
5 2
z
1
0
FIG.3. The Wiener function for reconstruction of a high-coherence micrograph.
r. Thus, W,(r) behaves at large r as
-
(7.12) W,(r) R, r - ‘I2 exp( L ik, I ) exp(- Iy, Ir ) The characteristic lengths Iy, I - are also displayed in Fig. 3. The k, values are of interest in defining the periodicities of artifacts which can appear in reconstructions for which inappropriate noise figures have been used. The lengths ly,,l-’ are of importance in deciding how large a “frame” is required around an area to be reconstructed. We would anticipate that a micrograph of side S would allow reasonable reconstruction of a smaller square of side S - 2y,$, ,where yminis the minimum of 1 y, I. We have not encountered this limitation in the computations here presented, because of our assumptions of periodicity for the data, but in practical work, the data square used for Z(x) should be substantially larger than 2yii: if gross inefficiency is to be avoided. Another possible procedure for reconstruction would perform the convolution of W with Z directly, without recourse to Fourier transformation. This direct convolution can, of course, be more efficient if yiif is relatively small compared with the size of the data square, but the author knows of no real test of this method. Finally, comparison of Plates 3c and d with the preceding recon-
’
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
87
structions shows the obscuring power of the substrate. A simple estimate, using the ideas of Eq. (5.43) and (5.44), indicates that the noise introduced by 1 A of carbon is roughly equivalent to that arising from an illumination of 500 electrons/A2 and accordingly, 10 A of carbon corresponds to about X, electrons/A2. Plates 4 and 5 make use of the object TEXT [shown in (4a)] to test in more graphic form some of the foregoing conclusions. In (4b) is shown the conventional micrograph of TEXT on a 5-A substrate, while (5a) shows the effect of a lo-A substrate. No discrete dot structure is ever visible, and lines of dots appear as lines of about 2 A width. The effect of the substrate in blurring the object is quite apparent. With high-coherence imaging of TEXT, in (4c) and (5b), we see an image apparently devoid of meaning. The rectangles corresponding to the individual characters retain their identity, but the aberrations have led to an impenetrable disguise for each character. Note that the image (4c) was formed with the quadratic terms in full force. The reconstruction (4d)is a striking illustration of the potential power of the Wiener algorithm. The characters formed of thorium and mercury atoms are fully visible over a 5-A carbon substrate, while the bromine atom characters are partly discernible. The lines of the individual characters have the minimum width of lA, and no real ambiguities are present, in spite of the many opportunities available. Some artifacts are, however, visible, including a slightly distorted rendering of “S,” and it is easy to imagine that a character set two or four times as large could show definite ambiguities. Finally, a pair of micrographs (Sb) of identical appearance were computed, both with 10 A of substrate, but one NORMAL and the other LINEAR. The extremely gratifying reconstructions (5c) and (5d) show a little more obscuration by the thicker substrate, but no important differences ascribable to the quadratic terms. APPENDIX
A. ESTIMATESOF QUADRATIC EFFECTS
We have more or less systematically ignored terms in the image function that are not linear in the object function. The quadratic effects of relativistic origin [Eqs. (2.8) ff.] seem to be always very small at resolution levels presently available, or likely to be attained, and will not be further discussed. Several effects, however, remain as real worries. The first is seen from Eq. (2.13). We define m
q(x) =
1
-m
dz
~ K ( x Z) ,
W 00
(4
(4
PLATE 4. (a) Object-TEXT, 512 A x 512 A. (b) Image-TEXT-COW-5 A-NORMAL, 512 A x 512 A. (c) Image-TEXT-HC-5 A-NORMAL, 512 A x 512 A. (d) Original image from (4c). Processed with W (theory). 512 A x 512 A.
(4
P1
PLATE 5. (a) Image-TEXT-CONV, 10 A-NORMAL, 512 A x 512 A. (b) Image-TEXT-HC, 10 A-NORMAL (LINEAR has identical appearance), 512 A x 512 A. (c) Original image from (5b). Processed with W (theory). 512 A x 512 A. (d) Original image identical with (5bX except LINEAR. Processed with W (theory). 512 A x 512 A.
92
T. A. WELTON
so that, by Eq. (2.13),
O(x) = -i[exp(iq) - 13
(A4
Past this point, we systematically approximated the object function by O(x) = rl
(A.3)
so that our first question concerns the magnitude of the next term in the expansion of the exponential. A seemingly unrelated question concerns the possible error incurred by the neglect of the quadratic term in the image plane intensity Z(x). There is, in fact, a connection between these two questions arising from the requirement of electron conservation. We consider the simplest case of coherent imaging, with no inelastic scattering allowed. The wave function for a typical electron is taken as
$ = A(x) exp(ilcz)
(A.4)
where A(x)= 1
before the sample plane, and
after the sample plane. If no inelastic scattering is present, q(x) will be real, and the intensity of the electron wave is unaltered by passage through the sample. Thus
I 1 l2 = IexP(i?) I2 = I 1 + iq - (q2/2) + ...l2 = 11
+ iO(x)l2
(A.7)
where the quadratic term q2/2 has the obvious function of canceling the absolute square of the linear term iq,when the cross term between 1 and q2/2 is formed. A similar interplay of linear and quadratic terms occurs in the image plane, to enforce overall electron conservation, and an interesting theorem (essentially the requirement of unitarity for the electron-atom scattering matrix) results. Wherever the real amplitude F ( k ) appears, it should have added to it an imaginary term
F , ( k ) = (4a~)-’
a’F(#)F( I k - k I )
(A4
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
93
Specialization to the direction k = 0 yields F,(O) = (4nrc)-'
5 dk F y k )
(A.9)
which is easily seen to be identical with FI(0) = rc(4n)- 'aT
(A.lO)
where gT is the total elastic scattering cross section, for an electron incident on the atom in question. Considerations of electron conservation dictate that the form (A.lO) will hold in general, if oT includes the inelastic scattering, although the proper generalization of Eq. (A.8) is too complex to be usefully presented here. It can, however, be qualitatively indicated that the contribution to F,(k) arising from inelastic scattering will decrease much more rapidly with increasing k than will the contribution from elastic scattering. This is so because the spatial extent of the inelastic mechanism (outer electrons, mainly, and largely excited by rather remote collisions) is relatively large. In the main body of the text, we have at all times assumed that F ( k ) is purely real, and we now see that examination of the magnitudes of the quadratic terms is, in fact, inseparable from the question of the error incurred by ignoring the imaginary part of F(k). We here give sample results from two methods of investigation. First, we tabulate q(r) for the cases of a carbon atom and a mercury atom on axis (x. = 0; r = I x I). The simple analytic amplitude expression (6.1) will be used, with the constants from Table 111. The function q(r)becomes logarithmically infinite at r = 0, but, as previously, we assume that only the features beyond a reasonable resolution circle will be of importance. We obtain Table A.1 from which it appears that q2 6 q for conditions of interest. Exception would clearly occur for cases where too many atoms are approximately aligned, but this situation is unlikely for thin amorphous samples. TABLE A.1
0.50 0.65
0.0106 0.0057
0.0572 0.0310
We also give some approximate results for F ( k ) and F,(k), again for carbon and mercury. The FIvalues given in Table A.11 are those calculated on the assumption of elastic scattering only. The value of Fl,,-(O) will be increased by approxi-
94
T. A. WELTON
TABLE A.11
k(A-’)
Fdk)
Fi,c(k)
0.0 1.0 2.0 3.0
2.765 2.501 1.944 1.418
0.107 0.105 0.100 0.092
F~s(k)
FI,d
k)
3.120 3.062 2.916 2.683
14.931 13.505 10.498 7.657
(Fl/F)c
(Fl/F)Hs
0.0387 0.0420 0.0514 0.0649
0.209 0.227 0.278 0.350
mately 0.1 A if inelastic scattering is included, with however a rapid decrease of this addition as k increases. The addition is roughly the same for mercury, so that the dominant effect in the heavy element is that arising from the elastic scattering. The impression conveyed by Table A.11 is similar to that of Table A.1, namely, the effect of the nonlinearities is likely to be relatively unimportant, but not negligible. As indicated in Sections V and VI, the effect of the simplest nonlinearity (neglect of the I A 1’ term in I)can be directly checked, and this has been extensively used in Section VII. The modification of the test programs to allow use of exp(iq) instead of 1 + iq is a simple one and obviously should be carried out at an early date.
APPENDIXB. THEWIENERSPECTRUM OF THE OBJECT SET The Wiener spectrum of the object set Y ( k ) is a central concept for the present discussion and deserves a somewhat fuller discussion. We first ask how Y ( k )will reflect prior assumptions about the object set more restrictive than that actually used to obtain Eq. (5.47).As an example, suppose that the sample is known to contain precisely N , “molecules.” A “ molecule ” is a planar array of atoms with accurately known interatomic distances, and each such structure can be translated and rotated (about an axis perpendicular to the sample plane) at random. Thus
c
Nm
O(x) =
K(x - x,,
n=l
or O(k) =
cN-2 c c exp(- ik c exp(-ik n
=
n
=
c U ( k ) exp(ik k
*
x)
(B.1)
K(x - x, , 4,) exp(- ik x)
X
xn)K2
n
=
4,)
c K(s, 4,) exp(-ik -
s)
(B.2)
8
x,)X(k,
cos 4n- k, sin d,, k, sin +,,
+ k, cos $,,)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
95
We now form (exp(ik x: - ik * x , ) )
(O(k)O*(k‘)) = n
n‘
x ( ~ ( k +,n ) X * ( k ,
4,s))
03.3) A simple lemma is needed,
where we are to average over all x , and 4,. namely, (exp(ik’ x,)) = d,,, d(k’ - k)
-
so that
+ (1 - d,,,) d(k) 6(k) (~(kP*(k)) = d(k’ - k) C I ~ ( k4,) , 1’ n
(B.4) (B.5)
plus a term which vanishes except for k = k‘ = 0. It will immediately be recognized that the angular average on the right-hand side of Eq. (B.5) must give a result independent of the direction of k, and of n. We then obtain, from the definition (5.13)
jOzn d 4 I ~ ( k4,)
~ ( k=)~ m ( 2 n )1-
12
(B.6)
If, now, we assume specific atomic species and coordinates for the structure of a “molecule,” we obtain X(k,4 ) = (h/rncS)A-’
a
Fa(k) exp( - i k
- xa)
03.7)
where k can be taken as the vector (k cos 4, k sin 4). We then find
’
Y ( k )= N,(h/mc/3)2A-Z C Fa(k)Fa3(k)(2n)aa*
x
jOznd 4 exp[(ik’
= N,(h/mcS)’A-’
*
(xa - x..)]
C Fa(k)Fa,(k)Jo(kI xa - xa, I )
aa‘
In order to describe the form of 9 ( k ) ,it is convenient to consider the terms with a = a’ separately. Since Jo(0) = 1, we obtain Y ( k )= (h/mcS)2(Nm/A2)C F,Z(k) a
+ (h/mcS)’(Nm/A’)
C Fa(k)Fa*(k)Jo(kRaa*)
a#a’
(B.9)
The first term on the right is immediately identifiable as the definition (5.47), with an obvious notational change. The second term has a much more
96
T. A. WELTON
complex structure and will oscillate violently for all k values satisfying kR,i” Z
(B.lO)
where Rminis the minimum interatomic distance Ru,. It is clear that if we were certain of all the R values for a “molecule,” the form (B.9) would yield superior results in an attempt to reconstruct a micrograph. In actual fact, the unavoidable presence of such factors as radiation damage and rotations around other axes than the single axis considered must operate to smear badly the oscillatory part of (B.9). The result must certainly be that the oscillatory term is sharply reduced from the idealized form given, and we shall not further agonize over the truncation used. Finally, we attempt to evaluate the gaussian assumption for the prior distribution of O(k). We first point out the obvious and unfortunate fact that the set of O(k) corresponding to randomly distributed atoms fails to satisfy some obvious relations between its moments, which would be required for a normal distribution. Consider the first moment, with the nonessential simplification that all atoms ( N , per unit area) are assumed identical. The factor h/rncfl is also dropped, for convenience. We accordingly obtain
O(k) = ( N 2 Q 2 ) - ’ F ( k ) (exp( -ik
*
x,))
(B.11)
n
and use the obvious lemma (exp( -ik * x,)) = 6(k) to obtain (O(k)) = ( N 2 Q Z ) - 1 ( N , N 2 Q Z ) F ( 6(k) 0)
(B.12) (B.13)
If we do the inverse transform, we find O(x) = N,F(O)
(B.14)
which corresponds precisely to the phase shift produced in an electron wave by propagating through the mean electrostatic potential of the sample atoms. A trivial subtraction of (O(k)) from O(k) itself yields a set of quantities with zero mean. Unfortunately, the cubic average of obvious interest
(B.15) (O(k’))l”(k’’) - (O(k”))I) still fails to vanish and the hoped-for uncorrelated normal distribution is therefore skewed and correlated. It is finally of interest to ask what sort of object set would yield a normal, uncorrelated distribution for the O(k). The following form
([O(k) - (qk))l[O(k’)
-
O F ) = (h/rncg)(N2QZ)-’
L,F(k) exp( - ik x,) a
(B.16)
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
97
has the desired properties if the x, are randomly distributed, as previously, and if the numbers [, are normally distributed, with zero mean, and without correlation. We are saying here that all atoms have the same shape for their charge distributions, but have normally distributed strengths ([,). We can obtain an extraordinarily good approximation to a normal distribution for the O(k) if we assume just four discrete possible values for the [,, namely, [, = +a, f3.146264a
(B.17)
with probabilities of occurrence 0.908248 and 0.091752, respectively. Products of 1, 2, . . . , 7(k) factors would then all have the desired averages, while the eightfold product is of the desired form, but with a numerical coefficient 81, rather than the gaussian value 105. An element of unrealism arises with the introduction of negative scattering amplitudes. This can be partially repaired by adding a suitable constant to all amplitudes and subtracting ( O ( k ) ) from the definition (B.16), but full realism for a realizable object set seems difficult to achieve. In view of the apparent importance we have attached to the quantity Y ( k ) , and the further assumption of a normal distribution for the O(k), it would appear to be of considerable interest to carry through reconstruction of a synthetic object, constructed according to (B.16). The comparison of interest would be with the corresponding reconstruction of a more realistic object. This comparison could clearly be carried out by a trivial modification of the job step OBJECT, but has not yet been performed. The crucial nature of the assumption of normal distribution is actually somewhat reduced by the existence of the original derivation of our reconstruction algorithm by Wiener, who would have worked only with 9 ( k ) , with the assumption of a best least-squares fit between (0) and 0. APPENDIXC. PROGRAMS FOR DETERMINING W(k)
Two job steps were not described in Section VI, and a brief outline will be attempted here. Bright field electron micrographs have the extraordinarily useful property that essentially complete information on the instrumental and exposure parameters for a given micrograph can be extracted from that micrograph. There thus becomes feasible a procedure which we shall term “ adaptive reconstruction,” (Welton, 1974; Welton and Harris, 1975) and which has been called “ blind deconvolution ” by Stockham, Cannon, and Ingebretson (Stockham et al., 1975). The basic procedure to be described would use as input the absolute square of the Fourier transform of the measured image function Z(x). This can quickly be obtained in reasonable form on film exposed in the focal plane of a focused coherent light beam, in the path of which is placed the
T. A. WELTON
98
transparent micrograph. The resulting diffractogram (or its computed equivalent) has a striking appearance (see Plate Id for a display of a diffractogram computed from a high-coherence synthetic micrograph). The function displayed is extremely noisy, and the innocuous appearance it presents to the eye is the result of a very large amount of sophisticated averaging and filtering which is performed automatically in the eye-brain system of the viewer in an attempt to create some sort of order out of a most disorderly situation. If, for example, a plot is made of the density values along a radial line passing through the center of the display, only a vague hint of the apparently systematic lightdark alternation will survive. The diffractogram can nevertheless be used to obtain reasonably accurate estimates of Ci, A, 01 [Eq. (2.2211, C3 [Eq. (2.20)],6 * (dF/alc),qz ,and qy’ [Eq. (3.12)], N , [Eq. (4.1111, and N, [Eq. (5.48)]. The basic observation is that if we define the diffractogram density as
W) = ( 14k)I2)
(C.1) where the angular brackets indicate a “ suitable” smoothing of the measured values, then 9(k)
-
+
Y ( k ).P2(k) N(k)
(C.2) The constant of proportionality in this relation turns out to be irrelevant, and Y ( k ) ,P(k), and N(k)have the meanings given in Eqs. (5.13), (3.11), and (4.39), respectively. Because of the vanishing of B(k) when +(k) is a multiple of n, we can obtain values of N(k)along such contours (“dark rings ” in the diffractogram). With the assumption of reasonable smoothness for M(k), subtraction of an estimated N(k)can be made along contours where +(k) is an odd multiple of 4 2 , and Y ( k )exp[ - 2E(k)] can there be found by using sin 4 = L- 1. The function Y ( k )is then represented in terms of F,(k) and an equivalent density of carbon atoms N , , and a sampling of values for E(k) can be obtained by division. The known analytic form for +(k), Eq. (2.20), is to determine the of great importance in using the contours of minimum 9@) aberration parameters and also in reliably locating the contours where sin = f 1. The procedure here outlined is simple, in principle, but surprisingly difficult to carry out. Conceptually it is clear, but rigorous it very definitely is not. Central to the whole procedure is a conjecture which states (approximately, at least) that all reasonable forms of averaging will lead to essentially the same result. A rather similar assertion of the equivalence of time averaging and ensemble averaging underlies statistical mechanics, without proven fault. For the problem at hand, we do not expect quite the same equivalence of spatial and ensemble averaging, but the procedure used is plausible and not easy to improve on.
+
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
99
The definition of the smoothing process to be used for 9 ( k ) is then central to our procedure. If there were no axial astigmatism in the microscope, a simple circular averaging would be clearly appropriate. Although expert operation and careful selection of micrographs should allow such circular averaging in principle, it was felt that a less demanding procedure would be much more generally useful. The averaging procedure chosen was therefore convolution of IY(k) 1' with a gaussian kernel of adjustable width. Thus
9 ( k ) = (R2/n) dk' exp[ - R2 I k
- k'
12]
IY(k)1'
which operation is naturally carried out by double use of the fast Fourier transform. Thus, define the autocorrelation function for Z(x) by D(x)=
k
IY(k) 1'
exp(ik * x)
(C-4)
multiply by the transform of the kernel of (C.3), and calculate the inverse transform. Thus exp( - I x I2/2R2)D ( x ) exp( -ik
9 ( k )= N - 2
- x)
(C.5)
X
which can be calculated with great efficiency to yield a smoothed version of the diffractogram. Some caution is necessary in the choice of the parameter R. If R is chosen too large, the smoothing will be inadequate, whereas with too small an R, the essential light-dark alternation will be strongly attenuated. Simple inspection of the easily obtained optical diffractogram will yield a value for the shortest wavelength effectively present, and the R value can then be easily set. The job step just described (SPEC) produces as output a temporary disk file containing 9(k), which is then available as input for the step WIENER. This final step is simple in principle, but quite complex in execution. Essentially, it explores the variation of 9 ( k ) along a set of 20 radial lines (through k = 0), covering an angular range of 180" in the k space, at uniform angular increments. The other half space is identical and can contribute no further information. Along each radial line, the k values for the minima are noted, and the values of 9 ( k )recorded. A decision is made for each minimum as to what multiple of A the phase 4 there equals. Once this tabulation is complete, the values of k at each angle at which 4 is an odd multiple of 4 2 can be found and the corresponding values of 9(k) can be recorded. A least-squares fit (linear) can then be made to +(k), by adjusting C1, A, a, and C3. If the function E ( k ) is taken as a sixth-degree polynomial in k, and k, (even powers only), another linear least-squares fit
100
T. A. WELTON
will determine the coefficients, as well as the required estimate for N,. The noise parameter N, is, of course, determined from the values of 9 at the minima. A useful degree of immunity to the twin hazards of too large or too small an R value has been built into the program, which appears to perform easily and reliably. Mention should be made of several other procedures that have been proposed. The first (Frank et al., 1971) has been tested and appears workable. Since it requires a nonlinear least-squares fit, it may perhaps prove a bit balky in some cases. Another (Frank, 1976) appears to be potentially as useful as the method we have described. It requires two micrographs taken under identical conditions, but appears applicable to cases where the number of minima in 9(k) may be too low for the above procedures to work well.
REFERENCES Archard, G. D. (1955). Proc. Phys. Soc. London, Ser. B 68, 156. Burfoot, J. C. (1952). Thesis, University of Cambridge. Chiu, W., and Glaeser, R. M.(1977). Ultramicroscopy 2, 207. Deltrap, J. H. M.(1964). Thesis, University of Cambridge. Erickson, H. P., and Klug, A. (1971). Philos. Trans. R. Soc. London, Ser. B 261, 105. Fellgett, P.B., and Linfoot, E. H.(1955) Philos. Trans. R. Soc. London, Ser. A 247, 369. Frank, J. (1973) Optik (Stuttgart) 38, 519. Frank, J. (1975) Proc., Electron Microsc. Soc. Am. Na 2, 182. Frank, J. (1976) Proc., Electron Microsc. Soc. Am. Na 2, 478. Frank, J., Bussler, P. H., Langer, R., and Hoppe, W. (1971). Electron Microsc., Proc. Int. Congr., 7th, 1970, p. 17. Hahn, M. (1973). Nature (London) 241,445. Hahn, M., and Baumeister, W. (1973). Cytobiologie 7, 224. Hanszen, K. J., and Trepte, L. (1971). Optik (Stuttgart) 32, 519. Harada, Y.,Goto, T., and Someya, T. (1974). Proc., Electron Microsc. Soc. Am., p. 388. Langer, R., Frank, J., Feltynowski, A., and Hoppe, W. (1971). Electron Microsc., Proc. Int. Congr., 7th, 1970 Vol. 1, p. 19. Rohler, R. (1967). “Informationstheone in der Optik,” p. 175 and R. Wiss. Verlagsges., Stuttgart. Rose, H. (1971). Optik (Stuttgart) 34, 285. Schemer, 0. (1936). Z . Phys. 101, 593. Schemer, 0. (1947). Optik (Stuttgart)2, 114. Schemer, 0. (1949). J . Appl. Phys. 20, 20. Seeliger, R. (1951). Optik (Stuttgart) 5,490. Siegel, B. M.(1971) Philos. Trans. R. Soc. London, Ser. B 261, 5. Stockham, T. G., Jr., Cannon, T. M.,and Ingebretson, R. B. (1975) Proc. IEEE 63, 678. Stroke, G. W., and Halioua, M.(1973). Optik (Stuttgart) 37, 192. Stroke, G. W., Halioua, M., Thon, F., and Willasch, D. (1974). Optik (Stuttgart) 41, 319. Thon, F., and Siegel, 8. M.(1970). Ber. Bunsenges. Phys. Chem. 74, 1116. Thon, F., and Siegel, B. M.(1971). Electron Microsc., Proc. Int. Congr., 7th, 1970, p. 13. Thon, F., and Willasch, D. (1971). Proc., Electron Microsc. SOC.Am., p. 38.
IMAGE THEORY FOR BRIGHT FIELD ELECTRON MICROSCOPY
101
Welton, T. A. (1969). Proc., Electron Microsc. SOC.Am., p. 182. Welton, T. A. (1970). Proc., Electron Microsc. SOC. Am., p. 32. Welton, T. A. (1971a). Proc., Electron Microsc. SOC.Am., p. 94. Welton, T. A. (1971b). Proc. Workshop Con$ Microsc. Cluster Nuclei Defected Cryst., Chalk River Nucl. Lab. CRNL-622-1, p. 125. Welton, T. A. (1974). Proc., Electron Microsc. SOC.Am., p. 338. Welton, T. A. (1975). Proc., Electron Microsc. SOC.Am., p. 196. Welton, T. A., and Harris, W. W. (1975). Electron Microsc., Proc. Int. Congr., 8th. 1974, p. 318. Welton, T. A,, Ball, F. L., and Harris, W. W. (1973). Proc., Electron Microsc. SOC.Am., p. 270. Wiener, N. (1949). “The Interpolation, Extrapolation, and Smoothing of Stationary Time Series.” Wiley, New York. Worsham, R. E., Mann, J. E., and Richardson, E. G. (1972). Proc., Electron Microsc. SOC.Am., p. 426. Worsham, R. E., Mann, J. E., Richardson, E. G., and Ziegler, N. F. (1973). Proc., Electron Microsc. SOC.Am., p. 260. Young, R. D., and Miiller, E. W. (1959). Phys. Reo. 113, 115.
This Page Intentionally Left Blank
ADVANCE3 I N RLIETRONICS AND ELECTRON PHYSICS, VOL.
48
Fluid Dynamics and Science of Magnetic Liquids RONALD E. ROSENSWEIG Corporate Research Laboratories EXXON Research and Engineering Company Linden, New Jersey I. Structure and Properties of Magnetic Fluids ............................................ 103 A. Introduction ............................................................................ 104 B. Stability of the Colloidal Dispersion ................................................. 108 C. Equilibrium Magnetic Properties ..................................................... 111 D. Magnetization Kinetics ................................................................ 114 E. Viscosity ................................................................................ 117 F . Tabulated Data and Other Properties ............................................... 120 I1. Fluid Dynamics of Magnetic Fluids ...................................................... 122 A . Magnetic Stress Tensor and Body Force ............................................ 122 ............................................... 126 B. Alternate Forms ................... . . C. Generalized Bernoulli Equation ...................................................... 128 D. Summary of Inviscid Relationships ................................................... 130 E. Basic Flows ............................................................................ 131 F . Instabilities and Their Modification.................................................. 141 111. Magnetic Fluids in Devices ............................................................... 157 A. Seals .................................................................................... 158 B. Bearings ................................................................................ 163 C. Dampers ................................................................................ 175 D. Transducers ............................................................................ 177 E. Graphics ............................................................................... 179 F. Other ................................................................................... 183 IV. Processes Based on Magnetic Fluids ..................................................... 186 A. Magnetohydrostatic Separation ...................................................... 187 B. LiquidILiquid Separations ............................................................ 189 C. Energy Conversion .................................................................... 189 D. Other ................................................................................... 190 List of Symbols ............................................................................ 192 References .................................................................................. 195
I.
STRUCTURE A N D PROPERTIES OF
MAGNETIC FLUIDS
In the past, classical mechanics and thermodynamics of fluids have dealt mainly with fluids having no appreciable magnetic moment. In the last ten to fifteen years, problems of mechanics and physics of liquids with strong magnetic properties have been attracting increasing attention and the fluid 103 Copyright 0 1979 by Academic Press, Inc. All rights ofreproduction in MY form rcrcrved. ISBN O - I ~ - O I W E - ~
104
RONALD E. ROSENSWEIG
dynamics of magnetic fluids, similar to magnetohydrodynamics, has begun to be considered as a branch of mechanics. Reviews of the subject are given by Rosensweig (1966a, 1971a),Bertrand (1970),Shliomis (1974),and Khalafalla (1975). Fluid media composed of solid magnetic particles of subdomain size colloidally dispersed in a liquid carrier are the basis for the highly stable, strongly magnetizable liquids known as magnetic fluids or ferrofluids. The number density of particles in suspension is on the order of lOZ3/m3.It is the existence of these synthetic materials that makes the study of magneticliquid fluid dynamics (ferrohydrodynamics) possible. The practitioner of ferrohydrodynamics may well be content to accept the available ferrofluids with their empirically reported properties as given quantities. Others may desire a more full treatment. While a thorough discussion of magnetic fluid structure and properties could occupy an entire chapter, in this section an intermediate path is followed that emphasizes topics mainly concerning the fluid dynamical behavior. In addition, indication is provided of information in the literature relating to broader aspects of the fluids, e.g., preparation, physicochemical behavior, optical and acoustic properties, etc.
A. Introduction
The synthesis and systematic study of the properties of magnetic fluids was started in the 1960s (Pappel, 1965; Rosensweig et al., 1965). These ferrofluids have little in common with the magnetic suspensions of particles used in magnetic clutches which came into use in the 1940s. In these, the suspensions used were of a ferromagnetic powder such as carbonyl iron in a mineral oil. The dimensions of the particles were in the range 0.5-40 pm. The technical application of such suspensions was based on their property of congealing under the influence of a magnetic field (Rabinow, 1949). Ferrofluids differ from these coarse suspensions primarily by the thousand times smaller dimensions of the suspended particles (billion times smaller volume). Depending on the ferromagnetic material and method of preparation, the mean diameter varies from less than 3 to 15 nm. A properly stabilized ferrofluid undergoes practically no aging or separation, remains liquid in a magnetic field, and after removal of field completely recovers its characteristics, e.g., there is no magnetic remanence. The magnetic particles in the colloidal dispersion of a ferrofluid are constantly attracted in the direction of an applied field gradient. Their tendency to drift in the gradient is counteracted by diffusive motion due to thermal agitation. Boltzmann statistics gives a criterion for maximum par-
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
105
title size that may be stated as follows:
(In this equation and throughout this chapter, SI metric units are employed.) This criterion demands that in a monodisperse mixture the difference in concentration anywhere in the system will not exceed the average concentraN * m/K, T = 298 K, H = 1.59 x lo6 A/m tion. With k = 1.38 x (20,000Oe) and considering magnetite particles with domain magnetization of 4.46 x lo5 A/m (5600G),Eq. (1) gives d < 3.0 x lo-' m or 3.0nm, which falls at the lower boundary of the range for actual ferrofluids noted above. The finite volume occupied by the particles in a ferrofluid will often limit concentration variations more so than indicated here. It is interesting to compare the magnitude of magnetic force to gravitational force on a particle in the ferrofluid. The ratio of these forces is independent of particle size. magnetic force - p o M 1 VH I gravitational force g Ap
(2)
Under the extreme conditions of high field gradient occurring in certain devices, such as ferrofluid seals, the field gradient VH reaches magnitudes of 1.2 x 10" A/mz. For magnetite particles in a typical organic fluid carrier, using Ap = 4500 kg/m3 the force ratio exceeds 1.5 x lo5.This is a huge number, and many colloids ordinarily considered as stable under gravity settling conditions cannot perform as ferrofluids. Preparation of Magnetic Fluids
There are two broad ways to make a magnetic fluid: size reduction of coarse material and chemical precipitation of small particles. Size reduction has been done by spark evaporation-condensation, electrolysis, and grinding. Chemical routes include decomposition of metal carbonyls (Thomas, 1966;Hess and Parker, 1966)and precipitation from salt solutions (Khalafalla and Reimers, 1973% 1974).Thus far the grinding technique introduced by Pappel (1965)has been most used, but for large-scale production the synthetic methods may be more suitable. Grinding should be done in a liquid, preferably one with a low viscosity, and in the presence of a suitable dispersing agent. Long processing times are the rule (5to 20 weeks). Specialized techniques have been developed in finish treating the product. Exchange of the solvent increases the concentration of magnetic solids and removes excess dispersant from solution (Rosensweig,
106
RONALD E. ROSENSWEIG
1970). Under some conditions it is possible to exchange surfactant on the particle surface (Rosensweig, 1975). In addition, evaporative removal of solvent and dilution with carrier fluid are commonly used to adjust the particle number concentration. An electron micrograph of the particles in a ferrofluid prepared by grinding is shown in Fig. la, and a histogram of the particle sizes in Fig. lb. Ferrofluids have been prepared in numerous diverse solvents including water, glycerol, paraffinics, aromatics, esters, halocarbons, and some silicones (Rosensweig and Kaiser, 1967; Kaiser and Rosensweig, 1968).To be a good dispersant, molecules should have (1) A “head” that adsorbs on the particle surface. Examples are molecules containing a polar group of carboxyl, sulfosuccinate, phosphonate, phosphoric acid, or amine. With polymers less active groups like succinimide or vinyl acetate can be sufficient. (2) A “tail” (or loop) 1-2 nm in length that is compatible with the base fluid. Chemical structural similarity is a good criterion for this. It is found that bulges (methyl groups on polyisobutylene) or kinks (conjugated bonds in oleic and linoleic acid) in the tails or loops prevent crystallization (association with their own species) and therefore are favorable (Scholten, 1978). Polymers having several anchor groups are the most tenacious stabilizers available. A disadvantage of polymers is the excessive space they tend to occupy. The use of “solubility parameters” should be useful in predicting the compatibility of the anchored tail with the solvent. Finally, while the search for a dispersing agent can be aided by the general rules, only an experiment can be decisive. Aqueous molecular solutions of paramagnetic salts provide magnetization M greater than 2.3 x lo3 A/m in an applied field of 8 x lo5 A/m and have been used for sink/float separation of minerals (Andres, 1976a,b).These paramagnetic solutions may be preferred when high homogeneity is essential and high magnetic field is available. In another direction, statistical mechanical studies such as that of Hemmer and Imbro (1977)are tantalizing in defining molecular parameters of a liquid in order that ferromagnetism may exist; theoretically, the Curie temperature exceeds the melting temperature of a substance if the exchange interaction is sufficiently strong, and FIG. 1. (a) Electron micrograph of Fe,O, magnetic particles prepared by grinding in aqueous carrier fluid. The bar represents 100 nm. (Ferrofluid A 0 1 Ferrofluidics Corporation. Photo courtesy EXXON Research and Development Company.) (b) Particle size histogram with the gaussian fit (solid line). Open circles represent the number of particles.The bars in each interval represent the statistical uncertainty in sampling. (From McNab et a/., 1968.)
30r--50
1
00
g B
c
C .-
n
El
0 -
0 UI)
L
0
n
50
0
D
0
10 Particle diameter (nm)
(b)
p
108
RONALD E. ROSENSWEIG
crystalline order is not required. Gold cobalt eutectic melt is reported as ferromagnetic by more than one investigation, but the finding is controversial (Kraeft and Alexander, 1973).
B. Stability of the Colloidal Dispersion Four different interparticle forces are encountered in magnetic dispersions: van der Waals attraction, magnetic attraction, steric repulsion, and electric repulsion. 1. Van der Waals Attraction
The van der Waals-London or dispersion force is due to the interaction between orbital electrons in one particle and the oscillating dipoles they induce in the other. For two equal spherical particles it is given by the expression of Hamaker (Kruyt, 1952) E,=
- - -+- 2 A 6 ( I’
+ 41
(1 + 2)’
( 1 + 2)2
where A is a dimensional quantity that can be calculated from the (UV) optical dielectric properties of particles and medium. For most material combinations A is known only to within a factor of 3. For iron as well as for gamma Fe,O, and for Fe,O, in hydrocarbon media a value of 10- l9 N . m is taken as representative. 1 is the relative surface separation, defined as
1 = (rc/r)- 2
(4)
where rc is center-to-center distance and r is radius of a particle. From Eq. (3) it follows that E , is the same for any pair of equal-sizespheres at the same 1. This potential, plotted in Fig. 2, is powerful but effectively works over a short range only. 2. Magnetic Attraction
Calculation of the critical dimensions, below which a particle becomes absolutely single domain, leads to values of d ranging from tens of nanometers [33 and 76 nm for iron and nickel respectively, see Brown (196911 to several hundred nanometers for materials with strong magnetic anisotropy [4 x lo2 and 13 x lo2 nm for manganese-bismuth alloy and for barium ferrite, see Wohlfarth (1959)l. Thus the particles of a ferrofluid may be regarded as single domain and, hence, uniformly magnetized. The magnetic
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
Steric repulsion
109
.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.0 0.9 1.0 f=s/r
FIG.2. Influence of film thickness 6 on the agglomeration stability for magnetic particles with radius r = 5 nm.
potential energy EM of a particle pair is then exactly described by the formula for dipoles. When the magnetic moments of the particles are collinear, the energy is maximum and can be written as follows:
This potential, plotted in Fig. 2, is relatively long-range, changing slowly with separation of the particles. In absence of a field, thermal motion tends to disorient the dipoles and the attraction energy is lower; Scholten (1978) gives an expression for that case. 3. Steric Repulsion
Steric repulsion is encountered with particles that have long, flexible molecules attached to their surface. As mentioned previously, the molecules can be simple linear chains with an anchor polar group at one end, e.g., fatty acids, or long polymers with many polar groups along the chain so that adsorption occurs with loops. Except for the anchor part, the adsorbed molecules perform thermal movements. When a second particle approaches closely, the positions the chains take up are restricted. Just as when the volume of gas is decreased, this loss of space (entropy) requires work when done at constant temperature. For chains that have a tendency to bind solvent molecules, the approach also involves the energy of breaking these chain-solvent bonds. Polyethylene oxide in water is an example where this occurs. This second (enthalpic) effect can work both ways: if the polymer
110
RONALD E. ROSENSWEIG
molecules would rather associate with their own species, repulsion is reduced or even changed to attraction. This term makes steric stabilization very sensitive to the solvent composition. Calculation of the repulsion energy for adsorbed polymers is difficult and the results are uncertain. For the short chains used often in magnetic fluids, however, an estimate can be made of the entropic effect. The theoretical result of Mackor (1951) for planar geometry was extended to spheres with the following result (Rosensweig et al., 1965):
N is the number of adsorbed molecules per unit area and t = 6/r, where 6 is length of the chain, regarded as a rigid rod. 1 is relative surface separation defined previously. The cross-sectional area of an oleic acid molecule is about 46 x m2 and the extended length about 2 nm. For the sake of calculation it is assumed that N = 1 x 10'' molecules per square meter, corresponding to a fractional surface coverage of about 50%. Figure 2 gives the steric repulsion curve with 6 = 2 nm for a particle having radius r of 5 nm. The steric repulsion decreases with increasing separation of the particles, becoming zero at 1 = 0.8, corresponding to a separation of 26. A second curve illustrates steric repulsion for a shorter chain, 6 = 0.5 nm. The electric repulsion mechanism has not been used in actual magnetic fluids. It does play a role, however, in several preparative methods. The electric repulsion between particles is the Coulomb repulsion of charged surfaces; the surface charges result from ions removed from the surface or adsorbed from the liquid. The repulsive force is reduced by the screening action of the surrounding ions by a mechanism that is well understood (Verwey and Overbeek, 1948). 4. Stability Related to Net Potential Curves
The algebraic sum of the repulsion and attraction energies yields the net potential curves shown as dashed lines in Fig. 2. For 6 = 2 nm the net curve displays an energy barrier of about 25kT, more than sufficient to prevent agglomeration. In comparison the net curve for 6 = 0.5 nm corresponds to attraction between the particles at all separation distances and hence the system fails to stabilize. These trends are rather informative even though the calculations are only crudely quantitative. The steric stabilization mechanism is not available in liquid metals, and to date no truly stable dispersions have been produced (Rosensweig et al.,
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
111
1965; Shepherd et al., 1972). Although fine iron particles produced by electrodeposition in mercury are well wetted by the mercury, in a magnetic field gradient an iron-free portion of mercury is expelled from the mixture. The concentrated magnetic portion that remains is stiff (Bingham plastic). Exploitation of work function differences may permit electric charge stabilization in the future, and there is need for a fundamental investigation of the surface and colloidal physics of these systems.
C . Equilibrium Magnetic Properties
Suspended in a fluid each particle with its embedded magnetic moment m is analogous to a molecule of a paramagnetic gas. At equilibrium the tendency for the dipole moments to align with an applied field is partially overcome by thermal agitation. Langevin's classical theory can be applied to give the superparamagnetic result provided there is neghgible particle-toparticle magnetic interaction (Jacobs and Bean, 1963). The orientation energy of a particle of volume V with dipole moment m = M V making an angle 0 with the magnetic field H is
U = -pomH cos 0
(7) Boltzmann statistics gives the angular distribution function over an ensemble of particles and from it the average component m is the direction of the field
Unlike in the original application to paramagnetism, here the magnetic moment per particle is a function of temperature. For spherical grains m = Zd3Md/6 (9) The magnetization M is related to Md ,m, and m through the volume fraction 4 of suspended solids
MfMd = &i/m Combining Eqs. (8), (9), and (10) gives
M
__ = coth (PMd
1
a -U
= L(a)
(10) II po MdHd3
a =-
6
kT
(11)
where L(a) denotes the Langevin function. Figure 3 gives magnetization curves computed from Eq. (1 1) for various particle sizes.
112
RONALD E. ROSENSWEIG
0
01
02
03
04
05
Applied induction, B(teslas1
FIG.3. Calculated magnetization curves for monodisperse spherical particles with domain magnetization of Fe,O, (4.46 x lo5 A/m). (After Kaiser and Rosensweig, 1968.)
The asymptotic form of the Langevin function for values of the parameter t( small compared to one is L(a) N t(/3. Thus the initial susceptibility is given by
and the approach to saturation, aB1
M=4M
x po MdHd3
Bibik et al. (1973) determined particle size from a plot of M versus 1/H using Eq. (13). In weak fields the chief contribution to the magnetization is made by the larger particles, which are more easily oriented by a magnetic field, whereas the approach to saturation is determined by the fine particles, orientation of which requires large fields. Thus d computed from Eq. (12) always exceeds the value found from Eq. (13). When the initial permeability is appreciable, it is no longer permissible to neglect the interaction between the magnetic moments of the particles. Shliomis (1974) discusses this case, assuming particles are monodispersed, by a method similar to that used in the Debye-Onsager theory of polar liquids. As a result, formula (12) is replaced by
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
113
In actual ferrofluids there are two additional influences that must be accounted for in relating composition to the magnetization curve (Kaiser and Miskolczy, 1970a). One of these influences is the distribution of particle size which may be determined by means of an electron microscope. The other is the decrease of magnetic diameter of each particle by the amount d, , where d,/2 is the thickness of a nonmagnetic surface layer formed by chemical reaction with the adsorbed dispersing agent; for example, for magnetite one takes ( d , / 2 )= 0.83 nm corresponding to the lattice constant of the cubic structure; the dispersant may be oleic acid that enters into a reaction with the Fe304. Iron oleate is formed which possesses negligible magnetic properties. For a solid particle of 10-nm diameter the volume fraction of magnetic solids is 0.58. Thus, a better agreement is observed between the experimental magnetization curves and theoretical curves calculated with the formula
The dead surface layer mechanism is consistent with experiments of various authors, discussed in Bean and Livingston (1959), that detect no decrease of the spontaneous magnetization of subdomain particles having no sorbed layers for diameters down to at least 2 nm. Mossbauer and magnetic data of NiFe,O, dispersions indicate that loss of magnetization due to sorption of dispersing agent on the surface layer is not due to a magnetic “dead” layer as such; the cations at the particle surface are magnetically ordered but pinned to remain at large angles with respect to the direction of an applied field (Berkowitz et al., 1975). The net effect in Eq. (15) is unchanged.
Formation of Chains and Clusters
Interesting results pertaining to the formation of chains of colloidal magnetic particles and to the effect of a uniform magnetic field on this process were obtained by de Gennes and Pincus (1970) and Jordan (1973). They considered some of the properties of the equation of state of a “ rarified gas ” of ferromagnetic particles suspended in an inert liquid. Allowance was made for the departure of the gas from ideality so far as this resulted from the magnetic attraction between the particles, neglecting any other forces that may be present. By considering pair correlations between particle positions, it was found that in strong external fields the ferromagnetic grains tend to form chains parallel to the field direction. The mean number of particles n ,
114
RONALD E. ROSENSWEIG
in the chain is n , = [I
- 4(+//n2)e2']-'
where 1is the (dimensionless) coupling coefficient,
1 = pom2/4nd3kT
(17)
which measures the strength of the grain-grain interaction. When the second term on the right side of Eq. (16) exceeds unity the approximations break down. It may be that clusters rather than chains will then form in the liquid. At zero external field and 1% 1, there also exists a certain number of chains according to the prediction. Their mean length given by no = [1
- 4(+/13)e2']~1
(18)
is smaller than in a strong field, and they are oriented in a random manner. For a magnetite particle with outer diameter 10 nm and A = 0.8 nm, the magnetic diameter is d = 8.4 nm, so with M = 446 kA/m and T = 298 K the coupling coefficient from Eq. (17) is 1= 0.78. With 4 = 0.05 in Eq. (16), the value of n , = 1.35 and from Eq. (18), no = 1.50.The particles are essentially monodisperse and the colloid has little clustering or agglomeration. A particle of elemental iron must be smaller to avoid clustering. The domain magnetization of iron is about four times that of magnetite, so from Eq. (17) the magnetic diameter yielding the same value of 1 is reduced by a factor of 42/3to 3.4 nm. Peterson and Krueger (1978) studied in situ particle clustering of ferrofluids in a vertical tube subjected to an applied magnetic field. The clusters redistribute under gravity by sedimentation in the tube, with concentration detected by a Colpitts oscillator circuit. The clustering is pronounced for present-day water-base ferrofluids and nearly absent for many other compositions such as well-stabilized dispersions in diester or hydrocarbon carrier fluid. Clustering, when it occurs, is reversible with removal of the field, thermal agitation being effective in redispersing the agglomerates. The technique should be useful in evaluating stability of new ferrofluid compositions.
D. Magnetization Kinetics
A ferrofluid may be defined as superparamagnetic if its magnetization obeys the Langevin magnetization law, Eq. (1 1). This superparamagnetic behavior may have two origins.
FLUID DYNAMICS A N D SCIENCE OF MAGNETIC LIQUIDS
115
1. Intrinsic Superparamagnetism of the Grains This corresponds to reversal of the magnetic moment within the grain, there being no mechanical rotation of the grain itself. This relaxation mechanism for sufficiently small subdomain particles was first pointed out by Neel. Reversal of magnetization is possible by surmounting of an energy barrier KV between different directions of easy magnetization relative to the crystalline axis of the grain material. K is the crystalline anisotropy constant and V the volume of a grain. The relaxation time TN is given by 1
TN
KV
= fo ""P(=)
-
wheref, is a characteristic frequency of order lo9 Hz. The transition between ferro- and superparamagnetism is for (KV/kT) 20. f
> 20
TN + 00
ferromagnetism
At a constant temperature around the value KV/kT of 20, TN varies by a factor of lo9 for a variation of the volume by a factor of 2. The critical diameter is in the range of the actual grain diameter (10 nm) of ferrofluids. TN for an oleic acid stabilized ferrofluid in kerosene carrier is given as TN N lO-'sec by Martinet (1977, 1978). Brown (1963a,b) obtained a theoretical result relating frequencyf, to the precessional decay process that accompanies return of magnetic moment of a particle to the equilibrium orientation following an initial perturbation.
2. Superparamagnetism Induced by Brownian Motion Since the particles are suspended in a liquid they are free to rotate and that offers an additional mechanism for reversing the orientation of their magnetic moment. The Brownian rotational relaxation time T g is now of hydrodynamic origin (Frenkel, 1955). T~ = 3Vqo/kT
where qo is the liquid carrier viscosity. The main mechanism for varying T g is change of Q,. For kerosene or aqueous base ferrofluids values calculated from Eq. (21) give Tg 2: lo-' sec. The rate of Brownian rotational relaxation when field is present is determined by solutions to the Fokker-Planck equation (Martensyuk et al., 1974).
116
RONALD E. ROSENSWEIG
Shliomis reviews the mechanisms for relaxation of magnetization (1974). He also develops numerical estimates of relaxation for iron particles in a ferrofluid. Above 8.5 nm the orientation of the magnetic moment is controlled by Brownian rotation of the particles. For smaller iron particles the chief relaxation process is the Nee1 mechanism. 3 . Experimental Measurements Martinet (1977, 1978) experimentally investigated the lag angle between the magnetization and the field. In rotating magnetic field this influence appears as a perpendicular component of the susceptibility. Measurements were facilitated by rotating the fluid sample (100 to 700 Hz) in a static magnetic field. In other tests the carrier is polymerized to a solid that mechanically traps the grains (styrene-divinylbenzene mixture replacing kerosene carrier, polyvinyl alcohol substituted for water carrier). For polymerized samples zB 00 SO that zB & zN. A polymerized sample containing 6.5-nm cobalt particles possessing a high crystalline anisotropy (KV/kT N 15) furnished a reference sample in which magnetic moment could not fluctuate spontaneously. Experimental ratios of M , / M I I followed the theoretical trend: invariance with angular rate R and decreasing ratio with increase of field intensity H. A sample polymerized from kerosene parent fluid having magnetic moment that could easily fluctuate inside the grain ( K V k T = 0.7) gave a result in support of The theory: M , / M I Iat R = lo3Hz was too small to observe (less than sample of cobalt particles in fluid carrier gave ratios of M , / M in excess of 8x Mossbauer spectroscopy was utilized.by Winkler et al. (1976) to distinguish between diffusional rotational relaxation and collisional relaxation due to particle-particle impacts; spectral data are given for diester-base ferrofluid. Mossbauer investigations reported by McNab et al. (1968) for ferrofluid composed of Fe304in kerosene carrier gives values [see Eq. (19)]
-
K = (6.0 & 1.0) x lo3 N/mZ
llfo = t o= (9.5 & 1.5) x lo-" sec
(energy barrier) (frequency factor)
in agreement with the order-of-magnitudecalculations of Nee1 and Brown. Water-base ferrofluids normally contain particles of larger size than the particles in organic carrier; a commercially available water-base ferrofluid gave a distribution with geometric mean diameter 10.8 nm and volumeweighted mean diameter 16.5 nm (Keller and Kundig, 1975). This reconciles with the statement of Sharma and Waldner (1977) that water-base ferrofluids give no (intrinsic) superparamagnetism in Mossbauer expenments.
K U I D DYNAMICS A N D SCIENCE OF MAGNETIC LIQUIDS
117
Bogardus et al. (1975) describe a pulsed field magnetometer to measure magnetic moment of a ferrofluid as a function of time. After the applied field is removed from a particular water-base ferrofluid, the magnetization was characterized by a fast decay (< 1 p e c ) and a gradual decay on the order of 4 msec [see Eq. (2111. The fast component is attributed to intrinsic processes within the particle, and the slow part to particle rotation. The mechanisms and behavior of the magnetic relaxation processes are important to no-moving-part pumps (Moskowitz and Rosensweig, 1967) and gyroscopes (Miskolczy et al., 1970), as well as to topics of magnetoviscosity and flows having internal rotation.
E. Viscosity It is axiomatic that a ferrofluid is a material having concomitant magnetic and fluid properties. Ferrofluid retains its flowability in the presence of magnetic field even when magnetized to saturation. Nonetheless, the rheology is affected by presence of the field. The following sections discuss the viscosity of ferrofluid in the absence, then in the presence of an applied magnetic field.
1. N o External Field This situation is the same as for nonmagnetic colloids of solid particles suspended in a liquid (Rosensweig et al., 1965). Thus, theoretical models are available for determining the viscosity, with the earliest being that of Einstein (1906, 1911) derived from the flow field of pure strain perturbed by the presence of a sphere. The resulting formula relates mixture viscosity qs to carrier fluid viscosity qo and solids fraction 4 (assuming for the moment that the particles are bare of coatings) ?s/'lo = 1
+ $4
(22)
This relationship is valid only for small concentrations. For higher concentrations a two-constant expression may be assumed: rls/'to =
1/(1 + a 4 + M 2 )
(23)
It is insisted that this expression reduce to Eq. (22) for small values of 4 and this determines a = - 5. At a concentration 4cthe suspension becomes effectively rigid and so the ratio qo /qs goes to zero. This determines the second constant as b = ($& - l)/&. A value of +c = 0.74 corresponds to close packing of spheres. Uncoated spherical particles of radius r, when present in a ferrofluid at volume fraction 4, will when coated with a uniform layer of dispersing agent having thickness 6, occupy a fractional volume in the fluid
118 of 4( 1 +
RONALD E. ROSENSWEIG
Combining these relationships gives
A plot of measured values of (qs - qo)/$qs versus 4 yields a straight line (see Fig. 4). Values of 6/r determined from the intercept at 4 = 0 and from the slope using 4, = 0.74 are in good agreement, yielding 6/r = 0.84. With 6 = 2 nm the particle diameter found in this manner is 4.8 nm. This is less than the mean size determined from an electron microscope count. This directional variance is expected due to the presence of a particle size distribution; small particles with their sorbed coatings tie up a disproportionate share of the total volume in the dispersion.
'4
t
FIG.4. Reduced experimental viscosity data for oleic acid stabilized ferrite dispersions. (After Rosensweig et al., 1965.)
When the suspended particles are nonspherical, theory predicts an increase in the coefficient of 4 in Eq. (22); due to Brownian rotation a larger volume is swept out by a particle of given size. As an example, an axial ratio of 5 increases the coefficient from 2.5 to 6.0(Kruyt, 1952). From the above considerations it follows that highly concentrated (high saturation moment) ferrofluids of greatest possible fluidity are favored by small coating thickness 6, large particle radius r, and spherical shape particles. These desired trends for 6 and r are opposite to the conditions favoring stabilization as a colloid, so in any actual ferrofluid compromises must be made using intermediate values of these parameters.
2. External Field Present When magnetic field is applied to a sample of magnetic fluid subjected to shear deformation, the magnetic particles in the fluid tend to remain rigidly aligned with the direction of the orienting field. As a result larger gradients
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
119
in the velocity field surrounding a particle are to be expected than if the particle were not present, and dissipation increases in the sample as a whole. Rosensweig et al. (1969) measured the effect of vertically oriented magnetic field on the viscosity of thin horizontal layers of ferrofluid subjected to uniform shear in a horizontal plane. Dimensional reasoning leads to the hypothesis that where qH is viscosity in presence of the field, qs viscosity of the ferrofluid in absence of field, y the shear rate, qo the carrier fluid viscosity, M the ferrofluid’s magnetization, and H the applied field. The data, shown in Fig. 5, roughly define a single curve with the following ranges:
0< r < 10 sec-’. Viscosity 100 N . sec/m2. Under pressure of 133 Pa (1 torr). ‘Average over range 298 K to 367 K. Freezing point. 3.2 kPa.
8.1 x 8.1 x 8.1 x 5.2 5.0
10-4 10-~
10-4 10-4
122
RONALD E. ROSENSWEIG
The pH of water-base ferrofluids may be adjusted over a range of acidic and alkaline values. Electrical conductivity of a particular sample having saturation magnetization of 16,000 A/m was about constant at 0.2 S/m at ac frequencies (Kaplan and Jacobson, 1976). The same investigators report some new magnetoelectric effects on capacitance. The origin of osmotic pressure appears to be a controversial subject. Scholander and Perez (1971) measure osmotic pressure of a water-base ferrofluid and discuss the results in terms of alternate theories. Any of the ferrofluids in Table I may be freeze-thawed without damage. Goldberg et al. (1971) report the polarization of light by magnetic fluid. A number of investigators have published studies of optical properties subsequently. Hayes (1975) relates transmission and scattering of light to particle clustering. 11. FLUIDDYNAMICS OF MAGNETIC FLUIDS
The fluid dynamics of magnetic fluids differ from that of ordinary fluids in that stresses of magnetic origin appear and, unlike in magnetohydrodynamics, there need be no electrical currents (Neuringer and Rosensweig, 1964). While theoretical expressions are known for the forces acting between isolated sources of electromagnetic field, there is no universal law describing magnetic stress set up within a magnetized medium. However, satisfactory relationships may be derived using the principle of energy conservation, taking into account the storage of energy in the magnetostatic field. Relationships for stress obtained in this manner are found to depend on detailed characteristics of the material, particularly the dependence of magnetization on state variables. A. Magnetic Stress Tensor and Body Force
Cowley and Rosensweig (1967) derived the following expression for stress tensor for magnetic fluid having arbitrary single-valued dependence of magnetization on magnetic field under the condition that local magnetization vector is collinear with the local field vector in any volume element.
where in Cartesian coordinatesj is the component of the vectorial force per unit area (traction) on an infinitesimal surface whose normal is oriented in the i direction. The Kronecker delta 6,, is unity when the subscripts are equal and vanishes when they are not equal. B, and Hiare components of the magnetic field of induction B and the magnetic field intensity H,respectively.
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
123
In the S.I. metric system the units of B are Wb/mz or T and H is given in A/m. The constant po has the value 471 x lo-’ H/m. Specific volume u has units of m3/kg. Since instruments for measuring magnetic field intensity are called gaussmeters, and probably will remain so for a long time, it is useful to remember that one tesla (T) is equivalent to lo4 G. Another derivation of the stress tensor leading to the result of Eq. (30) is developed by Penfield and Haus (1967) based on the principle of virtual power. The magnetization M describes the polarization of the fluid medium and is related to B and H through the defining equation
+
B = po(H M) (31) For a ferromagneticfluid without hysteresis the magnitude ofM, denoted by M, has the properties:
where M , is the saturation magnetization. As a consequence of collinearity, B=pH
(35)
M = xH (36) with the permeability p and the susceptibility x representing scalar quantities generally dependent on N and u. p and x are related to each other from their definitions,
x = (P//cLo)- 1
(37)
At a given point H i Bj = pNi H j = H j B ihence T jis symmetrical, so the fluid medium is free of torque. According to its definition the magnetic stress tensor T gives the total magnetic force F, on a volume V, of magnetic field as expressed by the following surface integral: F,=$j;-nds
(38)
where the surface S encloses V , and n denotes the unit normal vector facing outward from the volume. In formulating solutions to given problems it is sometimes most convenient to evaluate stress over an enveloping surface, as indicated by Eq. (38), for example, if field is known everywhere at the surface or has a simpler expression there. Alternatively, using Gauss’s divergence
124
RONALD E. ROSENSWEIG
theorem the surface integral in Eq. (38) may be transformed to a volume integral,
where V * T = f , components
appears as the magnetic body force density having fi
= aTj/axj
(40)
Thus from Eq. (30)
The last term in the above may be expanded as
-
where the Maxwell relationship V B = 0 was used. Collecting the components of Bj (aHi/axj) into vectorial form gives the term (B V)H. Since the field vectors are collinear by assumption,
(B V)H = (B/H)(H V)H Using the vector identity
(H * V)H = iV(H * H) - H x (V x H)
(43)
(44)
with V x H = 0 permits (B V)H to be expressed in terms of the vector magnitudes.
-
(B V)H = (B/H)$VH* H = BVH
(45)
Thus the vector force may be written as
Newton's law of motion applied to an infinitesimal element of the magnetic fluid gives
where q is the vector velocity of a fluid element and D/Dt = a/& + q * V the
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
125
substantial derivative following the fluid motion. The right side of Eq. (47) is the sum of the body forces acting upon a unit volume. The terms familiar from fluid mechanics are
fp = pressure gradient = -Vp(p, 7')
(48)
fg = gravity force = -grad I),
(49)
f,
= viscous
with $ = pgh
force = qV2q (viscosity assumed isotropic
for simplicity) (50) where p(p, T) is thermodynamic pressure and magnetic force density f, was given previously. Substituting the force expressions into Eq. (47) gives the equation of motion aq
p-
at
+ p(q
*
+ POMVH - Vpgh + qV2q
V)q = -Vp*
(51)
where p* is defined as
An isolated volume element dV with magnetization M subjected to an applied field Ho experiences magnetic force po(M * V)Ho (see Fig. 6). For
R0 t 8Ho
FIG.6. A small cylindrical volume of magnetically polarized substance with geometric axis 6s aligned with the magnetization vector M. Poles of density u = p, M appear in equal number and opposite polarity on the ends of area a d . Field H, may be taken as force on a unit pole, hence the force experienced by the volume element is
6F = -H,ua, = po 6H,
+ po(Ho + 6HO)oad
bad
where 6H, is the change of H, along the direction of 6s. Thus 6H, = (6s * V)H, = ( 6 s / M )x (M * V)H, and the differential force becomes 6F = po(M V)HOa,6s.Thus the force per unit volume, 6F/ad 6s becomes
-
force/volume = po(M * V)H, Volume of the element is ad 6s and dipole moment is uad 6s = p o Ma, 6s = p,Mad 6s SO that poM represents the vector moment per unit volume.
126
RONALD E. ROSENSWEIG
soft magnetic material M is collinear with Ho and by the same argument that led to Eq. (45), the volumetric force density is expressible in terms of field magnitudes as po MVHo .The resemblance between this expression and the term po MVH in Eq. (46) motivated the expressing of Eq. (46) in that form. However, it is noted that external applied field Ho rather than the local field H enters into the force expression for the isolated volume element. For a whole body the summation of forces produced on the body by itself must vanish so that jjjupo MVH d K must give the same total force as jjjupo MVHo d V , when the integration is carried out over the whole volume of the body. These results are consistent with force on the whole body obtained by integrating Eq. (46).
I[-
V(P0
[ BMu (dv)H*
dHJdV, =
(F)H,
-[[ (Po 1
dH}n dS = 0 (54)
Here the surface S enclosing volume V, is taken just outside the body in a surrounding nonmagnetic medium. B. Alternate Forms
There is arbitrariness in the grouping of magnetic terms in Eqs. (46) and (51) that relates to alternate expressions for stress and force seen in the literature. Thus fm
joaMu dHl + POMVH = - V [ p o jo u aMz d H 1 - V ( p o [ M dH + poMVH
= -.(Po
H
0
i
H
- poMVH - po = -V[po
jo u dM~ d
H- )po
/
V M dH
+ poMVH
0
H
0
V M dH
where V M is evaluated at constant H. The term po jt o(dM/au)dH represents magnetostriction. The magnetostriction term may be omitted in problems of incompressible flow with no effect on the results. For uniform
127
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
t V M dH = 0 within the fluid region; this integral takes magnetic fluid po s on a finite value then only at an interface. From this perspective the magnetic forces originate only at interfaces. The general force density of Cowley and Rosensweig reduces to the expression of Korteweg and Helmholtz valid for linearly magnetizable media in the following manner: aM H.
T
H. T
(574
+
In the above, use was made of B = p o ( H M ) = pH so that M = [(p/po)- 1]H and permeability p is assumed to depend only on p and not on H . Collecting terms establishes the desired identity.
Another modification of force density found in the literature corresponds to the adding and subtracting of po HVH = V{po st H d H } with the expression for magnetic force of Eq. (56).Neglecting striction in the second equality of Eq. (56) this procedure gives, H
f, = - V ( p o joHM d H l + poMVH = -V
JO
Formulation of magnetic force in terms of the coenergy w‘ H2
wt
=
jop(al
or,, H’)
HZ
128
RONALD E. ROSENSWEIG
by Zelazo and Melcher (1969)generalizes the magnetic treatment to account for spatial variation of properties within the magnetic fluid region. The ai’s represent intensive properties of the fluid such as temperature and composition. Table I1 summarizes alternate expressions for stress and body force density in magnetic fluids. Similar to striction, magnetization force density terms which take the form of the gradient of a pressure have no influence on hydrostatics or hydrodynamics of incompressible magnetic liquids. Byrne (1977) gives a survey of the force densities indicating relationships to,early literature dealing with magnetic stresses in solids. C . Generalized Bernoulli Equation For inviscid flows the equation of motion (51) may be rewritten with the aid of vector identities as a4 q2 p -at- p q x w = -grad p * + p 2 + p g h - p , ( M d H )
where o = V x q is the vorticity. For irrotational flow o = 0 there will exist a velocity potential cf, such that q = -grad @. Then if grad T = 0 or 6 M / 6 T = 0 there is obtained as the integral of the equation of motion the generalized Bernoulli equation which follows (Neuringer and Rosensweig, 1964):
M denotes the field averaged magnetization defined by
Asymptotic values of M
l H M = - j MdH H O may be found from Eqs. (34) and (63)
xi is the initial susceptibility, (aM/aH),. For time-invariant flow a@/& = 0 and g ( t ) = const, so generalized Bernoulli equation reduces to P*
+ p ( q 2 / 2 )+ pgh - p o M H = const
(65)
In the absence of an applied field p* = p and the term proportional to disappears. With one or another term absent the remaining terms provide
TABLE I1 MUTUALLY CONSISTENT S T RTENSORS ~ AND ASSOCXATED FORCEDENSITIESFOR ISWROPIC MAGNETIC FLUIDS Stress tensor ( T j )
Formulation Compressible nonlinear media Cowley and Rosensweig (1967); Penfield and Haus (19671 Incompressible nonlinear media
1
- po f H MdH ' 0
Force density (f,,,)
dH
+ -2H 2
i
+ &2 H 2
I+
HiBj
bij
6,,
+ HiBj
Assumptions/Dehitions
aMv
-V[p0
/ + poMVH lo( x )d H~ ,
Jb
V x H = 0,
MllH
V xH=O,
MJIH
VxH=O,
MllH
VxH=O
B=pH
H
= -po
Incompressible nonlinear media Chu (1959)
- 1IoHBdHlbij + H i B j
H
-V[
H
BdH+BVH=-j -0
Incompressible nonlinear media Zelazo and Melcher (1969)
-W'6ij
VHM dH
VHBdH 0
+ pHiHj
w' = ["'$p
dH2
'0
p = p(al . . . an, I f 2 )
Compressible linear media Korteweg and Helmholtz (see Melcher, 1963)
~2
-
ap
~2
v-p- -vp 2 ap 2
VXH-0
1
= :PO($
Incompressible linear media
Maxwell
p = p(p) in the fluid
VXH=O
- $pH2bij
Vacuum stresses
H2)
B=pH
+ pHi Hi
- $po H2bij
+ p o Hi Hj
H2 - -vp 2
0
B = pH, p = constant in the fluid
M=O
130
RONALD E. ROSENSWEIG
several important examples from ordinary fluid mechanics. With h = const, the remaining relationship between pressure and velocity describes the operation of the venturi meter, Pitot tubes, and the pressure at the edge of a boundary layer. In hydrostatics with q = 0 the pressure term combined with the gravity term describes, for example, the pressure distribution in a tank of liquid, while the gravity term combined with the term containing speed yields an expression for the efflux rate of material from a hole in the tank. In similar manner, combination of the “fluid magnetic pressure” pm = po MH with each of the remaining terms produces additional classes of fluid phenomena. As an additional feature of magnetic fluid flow that must be considered in concert, the existence of jump boundary conditions is crucial, and this topic is developed next. Thus from Eq. (30) the traction on a surface element with unit normal n is
T*n= The difference of this magnetic stress across an interface between media is a force oriented along the normal which may be expressed as follows:
[ T e n ] = -[T,,]n=
(67)
The square brackets denote difference of the quantity across the interface, and subscript n denotes the normal direction. It is noted that the argument of the bracketed quantity vanishes in a nonmagnetic medium. In deriving this relationship use is made of the magnetic field boundary conditions [B,] = 0 and [H,] = 0, where subscript t denotes the tangential direction. When the contacting media are both fluids, the stress difference from Eq. (67) may be balanced by actual thermodynamic pressures p ( p , T)giving the following result when one medium is nonmagnetic.
P*
= Po - (Po/2)M,2
(68) po is pressure in the nonmagnetic fluid medium and p* was defined previously. While it is familiar to require continuity of pressure across a plane fluid boundary when considering ordinary fluids, this is no longer the case with fluids possessing magnetization. Instead, it is seen from Eq. (68) that magnetic stress at the interface produces a traction (po/ 2 ) M i . D. Summary of Inviscid Relationships
The following equations represent a consistent set of governing relationships for the inviscid flow of magnetic fluids.
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
Stress tensor H. T
Force density
Bernoulli equation (incompressible, steady flow) p*
+ p zq2 + pgh - p o A H = const
Boundary condition
P* = Po
-
(Po/2)M:
Definitions
l H M = - j MdH HO These relationships are applied to several basic responses of the magnetic fluids as detailed in the following section. E. Basic Flows 1. The Conical Meniscus
Neuringer and Rosensweig (1964)conducted experiments and performed analysis of a vertical current-carrying wire which emerges from a pool of magnetic liquid. In response to the magnetic field, the liquid rises up in a symmetric conical meniscus around the wire (see Fig. 7). The steady current Z produces an azimuthal field with magnitude H = 1/2nr,
(69)
where r, is radial distance. At the free surface M , = 0 so the boundary condition (68) reduces to p: = po . With q = 0 the constant of the Bernoulli equation (65) is evaluated at h ( a )where H = 0 giving po + p g h ( a ) = const. Then, evaluating terms of the Bernoulli at a surface point where the field is finite gives, with minor rearrangement, Ah = h - h ( a ) = poMH/pg
(70)
132
RONALD E. ROSENSWEIG
v
Current-carrying rod
( 0 )
FIG. 7. Sketch (a) and photograph (b) of free surface surroundinga current-carrying rod. . (After Neuringer and Rosensweig, 1964.) ,
Then from (Ma) and (69) for small applied fields
while from (64b) and (69) for saturated fluids,
the latter representing a hyperbolic cross section. Krueger and Jones (1974) calculated the surface shape and found good agreement with experiment assuming the Langevin theory of superparamagnetism and a realistic distribution of particle sizes. Next the problem is solved using the stress tensor of Chu to illustrate the use of a different formulation from Table 11.
The force density corresponding to the stress tensor of (73) is
-Vj
BdH+BVH= 0
-j V H B d H H
H
f,=
0
If evaluation off, is restricted to points within the magnetic fluid and the
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
FIG.7(b)
133
134
RONALD E. ROGENSWEIG
magnetic fluid has the same composition at all points, then V H B = 0 and the body force ascribed to the fluid by the Chu formulation disappears.
f, = 0
(74) Therefore the Bernoulli equation consistent with this formulation has no magnetic term and is the same as for a nonmagnetic fluid. The boundary condition to be satisfied at the free surface must be worked out anew, consistent with (73). Denoting “normal” by subscript n, “tangential” by subscript t, “ liquid ” with 1, and “ vapor” with v, (73) gives the following stress elements: Liquid side
T,, = 0
since 6,,=0
and H , = O
H
17; = T,, = - jo B dH
since H , = 0
Vapor side T,, = 0
Tv= T,, =
-I, B, H
dH
Since Bl = po(H + M) and B, = po H , the stress difference is
If the density of the vapor is neglected, then
P1- P v = - PBAh
(76)
while the balance of all forces at the interface require P l - P Y = T - T ,
(77)
Combining (75), (76), and (77) gives Ah = p O M H / p g This is just the result given previously as (70).
(78)
2. The Classical Quincke Problem
This problem has practical utility in measurement of magnetization in magnetic liquids (Bates, 1961). Figure 8 illustrates an idealized geometry consisting of two parallel magnetic poles. The height and width of the poles
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
135
Free inter
FIG.8. The classical Quincke experiment showing the rise of magnetic fluid between the poles of a magnet. (After Jones, 1977.)
is much greater than the separation, and also, the poles are assumed to be highly permeable so the magnetic field between them is uniform. The poles are immersed part-way into a reservoir of magnetic liquid of density p . Pole spacing is sufficiently wide that capillarity is not important, so the height to which the liquid rises between the poles is a function of the applied magnetic field H. Because the density of air is very small compared to that of the magnetic liquid, the external ambient pressure may be assumed constant at PO' This problem is easily solved with the Bernoulli equation (65), considering a point 1 chosen at the free surface outside the field and a point 2 at the free surface in the field region. From Eq. (65),
P:
+P
A = PZ
+ P9h2 - POMH
(79)
From the boundary condition formulated as Eq. (68),
P:
= Po
and
PZ = Po
(80)
Combining these relationships, Ah = hz
- h1 = P o M H / p g
(81) Normally the experiment is carried out with fluid in a vertical glass tube.
3. Surface Elevation in Normal Field
For the problem shown in Fig. 9 (Jones, 1977) the plane-parallel poles of a magnet produce vertical field, so the magnetic field is perpendicular to the magnetic liquid interface. The magnetic fluid responds with the surface elevation change Ah. The magnetic field above and below the interface are H 2 and H1,respectively. They are related by the boundary condition on the normal component of B,that is, POPI
+ M ) = POHZ
(84
136
RONALD E. ROSENSWEIG
tic fluid
FIG. 9. Uniform magnetic field imposed normal to free ferrofluid interface. The fluid magnetization is assumed smaller than required to produce the normal field instability. (After Jones, 1977.)
Using the Bernoulli equation of (65) gives
P:
+ PShl = P f + PShz - Po AH,
(831
while Eq. (68) for boundary conditions gives P: = Po Combining (83) and (84) gives an expression for Ah.
Ah = h;
- hi =
PS
poMHl
+ p0- 2
Compared to the Quincke result of Eq. (81) the surface elevation is greater in this problem by the amount po M 2 / 2 for the same value of field H in the fluid. Berkovsky and Orlov (1973) analytically investigate a number of problems in the shape of a free surface of a magnetic fluid. 4. Jet Flow The jet flow of Fig. 10 illustrates the coupling that may occur between flow speed and applied magnetic field. The magnetic field is provided by a uniformly wound current-carrying solenoid. Attraction of the fluid by the field accelerates the fluid motion along the direction of its path which is assumed horizontal. The magnetic boundary condition on field at station 2 requires continuity of the tangential component of H.Thus H ; = H; From the Bernoulli relationship of Eq. (65),
(86)
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
137
Winding of a currentcarrying solenoid I
/
o
o
o
o
Ferrofluid jet
FIG. 10. Free jet of magnetic fluid changes cross section and velocity with no mechanical contact. (After Rosensweig, 1966b.)
The boundary conditions on the fluid parameters from Eq. (68) give
P: = Po
P t = Po Thus 4;
(89) The incompressible fluid velocity satisfies the continuity relationship V q = 0. This may be integrated using the divergence theorem as follows, assuming the jet possesses a round cross section everywhere. - 4: = 2Po A H 2 /P
Combining (89) and (90) gives for the ratio of jet diameter,
5 . Modified Gouy Experiment The technique illustrated in Fig. 11 provides a means for gravimetric measurement of the field averaged magnetization A. Originally the technique was used for measurement of weakly paramagnetic liquids having constant permeability (Bates, 1961), while the present treatment extends the analysis to nonlinear media of high magnetic moment. A tube of weight w, having cross-sectional area a, and containing ferrofluid is suspended vertically by a filament between the poles of an electromagnet furnishing a source of applied field Ha.The top surface of the ferrofluid at plane 1 experiences negligible field intensity, while at plane 2 the field H 2 within the fluid is uniform at its maximum value. The force F is given as the sum of
138
RONALD E. ROSENSWEIG
Gravity
Tube containing ferrofluid
FIG.11. Analysis of the modified Gouy relationship.
pressure forces and weight of the containing tube.
F = ( P t - Pokt + WI (92) Note that p* is regarded as capable of exerting a normal stress on a surface in the same manner as ordinary pressure. From the Bernoulli equation of (65) applied between sections 1 and 2,
P:
+ PShl = P t + PSh2 - P o M H 2
(93)
From (68) the boundary condition at the free surface is
P:
= Po
(94)
Combining the above and solving for M gives
where Fo = w, + pg(h2 - h,)a, and represents the force when field is absent. H 2 is less than the applied field Ha due to influence of the sample shape H2 = Ha - DM
(96)
For circular cylinders the demagnetization coefficient D equals 4. Additional examples of magnetic fluid hydrostatics and hydrodynamics are developed later in discussing devices.
6. Convective Flows The isothermal flow of magnetic fluid without free surfaces, to first approximation, assuming absence of magnetorheological influence, is independent of applied magnetic field. However, given a magnetic fluid having a temperature-dependent magnetic moment, body forces may appear when temperature gradients are present, and a number of new phenomena have been investigated. (See also Section IV,C regarding thermomagnetic
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
139
pumping.) Neuringer (1966) analyzes stagnation point flow of a warm ferrofluid against a cold wall, and parallel flow of warm ferrofluid along a cold flat plate. Numerical results are calculated describing the velocity and temperature profiles when the field source is a dipole. An increase in the magnetic field strength leads to a decrease in the heat flux and skin friction. More recently, Buckmaster (1978)in a study of boundary layers shows that magnetic force can significantly delay or enhance separation. Indeed, if the force is unfavorable, the separation point can be moved all the way forward to a front stagnation point; whereas if the force is favorable, separation can be delayed to a point arbitrarily close to a rear stagnation point. A study of Berkovsky et al. (1973)examines heat transfer across vertical ferrofluid layers placed in a gradient magnetic field (Fig. 12).Equation (51)
VH C
-____
FIG. 12. Geometry for convective heat transfer in a closed volume. (After Berkovsky and Bashtovoi, 1973.)
together with the convective conductive equation for heat flow are numerically integrated subject to the Boussinesq approximation. Experiments with a kerosene-base ferrofluid agree well with the computed values. Heat transfer is increased when the temperature gradient and magnetic field gradient are in the same direction; there is a decrease when the directions are opposite. With increased heat transfer the results are correlated over the range R > lo3,R* > lo3,and 2 < c/w < 10 by the formula NU = 0.42(~/w)-~.~~[R + 4(R*)0.91]0.23
(97)
where Nu is Nusselt number, R is the usual Rayleigh number [see Eq. (12911,
140
RONALD E. ROSENSWEIG
and R* = Ipo ZAH/pwg/lo IR. L = (ko + /lo)Mo,where ko = M-' aM/BT is the pyromagnetic coefficient and /lo is the thermal expansion coefficient. Berkovsky and Bashtovoi (1973) review additional results and prospects for convective heat-transfer processes in magnetic fluids.
7. Other Studies
While the literature is somewhat large to permit a review of all work, several interesting directions for further research are indicated by the following studies. As part of their pioneering work, Papell and Faber (1966) simulated zero- and reduced-gravity pool boiling using ferrofluid in a field gradient. It is likely that a complete simulation is not possible due to the influence of magnetism on various surface instabilities; as a potential benefit, however, the new mechanisms should permit enhancement and control of the rate of boiling. Miller and Resler (1975) investigated ferrofluid surface pressure jump in uniform magnetic field. The surface pressure jump at one surface cannot be measured by itself, as any method of measurement will naturally involve two surfaces. A differential manometer was attached to a glass sphere containing ferrofluid with one tube connected at the equator and another at the top of the sphere. The experimental pressure jump followed the directional trend predicted by theory but exceeded prediction by a factor of about 2.2. Magnetization of the ferrofluid as a function of field was not measured directly by these investigators or reported by them, and hence there is a question concerning the interpretation of results. The experiment furnishes a fundamental means to check the continuum theory and deserves to be repeated. Jenkins (1972) formulates constitutive relations for the flow of ferrofluids. His treatment permits anisotropy, hence gives another approach to the incorporation of antisymmetric stress. In considering particular flows (Jenkins, 1971) a peculiar conclusion is reached that the swirl flow in a rotating uniform magnetic field is theoretically not possible. There is need to reconcile the conclusion with the experimental evidence. The ferroelectrohydrodynamics of suspended ferroelectric particles is treated by Dzhaugashtin and Yantovskii (1969). To date an electrically polarizable analog of ferrofluid does not seem to have been produced; it is likely that depolarization by free charge may defeat such efforts. In this sense it may be said that ferrofluid owes its existence to the absence of magnetic monopoles in the environment. The relativistic hydrodynamic motion of magnetic fluids is developed by Cissoko (1976).
K U I D DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
141
F. Instabilities and Their Modification
Subtle and unexpected fluid dynamic phenomena are associated with flow instabilities of magnetic fluids. These phenomena offer practical value in a number of cases where magnetization of the appropriate orientation and intensity prevents instability and extends the operating range of the equilibrium flow field. Conversely, in other situations magnetization upsets the fluid configuration, setting limitations that would not exist in the absence of the field. The user of magnetic fluids can benefit from an awareness of both of these aspects. In the following, attention is initially devoted to two uniform layers having an equilibrium plane interface of infinite extent. This apparently simple situation is actually rich in physical interest as will be seen. The stability of the equilibrium is examined in response to flow speed, gravitational force, interfacial tension, magnetic properties, and magnetizing field. 1. Formulating the Problem
The stability of flow may be ascertained from the behavior of the interfacial boundary between the fluid layers. The sketch of Fig. 13 illustrates nomenclature for these systems. Since an arbitrary initial disturbance of the interface may be represented as the superposition of harmonic terms, it is only necessary to consider the evolution of one such term having an arbitrary wavelength.
FIG.13. Nomenclature for interfacial stability of magnetic fluids. (After Zelazo and Melcher, 1969.)
The deflection of the interface may be represented as
5 = toexp ut[cos(yt - k,y - k,z)] (98) Each of the parameters u, y, 5 and the wave numbers k, and k, are taken as real valued. Hence Eq. (98) represents a traveling wave having a velocity termed the phase velocity of magnitude y/k, where k = (ki + kf)”’ and amplitude &, at time t = 0. If u = 0 the disturbance is neutrally stable and Eq. (98) describes a traveling wave of constant amplitude; while for u > 0 the disturbance grows in amplitude with time, and the flow is said to be
142
RONALD E. ROSENSWEIG
unstable. Values of u c 0 correspond to stable flows. In circumstances where y = 0 the disturbance is termed static. Determination of definite expressions for the parameters is obtained from solving the ferrohydrodynamic equations in their small disturbance (linearized) form. The algebraic work is simplified when Eq. (98) is represented in alternate form as
< = toRe exp i ( o t - k,y
- k,z)
where Re denotes the real part, i is the imaginary number complex as indicated by the following:
(99) and o is
0,
w=y-iu
(loo)
2. General Dispersion Relation for Moving Nonlinear Media with Oblique Magnetic Fielri Zelazo and Melcher (1969) develop a dispersion relation for stationary layers that may be generalized (M. Zahn, personal communication, 1977)to include motion of the media; the result appears as follows: (o- k, U,)2p, coth ka = gk(pb - pa)
+ (o- k, ub)2pb coth kb
(101)
+ k 3 Y - kZ/Y
where
z = b i ( H l , a -H:,b)2flb1actxc& cash B b b cash Baa sinh Bbb sinh @,a] sinh Baa cash @bb &Excash Baa sinh Bbb]
- k;pO(Hi,, - H i J 2
Y = PO[Bbctx
+
B = [ r : x ( k ; r ; , + kzZrzoz) - kF(G,)21”2/elx
(102a) (102b) (102c) (102d)
B = p(H2)H P = Po(X
+ 1)
(102e) (102f)
The magnetic fluid has magnetization density M that depends on H as illustrated in Fig. 14, where xo is the chord susceptibility and xt the tangent susceptibility. Fluid velocities U, and Ub are directed along the y direction and field is oriented in the xy plane. 6jk is the Kronecker delta function. Superscript degree (“) denotes equilibrium flow value; subscript a denotes “ above ” and b “ below.”
143
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
H (103A/m)
FIG.14. Nomenclature and typical appearanceof magnetization curve for a magnetic fluid. (After Zelazo and Melcher, 1969.)
Thus the dispersion equation (101)relates angular velocity 0 to the wave vector components of the harmonic disturbance (perturbations) of the interface. In general the individual waves propagate at different velocities, dispersing away from each other.
3. Reduction of the Dispersion Relation for a Linear Medium The dispersion relation is rather complicated, so that initially an understanding of its content is facilitated by considering the simpler form it takes when the magnetic media are linearly magnetizable ( p = const).
Y = k(pb sinh ka cosh kb + pa cosh ka sinh kb)
The dispersion relation then reduces to (0- k,
U,)’p,, coth k,a 4-(0 - k, U b ) ’ p b coth kb = gk(Pb - pa)
+k 3 9
1
kfH;(~ ~pb)’ k z / - b p b ( H ! - H:)2 tanh ka + p,, tanh kb - p b coth kb + p,, coth ka It is noted that the MH-IIa result in Melcher (1963)differs from Eq. (104)in the denominator of the final term (surface current is absent at the infinitely permeable wall so tangential field is continuous). Equation (104)reduces further for thick layers (a + 00, b + m ) to the following: (0- k,
U,)’P,,
+ (0- k,
Ub)’Pb
1
= & ( p b - pa)
+k 3 9
k 2 k p b ( H ! - H:)2 - k,”H,’(pa - pb)’
-
pb
-k p a
pb
+ pa
1
(105)
144
RONALD E. ROSENSWEIG
4. No Mean Flow: The Rayleigh-Taylor Problem
The classical Rayleigh-Taylor problem treats the stability of a dense fluid overlaying a less dense fluid. The instability of this fluid configuration provides an explanation for the familiar fact that liquid spills from an inverted bottle despite the fact that atmospheric pressure can support a water column 10 m in length. The instability is prevented in a familiar demonstration using a layer of stiff paper placed in contact over the vessel mouth. The instability phenomenon has broader ramification than might be thought, particularly when kinematic acceleration or deceleration is considered in place of gravitational acceleration. The following simplifications are imposed on Eq. (105), corresponding to the absence of mean flow and the presence of tangential applied field.
ua=o
u b = o
H”,o
H!=o
/J+=pO
(106)
From (105) there is obtained for the dispersion equation,
where M is oriented along the y direction. In the usual nonmagnetic case (Lamb, 1932) M = 0 and (107) reduces to
When P b > pa, corresponding to the more dense layer on the bottom, o is real so 4 = to cos(wt - k y y - k , z ) and the solution describes traveling waves that are neutrally stable. With dense fluid overlaying less dense fluid, < pa and the right side of (108) is negative. o is imaginary (not complex), y = 0, and u is real. Thus 4 = toe”‘ cos(k, y + k, z) corresponding to static waves that are unstable. This is Rayleigh-Taylor instability. As Fig. 15 illustrates, the expression given by Eq. (108) leads to negative values of w2,hence imaginary values of w, when the (negative)gravitational term of (108) is larger in magnitude than the stabilizing interfacial tension term. Incipient instability corresponds to the value k = k* obtained from (108) when w = 0.
A* = 2n/k* is the Taylor wavelength. Wavelengths shorter than A* are stabilized by interfacial tension; another familiar demonstration stabilizes the interface with capillary forces created by an open mesh screen placed over
FLUID DYNAMICS A N D SCIENCE OF MAGNETIC LIQUIDS
145
I(-) FIG.15. Dispersion in *k plane for Rayleigh-Taylor instability.
the interface. The effect is dramatized by passing a fine wire through the openings of the mesh into and out of the fluid. As seen from the last term of (107), magnetization provides a stabilizing (stiffening)influence for disturbances propagating along the field lines, while self-field effects are absent for perturbations propagating across the lines of field intensity. Convenient experiments for verifying the dispersion relations with tangential and normally applied magnetic fields use rectangular containers, partly filled with magnetic fluid, driven by a low-frequency transducer to vibrate in the horizontal plane. By shaking the container at appropriate frequencies, it is possible to excite resonances near the natural frequencies of the interface. These occur as the box contains an integral number n, of halfwaves over its length such that k, = n,n/l,, k, = 0. In a typical measurement as illustrated in Fig. 16a, the resonant condition is established by varying the driving frequency in order to approach the resonance from above and, again, from below. In all cases the fluid depth was great enough to ignore the presence of the container bottom. Magnetic field was produced by Helmholtz coils. From Eq. (107) the relative frequency shift to produce standing waves is given as [(Wl
- W;)/W;]1’2 = F ,
(110)
Figure 16b displays a satisfactory agreement between experimental values and theoretical prediction given as the solid line. The resonance frequencies shift upward with increasing magnetization.
146
RONALD E. ROSENSWEIG
Ft-,
(b)
FIG.16. (a) Experimental arrangement to determine influence of tangential field on resonance of surface waves on a magnetic fluid. (b) Data illustrating shift of resonance to higher frequencies with increase of fluid magnetization. (After Zelazo and Melcher, 1969.)
5. Gradient Field Stabilization Whereas uniform tangential magnetic field stiffens the fluid interface for wave propagation along the field direction, waves propagating normal to the field remain uninfluenced by the field and become unstable when dense fluid overlays less dense fluid. However, an imposed magnetic field having a gradient of intensity is capable of stabilizing the fluid interface against growth in amplitude of waves having any orientation. The theory for gradient field stabilization is more complex than for the uniform field selfinteraction and includes the self-field influence as a special case (Zelazo and Melcher, 1969). To be stabilizing the field intensity must increase in the direction of the magnetizable fluid whether the magnetizable layer is more dense than the underlaying fluid or the magnetizable layer is less dense and underlays the nonmagnetic fluid, in which case buoyant mixing should be prevented. Normal field possessing a gradient of intensity is less satisfactory for this purpose than tangential field having the requisite gradient due to the desta-
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
147
bilizing tendency of uniform field oriented normal to the interface (see Section II,F,6). The following expression gives the interface criterion in order for tangential gradient field to prevent the Rayleigh-Taylor instability. PcoM(dH,/W
’d P ,
(112) Surface tension, which was neglected, only further increases the stability. Zelazo and Melcher (1969) tested the relationship of (112) in adverse gravitational acceleration using wedge-shaped steel pole pieces to provide the gradient in imposed field intensity; they report quantitative agreement between theory and experiment. However, it appears that the gradient field extended over the whole volume of the magnetic fluid, so the experiment was unable to distinguish between field gradient support of the liquid and gradient field stabilization of the interface. Rosensweig (1970, unpublished) devised a demonstration to illustrate the effectiveness of employing gradient field that is localized at the magnetic fluid interface. Figure 17 illustrates the stably supported liquid column. In - Pb)
t-
I;
I/PI
I
1
FIG.17. Field gradient stabilization of Rayleigh-Taylor instability illustrating mechanism for a magnetic fluid contactless valve.
one apparatus a sealed glass tube T of 8 mm i.d. and 330 mm length contained magnetic fluid with p = 1200 kg/m3, po M = .012 Wb/m2 (120 G). The field is furnished by a ring magnet M,, slid over the tube, having face-to-face magnetization and made of oriented barium ferrite, 25 mm 0.d. by 7.5 mm thick. As shown in Fig. 17a the fluid column of length 1, is initially supported against gravity by pressure difference p1 less p 2 , the lower interface stabilized by the gradient field. At Fig. 17b the magnet slid to a lower
148
RONALD E. ROSENSWEIG
position resulting in a lowering of the fluid column as a whole. Figure 17c depicts the system after the next change of magnet position in this sequence. As the magnet is raised, fluid that is passed over flows to the tube bottom while the overlaying fluid remains in place. The features are each in accord with the expectations of the interface stabilization phenomena. The containment of the fluid is effectively accomplished with a magnetic bamer which may serve as a nonmaterial valve. 6. Normal Field Sut$ace Instability Magnetic field oriented perpendicular to the flat interface between a magnetizable and a nonmagnetic fluid has a destabilizing influence on the interface. The phenomenon was first reported by Rosensweig (1966a) who observed the phenomenon upon producing a magnetizable fluid that was severalfold more concentrated than the fluids previously available. The phenomenon is evoked in its essential form when the applied magnetic field is uniform (see Fig. 18a). As shown in the photograph of Fig. 18b, at the onset of transition the interface displays a repetitive pattern of peaks. The spatial pattern is invariant with time at constant field. Cowley and Rosensweig (1967) gave the analysis and confirming experiments for a nonlinearly magnetizable fluid forming an hexagonal array of peaks (spikes). Consider Eq. (105) with U4=
ub=O
H y = O
Pa=/lo
Pb=P
(113)
In Eq. (114),w appears as a square term only, while the right-hand side is a real number. Thus the value of w is either real or imaginary but never complex. From Eq. (loo),when o is imaginary, an arbitrary disturbance of the interface initially grows with time as described by the factor e”‘,where o is the imaginary part of w. Thus the onset of instability corresponds to the dependence of w2 on k as sketched in Fig. 19a. It is evident from the figure that transition occurs if both the following conditions are met w2 = 0 (115a) and aoz/ak = o (115b) The instability occurs as a spatial pattern that is static in time. Applying the conditions of (115) gives
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
149
FIG.18. (a) Experimental apparatus for producing the normal field instability in vertical applied magnetic field. Power source supplies current through the ammeter A to the coils C and subjects the ferromagnetic fluid F to an approximately uniform magnetic field. (b) Photograph illustrating the normal field instability of a magnetic fluid free surface. A small source of light at the camera lens is reflected from local flats on the fluid surface. (From Cowley and Rosensweig, 1967.)
Comparing Eq. (117) to Eq. (109) it is seen that k , = k*, i.e., the critical wave number for the normal field instability corresponds to the Taylor wave number. Equation (116) gives the critical value of magnetization M , ; this is the lowest value of magnetization at which the phenomenon can be ob-
150
RONALD E. ROSENSWEIG W2
(b)
I
1.1
1.2
13 1.4 1.5 1.6 1.7 1.8
9
P /Po FIG.19. (a) Dispersion in o-k plane at onset of the normal field instability. (b) Experimental data for appearance of the normal field instability agree with the predictions of theory. (After Cowley and Rosensweig, 1967.)
served. Experimentally and theoretically, when magnetization is increased from zero by increasing the applied magnetic field, the fluid interface is perfectly flat over a wide range of applied field intensities up to the point where transition suddenly occurs. The phenomenon is striking in this regard, and due to its critical nature, the condition for onset is easily and accurately detectable. The corresponding instability at the free surface of a liquid dielectric in a constant vertical electrical field has been studied experimentally by Taylor and McEwan (1965) and theoretically by Melcher (1963). Equation (116) applies for the linear medium. In comparison, the results
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
151
of Cowley and Rosensweig (1967) for the nonlinear medium are (118a) ro = ( P C P I / P W 2
(118b)
where pc = B o / H o and p1 = (dB/dH), . According to Eq. (118) the role of permeability in Eq. (116) is replaced by the geometric mean of the chord and tangent permeabilities when the medium is nonlinear. As a numerical example, for a pool of magnetic fluid exposed to air at atmospheric pressure with P b = 1200 kg/m3, pa x 0, .F= 0.025 N/m, p,p, = 2p& and g = 9.8 m/sec2, the critical magnetization is M c = 6825 A/m, (86 G). If the fluid’s saturation magnetization were less than this value, it could not display the instability regardless of the intensity of the applied magnetic field. One can offer the following physical explanation for the appearance of peaks on the liquid surface. Suppose that in a uniform vertical field there arises a wavy perturbation of the magnetic fluid surface. The field intensity near the bulges of the perturbations is increased, but in the hollows it is decreased in comparison with the equilibrium value. Therefore the perturbation of the magnetic force is directed upward at the bulges but downward at the hollows; that is, a tendency exists to amplify the perturbation of the surface. On the other hand, the surface-tension and gravity forces are directed opposite to the displacement of the parts of the surface from the equilibrium; that is, they impede the displacement. As long as the warping of the surface is small, all the forces produced by it are proportional to the value of the displacement. The elastic coefficients representing the ratio of force to displacement for surface tension and gravity are independent of applied field intensity. However, the coefficient in the perturbation of the magnetic force is also proportional to the square of the magnetization of the magnetic fluid. Therefore, at sufficiently large magnetization, the destabilizing magnetic force exceeds the sum of the other two forces and instability sets in. Comparisons of the predicted critical magnetization from Eq. (118a) with experiments are shown in Fig. 19b for magnetic fluid with air and water interfaces. Subscript 0 denotes properties of the kerosene carrier liquid. Density p of the magnetic fluid was varied by changing the particle concentration in the kerosene carrier. The comparison between theory and experiment in these tests is excellent, as were the predictions for spacing between peaks. Additional study of the normal field instability develops conditions for appearance of a square array of peaks versus the hexagonal array using an
152
RONALD E. ROSENSWEIG
energy minimization principle, and calculates the amplitude of the peaks from nonlinear equations (Gailitis, 1977). Zaitsev and Shliomis (1969) analyze hysteresis of peak disappearance as field is decreased in terms of bifurcation solutions; a critical experiment is needed to confirm this prediction. 7. Kelvin-Helmholtz Instability Classical Kelvin-Helmholtz instability relates to the behavior of a plane interface between moving fluid layers. Wind-generated ocean waves and the flapping of flags are two manifestations of the instability. A rather basic situation in ferrohydrodynamics is the inviscid wave behavior at the interface between layers of magnetized fluid having permeabilitiespa and &,.The following considers the case of applied magnetic field with intensity H , oriented parallel with the unperturbed surface. Gravity is oriented normal to the field and the fluid layers move at speeds U, and Ub relative to fixed boundaries. The dispersion relationship is obtainable as a special case of Eq. (104) for linear media with H: = HJ:= 0. This problem was originally treated abstractly in the monograph of Melcher (1963). (0 - k,
U,)’P, coth ka
+
(0 - k,
Ub)’Pb coth kb
This expression is a quadratic in a.For simplicity, considering the case when k = k, ,a + 00, and b + 00, o becomes complex and concomitantly the flow is incipiently unstable when the following conditions are satisfied:
(120b) The critical wave number is found from (120b) as
-P ~ J ) / ~ ] ” ~
(121) which again correspondsto the critical wavelength 2 4 k Cin Rayleigh-Taylor instability. Eliminating k from (120a) then produces a criterion for instability in the magnetic Kelvin-Helmholtz problem: = kc = [&b
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
153
The larger the difference in permeability and applied field across the interface, the greater is the velocity difference that can be accommodated before instability occurs. Low density of a layer also promotes stability. 8. Stabilization of Fluid Penetration through a Porous Medium
The interface between two fluids can be unstable when the more viscous fluid is driven through the voids in a porous medium by a less viscous fluid. The phenomenon was analyzed by Saffman and Taylor (1958) using small signal stability analysis. Recently, Rosensweig et al. (1978) demonstrated that if a layer of magnetizable fluid is used to push the more viscous fluid, the interface can be stabilized for sufficiently small wavelengths with an imposed magnetic field. The simplest case, that of a two-region problem, is illustrated in Fig. 20.
FIG. 20. Perturbations on the interface separating two dissimilar fluids penetrating through a porous medium. (After Rosensweig et al., 1978.)
Magnetic fluid pushes nonmagnetic fluid in the presence of tangential applied field. The fluid motion is normal to the interfacial boundary between the two fluids. In a porous medium the details of the interstices are not known, but the local average fluid velocity is adequately described by Darcy’s law O = -Vp-/3q+F (123) where p is the hydrodynamic pressure, /3 = qH / K is the ratio of fluid viscosity q,, to the permeability K which depends on the geometry of the interstices, and F is any other internal force density. In the present m e F is composed of gravitational and magnetization forces. Additional governingequations are the incompressible continuity relationship V q = 0 and the magnetostatic field relations, V x H = 0 and V * B = 0. The interfacial perturbation again may be represented by Eq. (99). Linearization and solution of the governing equations subject to requiring force
154
RONALD E. ROSENSWEIG
balance at the interface leads to the dispersion relation
where
If v is negative any perturbation decays with time, while if it is positive the system is unstable and any perturbation grows exponentially with time. Thus surface tension and magnetization tend to stabilize the system; gravity also stabilizes the system if the more dense fluid is below ( p b > pa). If p b < pa, there results an instability of a more dense fluid supported by a less dense fluid. Unlike in Rayleigh-Taylor instability, in fluid penetration it is viscous drag rather than inertia that controls the dynamics. The magnetic field only stabilizes those waves oriented along the direction of the field. Then with k, = 0 so that k = k,, Eq. (124) reduces to
where ro = (pfi/pi)1’2as given previously, and =
(pa
- Bb)F
+ &a
- Pb)
(127)
Surface tension stabilizes the smallest wavelengths (largest wave numbers), and magnetic field stabilizes intermediate wavelengths. However, the system is unstable over a range of small wave numbers readily found from the above relationships when G > 0. When G I 0 the system is stable for all wave numbers whether magnetization is present or not. Additional analysis for a magnetic fluid layer having a finite thickness results in the same stability condition discussed above. The interfacial stability is independent of the layer thickness. The photographs of Fig. 21a illustrate an experimental verification of the magnetic stabilization of fluid penetration. The test utilizes a Hele-Shaw cell consisting of two parallel plates separated by a small distance do in the z direction as shown in Fig. 21b. The flow in the cell models Darcy’s law with the correspondence, (128a) (128b)
FLUID DYNAMICS AND SCIJiNCE OF MAGNETIC LIQUIDS
155
FIG.21. (a) Fluid penetration from left to right through a horizontal Hele-Shaw cell. Plate spacing d = 0.52 mm, aqueous base magnetic fluid viscosity qb = 1.18 mN . sec/m2, oil viscosity qa = 219 mN . sec/m2, interfacial tension Y = 30 mN/sec, velocity V = 0.3 mmisec. (b) Schematic drawing of Hele-Shaw cell. (After Rosensweig et al., 1978.)
9. Thermoconvective Instability
In ordinary fluid mechanics there is a well-known convective instability which arises in a fluid supporting a temperature gradient. Owing to thermal expansion, the hotter portion of the fluid has a smaller body force acting on it per unit volume than does the colder fluid. The fluid, if it is heated from below, may then be considered top-heavy, subject to a tendency to redistribute itself to offset this imbalance, a tendency which is counteracted by the viscous forces acting in the fluid. Theoretical treatment of this phenomenon predicts that the fluid will undergo this convective redistribution when the value of a dimensionless number R, the Rayleigh number, exceeds a certain critical value Ro . The Rayleigh number in ordinary fluids acted upon only by the gravitational body force is given by
156
RONALD E. ROSENSWEIG
Ro is 1708 for a horizontal layer of fluid and 1558 for a vertical layer. By analogy it is clear that a similar phenomenon may occur in a ferrofluid subjected to a body force po MVH. The body force depends on the thermal state of the fluid since M = M(T, H ) with (aM/aT), < 0. Thus an increase of temperature T in the direction of the magnetic field gradient tends to produce an unstable situation; the colder fluid is more strongly magnetized, then is drawn to the higher field region, displacing the warmer fluid. Thermoconvective instability of magnetic fluids may be investigated through linearizing the equation of motion, the equation of heat conduction, and the equation of continuity. Shliomis (1974)gives linearized relations for normalized perturbation velocity, temperature, and pressure valid at the limit of stability when equilibrium is replaced by stationary convective motion and the excitations neither decay nor build up with time. The set of equations has the same form as in the problem of ordinary convective stability when a generalized combination of parameters R, plays the role of Rayleigh’s number.
dT A o = - dz ,
dH dz ’
G0 -
1 av
Po = ;@,
ko = - L(E) (131) M aT
Instability is indicated by the following criterion :
Rg
’Ro
1708 Ro = (1558
(horizontal layer) (vertical layer)
(132)
Mechanical equilibrium of an isothermal liquid (Ao = 0) is always stable since the “Rayleigh number” is then always negative
R=
__(POPS kt tl
+ PoMkoGo)2
(133)
so that the inequality R < Ro is known to be satisfied (Cowley and Rosensweig, 1967). In the absence of magnetism (M = 0) Eq. (130)reduces to Eq. (129) provided
T o B o ~ / ~4 o ~1 o
(134)
Taking values To = 300 K, Po = 5 x (K-’), g = 9.8 m/sec, co = 4 x lo3 W * sec/kg . K, and A . = lo3 K/m gives (T,Po g/coA , ) = 3.7 x lo-’ thus justifying the neglect of the adiabatic expansion term.
FLUID DYNAMICS AND SCIENCE OF MAGNETIC LIQUIDS
157
The ratio of the third term (magnetocaloric cooling) to the first term in the brackets of (130)is also generally a negligibly small number. CLOMkoGoTo/~coA,-4 1 (135) Taking ko = K-', M = 29,900A/m, Go = 8 x lo5 A/m2, To = 300 K, p = lo3 kg/m3, and c, A, the same as above gives a ratio of 2.3 x from Eq. (135).Thus both adiabatic terms in (130)can be neglected so the effective Rayleigh number takes the simpler form given by Lalas and Carmi (1971)and Curtis (1971).
R, =
6 (fi, pg + p, M k , G o ) kt rl
Using the numerical values considered above, po M k , G/fiopg = 6.2,so it is seen that the magnetic mechanism dominates over the gravitational mechanism. In the preceding discussion, the field gradient Go was considered as constant throughout the fluid layer. This approach is justified if Go %- G i , where Gi is the gradient of the magnetic intensity induced by the temperature gradient A.
Finlayson (1970) analyzed thermal convective instability in the case where applied field is uniform and magnetically induced temperature change is appreciable. The governing parameter is the dimensionless group
where xt
=
(g) T
(139)
For values of M > 3 x lo5 A/m the critical value of Rf approaches 1708 for any value of x,. Then the magnetic mechanism produces convection provided the following criterion is satisfied: Rf > 1708
(horizontal layer)
(140)
111. MAGNETIC FLUIDSIN DEVICES The unique fluid dynamic phenomena of magnetic fluids have led to numerous exploratory device applications and several proven technological applications. Often a small amount of magnetic fluid plays a critical role and
suffices to make rather massive devices possible. The fluid contained within the recesses of the device may not be visible to the user. A good example is provided by magnetic fluid rotary shaft seals. These seals and several other device applications make up the subject matter of this section.

A. Seals
The concept of a magnetic fluid shaft seal was developed about the time that ferrofluid became available (Rosensweig et al., 1968b). Sealing regions of differing pressure is now the best developed of the proposed ferrohydrodynamic applications (Moskowitz, 1974). Figure 22 illustrates schematically how ferromagnetic liquid can be employed as a leak-proof dynamic seal between a rotary shaft and stationary
FIG. 22. Two basic types of magnetic fluid rotary shaft seals: (a) nonmagnetic shaft, (b) magnetizable shaft. (After Rosensweig et al., 1968b.)
surroundings. The space between the shaft and stationary housing is loaded with ferromagnetic liquid held in place as a discrete ring(s) by the magnetic field. The design in Fig. 22a is adapted for the sealing of shafts that are nonmagnetic. External field generated by the permanent magnet emanates from a pole piece of the outer member and reenters another pole piece located on the same member. The magnetic field in the gap region is oriented tangential to the shaft surface. The alternate configuration of Fig. 22b is essentially that employed in commercially available ferrofluid seals. A magnetically permeable shaft and outer housing are used as part of a low-reluctance magnetic circuit containing an axially magnetized ring magnet mounted between stationary pole blocks. Focusing structures are employed to concentrate the field in a small annular volume with the ferrofluid introduced into this region. In this configuration the magnetic field is oriented transversely across the gap. Using this arrangement, pressure differences up to about 10⁵ Pa can be supported across a single-stage seal. The design
shown in Fig. 22b is particularly adaptable to staging, whereby much larger pressure differences can be sustained (Rosensweig, 1971b, 1977). The multistage seals of this type have reached widespread commercial usage (see Rotary Seals Catalog Handbook, Ferrofluidics Corporation, Burlington, Massachusetts).

1. Principle of Magnetic Fluid Seals
To analyze seals utilizing the transverse orientation of field consider the sketch of Fig. 23 which illustrates a path through one liquid stage. Pressure 4 is greater than pressure 1 with the fluid displaced somewhat toward direction 1. The interface 4/3 is in a region where field is assumed uniform and
FIG.23. Relationship for pressure difference across one stage of a ferrofluid seal.
oriented tangential to the interface between ferrofluid and the surrounding nonmagnetic medium. Interface 2/1 is located in a relatively weak portion of the fringing field, and gravitational force is negligible. From the Bernoulli equation (65) applied between points 3 and 2 within the ferrofluid,

$$p_3^* - \mu_0(\bar{M}H)_3 = p_2^* - \mu_0(\bar{M}H)_2 \quad\text{or}\quad p_3^* - p_2^* = \mu_0\left[(\bar{M}H)_3 - (\bar{M}H)_2\right] \tag{141}$$

Since the normal component of magnetization is zero at both interfaces, the boundary condition (68) gives
$$p_3^* = p_4 \qquad\text{and}\qquad p_2^* = p_1 \tag{142}$$
Hence

$$p_4 - p_1 = \mu_0\left[(\bar{M}H)_3 - (\bar{M}H)_2\right] = \mu_0\int_{H_2}^{H_3} M\,dH \tag{143}$$

The integral in this equation appears frequently in ferrohydrodynamics; Fig. 24 illustrates its meaning as an area under the magnetization curve. In a well-designed seal, the field H₂ is negligible compared to H₃, so that the burst pressure of the static seal is given closely as

$$\Delta p = \mu_0 \bar{M}H \tag{144}$$

where M̄H is evaluated at the peak value of field (Rosensweig, 1971a).
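As a rough orientation, Eq. (144) can be evaluated for representative numbers. In the sketch below both the field-averaged magnetization and the peak gap field are assumed values, chosen only to reproduce the order of magnitude quoted earlier for a single-stage seal.

```python
from math import pi

# Sketch of the single-stage burst-pressure estimate, Delta_p = mu0 * Mbar * H,
# for assumed (illustrative) values of magnetization and peak field.
mu0 = 4 * pi * 1e-7          # T*m/A
M_bar = 4e4                  # A/m, field-averaged magnetization (assumed)
H_peak = 1.6e6               # A/m, peak field in the sealing gap (assumed)

delta_p = mu0 * M_bar * H_peak
print(f"burst pressure per stage ~ {delta_p/1e3:.0f} kPa")   # order 10^5 Pa, as cited above
```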
FIG. 24. The shaded area under the magnetization curve, between H₂ and H₃, relates to the pressure difference across a stage of a magnetic fluid seal.
The development above assumes that the ferrofluid is uniform in composition. Actually the particle concentration is greater in high intensity portions of the field and leads to values of Ap that exceed the theoretical value of Eq. (144) in seals that have been idle for a period of time. Since viscosity increases rapidly as particle concentration increases [see Eq. (24) of Section I,E], a seal is stiff to rotate initially. Rotary motion of the seal shaft stirs the fluid, tending to equalize the particle concentration, reduce the torque, and
FIG.25. Experiment to determine static loading of a one-stage magnetic fluid (plug) seal. (After Perry and Jones, 1976.)
return the operating pressure capability to the value predicted by Eq. (144). The relationship of Eq. (144) was tested quantitatively by Perry and Jones (1976) utilizing a ferrofluid plug held magnetically by external pole pieces and contained in a vertical glass tube (see Fig. 25). Static loading of this seal was accomplished with a leg of immiscible liquid overlaying the plug. From Fig. 26 it is seen that calculated and measured static burst pressures are in very good agreement for various ferrofluid seals operated over a range of magnetic inductions.
FIG. 26. Calculated and measured static burst pressures of magnetic fluid seals are in very good agreement; data are for hydrocarbon-base fluid (ID = 1.5 and 2.8 mm) and water-base fluid (ID = 1.5 mm). (After Perry and Jones, 1976.)
2. Seal Applications

Bailey (1976) describes studies in the development of ferrohydrodynamic sealing undertaken for application to a 150-mm diameter feedthrough for a cryogenic liquid helium system. The rotating member in this case was driven up to 3000 rpm with cooling water continuously circulated near the seal to maintain the temperature at a low enough level to avoid evaporative loss of the ferrofluid carrier liquid. Performance and cost of ferrohydrodynamic seals were compared with other more conventional techniques in Bailey's excellent review article. A ferrohydrodynamic seal has also been developed for a superconducting generator as a feedthrough to a low temperature environment (Bailey, 1978). The 48-mm diameter seal was tested at about 3500 rpm for a period greater than a year. No sign of wear or any imminent failure was obvious on final
inspection. General Electric Company's Neutron Devices Department at St. Petersburg, Florida has had seals running 24 hours per day, 7 days per week, for over 4 years with no operational failures due to the seals. The seals operate at 200 rpm on 12.5-mm diameter shafts and seal against a pressure of 100 kPa on a resin degasser system (Perry, 1978). Levine (1977) reports the use of seals requiring high reliability on two space simulators for testing communications satellites. With 1600 hours running time accumulated in continuing tests, the seals had performed reliably and consistently. Ten stages of diester-based ferrofluid in 0.1-mm radial gaps around a 120-mm diameter shaft running at 3000 rpm were tested when sealing helium gas at 100 kPa differential pressure. No leak was detected using a mass spectrometer to search for helium leaks. This indicates that the leakage rate was below 6 x m³ per year and represents an improvement over mechanical face seals, when sealing gases, of many orders of magnitude (Bailey, 1978). The ferrofluid seals are suggested for long-term use in inertial energy storage wherein high-speed flywheels rotate in a vacuum enclosure (Rabenhorst, 1975). A patent advocates porous pole pieces to serve as reservoirs of fluid, but no working example is given (Miskolczy and Kaiser, 1973). Another patent (Hudgins, 1974) utilizes a ferrofluid seal combined with a labyrinth seal and a pressurized air cavity for use in gearbox transmission systems. The pressurized air cavity prevents the internal fluids from contacting the ferrofluid. In developing a high-power gas laser, NASA finds that the key to achieving a completely closed-cycle system without using makeup of CO2-helium-nitrogen gas mixture is a multistage ferrofluid seal surrounding the blower drive shaft (Lancanshire et al., 1977). The system operates over a pressure range of 13 to 106 kPa, coupled to a 187-kW motor. Moskowitz and Ezekiel (1975) review successful commercial applications of vacuum sealing as well as the sealing of high differential pressure. The seals have also found wide application as exclusion seals preventing liquid, vapor, metallic, and nonmetallic contaminants from reaching machinery parts such as grinding spindles, textile wind-up heads, and digital disk drives. Thus, the disk in a computer magnetic disk drive whirls at high speed, with the read/write head floating on a cushion of air 2.5 μm above it. The magnetic fluid seals keep out contaminants such as 5-μm smoke particles or dust that can cause a "crash" with computer memory loss (Person, 1977). Evidence of technical seal activity in the Soviet Union is given by Avramchuk et al. (1975). The development in Japan of a magnetic fluid seal for use in the liquid helium transfer coupling of a superconducting generator is described by Akiyama et al. (1976). Rotary seals of ferrofluid in contact with water or other liquids generally
leak at modest rates of rotation. Calculation from Eq. (122) reveals this trend is consistent with a mechanism of leakage due to Kelvin-Helmholtz instability. The same calculation for sealing against gas yields stability to high rotational rates, again in accord with experience.

B. Bearings
Passive bearings based on magnetic fluid flotation phenomena produce hydrostatic levitation of movable members. Dynamic bearings using the magnetic retention of the fluids offer additional novel characteristics.

1. Review of the Phenomenology
Samuel Earnshaw as early as 1839 propounded the theorem that stable levitation of isolated collections of charges (or poles) is not possible by static fields [Jeans, 1948; also Stratton, 1941; a recent discussion of Earnshaw's theorem is given by Weinstock (1976)]. As one consequence, it is impossible to find "all-repulsion" combinations of magnets to float objects free of contact with any solid support. However, it is interesting to note that Braunbeck in 1938 deduced that diamagnetic materials and superconductors escape the restrictions underlying Earnshaw's theorem. These special materials may be successfully suspended, although the diamagnetics suffer from very low load support and superconductors require cryogenic refrigeration. More recently Rosensweig (1966a,b) discovered first the levitation of nonmagnetic objects immersed in magnetic fluids subjected to an applied magnetic field and then (Rosensweig, 1966c) the self-levitation of immersed permanent magnets when no external field is present. These phenomena are reviewed in the following: a. Passive levitation of a nonmagnetic body. Consider a container of magnetic fluid as shown in Fig. 27a. Assume magnetic field and gravitational field are absent so the pressure is uniform at a constant value everywhere within the magnetic fluid region. Two opposed sources of magnetic field are brought up to the vicinity of the container as shown in Fig. 27b. If these sources have equal strength, the field is zero midway between and increases in intensity in every direction away from the midpoint. Since the magnetic fluid is attracted toward regions of higher field, the fluid is attracted away from the midpoint. However, the fluid is incompressible and fills the container so the response it provides is an increase of pressure in directions away from the center. Next, the fate of a nonmagnetic object introduced into this environment may be considered. In Fig. 27c the object is located at the center point. Since pressure forces are symmetrically distributed over its surface, the object
FIG. 27. Passive levitation of a nonmagnetic body in magnetized fluid: (a) initial field-free state, with uniform pressure in the magnetic fluid; (b) magnetized state of the fluid containing a null point of field at the center, with fluid pressure increasing away from the center; (c) object in stable equilibrium (pressure forces balanced); (d) displaced object experiences a restoring force (unbalanced pressure forces). (From Rosensweig, 1978.)
attains a state of stable equilibrium. Displaced from the equilibrium position as shown in Fig. 27d, the object experiences unbalanced pressure forces that establish a restoring force. This is the phenomenon of passive levitation of a nonmagnetic body. b. Self-levitation in magnetic fluid. In Fig. 28a the magnetic field again is absent and pressure is constant throughout the fluid. The field of a small permanent magnet, for example a disk magnetized from face to face, is shown in Fig. 28b. With this magnet immersed at the center of the fluid as in Fig. 28c the field is symmetrically disposed and pressure in the fluid, although altered by the field, is symmetrically distributed over the surfaces of the magnet. Accordingly, the magnet experiences an equilibrium of forces in this position. When the magnet is displaced from the center as shown in Fig. 28d, the field distribution no longer remains symmetric; consideration of the permeable path encountered by the magnetic flux readily leads to the conclusion that field is greater over that magnet surface facing away from the center. Again employing the notion that fluid pressure is greatest where the magnetic field is greatest, it is clear that the magnet is subjected to
FIG. 28. Self-levitation in magnetic fluid: (a) magnet and magnetic fluid in the field-free state; (b) field of the magnet; (c) equilibrium position (balanced pressures), with fluid pressure increasing toward the center; (d) displaced magnet experiences a restoring force (unbalanced pressures due to the perturbed field). (From Rosensweig, 1978.)
restoring forces that will return it to the center. This is the phenomenon of self-levitation in magnetic fluid. Figure 29 illustrates a corollary to the two types of levitation introduced above, the mutual repulsion of a magnet and a nonmagnetic object when both are immersed in magnetic fluid in a region removed from any fluid
FIG. 29. Generalization of the levitational phenomena recognizes the mutual repulsion of a magnetized body (or other field source) and a nonmagnetic object when both are immersed in ferrofluid. (From Rosensweig, 1978.)
boundary. This interaction is without analog in ordinary magnetostatics wherein both bodies must possess magnetic moments if there is to be a static interaction between them. In the present instance, it will be realized that the mutual force is not the result of a direct interaction between the two bodies but is due to magnetic fluid attracted into the space between these bodies. Nonetheless, the net effect is the mutual repulsion of the bodies.
2. Formulation of the Force on an Immersed Body; Levitation
Consider a body, either magnetic or nonmagnetic, with surface S immersed in magnetic fluid, as shown in Fig. 30. The net force acting on the
FIG. 30. Force on arbitrary (magnetized or unmagnetized) body immersed in magnetic fluid in presence (or absence) of an external source of magnetic field.
body is given generally by the expression
$$\mathbf{F}_m = \oint_S \left(\mathbf{T}\cdot\mathbf{n} - p\,\mathbf{n}\right)dS \tag{145}$$

where n is the outward facing unit normal vector, and T is the magnetic fluid stress tensor having components given by Eq. (30). The stress vector resolves into normal and tangential parts,

$$\mathbf{T}\cdot\mathbf{n} = T_n\,\mathbf{n} + T_t\,\mathbf{t} \tag{146}$$

with

$$T_n = -\mu_0\int_0^H \left(\frac{\partial (Mv)}{\partial v}\right)_{H,T} dH - \frac{\mu_0 H^2}{2} + H_n B_n \tag{147a}$$

$$T_t = H_n B_t = H_t B_n \tag{147b}$$
The expression for normal force may be simplified using the Bernoulli expression of Eq. (65) applied between a field-free region of the fluid where
pressure is p₀ and a point in the fluid near the surface of the body. Thus, substituting (149) and (147b) into (146) and the result into (145) gives a result (J. V. Byrne, private communication, 1978) that may be expressed as Eq. (150), in which the constant term p₀ vanishes by the divergence theorem. Equation (150) allows the force on an arbitrary immersed body to be computed from field solutions; an analogous expression in terms of the surface tractions may be written to determine torque. If the immersed body is nonmagnetic, the integral (151) over a surface S_i just inside the body disappears, since there is no magnetic force on the matter within the surface.
Subtracting (151) from (150) using [B_n] = 0 and [H_t] = 0, where brackets denote outside value minus inside value, proves that the tangential force is zero, hence that the surface force is purely normal; what remains may be expressed as follows after some reduction:

$$\mathbf{F}_m = -\mu_0\oint_S \left(\bar{M}H + \tfrac{1}{2}M_n^2\right)\mathbf{n}\,dS \tag{152}$$
The surface integral of Eq. (152) depends on the object's shape and size as well as the magnetic field variables. Present magnetic fluids acted upon by laboratory magnet sources are able to levitate, against the force of gravity, any nonmagnetic element in the periodic table. A condition for stable levitation at an interior point of the fluid space is now simply developed. In addition to F_m = 0 in Eq. (152) it is required, at the equilibrium point, that a positive restoring force accompany any small displacement. Since (M̄H + M_n²/2) increases monotonically with H, it follows that to levitate, a magnetic field must possess a local minimum
of field magnitude, i.e., for any displacement,

$$\delta H > 0 \tag{153}$$
As a special case, the magnetic force on a nonmagnetic immersed object can be given explicit expression in the limit of intense applied field, i.e.,

$$\frac{M_n^2/2}{\bar{M}H} \ll 1 \tag{154}$$

Then with uniform grad p_m, where p_m = μ₀M̄H is the fluid magnetic pressure, the magnetic force from (152) using the divergence theorem is
$$\mathbf{F}_m = -\oint_S p_m\,\mathbf{n}\,dS = -\int_V \operatorname{grad} p_m\,dV = -V\mu_0 M \operatorname{grad} H \tag{155}$$
Due to the minus sign in (155) [and (152)] the force is equal and opposite to the magnetic body force on an equivalent volume of fluid. Repulsion and stable levitation owe their existence to the presence of the minus sign. Expressions for levitation force on spheres and ellipsoids in a linearly polarizable medium are given by Jones and Bliss (1977). Expressions for levitation force in the case of nearly saturated magnetic fluid are developed for objects shaped as cylinders, spheres, and plates by Curtis (1974).

3. Measurements
A ceramic disk magnet having density 4700 kg/m³ with direction of magnetization perpendicular to the faces self-levitated about 5 mm off the bottom of the vessel containing magnetic fluid of density 1220 kg/m³. The force of repulsion between the magnet and the bottom surface of the container, measured for various displacements of the magnet from its equilibrium position, gave the results shown in Fig. 31 (Rosensweig, 1966b). The theoretical curve results from a calculation by the method of images and gives excellent agreement with the data. The flotation of nonmagnetic objects was investigated using a sink/float technique in a field gradient that could be varied. Spheres of glass, ceramic, and coral gave the data of Fig. 32. The ordinate of the graph represents buoyant weight of the immersed object and the abscissa approximates the theoretical force. It is seen that the experimental values fall reasonably close to the parity line. However, it would be desirable to determine the force relationship to a greater precision. Another investigation (Kaiser and Miskolczy, 1970b) demonstrated the flotation of dense solids in the series from diamond (sp. gr. = 3.5), zirconia, niobium, copper, molybdenum, silver, tantalum, and tungsten (sp. gr. = 19.3).
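A hedged order-of-magnitude estimate, using the intense-field limit of Eq. (155), indicates the field gradient a laboratory magnet must supply to float the densest of these solids. The fluid magnetization and densities in the sketch below are assumed values, not data from the experiments cited.

```python
from math import pi

# Rough estimate of the field gradient needed to levitate a dense nonmagnetic
# sphere, using F = V * mu0 * M * grad(H) from the intense-field limit of Eq. (155).
mu0 = 4 * pi * 1e-7
M = 4e4                # A/m, fluid magnetization near saturation (assumed)
rho_solid = 19.3e3     # kg/m^3, tungsten
rho_fluid = 1.2e3      # kg/m^3, carrier ferrofluid (assumed)
g = 9.8

# The buoyant weight per unit volume must be balanced by mu0*M*grad(H):
grad_H = (rho_solid - rho_fluid) * g / (mu0 * M)
print(f"required field gradient ~ {grad_H:.2e} A/m^2")
print(f"equivalent flux-density gradient ~ {mu0*grad_H:.1f} T/m")
```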
FIG. 31. Levitational force of a magnetized disk immersed in magnetizable fluid versus distance from the horizontal lower plane of the fluid space; experimental points are compared with the theoretical asymptote. (After Rosensweig, 1966b.)
FIG.32. Flotation of relatively dense, nonmagnetic materials in magnetizable fluid subjected to an applied magnetic field gradient.
4. Computations for a Model Bearing (Rosensweig et al., 1968a)
A model for analysis of a bearing is illustrated in Fig. 33. The bearing has two-dimensional geometry with magnetic poles distributed along the top and bottom surface of the levitated member. The surface density of poles is specified as sinusoidal in the x direction. The gaps of thickness δ between an upper wall at surface S₂ and a lower wall at surface S₁ are filled with
magnetic fluid. The float is displaced by an amount Δ, setting up a restoring force per unit area F_b. The distribution of magnetic field in the spaces occupied by magnet, fluid gaps, and the nonmagnetic wall is solved exactly, on the assumption of uniform fluid magnetic permeability μ. Inserting the field component values into the stress tensor of Eq. (30) and summing contributions over surfaces S₁ and S₂ gives the normalized force expressed by Eqs. (157a) and (157b), with

$$k = 2\pi/\lambda \tag{157c}$$

where λ is the wavelength of the repetitive pole pattern and M_m is the amplitude of the magnet's magnetization. It can be shown that the peak field at the surface in the absence of magnetic fluid is M_m/2. The selection of surfaces S₁ and S₂ adjacent to the nonmagnetic walls is convenient for evaluating the force. However, a property of the stress tensor approach is that other choices must lead to the same result. The relationship for net force on the floated member as predicted by the foregoing analysis is shown in the graph of Fig. 34 in terms of bearing stiffness. For a magnetic fluid having a given permeability, the bearing stiffness is maximized for a particular choice of pole spacing relative to gap length. The optimum pole face spacing falls in the range of kδ between
FIG. 34. Prediction of model bearing stiffness (normalized initial spring constant) versus pole spacing with fluid relative permeability as parameter. (After Rosensweig et al., 1968a.)
about 0.1 and 0.5, or λ/δ in the range from 60 down to 12. It is fortunate that the magnet dimensions are much larger than the dimensions of the gap, as this makes fabrication of optimized bearings not too difficult. Perhaps opposite to intuition, high permeability of the ferrofluid reduces the bearing stiffness in some cases; the crossover as kδ increases along the curve for μ_r = 10 is an example.
5. Reductions to Practice

A magnetic fluid spindle bearing was developed incorporating principles described above (Rosensweig, 1973, 1978); it is shown in the photograph of Fig. 35. The cylindrical outer race is nonmagnetic, while samarium cobalt ring magnets of alternating magnetization stacked on a rotatable shaft make up the inner race. With magnetic fluid in the gap, the inner and outer members float apart, completely free of mechanical contact with each other. The levitational support is totally passive, requiring no energy input. Starting friction is astonishingly low (or perhaps nonexistent); the outer member rotates under its own minute imbalance, oscillating as a damped pendulum before coming to rest. The device operates silently and was driven to 10,000 rpm with a fiber loop as a prototype for use in textile machinery. Unlike ball bearing spindles, there are no parts that wear or become noisy in operation.
FIG.35. A magnetic fluid spindle bearing. The outer race is supported purely by magnetic fluid forces, with no mechanical contact of relatively movable members and no source of energy.
A cylindrical slide bearing utilizing the magnetic fluid repulsion idea furnished the basis for an integrating accelerometer to sense increments of linear velocity by inertial response (Rosensweig et al., 1968a). The application required a very low order of sticking friction, or stiction as it is termed by instrumentation scientists. Experimentally the device was responsive to
FIG. 36. Magnetic fluid levitational force centers the voice coil in high-fidelity loudspeakers. The captive fluid also transfers heat away from the voice coil and dampens vibrations. (After Teledyne Acoustic Research, Bulletin 700056, Norwood, Massachusetts.)
an input acceleration of less than 10⁻⁵ g. This is the tilt equivalent to the angle subtended by a 10-mm coin at a distance of 1 km. A recent patent (Hunter and Little, 1977) claims novelty in the use of closed-loop operation with electromagnet drivers to measure acceleration in such applications as inertial platform leveling and thrust termination. Another study evaluates the sensitivity of a floated magnet as a material sensor (Cook, 1972). A very different type of application for the bearing principle is illustrated in the sketch of Fig. 36. Magnetic fluid centers the voice coil of an otherwise conventional loudspeaker, replacing mechanical spiders ordinarily used for
FIG. 37. Components of an inertial damper based on levitation of a magnetic mass in magnetic fluid: motor shaft, nonferromagnetic housing, and axially polarized disk magnets. (After Lerro, 1977.)
this purpose. As a key benefit, the heat generated in the voice coil is effectively transferred through the fluid to the surrounding structure. This innovation increases the amplifier power the coils can accommodate, hence the sound level the speaker produces. The application was recently adopted into commercial practice (see, for example, Teledyne Acoustic Research, Bulletin 700056, Norwood, Massachusetts). A new inertial damper that mounts on the end of a stepping motor shaft or similar device has been developed by Ferrofluidics Corporation (Lerro, 1977). A schematic diagram of the damper is shown in Fig. 37. The ferrofluid acts in a dual function: it provides bearing support for the floated seismic mass and acts as an energy absorber that dissipates energy through viscous shear in the damper. The unit is lighter and less expensive than conventional devices using mechanical bearings.

6. Other Developments in Bearings
The one-fluid hydrostatic bearings already discussed and other principles leading to magnetic fluid bearings are summarized in Table III.

TABLE III
MAGNETIC FLUID BEARING CONFIGURATIONS

Hydrostatic, one fluid: Mutual repulsion of magnet and nonmagnetic member. [Rosensweig (1978)]
Hydrostatic, two fluids: Ferrofluid seals periphery of nonmagnetic fluid that carries the load. [Rabenhorst (1972); Rosensweig (1973, 1974)]
Hydrodynamic, one fluid: Ferrofluid held in working gap by magnetic attraction and pumping action; fluid develops load support from dynamic motion. [Schmieder (1972); Styles et al. (1973)]
Hydrodynamic, two fluids: Ferrofluid seals nonmagnetic fluid in which dynamic support force develops as a result of bearing rotational motion. [Styles (1969)]

The hydrostatic magnetic fluid bearings eliminate wear, noise, starting torque, power input, and lubricant flow. The hydrodynamic-type magnetic fluid bearings achieve an increase in load carrying ability relative to the
hydrostatic type while sacrificing low starting torque and wear at start-up or shutdown. Demonstrated state-of-art monofluid bearings of the hydrostatic type support 1 kPa, with an ultimate potential capability in excess of 650 kPa assuming efficient magnets and highly magnetic ferrofluid equivalent to 50 vol% iron in colloidal suspension. A bifluid hydrostatic bearing supported 25 kPa using eight sealed stages, with conceivable potential for approaching 6500 kPa. It is hoped more investigators will turn their attention to the problems and opportunities offered by this field.
C. Dampers

Dampers utilize the viscous property of magnetic fluids to dissipate kinetic energy of unwanted motion or oscillations to thermal energy. A spectrum of applications has been studied, ranging from delicate instrumentation to mass transportation vehicles. There appears to be significant opportunity for further developments in this area. A survey of damper applications is given in the following:

1. Satellite Damper
This ferrofluid viscous damper was developed by Avco Corporation for application on NASA's Radio Astronomy Explorer satellite. The damper (Fig. 38) consists of a small quantity of magnetic fluid hermetically sealed in
FIG. 38. Viscous damper designed for service in a radio astronomy Earth satellite; the period of oscillation is approximately 90 min. (After Coulombre et al., 1967.)
a vane mounted on the central body of the satellite. The ferrofluid is acted upon by a permanent magnet mounted on a long damper boom. Relative angular motion of the damper boom with respect to the satellite central body produces magnetic force on the ferrofluid, causing the fluid to dissipate energy by flowing through a constriction in the vane (Coulombre et al., 1967). Operation of the device resulted in smooth fluid damping with no residual oscillations, in contrast to devices wherein mechanical friction is present.

2. Stepping Motor Damper
A stepping motor provides discontinuous angular positioning of an electrically torqued rotor. Due to the sudden on-off motion required, settling time is often a problem. Ferrofluid between the stator and rotor of a stepper motor reduces settling time by as much as a factor of three or four and thus provides damping, without addition of any hardware to the basic motor. The ferrofluid is retained within the gap between the moving and stationary parts by the magnetic field that already is present there. The ferrofluid damping concept has also been applied to D'Arsonval meter movements and for the damping of flappers using squeeze film viscous motion.

3. Instrument Damper

A wave and tide gauge developed by Bass Engineering Co. employs a Bourdon gauge coupled by a linkage to an optical readout. The linkage cannot be supported at all points, hence is subject to oscillations that interfere with the readings. A ferrofluid damper module was developed with viscous fluid held in place magnetically, allowing the gauge to operate at any angle of tilt. The principle is widely applicable.

4. Electromagnetic Transportation
Several national groups are investigating the feasibility of concepts for electromagnetic flight at ground level. In the MIT concept a vehicle carrying 200 passengers is levitated with 300 mm clearance above a trough-shaped guideway using superconductive magnets onboard the vehicle. The vehicle levitates at speeds above 15 km/hr, and with modifications of a partially evacuated enclosure can travel at speeds above 200 km/hr. Means for damping roll, pitch, and yaw oscillations must be provided, and one proposal of the developers is to utilize magnetic fluid in a manner similar to the satellite damper described previously. Magnetic field is already present as part of the system to provide coupling to the fluid. At present, however, electrical feedback control methods are dominant.
D. Transducers

Transducers convert an input of one physical sort to an output of another sort. Ferrofluids have furnished the basis for transducers of many kinds, and a survey of devices is given below. Opportunity is ample for the conceptualization and development of additional devices to perform specific functions.

1. Acoustic Transducers
Cary and Fenlon (1969) assess the suitability of ferrofluids for acoustic transducer and receiver applications. The piston motion of a ferrofluid induced by an applied field gradient provides a feasible alternative to conventional magnetostrictive transducers. For an applied static induction field of 0.4 T and a gradient of 10 T/m, it is possible to achieve an overall efficiency that is greater than provided by ferromagnetic solids. Ferrofluids also appear to provide a desirable alternative for pressure-sensing applications in severe environments, for example detonations, where piezoelectric and pyroelectric materials suffer fatigue and failure. These workers also concluded that ferrofluids offer the prospect of obtaining broadband frequency response when radiating or receiving in liquid media.

2. Pressure Generator

A simple method of generating faithful sinusoidal pressure variations utilizes the magnetic fluid's property of linear magnetization at small applied field intensities (Hok, 1976). A schematic diagram of the pressure generator is shown in Fig. 39. Using a drop of magnetic fluid instead of a diaphragm, the pressure chamber part can be assembled from a few pieces of tubing and an O-ring, and this part does not have to be fixed to the electromagnetic actuator. A working device was found useful over at least the frequency range 0 to 100 Hz in the dynamic calibration of manometers for the clinical environment.
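Since for a linearly magnetizable fluid M = χH, the fluid magnetic pressure p_m = μ₀M̄H reduces to μ₀χH²/2, and superposing a small alternating field on a bias field gives a nearly sinusoidal pressure component. The sketch below evaluates this for assumed values of susceptibility and drive field; none of the numbers are taken from Hok's device.

```python
from math import pi

# Sketch of the pressure available from a linearly magnetizable drop,
# p_m = mu0 * chi * H^2 / 2; chi and the fields are assumed values.
mu0 = 4 * pi * 1e-7
chi = 1.0          # relative susceptibility (assumed; "typically less than two")
H0 = 2e4           # A/m, bias field (assumed)
h = 2e3            # A/m, superposed drive-field amplitude (assumed)

p_dc = mu0 * chi * H0**2 / 2      # static pressure offset
p_ac = mu0 * chi * H0 * h         # sinusoidal component, valid for h << H0
print(f"static pressure offset ~ {p_dc:.0f} Pa")
print(f"sinusoidal pressure amplitude ~ {p_ac:.0f} Pa")
```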
FIG. 39. Fluidic pressure signal generator using magnetic liquid. (After Hok, 1976.)
3. Level Detector
An elegantly simple device proposed for measuring angle of tilt consists of a hollow canister partially filled with a magnetic fluid (Stripling et al., 1974). Using magnetic induction pickoff, the device provides readout signals to remote areas in a manner not possible with conventional surveying and leveling devices. The device eliminates the need of an air supply required by air bearing level indicators.

4. Current Detectors
An indicator of current flow described by Sargent (1976) employs the magnetic field of the current to draw ferrofluid into a chamber having a transparent face. With current off the ferrofluid flows out of the chamber by capillary action. The device furnishes on-off electrovisual indications. This device has a close relationship to display devices described in Section III,E,1 on graphics. Bibik (1975) describes an ammeter device in which current produces field that is focused in a straight, narrow gap between permeable pole pieces. A column of magnetic fluid is drawn to various lengths along the gap, which is provided with a calibrated scale along its length.

5. Accelerometers

Accelerometers employing the levitation of an inertial proof mass in magnetized fluid are described previously in the section on bearings. A sketch of an accelerometer is shown in Fig. 40; a prototype integrating
FIG. 40. Integrating accelerometer based on levitation of an inertial proof mass in magnetic fluid, showing the permanent magnets, pickoff electrodes, buoyancy chamber, magnetic poles, proof mass, and bypass channel. (Fluid omitted for clarity.) (After Rosensweig et al., 1966.)
accelerometer was built and tested. This device performed up to its design expectations, with a lower limit to detectable input measured as less than 10⁻⁵ g. The scale factor of the instrument is sensitive to changes in fluid viscosity induced by particle concentration gradients and temperature change. These disadvantages are mitigated by operation of the device as an accelerometer rather than an incremental velocity meter, using forcing coils to maintain the proof mass at a central position. Another principle for an accelerometer is suggested in which magnetic particles of a ferrofluid are contained in a hollow chamber completely filled with the liquid suspension (Schmieder, 1970). An acceleration of the case generates a signal due to relative motion between the particles and a set of sensing coils. No analysis of performance was given.

6. Liquid Level Sensor

Ferrofluid is specified in a liquid level sensor consisting of a magnetic float surrounding a guide cylinder (Carrico, 1976). The ferrofluid is introduced into the gap between the movable members in order to exclude the process fluid, giving a wider range of temperature operation.
E. Graphics

In the field of graphics, ferrofluids are being considered for display devices and as a means for printing hard copy. The important properties of the fluid utilized in these devices include magnetic positioning, magnetic sensing, capillary latching, and deformability. The opaque optical property is important to all these applications.

1. Displays
An early patent for displays specifies the use of an immiscible, transparent fluid and an opaque magnetic fluid of about the same density enclosed in a flat container having a transparent face (Rosensweig and Resnick, 1972). The transparent fluid preferentially wets the wall displacing the opaque fluid when the fluids are moved. The magnetic fluid is positioned with magnets to mask and unmask alphanumerics, or the fluid itself may be shaped into a predetermined pattern. A fascinating property of magnetic fluid is revealed when the flat container of the two fluids is subjected to a uniform magnetic field oriented normal to the flat surface (Romankiw et al., 1975). As shown in Fig. 41 the fluid collects into an intricate maze, a liquid magnetic labyrinth similar to that observed in a demagnetized bubble platelet. According to Romankiw
FIG. 41. Liquid magnetic labyrinth. Water-base ferrofluid (μ₀M = 0.02 T at saturation) together with kerosene transparent fluid contained between glass plates with spacing of 400 μm and subjected to a 0.01-T induction field. (See Romankiw et al., 1975.)
et al. (1975) the stable configuration is determined by minimizing the total energy,

$$E_{\text{total}} = E_{\text{interfacial}} + E_{\text{demag}} + E_{\text{applied}} \tag{158}$$

where

$$E_{\text{interfacial}} = (\gamma_{\text{wall}})(\text{wall length})(\text{wall height})$$

$$E_{\text{demag}} = -\tfrac{1}{2}\int M H_{\text{demag}}\,dV, \qquad E_{\text{applied}} = -\int M H_{\text{applied}}\,dV$$
Thus, although the ferrofluid has no spontaneous magnetization or anisotropy, the application of a constant bias field perpendicular to the plates induces a moment in the ferrofluid. Since the relative susceptibility of ferrofluid is low, typically less than two, the magnetization in the magnetic fluid is reasonably uniform, so that the general form of these energy relationships is the same as for magnetic bubble domains. Indeed, at higher magnetic fields, liquid magnetic bubbles are formed; that is, one of the stable configurations is the cylindrical domain appearing as a bubble or a hole when viewed in either reflected or transmitted light. An analogy between liquid magnetic bubbles and magnetic bubble domains is more than superficial (Romankiw et al., 1975). Many of the general properties are similar, such as mutual repulsion between the cylindrical domains, attraction to permeable alloy overlays, damping of bubble motion, and the relationship of optimum cylindrical domain diameter to the height of the cylinder and overlay thickness. The size of the liquid magnetic bubbles is ideal for direct use as picture elements. A display using this technology would utilize an array of shift registers, moving the generated elements into the display from behind a shadow mask that can be etched onto the cover plate. Uniform illumination behind the plates is sufficient since the contrast is 100%. Olah (1975) developed magnetic fluid display devices requiring no external force or energy to maintain the readout in a preset condition. The device can be configured as a pattern with seven compartments arranged in the form of a numeral 8 (see Figs. 42 and 43). By selectively filling the compartments with opaque liquid, the numerals from 0 to 9 can be formed similar to more familiar liquid crystal displays, light emitting displays, and the like. Cavities are provided in a housing of two joined blocks permitting fluid to circulate from one cavity to another in response to a pulse of magnetic field. The cavities are filled with two immiscible fluids, one of which is ferrofluid as specified by Rosensweig and Resnick (1972). The cavities are geometrically
FIG. 42. Magnetic fluid numeric display based on remote positioning of captive fluids using a pulse of current. (After Olah, 1975.)
configured to provide surface tension forces that latch the display in a given state. Units with character size of a few millimeters have been produced as prototypes for use in electronic calculators and related machines. Sensing the impedance of a driver coil can determine the state of the element.
FIG. 43. Cross section of the display device illustrating the cavities and passage configuration leading to latching based on a fluid surface-energy effect. (After Olah, 1975.)
Long-term compatibility and stability of the contacting liquid phases is a key requisite in these graphics applications. More research is needed in defining and controlling the state of the interfacial surface separating the phases. Each phase is a complex, multispecies solution containing surface active molecules and other constituents. The influence of fluid motion and fluid properties on emulsification may play an important role.
2. Printing

In data processing equipment there is a need for an output printer that is high speed and quiet to handle graphical and pictorial data. Print speeds in excess of 5000 lines per minute are desired as compared to 2000 lines per minute available with mechanical impact printers. Until recently, ink jets
considered for this application were restricted to the use of electrostatic principles. Electrostatic technology has some technical problems with the use of high voltage electronics and interaction of charged droplets during transit that motivate the investigation of magnetic systems. Fan (1975; see also Fan and Toupin, 1974) discusses the development of a magnetic ink jet printer employing ferrofluid in which a uniform stream of ferrofluid droplets is produced by introducing periodic disturbances along a flowing jet. The droplets are deflected by passing through a magnetic field gradient of a transducer coil. To eliminate interaction between successive droplets, the path length in the transducer is made shorter than the separation distance between the droplets. Fan's process illustrated in Fig. 44 introduces a deflector as an additional element to select droplets generated as a steady stream. Directionally, the response time is expected to be faster than in the earlier system suggested by Johnson (1968).
FIG. 44. Magnetic fluid ink jet printing, with x- and y-deflectors, a selector or magnet catcher, and the paper. (After Fan, 1975; see also Fan and Toupin, 1974.)
F. Other

A few other diverse device applications for ferrofluids are indicated in this section, including use as a mechanical actuator, as a sorter in the production of semiconductor chips, and as a superior means for magnetic inspection of metallurgical alloy structure.

1. Actuators
An actuator that converts electrical energy to mechanical motion and work is the subject of a patent application (Sabelman, 1972), in which ferrofluid and a coil are contained within an elastomeric capsule. Shown in Fig. 45, energization of the coil by flow of current distorts the capsule to give, for example, radial expansion and axial contraction. The distortion is due to redistribution of the ferrofluid under the influence of the magnetic field. The uses for this device appear to be very broad, with a particular
FIG. 45. Magnetic fluid deforms the shape of an elastomeric capsule (flexible skin enclosing ferrofluid) to provide actuating force. (After Sabelman, 1972.)
application as an artificial muscle for prosthetic devices illustrated in the sketch of Fig. 46. This actuator has advantage over conventional solenoids employing a movable core since sliding parts that wear are eliminated, and force at full stroke may be maximized. A recent patent describes a magnetic fluid-actuated control valve having
FIG. 46. Ferrofluid actuator used as an artificial muscle, shown with the coil energized and deenergized. (After Sabelman, 1972.)
utility in controlling flow from a pressurized reservoir implanted in the body, e.g., artificial pancreas, sphincter for bladder control, or other orthotic devices (Goldstein, 1977).

2. Chip Sorter

A complete circuit for a quartz electronic watch may be about 2.5 x
mm thick, formed by the hundreds on a 50-mm silicon wafer. Each circuit may have up to 10³ transistors and each transistor may have to be tested five or six times. This precision job is done by needle probes that hover over the circuit and imprint each defective circuit with a magnetic fluid. Then, magnets rather than hand-held tweezers lift out the imperfect circuits (Broy, 1972). The usual coarse suspension of micron-size particles such as found in magnetic inks is conventionally used in this application, and it seems likely that ferrofluids (particle size 10 nm) could prevent problems associated with clogging of the quills.
3. Observing Ferromagnetic Microstructures
Gray (1972) has developed a technique using ferrofluid that improves resolution and magnification of the Bitter technique by tenfold in microscopic observation of magnetic patterns on ferromagnetic alloy microstructures. Experimental apparatus for the magnetic examination of specimens using ferrofluid is shown in the sketch of Fig. 47; the photograph of Fig. 48 illustrates the procedure for applying the ferrofluid to the sample. Previous broad use of the Bitter technique was limited because of the difficulty in preparing fresh colloid that did not agglomerate, whereas the ferrofluid has indefinitely long shelf life. In addition, the ferrofluid eliminates undesirable chemical etching of the sample surface caused by the old preparations. Gray shows photomicrographs of delta ferrite domain patterns; the technique is useful in the laboratory and may be of value in locating precursors to failure in weldments of structures.
FIG. 47. Magnetic etching apparatus for observing ferromagnetic microstructures; the sample is viewed through immersion oil under the objective lens. (After Gray, 1972.)
FIG. 48. Technique for applying magnetic fluid to metallurgical sample. (From Gray, 1972.)
In magnetic bubble garnet films, ion implantation can produce a layer with planar magnetization near the surface of the film. The first direct detection of the layer by observation of planar domains associated with bubbles in the underlying garnet was accomplished by the Bitter pattern method using a ferrofluid (Wolfe and North, 1974). The method uses a thin layer of the fluid pulled by capillary action into the space between a glass slide and the sample. The magnetized particles of the ferrofluid interact with local magnetic fields of the garnet domains which are made visible by transmitted unpolarized light.
IV. PROCESSES BASED ON MAGNETIC FLUIDS
The magnetic fluid in a process application contacts other streams of matter or energy undergoing physical, or possibly chemical, change. Whereas device applications sometimes utilize but a minute quantity of magnetic fluid, the typical process application employs the fluid in volume quantity. As the first example sink/float separation processes are discussed; they are sometimes referred to as magnetohydrostatic separations.
A. Magnetohydrostatic Separation
One of the most attractive large-scale applications of magnetic fluids involves their capability to separate materials and minerals. Rather than depend on the inherent magnetism of the particles to be separated, a ferrofluid technique in which the medium itself is magnetized separates nonmagnetic particles having a broad density range. Central to the concept of magnetohydrostatic separation is the magnetic levitation force discussed in detail in Section III,B on bearings. The levitation force is exerted on an immersed body in a magnetic fluid placed in an inhomogeneous magnetic field. As an approximation, the apparent density ρ_a of the magnetic fluid is given by the expression

$$\rho_a = \rho_t - \mu_0\,\frac{M}{g}\,\frac{dH}{dz} \tag{159}$$
in which ρ_t is the true density, g is the gravitational constant, z is vertical distance in the upward direction, and dH/dz is the field gradient. With H decreasing in the direction of positive z, the sign of dH/dz is negative so the apparent density exceeds the true density. Using available ferrofluids and state-of-the-art electromagnets, it is possible to float any known material that is less magnetic than the ferrofluid. The technique has advantage compared to use of conventional heavy liquids such as bromoform, methylene iodide, thallium formate aqueous solutions, and slurries such as ferrosilicon or magnetite in water. These media provide a limited range of densities (sp. gr. to 5.0), and a disadvantage is that the halogenated organics and the heavy salt solutions are toxic. A system for continuous separation of solid feed is based on a series of patents assigned to Avco Corporation (Rosensweig, 1969; Kaiser, 1969; Kaiser et al., 1976). A module (see Fig. 49) was constructed to separate shredded automobile nonferrous scrap metal and also many types of industrial scrap metals. For a throughput capacity of 1 metric ton of mixed metals per hour the power requirement is about 40 kW with cooling water consumption of 400 liters per hour. Greater hourly throughput is possible in larger systems. Standard degreasing equipment may be used to clean the separated solids and recover ferrofluid for recycle to the process. Development of magnetohydrostatic separation along closely similar lines has proceeded in Japan at Hitachi Ltd. (Nogita et al., 1977). A prototype separation system with a throughput capacity of 0.5 metric ton per hour was used to separate components of discarded automobiles and household electrical appliances. Nonmagnetic metals such as aluminum, zinc, and copper were recovered at a yield of 80% and a purity of 90%. The solids ranged in size from 6 to 30 mm, and the resolution is stated as 0.3 specific gravity units.
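A short sketch shows how Eq. (159) is applied in practice; the magnetization, true density, and field gradient used below are assumed values chosen only for illustration.

```python
from math import pi

# Apparent density of a magnetized ferrofluid, Eq. (159):
#   rho_a = rho_t - mu0 * (M/g) * (dH/dz)
mu0 = 4 * pi * 1e-7
rho_t = 1.2e3          # kg/m^3, true density of the ferrofluid (assumed)
M = 3e4                # A/m, fluid magnetization (assumed)
g = 9.8
dH_dz = -1e6           # A/m^2, field decreasing upward (assumed)

rho_a = rho_t - mu0 * (M / g) * dH_dz
print(f"apparent density ~ {rho_a:.0f} kg/m^3")
# Solids with density below rho_a float; for this choice of gradient aluminum
# (2.7e3 kg/m^3) would float while copper (8.9e3 kg/m^3) would sink, which is
# the basis of the sink/float split.
```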
FIG. 49. Material handling system in the continuous sink/float separation process using magnetic fluid. (From Metal Separation System Brochure, Avco Corporation, Lowell, Massachusetts.)
In a different configuration developed by Khalafalla and Reimers (1973b), the magnetic displacing force is directed in a horizontal direction, causing particles of various densities to follow different trajectories in the fluid under the influence of the vertical gravitational force. An advantage is that weaker, hence cheaper, magnetic fluid may be used and the applied magnetic field intensity can be less. Since the process is dynamic, it likely suffers some loss of resolution due to the influence of particle shape on the fluid mechanical drag force. Aluminum-copper-zinc mixtures from solid waste incineration were separated in bench scale tests using a magnetic induction of 0.24 T and a magnetic fluid with saturation induction of 0.0215 T. Study of magnetohydrostatic methods of mineral separation developed in the Soviet Union using aqueous solutions of paramagnetic salts is described in a monograph of Andres (1976a,b). For Ho(NO₃)₃ at a solution density of 1930 kg/m³ the cgs magnetic susceptibility is 233 × 10⁻⁶ emu/cm³; the corresponding magnetization in an applied field of 1 T is a μ₀M of 0.00293 T (29.3 G). The paramagnetic solutions are optically transparent and permit striking visual demonstrations of separations when an intense field gradient is applied to a vial containing solid pebbles of different densities; glass, pyrite, galena, and other substances may be levitated in this manner. Additional studies reported by Zimmels and colleagues (1977) calculate the distribution of force due to particularly shaped pole pieces.
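The quoted susceptibility and magnetization figures are mutually consistent, as the short conversion below confirms; the only inputs are the numbers given in the text.

```python
from math import pi

# A cgs volume susceptibility of 233e-6 emu/cm^3 corresponds to an SI volume
# susceptibility chi = 4*pi*chi_cgs; in an applied induction of 1 T this gives
# a solution magnetization with mu0*M = chi*B.
chi_cgs = 233e-6
chi_si = 4 * pi * chi_cgs          # dimensionless SI susceptibility
B = 1.0                            # T, applied induction

mu0_M = chi_si * B
print(f"mu0*M = {mu0_M*1e3:.2f} mT = {mu0_M*1e4:.1f} G")   # ~2.93 mT = 29.3 G
```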
B. Liquid/Liquid Separations

In liquid/liquid separation systems, magnetic fluid is mixed with oil-contaminated water. This renders the oil phase magnetic, allowing a physical separation of the oil and water in a magnetic separator. Specific applications include removing suspended oil from shipboard ballast and bilge waters, removing lubricating oils and cutting oils from factory wastewaters, and cleanup of oily wastewater discharged from rolling and pickling processes in steel mills. A skid-mounted commercial unit is offered (Houston Research, Inc.) for reducing oil in water from 200 ppm to less than 5 ppm. A published paper discusses equipment means to separate oil-water emulsion and to remove an oil spill from the surface of the ocean (Kaiser et al., 1971). An 88-kg portable system was developed to collect oil polluting the water around loading docks and oil storage facilities (Anonymous, 1971). Additional details are given in a U.S. patent (Kaiser, 1972). Another study relates oil removal efficiency to flow rate of the emulsion, amount of ferrofluid added, and the strength of applied magnetic field (Yorime and Tozawa, 1976). It is interesting to note that buoyant, ferromagnetic, sorbent particles with an affinity for oil are the basis for an alternative technique for control of maritime oil spills (Turbeville, 1973).

C. Energy Conversion (Resler and Rosensweig, 1964)
A direct conversion of thermal energy to energy of fluid motion can be based on the change of ferrofluid magnetization with temperature, which is most pronounced near the Curie temperature. The conversion process may be analyzed with reference to flow in a tilted, uniform cross section tube as sketched in Fig. 50. Cold fluid enters at section 1, isothermal entry into a
FIG. 50. Direct conversion of heat energy to flow work is accomplished by a magnetocaloric pump in which ferrofluid having a temperature-dependent magnetic moment is heated in the presence of a magnetic field (H = constant in the heat-addition region).
magnetic field is complete at section 2, heat addition at constant magnetic field is done between sections 2 and 3, and isothermal flow of heated fluid out of the field is completed at section 4. From Eq. (65) with q = const, there is obtained for sections 1 to 2 and 3 to 4, respectively,

$$p_1 = p_2^* + \rho g(h_2 - h_1) - \mu_0\bar{M}(T_2)H \tag{160}$$

$$p_4 = p_3^* + \rho g(h_3 - h_4) - \mu_0\bar{M}(T_3)H \tag{161}$$

where M̄(T₃) < M̄(T₂). In the heated zone from 2 to 3, with grad H = 0, direct utilization of the equation of motion gives the relationship

$$p_3^* - p_2^* = \rho g(h_2 - h_3) \tag{162}$$

Eliminating the starred variables gives the following expression for the overall pressure increase of the process:

$$p_4 - p_1 = \mu_0 H\,\Delta\bar{M} - \rho g(h_4 - h_1) \tag{163}$$

where ΔM̄ = M̄(T₂) − M̄(T₃).
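A feel for the magnitudes in Eq. (163) can be had from a short numerical sketch; the field, the magnetization change on heating, and the elevation rise assumed below are illustrative values only.

```python
from math import pi

# Sketch of the magnetocaloric pressure rise of Eq. (163):
#   p4 - p1 = mu0*H*dM - rho*g*(h4 - h1)
mu0 = 4 * pi * 1e-7
H = 8e5            # A/m, field in the heated zone (assumed)
dM = 2e4           # A/m, Mbar(T2) - Mbar(T3), magnetization lost on heating (assumed)
rho = 1.2e3        # kg/m^3, fluid density (assumed)
g = 9.8
dh = 0.5           # m, h4 - h1, net rise of the outlet above the inlet (assumed)

dp = mu0 * H * dM - rho * g * dh
print(f"pressure rise ~ {dp/1e3:.1f} kPa")
```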
Analysis of the process conducted as a closed cycle with provision for regenerative heat transfer and accounting for the magnetocaloric energy effect reveals that conversion efficiency can approach the Carnot limit set by the second law of thermodynamics (Resler and Rosensweig, 1967). The process is scientifically interesting in the coupling of thermodynamics with magnetics. Additional cycle analysis and a design configuration have been studied using this no-moving-part converter in application to topping cycles for nuclear reactors and high reliability, lightweight space power supplies (Donea et al., 1968). A laboratory proof of principle demonstration has been given (Rosensweig et al., 1965), and current effort is reported in development of magnetic fluids based on a liquid metal carrier to increase the heat release rate in the process (Popplewell et al., 1977). The concept is also suggested for removal of heat from nuclear fusion reactors where high-intensity magnetic field is available as part of the process (Roth et al., 1970). In more modest use of the principle there would seem to be much opportunity for applications in heat pipes, self-actuated pumps in thermal loops, and other devices.

D. Other
Several other processes using magnetic fluids deserve review. For the most part these processes have been studied by few investigators. However, the concepts appear promising and worthy of further development.
1. Magnetic Separation by Fluid Coating (Shubert, 1975)

In this type of process, particulate mixtures of essentially nonmagnetic materials are separated by selectively coating the surfaces of a component of the mixture with a magnetic fluid. Thereafter, the particulate mixture is subjected to a magnetic separation yielding a magnetic fluid-coated fraction and a nonmagnetic fraction. The process is especially intended for beneficiation wherein a mineral concentrate is recovered from its ore. The selective wetting of surfaces may be achieved by techniques well known in froth flotation practice. Coatings that yield a hydrophobic but aerophilic surface are also organophilic and so are readily wet by a hydrocarbon-base ferrofluid. The minimum amount of magnetic fluid required is that sufficient to form a thin coating on the surfaces of those particles wettable by the fluid. Copper ore of chalcocite in a siliceous matrix was ground and separated from a water slurry using a kerosene-base magnetic fluid following a ferric chloride pretreatment. Zinc sulfide in sphalerite ore separates from the gangue after a dilute sulfurous acid treatment. In another example, waste anthracite coal fines are separated directly from ash.
2. Biochemical Processing
Enzymes function as highly specific catalysts for chemical change. A difficulty of the technique in the past has been their separation from the reaction mixture after the chemical change occurred. A trend in recent work is the immobilization of the enzyme in a polymer support that can be readily removed from the reaction mixture (Adalsteinsson et al., 1977).The separation is facilitated by entrapping ferrofluid in the polymer to form a suspended gel containing the enzyme. Typically, a 1-pm gel particle contains lo2 to lo3 magnetic particles. The resulting magnetic gels can be manipulated using either conventional magnetic filtration or more effective highgradient procedures (Liu, 1976).The technique also circumvents difficulties encountered in packed bed contacting wherein pressure drop is too great due to deformation of the polymer beads. Procedures for preparing and separating small, magnetically responsive, polymer particles should be useful for manipulating a variety of immobilized biochemicals besides enzymes-for example, in radioimmunoassay procedures and affinity chromatography (Mosbach and Anderson, 1977).
3. Lubrication A concern of lubrication engineers has been to introduce the appropriate lubricant in the desired location and keep it there. Conventional lubricant retention methods include: mechanical seals, oil-impregnated retainers, wicks, pumping of lubricant, splashing of lubricant, surface treatment to prevent creep, and the like. Magnetic lubricants add the capability of being retained in the desired location by means of an external magnetic field. Ezekiel (1974) reviews concepts using magnetic lubricants in pivots, hinges, ball bearings, gears, and pistons.
LISTOF SYMBOLS Thickness of fluid layer above interface, m Dipolar element cross section area, m3 Cross section area of tube, mz Hamaker Constant, N * m Temperature gradient, K/m Thickness of fluid layer below interface, m Vector induction field, Wb/m2 Component of induction in jth direction, Wb/m2 Height of heat transfer zone, m Specific heat of magnetic fluid, WBg. K Spherical particle diameter, m Wall spacing, m Diameter of fluid jet at sections 1 and 2, respectively, m Reduction of particle diameter due to inert surface layer, m Demagnetization coefficient Energy per particle pair, N . m; energy terms associated with magnetic fluid bubbles when subscripted, N . m ith component of f,, N Characteristic frequency, sec-'. foro = 1 Force, N Restoring force on unit area of levitated slab, N/m2 Magnetic body force density, N/m3 External force density, N/m3 Magnetic force on a body immersed in ferrofluid, N Magnetic force on a volume of ferrofluid, N Gravitational constant, 9.8 m/sec2 Terms defined by Eq. (127), kg/sec2 * m2 Spatial gradient of magnetic field intensity, A/m3 Magnetic field magnitude, A/m Component of magnetic field in ith direction, A/m Component of magnetic field at interface in normal and tangential direction, A/m Vector magnetic field, A/m
I 10
1, m
6 M Md
Mln
A
M "t
n0 "m
"s
n
N Nu P Pm
P* 4
a r rc
r0 'd
R
Rf 6s SO
S t
T
r
F
T
U ua,
V
V
K
ub
Applied magnetic field, A/m Electric current, A Boltzmann constant, 1.38 x N . m/sec; also, wave number, rad/m Critical wave number for onset of normal field instability, rad/m Pyromagnetic coefficient, K-'.ko = - M - ' ( a M / d T ) , Thermal conductivity, W/m . sec . K Wave vector component in y- and z-direction, rad/m Crystalline anisotropy constant, N/mZ Relative surface separation of spherical particles, (rc - 2r)/r Length of region containing standing waves, m Length of heated fluid layer, m Magnetic moment of a particle, A . mz Average magnetic moment in direction of field, A . mz Magnetization magnitude, A/m Domain magnetization, A/m Magnetization of permanent magnet, A/m Field averaged magnetization, A/m. A = H - ' j f M dH Vector magnetization, A/m Number of particles with diameter di in a sample Number of particles in a chain in absence of external field Number of particles in a chain in strong external field Number of nodes less one in standing wave Outward facing unit normal vector Surface concentration of sorbed molecules, m-z Nusselt number (convective to conductive heat transfer ratio) Fluid pressure, N/m2 Fluid magnetic pressure, po M H ,N/m2 Pseudopressure defined by Eq. (52), N/m2 Fluid speed, m/sec Fluid velocity, m/sec Radius of a spherical particle, m Center-to-center separation of spherical particles, m Geometric mean of chord and tangent permeabilities Radial distance from line current, m Rayleigh number [defined by Eq. (12911 Dimensionless group [defined by Eq. (138)] Length of elementary dipole, m Term defined by Eq. (125) Surface area, m2 Surface layer thickness normalized to particle radius Absolute temperature, K Surface tension, N/m Component of magnetic stress tensor representing ith component of stress on surface having normal oriented in jth direction, N/m2 Magnetic stress tensor, N/m2 Orientation energy of isolated dipole, N . m Fluid velocity above and below interface, respectively; m/sec Specific volume, m3/kg Volume of a particle, m3 Control volume, m3
Velocity of fluid interface, m/sec Liquid volume, m3 Width of heat transfer zone, m Weight of tube, kg Coenergy, N/m2 Distance from origin along Cartesian coordinate axes, m Defined by Eqs. (102b,a) Greek Symbols Argument of Langevin function Ratio of ferrofluid viscosity to porous medium flow permeability, N . sec/m4; defined by Eq. (102c) if subscripted a or b Thermal expansion coefficient, KAngular frequency, rad/sec Distance between float surface and wall with no displacement, m Displacement of levitated mass, m Defined by Eq. (102d) Viscosity coefficient of ferrofluid in isotropic range, N . sec/m2 A viscosity coefficient of magnetic fluid in presence of external magnetic field, N . sec/m2 Viscosity coefficient of magnetic fluid in absence of external magnetic field, N . sec/m2 Viscosity coefficient of carrier fluid, N . sec/m2 Crystalline anisotropy constant, N/m2 Coupling coefficient Magnetic permeability, E / H ; H/m* Permeability of free space, 4n x lo-' H/m Relative magnetic permeability, p/po Deflection of interface, m Amplitude of interfacial deflection mode, m Expressions defined by Eqs. (157b,c) Particle mass density, kg/m3 Apparent density, kg/m3 Liquid density, kg/m3 Surface density of (apparent) magnetic poles, poles/m2 Brownian rotation relaxation time, sec Ntel (intrinsic) relaxation time, sec Characteristic precessional time, sec Frequency, rad/sec Volume fraction magnetic solids Velocity potential, m2/sec Magnetic susceptibility, M/H Apparent susceptibility Initial susceptibility, aM/aH at H = 0; d dilute, c concentrated Gravitational potential function, N . m Complex frequency, rad/sec
'
* The symbol H for the unit of henry should not be confused with the symbol H for the parameter of magnetic field magnitude.
Frequency of standing waves in absence of magnetization, rad/sec Frequency of standing waves in magnetized medium, rad/sec Vorticity vector, secFluid rotational rate, rad/sec
0 0
0 ,
w
R
REFERENCES Adalsteinsson, O., Lamotte, A., Baddour, R. F., Colton, C. K., and Whitesides, G. M. (1977). MIT Industrial Liaison Report NSF (Rann) No. GI34284. Massachusetts Institute of Technology, Cambridge, Massachusetts. Akiyama, S., Fujino, J., Ishihara, A., Ueda, K., Nishio, M., Shindo, Y. and Fuji, H. (1976). Proc. lnt. Cryog. Eng. Con$, 6th, 1976, p. 432. Andres, U. Ts. (1976a). “ Magnetohydrodynamic and Magnetohydrostatic Methods of Mineral Separation.” Wiley, New York. Andres, U. Ts. (1976b) Mater. Sci. Eng. 26, 269. Anonymous (1971). Ordnance, July-August. Avramchuk, A. Z., Kalinkin, A. K., Mikhalev, O., Orlov, D. V., and Sizov, A. P. (1975). Instrum. Exp. Tech. (Engl. Transl.) 18, Part 2, 900. Bailey, R. L. (1976). Proc. ASME Design Technol. Transfer Con/., 2nd (Montreal). Bailey, R. L. (1978). In “Therrnomechanics of Magnetic Fluids” (B. Berkovsky, ed.), p. 299. Hemisphere, Washington, D.C. Bates, L. F. (1961). “Modem Magnetism,” 4th ed. Cambridge Univ. Press, London and New York. Bean, C. P., and Livingston, J. D. (1959). J. Appl. Phys. 30, 120s. Berkovsky, B. M., and Bashtovoi, V. G. (1973). Heat Transfer-Sou. Res. 5, No. 15, 137. Berkovsky, B. M., and Orlov, L. P. (1973). Magnetohydrodynamics (Engl. Transl.), No.4,38. Berkowitz, A. E., Lahut, J. A., Jacobs, I. S., Levinson, L. M., and Forester, D. W . (1975). Phys. Rev. Lett. 34, No. 10, 594. Bertrand, A. R. V. (1970). Rev. Inst. Fr. Pet. 25, 16. Bibik, E. E. (1975). Russian Patent 473,098. Bibik, E. E., Matygullin, B. Ya., Raikher, Yu. L., and Shliomis, M. I. (1973). Magnetohydrodynamics (Engl. Transl.) No. 1, p. 68. Bogardus, E. H., Scranton, R., and Thompson, D. A. (1975). IEEE Trans. Magn. mag-11, No. 5, 1364.
Brown, W. F., Jr. (1963a) Phys. Rev. 130, 1677. Brown, W. F., Jr. (1963b). J . Appl. Phys. 34, 1319. Brown, W. F., Jr. (1969). Ann. N.Y. Acad. Sci. 147,463. Broy, A. (1972). N.Y. Times, March 26, p. 3. Buckrnaster, J. (1978). In “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.), p. 213. Hemisphere, Washington, D.C. Byme, J. V. (1977). Proc. IEE 24, No. 11, 1089. Calugkru, Gh., Badescu, R., and Luca, E. (1976). Rev. Roum. Phys. 21, No. 3, 305. Carrico, J. P. (1976). U.S. Patent 3,946,177. Cary, B. B., Jr. and Fenlon, F. H. (1969). J. Acoust. Soc. Am. 45, No. 5, 1210. Chu, B.-T. (1959). Phys. Fluids 2, No. 5, 473. Chung, D. Y., and Isler, W. E. (1977) Phys. Lett. A61, No. 6, 373. Cissoko, M. (1976). C. R. Hebd. Seances Acad. Sci., Ser. A 283,413. Cook, E. J. (1972) “Feasibility Evaluation of a Ferromagnetic Materials Sensor,” Rep. No. 0101-F. Arthur D. Little, Inc., Cambridge, Massachusetts.
Coulombre, R. E., d’Auriol, H., Schnee, L., Rosensweig, R. E., and Kaiser, R. (1967). “Feasibility Study and Model Development for a Ferrofluid Viscous Damper,” Rep. No. NAS5-9431, AVSSD-0222-67-CR. Goddard Space Flight Center, Greenbelt, Maryland. Cowley, M. D., and Rosensweig, R. E. (1967) J. Fluid Mech. 30,671. Curtis, R. A. (1971) Phys. Fluids 14, No. 10,2096. Curtis, R. A. (1974) Appl. Sci. Res. 29, No. 5, 342. de Gennes, P. G., and Pincus, P. A. (1970) Phys. Kondens. Mater. 11, 188. Donea, J., Lanza, F., and Van der Voort, E. (1968) “Evaluation of Magnetocaloric Converters,” Rep. EUR 4039e. Euratom, Ispra Establishment, Varese, Italy. Dzhauzashtin, K. E., and Yantovskii, E. I. (1969) Magn. Gidrodin. 5, No. 2, 19. Einstein, A. (1906) Ann. Phys. (Leipzig) [4] 19, 289. Einstein, A. (1911) Ann. Phys. (Leipzig) [4] 34, 591. Ezekiel, F. D. (1974). Des. Eng. Con$, Chicago Paper No. 74DE21. Fan, G. J. (1975). Proc. Magn. Magn. Mater., 21st Annu. Con$ AIP No. 29. Fan, G. J., and Toupin, R. A. (1974) German Patent 2,340,120. Finlayson, B. A. (1970). J . Fluid Mech. 40, No. 4, 753. Frenkel, Y a I. (1955) “Collection of Selected Works” (transl.), Dover, New York. Gailitis, A. (1977). J. Fluid Mech. 82, Part 3,401. Goldberg, P., Hansford, J., and van Heerden, P. J. (1971) J . Appl. Phys. 42, No. 10, 3874. Goldstein, S. R. (1977) U.S.Patent 4,053,952. Gray, R. J. (1972) Proc. 4th Annu. Int. Metallogr. Meet., 1971 ORNL-TM-3681. Hall, W. F., and Busenberg, S. N. (1969). J . Chem. Phys. 51, No. 1, 137. Hayes, C. F. (1975). J. Colloid Interjiace Sci. 52, No. 2, 239. Hemmer, P. C., and Imbro, D. (1977). Phys. Rev. A 16, No. 1, 380. Hess, P. H., and Parker, P. H. (1966). J. Appl. Polym. Sci. 10, 1915, Hok, B. (1976). Med. Biol. Eng., March, 193. Hudgins, W. A. (1974). U.S.Patent 3,858,879. Hunter, J. S., and Little, J. L. (1977). US.Patent 4,043,204. Jacobs, I. S., and Bean, C. P. (1963) Magnetism 3,272. Jeans, J. H. (1948). “The Mathematical Theory of Electricity and Magnetism,” Chapter 7, Sect. 192. Cambridge Univ. Press, London and New York. Jenkins, J. T. (1971). J. Phys. (Paris) 32,931. Jenkins, J. T. (1972) Arch. Ration. Mech. Anal. 46, No. 1,42. Johnson, C. E., Jr. (1968). US.Patent 3,510,878. Jones, T. B. (1978). I n “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.),p. 255. Hemisphere, Washington, D.C. Jones, T. B., and Bliss, G. W. (1977). J . Appl. Phys. 48, No. 4, 1412. Jordan, P. C. (1973). Mol. Phys. 25, No. 4,961. Kaiser, R. (1969). U.S.Patent 3,483,968. Kaiser, R. (1972). U.S.Patent 3,653,819. Kaiser, R., and Miskolczy, G. (1970a). J . Appl. Phys. 41, No. 3, 1064. Kaiser, R., and Miskolczy, G. (1970b). IEEE Trans. Magn. mag-6,No. 3, 694. Kaiser, R., and Rosensweig, R. E. (1968) “Study of Ferromagnetic Liquid, Phases I1 and 111,” Rep. No. NASW-1581.NASA Office of Advanced Research and Technology,Washington, D.C. Kaiser, R., Mir, L., and Curtiss, R. A. (1976). U.S.Patents 3,951,784-5. Kaiser, R., Miskolczy, G., Curtiss, R. A., and Colton, C. K. (1971). Proc. J t . Con$ Preo. Control Oil Spills, Washington. Kaplan, B. Z., and Jacobson, D. M. (1976). Nature (London) 259,654.
Keller, H., and Kundig, W. (1975). Solid Srute Commun. 16, 253. Khalafalla, S. E. (1975). Chemtech 5, 540. Khalafalla, S. E., and Reimers, G. W. (1973a). U.S.Patent 3,764,540. Khalafalla, S. E., and Reimers, G. W. (1973b). Sep. Sci. 8, No. 2, 161. Khalafalla, S. E., and Reimers, G. W. (1974). U.S.Patent 3,843,540. Kraefi, B., and Alexander, H. (1973). Phys. Kondens. Materie 16,281. Krueger, D. A., and Jones, T. B. (1974). Phys. Fluids 17, No. 10, 1831. Kruyt, H. R. (1952). “Colloid Science,” Vol. I. Am. Elsevier, New York. Lalas, D. P., and Carmi, S. (1971). Phys. Fluids 14, No. 2,436. Lamb, Sir H. (1932) “Hydrodynamics,” 6th ed. Dover, New York. Lancanshire, R B., Alger, D. L., Manista, E. J., Slaby, J. G., Dudning, J. W., and Stubbs, R. M. (1977). Opt. Eng. 16, No. 5, 505. Lerro,J. P. (1977). Des. News, Sept. 5, 46. Levine, M. B. (1977). 9th Space Simulation Con$ Los Angeles. Liu, Y. A., ed. (1976). Theory Appl. Magn. Separ. IEEE Trans. Magn. mag-12, No. 5. Mackor, E. L. (1951). J. Colloid Sci. 6,492. McNab, T. K., Fox, R. A., and Boyle, J. F. (1968). J. Appl. Phys. 39, No. 12, 5703. McTague, J. P. (1969). J. Chem. Phys. 51, 133. Martinet, A. (1977). J. Colloid lntevace Sci. 41, 391. Martinet, A. (1978). In “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.). Hemisphere, Washington, D.C. Martinet, A. (1977b). J. Colloid Interface Sci. 41,391. Martsenyuk, M.A., Raikher, Yu. L.,and Shliomis, M.I. (1974). Sou. Phys.-JETP 38,No. 2, 413. Melcher, J. R. (1963) “Field Coupled Surface Waves.” MIT Press, Cambridge, Massachusetts. Miller, C. W.,and Resler, E. L., Jr. (1975). Phys. Fluids 18, No. 9, 1112 Miskolczy, G., and Kaiser, R. (1973). U.S.Patent 3,740,060. Miskolczy, G., Litte, R., and Kaiser, R. (1970) Ferrofluid Particle Gyro,” Tech. Rep. AFFDL TR-70-5. Air Force Flight Dynamics Laboratory, Wright Patterson Air Force Base, Ohio. Mosbach, K., and Anderson, L. (1977). Nature (London) 270,259. Moskowitz, R. (1974) ASLE Trons. 18, No. 2, 135. Moskowitz, R., and Ezekiel, F. D. (1975). 1975 SAE Of-Highway Vehicle Meet. Paper No. 750851. Moskowitz, R., and Rosensweig, R. E. (1967) Appl. Phys. Lett. 11,301. Neuringer, J. L. (1966). lnt. J . Non-Linear Mech. 1, No. 2, 123. Neuringer, J. L., and Rosensweig, R. E. (1964) Phys. Fluids 7, No. 12, 1927. Nogita, S., Ikeguchi, T., Muramori, K.,Kazama, S.,and Sakai, H. (1977). Hitachi Rev. 26, No. 4, 139. Olah, E. E. (1975). US.Patent 3,863,249. Pappel, S. S. (1965). U.S. Patent 3,215,572. Pappel, S. S., and Faber, 0. C., Jr. (1966). N A S A Tech. Note I)-3288 Penfield, P., and Haus, H. A. (1967). “Electrodynamics of Moving Media.” MIT Press, Cambridge, Massachusetts. Perry, M.P. (1978). In “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.),p. 219. Hemisphere, Washington, D.C. Perry, M.P., and Jones, T. B. (1976). IEEE Trans. Magn. mg-12,798. Persson, N. C. (1977). Des. News, April 18. Peterson, E. A., and Krueger, D. A. (1978). J . Colloid Interface Sci. (in press) Popplewell, J., Charles, S. W., and Chantrell, R. (1977). Energy Conuers. 16, 133. Rabenhorst, D. W. (1972). U.S. Patent 3,682,518.
Rabenhorst, D. W. (1975). Energy Sources 2, No. 3, 251. Rabinow, J. (1949). J . Franklin Inst. 248, 155. Resler, E. L., Jr., and Rosensweig, R. E. (1964). AIAA J . 2,No. 8, 1418. Resler, E. L.,Jr., and Rosensweig, R. E. (1967). J . Eng. Power A3 89,399. Romankiw, L. T., Slusarczuk, M. M. G., and Thompson, D. A. (1975). IEEE Trans. Magn. mag-11,No. 1, 25. Rosensweig, R. E. (1966a). Int. Sci. Technol. 55, 48. Rosensweig, R. E. (1966b). AIAA J . 4, No. 10, 1751. Rosensweig, R. E. (1966~).Nature (London) 210,613. Rosensweig, R. E. (1969). U.S.Patent 3,483,969. Rosensweig, R. E. (1970). US.Patent 3,531,413. Rosensweig, R. E. (1971a). “Ferrohydrodynamics.” Encycl. Dictionary Phys., Suppl. 4, 411. Pergamon, Oxford. Rosensweig, R. E. (1971b). US. Patent 3,620,584. Rosensweig, R. E. (1973). US.Patent 3,734,578. Rosensweig, R. E. (1974). U.S. Reissue Patent 27,955 (Original No.3,612,630 dated October 12, 1971) Rosensweig, R. E. (1975). U.S.Patent 3,917,538. Rosensweig, R. E. (1977). Japanese Letters Patent 862,559. Rosensweig,R. E.(1978). In “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.),p. 231. Hemisphere, Washington, D.C. Rosensweig, R. E., and Kaiser, R. (1967). “Study of Ferromagnetic Liquid. Phase I,” Rep. No. NASW-1219. NASA Office of Advanced Research and Technology, Washington, D.C. Rosensweig, R. E., and Resnick, J. Y. (1972). U.S. Patent 3,648,299. Rosensweig, R. E., Nestor, J. W., and Timmins, R. S. (1965). Mater. Assoc. Direct Energy Convers., Proc. Symp. AIChE-I. Chem. E . Ser. 5, 104. Rosensweig, R. E., Litte, R., and Gelb, A. (1966). Proc. 4th Symp. Unconventional Inertial Sensors, Washington, D.C., Avco Corp. Report No. AVSSD-0291-66-PP. Rosensweig, R. E., Litte, R., Miskolczy, G., and Pellegrino, J. J. (1968a).“ FHD Sensor Develop ment,” AFFDL-TR-67-162. Air Force Flight Dynamics Lab., Wright Patterson Air Force Base, Ohio. Rosensweig, R. E., Miskolczy, G., and Ezekiel, F. D. (1968b). Mach. Des. 40, 145. Rosensweig, R. E., Kaiser, R., and Miskolczy, G. (1969). J . Colloid Interface Sci. 29,No.4,680. Rosensweig, R. E., Zahn, M., and Vogler, T. (1978). In “Thermomechanics of Magnetic Fluids” (B. Berkovsky, ed.), p. 195. Hemisphere, Washington, D.C. Roth, J. R., Rayk, W. D., and Reiman, J. J. (1970) NASA Tech. Memo 2106. Sabelman, E. E.(1972). NASA, Jet Propulsion Laboratory, S/N235,295,Pasadena, California. Saffman, P. G., and Taylor, G. I. (1958). Proc. R. SOC.London,Ser. A 245,312. Sargent, R. W. (1976). U.S.Patent 3,935,571. Schmieder, R. W. (1970). U.S. Patent 3,516,294. Schmieder, R. W. (1972). Nucl. Instrum. & Methods 102,313. Scholander, P. F., and Perez, M. (1971). Proc. Natl. Acad. Sci. U.S.A. 68, 1093. Scholten, P. C. (1978). In “Thermomechanics of Magnetic Fluids’’ (B. Berkovsky, ed.), p. 1. Hemisphere, Washington, D.C. Sharma, V. K., and Waldner, F. (1977). J . Appl. Phys. 48, No. 10, 4298. Shepherd, P. G., Popplewell, J., and Charles, S.W. (1972). J. Phys. D 5, 2273. Shliomis, M. I. (1972). Sou. Phys.-JETP (Engl. Transl.) 34, No. 6, 1291. Shliomis, M. I. (1974). Sou. Phys.-Usp. (Engl. Transl.) 17,No. 2, 153. Shubert, R. H.(1975). U.S.Patent 3,926,789. Stratton, J. A. (1941). “Electromagnetic Theory,” p. 116. McGraw-Hill, New York.
Stripling, W. W., White, H. V., and Hunter, J. S. (1974). U.S. Patent 3,839,904. Styles, J. C. (1969). U.S. Patent 3,439,961. Styles, J. C., Tuffias, R. H., and Blakely, R. W., Jr. (1973). U.S. Patent 3,746,407. Taylor, G. I., and McEwan, A. D. (1965). J. Fluid Mech. 22, 1. Thomas, J. R. (1966). J . Appl. Phys. 37, 2914. Turbeville, J. E. (1973). Enuiron. Sci. Technol. 7 , No. 5 , 433. Verwey, E. J. W., and Overbeek, J. Th. G. (1948). “The Theory of the Stability of Lyophobic Colloids.” Am. Elsevier, New York. Weinstock, R. (1976). Am. J . Phys. 44, No. 9, 392. Winkler, H., Heinrich, H.-J., and Gerdau, E. (1976). J . Phys. (Paris), C 6, Suppl. 12, 261. Wohlfarth, E. P. (1959). Adu. Phys. 8, 87. Wolfe, R., and North, J. C. (1974). Appl. Phys. Lett. 25, No. 2, 122. Yakushin, V. I. (1974). Magnetohydrodynamics (Engl. Transl.) No. 4, p. 19. Yorizane, M.,and Tozawa, 0. (1976). Bull. Jpn. Pet. Inst. 18, 183. Zaitsev, V. M.,and Shliomis, M.I. (1969). Dokl. Akad. Nauk SSSR 188, 1261. Zelazo, R. E., and Melcher, J. R. (1969). J . Fluid Mech. 39, 1. Zimmels, Y., Tuval, Y., and Lin, I. J. (1977). IEEE Trans. Magn. mag-13, No. 4, 1045.
The Edelweiss System J . ARSAC,* CH. GALTIER.t G. RUGGIU.? TRAN VAN KHAI.t AND J . P. VASSEURt$ I. General-Purpose Operating System ............................................... 202 I1. New Trends in Computer Architecture ............................................ 203 A. Von Neumann Architecture ................................................... 203 B. Syntax-Oriented Architecture .................................................. 204 C. Indirect Execution Architecture ................................................ 204 D. Direct Execution Architecture ................................................. 204 E. Choosing the High-Level Language ............................................ 204 111. New Trends in Programming ..................................................... 205 A. Empirical Programming ....................................................... 205 B. TopDown Programming ..................................................... 205 C. Control Structures ............................................................ 206 D. Program Manipulation ........................................................ 206 IV. Principles of m~................................................................. 207 A. Description of EXBL ........................................................... 207 B. Procedure Calls ............................................................... 211 C. Example ...................................................................... 213 D. Systems of Regular Program Equations ........................................ 217 E. Programming Methodology ................................................... 222 F . Is EXEL a GO-TO-less Language? .............................................. 223 V. Large-Scale Systems: The EDELWEISS Architecture ................................. 223 A. Description of EDELWEISS ...................................................... 223 B. The Working Set in the ~DELWEISSSystem ..................................... 232 C. Analytical Model of EDELWEISS System ......................................... 244 VI. The Single-User Family ........................................................... 256 A. Description of E X E L E ~....................................................... 256 B. Operating e w L m ........................................................... 258 C. Internal Management of EX EL^ ............................................. 258 D. Three-Processor Implementation .............................................. 262 Appendix A ...................................................................... 263 Appendix B ...................................................................... 264 Appendix C ...................................................................... 267 References ........................................................................ 269
* Institut de Programmation, Université Paris VI, Paris, France.
† Thomson-CSF, Laboratoire Central de Recherches, Orsay, France.
‡ Present address: Thomson-Brandt, 173 Boulevard Haussmann, B.P. 700-08, 75360 Paris Cedex 08, France.
Copyright © 1979 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014648-7
I. GENERAL-PURPOSE OPERATING SYSTEMS

Most operating systems have been designed to take charge of all possible programs, written in all possible languages. Programmers are not concerned with problems of memory management: programs are written without any consideration of memory requirements or storage resources. The concept of virtual memory is the fundamental issue: first, programs are translated by compilers for a quasi-infinite memory, using logical addresses; then they are translated into physical addresses before execution.
Implantation decisions are taken statically at load time before starting program execution. The program and its data are loaded, and space is assigned as working area to the program, so that it can be run for a long time (compared with the time needed for loading). This is possible only if there is a huge central memory in the computer. Fragmenting the program and its data so that a coherent part of it may be loaded and run for a long enough time is possible only if indications have been given by the programmer (overlays).
Another way is to take implantation decisions dynamically at run time. This is possible if there exists some hardware mechanism that maps logical addresses onto physical ones. The central memory is split into pages, and only a small number of them are assigned to a program at each time. The hardware mechanism detects the fact that some logical addresses cannot be translated into physical addresses because the corresponding page has not been assigned to the program and may be not loaded from disks. Execution is then interrupted, and decisions are taken by the operating system to find a free page in core memory, assign it to the program, and if necessary, load its contents from disks. It may be necessary to save the previous contents of this page on disks before assigning it to the program. Thus, for each page fault, one or two disk accesses may be made, giving a very important overhead. If the execution time between two page faults is not long compared with disk access time, the program execution time is considerably increased (system thrashing) (Denning, 1968). Several strategies have been proposed to avoid this phenomenon, such as replacing the page that has been least recently used (algorithm LRU) (Hansen, 1973). It has been shown that whatever the program, the set of pages needed for the execution of a small part of it (working set) (Denning, 1968) will have a lifetime longer than the execution time of this part, and more or less independent of the program. The following orders of magnitude may be given:
time needed for the execution of one machine statement: 10⁻⁶ sec
lifetime of a working set: 10⁻³ sec
disk access time: 10⁻² sec
Operating systems may be improved if the working-set lifetime is increased (while disk technology does not allow faster access). This lifetime is a general statistical property of programs. If working sets are derived from considerations of program semantics, they may be made such that they include all the statements of a repetitive part of the program (some loops) and the corresponding data and working area, and so have a much longer lifetime. This is exactly the same thing as fragmenting the program into smaller segments or atoms for consistent execution. Thus, both static and dynamic memory allocation lead to the same idea: every possibility of splitting a program into segments will improve performances of the operating system. Unfortunately, this is a difficult problem. It cannot be solved easily for classical languages as FORTRAN. Adding a sophisticated program to the operating system is not a solution; the time saved from page management will be lost in program processing. Moreover, such a program is not independent of the language in which the processed program is written. It can be thought of only if a unique language is used, designed in such a way that program segmentation is simple.
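Before turning to architecture, a minimal sketch may help fix ideas about the LRU page-replacement strategy mentioned above. The Python fragment below is an illustration of the policy only; the class name and the choice of three page frames are inventions of this sketch, not part of the EDELWEISS design.

from collections import OrderedDict

class LRUPageTable:
    def __init__(self, frames):
        self.frames = frames            # number of physical page frames
        self.resident = OrderedDict()   # pages currently in core, in LRU order

    def access(self, page):
        """Return True on a page fault, False on a hit."""
        if page in self.resident:
            self.resident.move_to_end(page)    # mark as most recently used
            return False
        if len(self.resident) >= self.frames:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page] = None             # load the page (a disk access)
        return True

table = LRUPageTable(3)
faults = sum(table.access(p) for p in [1, 2, 3, 1, 4, 2, 1, 3])
print(faults)   # number of page faults for this reference string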
II. NEW TRENDS IN COMPUTER ARCHITECTURE

With large-scale integration, the cost of hardware has been drastically reduced and the realization of specialized computer architecture has been considerably simplified. A computer may be specialized to accept only one high-level language, and so take into account every facility of this language. Four kinds of high-level language computer architectures have been recognized by Chu (1975).

A. Von Neumann Architecture
This is a classical architecture, based on a central memory made of addressable cells in which the program is stored. There is no direct connection between the high-level language and the machine language except the one coming from the history of computer development. Most of the high-level languages (FORTRAN, ALGOL, COBOL, PL/1, etc.) have been designed as more or less powerful abbreviations of machine languages, using the same universal concepts: variables, assignments, labels, GO TOs, conditional jumps, subroutines, etc. The compatibility between the high-level language and the machine language is provided by a compiler. Specializing such an architecture to a single language simplifies the operations of operating systems: they do not have to manage a library of compilers and run-time environments. These systems do not fundamentally differ from general-purpose systems.
B. Syntax-Oriented Architecture This is still an architecture based on von Neumann machines but with some specialization of the machine code for lexical analysis-processing of character strings, Polish notation of operators, etc. With such machines, the part of software is reduced, but there is still an important distance between the high-level language and the machine language. For instance, the user’s program is written with parentheses and translated into Polish notation. C . Indirect Execution Architecture
Such a system uses still two languages: the external high-level language, and the internal machine language. But the distance between these languages has been considerably reduced by taking the internal language closer to the external one. Most of the translation operations between these languages are made by hardware (SYMBOL system) (Rice and Smith, 1971). These systems differ from von Neumann architecture because they do not start from a machine architecture on which a high-level language is mapped through more or less sophisticated compilers. The high-level language is first given, then the machine architecture is designed, so that the internal language will be close to the external one. D. Direct Execution Architecture
In these systems, there exists only one language, the high-level language, directly executed by hardware. No translation is made, and a program is stored without any modification, which greatly simplifies debugging (Chu, 1972).
E. Choosing the High-Level Language Most of the high-level languages being presently used have been designed to be easily translated into machine language of von Neumann machines. They use assignments [except LISP (McCarthy, 1960) and some very new assignment-free languages (Arsac, 1977; Ashcroft and Wadge, 197511, GO TO statements, and procedures considered as sequences of statements written separately to reduce program length, or even compiled separately to economize compile time. As far as other architectures are concerned, there is no reason why such consideration should remain while designing a high-level programming language. It is made for description of algorithm, not for compatibility with
a machine architecture. Therefore, it must take into account the new trends in programming.

III. NEW TRENDS IN PROGRAMMING

A. Empirical Programming
Dijkstra has been the first to insist on the poor state of the art in programming [Notes on structured programming (Dijkstra et d., 1972) first published in 1969 as a publication of the University of Maryland]. A symposium in Monterey (Goldberg, 1973)has discussed the high cost of software and gives poor programming techniques as the main reason for such a cost. The usual programming methodology may be characterized as follows: ( i ) Program design. The problem is analyzed and a project of program built, generally represented by flowcharts. Then a program is written. ( i i ) Program debugging. The program is compiled. Syntactic errors and maybe some very simple semantic errors are detected by compilers. They are corrected until finally the program is accepted by the compiler. The program is run on test data, and dumps are requested when bugs are detected at execution time. Some anomaly is localized in the dump and prevented by code modification. This process is iterated until a correct result is obtained for test data It does not prove that the program is correct (Dahl et al., 1972),but only that no more bugs can be detected from these data. Software unreliability is the direct consequence of this methodology. More structured or systematic programming (Wirth, 1973)must be used.
B. Top-Down Programming Instead of describing the whole program architecture by flowcharts, exhibiting very precisely what is only implementation detail (the G O TO of the program), its structure must be given in terms of global actions, which will be refined later. A number of implementation problems may thus be delayed. This is an important point. In early stages of the development, attention is paid only to the main actions, which can be checked carefully. Decisions of implementation are taken later, when a good knowledge of what has to be done has been collected. For instance, in the early stages, we shall speak of a stack, on which elements will be pushed or from which they may be popped. Later, when a sufficient knowledge of the structure of the working space will be obtained, it will be decided that the stack is represented as a vector, or a linked list of cells.
Top-down approach may be used with every language. It is just a discipline of programming. It is recommended even for FORTRAN programmers (Ledgard, 1975). In this case, it appears in intermediary stages of the programming process and will not be very apparent in the final program. A good programming language should facilitate top-down programming, allowing the programmer to describe its algorithm in terms of actions, whose refinements are given later. C . Control Structures
The GO TO statements have a strong negative effect on program readability (Dahl et al., 1972). They are used for three main reasons: (1) selecting sequences of statements according to the result of predicate evaluation, (2) making loops, (3) avoiding copies of parts of programs.
Clarity is gained by using a selection statement:
IF...
THEN...
ELSE...
FI
in the first case, and some loop statement in the second case. The third case is more ambiguous; if it corresponds to some loop, it will be better to use a loop. If not, there is a real danger that two sequences which have been merged by use of GO TO are not really identical, giving bugs difficult to detect and more difficult to prevent. A lot of loop statements have been proposed (Knuth, 1974; Ledgard and Marcotty, 1975) for GO TO elimination, and their ability to represent every flowchart has been widely discussed (a summary of these discussions has been given in Kosaraju, 1974). D. Program Manipulation
Manipulating a program is now considered a normal way to improve it (Knuth, 1974; Loveman, 1977; Standish et al., 1976). It is made by applying successive transformations that preserve program meaning. They may be divided into two classes: (i) Syntactic transforms. These preserve history of computation. They do not modify the sequence of assignments and test evaluations, and so the corresponding execution time. They may act on the overhead introduced by action interpretation in the system.
(ii) Semantic transforms. We consider only local semantic transforms, which depend on a local program property. For instance,
a(i) := x + 1 . a(j) := y
may be changed into
a(j) := y . a(i) := x + 1
if and only if i ≠ j.
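The condition can be checked mechanically; the following small Python test, an illustration only, confirms that the two orders of assignment give the same array exactly when i and j differ.

def order_1(a, i, j, x, y):
    b = list(a); b[i] = x + 1; b[j] = y
    return b

def order_2(a, i, j, x, y):
    b = list(a); b[j] = y; b[i] = x + 1
    return b

a = [0, 0, 0]
assert order_1(a, 0, 2, 5, 7) == order_2(a, 0, 2, 5, 7)   # i != j: same result
assert order_1(a, 1, 1, 5, 7) != order_2(a, 1, 1, 5, 7)   # i == j: results differ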
These transforms change the number of assignments or test evaluations, and so act on the execution time. The EXEL language described below provides a good system for program manipulation.

IV. PRINCIPLES OF EXEL

The system EDELWEISS is designed for the use of a single language named EXEL (which stands for Experimental Language). EXEL is described in Nolin and Ruggiu (1973) and Arsac (1974). Here will be given a presentation of its principles and some examples illustrating its capabilities.

A. Description of EXEL

EXEL is a GO-TO-less control structure language. It may describe any control structure, either flowchart type or recursive. In EXEL, three different hierarchical levels exist: formulas, actions, and procedures.
1. Formulas
A formula is a sequence of operands, operators, and assignments. A formula is not expressed in EXEL, but in any convenient computational language (BASIC, FORTRAN, APL, etc.). This language is called in EXEL terminology the formula language. In the general philosophy of EXEL, operands are structured data. Operators perform operations on these operands. These operations may be the usual arithmetic ones-addition, multiplication, logarithms, circular functic]ns, etc. They may also be operations related to the structure of the operand (transposition, global operations on arrays, tree scanning, etc.). In general, the set of available formula operators depends on the data structures that may be described in the formula language. EXEL control structures describe the order in which formulas are executed. The various possible choices of the formula language lead to as many different EXEL systems: EXEL/BASIC, EXEL/FORTRAN, EXEL/APL. A formula describes a sequence of operations leading to a result; in an
EXEL program, no control transfer may occur inside a formula. Thus GO TOs or IFs or similar branching or test instructions are not to be found inside a formula.
2. Actions In order to give a feeling of what an action is, one can say that an action is a portion of a flowchart having only one entry point and one exit point. Names are given to actions. Actions are sequences of formulas and calls to other actions or procedures, linked together by EXEL control structures. Before giving a proper definition of what an action is, it is necessary to introduce the EXEL operators. a. Composition. Iff and g are two formulas, the flowchart
is interpreted as: compute f, then compute g. This will be written in EXEL as
f . g
The EXEL "." is equivalent to the ALGOL ";".
b. Alternation. If t, f, g are three formulas, and if t evaluates to a Boolean value (true, false), the flowchart is interpreted as: compute t; if the result is true, compute f; else compute g. Proceed after this. This will be denoted in EXEL as
⟨t → f ◇ g⟩
Or, if one does not like symbols, IF t THEN f ELSE g FI Semantic extensions of the alternation operator are possible. For example, multiple alternations as
t must in this case evaluate to an integer i ; if 1 I i I n, computef;, else computef.. The first formula in an alternation ( t in the examples above) is called a test formula. c. Iteration Operator. Composition and alternation combined with action call have been shown to be sufficient to describe any program (Ruggiu, 1974). However, a redundant iteration operator has been added to EXEL, both for theoretical reasons and programming convenience. - are used to describe this operator. Three symbols “ {,” “},” and “EXIT” On occurrence of a “ },” control comes back just after the corresponding “ {.” This goes on until one EXIT is met. Then control is given after the “ 1.’ “ { ” and “ } ” may benested in a parenthesis structure, thus allowing definition of several levels of iteration. For this purpose, symbol EXIT is followed by a positive integer n ; EXIT n gets control out of n iteration levels. EXIT 1 is equivalent to EXIT. For abbreviation, EXIT n can be written as a”. EXIT n is defined for n 2 1. During program manipulations, operations may occur on this n (for example, operation + 1-see Section IV,C). It has been found a natural extension of this notation to allow also EXIT 0. This notation can be interpreted as a “do nothing” or an empty formula. (In some cases, the symbol R will be used as an EXIT 0 for sake of clarity). In program manipulations, EXIT 0 is treated exactly the same way as EXIT n, n 2 1. d. Action DeJnition. The three operations: composition, alternation, and iteration can be recursively composed, leading to sophisticated sentences:
f,
.c,+
O { f ~ . c , + E X I T O { f ~ .c Y + Of4.EXIT2fS.EXITl x } . f 6 I}
3
Such sentences are called actions; the formulas (f1, f2, ...) are atomic actions.
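The execution rule of these operators can be made concrete with a small Python model. In the sketch below the names Seq, Alt, Loop and Exit are inventions of this sketch (not EXEL syntax): formulas are Python callables acting on a shared environment, and EXIT n is modelled as an exception carrying the number of iteration levels still to be left.

class Exit(Exception):
    def __init__(self, levels=1):
        self.levels = levels

def Seq(*parts):                       # composition:  f . g
    def run(env):
        for part in parts:
            part(env)
    return run

def Alt(test, then_part, else_part):   # alternation:  IF t THEN f ELSE g FI
    def run(env):
        (then_part if test(env) else else_part)(env)
    return run

def Loop(body):                        # iteration:  { ... }
    def run(env):
        while True:
            try:
                body(env)
            except Exit as e:
                if e.levels > 1:
                    raise Exit(e.levels - 1)   # still leaving outer levels
                return
    return run

def EXIT(n=1):                         # EXIT n; EXIT 0 is the empty formula
    def run(env):
        if n > 0:
            raise Exit(n)
    return run

# Count env['i'] up to 5, then leave the loop.
program = Loop(Seq(
    lambda env: env.__setitem__('i', env['i'] + 1),
    Alt(lambda env: env['i'] >= 5, EXIT(1), Seq()),
))
env = {'i': 0}
program(env)
print(env['i'])   # 5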
3. Procedures
Actions are grouped in procedures. An EXEL procedure has three distinct parts:
a header. It defines a result list, a data argument list or data list, and the name of the procedure.
a body. It defines a set of actions with their names. The order of definition of the actions is irrelevant.
an entry point. It gives the name of the first action to be executed.
The Backus Normal Form (BNF) description of the procedures is as follows:
⟨header⟩ ::= ∇(⟨result list⟩)⟨procedure name⟩(⟨data list⟩)
The data list and the result list are lists of identifiers separated by commas. They are the formal parameters of the procedure. For example,
(X1,Y1,Z)FON(F1,A,X1,T)
This header defines the procedure FON, which has four data arguments (F1,A,X1,T) and three results (X1,Y1,Z). Data arguments can be either variable type or procedure type. Result arguments are only variable type.
⟨body⟩ ::= ⟨action⟩ | ⟨body⟩⟨action⟩
⟨action⟩ ::= ⟨action name⟩ ⊢ ⟨action body⟩ ⊣
⟨entry point⟩ ::= ⟨action name⟩∇
⟨procedure⟩ ::= ⟨header⟩⟨body⟩⟨entry point⟩
The ⟨action body⟩ is defined according to the rules given in the previous sections. At this point, two new atomic actions appear: the action calls and the procedure calls.
⟨atomic action⟩ ::= ⟨formula⟩ | ⟨action call⟩ | ⟨procedure call⟩
⟨action call⟩ ::= ⟨action name⟩
⟨procedure call⟩ ::= (⟨result list⟩)⟨procedure name⟩(⟨data list⟩)
The action calls define the control of the computations between the different actions of a same procedure. For example, let us consider the two actions:
A1 ⊢ ⟨t → g . A1 ◇ A2⟩ ⊣
A2 ⊢ h ⊣
A1 defines the following computation: while t is true, compute g; when t becomes false, compute A2. A2 means compute h.
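The same computation can be written directly in a conventional language; the short Python rendering below is given only for comparison, with t, g, and h as stand-ins for the formulas of A1 and A2.

def a1(env, t, g, h):
    while t(env):        # A1: while t is true, compute g
        g(env)
    h(env)               # then compute A2, i.e., h

env = {'i': 0}
a1(env,
   t=lambda e: e['i'] < 3,
   g=lambda e: e.__setitem__('i', e['i'] + 1),
   h=lambda e: print(e['i']))   # prints 3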
Likewise, the procedure calls define the control between the different procedures of a same program, a program being defined by a set of procedures. The execution of the program itself is defined by a procedure call which starts the computation. Actions look like parameterless procedures on global variables. ALGOL copy rule may be used. There is no renaming, all the variables being global to the actions of a same procedure. Thus, an action call may always be replaced by the sequence of statements defining it. This is exactly the mechanism of substitution in mathematics: a variable name is replaced by the expression associated to this name. This has interesting applications in the field of program transformation. It has been seen that actions know only global variables. It goes the opposite way at the procedure level. All variables in a procedure are local to this procedure. This prevents undesirable side effects and collisions of names which are usual in procedure-type languages. There is however one exception to this rule. In order to give EXEL the ability to handle files without having to duplicate them in memory, the user may define global variables. They must be declared before execution and their names are syntactically differentiated. In the currently implemented systems, file names start by the character “0.” They are the only global variables to be found in the programs.
B. Procedure Calls

1. Syntax
The procedure calls are atomic actions. They may be used as regular formulas or to compute an operand inside a formula. In this case, the special character " * " indicates which result is to be used.
A := A + (B,*,C)FON(G,A,C,D)
where FON is the procedure the header of which has been described before, stands for the sequence:
(B,VT,C)FON(G,A,C,D) . A := A + VT
VT is an auxiliary variable used for composition of the procedure call and the formula which follows it. Likewise, actual parameters may be formulas:
(A[3 + 1],B,D)FON(G,A + B,H(C),D)
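The effect of the starred result can be paraphrased in Python; in the sketch below fon is only a stand-in for the EXEL procedure FON, and the middle value it returns plays the role of the starred result.

def fon(g, a, c, d):            # stand-in with data arguments (G, A, C, D)
    return a + c, a * c, a - c  # three results

A = 2
B, VT, C = fon(1, A, 5, 9)      # bind all three results ...
A = A + VT                      # ... and use the starred one as an operand
print(A, B, C)                  # 12 7 -3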
2. Argument Transfer and Evaluation

a. Data. The computation rule of procedure calls is as follows. At the procedure call, a new context is defined by the global variables and the actual parameters. The point is that these actual parameters are evaluated only when they are needed during the execution of the called procedure. Therefore the context is recursively defined by the variables occurring in the actual parameters. This rule is consistent with the β-reduction of the λ-calculus; so it is correct, and like the delay rule (Vuillemin, 1974), it is generally optimal.

b. Results. Generally the results of procedures are array-structured variables. They can be indexed by expressions. These expressions are simultaneously evaluated: they are logically computed in parallel and in the context of the calling procedure before the call of the called procedure. There is no side effect nor ambiguity when the same variable occurs several times as a result of the call. For example, let the header of F1 be
(X,Y)F1(- - -)
Then the call
(A[1 2],A[2 3])F1(- - -)
defines A if the values of X and Y of F1 satisfy the relation
X[2] = Y[1]
In that case, after the call, A is a three-element vector, the value of which is
A = X[1], X[2], Y[2]
If X[2] ≠ Y[1], the call will generate an error.
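A small Python sketch may make the overlap rule concrete; f1 below is a stand-in for an EXEL procedure returning two vectors, and the helper checks the consistency condition X[2] = Y[1] before assembling A.

def f1():
    X = [10, 20]                 # first result
    Y = [20, 30]                 # second result
    return X, Y

def call_with_overlap(f):
    X, Y = f()
    if X[1] != Y[0]:             # X[2] != Y[1] in the 1-based notation of the text
        raise ValueError("inconsistent values in the overlapping element")
    return [X[0], X[1], Y[1]]    # A = X[1], X[2], Y[2]

print(call_with_overlap(f1))     # [10, 20, 30]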
3. Type Expression

In EXEL, the procedures and the variables are typed. These types define the functionality of the objects. An order relation is defined between the types: roughly, a procedure P1 has a type higher than that of P2 if P1 has more data arguments than P2 or if these arguments have types higher than the types of the corresponding arguments of P2. In EXEL a procedure call is meaningful if the types of the real parameters are higher than or equal to those of the formal parameters. This rule is more general than the usual rule, which states that the types must be equal. It is a very powerful mechanism, allowing one to write procedure expressions. When the type of a real parameter is strictly higher than that of the corresponding formal parameter, this is called a type extension.
Example. Let the procedure DER, which computes the derivative of functions of one variable, be
( )DER(F, x) ⊢ df(x)/dx ⊣
where F is the procedure which computes the function f. The procedure DER can be directly called to compute the partial derivatives of functions of two variables. Let g be a function of two arguments:
In the second call, ( )DER(g, b), there is a type extension since g is a function of two arguments, while f has one argument. As a matter of fact, this call defines an auxiliary procedure that computes the partial derivative ∂g(x, b)/∂y, the derivative in x of which is computed in the first call.
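The idea can be imitated numerically in Python: a derivative operator written for one-argument functions is reused for a partial derivative of a two-argument function by fixing the other argument. The central difference below is only a stand-in for the EXEL procedure DER.

def der(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def g(x, y):
    return x * x * y

# Partial derivative dg/dx at (2, 3): fix y = 3, then differentiate in x.
print(der(lambda x: g(x, 3.0), 2.0))   # about 12.0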
C. Example

An example of the development of an EXEL program is given below. First the problem is stated in English, and the actions to be taken are expressed informally. Then they are stepwise replaced by formal EXEL/ALGOL-like programming. The program is a sorting problem. Let a[1 : n] be a vector of n elements to be sorted in ascending order. The following program sorts it. Assertions are written between quotes.

Sort ⊢ "Only permutations have been made on a"
look for an i : a(i) > a(i + 1) ; reorder ⊣
Reorder ⊢ IF none THEN Ω ELSE swap(a(i),a(i + 1)) . Sort FI ⊣
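A direct Python paraphrase of this first, very abstract version (an illustration only) makes its intent plain: find an inversion, reorder, and repeat until none is left.

def sort(a):
    while True:
        inversions = [i for i in range(len(a) - 1) if a[i] > a[i + 1]]
        if not inversions:                    # "none"
            return a
        i = inversions[0]
        a[i], a[i + 1] = a[i + 1], a[i]       # swap(a(i), a(i+1))

print(sort([3, 1, 4, 1, 5]))   # [1, 1, 3, 4, 5]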
Reorder decreases the number of inversions in a (pairs of consecutive elements in reverse order) by one, so that the program stops. Ω is the empty formula. In this particular case, when control reaches Ω, it goes after the FI, and there is nothing else to do. Ω is said to be in final position (this term is more formally defined in Section IV,D,1). Thus, an Ω in final position acts as a STOP instruction. The program is made of two actions, Sort and Reorder, to be refined at the various stages of its development. Now, let us develop our Sort program a little more. The action look for "an i : a(i) > a(i + 1)" is not entirely defined. What i must be selected, if not unique? We decide to take the first one:

Sort ⊢ look for the first i : a(i) > a(i + 1) . "Sorted(1,j)" Reorder ⊣
Reorder ⊢ IF none THEN Ω ELSE swap(a(i),a(i + 1)) . Sort FI ⊣
+
Now,if i = 1, swap(a(i),a(i 1))gives a sorted part from 1 to 2 = i i = 1 =$ Sorted(1,i + 1). Assertions are put in Reorder: Reorder I-
“
+1
Sorted(1j)”
IF none THEN R ELSE swap(a(i),a(i
+ 1)).
“Sorted(1, IF i = 1 THEN i + 1 ELSE i - 1)”
Sort FI -I
Sort is replaced by its value in Reorder. Reorder I- IF none THEN R ELSE swap(a(i),a(i + 1)). “Sorted(1, IF i = 1 THEN i + 1 ELSE i - 1)’’
look for the first i : a(i) > a(i + 1). Reorder
FI -I We refine the action a little more: look for the first i -... It is made by starting from some initial value, then scanning a until an inversion is found. Look for the first i : a(i) > a(i + 1) I- initialize i ; scan a until a(i) > a(i + 1) -I
T H E EDELWEISS SYSTEM
215
The initialization is not the same in Sort and Reorder. In the action Sort, we do not have any information on a, hence the scan must be started from the beginning.
.
Sort I- i := 1 ; scan a until a(i) > a(i + 1) Reorder --I Reorder
I-
IF none THEN R ELSE swap(a(i),a(i+ 1)).
+ 1 ELSE i:=i - 1 FI. scan a until a(i) > a(i + 1) IF i = 1 THEN i:=i Reorder
FI -I In the action Reorder, we know that a is sorted from 1 to a given value, so that it does not have to be scanned for an inversion in this part. The same sequence occurring in Sort and Reorder, a new action is introduced: Sort
I-
Scan
I- scan
i:=1 ; Scan4
Reorder
I-
a until a(i) > a(i + 1). Reorder -I
IF none THEN R ELSE swap(a(i),a(i+ 1 ) ) .
IFi=lTHENi:=i+lELSEi:=i-lFI. scan FI + Now, the action scans a until a(i) > a(i + 1) is refined: Scan I- IF j = n THEN Reorder
ELSE IF a(i) > a(i + 1) THEN Reorder ELSE i:=i
FI FI --I
+ 1 . Scan
J. ARSAC ET AL.
216
The predicate " none " may be taken as i = n. Reorder is replaced by its value into Scan. IF i = n THEN R
Scan I- IF i = n THEN R
ELSE ... FI IF a(i) a(i + 1)
ELSE
THEN IF i = n THEN R ELSE swap(a(i),a(i + 1)). IF i = 1 THEN i:=i
ELSE i : = i
+1
-
1 FI
.
SCan
FI
ELSE
i := i
+ 1 . Scan
FI FI --I The value of inner predicates i = n is known: in the THEN alternate of the first selection, the value is TRUE. The selection is removed, and only the TRUE alternate is used. The same thing is done for the ELSE alternate.
Scan I- IF i = n THEN R ELSE IF a(i) > a(i
+ 1)
THEN swap(a(i),a(i + 1)). IF i = 1 THEN i:=i
+ 1 . Scan
ELSE i:=i - 1 . Scan FI
ELSE i : = i
+ 1 . Scan
FI FI -I
A new action Advance
I-
i:=i
+ 1 . Scan+
THE EDELWEISS SYSTEM
217
is introduced. The action scan being much too intricate, the test on i = n is set apart by introduction of a new action. So we have: Sort+.i:=l.Scan-r Scan
I-
Test
I-
IF i = n THEN SZ ELSE Test FI + IF a(i) > a(i + 1)
+
THEN swap(a(i),a(i 1)). IF i = 1 THEN Advance ELSE i:=i
-
1 . Scan
FI ELSE Advance FI + Advance I- i:=i + 1 . Scan+ We notice that Test is called only if i # n. After decrementation of i, this property remains true, so that in Test, Scan is called with i # n, and the predicate i = n of Scan has the value false. Thus, Scan may be replaced by its ELSE alternate in Test, giving: Sort+i:=l.Scan+ Scan I- IF i = n THEN SZ ELSE Test FI --I Test
I-
IF a(i) > a(i + 1)
THEN swap(a(i),a(i+ 1)).
IF i = 1 THEN Advance
.
ELSE i := i - 1 Test FI ELSE Advance FI + Advance I- i:=i
+ 1 . Scan+
D. Systems of Regular Program Equations
We will first try to give an informal presentation of the underlying idea. If we look at the three actions Sort, Scan, and Test of the previous example, we find the action names both in definition position and in the action body itself. If we consider these names as variables, we can think of the whole program as defining a system of equations, the variables of which are the
J. ARSAC ET AL.
218
action names. Note that one action, defined as the entry point, plays the role of the program itself. Now, if we give ourselves a set of algebraic rules to manipulate the various elements of these equations (formulas and EXEL symbols or keywords), we can eliminate the variables and end up with only one action which will be the program itself. Of course, dependingon the order in which we perform these substitutions, we may end up with various shapes for the program. But the idea is to do only transformations that keep the logical function of the programs. In order to be more precise, we will need some definitions.
1. Definitions and Scope of Application For the sake of clarity and to avoid being too technical, we will suppose that the program we start with does not have any iteration operators but only action calls. This will allow us to give only the definitions needed for understanding of the paper. Final position. The final positions of:
.
a b ” are those of “ b ” ‘ ‘ c t - t a l ()a, - - - ()a,
“
3
”
are those of “a,,” “a,,” ---, and “a,,.”
Also, the EXIT n occurring in p nested iterations are in final positions if and only if p In. Regular action. An action is regular if and only if all the action calls in its body are in final position. For example,
.
ACTl I- f 1 c t + ACTl 0 ACT2 3-1 is regular;
ACT3t-f2.ct-+ACTlOACT2~.f3-i is not regular. (fl,f2,f3, t are formulas; ACTi are action calls). Systems of regular equations. An equation is said to be regular if the correspondingaction is regular. Many programs may be defined by a system of regular program equations, as in the preceding example. Let it be schematized as Xi =A@,, ..., X,,Q),
i = 1,2, . . ., n
Solving this system is eliminating all action variables except the one which is distinguished as the program, i.e., the entry point. Substitution rules in systems of regular equations. There are different substitution rules, depending if the action name occurs in its definition or not.
THE EDELWEISS SYSTEM
219
Rule (a).If the action X i is such that X i does not occur inf, ,thenf, may be substituted for X i in all other equations. The equation X i =f, is then removed, giving a new equivalent system with one less equation. Rule (b). If X i occurs in f,, substitution will give new calls of X i . Hence this variable cannot be eliminated by a simple substitution. Let Xi = f , ( X i , Xj+i Q)
be such an equation. It has been proved (Arsac, 1977)that the following rule transforms this equation into an equivalent one: in J , X i is replaced by EXIT (0)
X j p i is replaced by the conventional notation X j + 1 R is also replaced by R
+1
then, the resulting formula is enclosed between iteration brackets:
X i = {J[Xi/EXIT (0),X j + i / X j + 1, R/Q
+ 11)
By repeated use of this rule, a variable X may occur in an equation as with an arbitrary positive increment k. Replacing X by X + 1 in X + k gives X + 1 + k = X + (k + 1). Now we must explain the notation X + n, i.e., what is this operation " + "? When substituting X i by J [rule (a)], the effect of + n is to transform all the EXIT p ( p 2 0) which are in final position, into EXIT (p + n). An R in final position being equivalent to an EXIT (0), it becomes EXIT n. Rule (b) and " + " operation allow replacement of the equation
X
+k
Xi = f , ( X i , Xj+i
9
a)
by a new equivalent equation without occurrence of X i . After this, X i may be eliminated by substitution. Using this rule, the whole system may be replaced by a unique equation X I = 4 which is the wanted program. And, which is quite interesting, both rules and the associated " + " operation can be performed in an automatic way, i.e., by a program. 2. Example This method is applied to the system obtained in Section IV,B: Sort+ i : = l . S c a n +
Scan I- IF i = n THEN R ELSE Test FI 4
J. ARSAC ET AL.
220
+
Test I- IF a(i) > a(i + 1) THEN swap(a(i),a(i 1)). IF i = 1 THEN Advance ELSE i :=i
-
.
1 Test
FI ELSE Advance FI -I Advance I- i :=i
+ 1 .Scan
-I
Test occurs in the equation defining Test. The right-hand formula is enclosed between loop brackets, Advance is replaced by Advance + 1, Test by EXIT (0) [rule (b)]: Test
I-
+
{IF a(i) > a(i + 1) THEN swap(a(i),a(i 1)). IF i = 1 THEN Advance + 1
.
ELSE i :=i - 1 EXIT (0) FI ELSE Advance
+1
FI} + Advance is replaced by its value [rule (a)]. By doing so, Advance + 1 is changed into i :=i 1 Scan 1 Then Test is copied into Scan [rule (a)].
+
.
+ .
Scan I- IF i = n THEN R ELSE
{IF a(i)
> a(i + 1) THEN swap(a(i),a(i+ 1)). IFi=1 THEN i :=i + 1 . Scan + 1
.
ELSE i :=i - 1 EXIT (0) FI ELSEi:=i+l.Scan+l FI)
221
THE EDELWEISS SYSTEM
There are occurrences of Scan in the right-hand member, so rule (b) applies. The body of Scan is enclosed between loop brackets; Scan is replaced by EXIT (0), and Scan + 1 leads to EXIT (0) + 1 = EXIT 1 = EXIT. Rule (b) states that only the EXIT n in final position are incremented. The R in the first line is in final position, hence it becomes EXIT. The EXIT (0) in the ELSE part of the innermost IF is not in final position, so it is not incremented. Being equivalent to an empty formula, it is simply dropped; EXIT (0) is removed from the result:

    Sort ⊢ i := 1 .
           {IF i = n THEN EXIT
            ELSE {IF a(i) > a(i + 1)
                  THEN swap(a(i), a(i + 1)) .
                       IF i = 1 THEN i := i + 1 . EXIT ELSE i := i − 1 FI
                  ELSE i := i + 1 . EXIT
                  FI}
            FI} ⊣

So we have a program Sort, made of two nested loops, that has been mechanically derived from the system of program equations. It may be made a little clearer by simple syntactic transforms (Arsac, 1977; Cousineau, 1977a). They are given here without formalism or justification, being considered as "obvious." Instead of doing i := i + 1 and then going out of the loop, the loop is first exited, then i := i + 1 is performed.
    Sort ⊢ i := 1 .
           {IF i = n THEN EXIT FI .
            {IF a(i) ≤ a(i + 1) THEN EXIT FI .
             swap(a(i), a(i + 1)) .
             IF i = 1 THEN EXIT ELSE i := i − 1 FI} .
            i := i + 1} ⊣
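To check the result, the derived program can be transcribed directly into Python (a rendering of ours, not part of the original text); the two nested while loops mirror the two nested iteration brackets, with break playing the role of EXIT:

    def sort_exel(a):
        """Control flow of the derived Sort; a is modified in place.
        Indices are kept 1-based as in the text (a(i) is a[i-1] here)."""
        n = len(a)
        if n == 0:
            return a
        i = 1
        while True:                       # outer iteration
            if i == n:                    # IF i = n THEN EXIT FI
                break
            while True:                   # inner iteration
                if a[i - 1] <= a[i]:      # IF a(i) <= a(i+1) THEN EXIT FI
                    break
                a[i - 1], a[i] = a[i], a[i - 1]   # swap(a(i), a(i+1))
                if i == 1:                # IF i = 1 THEN EXIT ELSE i := i - 1 FI
                    break
                i -= 1
            i += 1                        # i := i + 1
        return a

    print(sort_exel([3, 1, 2, 5, 4]))     # [1, 2, 3, 4, 5]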
This program may be greatly improved. k being the value of i before entering the inner loop, it may be proved that a is sorted from 1 to k + 1 after this loop, so that the last statement may be changed into i := k + 1.

    Sort ⊢ i := 1 .
           {IF i = n THEN EXIT FI .
            k := i .
            {IF a(i) ≤ a(i + 1) THEN EXIT FI .
             swap(a(i), a(i + 1)) .
             IF i = 1 THEN EXIT ELSE i := i − 1 FI} .
            i := k + 1} ⊣

E. Programming Methodology
This example illustrates the EXEL programming methodology. A first version of a program is written in terms of actions, in a top-down way, without any consideration of efficiency. The only important point is correctness. Then the program is carefully examined for possible simplification or for detection of sources of inefficiency. Some kind of algebraic simplification is made on the system of program equations. The program may be kept as a system of equations and executed by the EDELWEISS system. The system of equations may also be solved, giving a structured program. It will be more efficiently performed by the system (action calls introduce some overhead), and other sources of inefficiency are possibly more easily detected on such a form. We have been able to extend the solution to some systems of program equations which are not regular, i.e., in which action calls do not have to be in final positions. In these cases, it is a very powerful way to transform recursive procedures into iterative ones (Arsac, 1977).

A lot of experiments have been made with this programming methodology. It appears to be very powerful, simplifying the programmer's task by separation of concerns: (1) write a correct program; (2) improve it, not only for better efficiency, but for even greater simplicity or readability. It is now used in programming lectures, even for FORTRAN programmers. It is very easy to transform an EXEL program into a FORTRAN one by "hand compiling."
F. Is EXEL a GO-TO-less Language?
It has been said that EXIT statements are some kind of branch statements, and so they reintroduce GO TOs. This could be said of every loop statement, which realizes a branch from the end of the iterand to its beginning, and out of the loop according to the value of some predicate. Using top-down programming, only one loop must be written at a time. If at some point of the loop the final assertion (i.e., the goal of the loop) is reached, then an EXIT statement is written. It is not at all a GO TO to some point, but just the indication that the loop is completed. EXIT (p) statements, with p > 1, should not be written directly by the programmer. They appear during program manipulation. In this way, the programmer is not faced with the question: how many loops must be exited?

More interesting is the relation between action names (or calls) and GO TOs. Actions have been said to be like parameterless procedures on global variables. This has been proposed by Floyd and Knuth (1971) as a way to avoid GO TO statements. We may also consider action names as labels. An equation

    Xi = fi(X1, ..., Xn, R)

is then a sequence of statements labeled Xi. Right-hand occurrences of action names are interpreted as statements GO TO Xj, GO TO R being the exit of the program. Conversely, a GO TO program may be replaced by a system of regular program equations, then solved into an EXEL program (Nolin and Ruggiu, 1973; Arsac, 1977). This is the simplest way to structure a FORTRAN program (Baker, 1977; Urschler, 1973).
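The correspondence between action names and labels can be made concrete with a small Python sketch (an illustration of ours; the summation program used here is not from the text). Each action is a parameterless function on global variables that returns the name of the next action, R meaning exit; the driver loop plays the role of the GO TOs:

    state = {"i": 1, "n": 5, "s": 0}

    def scan():                       # Scan    <- IF i = n THEN R ELSE Body FI
        return "R" if state["i"] == state["n"] else "body"

    def body():                       # Body    <- s := s + i . Advance
        state["s"] += state["i"]
        return "advance"

    def advance():                    # Advance <- i := i + 1 . Scan
        state["i"] += 1
        return "scan"

    actions = {"scan": scan, "body": body, "advance": advance}

    def run(entry):
        """Interpret the system: right-hand action names act as GO TOs."""
        label = entry
        while label != "R":
            label = actions[label]()

    run("scan")
    print(state["s"])                 # 1 + 2 + 3 + 4 = 10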
V. LARGE-SCALE SYSTEMS: THE EDELWEISS ARCHITECTURE

A. Description of EDELWEISS

1. Principle of Operating

EDELWEISS has been designed according to the structure of EXEL. EXEL programs are wholly parenthesized structures built on parenthesized statements:

    selection   IF ... THEN ... ELSE ... FI
    iteration   { ... EXIT ... }

Using the three hierarchical levels, an EXEL program may be split into segments very easily. For example, every action may be a segment. Because of its parenthesized structure, it may be represented by a tree, and every node of the tree is the root of a subtree, thus a possible segment. That is to say, every sequence of statements in a program is a segment of the program. Thus, an EXEL program is easily split into parts which can be loaded separately in the central memory, with a good probability of a long enough lifetime. The lifetime of working sets is thus increased, the rate of page faults is reduced, and the performance of systems based on virtual memory is increased.

The choice of efficient data structures can also greatly improve the lifetime of the working sets. The presentation and discussion of EXEL in Section IV does not assume anything about data structures. Thus their choice is open, and it will usually depend on the chosen formula language. A good example is the APL array. Such structures used in an EXEL-APL system (Section V,B) give a long execution time for each segment as well as useful information on the way they are used; in particular, sequential processing allows efficient prediction of the segments of data to be loaded if data segmentation is wished.
FIG. 1. The EDELWEISS system.

EDELWEISS is a large-scale multiuser, multiprocessor machine (Fig. 1). Each processor performs a special task. Processors work asynchronously according to a producer-consumer scheme. In Fig. 1, only the data paths have been represented. Three processors are associated with the three logical levels of EXEL:

    SCRIBE processes procedure calls.
    GREFFIER interprets control structures in action bodies.
    ROBOT executes the formulas.

Two classical functions are performed by two other processors:

    SOUTIER is the memory manager.
    HUISSIER is in charge of I/O's, user communication and programs, and associates to procedures the information needed by the processors during execution (cf. Section V,A,3): control structure for GREFFIER, descriptors and complexity coefficients for INTENDANT, data access functions for SOUTIER.

A sixth processor, INTENDANT, performs program and data segmentation, adjusting online the data flow to ROBOT's resources. Roughly, the execution of a program goes as follows:

(i) HUISSIER receives an execution request. It determines the first procedure to be executed and asks SCRIBE to perform the corresponding call. It sends to GREFFIER the control structure to be executed.
(ii) SCRIBE asks GREFFIER for control structure execution.
(iii) GREFFIER, during this execution, sends to INTENDANT the sequence of formulas to be executed. This sequence of formulas is a program segment.
(iv) INTENDANT adapts segments to ROBOT's resources and defines a fragment.
(v) SOUTIER creates the working set and the memory requirements necessary for the execution of this fragment.
(vi) ROBOT executes the fragment, sends the results to SOUTIER and the control information (if any) to GREFFIER (results of test formula evaluation for conditional branching).

There can be several ROBOTs working together, independently. The main properties of this architecture are the following:

(i) Multiprocessing without conflicts between processors, because they are specialized in independent functions.
(ii) Natural multiprogramming, because changing contexts between the various segments is automatic.
(iii) Resource allocation through a prevision of program behavior based on their structure.
(iv) The communications between processors are based on a producer-consumer scheme: they need only communicate through FIFO files. The length of these files is determined by an equilibrium condition between the multiprogramming rate and the service rate of each processor. A mathematical study of this point is presented in Section V,C.
2. Management of EDELWEISS
The multiprogramming, in the classical architecture, is defined by a time device which controls the multiplexing between the user programs. So the system must be able to handle the successive states of programs during their execution. Referring to EXEL, we can distinguish four states: initialization (which starts execution), procedure, action, and formula (see Fig. 2).
FIG. 2. States of different programs.
At one time the processor contains different programs which are in different states; for example, it may contain the programs circled in Fig. 2. To multiplex the processor, the transitions between the different states must be executed by a special program, generally called the supervisor; this latter program requires that the processor be in a special state, the master state. The states of the user programs are called slave states. Some instructions are available only in the master state. This multiplexing method is the main reason why the efficiency of the multiprogramming is limited.

FIG. 3. Program states and the processors of EDELWEISS.

In EDELWEISS the multiprogramming is defined through a logical segmentation of programs, based on the EXEL structure. So each state of a user program is taken into account by some processor, as in Fig. 3. We can thus say that each processor works in monoprogramming, the multiprogramming being spread among the different processors. Each program is split into segments, and a segment defines a task for the corresponding processor. The processors communicate between them through waiting queues (Fig. 4).
FIG. 4. Processor communications.
The general processing of a processor is:

    Repeat forever
        Get a segment from an input queue.
        Complete the execution of the task defined by this segment.
        Put the resulting segment into the output queue of the corresponding processor.

So the processors are not multiplexed between the different tasks. In the same way, there is no supervisor or basic monitor which controls the synchronization of concurrent processes. Instead, the management of the system is distributed in each processor, at the level at which the synchronization is reduced to the producer-consumer rule (a sketch of such a processor loop is given below).

Each processor has its local data. A data set, which constitutes the common data of the system, is shared by the processors. It consists of:

    the set of waiting queues,
    the internal representation of the user programs.

The latter has a tree structure described by the diagram in Fig. 5.
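A minimal sketch of this supervisor-free scheme, written in Python with threads and FIFO queues (an illustration of ours, not the EDELWEISS implementation), is:

    import queue, threading

    def processor(work, inq, outq):
        """Repeat forever: get a segment, execute the task, pass the result on."""
        while True:
            segment = inq.get()          # blocks: the producer-consumer rule
            if segment is None:          # conventional stop marker for the demo
                outq.put(None)
                return
            outq.put(work(segment))

    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    # Two toy stages standing for, e.g., INTENDANT and ROBOT.
    threading.Thread(target=processor, args=(lambda s: s + ["cut"], q1, q2)).start()
    threading.Thread(target=processor, args=(lambda s: s + ["done"], q2, q3)).start()

    q1.put(["segment-A"]); q1.put(["segment-B"]); q1.put(None)
    while (result := q3.get()) is not None:
        print(result)                    # ['segment-A', 'cut', 'done'], ...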
FIG. 5. System data structure.
The tree ends with the sequence of procedures called during execution of the program. Of course this sequence increases at each procedure call and decreases at each procedure return. It is empty when the program is finished. The information associated with programs is defined by the following:

    Fi = the table of files used by the user Ui,
    V1 = the table of local variables used by the procedure P1,
    ...
    Vn = the table of local variables used by the procedure Pn.

As a procedure knows only its local variables and the files, the data of a segment are completely defined by the user (Ui), its file table (Fj), and the local variables of the running procedure (Pk). So there is no difference between a procedure call and the multiprogramming: in the first case Pk is changed; in the second it is Ui. Therefore the segments have the following structure:

    Uᵢ₁ Fⱼ₁ Vₖ₁ s₁¹ s₂¹ ⋯  Uᵢ₂ Fⱼ₂ Vₖ₂ s₁² s₂² ⋯  Uᵢ₃ ⋯
Uᵢ₁, Fⱼ₁, Vₖ₁ define the context of s₁¹, s₂¹, ..., etc. Each segment contains the name of the statement, the list of operators occurring in the statement, the names of the data occurring in the statement, and the descriptors of these data.
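As an illustration (the field names are our own, chosen to follow the description above, not the authors' data layout), a segment and its statements could be rendered as Python dataclasses:

    from dataclasses import dataclass, field

    @dataclass
    class Statement:
        name: str                    # name of the statement
        operators: list              # operators occurring in the statement
        data_names: list             # names of the data it uses
        descriptors: dict            # descriptors of these data

    @dataclass
    class Segment:
        user: str                    # Ui -- the user
        file_table: str              # Fj -- the user's file table
        local_vars: str              # Vk -- locals of the running procedure
        statements: list = field(default_factory=list)

    seg = Segment("U1", "F1", "V3",
                  [Statement("s1", [":=", "+"], ["a", "b"],
                             {"a": "vector", "b": "vector"})])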
3. Segmentation and Fragmentation

a. The Function of GREFFIER. The main difficulty in implementing EDELWEISS is the segmentation performed by GREFFIER to feed ROBOT. Roughly, the segmentation points of GREFFIER for ROBOT are the tests. For example, suppose that the control structure of an action of the procedure Pk of the user Ui, the file table of which is Fi, is as follows:

    a . b . {c . IF t THEN e . EXIT ELSE f FI . g} . h

GREFFIER generates the segment

    Ui Fi Vk a b c t ...

The test t defines the end of the segment for Ui. GREFFIER will then execute the program of user Uj. When it gets the value of t, it will be able to resume the execution of the program of Ui: if t is true, it will generate

    Ui Fi Vk e h ...

if t is false, it will generate

    Ui Fi Vk f g c t Uk ...

In the same way, the procedure calls and returns are segmentation points, but action calls are not. This logical segmentation method presents a new characteristic with regard to the usual ones: it relies upon a previsional behavior of programs. This anticipation is deduced from the EXEL structure. For the ROBOT memory, everything happens as if the replacement algorithm were based not upon the past behavior, but upon the future pattern of references. For example, the segment

    Ui Fi Vk f g c t

determines the next references which will happen after the test t, when it is false. But this prevision is limited to the segmentation points. So this algorithm comes close to the ideal one, which would replace the pages that will remain unreferenced for the longest period, if only they could be known.
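The cutting of this control structure at its tests can be simulated in a few lines of Python (a sketch of ours based on the example above, not the GREFFIER code):

    def run(test_values):
        """test_values: successive boolean results of t; returns the segments
        produced for  a . b . {c . IF t THEN e . EXIT ELSE f FI . g} . h."""
        values = iter(test_values)
        segments, current = [], []

        def emit(*formulas):
            current.extend(formulas)

        def cut():
            segments.append(current[:])
            current.clear()

        emit("a", "b")
        while True:
            emit("c", "t")
            cut()                       # the test t closes the segment
            if next(values):            # t true: e, then leave the loop
                emit("e")
                break
            emit("f", "g")              # t false: f, g, and iterate
        emit("h")
        cut()
        return segments

    print(run([False, True]))
    # [['a', 'b', 'c', 't'], ['f', 'g', 'c', 't'], ['e', 'h']]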
This leads to a new type of working set, which is described in Section V,B. The classical working sets are based on the past references of programs. This policy is good if these references are strongly correlated; but generally, for large programs, this is not the case. The EDELWEISS working set is defined by a logical analysis of the future behavior of programs; it is no longer probabilistic, but deterministic. Statistical evaluations on the execution of APL programs have shown (cf. Section V,B,4) that this working set is 10-100 times better than the classical ones. This makes it possible to use as the main memory of EDELWEISS a very large and slow memory, such as a bubble memory, and for the ROBOT local memory a small and fast memory; such a main memory would be of the order of 10⁷ bytes and the ROBOT memory of the order of 10⁵ bytes.

We have not built the EDELWEISS machine. Instead we have built a small single-user machine, EXELETTE, which is described in Section VI. The working of EXELETTE is thus simplified, but the memory management is based on the future behavior of programs, according to the EDELWEISS principles. So the main memory of EXELETTE can be a floppy disk (of 256 K bytes) and the memory of the microprocessor which stands for ROBOT is about 12 K bytes. EXELETTE is not yet finished, but the first results of this management are very encouraging.

b. The Function of INTENDANT. GREFFIER generates segments for INTENDANT. From these segments, INTENDANT creates the fragments, which fit, in time and size, the ROBOT resources. For that purpose, every object belonging to a segment (data and operators) is described by two complexity coefficients, one for the time, the other for the size. By composition, INTENDANT determines the complexity χₛ of each segment:

    χₛ = (χₛᵗ, χₛᵐ)

where χₛᵗ is the time complexity of the segment and χₛᵐ is its memory complexity. These complexities are computed according to the semantics of the operators occurring in the statements of the segment. For example, the complexities of the APL statement

    s:  D ← A + B + C

will be

    χₛᵗ = (χᵗ(←) + 2χᵗ(+)) χᵐ(A)
    χₛᵐ = 6 χᵐ(A)

where χᵗ(←) and χᵗ(+) are the complexities of APL assignment and addition and χᵐ(A) is the size of the array A; χₛᵐ = 6χᵐ(A) since there are four arrays (A, B, C, D) and two intermediary arrays (for the two additions), the size of all of them being the same (and equal to the size of A).

This complexity is compared to the two parameters of the system: the size of the local memory of ROBOT and the time slice allocated to a fragment. These parameters determine the power χ₀ of ROBOT. If χₛ < χ₀, then INTENDANT groups consecutive segments together until it gets a fragment whose complexity, equal to the sum of the complexities of the segments belonging to it, is greater than χ₀. If χₛ = χ₀, the segment itself is the fragment. If χₛ > χ₀, then INTENDANT cuts the segment into as many fragments as necessary in order that the complexity of each of them be less than χ₀.
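The grouping and cutting rule can be sketched as follows (a Python illustration of ours; the scalar complexities and the names are assumptions, whereas the real INTENDANT works on the pair (χₛᵗ, χₛᵐ)):

    def fragments(segments, chi0):
        """segments: list of (name, complexity).  Returns fragments whose
        parts fit under chi0: small segments are grouped, big ones are cut."""
        out, group, total = [], [], 0
        for name, chi in segments:
            if chi > chi0:                       # cut an oversized segment
                if group:                        # flush what was grouped so far
                    out.append(group); group, total = [], 0
                parts = -(-chi // chi0)          # number of pieces (ceiling)
                out.extend([[(f"{name}/{j + 1}", chi / parts)]
                            for j in range(parts)])
            else:                                # group small segments together
                group.append((name, chi)); total += chi
                if total >= chi0:                # the fragment is big enough
                    out.append(group); group, total = [], 0
        if group:
            out.append(group)
        return out

    print(fragments([("s1", 2), ("s2", 3), ("s3", 15)], chi0=5))
    # [[('s1', 2), ('s2', 3)], [('s3/1', 5.0)], [('s3/2', 5.0)], [('s3/3', 5.0)]]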
Let us suppose, in the preceding example, that χₛ is about three times χ₀. Then INTENDANT would cut the segment s into three fragments:

    (D)₁³ ← (A)₁³ + (B)₁³ + (C)₁³
    (D)₂³ ← (A)₂³ + (B)₂³ + (C)₂³
    (D)₃³ ← (A)₃³ + (B)₃³ + (C)₃³

where (X)ⱼⁱ designates the jth part of X when this one is cut into i parts.

c. The Function of SOUTIER. SOUTIER feeds ROBOT with the fragments generated by INTENDANT. However, the data belonging to a same fragment may be scattered in the main memory, which by hypothesis is a slow, sequential-access memory, like a disk. The essential function of SOUTIER is the dynamic reorganization of the memory to lower the transfer time of the data between the ROBOT local memory and the main memory. For that purpose SOUTIER stores together, as much as possible, in contiguous blocks of the main memory, the data produced by ROBOT (after execution of the last fragment) which will be used by the following segments of the same procedure. SOUTIER settles the implantation of data from a proximity index associated to each variable of the segment. The corresponding algorithm has been described in Widory and Roucairol (1977). Furthermore, for large arrays, SOUTIER can decide to access them by columns, or rows, or planes, etc., according to the computations defined in the segment. This is important when a segment has to be cut into several fragments. For example, let us suppose that the APL formula

    C ← (⌽A) + B + A

must be cut into three fragments.
In this formula, A, B, and C are vectors. The three fragments will be
    α = (C)₁³ ← (⌽A)₁³ + (B)₁³ + (A)₁³
    β = (C)₂³ ← (⌽A)₂³ + (B)₂³ + (A)₂³
    γ = (C)₃³ ← (⌽A)₃³ + (B)₃³ + (A)₃³
But the APL operator ⌽ reverses the order of the components, so that the data fragments associated with α, β, γ that SOUTIER will get for ROBOT will be

    α: (A)₃³, (B)₁³, (A)₁³
    β: (A)₂³, (B)₂³
    γ: (A)₁³, (B)₃³, (A)₃³
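A small check of these data dependencies (our own illustration, using plain Python lists rather than APL) shows that computing the first third of C ← (⌽A) + B + A indeed needs the last third of A, because of the reversal, plus the first thirds of B and A:

    def third(x, j, n=3):
        """j-th of n equal parts of the list x (j = 1..n); len(x) divisible by n."""
        p = len(x) // n
        return x[(j - 1) * p : j * p]

    A = list(range(1, 10))          # 1..9
    B = [10 * v for v in A]

    C_full = [ra + b + a for ra, b, a in zip(reversed(A), B, A)]

    # first third of C, rebuilt from (A)3, (B)1, (A)1 only:
    C1 = [ra + b + a for ra, b, a in
          zip(reversed(third(A, 3)), third(B, 1), third(A, 1))]

    assert C1 == third(C_full, 1)
    print(C1)                        # [20, 30, 40]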
B. The Working Set in the EDELWEISS System

1. Introduction
The working set is a parameter that characterizes the locality properties of the working space needed for executing a program (or process). This working space is defined by two segments: the program segment SP(t), which contains the statement to execute at time t, and the data segment SD(t), which contains the values necessary for executing the statement. These segments are defined by a logical segmentation. It is necessary to map these logical segments on physical segments of equal length, called page frames, so that the logical segments are described in units of the same length, called pages. Therefore the segment S(t) = SP(t) ∪ SD(t) is defined by the pages it contains. Let us designate by |ψ(t)| the number of pages contained in ψ(t), the set of pages of S(t). During program execution, the machine must access information contained in the page set ψ(t). These accesses are characterized by the name rt of the page to which the information belongs: rt is called a reference. If the set of all the pages is numbered,

    N = {1, 2, ..., n},   |N| = n

these references are numbers rt ∈ N and ψ(t) is a subset of N:

    ψ(t) ⊆ N,   |ψ(t)| ≤ n
The main problem is therefore to build a segmentation algorithm such that the sequence of segments S(t) takes into account the behavior of programs or, at least, of a class of programs. The difficulties come from: (a) a too large dispersion of the data related to program activity in a computer system, and the paucity of statistical measures that are convincing; (b) the poor structure of usual programming languages, which prevents analysis of this behavior at compile time, or even at run time. In the EDELWEISS system, it has been shown that cause (b) is in great part overcome, thanks to the good structure that EXEL gives to programs (cf. Section V,A,3). Cause (a) is very much weakened, for it will be shown (cf. Section V,B,4) that it is enough to meet some threshold conditions; and if APL is chosen as formula language, these conditions will be satisfied.

In the classical systems, one could think that the difficulties can be surmounted by asking programmers to give sufficient information to determine the working set. As a matter of fact, this method is impracticable for many reasons: pieces of programs can be written by different persons on different machines; the programmers may not know how to use the resources of the system optimally; and last, the system must optimize the resources for the whole set of users, and this global optimum is not necessarily the sum of the local optimums of each program. On the contrary, the EDELWEISS system is able to optimize the management of resources as described in Section V,A,3. Therefore, the working set in EDELWEISS is very different from classical working sets.

2. The Working Set in the Classical System

a. Definitions and Hypothesis. In a classical system, it is very difficult to make previsions. A retarded working set is used, i.e., one determined by the recent past. It is based on the fact that the frequency with which a page is referenced changes slowly with time (quasi-stationarity hypothesis); besides, neighboring references are strongly correlated, but as their distance increases they become asymptotically uncorrelated. Generally, to explain this, it is noted that neighboring references usually belong to the same module of a program, thus to neighboring program and data segments. Besides, it is usually considered that compilation acts as a low-pass filter which eliminates large variations in the addresses of the referenced information. It is worth remarking that these arguments are related to the method of programming and to the properties of languages and compilers. The classical working set is determined by a parameter T (called the window size), which is a processing time interval and can be supposed identical to real time. This time is discrete: t1, t2, ... at regular intervals.
The retarded working set (RWS) is then the set of distinct pages referenced in the time interval [t − T, t]. Execution of a program gives the sequence ρ of references

    ρ = r1, r2, ..., rt, ...

in the set of pages N = {1, 2, ..., n}. The working set at time t, W(t, T), is then

    W(t, T) = {i | i = r_{t−x}, 0 ≤ x ≤ T − 1}

The instantaneous segment is this working set:

    ψ(t) = W(t, T)

Its size is the number of pages in this segment:

    w(t, T) = |W(t, T)|
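A direct transcription of this definition (a sketch of ours, using a 0-indexed Python list for the reference string) is:

    def working_set(refs, t, T):
        """W(t, T): the set of distinct pages among the last T references."""
        start = max(0, t - T)
        return set(refs[start:t])

    refs = [1, 2, 1, 3, 2, 2, 4, 1]    # a toy reference string over pages 1..4
    t, T = 6, 4
    W = working_set(refs, t, T)
    print(W, len(W))                    # {1, 2, 3} 3  ->  w(t, T) = |W(t, T)|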
Three hypotheses are assumed (Denning and Schwartz, 1972):
    H1: the sequence ρ is unending;
    H2: rt is a stationary stochastic variable, i.e., it is independent of the absolute time origin;
    H3: ∀t > 0, rt and rt+x become uncorrelated as x → ∞.
From Denning and Schwartz (1972), we recall here some properties useful for the following.

b. Average Size. Let

    s(k, T) = (1/k) Σ_{t=1}^{k} w(t, T)

denote the working-set size averaged over the first k references. H2 allows us to state

    s(T) = lim_{k→∞} s(k, T)

Generally this average size depends on T; it is an increasing function:

    T1 ≤ T2 ⇒ s(T1) ≤ s(T2)

Besides, we have the inequalities

    1 = s(1) ≤ s(T) ≤ min{n, T}
Under hypotheses H1, H2, and H3, it can be proved that the average size is equal to the stochastic average, i.e.,

    s(T) = Σ_{i=1}^{n} i p(i)

where p(i) is the probability that the working-set size is i.

c. Missing-Page Rate. A page fault occurs when at time t + 1 a reference is made to a page that does not belong to W(t, T). It is thus described by the binary variable Δ:

    Δ(t, T) = 1 if r_{t+1} ∉ W(t, T),   Δ(t, T) = 0 otherwise.
If Δ = 1, there is one page fault. Thus the fundamental assumption in the theory of the working set, i.e., the quasi-stationarity of references to the same page, is equivalent to the assumption that the probability of Δ = 1 is small:

    α = Pr(Δ(t, T) = 1) ≪ 1     (1)

But Δ(t, 0) = 1 and

    T1 ≤ T2 ⇒ Δ(t, T1) ≥ Δ(t, T2)

When T → ∞, Pr(Δ(t, T) = 1) → 0. Hence, to satisfy Eq. (1), we generally need to take T large enough, which may lead to a too large average size. Some trade-off must be found. The missing-page rate is defined by

    m(T) = lim_{k→∞} (1/k) Σ_{t=0}^{k−1} Δ(t, T)
It characterizes the average size variation:

    w(t + 1, T + 1) = w(t, T) + Δ(t, T)

By summing and taking the limit, we get

    m(T) = s(T + 1) − s(T)

and, by induction,

    s(T) = Σ_{x=0}^{T−1} m(x)
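These relations are easy to check numerically; the following sketch (ours) computes the empirical average size and missing-page rate on a random reference string:

    import random

    def w(refs, t, T):
        """Size of the retarded working set W(t, T) for a 0-indexed list refs."""
        return len(set(refs[max(0, t - T):t]))

    def s(refs, T):
        """Average working-set size over the whole string (finite-k estimate)."""
        k = len(refs)
        return sum(w(refs, t, T) for t in range(1, k + 1)) / k

    def m(refs, T):
        """Empirical missing-page rate: fraction of references not in W(t, T)."""
        k = len(refs)
        return sum(refs[t] not in set(refs[max(0, t - T):t]) for t in range(k)) / k

    random.seed(0)
    refs = [random.randint(1, 8) for _ in range(5000)]
    for T in (1, 2, 4, 8):
        print(T, round(s(refs, T + 1) - s(refs, T), 3), round(m(refs, T), 3))
    # the two columns nearly coincide (they differ by a boundary term of order 1/k)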
d. Interreference Interval. Suppose that two successive references to page i occur at times t − xi and t. We call xi the interreference interval at time t for page i. Let Fi(x) be the distribution function of the variable xi and fi(x) its probability density:

    fi(x) = Fi(x) − Fi(x − 1)

The relative frequency of references to page i is

    λi = lim_{k→∞} (1/k) yi

where yi is the number of references to page i in the sequence r1, ..., rk. Note that

    Σ_{i=1}^{n} λi = 1

The overall density and distribution functions are defined, respectively, to be

    f(x) = Σ_{i=1}^{n} λi fi(x)   and   F(x) = Σ_{i=1}^{n} λi Fi(x)

The mean interreference intervals are

    x̄i = Σ_{x>0} x fi(x)   and   x̄ = Σ_{i=1}^{n} λi x̄i
Page i will be called recurrent if it is referenced an infinite number of times in ρ. The stationarity hypothesis shows that in this case λi ≠ 0 and x̄i = 1/λi. If a page is nonrecurrent, then λi = 0. Let NR = {j1, ..., jnR} be the set of recurrent pages, and nR = |NR| their number. It can be shown that x̄ = nR. It follows that nonrecurrent pages make no contribution to s(T) and m(T). Therefore it may be assumed that N = NR.

3. The Working Set in the EDELWEISS System
a. Definition. The management of EDELWEISS is based upon an anticipated allocation of resources. Here, the resources that concern us are the local memory size and the time slice of ROBOT. The fragmentation algorithm enables us to cut programs into pieces that fit these resources. Therefore, in EDELWEISS it is an advanced working set (AWS) that represents the behavior of programs.

The fragments are multiplexed at times t1 and t2. For an intermediary time ti, the working set ψ̂(ti) is the fragment ψ(t1) called at t1:

    t1 ≤ ti ≤ t2 ⇒ ψ̂(ti) = ψ(t1)
Let τ(ψ) be the execution time of ψ:

    τ(ψ(t1)) = t2 − t1

The fragmentation algorithm leads us to bound this number:

    ∀t1,   τ1 ≤ τ(ψ(t1)) ≤ τ2

Now τ1 ≤ τ(ψ) ≤ τ2 if and only if

    W(t2, τ1) ⊆ ψ̂(ti) ⊆ W(t2, τ2)

Considering the sizes ŵ(ti) = |ψ̂(ti)|,

    w(t2, τ1) ≤ ŵ(ti) ≤ w(t2, τ2)     (2)
Let us characterize the AWS by the mean size σ:

    σ = lim_{k→∞} (1/k) Σ_{i=0}^{k−1} ŵ(ti)
The relations (2) lead us to write

    s(τ1) ≤ σ ≤ s(τ2)
So the asymptotic behavior of the AWS can be bounded by the RWS related to the time intervals τ1 and τ2. Statistical measures on the execution of APL programs, on which the segmentation has been simulated, have shown (cf. Section V,B,4) that the bounds τ1 and τ2 are

    τ1 ≈ 3·10³,   τ2 ≈ 3·10⁵

These quantities are measured in numbers of instruction cycles of the central processing unit (IBM 370). The first value τ1 is very precise, the second much less, and τ1 yields a greatest lower bound of the AWS. But the size of the ROBOT memory is the true least upper bound of σ. Let n0 be this size. So the relations (2) become

    w(t2, τ1) ≤ ŵ(ti) ≤ min(n0, w(t2, τ2))     (3)
So we get

    s(τ1) ≤ σ ≤ min(n0, s(τ2))
b. The RWS Equivalent to the AWS. Let us suppose a program is perfectly regular, that is, the execution of each fragment lasts the same time τ0; we can then choose τ1 = τ2 = τ0, and in (3) the two bounds become equal:

    ŵ(ti) = w(t2, τ0) ≤ n0
We shall say that w(t2, τ0) is the retarded working set equivalent to ŵ(ti); and in this case we generalize to the average sizes and put
    σ = s(τ0) ≤ n0
In the general case, the probability that the fragment i lasts the time τi is f(τi), with Pr(t) = f(t) and
    ∫₀^∞ f(t) dt = 1
The average retarded working set ψ̄ equivalent to σ is then defined by

    ψ̄ = ∫₀^∞ f(t) s(t) dt     (4)
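Equation (4) is a simple quadrature; the following sketch (ours) evaluates it numerically for an assumed lifetime density f concentrated below θ and the equiprobable size s*(t) = n(1 − exp(−t/n)) used in place of s(t):

    import math

    n = 64                                   # number of recurrent pages (assumed)
    theta = 1.0                              # segment lifetimes concentrate near theta

    def f(t):                                # a crude stand-in density on [0, theta]
        return 2 * t / theta**2 if t <= theta else 0.0

    def s_star(t):
        return n * (1 - math.exp(-t / n))

    dt = 1e-4
    psi_bar = sum(f(i * dt) * s_star(i * dt) * dt
                  for i in range(int(theta / dt) + 1))
    print(round(psi_bar, 3))                 # the RWS size equivalent to the AWS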
c. Efficiency of the AWS. To compare the EDELWEISS management to that of classical machines, we define the efficiency Q of the AWS; it is the ratio

    Q = ψ̄/σ
In fact, the actual efficiency can be less than Q if the ROBOT memory surplus n0 − ŵ(ti) is not used. In that case the true efficiency Q′ is
    Q′ = ψ̄/n0

Therefore, if Q > 1 (or Q′ > 1) there is a gain; and if Q < 1 there is a loss.

4. Valuation of the Efficiency of the EDELWEISS AWS

a. Independent Reference Model. To value the efficiency, we must get the values f(t) and s(t) which occur in the integral (4). Therefore, we have to define some hypotheses and get some statistical measures on EDELWEISS. It is known that s(t) can be expressed in the independent reference model (Denning and Schwartz, 1972). This model is close enough to reality as soon as t is large enough, for the references are then very nearly uncorrelated; we shall see in the following that the EDELWEISS AWS yields large values of t. Under this hypothesis, the reference probability of page i is λi. The distribution function Fi(x) of the interreference interval is then
    Fi(x) = 1 − (1 − λi)^x

and the density function

    fi(x) = Fi(x) − Fi(x − 1) = λi(1 − λi)^{x−1}

From that we deduce the missing-page rate and the average working-set size:

    m(t) = Σ_{i=1}^{n} λi(1 − λi)^t

    s(t) = n − Σ_{i=1}^{n} (1 − λi)^t = n − Σ_{i=1}^{n} exp[t log(1 − λi)]
In particular, when the recurrent pages are equiprobable, s(t) becomes s*(t):

    ∀i,   λi = 1/n

hence

    s*(t) = n − Σ_{i=1}^{n} exp{t log[1 − (1/n)]}
But the interesting case is n ≫ 1, which gives

    s*(t) = n{1 − exp[−(t/n)]}
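These expressions are easy to check by simulation; the following sketch (ours) draws independent references with probabilities λi and compares the empirical mean working-set size with s(t) and with s*(t):

    import math, random

    random.seed(1)
    n = 20
    lam = [random.random() for _ in range(n)]
    total = sum(lam)
    lam = [x / total for x in lam]                    # reference probabilities

    def s_model(t):                                   # s(t) = n - sum (1 - lam_i)^t
        return n - sum((1 - l) ** t for l in lam)

    def s_star(t):                                    # equiprobable approximation
        return n * (1 - math.exp(-t / n))

    refs = random.choices(range(n), weights=lam, k=50_000)

    def s_empirical(T):
        k = len(refs)
        return sum(len(set(refs[max(0, t - T):t])) for t in range(1, k + 1)) / k

    for T in (5, 20, 80):
        print(T, round(s_empirical(T), 2), round(s_model(T), 2), round(s_star(T), 2))
    # columns: T, empirical size, s(T), s*(T)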
It is worth noticing that

    s(t) ≤ s*(t)
This inequality shows that the equiprobability hypothesis yields a working set s*(t) corresponding to the worst case.

b. Measures. To estimate the EDELWEISS AWS, we used APL programs. They were run on an IBM 370 machine under the APL-SV system. This system supplies working spaces of 128 K bytes. The program sample included a compiler for arithmetical and logical APL statements, programs for the simulation of logical circuits, programs of statistical analysis, and some management programs. All these programs represented about 1000 APL statements and 5000 segmentation points. The segmentation of APL programs is not like the EXEL segmentation; however, with structured programs, it is possible to have a good simulation of what happens with true EXEL programs in the EDELWEISS system. The APL-SV system supplies a shared variable which enables the user to know, at all times, the CPU time consumed by the program. To get the results, a supervisor controlled the execution of the user programs, simulated the segmentation and, at each segmentation point, noted the value of this variable. From these measures we could deduce a histogram, which we replaced by a continuous function after an adequacy test. This function, which gives the probability of the lifetime of segments, is given in Fig. 6.

FIG. 6. Probability function of lifetime segments (with θ = 1).

One observes that the high probabilities are concentrated around the highest value, called θ. When t is less than θ/2, the probabilities fall off quickly, and below θ/10 they are nearly zero. The value of θ is very high, about 10⁴-10⁵ machine cycles. These values come from the fact that APL statements must be interpreted and the basic functions work on arrays, so that each APL operation generates many elementary operations. A good approximation of the probability function can be supplied by the formula
where g(x) can be chosen equal to exp[−(log x)²/2] for x ≥ 0